XES Coupling Facility Subchannel Tuning


This tip discusses a change in CF subchannel management, introduced by APAR OW54796, whereby CF subchannels are automatically varied offline and back online in response to CF Link contention.


z/OS uses special types of channels called CF Links to communicate with attached Coupling Facilities. Each CF Link that is used to connect an OS/390 or z/OS LPAR to a CF LPAR has a number of link buffers associated with it. For peer mode links (only available on zSeries CPCs), each link has seven link buffers. For compatibility mode links, each link has two link buffers associated with it. From a z/OS perspective, each link has either 2 (compatibility mode) or 7 (peer mode) subchannels associated with it. So, for example, if you had 2 z/OS LPARs sharing 1 peer mode CF Link, you would have 14 subchannels (7 in each LPAR) and 7 link buffers. When z/OS wants to start a request that is in a subchannel, it must find a free link buffer.
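
The subchannel and link buffer arithmetic above can be worked through in a few lines. This is only a minimal sketch: the function name and the mode strings are illustrative and are not part of any z/OS interface.

```python
# Link buffers per CF Link, as described in the text above.
BUFFERS_PER_LINK = {"peer": 7, "compatibility": 2}


def count_resources(lpars_sharing, links, mode):
    """Return (total subchannels across all sharing LPARs, link buffers).

    Each sharing LPAR has its own set of subchannels for every link,
    but the link buffers exist only once per physical link.
    """
    per_link = BUFFERS_PER_LINK[mode]
    subchannels = lpars_sharing * links * per_link
    link_buffers = links * per_link
    return subchannels, link_buffers


# Two z/OS LPARs sharing one peer mode link (the example in the text).
print(count_resources(2, 1, "peer"))  # (14, 7)
```

With a shared link there are more subchannels than link buffers, which is why a request can own a subchannel and still find no free link buffer.
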
    When a link is shared between LPARs, the link buffers can be driven to high utilization, resulting in an LPAR getting a path busy condition when it tries to start a request (a path busy indicates that there were no free link buffers when the request was started). When this happens, the LPAR that got path busy will spin, waiting for a link buffer to be freed up. If the link utilization is very high, the time spent spinning can become significant. This applies to both synchronous and asynchronous requests.

    When a link is dedicated to an LPAR, there is a one-to-one mapping of subchannels to link buffers, so you would never receive a path busy condition - if a subchannel is free, then the associated link buffer will also be free.

    APAR OW54796 introduced a change to XES in z/OS 1.2 and later, whereby XES will temporarily vary a subchannel offline if the number of path busy conditions exceeds a threshold. The effect of this should be that when a request is started, rather than being given a subchannel and then spinning waiting for a link buffer to become available, the request may receive subchannel busy (because fewer subchannels are available) and the request will be converted to an asynchronous request. When this happens, z/OS will queue the request, waiting for an available subchannel, and go off and process other work rather than spinning. This may result in longer response times for those requests that can't get a subchannel; however, it should result in less CPU overhead due to waiting for busy links. In effect, z/OS is trading CPU-eating spin processing for more efficient software queue processing.
      When the contention on the CF Link subsequently decreases, z/OS will bring the subchannel back online again.
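
The vary offline and vary online behavior described above might be pictured roughly as follows. This is a sketch under stated assumptions, not the XES implementation: the function name, the one-subchannel-per-interval step, and both threshold values are hypothetical, since the actual thresholds are not given in this tip. The floor of two online subchannels is taken from a statement later in this tip.

```python
MIN_ONLINE = 2  # this tip states XES never goes below two online subchannels


def adjust_online(online, generated, requests, path_busy,
                  high_pct=10.0, low_pct=5.0):
    """Return the number of online subchannels after one interval.

    high_pct and low_pct are hypothetical thresholds chosen for
    illustration only; the real XES thresholds are not documented here.
    """
    if requests == 0:
        return online
    busy_pct = 100.0 * path_busy / requests
    if busy_pct > high_pct and online > MIN_ONLINE:
        return online - 1  # heavy contention: vary one subchannel offline
    if busy_pct < low_pct and online < generated:
        return online + 1  # contention has eased: bring one back online
    return online
```

For example, under these assumed thresholds, `adjust_online(6, 6, 1000, 400)` returns 5, while `adjust_online(2, 6, 1000, 400)` stays at 2 because of the floor.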

      All of this processing is automatic. Once the PTF for the APAR is applied, you do not need to do anything to enable it, nor can you stop it from behaving in this manner.

      You may notice this happening if you look at an RMF report, and the number of subchannels reported as GEN is more than the number reported as being IN USE. (This would also happen if you had taken links offline yourself; however, it would be unusual for someone to do this.) Similarly, if you issue a D CF command on the console, you may see that some subchannels are reported as OPERATIONAL/NOT IN USE, and the number of subchannels in this state will fluctuate over time.

      Taking the following RMF report as an example, you will notice that systems MVSA and MVSW (which are in the same CPC and sharing 3 CF Links) had a total of about 16794 K requests and about 6021 K path busy conditions over the interval. The rule of thumb is that path busy should not exceed 10% of the number of requests, and in this case path busy represents about 36% of requests. You will see that the number of subchannels genned for system MVSA is 6; however, the number currently in use is only 4. And for system MVSW, once again 6 subchannels are genned; however, only 2 are in use.
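
The percentage quoted above can be checked directly from the two totals. A minimal sketch follows; the function and constant names are illustrative.

```python
RULE_OF_THUMB_PCT = 10.0  # path busy should not exceed 10% of requests


def path_busy_pct(requests, path_busy):
    """Path busy conditions as a percentage of requests."""
    return 100.0 * path_busy / requests


# Combined totals for MVSA and MVSW from the text, in thousands.
pct = path_busy_pct(16794, 6021)
print(f"{pct:.1f}%")            # 35.9%
print(pct > RULE_OF_THUMB_PCT)  # True
```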

      Because MVSW is receiving a higher percentage of path busy conditions (about 400%!), XES on MVSW has varied 4 of its subchannels offline in an effort to reduce the amount of time it is receiving path busy when it tries to use one of the links. MVSA has also varied subchannels offline in an effort to reduce path busy. However, because the percentage of path busy conditions is lower for MVSA, it has only varied 2 subchannels offline.
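
The number of subchannels that XES has varied offline is simply the difference between the GEN and IN USE counts. A small sketch using the two systems discussed above (the dictionary layout and function name are illustrative and do not reflect the RMF report format):

```python
# (GEN, IN USE) subchannel counts for the two systems discussed above.
subchannels = {"MVSA": (6, 4), "MVSW": (6, 2)}


def varied_offline(gen, in_use):
    """Subchannels currently offline: generated minus in use."""
    return gen - in_use


for system, (gen, in_use) in subchannels.items():
    print(system, varied_offline(gen, in_use))
# MVSA 2
# MVSW 4
```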

      In the report below, the links shared between systems MVSA and MVSW are severely overutilized. However, until additional links can be added, you can see that the new subchannel tuning algorithm will have eliminated a significant amount of time that the CP would have been spinning while waiting for a link to become free.

      Similarly, if you look at systems MVSC and MVSD, which share CF Links between them, and systems MVSB and MVSK, which share CF Links, you will see that the system with the higher percentage of path busy conditions has a smaller number of subchannels currently in use. XES will never go lower than two online subchannels in an LPAR.

      [Figure: RMF report in which the number of subchannels reported as GEN is greater than the number reported as IN USE]

      If you find this is happening on your system, review the RMF report for the number of PATH BUSY events. The number of such events should be less than 10% of the total number of requests. If CF Link utilization (as indicated by the percentage of PATH BUSY events) is a problem, you have four choices:
      • Install additional shared CF Links between the system and the CF that are currently encountering this situation.
      • Rather than using shared links, move to dedicated links for each LPAR. Note that you should not have fewer than two CF Links per LPAR/CF pair for availability reasons.
      • Redistribute the CF workload. If the high PATH BUSY events are primarily on the links to one CF, consider moving some of the busier structures from that CF to one with lower utilization.
      • If the processors support peer links, but you are currently running in compatibility mode, consider converting over to peer mode. Peer mode links run at least twice as fast as compatibility mode links, which will help by reducing the amount of time the link buffers are busy.
      If you install the PTF for APAR OW54796, ensure that you also APPLY the PTF for the supporting RMF APAR, OW55586.

      Special Notices

      This material has not been submitted to any formal IBM test and is published AS IS. It has not been the subject of rigorous review. IBM assumes no responsibility for its accuracy or completeness. The use of this information or the implementation of any of these techniques is a client responsibility and depends upon the client's ability to evaluate and integrate them into the client's operational environment.


      Publish Date
      20 June 2003
