IBM DB2 12 for z/OS Technical Overview

Readers' comments (47)

This discussion is now locked


Posted by Hari Shanmugadhasan on 19 December 2016 at 10:14

It would be helpful to mention that the "3.3 Buffer pool simulation" function was added to DB2 11 in 2014 via
http://www-01.ibm.com/support/docview.wss?uid=swg1PI22091 and APARs for related products, so you can start using it now, guided by the more detailed information in the DB2 11 manuals. http://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/perf/src/tpc/db2z_bpsimulationperform.html
It would be helpful to state whether there has been any change or improvement in this area with DB2 12.

Posted by Hari Shanmugadhasan on 19 December 2016 at 10:29

Was the test scenario in "3.3 Buffer pool simulation" pages 37-39 just a case of making BP1 big enough to pin the test case data that uses BP1? Fig 3-7 makes it appear that no BP1 synch I/O activity occurred, which suggests that all of the BP1 test objects were preloaded into BP1 before the validation test measurement was started.

Posted by Hari Shanmugadhasan on 19 December 2016 at 10:39

It would be helpful if "3.3 Buffer pool simulation" pages 37-39 recorded the benefits of reducing GBP activity in this test case.

Posted by Hari Shanmugadhasan on 19 December 2016 at 10:51

It would be helpful to explain why the test scenario in "3.3 Buffer pool simulation" pages 37-39 was only measured for 6 minutes (360 seconds) instead of following the p38 Note "enough samples (2 - 3 hours) per SPSIZE" or even the p37 Fig 3-5 "(> 1 hour)." Was measurement started when BP1 was empty of test case data?

Posted by Hari Shanmugadhasan on 19 December 2016 at 11:12

It would be helpful to compare the results of running the test scenario in "3.3 Buffer pool simulation" pages 37-39 with using the BP AUTOSIZE option.

Posted by Hari Shanmugadhasan on 19 December 2016 at 12:04

I couldn't find any mention of the avoidance of scheduling of unnecessary dynamic prefetch. Has that enhancement been dropped?
http://www.ibm.com/support/knowledgecenter/en/SSEPEK_12.0.0/wnew/src/tpc/db2z_12_avoidprefetch.html

Posted by Hari Shanmugadhasan on 19 December 2016 at 12:20

It would be helpful to explain the significance of
"SYNC READ I/O (S) =81181," in Fig 3-6, given the tiny size of BP1 (8MB) with a relatively high "ASYNC READ I/O =15470503," especially for a presumably OLTP workload. Is it driven by the prereqs for starting dynamic prefetch?

Posted by Hari Shanmugadhasan on 19 December 2016 at 12:42

It would be helpful to point out that the DB2 12 explanation
"SYNC READ I/O (S) . . . Sequential synchronous read I/Os occur when prefetch is disabled" just gives one possible reason, one which seems unlikely to apply to this Fig 3-6 test case. http://www.ibm.com/support/knowledgecenter/SSEPEK_12.0.0/perf/src/tpc/db2z_monitortunebufferpoolonlinecommands.html

Posted by Hari Shanmugadhasan on 20 December 2016 at 12:44

It would be helpful if "3.3 Buffer pool simulation" provided a link providing a full description of the "IBM Fictional Brokerage online transaction workload" and test environment, along with full test reports for the base test, the simulation run, and the run after the BP changes were made, for both DB2 11 and DB2 12. "IBM Fictional Brokerage" appears once in this Redbook, doesn't provide a hit at ibm.com, and just provides this Redbook as a hit in Google.

Posted by Hari Shanmugadhasan on 20 December 2016 at 12:54

It would be helpful if Fig 3-9 in "3.3 Buffer pool simulation" explained the difference for the test workload between "per commit" and "per transaction," or better yet, picked one of them to report Synch I/O and CPU. Stating whether "DB2 CPU" means only non-zIIP Class 2 DB2 CPU as reported in the Accounting Report would help provide some business value context.

Posted by Hari Shanmugadhasan on 20 December 2016 at 13:08

It would be helpful if "3.3 Buffer pool simulation" indicated the business value of using almost 4GB of memory for this workload, documenting the % increase in ETR, the % decrease in user observed response time, the % decrease in SW charge driving CPU, etc. Assuming 3000 tps before and after, the simulation indicates a potential saving per transaction of 32.7 milliseconds, which seems relatively small and unnoticeable to the average end user. Why not add half the memory or increase GBP1?
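The arithmetic behind that ~32.7 ms figure can be sketched as follows, using numbers quoted elsewhere in these comments (the 70,958 avoidable Synch I/Os per second from the p38 calculation, the 3,000 tps from p37, and the 1.38 ms average avoidable Synch I/O delay from Fig 3-6); this is a back-of-envelope check, not a measurement:

```python
# Back-of-envelope check of the ~32.7 ms/txn potential saving.
# All inputs are figures quoted in this comment thread, not measured here.
avoidable_sync_io_per_sec = 70958   # p38 calculation
tps = 3000                          # p37 workload rate (assumed constant)
avg_avoidable_delay_ms = 1.38       # Fig 3-6 average avoidable Synch I/O delay

ios_avoided_per_txn = avoidable_sync_io_per_sec / tps             # ~23.65 I/Os per txn
saving_ms_per_txn = ios_avoided_per_txn * avg_avoidable_delay_ms  # ~32.6 ms per txn
```

With these inputs the saving works out to roughly 32.6 ms per transaction, matching the ~32.7 ms cited above to within rounding.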

Posted by Hari Shanmugadhasan on 20 December 2016 at 13:23

It would be helpful if "3.3 Buffer pool simulation" explained the meaning and significance of "PAGES MOVED INTO SIMULATED BUFFER POOL =53668641", since about 53.7 million pages being moved into a pool of fewer than half a million pages would mean constant flushing, which seems unlikely given the I/O activity being eliminated. That could mean this count should be at most 496,000 (the 6 minute test working set).

Posted by Hari Shanmugadhasan on 20 December 2016 at 13:42

It would be helpful if the discussions of REGISTER NO for UNLOAD and RUNSTATS mentioned that these utilities can then miss already committed data, reading old unregistered pages instead. It appears that subsequent UR applications can also miss this same committed data.

Posted by Hari Shanmugadhasan on 20 December 2016 at 16:17

It would be helpful if "3.3 Buffer pool simulation" explained how it can be used to more easily increase business value compared to using BP AUTOSIZE by listing the major scenarios where AUTOSIZE isn't appropriate.

Posted by Hari Shanmugadhasan on 21 December 2016 at 12:03

It would be helpful if "3.3 Buffer pool simulation" explained that in its test case "PAGES MOVED INTO SIMULATED BUFFER POOL =53668641" probably provides an estimate of the upper bound of the additional GETPAGES that could have been satisfied without I/O with the larger pool and thus a way to derive an upper bound estimate of the average reuse per simulated page per second: 0.30 or per transaction: 1.0*10**-4. Testing with SPSIZE(50000) and SPSIZE(100000) might show more cost effective sizes.
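As a sketch, the reuse estimates above can be reproduced from the quoted counters, assuming a simulated pool of roughly 496,000 pages (the figure derived in an earlier comment, not stated explicitly in the Redbook), the 360-second measurement interval, and 3,000 tps:

```python
# Upper-bound estimate of reuse per simulated page, from quoted counters.
# The ~496,000-page simulated pool size is an assumption taken from an
# earlier comment; it is not stated explicitly in the Redbook figures.
pages_moved = 53_668_641    # "PAGES MOVED INTO SIMULATED BUFFER POOL" (Fig 3-6)
sim_pool_pages = 496_000    # assumed simulated pool size, in pages
interval_sec = 360          # 6-minute measurement interval
tps = 3000                  # p37 workload rate

reuse_per_page_per_sec = pages_moved / (sim_pool_pages * interval_sec)  # ~0.30
reuse_per_page_per_txn = reuse_per_page_per_sec / tps                   # ~1.0e-4
```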

Posted by Hari Shanmugadhasan on 21 December 2016 at 12:20

The discussion of UNLOAD REGISTER NO might want to warn data sharing sites migrating to DB2 12 with UNLOAD SHRLEVEL CHANGE ISOLATION UR workloads that REGISTER NO is the default for non-LOB, non-XML objects so they may miss data committed by other members, which may be a problem. In contrast, for RUNSTATS SHRLEVEL CHANGE, the default is REGISTER YES.

Posted by Hari Shanmugadhasan on 21 December 2016 at 12:26

The discussion of RUNSTATS REGISTER NO might want to warn sites that NO will be ignored for DB2 catalog objects with application of this closed APAR
http://www-01.ibm.com/support/docview.wss?uid=swg1PI72344

Posted by Hari Shanmugadhasan on 21 December 2016 at 13:09

It would be helpful if "3.3 Buffer pool simulation" discussed how the simulation and counters are affected by MRU workloads using the PGSTEAL(LRU) BP being simulated.

Posted by Hari Shanmugadhasan on 22 December 2016 at 11:28

It would be helpful to correct Fig 3-7 to show the correct Synch I/O/sec avoidance value for the simulation. The p38 calculation shows 70958. The 3-D bar chart shows the rate as under 70,000. Even better, replace Fig 3-7 with the delta %s since 3D charts are hard to interpret. "Simulation" delta <1%? "Expanded" delta -99.9%? Also, you might consider rounding to 70,959 since the calculated rate is 70958.78611.

Posted by Hari Shanmugadhasan on 22 December 2016 at 11:39

It would be helpful to add the specific numbers for CPU, Synch I/O, commits, and transactions to Fig 3-8, along with the delta % for CPU and I/O. The bar chart is easier to interpret than a 3D chart, but still requires guessing.

Posted by Hari Shanmugadhasan on 22 December 2016 at 11:54

It would be helpful to explain why Fig 3-8 contradicts the p38 calculation: assuming 3000 tps (p37), there should have been a BP1 Synch I/O rate reduction of about 23.7/txn with close to 0/sec BP1 Synch I/O, yet Fig 3-8 shows a 45+ drop from almost 60 to just over 10 (which might be from other BPs). If there is more than one commit per txn then the discrepancy in the drop is even greater. Can you have 2 txns per commit?

Posted by Hari Shanmugadhasan on 22 December 2016 at 12:16

It would be helpful if the CPU units used in Fig 3-8 were explained, along with explaining the source of the benefit since a Synch I/O rate reduction of 23.7/sec suggests about 4.7*10**(-7) CPU secs per transaction on an old zEC12 processor.

Posted by Hari Shanmugadhasan on 23 December 2016 at 8:04

I should have ended my previous post with "a Synch I/O rate reduction of roughly 23.7/txn indicates a roughly 1.42*10**(-4) CPU sec saving per transaction on an old zEC12 processor."
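For what it's worth, the corrected figures are mutually consistent: dividing the ~1.42*10**(-4) CPU sec saving by the ~23.7 avoided Synch I/Os per transaction implies roughly 6 microseconds of CPU per avoided Synch I/O. That implied per-I/O cost is a derivation from the numbers above, not a published zEC12 figure:

```python
# Consistency check of the corrected estimate above; the implied
# per-I/O CPU cost is derived, not a published zEC12 number.
cpu_saving_per_txn = 1.42e-4   # corrected CPU sec saved per transaction
avoided_ios_per_txn = 23.7     # avoided Synch I/Os per transaction

implied_cpu_sec_per_sync_io = cpu_saving_per_txn / avoided_ios_per_txn  # ~6.0e-6 sec
```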

Posted by Hari Shanmugadhasan on 23 December 2016 at 8:09

It would be helpful if "3.3 Buffer pool simulation" provided the DSNB431I message for the test case.

Posted by Hari Shanmugadhasan on 23 December 2016 at 8:26

It would be helpful if "3.3 Buffer pool simulation" suggested how the calculated average "AVOIDABLE SYNC I/O DELAY" might be used.

Posted by Hari Shanmugadhasan on 23 December 2016 at 12:52

It would be helpful if "3.3 Buffer pool simulation" indicated the page update activity leading up to the six minute test, as well as the page update activity during the six minutes.

Posted by Hari Shanmugadhasan on 23 December 2016 at 12:57

You might consider explaining this DB2 12 position: "The most important statistic from the simulation is that a large number of synchronous read I/O operations for randomly accessed pages could have been avoided. Given this information, your next step would be to increase the SPSIZE value to see whether you can significantly decrease the numbers for the avoidable operations." http://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/perf/src/tpc/db2z_bpsimulationresults.html

Posted by Hari Shanmugadhasan on 23 December 2016 at 13:07

It would be helpful if "3.3 Buffer pool simulation" compared its simulation with the use of the IBM DB2 Buffer Pool Analyzer for z/OS. What would that tool have reported for the BP1 test case?

Posted by Hari Shanmugadhasan on 23 December 2016 at 13:22

A minor point, but since this is an American English publication, it would be helpful if decimal points were used instead of commas (Fig 3-8), and comma separators were used between thousands (Fig 3-7 and the p38 calculation). Some might find 25,463,982 easier to understand than the undifferentiated string 25463982. Since DB2 has NLS support, the DSNB432I counts could all use comma separators.

Posted by Hari Shanmugadhasan on 23 December 2016 at 13:42

Relating to "Figure 3-6 Display output from –DIS BPOOL (BP1) DETAIL", has anyone submitted a request to have DSNB432I supplement all the counts with per second averages, with the interval start, stop, and duration reported in the header? Per second averages are much more useful when making comparisons. Plus, all the counts and averages could be displayed using decimal points and comma thousand separators when US English is selected as the NLS option, and max and avg Sync I/O delay avoided could be reported, DSNB455I-like.

Posted by Hari Shanmugadhasan on 24 December 2016 at 9:34

It would be helpful if the "3.3 Buffer pool simulation" test case source code, EXPLAIN output, and Access Path graphs were added to the "Additional Materials." This is especially true if they are coded in SQL PL, since examples of proven high performance, well coded DB2 for z/OS SQL PL source code seem hard to find. It might also provide hints as to why so much prefetch occurs in this OLTP test, in BP1 alone, without considering the other BPs.

Posted by Hari Shanmugadhasan on 24 December 2016 at 9:59

If the p37 3000 TPS is the total for both members, and the Fig 3-6 counts apply to the second member as well as the first, then Fig 3-7 and Fig 3-8 are for a single member processing 1,500 TPS, not the 3,000 TPS that I used in my "22 December 2016 at 11:54" and other posts. This would mean a BP1 Synch I/O rate reduction of about 47.3/txn, which is about what Fig 3-7 shows as the drop. This implies that #commits = #txns, and that others of my calculations should use 47.3/txn instead of 23.7/txn.
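Under the single-member reading described above, the revised per-transaction figure falls out directly. This sketch uses the quoted 70,958/sec avoidance and assumes an even 50/50 split of the 3,000 TPS across the two members, which is not a stated fact:

```python
# Per-transaction Synch I/O avoidance if Fig 3-6 reflects one member.
# The even split across two members is an assumption, not a stated fact.
avoided_sync_io_per_sec = 70958  # p38 calculation
total_tps = 3000                 # p37, for the whole data sharing group
member_tps = total_tps / 2       # assumed even split -> 1,500 TPS per member

avoided_per_txn = avoided_sync_io_per_sec / member_tps  # ~47.3 per transaction
```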

Posted by Hari Shanmugadhasan on 24 December 2016 at 10:35

It would be helpful if "3.3 Buffer pool simulation" explained the relatively poor DASD cache performance, given that Fig 3-6 indicates an average BP1 Synch I/O delay avoidance of 1.38 milliseconds for a test case that would seem to have very high page reuse from DASD cache, given the tiny original 8 MB BP1 and the effectiveness of a 2GB BP in a test environment.

Posted by Dr. Fabio Massimo Ottaviani on 8 January 2017 at 4:35

I think IFCID 002: Package level pipe wait information on page 262 is wrong. It should be IFCID 239.

Posted by Dr. Fabio Massimo Ottaviani on 8 January 2017 at 4:39

In Appendix A it could be useful to specify which IFCID the QWAC_WORKFILE_MAX and QWAC_WORKFILE_CURR fields are added to. I think it's IFCID 003.

Posted by Hari Shanmugadhasan on 6 February 2017 at 12:24

It would be helpful if the page 47 BACKOUT YES section noted that the described function was delivered in V10, as well as V11, in July 2015, via http://www-01.ibm.com/support/docview.wss?uid=swg1PI08421 and
http://www-01.ibm.com/support/docview.wss?uid=swg1PI43529

Posted by Hari Shanmugadhasan on 6 February 2017 at 12:25

It would be helpful if the page 47 BACKOUT YES section explicitly stated what happened to indexes with BACKOUT YES since the current description says nothing about indexes being available and matching the now backed out table space.

Posted by Hari Shanmugadhasan on 6 February 2017 at 12:25

It would be helpful if the page 47 BACKOUT YES section provided planning and usage guidance based on V10, V11, as well as V12 ESP customer experiences.

Posted by Hari Shanmugadhasan on 6 February 2017 at 12:27

It would be helpful if the page 47 BACKOUT YES section provided guidance on estimating the elapsed time of BACKOUT. Does the BACKOUT take two, three, four or more times as long as the elapsed time to get to the first error? Depending on what factors?

Posted by Hari Shanmugadhasan on 6 February 2017 at 12:30

It would be helpful if the page 47 BACKOUT YES section provided some performance data. For example, a million-record input file adding rows evenly across the 10 partitions of a PBR with 200 million rows, when it has a PK, another unique key, 2 FKs, 10 check constraints, and 100 columns, and when the LOAD RESUME gets its first error on the last RI check in the ENFORCE phase. And specifically comparing the CPU and elapsed time up to the first error to the time until the table space and all indexes are available.

Posted by Hari Shanmugadhasan on 6 February 2017 at 12:33

It would be helpful if the page 47 BACKOUT YES section explained what use cases and RFEs are satisfied by fail and BACKOUT on the first error of any kind.

Posted by Hari Shanmugadhasan on 6 February 2017 at 12:37

It would be helpful if the page 47 BACKOUT YES section noted the superior function, compared to that described, that was delivered in V10 and V11 in July 2016 via http://www-01.ibm.com/support/docview.wss?crawler=1&uid=swg1PI54658, which does the BACKOUT only when DISCARDS n is hit, instead of on the first error of any kind.

Posted by Hari Shanmugadhasan on 6 February 2017 at 12:38

It would be helpful if the page 47 BACKOUT YES section replaced its current description with the description of the more flexible DISCARDS n support provided for DB2 12 in Sept 2016 via http://www-01.ibm.com/support/docview.wss?uid=swg1PI67558
and noted if there were any enhancements or differences compared to the V10 or V11 implementation, including performance.

Posted by Hari Shanmugadhasan on 6 February 2017 at 12:42

It would be helpful if the page 47 BACKOUT YES section then described the use cases and RFEs that the added BACKOUT DISCARDS n function addresses, and why it doesn't satisfy RFE 28434 "No RECP state after abnormal termination with LOAD RESUME YES and no rows loaded" https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=28434
Or does "abnormal termination" here mean something other than hitting DISCARDS n? Or is it for the unsupported PART level LOAD RESUME?

Posted by Hari Shanmugadhasan on 6 February 2017 at 12:51

With the Continuous Delivery strategy, it would be helpful if descriptions of such DB2 12 enhancements were added to this Redbook on an ongoing basis, and if those new sections addressed the underlying issues that motivated my suggestions to improve the current BACKOUT YES and BP Simulation Redbook sections, since those functions were originally added via V10 and V11 (or just V11) APARs.

Posted by Hari Shanmugadhasan on 6 February 2017 at 14:32

The example 200 million row PBR in my "6 February 2017 at 12:30" post should have two unique indexes for the two unique keys, 2 NPIs to support the two FKs, and an additional 3 NPIs (so 7 indexes in total) since indexes are typically an important part of LOAD processing. If other types of indexes (DPSI, partitioning, expression based etc.) result in sharply different elapsed time ratios between going forward and backing out then that should be reported.

Posted by Hari Shanmugadhasan on 6 February 2017 at 16:51

It would be helpful to add a section on what the Oct 4, 2016 DB2 12 GA webcast calls "Online LOAD REPLACE – non-disruptive refresh of reference tables" or LOAD REPLACE SHRLEVEL REFERENCE. It was said that it would be delivered "this autumn," and it was, as V11 LOAD REPLACE SHRLEVEL REFERENCE via
https://www-01.ibm.com/support/docview.wss?uid=swg1PI67793
The section should indicate any differences between the DB2 12 support and the earlier V11 APAR support.


Profile

Publish Date
13 December 2016


ISBN-10
0738442305

IBM Form Number
SG24-8383-00