Subsystem and Transaction Monitoring and Tuning with DB2 11 for z/OS

Readers' comments

Readers' comments (22) 

lockedThis discussion is now locked


Posted by Hari Shanmugadhasan on 29 November 2013 at 15:51

The Nov 24 draft of SG24-8182 states:

"DB2 11 has added a new capability of asynchronously removing pseudo-empty indexes pages and pseudo-deleted index entries by DB2 system tasks independently from SQL DELETE transaction, this function removes needs for REORG."

It seems to contradict page 86 of this the Nov 4 draft of the SG24-8180 Redbook which states:

"Attention: This function is not a substitute for the REORG INDEX utility."

It would be helpful if the two Redbooks were consistent.

Posted by Hari Shanmugadhasan on 29 November 2013 at 15:53

Page 95 of the Nov 24 draft of SG24-8182 states:

"When the SQL query reads a pseudo deleted index entry . . . which could cause a deadlock situation."

Yet page 85 of the Nov 4 SG24-8180 draft Redbook states a different scenario:

" . . . result in time-outs and deadlocks for applications that insert data into tables with unique indexes."

It would be helpful if both Redbooks were consistent, correct and complete in describing the problem scenarios caused by pseudo-deleted index entries.

Posted by Hari Shanmugadhasan on 29 November 2013 at 16:19

It would be helpful if the Redbook described INDEXCLEANUP_THREADS and how the threads are used. If the value is 128 will this mean that 128 partitions of a PI or DPSI can be processed in parallel?

It would also help if the prefetch and BP activity of these threads were described. Does each thread do 64 or 128 page sequential prefetches using FIFO or MRU (assuming 4K pages)?

Posted by Hari Shanmugadhasan on 29 November 2013 at 16:24

Page 111 states:

"Be aware of INDEXCLEANUP_THREADS affect on zIIP
capacity"

It would be helpful if the Redbook described how one can monitor the affect not just on zIIP capacity but on everything else. How do you identify and monitor this activity and individual threads? How do you connect them back to the index they are processing? Are there any WLM controls on the dispatching priority relative to other workload?

Posted by Hari Shanmugadhasan on 30 November 2013 at 16:29

Page 235 states:

"If forced log writes, triggered by taking, a completely empty, index page off the IX chain for GBP-dependent objects, is really hurting your business when during massive delete
operations"

It would be helpful if the Redbook showed how you determine if this problem is occurring and whether it is excessive.

Posted by Hari Shanmugadhasan on 30 November 2013 at 16:34

Page 235 states:

"If forced log writes, triggered by taking, a completely empty, index page off the IX chain for GBP-dependent objects, is really hurting your business when during massive delete operations . . . DB2 11 implements the pseudo deleted cleanup asyncronous [sic] engines"

This seems to imply that DB2 11 cleanup helps, but the description of the problem with "massive delete" seems to imply the problem occurs concurrently with the delete so won't be helped. Please clarify.

Posted by Hari Shanmugadhasan on 30 November 2013 at 16:37

It would be helpful of the Redbook discussed how to monitor the pseudo delete cleanup threads and the impact of pseudo deleted RIDs and pseudo empty index pages.

Posted by Hari Shanmugadhasan on 30 November 2013 at 16:43

It would be helpful if the limitations and dangers of using IFCID 377 were discussed along with what reporting is available, especially summary and exception reporting for effective management without high CPU overhead.

http://publib.boulder.ibm.com/infocenter/tivihelp/v15r1/index.jsp?topic=%2Fcom.ibm.omegamon.xe.pe_db2.doc_5.2.0%2Fko2rr%2Fr0377out.htm

Posted by Hari Shanmugadhasan on 30 November 2013 at 16:53

Page 95 states:

". . . SQL has to wait for the lock, which could cause a deadlock situation."

It would be helpful to show how you can easily determine whether the deadlock was caused by a pseudo deleted RID. Also timeouts and just excessive lock wait time should be mentioned as potential problems along with the methods to identify whether PD RIDs are the cause of the problem.

Posted by Hari Shanmugadhasan on 2 December 2013 at 14:42

Page 95 of Nov 30 draft states:

"It will first try to use commit LRSN checking, but if that
fails, it needs to get an S lock on the pseudo deleted RID."

This seems incorrect since there is supposed to also be PUNC bit checking for the RID and there aren't supposed to be any locks on the index RIDs.

It seems unfortunate that when lock avoidance fails but it is determined that the change is committed that neither is the pseudo-deleted RID cleaned up nor its PUNC bits reset.

Posted by Hari Shanmugadhasan on 2 December 2013 at 14:56

My comment #1 has been addressed by changes in Nov 30 draft of SG24-8182 and the Dec 1 draft of SG24-8180.

#2 still stands since page 95 still doesn't mention timeouts and long lock wait delays as possible situations.

My pre-Dec 2 comments #3-9 still seem valid.

Posted by Hari Shanmugadhasan on 4 December 2013 at 14:45

It would be helpful if the DDF section showed how to monitor the DB2 11 Autocommit performance enhancement.

Also it would be helpful to explain the page 162 Figure Example 11-10 numbers. Why are more blocks sent than rows sent? This would seem impossible. Yet the Redbook states:

"Example 11-10 shows an example of tracing a workload where each SQL is returning small numbers of rows."

ROW SENT 135.1K BLOCKS SENT 270.2K so 2 blocks per row! Why so many empty blocks?

Posted by Hari Shanmugadhasan on 4 December 2013 at 14:52

It would be helpful to explain why page 162 Example 11-10 shows:

405.4K SQL STATEMENTS received

yet only

ROW SENT 135.1K

Which implies at least 270.3K SQL statements with no rows returned which seems to contradict:

"Example 11-10 shows an example of tracing a workload where each SQL is returning small numbers of rows."

Posted by Hari Shanmugadhasan on 4 December 2013 at 15:05

It would be helpful to explain why page 162 states:

"Because block fetch groups the rows, compare the ROWS field with MESSAGES field to determine the efficiency."

That text seems to imply it takes 473.4K message pairs to send only 135.1K rows, which seems incredibly inefficient.

Even if it were true, it would be good to explain what one can do about it.

It would seem to be important to know what kind of SQL is being received and how many rows are returned by each SQL if any, to judge.

Posted by Hari Shanmugadhasan on 4 December 2013 at 15:24

It would be helpful if the pages 156-157 Example 11-3 showed a much heavier workload, something like 50K concurrent connections with 1K transactions per second and then explained how to tune DB2 10 high performance DBATs with appropriate before and after reports.

Discussing the monitoring and tuning of pooling and concentrating at the various tiers in a 3-tier application to optimize high performance DBAT performance would also be helpful.

Posted by Hari Shanmugadhasan on 5 December 2013 at 12:24

It would be helpful to explain why page 162 Example 11-10 shows:

"MESSAGES IN BUFFER N/A"

It would help to say that the SQL was deliberately written to prevent block fetch (if this was the case), so "compare the ROWS field with MESSAGES field to determine the efficiency" in the text is misleading for this example.

Also, in comments #12 and 13, I should have written "ROWS" not "ROW".

Posted by Hari Shanmugadhasan on 5 December 2013 at 12:32

It would be helpful to replace the sentence just before Example 11-10 with:

"Example 11-10 shows an example of tracing a workload where only some SQLs opens cursors. In addition, all of those cursor SQLs completely fail to block fetch"

And then explain how block fetch was suppressed and why there were still two blocks per row returned.

Or replace Example 11-10 with a more helpful report and with supporting explanation.

Posted by Hari Shanmugadhasan on 9 December 2013 at 12:05

In the Dec 9 draft, no progress was made on the above comments. Comments #2-10 and #12-17 remain unresolved.

In particular, p162 Example 11-10 with its very peculiar numbers and page 156-157 Example 11-3 with its very light workload seem unexplained, uncorrected, and unchanged.

Ex. 11-10 seems to illustrate a problem with incorrect trace data, incorrect reporting or both. It calls into question all the numbers in the report. It would be helpful to show how to deal with errors like these.

Posted by Hari Shanmugadhasan on 9 December 2013 at 12:16

Page 161 Example 11-9 states:

"CUR QU INACT CONNS (TYPE 2) 0.03"
"CUR ACTIVE AND DISCON DBATS 15.82"
"CUR DISCON DBATS NOT IN USE 3.67"

These are very peculiar numbers since they should all be non-negative integers since they are purely counts.

They make the whole report seem suspect and suggest that there is a problem with the trace data, the reporting or both.

It would be helpful to explain how to deal with such problems. What APARs open/closed are needed?

Posted by Hari Shanmugadhasan on 10 December 2013 at 15:14

Page 161 states:

"To maintain your performance for DRDA transactions . . .You can find such information relates to DRDA messaging
and number of requests from this particular trace block."

Yet the referred to Example 11-10 on page 162 starts with:

"TRANSACTIONS N/A N/A"

It seems very odd that TRANSACTIONS should be Not Applicable for both SENT and RECEIVED. How can one know about the performance of "DRDA transactions"?

The Redbook uses "transaction" repeatedly.

Posted by Hari Shanmugadhasan on 7 May 2014 at 18:31

The March 28 update of the final version doesn't address any of the outstanding issues as of the Dec 9 draft.

Some pages shifts have occurred, so that p235 is now 239, p162 is 164, the first p161 reference is now on 163 while the second is 164. The updated form of comment #2 will be discussed next.

Posted by Hari Shanmugadhasan on 7 May 2014 at 18:44

With the March 28 update, Comment #2 changes to:

Page 95 says:

"When the SQL query scans a pseudo deleted index entry . . . which could cause a deadlock situation."

Yet p389 of the April 23 SG24-8180 has an incomplete and slightly different scenario:

" . . . result in time-outs and deadlocks for applications that insert data into tables with unique indexes."

See related SG24-8182 p95 comments.

It would help if both were consistent, correct and complete in describing the problem scenario.


Profile

Publish Date
28 February 2014

Last Update
31 August 2022


Rating:
(based on 3 reviews)


Author(s)

ISBN-10
0738439126

IBM Form Number
SG24-8182-00