Posted by Hari Shanmugadhasan on 29 December 2012 at 14:02
The eligibility of IN-list predicates for PTC was added 2011 to V9 via APAR, in 2011.
PM35321: ENABLE PREDICATE TRANSITIVE CLOSURE FOR IN LISTS
which lists the severe limitations on IN-list predicates eligible for both V9 and V10. In particular, host variables are not allowed.
Please see the APAR text for more details on restrictions and a few examples.
The V9 zparm INLIST_PTC doesn't apply to the V10 case.
DB2 10 In-list PTC performance is discussed in this Redbook.
Posted by Hari Shanmugadhasan on 29 December 2012 at 14:22
Continuing with DB2 10 IN-list PTC support:
PM70395: ABEND04E RC00E70005 DSNXGINB M621 ON REBIND PACKAGE
relates to cases with IN-list PTC or predicate push-down.
It states: ". . . for a query that contains an INLIST predicate that includes hostvars. The column is nullable.
The failure occurs due to predicate transitive closure or predicate push down, when DB2 internally generates a new INLIST predicate . . ."
Hostvar IN-list OK for PTC?
Posted by Hari Shanmugadhasan on 29 December 2012 at 14:36
In contrast, the DB2 Technical Preview on page 542 shows an IN-list PTC example with IN-list host variables.
FROM T1, T2
WHERE T1.C1 = T2.C1
AND T1.C1 IN (?, ?, ?):
AND T2.C1 in (?, ?, ?):
Now, an index on T2.C1 provides DB2 with more access plan options.
Predicate transitive closure already applies to the following values:"
Posted by Hari Shanmugadhasan on 29 December 2012 at 14:42
Just above, I meant the DB2 10 Technical Preview Redbook.
In contrast, the Oct 2012 "How DB2 modifies IN predicates" section of the DB2 10 Managing Performance manual has a list of PTC restrictions, contained in change bars and PSPI indicators, that includes:
"The values in the list on the right side of the IN keyword must be simple constants. For example, the list must not include expressions involving constants, parameter markers, host variables, or subselects."
Posted by Hari Shanmugadhasan on 5 January 2013 at 7:41
The page 278 RUNSTATS TABLESAMPLE test results demonstrate the very poor list prefetch results that are obtained with less than current DASD configurations.
List prefetch performance is especially important with DB2 10 since it makes increased use of list prefetch in several areas.
Posted by Hari Shanmugadhasan on 5 January 2013 at 7:49
It appears that poor list prefetch performance is responsible for the high RUNSTATS TABLESAMPLE results.
With DB2 10 5% page sampling, RUNSTATS is only barely faster than the No Sampling scenario that processes 20 times the pages and 20 times rows of the 30 GB test table, see page 278, figure 9-11. It does use less CPU, but not 1/20th.
Also, compare its results to the traditional sampling that samples from every page.
Posted by Hari Shanmugadhasan on 5 January 2013 at 7:57
The July 2012 Redpaper "DB2 for z/OS and List Prefetch Optimizer" has an interesting set of performance tests comparing different DASD HW configurations.
It looks like the RUNSTATS TABLESAMPLE tests here were run against something like the FICON 10K RPM disk configuration used in the Redpaper.
Various test scenarios, besides RUNSTATS, are used to show the poor list prefetch performance that it provides. The much better performance results with newer alternatives are provided.
Posted by Hari Shanmugadhasan on 5 January 2013 at 8:00
Page 8 of the List Prefetch Redpaper:
"So, the combination of zHPF and SSD raises the throughput from 2.7 MBps all the way to 90 MBps. That is a profoundly big difference."
Compared with the much higher MB/sec with sequential and dynamic prefetch it causes one to be suspicious of V9 access paths using list prefetch that fall into the slow scenarios. Suppressing list prefetch might slash some elapsed times.
Posted by Hari Shanmugadhasan on 5 January 2013 at 8:01
Posted by Hari Shanmugadhasan on 5 January 2013 at 8:22
Unfortunately, the Section 9.2 Runstats doesn't say anything about the effectiveness of any of the sampling techniques, let alone the new DB2 10 TABLESAMPLE SYSTEM option.
It is vital that a RUNSTATS sampling technique provide sufficiently accurate statistics that the DB2 optimizer makes good decisions, yet no tests are indicated.
It is of little value to save some CPU on RUNSTATS if your queries become 10 times worse!
Posted by Hari Shanmugadhasan on 5 January 2013 at 8:29
Section 9.2 is also vague about how TABLESAMPLE SYSTEM attempts to maintain accuracy.
From what little description is provided it sounds as if a form of Bernoulli sampling is applied at a page level.
This would make it similar to the system sampling implemented by various database vendors. It seems similar to the system sampling using by DB2 for LUW.
Posted by Hari Shanmugadhasan on 5 January 2013 at 8:32
Unfortunately, there are some issues with such a sampling technique. The 2007 book, "Physical Database Design : The Database Professional's Guide to Exploiting Indexes, Views, Storage, and More" states:
"However, if the data has any significant clustering in the table a system sample can lead to seriously erroneous results."
Posted by Hari Shanmugadhasan on 5 January 2013 at 8:43
The subsection of "Physical Database Design" book entitled "The Power of Sampling," pages 184-194, has an interesting discussion of RUNSTATS related sampling issues. Two of the three authors having an IBM Toronto Lab, DB2 for LUW, background. Page 191 states:
"I/O sampling on a table with significant clustering (i.e., having a good cluster ratio and therefore storing similar data together on the same or nearby pages on disk) can lead to dramatic inaccuracies."
It provides a published example.
Posted by Hari Shanmugadhasan on 5 January 2013 at 8:47
Overall, the "Physical Database Design" book is very DB2 for LUW feature oriented. Not surprising given the author backgrounds.
But with all the appropriate cautions, it does provide some ideas and questions that are applicable to other databases and platforms. Sometimes a different perspective is useful, even if it only causes you reconsider standard paths of thought.
Posted by Hari Shanmugadhasan on 5 January 2013 at 8:59
Way back in 1997, the DB2 V5 Performance Topics Redbook published test results showing the impact of various row sampling rates for RUNSTATS on query response times. These tests appear starting on page 95 in http://www.redbooks.ibm.com/redbooks/pdfs/sg242213.pdf
Much has changed in the DB2 Optimizer since 1997, but at the very least, the same tests could have been run with DB2 10 for both the old row sampling and new V10 page sampling. The guidance for row sampling might need to change.