Join optimization

A join operation is a complex function that requires special attention in order to achieve good performance. This section describes how DB2 UDB for AS/400 implements inner join queries and how optimization choices are made by the query optimizer. It also describes design tips and techniques which help avoid or solve performance problems.

The optimization for other types of joins, LEFT OUTER JOIN or EXCEPTION JOIN (OPNQRYF JDFTVAL(*YES) or JDFTVAL(*ONLYDFT) parameter), is similar except that the join order is always the same as the order of the tables specified in the FROM clause (OPNQRYF FILE parameter). Information about these types of joins will not be detailed here, but most of the information and tips in this section also apply to joins of this type.

Nested loop join implementation

DB2 UDB for AS/400 provides a nested loop join method. For this method, the processing of the tables in the join are ordered. This order is called the join order. The first table in the final join order is called the primary table. The other tables are called secondary tables. Each join table position is called a dial. During the join, DB2 UDB for AS/400:

  1. Accesses the first primary table row selected by the predicates local to the primary table.
  2. Builds a key value from the join columns in the primary table.
  3. Uses index scan-key positioning to locate the first row that satisfies the join condition for the first secondary table using an index with keys matching the join condition or local row selection columns of the secondary table.
  4. Applies bitmap selection, if applicable.
  5. Determines if the row is selected by applying any remaining selection local to the first secondary dial.

    If the secondary dial row is not selected then the next row that satisfies the join condition is located. Steps 1 through 5 are repeated until a row that satisfies both the join condition and any remaining selection is selected from all secondary tables

  6. Returns the result join row.
  7. Processes the last secondary table again to find the next row that satisfies the join condition in that dial.

    During this processing, when no more rows that satisfy the join condition can be selected, the processing backs up to the logical previous dial and attempts to read the next row that satisfies its join condition.

  8. Ends processing when all selected rows from the primary table are processed.

Nested loop join characteristics:

Note the following characteristics of a nested loop join:

Hash join

The hash join method is similar to nested loop join. Instead of using indexes to locate the matching rows in a secondary table, however, a hash temporary result table is created that contains all of the rows selected by local selection against the table. The structure of the hash table is such that rows with the same join value are loaded into the same hash table partition (clustered). The location of the rows for any given join value can be found by applying a hashing function to the join value.

Advantages of hash joins over nested loop joins

Hash join has several advantages over nested loop join:

Queries that cannot use hash join

Hash join cannot be used for queries that:

Hash join and parallel processing

The query attribute DEGREE, which can be changed by using the Change Query attribute CL command (CHGQRYA), does not enable or disable the optimizer from choosing to use hash join. However, hash join queries can use SMP parallelism if the query attribute DEGREE is set to either *OPTIMIZE, *MAX, or *NBRTASKS.

Types of queries that can effectively use hash join

Hash join is used in many of the same cases where a temporary index would have been built. Join queries which are most likely to be implemented using hash join are those where either:

Example of hash join that processes all rows

The following is an example of a join query that would process all of the rows from the queried tables:

    SELECT *
      FROM EMPLOYEE, EMP_ACT
     WHERE EMPLOYEE.EMPNO = EMP_ACT.EMPNO
   OPTIMIZE FOR 99999999 ROWS

OPNQRYF example :

OPNQRYF FILE((EMPLOYEE EMP_ACT)) FORMAT(FORMAT1)
   JFLD((1/EMPNO 2/EMPNO *EQ)) 
   ALWCPYDTA(*OPTIMIZE)

This query is implemented using the following steps:

  1. A temporary hash table is built over table EMP_ACT with a key of EMPNO. This occurs when the query is opened.
  2. For each row retrieved from the EMPLOYEE table, the temporary hash table will be probed for any matching join values.
  3. For each matching row found, a result row is returned.

The messages created by the PRTSQLINF CL command to describe this hash join query in an SQL program would appear as follows:

SQL402A  Hashing algorithm used to process join.
SQL402B  Table EMPLOYEE used in hash join step 1.
SQL402B  Table EMP_ACT used in hash join step 2.

Example of hash join on query that is limited by local selection

The following is an example of a join query that would have the queried tables of the join queried significantly reduced by local selection:

    SELECT EMPNO, LASTNAME, DEPTNAME
      FROM EMPLOYEE, DEPARTMENT
     WHERE EMPLOYEE.WORKDEPT = DEPARTMENT.DEPTNO
       AND EMPLOYEE.HIREDATE BETWEEN 1996-01-30 AND 1995-01-30
       AND DEPARTMENT.DEPTNO IN ('A00', 'D01', 'D11', 'D21', 'E11')
  OPTIMIZE FOR 99999999 ROWS

OPNQRYF example:

OPNQRYF FILE((EMPLOYEE DEPARTMENT))
   FORMAT(FORMAT2)
   QRYSLT('1/HIREDATE *EQ %RANGE(''1996-01-30'' ''1995-01-30'')
   *AND 2/DEPTNO *EQ %VALUES(''A00'' ''D01'' ''D11'' ''D21''
   ''E11''') 
   JFLD((1/WORKDEPT 2/DEPTNO *EQ))
   ALWCPYDTA(*OPTIMIZE) 

This query is implemented using the following steps:

  1. A temporary hash table is built over table DEPARTMENT with key values of DEPTNO containing rows matching the selection predicate, DEPTNO IN ('A00', 'D01', 'D11', 'D21', 'E11'). This occurs when the query is opened.
  2. For each row retrieved from the EMPLOYEE table matching the selection predicate, HIREDATE BETWEEN 1996-01-30 and 1995-01-30, the temporary hash table will be probed for the matching join values.
  3. For each matching row found, a result row is returned.

The messages created by the PRTSQLINF CL command to describe this hash join query in an SQL program would appear as follows:

   SQL402A  Hashing algorithm used to process join.
   SQL402B  Table EMPLOYEE used in hash join step 1.
   SQL402B  Table DEPARTMENT used in hash join step 2.

Example of hash join on query where ordering, grouping, non-equal selection, or result columns are selected

When ordering, grouping, non-equal selection specified with operands derived from columns of different tables, or result columns are derived from columns of different tables, the hash join processing will be done and the result rows of the join will be written to a temporary table. Then, as a second step, the query will be completed using the temporary table.

The following is an example of a join query with selection specified with operands derived from columns of different tables:

    SELECT EMPNO, LASTNAME, DEPTNAME
      FROM EMPLOYEE, DEPARTMENT
     WHERE EMPLOYEE.WORKDEPT = DEPARTMENT.DEPTNO
           AND EMPLOYEE.EMPNO > DEPARTMENT.MGRNO
   OPTIMIZE FOR 99999999 ROWS

OPNQRYF example:

OPNQRYF FILE((EMPLOYEE DEPARTMENT)
   FORMAT(FORMAT2)
   JFLD((1/WORKDEPT 2/DEPTNO *EQ) (1/EMPNO 2/MGRNO
   *GT))

This query is implemented using the following steps:

  1. A temporary hash table is built over table DEPARTMENT with a key of DEPTNO. This occurs when the query is opened.
  2. For each row retrieved from the EMPLOYEE table, the temporary hash table will be probed for the matching join values.
  3. For each matching row found, a result row is written to a temporary table.
  4. After all of the join result rows are written to the temporary table, rows that are selected by EMPNO > MGRNO are read from the temporary table and returned to the application.

The messages created by the PRTSQLINF CL command to describe this hash join query in an SQL program would appear as follows:

   SQL402A  Hashing algorithm used to process join.
   SQL402B  Table EMPLOYEE used in hash join step 1.
   SQL402B  Table DEPARTMENT used in hash join step 2.
   SQL402C  Temporary result table created for hash join query.

Join optimization algorithm

The query optimizer must determine the join columns, join operators, local row selection, index usage, and dial ordering for a join query.

The join columns and join operators depend on the:

Join specifications which are not implemented for the dial are either deferred until they can be processed in a later dial or, if an inner join was being performed for this dial, processed as row selection.

For a given dial, the only join specifications which are usable as join columns for that dial are those being joined to a previous dial. For example, for the second dial the only join specifications that can be used to satisfy the join condition are join specifications which reference columns in the primary dial. Likewise, the third dial can only use join specifications which reference columns in the primary and the second dials and so on. Join specifications which reference later dials are deferred until the referenced dial is processed.

For any given dial, only one type of join operator is normally implemented. For example, if one inner join join specification has a join operator of '=' and the other has a join operator of '>', the optimizer attempts to implement the join with the '=' operator. The '>' join specification is processed as row selection after a matching row for the '=' specification is found. In addition, multiple join specifications that use the same operator are implemented together.
Note:For OPNQRYF, only one type of join operator is allowed for either a left outer or an exception join.

When looking for an existing index to access a secondary dial, the query optimizer looks at the left-most key columns of the index. For a given dial and index, the join specifications which use the left-most key columns can be used. For example:

    DECLARE BROWSE2 CURSOR FOR
     SELECT * FROM EMPLOYEE, EMP_ACT
      WHERE EMPLOYEE.EMPNO = EMP_ACT.EMPNO
        AND EMPLOYEE.HIREDATE = EMP_ACT.EMSTDATE
     OPTIMIZE FOR 99999 ROWS

OPNQRYF example:

OPNQRYF FILE((EMPLOYEE, EMP_ACT)) FORMAT(FORMAT1)
   JFLD((1/EMPNO 2/EMP_ACT *EQ)(1/HIREDATE 2/EMSTDATE
   *EQ))

For the index over EMP_ACT with key columns EMPNO, PROJNO, and EMSTDATE, the join operation is performed only on column EMPNO. After the join is processed, row selection is done using column EMSTDATE.

The query optimizer also uses local row selection when choosing the best use of the index for the secondary dial. If the previous example had been expressed with a local predicate as:

    DECLARE BROWSE2 CURSOR FOR
     SELECT * FROM EMPLOYEE, EMP_ACT
      WHERE EMPLOYEE.EMPNO = EMP_ACT.EMPNO
        AND EMPLOYEE.HIREDATE = EMP_ACT.EMSTDATE
        AND EMP_ACT.PROJNO = '123456'
     OPTIMIZE FOR 99999 ROWS

OPNQRYF example:

OPNQRYF FILE((EMPLOYEE, EMP_ACT)) FORMAT(FORMAT2)
   QRYSLT('2/PROJNO *EQ ''123456''')
   JFLD((1/EMPNO 2/EMP_ACT *EQ)(1/HIREDATE 2/EMSTDATE
   *EQ))

the index with key columns EMPNO, PROJNO, and EMSTDATE are fully utilized by combining join and selection into one operation against all three key columns.

When creating a temporary index, the left-most key columns are the usable join columns in that dial position. All local row selection for that dial is processed when selecting keys for inclusion into the temporary index. A temporary index is similar to the index created for a select/omit keyed logical file. The temporary index for the previous example would have key columns of EMPNO and EMSTDATE.

Since the OS/400 query optimizer attempts a combination of join and local row selection when determining access path usage, it is possible to achieve almost all of the same advantages of a temporary index by use of an existing index. In the above example, using either implementation, an existing index may be used or a temporary index may be created. A temporary index would have been built with the local row selection on PROJNO applied during the index's creation; the temporary index would have key columns of EMP_ACT and EMSTDATE (to match the join selection). If, instead, an existing index was used with key columns of EMP_ACT, PROJNO, EMSTDATE (or PROJNO, EMP_ACT, EMSTDATE or EMSTDATE, PROJNO, EMP_ACT or ...) the local row selection could be applied at the same time as the join selection (rather than prior to the join selection, as happens when the temporary index is created).

The implementation using the existing index is more likely to provide faster performance because join and selection processing are combined without the overhead of building a temporary index. However, the use of the existing index may have just slightly slower I/O processing than the temporary index because the local selection is run many times rather than once. In general, it is a good idea to have existing indexes available with key columns for the combination of join columns and columns using equal selection as the left-most keys.

Join order optimization

The join order is fixed if any join logical files are referenced or a left outer or an exception join is used to implement any of the dials of the join. The join order is also fixed if the OPNQRYF JORDER(*FILE) parameter is specified or the query options file (QAQQINI) FORCE_JOIN_ORDER parameter is *YES. Otherwise, the following join ordering algorithm is used to determine the order of the tables:

  1. Determine an access method for each individual table as candidates for the primary dial.
  2. Estimate the number of rows returned for each table based on local row selection.

    If the join query with row ordering or group by processing is being processed in one step, then the table with the ordering or grouping columns is the primary table.

  3. Determine an access method, cost, and expected number of rows returned for each join combination of candidate tables as primary and first secondary tables.

    The join order combinations estimated for a four table join would be:


    1-2 2-1 1-3 3-1 1-4 4-1 2-3 3-2 2-4 4-2 3-4 4-3

  4. Choose the combination with the lowest join cost.

    If the cost is nearly the same, then choose the combination which selects the fewest rows.

  5. Determine the cost, access method, and expected number of rows for each remaining table joined to the previous secondary table.
  6. Select an access method for each table that has the lowest cost for that table.
  7. Choose the secondary table with the lowest join cost.

    If the cost is nearly the same, choose the combination which selects the fewest rows.

  8. Repeat steps 4 through 7 until the lowest cost join order is determined.

When a join logical file is referenced, a left outer or an exception join, or the join order is forced to the specified table order, the query optimizer loops through all of the dials in the order specified, and determines the lowest cost access methods.

Cost estimation and index selection for join secondary dials

In step 3 and in step 5, the query optimizer has to estimate a cost and choose an access method for a given dial combination. The choices made are similar to those for row selection except that an index must be used.

As the query optimizer compares the various possible access choices, it must assign a numeric cost value to each candidate and use that value to determine the implementation which consumes the least amount of processing time. This costing value is a combination of CPU and I/O time and is based on the following assumptions:

The main factors of the join cost calculations for secondary dials are the number of rows selected in all previous dials and the number of rows which match, on average, each of the rows selected from previous dials. Both of these factors can be derived by estimating the number of matching rows for a given dial.

When the join operator is something other than equal, the expected number of matching rows is based on the following default filter factors:

For example, when the join operator is less-than, the expected number of matching rows is .33 * (number of rows in the dial). If no join specifications are active for the current dial, the cartesian product is assumed to be the operator. For cartesian products, the number of matching rows is every row in the dial, unless local row selection can be applied to the index.

When the join operator is equal, the expected number of rows is the average number of duplicate rows for a given value.

The AS/400 performs index maintenance (insertion and deletion of key values in an index) and maintains a running count of the number of unique values for the given key columns in the index. These statistics are bound with the index object and are always maintained. The query optimizer uses these statistics when it is optimizing a query. Maintaining these statistics adds no measurable amount of overhead to index maintenance. This statistical information is only available for indexes which:

Average number of duplicate values statistics are maintained only for the first 4 left-most keys of the index. For queries which specify more than 4 join columns, it might be beneficial to create multiple additional indexes so that an index can be found with average number of duplicate values statistics available within the 4 left-most key columns. This is particularly important if some of the join columns are somewhat unique (low average number of duplicate values).

Figure 1. Average number of duplicate values of a 3 key index


Average number of duplicate values of a 3 key index

These statistics are maintained as part of index rebuild and creation.

Using the average number of duplicate values for equal joins or the default filter value for the other join operators, we now have the number of matching rows. The following formula is used to compute the number of join rows from previous dials.

NPREV =  Rp * M2 * FF2 * ..... *Mn * FFn .....

NPREV
The number of join rows from all previous dials.

Rp
The number of rows selected from the primary dial.

M2
The number of matching rows for dial 2.

FF2
Filtering reduction factor for predicates local to dial 2 that are not already applied using M2 above.

Mn
The number of matching rows for dial n.

FFn
Filtering reduction factor for predicates local to dial n that are not already applied using Mn above.
Note:Multiply the pair of matching rows (Mn) and filter reduction filter factors (FFn) for each secondary dial preceding the current dial.

Now that it has calculated the number of join rows from previous dials, the optimizer is ready to generate a cost for the access method.

Temporary index or hash temporary result table from table

The first access method choice analyzed by the query optimizer is building a temporary index or hash temporary result table from the table. The basic formula for costing access of a join secondary dial through a temporary index built from the table or hash table follows:

JSCOST  =  CRTDSI +
           NPREV *((MATCH * FF * KeyAccess)
           + (MATCH * FF * FCost)) *
           FirstIO

JSCOST
Join Secondary cost

CRTDSI
Cost to build the temporary index or a hash temporary result table

NPREV
The number of join rows from all previous dials

MATCH
The number of matching rows (usually average duplicates)

KeyAccess
The cost to access a key in an index or a hash table

FF
The filtering factor for local predicates of this dial (excluding selection performed on earlier dials because of transitive closure)

FCost
The cost to access a row from the table

FirstIO
A reduction ratio to reduce the non-startup cost because of an optimization goal to optimize for the first buffer retrieval. For more information, see "Cost estimation for queries".

This secondary dial access method is used if no usable index is found or if the temporary index or hash table performs better than any existing index. This method can be better than using any existing index because the row selection is completed when the index or hash table is created if any of the following are true:

Temporary index or hash table from index

The basic cost formula for this access method choice is the same as that of using a temporary index or hash table built from a table, with one exception. The cost to build the temporary index, CRTDSI, is calculated to include the selection of the rows through an existing index. This access method is used for join secondary dial access for the same reason. However, the creation from an index might be less costly.

Use an existing index

The final access method is to use an existing index. The basic formula for costing access of a join secondary dial through an existing index is:

JSCOST  =  NPREV *((MATCH * KeyAccess)
         + (MATCH * FCost)) *
           FirstIO
 

JSCOST
Join Secondary cost

NPREV
The number of join rows from all previous dials

MATCH
The number of matching keys which will be found in this index (usually average duplicates)

KeyAccess
The cost to access a key in an index

FCost
The cost to access a row from the table

FirstIO
A reduction ratio to reduce the non-startup cost because of an optimization goal to optimize for the first buffer retrieval. For more information, see "Cost estimation for queries".

If I/O optimization is used (that is, OPNQRYF OPTIMIZE(*FIRSTIO)), this is a likely access method because the entire cost is reduced. Also, if the number of join rows from all previous dials (NPREV), and the number of matching keys (MATCH) is low, this may be the most efficient method.

The query optimizer considers using an index which only has a subset of the join columns as the left-most leading keys when:

Predicates generated through transitive closure

For join queries, the query optimizer may do some special processing to generate additional selection. When the set of predicates that belong to a query logically infer extra predicates, the query optimizer generates additional predicates. The purpose is to provide more information during join optimization.

Example of predicates being added because of transitive closure
SELECT * FROM EMPLOYEE, EMP_ACT
  WHERE EMPLOYEE.EMPNO = EMP_ACT.EMPNO
    AND EMPLOYEE.EMPNO = '000010'

The optimizer will modify the query to be:

SELECT * FROM EMPLOYEE, EMP_ACT
  WHERE EMPLOYEE.EMPNO = EMP_ACT.EMPNO
    AND EMPLOYEE.EMPNO = '000010'
    AND EMP_ACT.EMPNO  = '000010' 

OPNQRYF example:

OPNQRYF FILE((EMPLOYEE EMP_ACT)) FORMAT(FORMAT1)
   QRYSLT('1/EMPNO *EQ ''000010''')
   JFLD((1/EMPNO 2/EMPNO *EQ))
 

The optimizer will modify the query to be:

OPNQRYF FILE((EMPLOYEE EMP_ACT)) FORMAT(FORMAT1)
   QRYSLT('1/EMPNO *EQ ''000010'' *AND
   2/EMPNO *EQ ''000010''')
   JFLD((1/EMPNO 2/EMPNO *EQ)) 

The following rules determine which predicates are added to other join dials:

The query optimizer generates a new predicate, whether or not a predicate already exists in the WHERE clause (OPNQRYF QRYSLT parameter).

Some predicates are redundant. This occurs when a previous evaluation of other predicates in the query already determines the result that predicate provides. Redundant predicates can be specified by you or generated by the query optimizer during predicate manipulation. Redundant predicates with predicate operators of =, >, >=, <, <=, or BETWEEN (OPNQRYF *EQ, *GT, *GE, *LT, *LE, or %RANGE) are merged into a single predicate to reflect the most selective range.

Multiple join types for a query

Even though multiple join types (inner, left outer and exception) can be specified in the query using the JOIN syntax, the AS/400 Licensed Internal Code can only support one join type for the entire query. This requires the optimizer to determine what the overall join type for the query should be.
Note:This section does not apply to OPNQRYF.

The optimizer will evaluate the join criteria along with any row selection that may be specified in order to determine the join type for each dial and for the entire query. Once this information is known the optimizer will generate additional selection using the relative row number of the tables to simulate the different types of joins that may occur within the query.

Since null values are returned for any unmatched rows for either a left outer or an exception join, any isolatable selection specified for that dial, including any additional join criteria that may be specified in the WHERE clause, will cause all of the unmatched rows to be eliminated (unless the selection is for an IS NULL predicate). This will cause the join type for that dial to be changed to an inner join (or an exception join) if the IS NULL predicate was specified.

In the following example a left outer join is specified between the tables EMPLOYEE and DEPARTMENT. In the WHERE clause there are two selection predicates that also apply to the DEPARTMENT table.

    SELECT EMPNO, LASTNAME, DEPTNAME, PROJNO
     FROM CORPDATA.EMPLOYEE XXX LEFT OUTER JOIN CORPDATA.DEPARTMENT YYY
             ON XXX.WORKDEPT = YYY.DEPTNO
           LEFT OUTER JOIN CORPDATA.PROJECT ZZZ
             ON XXX.EMPNO = ZZZ.RESPEMP
     WHERE XXX.EMPNO = YYY.MGRNO AND
             YYY.DEPTNO IN ('A00', 'D01', 'D11', 'D21', 'E11')

The first selection predicate, XXX.EMPNO = YYY.MGRNO, is an additional join condition that will be added to the join criteria and evaluated as an "inner join" join condition. The second is an isolatable selection predicate that will eliminate any unmatched rows. Either one of these selection predicates will cause the join type for the DEPARTMENT table to be changed from a left outer join to an inner join.

Even though the join between the EMPLOYEE and the DEPARTMENT table was changed to an inner join the entire query will still need to remain a left outer join to satisfy the join condition for the PROJECT table.
Note:Care must be taken when specifying multiple join types since they are supported by appending selection to the query for any unmatched rows. This means that the number of resulting rows that satisfy the join criteria can become quite large before any selection is applied that will either select or omit the unmatched rows based on that individual dial's join type.

For more information on how to use the JOIN syntax see either Joining Data from More Than One Table in the SQL Programming Concepts book or the SQL Reference book.

Sources of join query performance problems

The optimization algorithms described above benefit most join queries, but the performance of a few queries may be degraded. This occurs when:

Tips for improving the performance of join queries

If you are looking at a join query which is performing poorly or you are about to create a new application which uses join queries, the following checklist may be useful.

Table 2. Checklist for Creating an Application that Uses Join Queries
What to Do How It Helps
Check the database design. Make sure that there are indexes available over all of the join columns and/or row selection columns. If using CRTLF, make sure that the index is not shared. This gives the query optimizer a better opportunity to select an efficient access method because it can determine the average number of duplicate values. Many queries may be able to use the existing index to implement the query and avoid the cost of creating a temporary index.
Check the query to see whether some complex predicates should be added to other dials to allow the optimizer to get a better idea of the selectivity of each dial. Since the query optimizer does not add predicates for predicates connected by OR or non-isolatable predicates, or predicate operators of LIKE or IN, modifying the query by adding these predicates may help.
Create an index which includes Select/Omit specifications which match that of the query using CRTLF CL command. This step helps if the statistical characteristics are not uniform for the entire table. For example, if there is one value which has a high duplication factor and the rest of the column values are unique, then a select/omit index allows the optimizer to skew the distribution of values for that key and make the right optimization for the selected values.
Specify ALWCPYDTA(*OPTIMIZE) or ALWCPYDTA(*YES) If the query is creating a temporary index, and you feel that the processing time would be better if the optimizer only used the existing index, specify ALWCPYDTA(*YES).

If the query is not creating a temporary index, and you feel that the processing time would be better if a temporary index was created, specify ALWCPYDTA(*OPTIMIZE).

Alternatively, specify the OPTIMIZE FOR n ROWS to inform the optimizer of the application has intention to read every resulting row. To do this set n to a large number. You could also set n to a small number before ending the query.

For OPNQRYF, specify OPTIMIZE(*FIRSTIO) or OPTIMIZE(*ALLIO) If the query is creating a temporary index and you feel that the processing time would be better if it would only use the existing index, then specify OPTIMIZE(*FIRSTIO). If the query is not creating a temporary index and you feel that the processing time would be better if a temporary index was created then specify OPTIMIZE(*ALLIO).
Use a join logical file or use the query options file (QAQQINI) FORCE_JOIN_ORDER parameter of *YES. OPNQRYF users can specify JORDER(*FILE). A join in which one table is joined with all secondary tables consecutively is sometimes called a star join. In the case of a star join where all secondary join predicates contain a column reference to a particular table, there may be performance advantages if that table is placed in join position one. In Example A, all tables are joined to table EMPLOYEE. The query optimizer can freely determine the join order. The query should be changed to force EMPLOYEE into join position one by using the query options file (QAQQINI) FORCE_JOIN_ORDER parameter of *YES or OPNQRYF JORDER(*FILE) as shown in example B. Note that in these examples the join type is a join with no default values returned (this is an inner join.). The reason for forcing the table into the first position is to avoid random I/O processing. If EMPLOYEE is not in join position one, every row in EMPLOYEE could be examined repeatedly during the join process. If EMPLOYEE is fairly large, considerable random I/O processing occurs resulting in poor performance. By forcing EMPLOYEE to the first position, random I/O processing is minimized.

Example A: Star join query

DECLARE C1 CURSOR FOR 
  SELECT * FROM DEPARTMENT, EMP_ACT, EMPLOYEE,
  PROJECT
  WHERE DEPARTMENT.DEPTNO=EMPLOYEE.WORKDEPT 
  AND EMP_ACT.EMPNO=EMPLOYEE.EMPNO
  AND EMPLOYEE.WORKDEPT=PROJECT.DEPTNO

Example B: Star join query with order forced via

FORCE_JOIN_ORDER
  DECLARE C1 CURSOR FOR 
  SELECT * FROM EMPLOYEE, DEPARTMENT, EMP_ACT,
  PROJECT
  WHERE DEPARTMENT.DEPTNO=EMPLOYEE.WORKDEPT 
  AND EMP_ACT.EMPNO=EMPLOYEE.EMPNO
  AND EMPLOYEE.WORKDEPT=PROJECT.DEPTNO
  OPNQRYF

Example A: Star join query

OPNQRYF FILE((DEPARTMENT EMP_ACT EMPLOYEE
  PROJECT)) FORMAT(FORMAT1)
  JFLD((1/DEPTNO 3/WORKDEPT *EQ)
  (2/EMPNO 3/EMPNO *EQ)
  (3/WORKDEPT 4/DEPTNO *EQ))

Example B: Start join query with JORDER(*FILE) parameter

OPNQRYF FILE((EMPLOYEE DEPARTMENT EMP_ACT
  PROJECT)) FORMAT(FORMAT1)
  JFLD((2/DEPTNO 1/WORKDEPT *EQ)
  (3/EMPNO 1/EMPNO *EQ)
  (1/WORKDEPT 4/DEPTNO *EQ))
  JORDER(*FILE)
Note:Specifying columns from EMPLOYEE in the ORDER BY clause (OPNQRYF KEYFLD parameter) may also have the effect of placing EMPLOYEE in join position 1. This allows the query optimizer to choose the best order for the remaining tables.
Specify ALWCPYDTA(*OPTIMIZE) to allow the query optimizer to use a sort routine. In the cases where ordering is specified and all key columns are from a single dial, this allows the query optimizer to consider all possible join orders.
Specify join predicates to prevent all of the rows from one table from being joined to every row in the other table. This improves performance by reducing the join fan-out. Every secondary table should have at least one join predicate that references on of its columns as a 'join-to' column.

Tips for improving performance when selecting data from more than two tables

If the select-statement you are considering accesses two or more tables, all the recommendations suggested in "Using indexes to speed access to large tables" apply. The following suggestion is directed specifically to select-statements that access several tables. For joins that involve more than two tables, you might want to provide redundant information about the join columns. If you give the optimizer extra information to work with when requesting a join. It can determine the best way to do the join. The additional information might seem redundant, but is helpful to the optimizer. For example, instead of coding:

  EXEC SQL
    DECLARE EMPACTDATA CURSOR FOR
    SELECT LASTNAME, DEPTNAME, PROJNO, ACTNO
         FROM CORPDATA.DEPARTMENT, CORPDATA.EMPLOYEE,
                CORPDATA.EMP_ACT
        WHERE DEPARTMENT.MGRNO = EMPLOYEE.EMPNO
               AND EMPLOYEE.EMPNO = EMP_ACT.EMPNO
  END-EXEC.

When using the OPNQRYF command, specify:

OPNQRYF FILE(CORPDATA/DEPARTMENT CORPDATA/EMPLOYEE CORPDATA/EMP_ACT)
   FORMAT(FORMAT1)
   JFLD((1/MGRNO 2/EMPNO *EQ) (2/EMPNO 3/EMP_ACT *EQ))

Provide the optimizer with a little more data and code:

  EXEC SQL
    DECLARE EMPACTDATA CURSOR FOR
    SELECT LASTNAME, DEPTNAME, PROJNO, ACTNO
         FROM CORPDATA.DEPARTMENT, CORPDATA.EMPLOYEE,
                CORPDATA.EMP_ACT
         WHERE DEPARTMENT.MGRNO = EMPLOYEE.EMPNO
                AND EMPLOYEE.EMPNO = EMP_ACT.EMPNO
                AND DEPARTMENT.MGRNO = EMP_ACT.EMPNO
  END-EXEC.

When using the OPNQRYF command, specify:

OPNQRYF FILE(CORPDATA/DEPARTMENT CORPDATA/EMPLOYEE CORPDATA/EMP_ACT)
   FORMAT(FORMAT1)
   JFLD((1/MGRNO 2/EMPNO *EQ) (2/EMPNO 3/EMP_ACT *EQ)
   (1/MGRNO 3/EMPNO *EQ))


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]