Hashing access method

The hashing access method provides an alternative method for those queries (groupings and joins) that must process data in a grouped or correlated manner. Indexes are used to sort and group the data and are effective in some cases for implementing grouping and join query operations. However, if the optimizer had to create a temporary index for that query, extra processor time and resources are used when creating this index before the requested query can be run.

Where the hashing access method is most effective

The hashing access method can complement indexes or serve as an alternative. For each selected row, the specified grouping or join value in the row is run through a hashing function. The computed hash value is then used to search a specific partition of the hash table. A hash table is similar to a temporary work table, but has a different structure that is logically partitioned based on the specified query. If the row's source value is not found in the table, then this marks the first time that this source value has been encountered in the database table. A new hash table entry is initialized with this first-time value and additional processing is performed based on the query operation. If the row's source value is found in the table, the hash table entry for this value is retrieved and additional query processing is performed based on the requested operation (such as grouping or joining). The hash method can only correlate (or group) identical values; the hash table rows are not guaranteed to be sorted in ascending or descending order.

Where this method can be used

The hashing method can be used only when the ALWCPYDTA(*OPTIMIZE) option has been specified unless a temporary result is required, since the hash table built by the database manager is a temporary copy of the selected rows.

How this method works

The hashing algorithm allows the database manager to build a hash table that is well-balanced, given that the source data is random and distributed. The hash table itself is partitioned based on the requested query operation and the number of source values being processed. The hashing algorithm then ensures that the new hash table entries are distributed evenly across the hash table partitions. This balanced distribution is necessary to guarantee that scans in different partitions of the hash tables are processing the same number of entries. If one hash table partition contains a majority of the hash table entries, then scans of that partition are going to have to examine the majority of the entries in the hash table. This is not very efficient.

Since the hash method typically processes the rows in a table sequentially, the database manager can easily predict the sequence of memory pages from the database table needed for query processing. This is similar to the advantages of the table scan access method. The predictability allows the database manager to schedule asynchronous I/O of the table pages into main storage (also known as pre-fetching). Pre-fetching enables very efficient I/O operations for the hash method leading to improved query performance.

In contrast, query processing with a keyed sequence access method causes a random I/O to the database table for every key value examined. The I/O operations are random since the keyed-order of the data in the index does not match the physical order of the rows in the database table. Random I/O can reduce query performance because it leads to unnecessary use of I/O and processor unit resources.

An index can also be used by the hash method to process the table rows in keyed order. The index can significantly reduce the number of table rows that the hash method has to process. This can offset the random I/O costs associated with indexes.

The hash table creation and population takes place before the query is opened. Once the hash table has been completely populated with the specified database rows, the hash table is used by the database manager to start returning the results of the queries. Additional processing might be required on the resulting hash table rows, depending on the requested query operations.

Since blocks of table rows are automatically spread, the hashing access method can also be performed in parallel so that several groups of rows are being hashed at the same time. This shortens the amount of time it takes to hash all the rows in the database table.

If the DB2 SMP feature is installed, the hashing methods can be performed in parallel.


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]