Maximizing Performance of IBM DB2 Backups

IBM Redbooks Analytics Support Web Doc

Published 21 October 2016

More options

Rate and comment

Authors: Jeff Andersen

Abstract

This IBM® Redbooks® Analytics Support Web Doc describes how to maximize the performance of IBM DB2® backups. Backing up a database is a critical part of any disaster recovery plan. Making sure that the backup completes in a reasonable time frame means that there is always a valid backup available in case of emergency. This document describes the following elements that are critical to good backup performance:


  • The backup process model
  • Basic backup performance concepts
  • Backup performance considerations
  • Monitoring backup performance with the DB2_BAR_STATS registry variable


This document applies to all versions of DB2 for Linux, UNIX, and Windows.

Contents

This IBM® Redbooks® Analytics Support Web Doc describes how to maximize the performance of IBM DB2® backups.

Backing up a database is a critical part of any disaster recovery plan. Making sure that the backup completes in a reasonable time frame means that there is always a valid backup available in case of emergency. This document describes the following elements that are critical to good backup performance:

  • The backup process model
  • Basic backup performance concepts
  • Backup performance considerations
  • Monitoring backup performance with the DB2_BAR_STATS registry variable

This document applies to all versions of DB2 for Linux, UNIX, and Windows.


The backup process model

The backup process model consists of the engine dispatchable units (EDUs) db2agent, db2bm, and db2med. If the backup uses a storage manager, an additional db2vend process is created. The db2agent EDU does most of its work during the start of the backup, and then waits for the backup to complete. The tasks that it performs include the following actions:
  • Calculate and create an optimal number of db2bm and db2med EDUs
  • Allocate an optimal number of optimally sized backup buffers
  • Display the selected tunable values to the db2diag.log
  • Allocate message queues for inter-EDU communication
  • Create a list of table spaces that are sorted by decreasing size
  • Coordinate the backup process by sending control messages to the db2bm and db2med EDUs

The db2bm EDU acquires a backup buffer from the empty queue, and then fills the buffer by reading data from its assigned table space. The db2bm then places the backup buffer on the full queue to be written to the backup device by the db2med EDU.

The db2med EDU is responsible for acquiring a backup buffer from the full queue and writing it to the backup device. If the backup is being written to disk, it handles the input/output (I/O) directly. If the backup is being written to a storage manager, it transfers the full buffer to the db2vend process. After confirmation that the db2vend process successfully saved the contents of the buffer, the db2med EDU returns the backup buffer to an empty queue for reuse by the next db2bm EDU.

The db2vend process is created by the db2med EDU if a storage manager is being used. The db2vend process receives a full buffer from the db2med, and calls the appropriate vendor application programming interface (API) for sending the data to the storage manager. A separate process is used so that any crash or abnormal failure during interaction with the storage manager terminates only the db2vend process, and does not negatively affect the stability of the DB2 instance.

Figure 1 is a graphical representation of the backup process model.

Backup process showing table spaces, parallelism, backup buffers, and sessions
Figure 1. Backup process model


Basic backup performance concepts

The following concepts are critical to designing and implementing a successful backup strategy:
  • Balanced table spaces are the primary factor

    The primary way to ensure good backup performance is to make sure that the data is evenly distributed between table spaces. During the backup, DB2 assigns one db2bm EDU to back up each table space. If most of the data in a database is in one table space, the backup is serialized while only the one table space is being backed up. Having data equally distributed between multiple table spaces ensures maximum parallelism throughout the backup.

  • Difference between system managed space (SMS) and database managed space (DMS) table spaces

    When backing up an SMS table space, every page in the table is included in the backup image. When backing up a DMS table space, only extents that are marked as used at the beginning of the backup are included in the backup image.

  • What pages are included in an incremental or delta backup

    An incremental backup includes only pages that were modified since the last full backup. Every page must have its page header scanned to determine whether it belongs in the backup. All large object (LOB), long varchar, and long vargraphic (LONG) pages are included in the backup image, because they do not have page headers. A delta backup includes all of the pages that were modified since the last backup of any type. The delta backup also includes all LOB and LONG pages.

    When performing incremental backups, DB2 keeps track of whether table spaces have been modified since the last backup. If a table space has not been modified since the last backup, the scanning of that table space can be skipped. However, if even one page has been modified in a table space, every page in the table space must be scanned to find the modified page. This can result in the db2med EDU being idle for a long time, and can potentially cause storage managers to think that the backup is hung and to close down the connection to the db2vend process. This results in a failed backup. Make sure to set idle timeout values with the storage manager appropriately.

    Incremental and delta backups are designed to be smaller than full database backups, not faster. If the incremental or delta backup can skip the scanning of some table spaces, the backup might be faster than a full database backup. If all table spaces must be scanned, the extra scanning of the page headers to determine whether a page belongs in the backup image might cause the backup to take longer than a full database backup.


Backup performance considerations

Consider several items to maximize the performance of database backups:
  • Buffer size and number of buffers

    DB2 automatically picks an optimized value for the number and size of the backup buffers. The buffers are allocated from the utility heap, which is controlled by the util_heap_sz database configuration setting. The minimum buffer size is 8 (4 KB) pages and the maximum buffer size is 16384 (4 KB) pages. The ideal size of all combined buffers is 90% of the utility heap for offline backups, and 50% of available space in the utility heap for online backups. The number of buffers is dependent on the number of db2bm and db2med EDUs selected for the backup. Default values for the number of buffers and the size of buffers can be overwritten by specifying 'WITH num-buffers BUFFERS' and 'BUFFER buffer-size' in the BACKUP DATABASE command. The size of the backup buffer is +1 of a multiple of the extent size. This plus one page is to accommodate the object header that precedes each block of user data in the backup image.

  • Parallelism

    At the start of the backup, each db2bm EDU is assigned to back up an individual table space. The table spaces are backed up in order of decreasing size. The size is determined by multiplying page size by the number of used pages. When a db2bm has completed one table space, it processes the next largest table space in the list. If there are no remaining table spaces, the db2bm is idle until the backup is completed. The number of db2bm EDUs created is based on the number of processors (CPUs) available, and can be manually adjusted using the PARALLELISM parameter of the BACKUP DATABASE command.

    At the start of the backup, one or more db2med EDUs are started. If the database is being backed up to disk, the number of db2med EDUs is based on the number of output directories that are listed in the TO clause of the BACKUP DATABASE command. If the database is being backed up to a storage manager, the number of db2med EDUs is based on the 'OPEN <N> SESSIONS' option of the BACKUP DATABASE command. Each db2med EDU writes to its own output device.

  • Prefetching

    When a table space has a single container, the reads for the table space are performed directly by the db2bm EDU. When a table space has multiple containers, the read operations are performed by the db2pfchr (prefetcher) EDU instead of directly by the db2bm. This can enable faster reads because parallel I/Os can be issued across multiple containers at the same time.

    For a single container table space, prefetching can still be used by setting the registry variable DB2_PARALLEL_IO=* and altering the table space to make prefetch size be a multiple of extent size. The instance must be restarted for the DB2_PARALLEL_IO registry variable to take effect.

    When using prefetching for backups, ensure that the NUM_IOSERVERS database configuration parameter is greater than or equal to the number of db2bm EDUs created.
  • Order that table spaces are backed up

    The table spaces are backed up in order of decreasing size. The size is determined by multiplying the page size of the table space by the number of used pages in the table space. User or system temporary table spaces are never included in the backup image.

  • Fragmentation of free extents

    When backing up a DMS table space, the db2bm EDU reads blocks of contiguous used extents, up to the size of the backup buffer. If there are free extents below the high water mark, the block of data must be broken up into smaller blocks.

    Each block of data must be read in exclusive mode. Concurrent disk reads from online activity can conflict with the reads of the backup. It is critical to minimize the number of reads to maximize backup performance. This can be done both by reducing the number of free extents below the high water mark, and by performing an online backup during times of lighter user activity.

    Viewing the output of the db2pd -tablespaces command is an easy way to look for free space fragmentation:

    db2pd -db sample -tablespace 2

    Database Member 0 -- Database SAMPLE -- Active -- Up 0 days 00:03:37 -- Date 2016-08-25-12.07.39.893090

    Tablespace 2 Configuration:

    AddressTypeContentPageSzExtentSz<…>RSEName
    0x00007FECA5A64080DMSLarge409632<…>YesUSERSPACE1

    Tablespace 2 Statistics:
    AddressTotalPgsUsablePgsUsedPgs<…>FreePgsHWM<…>
    0x00007FECA5A6408011552641155232805376<…>3498561150496<…>

    In this example, used pages (UsedPgs) is 805376 and the high water mark (HWM) is 1150496. 1150496 - 805376 indicates 345120 free pages below the HWM. Dividing the number of free pages below the HWM by the HWM (345120 / 1150496) indicates that 30% of the pages below the HWM are free.

    If reclaimable storage is enabled for the table space, consider using the ALTER TABLESPACE command with the REDUCE or LOWER HIGH WATER MARK parameter to consolidate all of the used pages at the beginning of the table space, and potentially reduce the size of the table space. There are two ways to check whether reclaimable storage is enabled:
    • In DB2 V10.5 and later, run the db2pd -db <database> -tablespaces command and look at the RSE column to determine if reclaimable storage is enabled for a particular table space. The possible values are Yes and No.
    • Issue the following command (one line):
      db2 "select varchar(tbsp_name, 20) as tbsp_name, reclaimable_space_enabled from table(MON_GET_TABLESPACE('',-2)) as t"

      A value of 0 indicates that reclaimable storage is not enabled. A value of 1 indicates that reclaimable storage is enabled.

    Reclaimable storage is enabled for all table spaces that are created in DB2 V9.7 and later.
  • Compression

    The COMPRESS parameter of the BACKUP DATABASE command causes the data in the backup buffer to be compressed before being written to the storage device. If a custom compression library is not specified, DB2 uses a default compression algorithm. This default compression algorithm is CPU-intensive, and can cause the backup to slow down, especially if the system is already at or near its CPU capacity. The compress function is called by the db2bm EDU each time it is ready to place a backup buffer on the full queue. When using compression, make sure to monitor for a shortage of backup buffers. See the DB2_BAR_STATS section for monitoring details.

    An alternative to using the CPU-intensive compress parameter is to enable compression at the table level. See the following website for a discussion about the various types of compression available:
    http://www.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.dbobj.doc/doc/t0055284.html

    Any of these compression options can significantly reduce the size of the table, and therefore reduce the size of the backup image.

  • Throttling

    The BACKUP DATABASE command supports throttling of the backup utility by using the UTIL_IMPACT_PRIORITY parameter. The backup runs unthrottled by default. When throttling is enabled, the EDUs involved in the backup periodically invoke the sleep system call. The sleep call yields the CPU to the next thread that is ready to run. The additional sleep calls cause the throttled backup to run longer than an unthrottled backup. If throttling is being used to reduce the run queue at the operating system level, consider adding more CPU to the system.



Monitoring backup performance with the DB2_BAR_STATS registry variable

Starting with DB2 V10.1 FixPack 2, every successful backup or restore operation prints statistics in the db2diag.log file. This feature also exists in DB2 V9.7, but must be enabled by using the DB2_BAR_STATS registry variable.

The meanings of the various columns are as follows:
  • BM. The db2bm EDU ID.
  • Total. Length of time that each EDU existed.
  • I/O. Time that was spent performing read or write I/O.
  • MsgQ. Time that was spent waiting for an I/O buffer.
  • WaitQ. Time that was spent waiting for a state machine control message.
  • Buffers. Number of I/O buffers that were processed.
  • Kbytes. Quantity of data that was processed.
  • MC. The db2med EDU ID.

If the backup was performed using the COMPRESS option, two additional columns are present:
  • Compr. Time that was spent performing the compression operation.
  • Compr Bytes. Quantity of uncompressed data that was compressed.

Example 1

The following example depicts a scenario in which the db2bm EDUs spend more than optimal time waiting on a free buffer:

Parallelism = 10
Number of buffers = 10
Buffer size = 16781312 (4097 4kB pages)

BM#
Total
I/O
MsgQ
WaitQ
Buffers
kBytes
---
--------
--------
--------
--------
--------
--------
000
11.66
8.17
3.02
0.37
31
460544
001
11.66
8.59
2.22
0.78
29
460544
002
11.66
8.21
3.36
0.00
35
567440
003
11.66
7.94
3.29
0.37
29
460544
004
11.66
7.00
3.30
1.27
29
460544
005
11.66
7.52
3.29
0.77
29
460544
006
11.66
7.12
3.55
0.91
29
460544
007
11.66
7.57
3.02
0.98
29
460544
008
11.66
7.78
3.20
0.60
29
461120
009
11.66
7.18
3.65
0.77
29
460928
---
--------
--------
--------
--------
--------
--------
TOT
116.66
77.12
31.95
6.87
298
4713296
MC#
Total
I/O
MsgQ
WaitQ
Buffers
kBytes
---
--------
--------
--------
--------
--------
--------
000
13.65
7.68
3.99
0.00
299
4867268
---
--------
--------
--------
--------
--------
--------
000
13.65
7.68
3.99
0.00
299
4867268

In this example, the db2bm EDUs spend 27% (31.95/116.66) of the time waiting on a free buffer (MsgQ). There are two potential options to address this issue:
  • Increase the number of buffers. The downside of this option is that it reduces the size of the buffers, and potentially result in more, smaller I/O operations.
  • Allocate more db2med EDUs, either by specifying more devices in the TO clause of the BACKUP DATABASE command, or by increasing the SESSIONS value.

Example 2

The following example depicts a scenario in which the db2bm EDUs spend more than optimal time compressing data:

Parallelism = 5
Number of buffers = 10
Buffer size = 16781312 (4097 4kB pages)
Compr
BM#
Total
I/O
Compr
MsgQ
WaitQ
Buffers
kBytes
kBytes
---
--------
--------
--------
--------
--------
--------
--------
--------
000
10.71
3.90
6.76
0.00
0.00
6
567440
569940
0
001
10.71
3.62
5.35
0.00
1.69
3
460544
460544
002
10.71
3.60
5.36
0.00
1.68
3
461504
461504
003
10.71
3.80
5.34
0.00
1.51
3
460544
460544
004
10.71
3.54
5.44
0.00
1.68
3
460544
460544
---
--------
--------
--------
--------
--------
--------
--------
--------
TOT
53.58
18.47
28.28
0.00
6.58
18
2410576
2413076
MC#
Total
I/O
MsgQ
WaitQ
Buffers
kBytes
---
--------
--------
--------
--------
--------
--------
000
10.79
0.37
10.36
0.00
19
278628
---
--------
--------
--------
--------
--------
--------
TOT
10.79
0.37
10.36
0.00
19
278628

In this example, the db2bm EDUs are spending 53% (28.28/53.58) of the time compressing data. One alternative is to use table-level compression so that the tables are stored in compressed format in the database. This option eliminates the need to do compression during every backup.


Example 3

The following example depicts a scenario in which the db2bm EDUs spend too much time in WaitQ:

Parallelism = 3
Number of buffers = 3
Buffer size = 16781312 (4097 4kB pages)

BM#
Total
I/O
MsgQ
WaitQ
Buffers
kBytes
---
--------
--------
--------
--------
--------
--------
000
12.13
0.05
0.00
12.03
3
576
001
12.12
11.59
0.10
0.03
175
2859008
002
12.12
0.58
0.22
11.29
7
106864
---
--------
--------
--------
--------
--------
--------
TOT
36.38
12.23
0.32
23.36
185
2966448
MC#
Total
I/O
MsgQ
WaitQ
Buffers
kBytes
---
--------
--------
--------
--------
--------
--------
000
13.45
4.47
7.67
0.00
186
3015424
---
--------
--------
--------
--------
--------
--------
TOT
13.45
4.47
7.67
0.00
186
3015424

In this example, 2 out of 3 db2bm EDUs spend almost all of the time in WaitQ. A high WaitQ value indicates that the db2bm EDU was idle for most of the backup. The number of Buffers processed is also heavily skewed to one db2bm EDU, which indicates that there was one table space in this database that was significantly larger than the rest. Redistributing data so that the table spaces are equally sized improves backup performance.


Related information

For more information, see the following web page:

IBM DB2 V10.5 for Linux, UNIX, and Windows documentation
https://www.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.kc.doc/welcome.html

Others who read this publication also read



Special Notices

This material has not been submitted to any formal IBM test and is published AS IS. It has not been the subject of rigorous review. IBM assumes no responsibility for its accuracy or completeness. The use of this information or the implementation of any of these techniques is a client responsibility and depends upon the client's ability to evaluate and integrate them into the client's operational environment.

Follow IBM Redbooks

Follow IBM Redbooks