Journal Caching: Understanding the Risk of Data Loss
Published 27 September 2006, updated 02 October 2006
Authors: Hernando Bedoya
This Technote answers the following questions that you might have about the relationship between remote journal and journal caching:
- What do they have in common?
- Can they coexist?
- Which combinations make sense?
- How do performance and risk of data loss calculate into the trade-offs?
Written by Larry Youngren, Software Engineer
IBM Systems & Technology Group, Development
Development iSeries Journaling
To cache or not to cache
You may know that enabling the IBM i5/OS™ version of journal caching can improve performance. You might also realize that when caching is enabled, both the journal entries and the matching database changes linger in main memory slightly longer, which introduces a modestly increased risk of loss if the machine crashes. If you are concerned about high availability, can you count on the remote journal function to mitigate that risk of loss in a cached environment?
This is a common question. To answer it, we must first provide some background.
The remote journal function has two modes of operation:
- Sync mode attempts to refrain from caching, thereby ensuring that newly arriving journal entries are sent to the target side as rapidly as possible. In doing so, it sacrifices some performance.
- Async mode welcomes caching and even has a “super-bundling” mode that aims to improve performance by bunching together larger quantities of journal entries that are headed to a remote target machine.
Journal caching is a behavior that you can specify for a local journal via the JRNCACHE(*YES) keyword on the Change Journal (CHGJRN) command, provided that you have installed Option 42 of i5/OS.
When journal caching is enabled, the system attempts to hold journal entries in main memory until a sufficiently long string of consecutive journal entries has been assembled. The longer this string of adjacent entries, the more efficient each trip to disk is and the fewer total disk writes are needed. All of this helps to reduce the overhead associated with local journaling. What lingers in main memory until written to disk is also precisely what constitutes a journal bundle (a set of consecutive journal entries) and is therefore what is sent as a unit across the communication line to a distant machine by the remote journal function.
This caching behavior and associated reduction in quantity of disk writes is especially attractive during the execution of batch jobs, which tend to make a lot of database changes and produce a corresponding flood of journal entries at a torrid rate.
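To make the effect of bundling concrete, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, that an uncached journal performs one disk write per journal entry and that a cached journal writes a fixed number of entries per bundle. The function name disk_writes and the figures 10,000 and 64 are hypothetical and are not taken from i5/OS; real bundle sizes vary with workload.

```python
import math


def disk_writes(num_entries: int, bundle_size: int) -> int:
    """Count disk writes when entries are written bundle_size at a time.

    Illustrative model only: a real journal sizes bundles dynamically.
    """
    if bundle_size < 1:
        raise ValueError("bundle_size must be at least 1")
    # A final, partially filled bundle still costs one write.
    return math.ceil(num_entries / bundle_size)


# Hypothetical batch job that produces 10,000 journal entries.
uncached = disk_writes(10_000, 1)   # one write per entry
cached = disk_writes(10_000, 64)    # 64 entries per bundle (assumed)
print(uncached, cached)
```

Under these assumptions, the cached run needs far fewer trips to disk, which is the saving that makes caching attractive for batch work.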
The new sweeper task
Such journal entries remain cached in main memory a bit longer than they would if caching were not enabled. Therefore, it is natural to consider the slightly increased risk that a few such journal entries and matching database changes might be lost if the machine crashes without writing out the contents of main memory.
To address this concern, OS/400 V5R3 and i5/OS V5R4 rely upon a new background microcode sweeper task. This sweeper task helps to limit the amount of time that cached journal images are allowed to linger in main memory. While V5R3 introduced this sweeping behavior, V5R4 makes the sweeper task more aggressive and effective by further limiting the amount of time that such cached images can linger.
Sweeping is aimed at the more dormant periods, when the rate of arrival of new journal entries has slowed. During those periods, the operating system monitors the degree to which the cache buffer has been filled. It also tracks the amount of time that such entries have lingered in main memory (this is the new behavior that began in V5R3 and was enhanced in V5R4).
By doing so, i5/OS assures that customers who elect to enable journal caching can experience the performance benefits afforded by caching during busy periods of the day without significantly increasing their latency and risk of loss during more dormant periods of the day. In effect, you can have the best of both worlds.
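The interplay between a fill threshold and an age limit can be sketched with a toy model in Python. This is not how the microcode sweeper is implemented. The class JournalCache, its capacity and max_age parameters, and the explicit now timestamps are all hypothetical, chosen only to show the two flush triggers side by side.

```python
from dataclasses import dataclass, field


@dataclass
class JournalCache:
    capacity: int      # assumed: entries that make up a full bundle
    max_age: float     # assumed: seconds an entry may linger in memory
    pending: list = field(default_factory=list)   # (arrival_time, entry)
    written: list = field(default_factory=list)   # bundles flushed to disk

    def add(self, entry, now):
        """Cache an entry; flush as soon as the buffer is full."""
        self.pending.append((now, entry))
        if len(self.pending) >= self.capacity:
            self._flush()

    def sweep(self, now):
        """Background pass: flush if the oldest entry has lingered too long."""
        if self.pending and now - self.pending[0][0] >= self.max_age:
            self._flush()

    def _flush(self):
        self.written.append([entry for _, entry in self.pending])
        self.pending.clear()


cache = JournalCache(capacity=4, max_age=2.0)
for i in range(4):                   # busy period: the buffer fills quickly
    cache.add(f"busy-{i}", now=i * 0.1)
cache.add("quiet-0", now=10.0)       # dormant period: a lone entry arrives
cache.sweep(now=11.0)                # only 1.0 s old, so it stays cached
cache.sweep(now=12.5)                # 2.5 s old, so the sweeper flushes it
print(len(cache.written))
```

In this model the busy period is flushed by the fill threshold alone, while the lone entry that arrives during the dormant period is written out by the sweep rather than waiting for more entries to arrive.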
In addition, in a remote journal environment, such caching on the source side has a secondary potential impact: those journal entries that linger in the cache have obviously not yet left the source machine and hence are not yet on the target system. Some users think of such entries as “trapped” transactions. They are transactions that would not be visible (yet) on the target side if the source machine went down.
Which has more control
We must clear up some potential confusion. Journal caching influences the timeliness of writing journal entries to disk.
The variety of Remote Journal mode that you designate influences the timeliness of sending journal images from the source machine to the target machine. When both caching and remote journal are present, caching has more control.
If you choose to employ caching on the source side, you influence the timeliness of transport of journal entries to the target side. That is, until the cache on the source side is ready to be emptied, the remote journal layer of software does not even “see” the journal entries to be transported. Therefore, caching clearly has more control. This is not necessarily bad, because in an async type remote journal environment, caching is probably precisely the right choice.
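The idea that source-side caching gates what the remote journal layer can see can be illustrated with a small Python sketch. It assumes a simplified model in which entries become visible to the transport layer only when the cache is emptied, and it ignores transmission time entirely. The function name trapped_at_crash and the sample timestamps are hypothetical.

```python
def trapped_at_crash(arrival_times, flush_times, crash_time):
    """Return arrival times of entries still in the source cache at a crash.

    Simplified model: an entry leaves the cache at the first flush at or
    after its arrival, and nothing reaches the target before that flush.
    """
    flushes_before_crash = [t for t in flush_times if t <= crash_time]
    last_flush = max(flushes_before_crash, default=float("-inf"))
    # Entries that arrived after the last flush never left the source.
    return [t for t in arrival_times if last_flush < t <= crash_time]


# Hypothetical timeline, in seconds.
arrivals = [0.1, 0.2, 0.9, 1.4, 1.6]
flushes = [1.0, 2.0]
print(trapped_at_crash(arrivals, flushes, crash_time=1.7))
```

In this timeline, the entries that arrived at 1.4 and 1.6 seconds are still in the cache when the source fails at 1.7 seconds, so neither remote journal mode would have had the opportunity to send them.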
Sync mode versus async mode
Caching makes the most sense for shops that can afford to re-enter a few recent transactions. These are the same shops that might similarly find that a few recent transactions have not yet been transported to the target machine by the async variety of remote journal. Thus both journal caching and the async mode of remote journal introduce the same sort of risk. Both also share a similar objective, which is to improve performance by allowing a set of recent journal entries to remain a bit longer in main memory.
By contrast, the sync mode of remote journal places immediate transmission at a higher priority than performance. Shops that absolutely need the sync mode of remote journal should probably forgo the performance benefits that caching affords and, therefore, refrain from employing journal caching. Shops that decide that performance is the most important and critical factor for them should instead enable journal caching and employ the async mode of remote journal.
Choosing the middle ground of using the async mode of remote journal support but refraining from using journal caching on the source machine is probably an unwise choice. It sacrifices both performance on the source side and timely transmission to the target side. Instead, those who elect to employ async remote journal transmission should go the full extent and enable journal caching on the source side as well, thereby capitalizing on both performance-enhancing features. The even more ludicrous choice is to select the sync mode of remote journal transmission coupled with journal caching on the source side. Doing so only delays remote journal transmission.
Notice that we are talking about source side caching, not target side caching. Caching on the target side frequently makes sense in a remote journal environment regardless of which remote journal transmission choice is employed.
Therefore, we recommend that you use one of the following two combinations:
- Async Remote Journal with JRNCACHE(*YES) on the source side
- Sync Remote Journal with no such caching on the source side
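The four possible combinations discussed above can be summarized as a small lookup table. The Python sketch below simply restates this Technote's guidance. The names RECOMMENDATIONS and assess and the verdict strings are hypothetical labels, not part of any IBM interface.

```python
# Keys are (remote journal mode, journal caching on the source side).
RECOMMENDATIONS = {
    ("async", True): "recommended",
    ("sync", False): "recommended",
    ("async", False): "probably unwise",
    ("sync", True): "discouraged",
}


def assess(mode: str, source_cache: bool) -> str:
    """Look up this Technote's verdict for a mode and caching combination."""
    key = (mode.lower(), source_cache)
    if key not in RECOMMENDATIONS:
        raise ValueError(f"unknown remote journal mode: {mode!r}")
    return RECOMMENDATIONS[key]


for mode in ("async", "sync"):
    for source_cache in (True, False):
        print(mode, source_cache, assess(mode, source_cache))
```

The table covers source side caching only. Target side caching is a separate decision.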
This material has not been submitted to any formal IBM test and is published AS IS. It has not been the subject of rigorous review. IBM assumes no responsibility for its accuracy or completeness. The use of this information or the implementation of any of these techniques is a client responsibility and depends upon the client's ability to evaluate and integrate them into the client's operational environment.