V5R4 of IBM i5/OS provides a new choice for applications that need to enhance their transactions. It allows both batch jobs that employ commitment control and interactive transactions to jump on the super highway and capitalize on the performance benefits that are associated with journal caching. In the past, such caching benefits only applied to applications that refrained from employing commitment control. That restriction can now be lifted if you choose to employ the “soft” commit option. In some environments, the resulting performance boosts can be impressive.
Written by Larry Youngren, Software Engineer
IBM Systems & Technology Group, Development
Development iSeries Journaling
System i High Availability - SLIC Journal and Commitment Control
A little history
A few years ago, an IBM Client asked for our help, visited our lab, and ran their batch job there. In this particular case, the Client's batch job processed so many accounts and ran so long that restarting the batch job from the beginning, if trouble erupted half-way through, was highly undesirable. In an effort to avoid restarting the entire multi-hour batch job from the beginning, the Client modified the batch job so that it grouped multiple database row updates into miniature commit transactions. By doing so, each completed transaction represented a recognizable restarting point. Having completed one transaction, the batch job moved on to the next mini-transaction.
The Client also capitalized on another somewhat underused transaction feature, the user-supplied commit description. This description is a text string that becomes associated with the completed transaction and can be viewed from the underlying journal.
In effect, the commit description served as a marker, signifying progress. The Client gave each completed commit transaction a distinct and readily recognizable text string. If their batch job terminated prematurely, instead of restarting the batch job from the beginning, the Client examined the journal contents to deduce the most recent successfully completed transaction (as evidenced by the final commit description) and started the job anew from that point. This technique represented a creative approach and certainly served the Client's purpose from the perspective of not having to re-run the entire batch job if it ran into a data problem half-way through.
What is the problem?
As the Client's business grew, so did the quantity of the mini-transactions to be handled each night. Each transaction brought along with it a fair amount of performance overhead. Why? Because each time a transaction completed, the commitment control mechanism issued a corresponding request to write all tentative journal entries to disk.
Let us say that the Client had 10 million transactions to process and each of these scheduled a disk write that took 2 milliseconds to service. That adds up to 20,000 seconds of waiting, which means that they incurred roughly 5½ hours of extra elapsed batch time.
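The estimate above can be reproduced, and retried with other volumes, using a small calculation. This is only a sketch: the function name is hypothetical, and it assumes that every commit waits for exactly one disk write and that none of those waits overlap.

```python
def commit_wait_hours(transactions: int, write_ms: float) -> float:
    """Estimate the hours spent waiting on one synchronous disk write per commit.

    Assumption (not from the article): waits are strictly serial, one per commit.
    """
    total_seconds = transactions * write_ms / 1000.0
    return total_seconds / 3600.0


# The article's example: 10 million commits at 2 ms per disk write.
print(f"{commit_wait_hours(10_000_000, 2.0):.2f} hours")  # prints "5.56 hours"
```

Halving either the number of commits or the service time per write halves the estimate, which is why bundling several transactions into one disk write is attractive.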
At that point, the Client thought it might try one more trick. Knowing that the optimum size for a disk write is 128 KB, the Client turned to the software journal caching option. The hope was that it would reduce the total quantity of disk writes by refraining from scheduling a disk write, despite the presence of commitment control, until a full 128 KB of journal entries was assembled in main memory (so that multiple transactions strung together would ride-share to disk).
The Client's excitement and expectation turned to disappointment when the use of the JRNCACHE(*YES) option on the Change Journal (CHGJRN) command did not deliver the performance boost that was anticipated. Why? The explanation takes some thought about the traditional semantics associated with ordinary commitment control. While commitment control was a good vehicle for achieving restartability for the batch job, it came at a price. That is because, semantically, commitment control has traditionally viewed itself as providing three properties: atomicity, consistency, and durability.
Atomicity, consistency, and durability
Atomicity refers to the sense that if one row within the designated transaction is present and survives, so do all the rest of the rows designated as part of the same transaction. Consistency refers to the idea that related changes, even if they span tables, are present (never have surviving detail records without the matching master record). Durability refers to the notion that if control returned to the job that issued the commit verb, then the transaction is safely out of memory and present on disk.
With these three properties in mind, we recommend that you consider sacrificing some durability if you can reap substantial performance benefits. In fact, for this Client, making such a sacrifice on durability was acceptable. It mattered little (once the data problem was corrected) whether the batch job was restarted at transaction #3255 or #3256, as long as there was confidence that all preceding transactions had atomicity and consistency.
The problem was that by instituting these periodic checkpoints, corresponding periodic synchronous waits for disk writes of the completed transaction were introduced into the batch job. As a consequence, the batch job now ran noticeably longer. With instantaneous durability enabled (how traditional “hard” commit behaves), the end of each commit transaction (no matter how small) waits for the matching set of tentative database changes to be written to disk. These waits were slowing down the batch job.
What are you 'waiting' for?
Let us think about what’s going on. When there is only a single batch job and you issue a commit verb every 10 rows, your job pauses and waits once out of every ten database updates. That is a performance penalty that you might want to forgo. Durability is the source of these waits.
For interactive jobs, these periodic waits for instantaneous durability are generally even more dramatic. Imagine that we had 10 interactive jobs that all ran at the same time and shared the same underlying journal. Each interactive job pauses every time it modifies 10 new rows. The fact that all of these jobs are pouring their journal entries intermixed into the same shared main memory journal buffer means that, on average, every time one of the ten jobs is willing to let tentative changes linger in main memory, a neighboring job has just counted to 10 and is insisting that the shared buffer be flushed to disk. The net is that journal caching (a powerful lever in a non-commit environment) sees its benefits thwarted by the presence of short duration commit cycles. The problem is that the cache fills only a little before one of the jobs that shares the cache insists that its contents be written to disk. This frequent buffer flushing, which is triggered by the durability property of traditional commitment control, makes it difficult to gain any speed.
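A toy model can make this effect concrete. The sketch below is not a description of how the operating system is implemented. It simply assumes that jobs take turns adding fixed-size entries to one shared buffer and that every hard commit flushes that buffer. The 200-byte entry size, the staggered starting points, and all names are assumptions chosen for illustration.

```python
CACHE_BYTES = 128 * 1024  # cache size quoted in the article
ENTRY_BYTES = 200         # assumed journal entry size (illustrative only)


def average_fill_at_flush(jobs: int = 10, rows_per_commit: int = 10,
                          rounds: int = 1000) -> float:
    """Mean fraction of the shared cache in use each time it is flushed."""
    # Stagger the jobs so that their commit points do not coincide.
    rows_since_commit = [j % rows_per_commit for j in range(jobs)]
    buffered = 0
    fills = []
    for _ in range(rounds):
        for j in range(jobs):  # jobs take turns adding one entry each
            buffered += ENTRY_BYTES
            rows_since_commit[j] += 1
            committed = rows_since_commit[j] >= rows_per_commit
            if committed:
                rows_since_commit[j] = 0
            # A hard commit (or a full cache) forces the whole buffer to disk.
            if committed or buffered >= CACHE_BYTES:
                fills.append(min(buffered, CACHE_BYTES) / CACHE_BYTES)
                buffered = 0
    return sum(fills) / len(fills) if fills else 0.0


print(f"average cache fill at flush: {average_fill_at_flush():.1%}")
```

Under these assumptions the buffer is, on average, less than 2 percent full when it is flushed, so nearly all of the 128 KB cache goes unused.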
For that reason, based upon this Client's experience, we began to experiment with the idea of offering an option by which applications that are willing to sacrifice some instantaneous durability can do so. The result can often be an impressive performance gain, increasing the quantity of transactions that can be processed per minute.
Examples of performance gains
We recently set up a simulated retail sporting goods store environment on an IBM eServer iSeries model 840 and ran 100 000 transactions (a mixture of row inserts and updates to underlying SQL tables) with journal caching enabled. We then looked at the elapsed time to execute these back-to-back transactions with and without soft commit. The result? Soft commit definitely gave us reduced elapsed time, illustrating its performance benefit (see the following figure).
We then probed deeper to look at the quantity of scheduled disk writes that are requested both with and without soft commit. As shown in the following figure, the difference is even more startling.
While both graphs show encouraging results, why do they foretell different degrees of benefit? To understand the difference (a lot of reduction in asynchronous-type disk writes that are scheduled on behalf of journal or commit but a smaller degree of reduction in job duration), we must admit that there is a secondary factor at work that also plays a role in “softening” the disk-write intensive nature of short transactions. That second factor is having a sufficient quantity of input/output adapter (IOA) write cache. Machines with the capacity to absorb additional disk writes see less benefit from use of soft commit because they are not (yet) up against the wall. Those systems that already are struggling to cope with a high degree of disk traffic might find use of soft commit more beneficial.
Soft commit to the rescue
This better performing commitment control choice is new for i5/OS in V5R4. It is enabled by employing an environment variable. Like most environment variables, it can be enabled either with a system-wide scope, which thereby affects all jobs, or more selectively scoped only to the currently executing job. The choice is yours.
The matching environment variable specification looks something like the following example:
ADDENVVAR ENVVAR(QIBM_TN_COMMIT_DURABLE) VALUE(*NO)
By specifying a value of *NO for QIBM_TN_COMMIT_DURABLE, you free the operating system to employ journal caching despite the presence of commit transactions. Therefore, if the underlying journal has a setting of JRNCACHE(*YES) and your durability property is set to *NO, the commit transactions become “soft”. That is, they allow the caching behavior to override the durability desire.
The result is that fewer disk writes are scheduled on behalf of the journal. Instead, multiple small commit transactions are cached in a shared main memory buffer until the 128 KB-wide cache becomes full or until sufficient time has elapsed (about 30 seconds), at which time all of the bundled commit transactions are written to disk in unison. As a result, each disk write becomes wider and more productive. This generally leads to better batch job performance.
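The bundling rule described above (write when the cache is full or when about 30 seconds have passed) can be sketched as a simple counter. This is a simplified model with assumed inputs, not the operating system's actual algorithm: the function name, the fixed transaction size, and the fixed arrival interval are all hypothetical.

```python
CACHE_BYTES = 128 * 1024    # cache size quoted in the article
MAX_LINGER_SECONDS = 30.0   # approximate time limit quoted in the article


def count_disk_writes(txn_count: int, txn_bytes: int,
                      txn_interval_s: float, soft_commit: bool) -> int:
    """Count journal disk writes for a stream of equal-sized transactions."""
    if not soft_commit:
        return txn_count  # hard commit: one write per committed transaction
    writes = 0
    buffered = 0
    oldest = 0.0  # arrival time of the oldest transaction still in the buffer
    for i in range(txn_count):
        now = i * txn_interval_s
        # Time limit reached: write whatever has been waiting in the buffer.
        if buffered and now - oldest >= MAX_LINGER_SECONDS:
            writes += 1
            buffered = 0
        if buffered == 0:
            oldest = now
        buffered += txn_bytes
        # Cache full: write the whole bundle in one wide disk write.
        if buffered >= CACHE_BYTES:
            writes += 1
            buffered = 0
    if buffered:
        writes += 1  # final partial bundle
    return writes


# Assumed workload: 100,000 transactions of 2,000 bytes, arriving 1 ms apart.
print(count_disk_writes(100_000, 2_000, 0.001, soft_commit=False))  # 100000
print(count_disk_writes(100_000, 2_000, 0.001, soft_commit=True))   # 1516
```

With transactions that arrive only once a minute, the same model performs one write per transaction even with soft commit, because the time limit expires before the cache can fill. That matches the observation that the technique suits many back-to-back narrow transactions.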
Obviously, this soft commit technique is best for instances (such as the batch job described previously) where many back-to-back narrow commit transactions are arriving, and where waiting around a few extra seconds to ride-share to disk makes sense.
Remember that by choosing to sacrifice instantaneous durability, you are not giving up any atomicity or consistency. All surviving transactions are whole. What you are willing to risk is whether the last few transactions that were processed reach disk before your application moves on to the next transaction. Therefore, this technique is not for everyone. But where it makes sense (as for our Client's resumable batch job), it can be a powerful performance booster.
This material has not been submitted to any formal IBM test and is published AS IS. It has not been the subject of rigorous review. IBM assumes no responsibility for its accuracy or completeness. The use of this information or the implementation of any of these techniques is a client responsibility and depends upon the client's ability to evaluate and integrate them into the client's operational environment.