In this IBM® Redbooks® publication, we present guidelines for the development of highly efficient and scalable information integration applications with InfoSphere™ DataStage® (DS) parallel jobs.
InfoSphere DataStage is at the core of IBM Information Server, providing components that yield a high degree of freedom. For any particular problem there might be multiple solutions, which tend to be influenced by personal preferences, background, and previous experience. All too often, those solutions yield less than optimal, and non-scalable, implementations.
This book includes a comprehensive, detailed description of the available components and explains how to use them to obtain scalable and efficient solutions for both batch and real-time scenarios.
The advice in this document reflects the combined, proven experience of a number of expert practitioners in the field of high-performance information integration, developed over several years.
This book is intended for IT architects, Information Management specialists, and Information Integration specialists responsible for delivering cost-effective IBM InfoSphere DataStage performance on all platforms.
Table of contents
Chapter 1. Data integration with Information Server and DataStage
Chapter 2. Data integration overview
Chapter 3. Standards
Chapter 4. Job parameter and environment variable management
Chapter 5. Development guidelines
Chapter 6. Partitioning and collecting
Chapter 7. Sorting
Chapter 8. File Stage usage
Chapter 9. Transformation languages
Chapter 10. Combining data
Chapter 11. Restructuring data
Chapter 12. Performance tuning job designs
Chapter 13. Database Stage guidelines
Chapter 14. Connector Stage guidelines
Chapter 15. Batch data flow design
Chapter 16. Real-time data flow design
Appendix A. Runtime topologies for distributed transaction jobs
Appendix B. Standard practices summary
Appendix C. DataStage naming reference
Appendix D. Example job template
Appendix E. Understanding the parallel job score
Appendix F. Estimating the size of a parallel dataset
Appendix G. Environment variables reference
Appendix H. DataStage data types