
InfoSphere DataStage Parallel Framework Standard Practices

An IBM Redbooks publication

Note: This publication is now archived. For reference only.


Published on 30 July 2010, updated 12 February 2013




ISBN-10: 0738434477
ISBN-13: 9780738434476
IBM Form #: SG24-7830-00

Authors: Julius Lerm and Paul Christensen



    In this IBM® Redbooks® publication, we present guidelines for the development of highly efficient and scalable information integration applications with InfoSphere™ DataStage® (DS) parallel jobs.

    InfoSphere DataStage is at the core of IBM Information Server and provides components that allow a high degree of design freedom. For any particular problem there might be multiple solutions, often shaped by personal preference, background, and previous experience. All too often, those solutions result in suboptimal, non-scalable implementations.

    This book includes a comprehensive, detailed description of the available components and explains how to use them to build scalable, efficient solutions for both batch and real-time scenarios.

    The advice in this document reflects the combined, proven experience of a number of expert practitioners in high-performance information integration, accumulated over several years.

    This book is intended for IT architects, Information Management specialists, and Information Integration specialists responsible for delivering cost-effective IBM InfoSphere DataStage performance on all platforms.

    Table of Contents

    Chapter 1. Data integration with Information Server and DataStage

    Chapter 2. Data integration overview

    Chapter 3. Standards

    Chapter 4. Job parameter and environment variable management

    Chapter 5. Development guidelines

    Chapter 6. Partitioning and collecting

    Chapter 7. Sorting

    Chapter 8. File Stage usage

    Chapter 9. Transformation languages

    Chapter 10. Combining data

    Chapter 11. Restructuring data

    Chapter 12. Performance tuning job designs

    Chapter 13. Database Stage guidelines

    Chapter 14. Connector Stage guidelines

    Chapter 15. Batch data flow design

    Chapter 16. Realtime data flow design

    Appendix A. Runtime topologies for distributed transaction jobs

    Appendix B. Standard practices summary

    Appendix C. DataStage naming reference

    Appendix D. Example job template

    Appendix E. Understanding the parallel job score

    Appendix F. Estimating the size of a parallel dataset

    Appendix G. Environment variables reference

    Appendix H. DataStage data types

