Data Integration in the Big Data World Using IBM InfoSphere Information Server

Published on March 30, 2015

IBM Form #: TIPS1265

Authors: Rob Utzschneider

    An Apache Hadoop infrastructure offers great promise for drastically reducing the cost of storing and processing large data volumes and for increasing ROI. However, the Hadoop infrastructure alone cannot deliver on that promise. Success with Hadoop requires effective management of data movement, data transformation and integration, data cleansing, data governance, data security, data privacy, and data analytics and reporting.

    Many organizations are considering implementing a data lake solution. A data lake is a set of one or more data repositories created to support data discovery, analytics, ad hoc investigations, and reporting. Without proper management and governance, a data lake can quickly become a data swamp. IBM® proposes an enhanced data lake solution that is built with management, affordability, and governance at its core. This solution is known as a data reservoir. A data reservoir provides the right information about the data sources available in the lake and about their usefulness to users who are investigating, reporting on, and analyzing events and relationships in the reservoir.

    This IBM Redbooks® Solution Guide introduces IBM InfoSphere® Information Server, which provides an integrated set of tools that are built to handle the extreme throughput and governance that today’s business enterprises require. The guide addresses the practical realities of managing the data integration tasks that are required for success with Hadoop. Managing these tasks effectively in the Hadoop environment is a critical step toward supporting a data reservoir instead of creating a data swamp.



    Special Notices

    This material has not been submitted to any formal IBM test and is published AS IS. It has not been the subject of rigorous review. IBM assumes no responsibility for its accuracy or completeness. The use of this information or the implementation of any of these techniques is a client responsibility and depends upon the client's ability to evaluate and integrate them into the client's operational environment.