An Apache Hadoop infrastructure offers great promise for drastically reducing the costs of storing and processing large data volumes and for increasing ROI. However, the Hadoop infrastructure alone cannot deliver these benefits. Success with Hadoop requires effectively managing data movement, data transformation and integration, data cleansing, data governance, data security, data privacy, and data analytics and reporting.
Many organizations are considering implementing a data lake solution: a set of one or more data repositories created to support data discovery, analytics, ad hoc investigations, and reporting. Without proper management and governance, a data lake can quickly become a data swamp. IBM® proposes an enhanced data lake solution, known as a data reservoir, that is built with management, affordability, and governance at its core. A data reservoir provides the right information about the data sources available in the lake and about how useful each source is to users who are investigating, reporting on, and analyzing events and relationships in the reservoir.
This IBM Redbooks® Solution Guide introduces IBM InfoSphere® Information Server, which provides an integrated set of tools built to handle the extreme throughput and governance that today’s demanding business enterprises require. The guide addresses the practical realities of managing the data integration tasks that are required for success with Hadoop. Managing these tasks effectively in the Hadoop environment is one critical step toward supporting a data reservoir instead of creating a data swamp.
An Apache Hadoop infrastructure can reduce the costs of storing and processing large data volumes. By investing in Hadoop, organizations can cut IT spending while generating significant return on investment (ROI) through enhanced analytics and reporting that were not previously possible. However, the Hadoop infrastructure alone might not deliver the promised cost reduction or the increased ROI that flows from better reporting and analytics.
This material has not been submitted to any formal IBM test and is published AS IS. It has not been the subject of rigorous review. IBM assumes no responsibility for its accuracy or completeness. The use of this information or the implementation of any of these techniques is a client responsibility and depends upon the client's ability to evaluate and integrate them into the client's operational environment.