IBM BigInsights Security Implementation: Part 1 Introduction to Security Architecture

Published on 25 August 2016

IBM Form #: TIPS1340


Authors: Bharath Devaraju, Shankar Kuchibhotla and Nisanth Simon

    Abstract

    Big data analytics involves processing large amounts of data that cannot be handled by conventional systems. The IBM® BigInsights® platform processes large amounts of data by breaking the computation into smaller tasks that can be distributed onto several nodes. As this platform is shared by users in different roles (developers, analysts, data scientists, and testers), it introduces the challenge of provisioning access and authorization to the cluster and securing the data.

    Big data platforms are an amalgamation of several individual components that are still evolving, and are based on the challenges and requirements that are dictated by the open source community. These components are developed in isolation by independent teams with no forethought of integrating them in a secure way, which results in individual components defining and exposing their own security policies for data and access protection. This inherent lack of single security policy enforcement in big data platforms can be challenging and overwhelming.

    This IBM Redbooks® Analytics Support web doc introduces a reference security architecture for the IBM BigInsights solution that is in line with current industry practices. It can be used as a reference document for solution architects and solution implementers. This document applies to IBM BigInsights Version 4.2 and later.

    Contents


    Security aspects to consider when designing the security architecture for IBM BigInsights

    Securing an IBM BigInsights cluster involves addressing four main security aspects, which are shown in Figure 1:

    • Secure Perimeter
    • Secure Data
    • Access Management
    • Audit

    Figure 1. Four aspects of IBM BigInsights security

    A secure perimeter can be enforced in the following ways:
    • By authenticating users against LDAP and Kerberos
    • By protecting HTTPS access through the Apache Knox security gateway
    • By isolating the data nodes in a secure private network
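
    As an illustration of the gateway item above, the following minimal sketch builds the HTTPS URL that a client outside the private network would call to reach WebHDFS through Knox. The port 8443, the gateway path "gateway", and the topology name "default" are assumed here as commonly used Knox defaults and may differ in a given deployment. The host name and the helper name knox_webhdfs_url are hypothetical.

```python
from urllib.parse import quote, urlencode


def knox_webhdfs_url(gateway_host, hdfs_path, op="LISTSTATUS",
                     port=8443, gateway_path="gateway", topology="default"):
    """Build a WebHDFS URL that is routed through a Knox gateway.

    port, gateway_path, and topology are assumed defaults; check the
    gateway configuration of the actual cluster.
    """
    # Knox publishes cluster services under /<gateway_path>/<topology>/<service>.
    encoded_path = quote(hdfs_path.lstrip("/"))
    query = urlencode({"op": op})
    return (f"https://{gateway_host}:{port}/{gateway_path}/{topology}"
            f"/webhdfs/v1/{encoded_path}?{query}")


# Hypothetical edge node name, used only for illustration.
print(knox_webhdfs_url("edge.example.com", "/user/alice/sales data"))
```

    With this layout, clients need network access only to the gateway host, and the data nodes can stay on the private network.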

    Secure data can be accomplished in the following ways:
    • By using Hadoop transparent encryption with the Hadoop Key Management Server (KMS)
    • By using IBM BigSQL data masking
    • By enabling SSL and TLS support for components to secure the data transfers
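
    To make the transparent encryption item more concrete, the sketch below lists the usual command-line steps for creating an HDFS encryption zone: create a key in the KMS, create an empty directory, and mark that directory as a zone. It only assembles and prints the commands and does not run them. The key name, the path, and the helper name encryption_zone_commands are hypothetical, and the 128-bit key size is an assumption.

```python
import shlex


def encryption_zone_commands(key_name, zone_path, key_bits=128):
    """Return the CLI steps for creating an HDFS encryption zone as argument lists."""
    return [
        # 1. Create the zone key in the KMS-backed key provider.
        ["hadoop", "key", "create", key_name, "-size", str(key_bits)],
        # 2. Create the empty directory that becomes the zone.
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        # 3. Mark the directory as an encryption zone that uses the key.
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
    ]


# Hypothetical key name and path, used only for illustration.
for argv in encryption_zone_commands("sales_key", "/secure/sales"):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

    Files that are written under the zone are then encrypted and decrypted by the HDFS client, with the keys held by the KMS rather than stored with the data.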

    Access management should be enforced at several levels:
    • At the job level, by using Yet Another Resource Negotiator (YARN) queue-based access control
    • At the SQL level, by using SQL access privileges for Hadoop data
    • At the file level, by using ACL-based access control for Hadoop Distributed File System (HDFS) files
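
    As a sketch of the file-level item above, the following code formats HDFS ACL entries and assembles the matching "hdfs dfs -setfacl" command without running it. The user name, group name, path, and helper names are hypothetical. The sketch also assumes that ACL support is enabled on the NameNode.

```python
def acl_entry(kind, name, perms):
    """Format one HDFS ACL entry such as 'user:alice:r-x'."""
    if kind not in ("user", "group"):
        raise ValueError("kind must be 'user' or 'group'")
    # Each position may hold its permission letter or '-'.
    valid = len(perms) == 3 and all(
        p in allowed for p, allowed in zip(perms, ("r-", "w-", "x-")))
    if not valid:
        raise ValueError("perms must look like 'r-x'")
    return f"{kind}:{name}:{perms}"


def setfacl_command(path, entries):
    """Return the argument list for 'hdfs dfs -setfacl -m <spec> <path>'."""
    return ["hdfs", "dfs", "-setfacl", "-m", ",".join(entries), path]


# Hypothetical principals and path, used only for illustration.
entries = [acl_entry("user", "alice", "r-x"), acl_entry("group", "analysts", "r--")]
print(" ".join(setfacl_command("/data/sales", entries)))
```

    The -m option modifies the existing ACL, so entries that are not named in the command are left unchanged.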

    Auditing and reporting are provided in the following ways:
    • By using lightweight monitoring that uses Java Management Extensions (JMX)
    • By using IBM Security Guardium® Data Activity Monitor
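
    As one possible form of lightweight JMX monitoring, Hadoop daemons publish their JMX beans as JSON through a /jmx HTTP endpoint. The minimal sketch below reads one attribute from such a JSON document. The sample payload and its values are invented for illustration, and the helper name read_jmx_attribute is hypothetical. Fetching the document from a live daemon is left out.

```python
import json


def read_jmx_attribute(payload, bean_name, attribute):
    """Return one attribute of one bean from a Hadoop /jmx JSON document."""
    for bean in json.loads(payload).get("beans", []):
        if bean.get("name") == bean_name:
            return bean.get(attribute)
    return None  # bean not present in the document


# Illustrative payload; the values are made up.
sample = json.dumps({"beans": [{
    "name": "Hadoop:service=NameNode,name=FSNamesystem",
    "FilesTotal": 1200,
    "MissingBlocks": 0,
}]})

bean = "Hadoop:service=NameNode,name=FSNamesystem"
print(read_jmx_attribute(sample, bean, "MissingBlocks"))
```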

    Figure 2 shows a high-level design of a secure IBM BigInsights cluster. It highlights various components that are the building blocks of a big data cluster architecture design.

    Figure 2. IBM BigInsights high-level design

    Note the following items in Figure 2:
    • An IBM BigInsights cluster can span two networks: a public network and a private network. The communication between the two networks occurs through an edge node (1).
    • This edge node has the client components (2) for all the master services that are deployed in the cluster so that users can connect and perform analytics and administration. Access to administration and analytic tools is enforced through Ambari user management and the Knox gateway (3).
    • Data encryption protects user data from unauthorized access and enforces industry security standards. Data can be encrypted at rest and while it is being transferred over the network. Encryption at rest is performed in two ways:
        • Hadoop transparent data encryption uses the key management server (KMS), which holds encryption and decryption keys (4).
        • IBM Security Guardium Data Encryption deploys agents on all nodes to perform encryption and decryption of data. The IBM Security Guardium server monitors and enforces encryption and decryption policies and rules on the agents (5).
      Data transfers over the network are secured by configuring services to use SSL and TLS certificates.
    • Similar to the Linux file system, Hadoop Distributed File System (HDFS) provides fine-grained user access control by using file system access control lists (ACLs) (6). Big SQL and Hive provide GRANT and REVOKE statements to authorize users to perform certain operations (7).
    • IBM Security Guardium Data Activity Monitor provides monitoring and auditing capabilities that you can use to seamlessly integrate Hadoop data protection into your existing enterprise data security strategy. HDFS has its own auditing mechanism that captures all the file system activities (8).
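
    To show what the HDFS auditing mechanism in item (8) records, the sketch below parses one audit log line into its fields. It assumes the commonly seen layout of tab-separated key=value pairs after the "FSNamesystem.audit:" logger name. The layout depends on the logging configuration of the cluster, the sample line is invented, and the helper name parse_hdfs_audit_line is hypothetical.

```python
def parse_hdfs_audit_line(line):
    """Split one HDFS audit log line into a dict of its key=value fields."""
    marker = "FSNamesystem.audit: "
    _, found, payload = line.partition(marker)
    if not found:
        return None  # not an audit record
    fields = {}
    for item in payload.strip().split("\t"):
        key, sep, value = item.partition("=")
        if sep:
            fields[key] = value
    return fields


# Invented sample line that follows the assumed layout.
sample_line = ("2016-08-25 10:15:00,123 INFO FSNamesystem.audit: "
               "allowed=true\tugi=alice@EXAMPLE.COM (auth:KERBEROS)\t"
               "ip=/10.0.0.5\tcmd=open\tsrc=/data/sales/q1.csv\t"
               "dst=null\tperm=null\tproto=rpc")

record = parse_hdfs_audit_line(sample_line)
print(record["ugi"], record["cmd"], record["src"], record["allowed"])
```

    Records in this form state who performed which operation on which path and whether it was allowed, which is the kind of information that an auditing tool consumes.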

    Designing the security architecture for IBM BigInsights products means combining the security features that the individual components provide with a holistic approach that secures data, users, and functions against possible vulnerabilities.


    Acknowledgements

    Thanks to Mohan Dani, IBM BigInsights software developer, for his contributions to this project.


    Special Notices

    The material included in this document is in DRAFT form and is provided 'as is' without warranty of any kind. IBM is not responsible for the accuracy or completeness of the material, and may update the document at any time. The final, published document may not include any, or all, of the material included herein. Client assumes all risks associated with Client's use of this document.