IBM FlashSystem in OLAP Database Environments

IBM Redbooks Solution Guide

Abstract

IBM® FlashSystem™ family is an enterprise-class flash storage platform that is ideal for delivering extreme performance with low latency for OLAP (Online Analytical Processing) environments. Analytical processing is typically characterized by a relatively low volume of transactions, but with complex queries that often aggregate data from a data warehouse to support decision making and analysis. For OLAP systems, a low response time is a strong measure of effectiveness. This IBM Redbooks Solution Guide explains how IBM flash storage systems drive extreme performance, cost efficiencies, and enterprise reliability to satisfy the requirements of an OLAP implementation.

Changes in the latest update as of 07/10/14:
-Updated to include the latest FlashSystem product functionality.

Special thanks to Ilya Krutov who wrote the first version of this Redbooks Solution Guide.

IBM® FlashSystem™ storage delivers high performance and efficiency in an easy-to-integrate offering so that businesses can more readily compete in today's high-velocity marketplace. Extreme performance, IBM MicroLatency™, macro efficiency, and enterprise-grade reliability make IBM FlashSystem a powerful and cost-effective tool for accelerating Online Analytical Processing (OLAP) systems and gaining competitive advantage. Perhaps more importantly, the capabilities and capacity of FlashSystem arrays enable commercial and governmental enterprises to address multiple compute challenges in current 24/7/365 operational environments while at the same time empowering growth and innovation into the future.

Figure 1 illustrates the value of the IBM FlashSystem storage infrastructure.


Figure 1. IBM FlashSystem storage infrastructure value

FlashSystem storage transforms the data center environment and enhances performance and resource consolidation to gain the most from business processes and critical applications. Examples of such processes and applications include online transaction processing (OLTP), business intelligence (BI), OLAP, virtual desktop infrastructures, high-performance computing, and content delivery solutions (such as cloud storage and video on demand). This guide focuses on IBM FlashSystem solutions for OLAP database environments.

Did you know?

FlashSystem arrays are some of the highest density solutions on the market, offering dozens of terabytes of usable storage capacity in only a few rack units of space. While providing up to 1.1 million I/Os per second (IOPS), they draw as little as 625 watts of power, making them extremely power efficient. FlashSystem products also offer enterprise-level availability and reliability with no single point of failure, multiple layers of data correction, and redundant hot-swappable components.

IBM has invested one billion dollars and established worldwide Flash Centers of Competency to help customers architect and implement flash-based solutions. FlashSystem arrays provide industry-leading performance, reliability, and MicroLatency. FlashSystem Enterprise Performance Solutions add the full spectrum of enterprise grade data management and feature rich storage services.

For the latest FlashSystem product details, see the IBM FlashSystem family product page at: http://www.ibm.com/storage/flash.

Business value

Data warehouses are commonly used with OLAP workloads in decision support systems such as financial analysis. Unlike OLTP, where transactions are typically relatively simple and deal with small amounts of data, OLAP queries are more complex and process larger volumes of data.

OLAP databases are normally separated from OLTP databases and tend to consolidate historical and reference information from multiple sources. Queries are submitted to OLAP databases to analyze consolidated data from different points of view to make better business decisions in a timely manner.

For OLAP workloads, a fast response time is critical to ensure that strategic business decisions can be made quickly in dynamic market conditions. Delays can significantly increase business and financial risks. Decision making is often stalled or delayed by a lack of accurate, real-time operational data for analytics. Opportunities are then missed for the following reasons:
  • Inability to gain insight into a business
  • Inability to predict business outcomes
  • Explosion of volume, variety, and velocity of information.

FlashSystem storage can help make businesses more agile and analytics-driven by providing up-to-the-minute analytics based on real-time data, not yesterday’s news.

OLAP delays come primarily from slow batch data loads and from heavy, complex queries that consume I/O resources. A common performance bottleneck in OLAP environments is the I/O required to read massive amounts of data (frequently referred to as big data) from storage for processing by OLAP database servers. The servers' ability to process this data is rarely the limiting factor because they typically have significant amounts of RAM and processing power and parallelize tasks across their computing resources.

In general, customers might experience the following challenges in OLAP environments:
  • Slow query execution and response times, which delay business decision making
  • Dramatic growth in data, which requires deeper analysis.


FlashSystem storage can help to address these challenges in the following ways:
  • Dramatically boosting the performance of OLAP workloads with distributed scale-out architecture, providing almost linear and virtually unlimited performance and capacity scalability
  • Significantly improving response time for better and timely decision making.


Solution overview

An OLAP solution with FlashSystem storage consists of the following components:
  • Database servers (IBM System x® or IBM Power Systems™) to run data management software such as IBM DB2®, Microsoft SQL Server, or Oracle databases
  • FlashSystem storage to host the entire data set or partitioned subsets of data
  • A private network (such as 10 Gb Ethernet or QDR/FDR InfiniBand) used to provide high-speed connectivity across database servers in a cluster
  • A storage area network (SAN) used to provide connectivity across database servers and storage systems.
    Supported FlashSystem interfaces include Fibre Channel (FC), iSCSI, InfiniBand (IB), and Fibre Channel over Ethernet (FCoE). For the product details of supported platforms and interfaces, see the IBM System Storage® Interoperation Center (SSIC): http://ibm.com/systems/support/storage/ssic.

IBM DB2 for Linux, UNIX, and Windows is the database of choice for robust, enterprise-wide solutions that handle high-volume workloads. It is optimized to deliver industry-leading performance while lowering costs. IBM servers that run DB2 are proven performance leaders. DB2 uses and optimizes multiple threads automatically, with no change to applications. The unique clustering design of DB2 provides near-linear scalability, continuous availability, and simplified management.

IBM System X6 servers featuring exclusive IBM X-Architecture® innovations and Intel® Xeon® Processor E7 v2 families can help to meet business challenges with revolutionary levels of processor and storage performance, memory capacity, scalability, and reliability. In organizations in every industry, big data and analytics workloads deliver the actionable insight needed to drive faster decision making. Systems such as IBM System x3850 X6 can be confidently deployed to run mission-critical business applications, decrease operating costs, and support cloud computing plans.

Ideally suited for compute-intensive workloads, IBM Power Systems deliver leadership performance and scalability. An integrated approach to the design, development, and testing of each IBM POWER® server, blade, or compute node ensures the resiliency required for today’s IT infrastructure. All Power Systems server models include innovative reliability, availability, and serviceability features that help avoid unplanned downtime. And with capacity on demand, hot-node add, and IBM Active Memory™ Expansion, Power Systems enterprise servers ensure that applications remain available, even as capacity is added to handle new business demands.

IBM FlashSystem solutions provide multiple options for addressing the low latency / high IOPS requirements of OLAP systems and increasing the effectiveness of analytics-driven computing environments.

Four key differentiators set IBM FlashSystem apart from other flash storage platforms:
  • First, FlashSystem architecture is designed with MicroLatency to speed response times, delivering data reads and writes in the 100 microsecond range.
  • Among the engineering objectives of FlashSystem is a focus on Extreme Performance. In addition to an obsession with low latency, IBM FlashSystem engineers also optimized IOPS and bandwidth.
    The resulting extreme performance ensures that as OLAP workloads increase, FlashSystem continues to scale performance without latency degradation. Whether supporting a single application that needs to handle high numbers of concurrent users or multiple applications with diverse workloads, FlashSystem extreme performance translates into performance scalability and better business results.
  • FlashSystem is optimized to provide Macro Efficiency through compact physical capacity, low energy consumption, and greater utilization of existing resources. The arrays are some of the highest density solutions on the market, offering dozens of terabytes of usable storage capacity in only a few rack units of space. While providing up to 1.1 million IOPS, they draw as little as 625 watts of power, making them extremely power efficient.
  • A key FlashSystem pillar is Enterprise Reliability. The system employs eMLC NAND flash plus two RAID dimensions: IBM’s patented Variable Stripe RAID™ at the flash module level and system-wide RAID. Together they provide more data protection levels than are available from competing systems. FlashSystem has no single point of failure, and its design enables rapid servicing because all hot-swappable and redundant components (including flash modules, power supplies, fans, batteries, and canisters) are accessible from the front or back of the system. In addition, software and firmware updates can be completed with the system up and running.
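The latency, IOPS, and power figures quoted in the list above can be related with simple arithmetic. The following Python sketch is illustrative only: it assumes that the peak IOPS, the lowest power draw, and the 100 microsecond latency apply at the same time, which this guide does not state. It uses Little's Law (average I/Os in flight = throughput × latency) to estimate how many concurrent I/Os a host would need to keep outstanding to approach the quoted peak.

```python
# Figures quoted in this guide. Treating them as holding simultaneously
# is an assumption made only for this illustration.
MAX_IOPS = 1_100_000   # peak I/O operations per second
POWER_WATTS = 625      # lowest quoted power draw
LATENCY_US = 100       # read/write latency in microseconds


def iops_per_watt(iops: int, watts: int) -> float:
    """Power efficiency expressed as IOPS delivered per watt."""
    return iops / watts


def outstanding_ios(iops: int, latency_us: int) -> float:
    """Little's Law: average I/Os in flight = throughput x latency."""
    return iops * latency_us / 1_000_000


def single_stream_iops(latency_us: int) -> float:
    """IOPS one synchronous requester can reach (one I/O in flight)."""
    return 1_000_000 / latency_us


if __name__ == "__main__":
    print("IOPS per watt:", iops_per_watt(MAX_IOPS, POWER_WATTS))
    print("I/Os in flight at peak:", outstanding_ios(MAX_IOPS, LATENCY_US))
    print("IOPS at queue depth 1:", single_stream_iops(LATENCY_US))
```

Under these assumptions, a single synchronous stream reaches only a small fraction of the peak rate, so OLAP hosts benefit from issuing many I/Os in parallel, which matches the parallel query processing described later in this guide.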

Finally, FlashSystem Enterprise Performance Solutions offer a wide range of advanced storage services such as snapshots, data compression, and replication. And for business or governmental customers with data that requires an extra layer of protection for adherence to internal or regulatory requirements, FlashSystem supports AES 256 hardware-based encryption for data at rest. Figure 2 shows the latest FlashSystem family of products.


Figure 2. IBM FlashSystem Products

Solution architecture

IBM FlashSystem solutions for OLAP use a distributed server and storage scale-out approach. This approach satisfies high bandwidth requirements and lets performance and capacity grow as storage arrays and server nodes are added, matching the growing volumes of data that are being processed.

In this solution, FlashSystem arrays are connected to the database hosting platforms by using a Fibre Channel SAN. Server hosts, or nodes, are interconnected with the isolated high-speed network (such as 10 Gb Ethernet with IBM RackSwitch™ G8124E) that is used for the inter-node data exchange. Each node runs a copy of the OLAP database application, and the analyzed data set is partitioned and distributed across the storage systems. Depending on the database management software that is used and its architecture, each node might have access to only a certain portion of data, or all nodes can have access to all data that is stored on the external storage. OLAP queries are distributed across nodes and processed in parallel. Figure 3 illustrates this architecture.


Figure 3. Scale-out OLAP architecture

This solution can scale seamlessly by adding more FlashSystem arrays and server nodes. In such cases, storage capacity and I/O bandwidth increase linearly with the number of storage devices, which can help to eliminate storage I/O bottlenecks in OLAP workloads.
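The effect of linear bandwidth scaling on a full-table scan can be estimated as follows. This is a rough sketch, not a sizing tool: the per-array bandwidth passed in is a hypothetical placeholder rather than a FlashSystem specification, and the model assumes that data is spread evenly across arrays and that the SAN and servers can consume the aggregate bandwidth.

```python
def aggregate_bandwidth_gb_s(arrays: int, per_array_gb_s: float) -> float:
    """Aggregate read bandwidth under the linear-scaling assumption."""
    if arrays < 1 or per_array_gb_s <= 0:
        raise ValueError("arrays must be >= 1 and bandwidth must be positive")
    return arrays * per_array_gb_s


def full_scan_seconds(data_tb: float, arrays: int, per_array_gb_s: float) -> float:
    """Time to read data_tb terabytes once, with data evenly distributed."""
    data_gb = data_tb * 1000  # decimal units: 1 TB = 1000 GB
    return data_gb / aggregate_bandwidth_gb_s(arrays, per_array_gb_s)


if __name__ == "__main__":
    # 2.0 GB/s per array is an illustrative placeholder, not a product figure.
    for n in (1, 2, 4):
        print(n, "array(s):", full_scan_seconds(10, n, 2.0), "seconds")
```

In this model, doubling the number of arrays halves the scan time, which is the behavior the scale-out architecture aims for when storage I/O is the bottleneck.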


Usage scenarios

OLAP applications can be used for risk assessment, business intelligence and reporting, exploration and visualization, predictive analytics, and other similarly profiled industry and functional applications. For example, consider a data warehouse solution that can store and process 160 TB of data. Only a few FlashSystem arrays are needed to store the entire data set with plenty of bandwidth.
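The array count for the 160 TB example can be estimated with a one-line calculation. The usable capacity per array in this sketch is an assumption: this guide says only that an array offers "dozens of terabytes", so the values below are placeholders to be replaced with the figure for the actual model.

```python
import math


def arrays_needed(data_tb: float, usable_tb_per_array: float) -> int:
    """Smallest number of arrays whose combined usable capacity holds data_tb."""
    if data_tb <= 0 or usable_tb_per_array <= 0:
        raise ValueError("capacities must be positive")
    return math.ceil(data_tb / usable_tb_per_array)


if __name__ == "__main__":
    # 24 TB and 40 TB are hypothetical per-array capacities.
    for usable_tb in (24, 40):
        print(usable_tb, "TB per array ->", arrays_needed(160, usable_tb), "arrays")
```

The estimate covers raw data only. A real sizing exercise would also account for indexes, temporary space, and growth.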

Integration

An excellent example of a scale-out approach in the OLAP environment involves IBM InfoSphere Warehouse. InfoSphere Warehouse is powered by the DB2 for Linux, UNIX, and Windows data server. With its massively scalable, shared-nothing architecture, DB2 provides high performance for mixed-workload query processing of relational and basic XML data. Such advanced features as database and table partitioning, compression, multidimensional clustering (MDC), materialized query tables (MQT), and OLAP capabilities make DB2 a powerful engine for operational warehousing.

InfoSphere Warehouse provides advanced capabilities for database partitioning, so that IT users have multiple ways to distribute data across servers for large-scale parallelism and linear scalability. The shared-nothing architecture of DB2 helps ensure that performance will not degrade as the warehouse grows. Also, because InfoSphere Warehouse can physically cluster data in multiple dimensions, order data by value range, and limit I/O to relevant data partitions, it helps reduce the work that is needed to resolve many queries.
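The idea of limiting I/O to relevant data partitions can be illustrated with a small Python model. This is a conceptual sketch and not how DB2 or InfoSphere Warehouse is implemented: the class, the function names, and the choice of a year as the partitioning key are all hypothetical. Rows are placed in range partitions, and a query over a key range skips every partition whose range cannot contain matching rows.

```python
from dataclasses import dataclass, field


@dataclass
class RangePartition:
    low: int    # inclusive lower bound of the partitioning key
    high: int   # exclusive upper bound of the partitioning key
    rows: list = field(default_factory=list)


def build_partitions(bounds):
    """Create one partition per consecutive pair of boundary values."""
    return [RangePartition(lo, hi) for lo, hi in zip(bounds, bounds[1:])]


def insert_row(partitions, key, value):
    """Store (key, value) in the partition whose range contains key."""
    for part in partitions:
        if part.low <= key < part.high:
            part.rows.append((key, value))
            return
    raise ValueError(f"key {key} is outside all partition ranges")


def query_sum(partitions, key_from, key_to):
    """Sum values with key_from <= key < key_to.

    Returns (total, partitions_scanned) so the I/O saving is visible.
    """
    total = 0
    scanned = 0
    for part in partitions:
        if part.high <= key_from or part.low >= key_to:
            continue  # no overlap with the query range: skip this partition
        scanned += 1
        total += sum(v for k, v in part.rows if key_from <= k < key_to)
    return total, scanned


if __name__ == "__main__":
    parts = build_partitions([2010, 2011, 2012, 2013, 2014, 2015])
    for year, amount in [(2010, 10), (2011, 20), (2012, 30), (2013, 40), (2014, 50)]:
        insert_row(parts, year, amount)
    print(query_sum(parts, 2013, 2015))  # reads 2 of the 5 partitions
```

A query restricted to two years touches two partitions instead of five, so the storage system serves proportionally less I/O for the same answer.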

The architecture of the InfoSphere Warehouse database server solution with FlashSystem arrays is shown in Figure 4.


Figure 4. InfoSphere Warehouse database server solution with FlashSystem arrays

InfoSphere Warehouse transparently splits the database across multiple partitions stored on FlashSystem logical volumes, and it uses the computing power of multiple servers to satisfy requests for large amounts of information. SQL statements are automatically decomposed into subrequests that are run in parallel across each database partition. Results of the subrequests are joined to provide final results.
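The decompose, run in parallel, and join pattern described above can be sketched in Python. This is a simplified model under stated assumptions, not the DB2 implementation: the row layout, the column names (order_id, region, amount), and the use of a CRC32 hash as the distribution function are hypothetical, and threads stand in for database partitions on separate servers. Each partition computes a partial sum and count per region, and a coordinator merges the partial results into the final averages.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_rows(rows, n_partitions):
    """Distribute rows across partitions by hashing the distribution key."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        idx = zlib.crc32(str(row["order_id"]).encode()) % n_partitions
        partitions[idx].append(row)
    return partitions


def partial_aggregate(partition):
    """Subrequest: per-region [sum, count] computed on one partition only."""
    acc = defaultdict(lambda: [0.0, 0])
    for row in partition:
        acc[row["region"]][0] += row["amount"]
        acc[row["region"]][1] += 1
    return dict(acc)


def merge_partials(partials):
    """Coordinator step: combine partial sums and counts into averages."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for region, (total, count) in partial.items():
            merged[region][0] += total
            merged[region][1] += count
    return {region: total / count for region, (total, count) in merged.items()}


def avg_amount_by_region(rows, n_partitions=4):
    """Model of: SELECT region, AVG(amount) ... GROUP BY region."""
    partitions = partition_rows(rows, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(partial_aggregate, partitions))
    return merge_partials(partials)


if __name__ == "__main__":
    sample = [
        {"order_id": 1, "region": "EAST", "amount": 100.0},
        {"order_id": 2, "region": "WEST", "amount": 50.0},
        {"order_id": 3, "region": "EAST", "amount": 300.0},
        {"order_id": 4, "region": "WEST", "amount": 150.0},
        {"order_id": 5, "region": "WEST", "amount": 100.0},
    ]
    print(avg_amount_by_region(sample))
```

Each subrequest returns a sum and a count rather than an average because averages of partitions cannot be combined correctly, whereas sums and counts can. The result is the same for any number of partitions, which is what allows partitions to be added without changing the queries.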

Supported platforms

IBM FlashSystem supports a wide range of operating systems (Windows Server 2008 and 2012, Linux, and IBM AIX®), hardware platforms (System x, Power Systems, and x86 servers not from IBM), HBAs, and SAN fabrics. For specific information, see the System Storage Interoperation Center (SSIC): http://ibm.com/systems/support/storage/ssic


Ordering information

For FlashSystem ordering information, see the following IBM Redbooks® Product Guides:
http://www.redbooks.ibm.com/abstracts/tips1158.html

    Related information

    For more information, see the following documents:
    http://www.redbooks.ibm.com/abstracts/sg248189.html?Open

    Special Notices

    This material has not been submitted to any formal IBM test and is published AS IS. It has not been the subject of rigorous review. IBM assumes no responsibility for its accuracy or completeness. The use of this information or the implementation of any of these techniques is a client responsibility and depends upon the client's ability to evaluate and integrate them into the client's operational environment.

    Profile

    Publish Date
    21 March 2013

    Last Update
    14 July 2014


    Author(s)

    IBM Form Number
    TIPS0974