Hadoop Storage Tiering with IBM Spectrum Scale

By Pallavi Galgali posted Wed May 09, 2018 09:40 PM

IBM Spectrum Scale has been certified with Hortonworks Data Platform (HDP) since June 2017. Some of the key reasons enterprises find this integration invaluable are:
• Up to 60% smaller storage footprint.
• Best-in-class in-place analytics that allow both POSIX and HDFS access to the same data, enabling super-fast POSIX ingest into the data lake.
• Ability to grow storage independently of compute, with proven enterprise storage features such as active-active disaster recovery and backup.
• Ability to eliminate silos and build a single common data lake shared by Hadoop and non-Hadoop applications (such as SAS grids and enterprise data warehouses).

For customers adopting IBM Spectrum Scale and IBM Elastic Storage Server (ESS, a pre-integrated solution powered by IBM Spectrum Scale software) with a Hortonworks Hadoop/Spark solution, a key requirement has been the ability to add ESS to an existing HDP cluster. This eliminates the need to set up a separate HDP cluster to leverage the benefits of ESS. We are now announcing Hadoop Storage Tiering with IBM Spectrum Scale to address this requirement. Enterprises that already have a standard HDP cluster with native HDFS can now add ESS as a storage tier in the same HDP cluster. This helps enterprises manage cluster sprawl by adding ESS-based shared storage to their existing HDP clusters.

Here are some ways to leverage this new feature:

As an ingest tier for faster ingest
Enterprises can use IBM Spectrum Scale POSIX support with flash-based IBM ESS to get super-fast ingests for their existing Hadoop data lakes.

As a secondary tier with shared storage
Enterprises can use IBM ESS as a secondary tier in their existing Hadoop data lakes. This enables them to grow storage independently of compute and also eliminates the need for three-way replication. The key benefit is the ability to run analytics directly on the secondary tier without having to bring the data into the primary tier.

For data sharing between clusters
If an enterprise wants to build a new analytics workflow on a new HDP cluster, but also needs access to the data from an existing HDP cluster, the tiering feature can enable this without creating data copies. IBM ESS can be used as a secondary tier for the existing HDP cluster, and the same ESS can act as the storage for the new HDP cluster. For example, some of our customers are considering this scenario to introduce new IBM Power-based HDP clusters for demanding next-generation analytics workflows.

Hadoop Storage Tiering with Spectrum Scale is supported with IBM Spectrum Scale 4.2.3 and later and HDP 2.6 and later, and with all deployment models of IBM Spectrum Scale, including both IBM ESS and non-ESS based deployments. Please refer to the IBM Spectrum Scale documentation for more information. After HDP 3.0, Hadoop Storage Tiering with Spectrum Scale will be enhanced to allow this tiering within clusters enabled for HDFS federation as well.
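
As a rough illustration of the version requirements stated above, the following Python sketch checks whether a given pair of versions meets the minimums. The function names and the plain dotted version format (for example "4.2.3") are assumptions made for this example; it is not an IBM or Hortonworks tool, and real build strings with suffixes would need extra parsing.

```python
# Minimum versions taken from the support statement in this post.
MIN_SPECTRUM_SCALE = (4, 2, 3)
MIN_HDP = (2, 6)


def parse_version(text):
    """Turn a dotted version string such as '4.2.3' into a tuple of ints.

    Assumes purely numeric components separated by dots (hypothetical format).
    """
    return tuple(int(part) for part in text.split("."))


def tiering_supported(spectrum_scale_version, hdp_version):
    """Return True if both versions meet the stated minimums."""
    return (
        parse_version(spectrum_scale_version) >= MIN_SPECTRUM_SCALE
        and parse_version(hdp_version) >= MIN_HDP
    )


if __name__ == "__main__":
    # Example values only; substitute the versions of your own cluster.
    print(tiering_supported("4.2.3", "2.6"))
    print(tiering_supported("4.2.2", "2.6.5"))
```

Meeting these minimums is only a first check; the IBM Spectrum Scale documentation remains the reference for the full list of supported configurations.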