
Announcing HDP 3.0 support with IBM Spectrum Scale

By Pallavi Galgali posted Fri August 31, 2018 02:46 PM

We are excited to announce IBM Spectrum Scale support for Hortonworks Data Platform (HDP) 3.0. IBM Storage first announced its partnership with Hortonworks for IBM Spectrum Scale storage in February 2017. Since then, we have continued our joint engineering efforts to certify that all releases of IBM Spectrum Scale software and HDP interoperate with each other.

The latest certification covers HDP 3.0 testing with the most current mod update of IBM Spectrum Scale 5.0. This certification is for Spectrum Scale software and hence applies to all the deployment models of Spectrum Scale, including Elastic Storage Server, which is a pre-integrated storage system based on IBM Spectrum Scale software. The certification applies to HDP 3.0, Ambari 2.7 and HDF 3.2 running on x86 or Power servers.

Here are some recent highlights related to Hortonworks integration with IBM Spectrum Scale Storage.

Namenode federation
HDP 3.0 supports multiple HDFS name nodes for improved scalability and availability.
IBM recently announced support for Hadoop Storage Tiering with IBM Spectrum Scale, which lets an existing Hadoop cluster running native HDFS access an IBM Spectrum Scale namespace. With this function, you can have a native HDFS namespace and an IBM Spectrum Scale namespace within a single Hadoop cluster today.
Namenode federation support creates an opportunity to add multiple such namespaces within the same Hadoop cluster. We are currently working on leveraging name node federation to enhance the experience of Hadoop Storage Tiering with Spectrum Scale.
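
As a rough illustration of what several namespaces in one cluster mean for a client, the sketch below resolves a path to a namespace by longest-prefix match on a mount table, similar in spirit to a client-side mount table. It is only a sketch: the mount points, host names, and the resolve function are hypothetical and are not taken from HDP or IBM Spectrum Scale.

```python
from typing import Dict

# Hypothetical mount table: path prefix -> namespace URI.
# Host names and mount points are illustrative assumptions only.
MOUNT_TABLE: Dict[str, str] = {
    "/hot": "hdfs://native-nn:8020",
    "/archive": "hdfs://scale-transparency-nn:8020",
}


def resolve(path: str, table: Dict[str, str] = MOUNT_TABLE) -> str:
    """Return the namespace-qualified URI for path (longest-prefix match)."""
    best = None
    for prefix in table:
        # Match the prefix itself or anything below it, but not "/hotel" for "/hot".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no namespace mounted for {path}")
    return table[best] + path


if __name__ == "__main__":
    print(resolve("/hot/logs/a.txt"))
    print(resolve("/archive/2017"))
```

In this sketch, paths under /hot go to the native HDFS namespace and paths under /archive go to the second namespace, while an unmounted path raises an error instead of silently falling back to a default.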

Erasure Coding
HDP 3.0 offers improved data protection techniques with erasure coding in native HDFS. Erasure coding in native HDFS enables customers to avoid threefold replication and reduce the storage footprint of their cold data. If you are comparing IBM Elastic Storage Server (ESS) with native HDFS in the context of this new feature, note the following points:
• IBM ESS comes with its own erasure coding implementation, which is not meant only for the cold or archive tier. It allows you to run analytics directly on the erasure-coded data, avoiding 3-way replication completely.
• In addition, consider the complexity of rebalancing data between hot and cold tiers in your native HDFS environment once you create a cold tier with erasure coding enabled. With shared storage such as ESS, this type of tiering between storage pools is managed out-of-band, freeing your compute cycles for actual Hadoop workloads.
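
To make the footprint difference concrete, here is a small back-of-the-envelope sketch comparing the raw capacity needed under 3-way replication with that needed under a Reed-Solomon layout. The 6 data + 3 parity layout is used only as an example of an erasure coding scheme, the function names are hypothetical, and the arithmetic ignores striping and small-file padding effects.

```python
def raw_capacity_replication(logical_tb: float, replicas: int = 3) -> float:
    """Raw capacity consumed when every block is stored `replicas` times."""
    return logical_tb * replicas


def raw_capacity_erasure_coded(logical_tb: float, data_units: int, parity_units: int) -> float:
    """Raw capacity consumed when each stripe has data_units data and parity_units parity cells."""
    return logical_tb * (data_units + parity_units) / data_units


if __name__ == "__main__":
    logical_tb = 100.0  # illustrative data set size in TB
    print(raw_capacity_replication(logical_tb))          # 300.0
    print(raw_capacity_erasure_coded(logical_tb, 6, 3))  # 150.0
```

Under these assumptions, the same 100 TB of data needs 300 TB of raw capacity with 3-way replication and 150 TB with a 6+3 layout, which is the kind of saving the erasure coding discussion above refers to.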

HDFS encryption support
IBM Spectrum Scale already offers built-in encryption. We are now also announcing support for HDFS-level encryption with the IBM Spectrum Scale HDFS transparency connector. It is important to understand the difference between the two: HDFS-level encryption is applied per user, while the built-in IBM Spectrum Scale encryption is applied per node. HDFS-level encryption therefore gives you more fine-grained control at the user level, if that is what your use case demands. However, if you enable HDFS-level encryption, you lose in-place analytics benefits such as accessing the same data through HDFS as well as POSIX/NFS. You now have a choice of encryption implementations and can pick the one that best suits your requirements.

To summarize, IBM Spectrum Scale storage has been supporting Hortonworks customers on their analytics journey for more than a year now. With the growing adoption of containerization and increasing demand for different AI workloads, we see many of our clients considering a shared data lake based on IBM Spectrum Scale or ESS to run their Hadoop and non-Hadoop computing. HPC, analytics and AI are the most common non-Hadoop workloads running alongside Hadoop on this common data lake today. These customers are benefitting tremendously from our in-place analytics capability with POSIX support, which very few storage vendors are able to offer. POSIX is always preferred over NFS for these non-Hadoop workloads, while Hadoop workloads can access the same data using HDFS.

For more information on IBM Spectrum Scale support for Hortonworks, please refer to the IBM Spectrum Scale documentation.