Federation was added to HDFS to improve the horizontal scaling of the HDFS NameNode. In HDFS transparency, federation lets IBM Spectrum Scale file systems and a native HDFS file system coexist. Hadoop applications can read input data from native HDFS, analyze it, and write the output to the IBM Spectrum Scale file system. This feature is available in HDFS transparency version 2.7.0-2 (gpfs.hdfs-protocol-2.7.0-2) and later.
HDFS transparency federation can also allow two or more IBM Spectrum Scale file systems to act as one uniform file system for Hadoop applications. These file systems can belong to the same cluster or to different IBM Spectrum Scale clusters. In a typical scenario, you need to read data from an existing file system, analyze it, and write the results to a new IBM Spectrum Scale file system.
Note: If you want your applications running in clusterA to process the data in clusterB, update the federation configuration in clusterA only. This is called federating clusterB with clusterA. If you want your applications running in clusterB to process data from clusterA, update the federation configuration in clusterB. This is called federating clusterA with clusterB.
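As an illustration of what a federation configuration can look like, the following Python sketch generates core-site.xml properties for a Hadoop ViewFs mount table, which presents several file systems under one namespace. It assumes that the federation is expressed through ViewFs. The mount-table name, host names, port, and mount points are hypothetical, and the helper functions are not part of HDFS transparency. Take the actual values from the configuration topic for your federation mode.

```python
from xml.sax.saxutils import escape


def viewfs_properties(mount_table, links):
    """Return Hadoop property name/value pairs for a ViewFs mount table.

    mount_table: name of the federated view (hypothetical, chosen by you).
    links: maps a path in the unified namespace to the URI that serves it.
    """
    props = {"fs.defaultFS": f"viewfs://{mount_table}"}
    for mount_point, target_uri in links.items():
        props[f"fs.viewfs.mounttable.{mount_table}.link.{mount_point}"] = target_uri
    return props


def to_configuration_xml(props):
    """Render properties in the <configuration> layout used by core-site.xml."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


# Hypothetical example: applications in clusterA see data from clusterB
# under /clusterB. Host names, port, and paths are assumptions.
example = viewfs_properties(
    "federated",
    {
        "/clusterA": "hdfs://clusterA-nn.example.com:8020/clusterA",
        "/clusterB": "hdfs://clusterB-nn.example.com:8020/clusterB",
    },
)
print(to_configuration_xml(example))
```

Following the note above, a configuration generated this way would be applied in clusterA only, because clusterA is where the applications run.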
This guide provides an overview of the HDFS federation feature and of the configuration and management of a federated cluster.
Mode A: IBM Spectrum Scale and Native HDFS federation
Refer to the link for IBM Spectrum Scale and native HDFS federation configuration details.
Mode B: IBM Spectrum Scale file systems federation
Refer to the link for configuration details on federating two IBM Spectrum Scale file systems through HDFS transparency.
Federation known limits
1. All changes made to /usr/lpp/mmfs/hadoop/etc/hadoop/core-site.xml and /usr/lpp/mmfs/hadoop/etc/hadoop/hdfs-site.xml must also be made in the configuration files used by the Hadoop distribution. However, Hadoop distributions manage their own configuration, and the management interface might not support the keys used for federation. For example, IBM BigInsights IOP uses Ambari, and the Ambari GUI does not support some property names (see Ambari-15455).
Similarly, Hortonworks HDP does not officially support federation yet.
If you want to set up federation, send an email to scale@us.ibm.com.
2. Native HDFS and HDFS transparency cannot run on the same node because their network port numbers conflict.
3. If you choose to federate both native HDFS and HDFS transparency, first configure the native HDFS cluster and make sure that the native HDFS service is working. Then configure the federation for native HDFS and HDFS transparency.
For a new native HDFS cluster, the DataNode registers itself with the NameNode when the service starts for the first time. The HDFS Transparency NameNode does not accept registrations from native HDFS DataNodes. Therefore, an exception occurs if you configure a new native HDFS cluster, federate it with HDFS transparency, and then try to bring up both clusters (one native HDFS cluster and one HDFS Transparency cluster) at the same time.
4. If you want to maintain both the native HDFS cluster and the HDFS Transparency cluster, start and stop each of them separately.
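Limits 1 and 2 can be checked with a small script before federation is enabled. The following Python sketch is illustrative only and is not part of HDFS transparency. It reports properties that are set in the HDFS transparency configuration files but are missing or different in the distribution's copy, and it tests whether a TCP port is already in use on the local node. The function names are hypothetical, and the port numbers are assumptions based on common Hadoop 2.x defaults; read the real ports from your own core-site.xml and hdfs-site.xml.

```python
import socket
import xml.etree.ElementTree as ET


def load_properties(path):
    """Read name/value pairs from a Hadoop-style <configuration> XML file."""
    props = {}
    for prop in ET.parse(path).getroot().iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def config_drift(transparency_props, distro_props):
    """Return {name: (transparency_value, distro_value)} for every property
    that is missing from, or differs in, the distribution's configuration."""
    return {
        name: (value, distro_props.get(name))
        for name, value in transparency_props.items()
        if distro_props.get(name) != value
    }


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP listener can currently bind host:port on this node."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


# Assumed ports (common Hadoop 2.x defaults for NameNode RPC and HTTP).
# Your cluster may use different values.
ASSUMED_NAMENODE_PORTS = (8020, 50070)

for port in ASSUMED_NAMENODE_PORTS:
    state = "free" if port_is_free(port) else "already in use"
    print(f"port {port}: {state}")
```

To check limit 1, pass `load_properties("/usr/lpp/mmfs/hadoop/etc/hadoop/core-site.xml")` and the properties loaded from the distribution's own core-site.xml to `config_drift`; an empty result means that every HDFS transparency setting is also present in the distribution's file. The location of the distribution's file depends on the distribution and is not specified here.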