HDFS Transparency supports running the Hadoop Map/Reduce workloads inside Docker containers. See the Docker website
for an overview of the Docker technology.
It is recommended that each physical node belong to only one HDFS Transparency cluster. This section explains how to configure a set of physical nodes as an HDFS Transparency cluster that provides data access for Hadoop running inside containers. If you have multiple Hadoop clusters running in different containers, configure one HDFS Transparency cluster for each Hadoop cluster so that the data of the different Hadoop clusters remains isolated. For example:
Physical nodes 1/2/3 configured as HDFS Transparency cluster1 for Hadoop cluster1.
Physical nodes 4/5/6 configured as HDFS Transparency cluster2 for Hadoop cluster2.
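As a sketch of how this separation might look on disk, each Transparency cluster typically lists its own member nodes in the slaves (or workers) file of its Hadoop configuration directory. The host names and the path below are illustrative assumptions, not taken from this article:

```
# Hypothetical slaves file for HDFS Transparency cluster1
# (e.g. /usr/lpp/mmfs/hadoop/etc/hadoop/slaves on cluster1's nodes)
node1
node2
node3
```

Cluster2 would carry an analogous file listing node4, node5, and node6, so each Transparency cluster manages only its own set of physical nodes.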
With HDFS Transparency, you can run Hadoop Map/Reduce jobs in Docker and use IBM Spectrum Scale as a uniform data storage layer over the physical machines, as shown in Figure 1:
You can configure Docker instances from different physical machines as one Hadoop cluster and run Map/Reduce jobs on these virtual Hadoop clusters. All Hadoop data is stored in the IBM Spectrum Scale file system on the physical machines. The 172.17.0.x IP address on each physical machine belongs to a network bridge adapter used for communication among Docker instances on different physical machines. The HDFS Transparency services must be configured to listen on this network bridge so that they can process requests from the Docker instances. After receiving a request from a Hadoop job running in a Docker instance, HDFS Transparency performs the I/O against the IBM Spectrum Scale file system mounted on that node.
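To make this concrete, the Hadoop clients inside the containers would point their default file system at the Transparency NameNode address reachable over the bridge. The fragment below uses the standard Hadoop property fs.defaultFS; the bridge IP 172.17.0.1 and port 8020 are assumptions for this sketch, not values from this article:

```xml
<!-- Illustrative core-site.xml fragment inside the Docker instances.
     172.17.0.1:8020 is a hypothetical bridge address for the NameNode. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://172.17.0.1:8020</value>
  </property>
</configuration>
```

On the Transparency side, the standard Hadoop property dfs.namenode.rpc-bind-host in hdfs-site.xml can be set to 0.0.0.0 so the NameNode listens on all interfaces, including the bridge adapter; whether your deployment needs this depends on how the services were bound originally.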
Refer to the Configuring the Docker instance and HDFS Transparency link for more details.