IBM Spectrum Scale HDFS Transparency Short Circuit Write Support

By Archive User posted Tue November 28, 2017 02:13 AM

  
Background
• Native HDFS supports short circuit read, which reads data directly from the local file system when the HDFS client and the target DataNode are co-located on the same physical node.
• Short circuit read yields a 10% to 30% performance gain, depending on the workload and the total amount of data read.
• HDFS Transparency also supports short circuit read.
• Short circuit read only removes the network cost of reading data.
• With short circuit write, the HDFS client writes data directly into Spectrum Scale without going through RPC over the loopback interface (lo) when the HDFS client and the HDFS DataNode are co-located on the same node.
• Native HDFS does not provide short circuit write.

Short Circuit Write support
Short circuit write is supported starting with the HDFS Transparency 2.7.3-1 release. If the HDFS client and the HDFS Transparency DataNode are located on the same node, then when the client writes a file, short circuit write sends the data directly into the Spectrum Scale file system instead of through RPC. This avoids the RPC latency of going through the local loopback network adapter and thus can improve write performance.


In Figure 01, (A) shows the original data write path. With short circuit write enabled, the write path is as shown in (B): the data is written directly into the Spectrum Scale file system.

To enable this feature, first enable short circuit read as described in the HDFS Transparency Guide. By default, short circuit write is enabled whenever short circuit read is enabled. When short circuit read is disabled, short circuit write is disabled as well.

To disable short circuit write while keeping short circuit read enabled, add the gpfs.short-circuit-write.enable property to the HDFS Transparency configuration file gpfs-site.xml and set its value to false.
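The interaction between the two settings can be summarized in a small sketch. This is only an illustration of the rules described above, not HDFS Transparency code. It assumes that short circuit read is controlled by the standard HDFS client property dfs.client.read.shortcircuit (the HDFS Transparency Guide is the authority on the exact steps), and the helper name short_circuit_write_enabled is hypothetical.

```python
def _is_true(value: str) -> bool:
    """Interpret a Hadoop-style boolean string."""
    return value.strip().lower() == "true"


def short_circuit_write_enabled(conf: dict) -> bool:
    """Return the effective short circuit write state for a config mapping.

    Assumptions (not taken from the product source code):
    - short circuit read is governed by dfs.client.read.shortcircuit,
      which is off when the property is absent;
    - gpfs.short-circuit-write.enable counts as true when absent,
      matching the "enabled by default" behavior described in the text.
    """
    read_on = _is_true(conf.get("dfs.client.read.shortcircuit", "false"))
    write_flag = _is_true(conf.get("gpfs.short-circuit-write.enable", "true"))
    # Short circuit write needs short circuit read plus no explicit opt-out.
    return read_on and write_flag
```

For example, a mapping with short circuit read set to true and no write property yields True, while adding gpfs.short-circuit-write.enable=false yields False.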

If you use Hortonworks HDP with the Spectrum Scale integration mpack, add the property from the Ambari GUI and restart the Spectrum Scale service for the change to take effect.

If you use open source Apache Hadoop, make the change in /usr/lpp/mmfs/hadoop/etc/hadoop/gpfs-site.xml and run “/usr/lpp/mmfs/bin/mmhadoopctl connector syncconf /usr/lpp/mmfs/hadoop/etc/hadoop” to sync the change to all HDFS Transparency nodes.
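As a rough illustration of the manual edit, the sketch below sets the property in a Hadoop-style configuration document. It assumes gpfs-site.xml uses the usual Hadoop layout (a configuration root containing property elements with name and value children). The function set_property is a hypothetical helper, not part of any IBM or Hadoop tool, and it works on a string rather than touching the real file.

```python
import xml.etree.ElementTree as ET


def set_property(xml_text: str, name: str, value: str) -> str:
    """Set or add one property in Hadoop-style configuration XML.

    Assumes the <configuration><property><name/><value/></property>
    layout; returns the updated document as a string.
    """
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                # Property exists without a value element; create one.
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # Property not present yet: append a new entry.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    original = "<configuration></configuration>"
    print(set_property(original, "gpfs.short-circuit-write.enable", "false"))
```

After the updated content is saved to gpfs-site.xml, the mmhadoopctl syncconf command above is still needed to distribute the change to the other HDFS Transparency nodes.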

Performance data shows that short circuit write can leverage the Spectrum Scale native client and a high-performance network, such as RDMA over InfiniBand, to improve I/O bandwidth and latency and to reduce CPU usage.
