Technical Service Bulletin 2021-463, report from Cloudera

Tue January 26, 2021 01:03 PM

HBase Performance Issue

The HDFS short-circuit setting dfs.client.read.shortcircuit is overridden to disabled by hbase-default.xml, which causes performance issues for HBase. HDFS short-circuit reads access local HDFS data through a domain socket (a file) instead of a network socket. This removes the TCP overhead of reading data from HDFS and can meaningfully improve HBase performance (by as much as 30-40%).
Users can restore short-circuit reads by explicitly setting dfs.client.read.shortcircuit in the HBase configuration, using the configuration management tool for their product (e.g. Cloudera Manager or Ambari).

Products affected: 

  • CDP
  • CDH
  • HDP

Releases affected: 

  • CDP 7.x 
  • CDH 6.x
  • HDP 3.x

Impact: 

HBase reads with high data locality will be slower than before. Random-read performance is affected most, because random reads (e.g. Get, Multi-Get) are expected to have low latency. Scan workloads are also affected, but possibly less so, because scan latency is already higher.

Severity: 

  • High

Action required:

Apply the following workaround to enable short-circuit reads.
Cloudera Manager:
HBase -> Configurations -> HBase (Service-wide) -> HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml -> 
dfs.client.read.shortcircuit=true
dfs.domain.socket.path=<Add same value which is configured in hdfs-site.xml>
 
Ambari:
HBase -> CONFIGS -> Advanced -> Custom hbase-site ->
dfs.client.read.shortcircuit=true
dfs.domain.socket.path=<Add same value which is configured in hdfs-site.xml>
 
After making these configuration changes, restart the HBase service.
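
The two settings above can be sanity-checked against hdfs-site.xml with a small script. The following is only a sketch, not a Cloudera tool: the function names read_properties and check_short_circuit are hypothetical, and it assumes both files use the usual Hadoop <configuration>/<property>/<name>/<value> layout.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def check_short_circuit(hbase_site_xml, hdfs_site_xml):
    """Return a list of problems; an empty list means the workaround looks applied.

    Hypothetical helper: it only compares the two properties named in this
    bulletin and does not inspect hbase-default.xml or the running service.
    """
    hbase = read_properties(hbase_site_xml)
    hdfs = read_properties(hdfs_site_xml)
    problems = []
    if hbase.get("dfs.client.read.shortcircuit", "").lower() != "true":
        problems.append("dfs.client.read.shortcircuit is not true in hbase-site.xml")
    hbase_path = hbase.get("dfs.domain.socket.path")
    if not hbase_path:
        problems.append("dfs.domain.socket.path is missing from hbase-site.xml")
    elif hbase_path != hdfs.get("dfs.domain.socket.path"):
        problems.append("dfs.domain.socket.path differs from hdfs-site.xml")
    return problems
```

To use it, pass the contents of the deployed hbase-site.xml and hdfs-site.xml (their locations depend on the installation) and review any messages returned.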
 
Cloudera will continue to pursue product changes which may alleviate the need to make these configuration changes. 
For CDP 7.1.1.0 and newer, the metric shortCircuitBytesRead can be viewed for each RegionServer under the RegionServer/Server JMX metrics endpoint. When short-circuit reads are not enabled, this metric will be zero. When short-circuit reads are enabled and the data locality for this RegionServer is greater than zero, the metric should be greater than zero.
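
The metric check described above could be scripted roughly as follows. This is a sketch under assumptions: it supposes the RegionServer exposes its JMX metrics as JSON with a top-level "beans" list, and the URL passed to fetch_jmx (host, port, and path) is deployment-specific and not given in this bulletin. All function names are hypothetical.

```python
import json
import urllib.request


def fetch_jmx(url):
    """Download the JMX JSON text from a RegionServer (URL supplied by the caller)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def short_circuit_bytes_read(jmx_json_text):
    """Return shortCircuitBytesRead from a JMX JSON dump, or None if absent.

    Assumes the JSON has a top-level "beans" list; the first bean that
    carries the metric is used.
    """
    data = json.loads(jmx_json_text)
    for bean in data.get("beans", []):
        if "shortCircuitBytesRead" in bean:
            return bean["shortCircuitBytesRead"]
    return None


def interpret(value):
    """Map the metric value to the cases described in the bulletin."""
    if value is None:
        return "metric not found"
    if value > 0:
        return "short-circuit reads are in use"
    return "zero: short-circuit reads are disabled, or no local data has been read"
```

A zero reading alone does not prove the setting is disabled, since a RegionServer with no local data would also report zero.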


