Technical Service Bulletin 2021-463

    Posted Tue January 26, 2021 01:06 PM

    HBase Performance Issue

    The HDFS short-circuit read setting dfs.client.read.shortcircuit is overridden to disabled by hbase-default.xml, which degrades HBase performance. Short-circuit reads let a client running on the same host as the DataNode read block data directly from local disk, coordinating with the DataNode over a UNIX domain socket (a file) instead of a network socket. This removes the TCP overhead of reading from HDFS and can improve HBase performance meaningfully (by as much as 30-40%).
    Users can restore short-circuit reads by explicitly setting dfs.client.read.shortcircuit in the HBase configuration via the configuration management tool for their product (e.g. Cloudera Manager or Ambari).

    Products affected: 

    • CDP
    • CDH
    • HDP

    Releases affected: 

    • CDP 7.x 
    • CDH 6.x
    • HDP 3.x

    Impact: 

    HBase reads with high data locality will run more slowly than before. Random reads (e.g. Get, Multi-Get) are hit hardest because they are expected to have low latency. Scan workloads are also affected, but likely less so because scans already have higher latency.

    Severity: 

    • High

    Action required:

    The following workaround enables short-circuit reads.
    Cloudera Manager:
    HBase -> Configurations -> HBase (Service-wide) -> HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml -> 
    dfs.client.read.shortcircuit=true
    dfs.domain.socket.path=<Add same value which is configured in hdfs-site.xml>
     
    Ambari:
    HBase -> CONFIGS -> Advanced -> Custom hbase-site ->
    dfs.client.read.shortcircuit=true
    dfs.domain.socket.path=<Add same value which is configured in hdfs-site.xml>
     
    After making these configuration changes, restart the HBase service.
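
Before restarting, it can help to confirm that the two properties actually landed in the deployed hbase-site.xml and that the socket path matches hdfs-site.xml. The following is a minimal sketch, not a Cloudera-provided tool: the function names are hypothetical, and the file locations passed to check_files are assumptions that depend on how the cluster is deployed.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse Hadoop-style <configuration> XML text into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def check_short_circuit(hbase_props, hdfs_props):
    """Return a list of problems; an empty list means the workaround looks applied."""
    problems = []
    if hbase_props.get("dfs.client.read.shortcircuit", "").lower() != "true":
        problems.append("dfs.client.read.shortcircuit is not true in hbase-site.xml")
    hbase_path = hbase_props.get("dfs.domain.socket.path")
    hdfs_path = hdfs_props.get("dfs.domain.socket.path")
    if not hbase_path:
        problems.append("dfs.domain.socket.path is missing from hbase-site.xml")
    elif hbase_path != hdfs_path:
        problems.append(
            "dfs.domain.socket.path differs between hbase-site.xml and hdfs-site.xml"
        )
    return problems


def check_files(hbase_site, hdfs_site):
    """Convenience wrapper: read both files (paths are deployment-specific)."""
    with open(hbase_site) as f:
        hbase_props = load_properties(f.read())
    with open(hdfs_site) as f:
        hdfs_props = load_properties(f.read())
    return check_short_circuit(hbase_props, hdfs_props)
```

Calling check_files with the paths to the RegionServer's hbase-site.xml and hdfs-site.xml returns an empty list when both settings are in place, or a list of messages describing what is still missing.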
     
    Cloudera will continue to pursue product changes that may remove the need for these configuration changes.
    For CDP 7.1.1.0 and newer, the metric shortCircuitBytesRead can be viewed for each RegionServer under the RegionServer/Server JMX metrics endpoint. When short-circuit reads are not enabled, this metric will be zero. When short-circuit reads are enabled and the data locality for this RegionServer is greater than zero, the metric should be greater than zero.
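
The metric check described above can be scripted. This is a rough sketch under several assumptions not stated in the bulletin: the RegionServer web UI serves JMX as JSON at /jmx over plain HTTP, it listens on port 16030 (the usual default info port), and the Server metrics are published under the bean name shown in SERVER_BEAN. Secured clusters (TLS, SPNEGO) would need a different client.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed bean name for the RegionServer "Server" metrics group.
SERVER_BEAN = "Hadoop:service=HBase,name=RegionServer,sub=Server"


def short_circuit_bytes_read(jmx):
    """Extract shortCircuitBytesRead from a parsed /jmx JSON document.

    Returns None if the bean or the metric is absent.
    """
    for bean in jmx.get("beans", []):
        if bean.get("name") == SERVER_BEAN:
            return bean.get("shortCircuitBytesRead")
    return None


def fetch_jmx(host, port=16030, timeout=10):
    """Fetch the Server bean from one RegionServer (host and port are assumptions)."""
    url = f"http://{host}:{port}/jmx?qry={quote(SERVER_BEAN, safe=':=,')}"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, short_circuit_bytes_read(fetch_jmx("regionserver-1.example.com")) uses a hypothetical hostname. A value that stays at zero on a RegionServer with non-zero data locality suggests short-circuit reads are still disabled.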

    https://community.ibm.com/community/user/hybriddatamanagement/viewdocument/technical-service-bulletin-2021-46?CommunityKey=99c4cc7a-4544-406c-b1b2-b74f2fcf3cba&tab=librarydocuments

    ------------------------------
    Lynn Chou
    Offering Manager, Cloudera Partnership
    IBM
    ------------------------------