Cloudera Technical Service Bulletin 2021-449

    Posted Tue January 19, 2021 11:58 AM
    Edited by Lynn Chou Tue January 19, 2021 12:45 PM

    Kudu tablet server might crash in certain workflows where a tablet is dropped right after an ALTER TABLE statement


    DDL and DML operations can accumulate in a Kudu tablet replica's write-ahead log (WAL) during normal operation. When a tablet replica shuts down (for example, right before the replica is removed), information on the first 50 accumulated operations is printed to the tablet server's INFO log file.
    A bug was introduced with the fix for KUDU-2690. The code contains a flipped if-condition that dereferences an invalid pointer while reporting on a pending ALTER TABLE operation in the tablet replica's WAL. The issue manifests as kudu-tserver processes crashing with SIGSEGV (segmentation fault).
    The issue occurs only when at least one pending ALTER TABLE operation is present in the tablet replica's WAL at the time the tablet replica is shut down. One example scenario is an ALTER TABLE request (for example, adding a column) immediately followed by a request to drop a tablet (for example, dropping a range partition). Another is shutting down a tablet server while it is still processing an ALTER TABLE request for one of its tablet replicas. Slow file system operations increase the chance that the issue occurs.
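    The effect of a flipped if-condition can be illustrated with a small Python analogue. This is only a sketch: the bulletin does not show the actual Kudu C++ code, so the names PendingOp, AlterSchemaRequest, describe_buggy and describe_fixed are hypothetical, and the idea that the request payload is unavailable at reporting time is an assumption made for illustration. In Python the faulty access raises AttributeError, whereas the C++ equivalent would dereference an invalid pointer and crash the process.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlterSchemaRequest:
    # Hypothetical stand-in for the payload of an ALTER TABLE operation.
    new_column: str


@dataclass
class PendingOp:
    # Hypothetical stand-in for one pending operation in a replica's WAL.
    op_type: str
    request: Optional[AlterSchemaRequest] = None


def describe_buggy(op: PendingOp) -> str:
    # Flipped check: the payload is read exactly when it is missing.
    if op.request is None:
        return f"{op.op_type} [request={op.request.new_column}]"
    return f"{op.op_type} [request=(none)]"


def describe_fixed(op: PendingOp) -> str:
    # Correct check: the payload is read only when it is present.
    if op.request is not None:
        return f"{op.op_type} [request={op.request.new_column}]"
    return f"{op.op_type} [request=(none)]"
```

    With a pending operation whose payload is absent, describe_fixed returns a placeholder string, while describe_buggy fails on the attribute access.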

    Component affected:

    • Kudu

    Products affected: 

    • CDH

    Releases affected: 

    • CDH 6.2.0, 6.2.1
    • CDH 6.3.0, 6.3.1, 6.3.2, 6.3.3
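    The list above can be turned into a quick check for an inventory script. The following is a minimal sketch: the function name is_affected_cdh_release is hypothetical, and it assumes the version string starts with a major.minor.patch triple (any suffix after the triple is ignored).

```python
import re

# Releases named as affected in this bulletin.
AFFECTED_CDH_RELEASES = frozenset(
    {"6.2.0", "6.2.1", "6.3.0", "6.3.1", "6.3.2", "6.3.3"}
)


def is_affected_cdh_release(version: str) -> bool:
    """Return True if the leading major.minor.patch triple is an affected release."""
    match = re.match(r"\s*(\d+\.\d+\.\d+)", version)
    if match is None:
        raise ValueError(f"cannot parse a major.minor.patch version from {version!r}")
    return match.group(1) in AFFECTED_CDH_RELEASES
```

    For example, is_affected_cdh_release("6.3.2") returns True and is_affected_cdh_release("6.3.4") returns False.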

    Users affected: 

    • Kudu clusters with the impacted releases

    Impact: 

    • In the worst case, multiple kudu-tserver processes can crash in a Kudu cluster, making data unavailable until the affected tablet servers are restarted.

    Severity: 

    • High

    Action required:

    • Workaround
      • Avoid dropping range partitions and tablets right after issuing an ALTER TABLE request. Wait for pending ALTER TABLE requests to complete before dropping tablets or shutting down tablet servers.
    • Solution
      • Upgrade to CDH 6.3.4 or CDP
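    For clusters that cannot upgrade right away, the workaround can be scripted as a wait step between the ALTER TABLE request and the drop. The following is a minimal sketch, not a Kudu API: wait_for_alter_to_finish is a hypothetical helper, and is_alter_in_progress is a placeholder for whatever check the client library in use provides for pending ALTER TABLE requests.

```python
import time
from typing import Callable


def wait_for_alter_to_finish(
    is_alter_in_progress: Callable[[], bool],
    timeout_s: float = 60.0,
    poll_interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the pending ALTER TABLE completes or the timeout expires.

    Returns True if the alter finished, False if the timeout was reached.
    """
    deadline = clock() + timeout_s
    while is_alter_in_progress():
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
    return True
```

    A caller would drop the range partition or stop the tablet server only when this helper returns True, and treat False as a signal to investigate rather than proceed.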

    https://community.ibm.com/community/user/hybriddatamanagement/viewdocument/technical-service-bulletin-2021-449?CommunityKey=99c4cc7a-4544-406c-b1b2-b74f2fcf3cba&tab=librarydocuments

    ------------------------------
    Lynn Chou
    Offering Manager, Cloudera Partnership
    IBM
    ------------------------------