Technical Service Bulletin 2021-449, repost from Cloudera

Tue January 19, 2021 11:54 AM

Technical Service Bulletin 2021-449

Kudu tablet server might crash in certain workflows where a tablet is dropped right after an ALTER TABLE statement


DDL and DML operations can accumulate in the write-ahead log (WAL) of a Kudu tablet replica during normal operation. When a tablet replica shuts down (for example, right before the replica is removed), information on the first 50 accumulated operations is printed to the tablet server's INFO log file.
A bug was introduced with the fix for KUDU-2690. The code contains a flipped if-condition that causes an invalid pointer to be dereferenced while reporting on a pending ALTER TABLE operation in the tablet replica's WAL. As a result, kudu-tserver processes crash with SIGSEGV (segmentation fault).
The issue occurs only when at least one pending ALTER TABLE operation has accumulated in the tablet replica's WAL at the time the replica is shut down. One example is an ALTER TABLE request (for example, adding a column) immediately followed by a request to drop a tablet (for example, dropping a range partition). Another is shutting down a tablet server while it is still processing an ALTER TABLE request for one of its tablet replicas. Slow file system operations increase the likelihood of the issue occurring.

Component affected:

  • Kudu

Products affected: 

  • CDH

Releases affected: 

  • CDH 6.2.0, 6.2.1
  • CDH 6.3.0, 6.3.1, 6.3.2, 6.3.3
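
For anyone auditing several clusters, the list above can be turned into a quick lookup. This is only a minimal sketch: the function and constant names are made up for illustration, and it assumes the caller already has the CDH version as a plain "x.y.z" string obtained by some other means.

```python
# Releases listed as affected in this bulletin.
AFFECTED_CDH_RELEASES = {
    "6.2.0", "6.2.1",
    "6.3.0", "6.3.1", "6.3.2", "6.3.3",
}


def is_affected_cdh_release(version: str) -> bool:
    """Return True if the CDH version string is listed as affected.

    Hypothetical helper: it only does an exact match against the
    releases named in the bulletin and knows nothing about hotfixes.
    """
    return version.strip() in AFFECTED_CDH_RELEASES
```

For example, `is_affected_cdh_release("6.3.2")` is true, while `is_affected_cdh_release("6.3.4")` is false.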

Users affected: 

  • Kudu clusters with the impacted releases

Impact: 

  • In the worst case, multiple kudu-tserver processes can crash in a Kudu cluster, making data unavailable until the affected tablet servers are restarted.

Severity: 

  • High

Action required:

  • Workaround
    • Avoid dropping range partitions and tablets right after issuing an ALTER TABLE request. Wait for pending ALTER TABLE requests to complete before dropping tablets or shutting down tablet servers.
  • Solution
    • Upgrade to CDH 6.3.4 or CDP
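
The workaround amounts to "poll until the alter has finished, then drop". A rough sketch of that pattern follows. It is not taken from the bulletin: `wait_for_alter_done` and its parameters are hypothetical, and `is_alter_done` stands in for whatever completion check your Kudu client provides, supplied by the caller as a zero-argument callable.

```python
import time
from typing import Callable


def wait_for_alter_done(is_alter_done: Callable[[], bool],
                        timeout_s: float = 60.0,
                        poll_interval_s: float = 0.5) -> bool:
    """Poll until is_alter_done() returns True or the timeout elapses.

    is_alter_done is a caller-supplied probe (an assumption of this
    sketch); it should return True once the ALTER TABLE has completed.
    Returns True on completion, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_alter_done():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

In a script, the drop of the range partition (or the tablet server shutdown) would only go ahead when this returns True; on False it seems safer to stop and investigate than to proceed.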


