IBM Event Streams and IBM Event Automation

Kafka’s transactional.id – your first guess is probably wrong. It’s actually all about zombies

By Kim Clark posted Wed June 19, 2024 11:07 AM

Kafka’s “transactional.id” has a very subtle name. It sounds like it’s an ID created to reference a specific transaction. However, if it were, it would be called “transaction.id” – notice the difference in the spelling? Yeah, way too subtle. However, what a transactional.id really does for you turns out to be extremely useful, so it’s worth the extra brain cycles required to understand it properly.

What is the transactional.id?

The transactional.id doesn’t relate to a particular transaction. It is used to identify transactions from a specific producer. A producer that performs transactions is known in the Kafka documentation as a “transactional producer”, and this is what the transactional.id relates to. A producer could run for days, or months, performing millions of transactions, and it will use the same transactional.id throughout.

The transactional.id was introduced as part of a move to improve transactionality in connectors. It enables transaction recovery across multiple sessions of a single producer instance. The transactional.id helps avoid “zombie producers”: producers that were thought to have failed, or were deliberately killed, but are in fact still running.

Producers without transactional.ids creating duplicates

Zombie producers would result in duplicate messages as they would be publishing the same events that any new replacement producer would be creating. This is a particular concern in Kubernetes environments, where the platform may make the decision to kill a producer if it thinks it is failing, but a replacement instance may be created before the old one is fully removed.

A way to avoid these potential duplicates is to use a transactional producer for the creation of all events. If a replacement producer is created, it should then use the same transactional.id as the original.

Producers with transactional.ids avoiding duplicates

Kafka transactions enforce that only one producer is allowed to use a given transactional.id at a time. As a result, when Kafka sees the replacement producer with the same transactional.id, it will “fence out” all interactions from the old one, disarming this zombie producer. Put another way, any transactions that the old (now zombie) producer had started will be rolled back, and the new producer will cleanly take over.
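To make the fencing idea concrete, here is a toy model in Python. It is not Kafka code: ToyCoordinator, ProducerFenced and the record names are invented for illustration. It assumes the broker keeps an epoch per transactional.id, bumps it whenever a producer registers with that id, and rejects anything carrying an older epoch, which is roughly how Kafka fences zombies.

```python
class ProducerFenced(Exception):
    """Raised when a request carries an epoch older than the current one."""


class ToyCoordinator:
    """Toy stand-in for the broker-side bookkeeping; not a Kafka API."""

    def __init__(self):
        self.epochs = {}  # transactional.id -> current epoch
        self.log = []     # records that were accepted

    def init_transactions(self, transactional_id):
        # Each registration bumps the epoch, which fences older sessions.
        epoch = self.epochs.get(transactional_id, -1) + 1
        self.epochs[transactional_id] = epoch
        return epoch

    def send(self, transactional_id, epoch, record):
        if epoch < self.epochs[transactional_id]:
            raise ProducerFenced(f"{transactional_id} epoch {epoch} is stale")
        self.log.append(record)


coordinator = ToyCoordinator()

# The original producer registers and writes an event.
old_epoch = coordinator.init_transactions("txnlid-app1-1")
coordinator.send("txnlid-app1-1", old_epoch, "order-1")

# A replacement starts with the same transactional.id and takes over.
new_epoch = coordinator.init_transactions("txnlid-app1-1")
coordinator.send("txnlid-app1-1", new_epoch, "order-2")

# The old producer is still alive (a zombie) and tries to write again.
try:
    coordinator.send("txnlid-app1-1", old_epoch, "order-2")
    zombie_fenced = False
except ProducerFenced:
    zombie_fenced = True
```

In this sketch the zombie's attempt is rejected, so “order-2” appears in the log only once. A real client would see an error from the broker at this point and should stop producing.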

Setting the transactional.id

The transactional.id is not assigned by Kafka. It is instead a property that is set when a producer is created. It is up to the application design to ensure that this value is unique compared to other producers, and to ensure the same id is used on restarts.

If the transactional.id is blank, the producer cannot perform transactions at all. If provided, the transactional.id is used to establish which transactions belong to that producer.

A common use for transactional.id is to keep partitioned applications separate. A partitioned application is where multiple copies (instances) of an application are created to process data in parallel, generally to improve performance. Each application instance works with its own “shard” of the data. The transactional.id must be unique for each producer instance to enable them to safely produce events in parallel. As noted, it must also remain the same across restarts of that instance.
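One way to satisfy both rules (unique per instance, stable across restarts) is to derive the transactional.id purely from values that don't change when an instance restarts. The sketch below is a minimal illustration: producer_config is a hypothetical helper, the broker address is a placeholder, and the resulting dictionary is assumed to be handed to whichever Kafka client library you use.

```python
def producer_config(app_name, shard, bootstrap_servers="localhost:9092"):
    """Build a config dict for one instance of a partitioned application."""
    return {
        "bootstrap.servers": bootstrap_servers,  # placeholder address
        # Derived only from stable inputs, so a restarted instance
        # ends up with the same transactional.id as before.
        "transactional.id": f"{app_name}-{shard}",
    }


# Three instances of the same application, one per shard.
configs = [producer_config("app1", shard) for shard in range(3)]
ids = [config["transactional.id"] for config in configs]
```

Avoid building the id from anything random or time-based (a UUID, a start timestamp), since a restarted instance would then look like a brand new producer and the old one would never be fenced.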

The transactional.id and idempotence

It’s worth being aware that if the transactional.id is set, idempotence is automatically enabled as well, with a number of other implications. This is nicely described in the documentation, but not easy to find, so we’ll repeat it here for convenience.

“If the transactional.id is set, idempotence is automatically enabled along with the producer configs which idempotence depends on. Further, topics which are included in transactions should be configured for durability. In particular, the replication.factor should be at least 3, and the min.insync.replicas for these topics should be set to 2. Finally, in order for transactional guarantees to be realized from end-to-end, the consumers must be configured to read only committed messages as well.”
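The quoted guidance can be turned into a small checklist. In the sketch below, “min.insync.replicas” and “isolation.level” are real Kafka setting names, while check_transactional_setup and the “replication_factor” key are hypothetical and exist only for this illustration. Note that consumers do not read only committed messages by default, so this is a setting you need to change yourself.

```python
def check_transactional_setup(topic, consumer):
    """Return a list of deviations from the quoted guidance (empty if none)."""
    problems = []
    if topic["replication_factor"] < 3:
        problems.append("replication factor should be at least 3")
    if topic["min.insync.replicas"] != 2:
        problems.append("min.insync.replicas should be set to 2")
    if consumer.get("isolation.level") != "read_committed":
        problems.append("consumers should read only committed messages")
    return problems


# Settings for a topic that takes part in transactions.
topic_settings = {"replication_factor": 3, "min.insync.replicas": 2}

# Consumer that only sees events from committed transactions.
consumer_settings = {"isolation.level": "read_committed"}

ok = check_transactional_setup(topic_settings, consumer_settings)

# A consumer left on its defaults has no isolation.level set here.
not_ok = check_transactional_setup(topic_settings, {})
```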

Transactions and authorization

Since transactional.id is provided by the producer, there is a risk that you could choose a transactional.id owned by a different producer by accident, or indeed maliciously. For this reason, it is possible to specify what transactional.id values a producer is allowed to use.

On a Kafka level, this is done via an access control list (ACL). Any application/user that connects to Kafka has a set of authorization rules specified in an ACL. These note what topics the application can work with, whether they can write events to the topic, or just read them, and whether they can join particular consumer groups that access that topic.

Additionally, there is an ACL rule that controls use of transactional.id. If no rule is in place the application won’t be able to perform transactions on Kafka at all. Adding the rule allows you to specify what values for transactional.id you can use.

In base Kafka, ACLs are set up by Kafka administrators via the command line interface.

If you are using IBM Event Streams, they can also be set via the Event Streams command line, by creating a KafkaUser resource with the Kubernetes API, or via a user interface. More details can be found in the documentation.

As an example, when creating a Kafka user in the IBM Event Streams UI you will be presented with the following options:

• No transactional IDs: Application cannot perform Kafka transactions
• All transactional IDs: Application can perform Kafka transactions using any transactional.id
• A specific transactional ID: Application can only work with Kafka transactions associated with a specific ID.
• Transactional IDs with prefix: Application can only work with transactions associated with IDs that begin with a prefix.

Under the covers, the UI then creates the relevant ACL based on your decision.
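The four options can be thought of as four shapes of rule. The sketch below is only a model of the matching logic, not how Event Streams or Kafka implements it: may_use and the tuple format are made up for illustration, and it assumes that “all” is expressed as a literal rule with the wildcard name “*”.

```python
def may_use(rule, transactional_id):
    """Return True if the rule permits the given transactional.id.

    rule is None (no transactional IDs) or a (pattern_type, name) tuple.
    """
    if rule is None:
        return False
    pattern_type, name = rule
    if pattern_type == "literal":
        # "*" as a literal name is treated as the wildcard for every ID.
        return name == "*" or name == transactional_id
    if pattern_type == "prefixed":
        return transactional_id.startswith(name)
    raise ValueError(f"unknown pattern type: {pattern_type!r}")


no_ids = None                               # No transactional IDs
all_ids = ("literal", "*")                  # All transactional IDs
one_id = ("literal", "txnlid-app1-1")       # A specific transactional ID
prefix_ids = ("prefixed", "txnlid-app1-")   # Transactional IDs with prefix

results = [
    may_use(rule, "txnlid-app1-2")
    for rule in (no_ids, all_ids, one_id, prefix_ids)
]
```

Here the id “txnlid-app1-2” is allowed by the “all” and “prefix” rules, but not by the “none” rule or by the rule for the specific id “txnlid-app1-1”.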

For simple use cases just choosing All transactional IDs may suffice. However, if you do plan to use transactions on a Kafka cluster shared with many other applications, a common technique balancing both isolation and simplified administration is to choose Transactional IDs with prefix.

As an example of the prefixes approach, you could create a Kafka user for each logical application. You could then set up a prefix-based ACL for transactional.id based on the application name (e.g. a prefix of “txnlid-app1-”). The prefix restriction will ensure that it won’t interfere with any other applications’ transactions. If you need to run multiple instances of the application, for performance reasons for example, you can use the same Kafka user credentials for all of them so long as each instance has a unique value for transactional.id (e.g. “txnlid-app1-1”, “txnlid-app1-2”). You do of course need to make sure that each instance retains the same transactional.id over restarts to avoid those pesky zombies! Kubernetes StatefulSets will of course do this naturally for you via pod names.
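As a rough sketch of the StatefulSet approach: pods in a StatefulSet get stable names of the form name-ordinal (for example “app1-0”, “app1-1”), so the transactional.id can be built directly from the pod name. The helper transactional_id_for_pod and the “txnlid-” prefix are illustrative, and the sketch assumes the pod name is made available to the process, for example through the HOSTNAME environment variable or the Kubernetes Downward API.

```python
import re


def transactional_id_for_pod(pod_name, prefix="txnlid-"):
    """Map a StatefulSet pod name such as 'app1-1' to a transactional.id."""
    # Light sanity check: StatefulSet pod names end in "-<ordinal>".
    if not re.fullmatch(r".+-\d+", pod_name):
        raise ValueError(f"not a StatefulSet-style pod name: {pod_name!r}")
    return prefix + pod_name


# The same pod name always yields the same id, even after a restart.
first_start = transactional_id_for_pod("app1-1")
after_restart = transactional_id_for_pod("app1-1")
other_instance = transactional_id_for_pod("app1-2")
```

Because the pod name survives restarts and is unique within the StatefulSet, the resulting ids stay within the “txnlid-app1-” prefix, differ between instances, and do not change when a pod is replaced.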

Surviving the apocalypse

So, in summary, transactional.id isn't an identifier of an individual transaction; it's an identifier for a specific transactional producer – hence the name. It remains the same across restarts to ensure continuity across failures of a producer.

Specifically, transactional.id helps to avoid the creation of duplicates from old producers (zombies) that may not have closed down correctly by the time a replacement has started up. It ensures that only the new instance of the producer can create events.

Remember, you can’t negotiate with a zombie. Your best defence…is a transactional.id.

Permalink

https://community.ibm.com/community/user/blogs/kim-clark1/2024/06/19/kafka-transactionalid
