IBM Event Streams and IBM Event Automation


Using annotations to store info about Kafka topics in IBM Event Streams

By Dale Lane posted 27 days ago


In this post, I highlight the benefits of using Kubernetes annotations to store information about Kafka topics, and share a simplified example of how this can even be automated.

Managing Kafka topics as Kubernetes resources brings many benefits. For example, it enables automated creation and management of topics as part of broader CI/CD workflows, gives a way to track the history of changes to topics and avoid configuration drift as part of GitOps processes, and provides a point of control for enforcing policies and standards.

The value of annotations

Another benefit that I’ve been seeing increasing interest in recently is that these resources provide a cheap and simple place to store small amounts of metadata about topics.

For example, you could add annotations to topics that identify the owning application or team.

apiVersion: eventstreams.ibm.com/v1beta2
kind: KafkaTopic
metadata:
  name: some-kafka-topic
  annotations:
    acme.com/topic-owner: 'Joe Bloggs'
    acme.com/topic-team: 'Finance'

Annotations are simple key/value pairs, so you can add anything that might be useful to a Kafka administrator.

You can add links to team documentation.

apiVersion: eventstreams.ibm.com/v1beta2
kind: KafkaTopic
metadata:
  name: some-kafka-topic
  annotations:
    acme.com/documentation: 'https://acme-intranet.com/finance-apps/some-kafka-app'

You can add a link to the best Slack channel to use to ask questions about the topic.

apiVersion: eventstreams.ibm.com/v1beta2
kind: KafkaTopic
metadata:
  name: some-kafka-topic
  annotations:
    acme.com/slack: 'https://acme.enterprise.slack.com/archives/C2QSX23GH'

This can be adopted as a convention or best practice, or enforced through policies if you want to mandate it.
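For example, if you use a policy engine such as Kyverno, a validation policy along these lines could require an owner annotation on every KafkaTopic. This is an illustrative sketch, not part of the project described here; the policy name and annotation key are assumptions.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-topic-owner
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-owner-annotation
      match:
        any:
          - resources:
              kinds:
                - KafkaTopic
      validate:
        message: "Every KafkaTopic must identify its owner with an acme.com/topic-owner annotation"
        pattern:
          metadata:
            annotations:
              # "?*" means the annotation must be present and non-empty
              acme.com/topic-owner: "?*"
```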

The annotations don’t impact the Topic Operator, which ignores them for functional Kafka purposes, but they are a convenient way to store operational info without needing to bring in additional dependencies or tooling.

I’ve talked with several teams who have started adopting this, and are finding this cheap and simple practice to be very useful. Once you have multiple teams and applications using hundreds of topics on a shared Kafka cluster, these kinds of reminders and pointers become hugely valuable.

Moving from manual to automated annotations

The next step is to automate the maintenance of the most useful annotations, saving time and ensuring that they hold up-to-date information about the topics.

In a recent proof-of-concept, I helped a team use this approach to track the usage of topics in a large shared Kafka cluster. The challenge posed by the cluster administrator was that they had thousands of topics on their Kafka cluster, many of which they suspected were no longer used or needed, but which they couldn’t be confident were safe to delete.

Their first approach had been to annotate their KafkaTopic Kubernetes resources with contact details for topic owners, enabling automated periodic “Do you still need this?” emails to those owners. They were looking for a way to streamline this process.

An example automated Kafka topic annotator

What we explored was an automated “topics usage monitor” that would use Kafka metrics collected in Prometheus to determine if a topic had been used recently, and store that information in the KafkaTopic Kubernetes resource as an annotation.

This means they can now query the annotations for any of their topics and see when it was last used:

% check-usage.sh bikesharing-location
bikesharing-location last used at 2025-06-01 20:17:22
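The check-usage.sh script itself isn’t shown here, but a minimal sketch might look like the following, assuming the monitor records its timestamp under a hypothetical acme.com/last-used annotation key (the real tool’s key may differ):

```shell
#!/bin/sh
# Hypothetical sketch of check-usage.sh; the annotation key
# acme.com/last-used is an assumption, not necessarily the tool's key.

format_usage() {
  # $1 = topic name; reads the KafkaTopic resource as JSON on stdin
  jq -r --arg t "$1" \
    '"\($t) last used at \(.metadata.annotations["acme.com/last-used"] // "unknown")"'
}

topic="${1:-}"
if [ -n "$topic" ]; then
  kubectl get kafkatopic "$topic" -o json | format_usage "$topic"
fi
```

Fetching the resource as JSON and extracting with jq sidesteps the awkward escaping that annotation keys containing dots and slashes need in kubectl’s jsonpath output format.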

This is a very simple tool. I’ve shared it on Github at:
github.com/dalelane/kafka-topics-usage-monitor

The version I’ve shared is my simplified and anonymised first proof-of-concept, but there is enough there to get you started if you want to do something similar.

For each topic, the Topics Usage Monitor:

  1. Read from Kubernetes
    Retrieves the topic annotations with the usage info that was stored the last time the Monitor checked
  2. Read from Prometheus
    Queries Prometheus for the latest usage info
  3. Write changes to Kubernetes
    Updates the annotations – only if the usage info has changed since the values retrieved in (1)
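The heart of step (3) is a change check: only patch the KafkaTopic when the usage info differs from what was stored on the previous poll. A minimal sketch of that decision in Python, with assumed annotation keys (acme.com/bytes-in-total and acme.com/last-used are illustrative, not the tool’s actual keys):

```python
# Sketch of the monitor's change check (step 3). The annotation keys used
# here are illustrative assumptions, not the keys the real tool uses.
BYTES_KEY = "acme.com/bytes-in-total"
LAST_USED_KEY = "acme.com/last-used"

def build_patch(annotations, current_bytes, now):
    """Return a KafkaTopic annotation patch if the usage info changed, else None.

    annotations   -- metadata.annotations read from the KafkaTopic (step 1)
    current_bytes -- cumulative byte count reported by Prometheus (step 2)
    now           -- timestamp string to record alongside the new value
    """
    if annotations.get(BYTES_KEY) == str(current_bytes):
        # The counter hasn't moved since the last poll: the topic has not
        # been used, so leave the existing last-used timestamp alone.
        return None
    return {"metadata": {"annotations": {
        BYTES_KEY: str(current_bytes),
        LAST_USED_KEY: now,
    }}}
```

Returning None for the unchanged case is what keeps the Topic Operator from being asked to reconcile topics that have no new information.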

I included a timestamp for the Prometheus data so it is easy to know when the usage info last changed.

I won’t detail all of the discussions that went into this, but a couple of interesting aspects worth calling out:

Which metrics to use to identify if a topic is still being used?

To track usage from Kafka producers, I used:
kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic=([-.\w]+)
To track usage from Kafka consumers, I used:
kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic=([-.\w]+)
(See the Kafka metrics documentation for more info.)

(These were JMX values that were already being captured for their existing cluster monitoring, but I’ve included an example of how to capture these specific metrics if you aren’t already.)
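Whatever names your JMX exporter rules give these metrics in Prometheus, an instant query against its /api/v1/query endpoint returns a vector of samples labelled by topic. A small sketch of turning that standard response shape into a per-topic lookup:

```python
def topic_rates(prom_response):
    """Map topic name -> metric value from a Prometheus instant-query response.

    prom_response is the parsed JSON body returned by /api/v1/query; each
    sample has the shape {"metric": {...labels...}, "value": [timestamp, "value"]}.
    """
    rates = {}
    for sample in prom_response["data"]["result"]:
        topic = sample["metric"].get("topic")
        if topic is not None:
            # Prometheus returns sample values as strings
            rates[topic] = float(sample["value"][1])
    return rates
```

Topics missing from the result (or present with a zero value) are the candidates for the “no recent traffic” list.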

I don’t suggest that these metrics alone are definitive proof that a topic is or isn’t still useful. A topic with a long retention period might have valuable data that is only consumed infrequently and periodically, and this might not be well reflected by these two metrics alone.

The aim for this project was to reduce the number of “Is this topic still useful?” questions the administrators were asking application teams. Focusing these questions on owners of topics that have had no consumer or producer traffic for an extended period felt like a good place to start.

How often should the Monitor poll?

Every time the Topics Usage Monitor updates a KafkaTopic operand with a new annotation, the new annotation triggers the Topic Operator to reconcile the updated resource (checking for any changes that need to be submitted to Kafka). The worst case would be if every topic has updated metrics data since the last poll, in which case this will cause the Topic Operator to reconcile every KafkaTopic operand.

For context, by default the Topic Operator does a periodic reconciliation every 2 minutes. Even if the Topics Usage Monitor polls every 2 minutes it would (at worst) be doubling the amount of work for the Topic Operator.[1]

For the Topics Usage Monitor, I went with a poll interval of 1 hour. This means that the annotations will not keep an up-to-the-minute record of the last time a topic was used – in the worst case, the last-used timestamp will be up to an hour out of date.

This was partly an ultra-conservative choice made out of an abundance of caution. However, as the aim for this project was to identify abandoned and orphaned topics, there was really no need for any greater accuracy than an hour.

Summary

There are many benefits to automatically storing additional metadata about your Kafka topics in KafkaTopic annotations. It could be because that metadata is ephemeral and would be lost if not captured, or because it is more convenient to keep it all in one place, alongside the definitions of the topics themselves.

The Topics Usage Monitor is a nice illustration of both of these aspects. An instantaneous view of Prometheus metrics data doesn’t clearly identify whether a topic has been used in the last couple of months; comparing the current value with previous values, however, enables insights from change-tracking. And doing this using KafkaTopic annotations avoids needing to bring in additional complex systems, such as Elasticsearch, to do it.

If you swap Prometheus for any other source of metadata about how a Kafka topic is being used, the same pattern can still apply:

  • retrieve the most recent metadata from the KafkaTopic
  • get the current metadata from the external source
  • update the KafkaTopic if the metadata has changed
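The steps above can be sketched as a single generic function, with the Kubernetes and metadata-source calls injected as plain callables (the names here are illustrative, not from the shared tool):

```python
# Generic sketch of the three-step pattern; the callables stand in for the
# Kubernetes client and whatever external metadata source you use.
def sync_annotation(read_topic, fetch_metadata, patch_topic, key):
    """Update one annotation on a KafkaTopic only when its value has changed.

    read_topic()     -- returns the topic's current annotations (step 1)
    fetch_metadata() -- returns the latest value from the external source (step 2)
    patch_topic(d)   -- applies an annotation patch to the KafkaTopic (step 3)
    """
    annotations = read_topic()
    current = fetch_metadata()
    if annotations.get(key) == current:
        return False          # nothing changed: no patch, no reconcile
    patch_topic({key: current})
    return True
```

Keeping the comparison in one place makes it easy to be sure the pattern never patches (and so never triggers a Topic Operator reconcile) unless the metadata genuinely changed.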


[1] – For this specific project, they had customised their Topic Operator config to increase this interval because of the very large number of topics they have. The two-minute default interval is useful context nevertheless.
