Do you have a strategy for documenting the Apache Kafka topics used in your organisation?
I think you should. But before I give you a few reasons why, let me tell you a short story.
It's a normal day at work when your boss comes to talk to you.
They have a new project for you. An important project. An urgent project.
Your company's ACME Industries system is generating valuable data every few seconds, but nothing is being done with it. Critical opportunities are being missed.
They want you to create a system that will perform essential computations on every ACME value, and send a notification if the running total for any three-hour window exceeds the threshold.
You start to think about how to tackle the project.
You're a Kafka developer, so this quickly sounds like a stream processing project to you.
You start to piece together the shape of a solution in your mind.
You need a Kafka Connect connector that can get the data out of the ACME system and onto a Kafka topic.
Then, it's a simple Kafka Streams application to perform the stream processing transformation and send the notification.
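In a real project that transformation would be a Kafka Streams application, but the core logic — summing values into tumbling three-hour windows and alerting the first time a window's running total passes the threshold — can be sketched in a few lines of plain Python. The threshold value and the sample events below are invented for illustration:

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(hours=3)
THRESHOLD = 10_000.0  # hypothetical threshold for this sketch

def window_start(ts: datetime) -> datetime:
    """Align a timestamp to the start of its tumbling three-hour window."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return ts - ((ts - epoch) % WINDOW)

def process(events):
    """Sum values per three-hour window and yield one alert per window
    the first time its running total exceeds the threshold."""
    totals = defaultdict(float)
    alerted = set()
    for ts, value in events:
        w = window_start(ts)
        totals[w] += value
        if totals[w] > THRESHOLD and w not in alerted:
            alerted.add(w)
            yield (w, totals[w])

# Invented sample events: (timestamp, ACME value)
events = [
    (datetime(2021, 4, 1, 0, 30, tzinfo=timezone.utc), 6_000.0),
    (datetime(2021, 4, 1, 1, 15, tzinfo=timezone.utc), 5_000.0),  # pushes the 00:00 window over
    (datetime(2021, 4, 1, 4, 0, tzinfo=timezone.utc), 2_000.0),   # new window, no alert yet
]
alerts = list(process(events))
```

A Kafka Streams version would express the same idea declaratively, roughly as `stream.groupByKey().windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofHours(3))).reduce(Double::sum)` with a filter on the threshold before writing alerts to an output topic.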
You start by looking for a Connector.
Sure enough, you find there are a few Connectors that say they work with your company's ACME system, so you pick one and start downloading it.
Two days later, you're still fighting with it.
The first two Connectors you found just don't seem to work at all. You've burned hours trying to configure them to connect to your ACME system properly and deciphering their vague, incomprehensible error messages.
None of the Connectors is documented clearly, so trying to get each one to work with your ACME system is slow and frustrating.
This is an urgent project, and you've wasted two days just fighting with a Connector.
You make a start with the last of the Connectors that you found. And after another half-day of frustrating and confusing trial-and-error, you finally get it working. You've got a topic with a live stream of events from your ACME system!
Over a coffee, you start telling a colleague about the difficulties you've been having for the last few days.
"Why did you do that?" they say. "There's already a topic with a stream of events from the ACME system. You could've just used that!"
Your last few days were bad enough.
Now, it looks like they were a waste of time.
This is urgent. No time to feel sorry for yourself. It's time to move on.
You update your plan. You'll use the existing topic, and write the stream processing app based on it.
You start looking at the messages on the topic from the ACME system.
The messages are associated with a schema, but that just helps you deserialize the bytes on the topic - it's not enough to help you understand what the values mean.
The names of fields in the payload don't match what you were expecting. The field names are short and cryptic, and it's not at all clear how the data relates to what you were expecting from ACME.
You try to find out who is responsible for this topic, or who set up the feed of events from the ACME system into Kafka, but no-one seems to know. You can't find any information about what the values in the messages mean or how you should use them.
This urgent project is still dragging on.
I said at the start that I think you should have a strategy for documenting the Kafka topics used in your organisation.
Hopefully, this has helped illustrate why:
- How can you know what Kafka clusters and topics are available for you to use in your organisation?
- If you find a Kafka cluster and topic that could help with a project, how can you interpret the data that is on it?
- How can you get started on new projects as efficiently and productively as possible, without wasting time unnecessarily?
The new Event Endpoint Management capability in Cloud Pak for Integration can help you get started with this.
We'll write more posts on this soon, but in the meantime you can find more info in the Cloud Pak for Integration Knowledge Center or join our webinar, "Automate your integration - What's new in Cloud Pak for Integration 2021.1.1" on April 15th.