Cloud Pak for Integration


Why should you document your Kafka topics?

By Dale Lane posted Fri March 26, 2021 09:12 AM


Do you have a strategy for documenting the Apache Kafka topics used in your organisation?

I think you should. But before I give you a few reasons why, let me tell you a short story.

"I have a new project for you..."

It's a normal day at work when your boss comes to talk to you.

They have a new project for you. An important project. An urgent project.

Your company's ACME Industries system is generating valuable data every few seconds, but nothing is being done with it. Critical opportunities are being missed.

They want you to create a system that will perform essential computations on every ACME value and send a notification if the running total for any three-hour window exceeds the threshold.
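Stripped of the Kafka machinery, the computation being asked for is small. Here is a minimal sketch in plain Python (the threshold value and the event format are hypothetical, purely for illustration):

```python
from collections import defaultdict

WINDOW_SECONDS = 3 * 60 * 60   # three-hour tumbling windows
THRESHOLD = 1000.0             # hypothetical alert threshold

def process(events, notify):
    """events: iterable of (timestamp_seconds, value) pairs.

    Keeps a running total per three-hour window and calls
    notify(window_start, total) the first time a window's
    total exceeds THRESHOLD. Returns the per-window totals."""
    totals = defaultdict(float)
    alerted = set()
    for ts, value in events:
        # Align the event to the start of its three-hour window
        window = ts - (ts % WINDOW_SECONDS)
        totals[window] += value
        if totals[window] > THRESHOLD and window not in alerted:
            alerted.add(window)
            notify(window, totals[window])
    return totals
```

In a real deployment this windowed aggregation would be handled by a stream-processing framework rather than hand-rolled, but the sketch shows how little business logic the urgent project actually contains.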

Thinking about using Kafka

You start to think about how to tackle the project.

You're a Kafka developer, so this quickly sounds like a stream processing project to you.

[Image: thinking about a possible architecture]

You start to piece together the shape of a solution in your mind.

You need a Kafka Connect connector that can get the data out of the ACME system and onto a Kafka topic.

Then, it's a simple Kafka Streams application to perform the stream processing transformation and send the notification.


[Image: getting started with Kafka Connect]

You start by looking for a Connector.

Sure enough, you find there are a few Connectors that say they work with your company's ACME system, so you pick one and start downloading it.
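Getting a source connector running typically means POSTing a small JSON configuration to the Kafka Connect REST API. The shape below is standard Connect configuration, but the connector class and the `acme.*` properties are hypothetical, stand-ins for whatever the real connector's documentation (if any) specifies:

```json
{
  "name": "acme-source",
  "config": {
    "connector.class": "com.example.AcmeSourceConnector",
    "tasks.max": "1",
    "acme.endpoint": "https://acme.internal/api",
    "kafka.topic": "acme.events"
  }
}
```

It's exactly these connector-specific properties that are painful to guess at when the connector is poorly documented.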

[Image: struggling to get a Connector working]

Two days later, you're still fighting with it.

The first two Connectors you found just don't seem to work at all. You burned hours trying to configure them to connect to your ACME system properly and deciphering their vague, incomprehensible error messages.

None of the Connectors is documented clearly, so trying to get one to work with your ACME system is slow and frustrating.

This is an urgent project, and you've wasted two days just fighting with a Connector.

[Image: sharing the experience with another developer]

You make a start with the last of the Connectors that you found. And after another half-day of frustrating and confusing trial-and-error, you finally get it working. You've got a topic with a live stream of events from your ACME system!

Over a coffee, you start telling a colleague about the difficulties you've been having for the last few days.

"We already have a topic with events from the Acme system"

"Why did you do that?" they say. "There's already a topic with a stream of events from the ACME system. You could've just used that!"

[Image: depressed developer]

Your last few days were bad enough.

Now, it looks like they were a waste of time.

[Image: re-planning the architecture, to remove the need to set up Kafka Connect]

This is urgent. No time to feel sorry for yourself. It's time to move on.

You update your plan. You'll use the existing topic, and write the stream processing app based on it.

[Image: developing a stream processing app]

You start looking at the messages on the topic from the ACME system.

They're confusing.

The messages are associated with a schema, but that just helps you deserialize the bytes on the topic - it's not enough to help you understand what the values mean.

The names of fields in the payload don't match what you were expecting. The field names are short and cryptic, and it's not at all clear how the data relates to what you were expecting from ACME.
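For example, a message like the following (entirely hypothetical) might be schema-valid and deserialize cleanly, while the schema still tells you nothing about what the fields actually mean:

```json
{ "ts": 1616745600, "u": "A3", "v1": 18.4, "v2": 0.07, "st": 2 }
```

Is `v1` the value your running total should be computed from? What are the units? What do the `st` status codes mean? The schema can't answer any of that.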

[Image: even more depressed developer]

You try to find out who is responsible for this topic, or who set up the feed of events from the ACME system into Kafka, but no-one seems to know. You can't find any information about what the values in the messages mean or how you should use them.

This urgent project is still dragging on.

[Image: questions that can be addressed through documenting Kafka topics]

I said at the start that I think you should have a strategy for documenting the Kafka topics used in your organisation.

Hopefully, this has helped illustrate why:

  • How can you know what Kafka clusters and topics are available for you to use in your organisation?
  • If you find a Kafka cluster and topic that could help with a project, how can you interpret the data that is on it?
  • How can you get started on new projects as efficiently and productively as possible, without wasting time unnecessarily?

The new Event Endpoint Management capability in Cloud Pak for Integration can help you get started with this.
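Event Endpoint Management builds on the AsyncAPI specification for describing event-driven APIs. As a rough sketch (the topic name, fields, and descriptions here are hypothetical), a documented Kafka topic might look like:

```yaml
asyncapi: '2.0.0'
info:
  title: ACME events
  version: '1.0.0'
  description: Readings from the ACME Industries system, one event per measurement.
channels:
  acme.events:
    subscribe:
      message:
        payload:
          type: object
          properties:
            ts:
              type: integer
              description: Measurement time, in seconds since the Unix epoch.
            v1:
              type: number
              description: The ACME reading to use for running-total calculations.
```

Documentation like this would have answered every question our developer wasted days on.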

We'll write more posts on this soon, but in the meantime you can find more info in the Cloud Pak for Integration Knowledge Center or join our webinar, "Automate your integration - What's new in Cloud Pak for Integration 2021.1.1" on April 15th.