The Kafka protocol consists of many types of operations, known as API keys, that define the interactions a client may have with a broker. For example, consuming or producing a message, joining a group, committing an offset, and so on.
Each API key has a message format that is versioned independently of other API keys, and one of the first interactions a client has with a broker is the ApiVersions operation. This is used to query the set of API versions offered by the broker, from which the client selects the ones it supports, allowing the two to communicate compatibly.
In each Kafka release, the Java client libraries provided by the Kafka project are kept up-to-date with the latest API versions defined in that release. However, there are quite a few third-party client libraries, over which the Kafka team have no control, and development on some of these has slowed or even stopped completely. Therefore, the API versions that they support often lag behind the latest provided by the brokers they are connecting to. Maintaining backward compatibility with old and outdated versions to serve these third-party clients has become a significant burden for the Kafka project developers. So, in Kafka 4.0, several of the API keys have had their support for the oldest versions pruned. See the table below for information about the pruned versions for each API key.
The Kafka team were careful to avoid impacting the most widely used third-party libraries where possible, but with Kafka 4.0, there is a possibility that an application using an older version of one of these libraries might not be able to find API versions that are compatible with the newer brokers. What is needed is a way for a broker, or an Event Gateway acting in its place, to identify clients that use older and potentially incompatible versions before it is upgraded to 4.0. This is the purpose of the new client_api_versions_gauge metric introduced in version 11.6.3 of IBM Event Endpoint Management.
For each call from a client, the client_api_versions_gauge metric records the API version used for the call, and includes the API key and client id as attributes. We can use a tool such as Prometheus to query the recorded versions and filter by API key, client id, or both, to find which clients might become incompatible after an upgrade to Kafka 4.0. The following table shows an example with raw data from the metric:
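As a sketch, the raw data can be retrieved with a query like the following (the eemgateway_ prefix matches the example discussed below; the exact metric and label names your collector exports may differ):

```promql
# Return every series of the metric; the sample value of each
# series is the API version that client used for that API key.
eemgateway_client_api_versions_gauge
```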

Each row in the table is an instance of the metric, showing its attributes, including the API key and client id. Ignore the others, exported_job, instance and job, as they are injected by the reporting component. You can also see that in our case the metrics collector has prepended the component name eemgateway_ to the metric name. We will need to include this when building our queries.
The numbers on the right are the value of the metric for each combination of attributes. For example, in the second row, we can see client producer-1 was using version 4 of the ApiVersions API key and, at the bottom, version 12 of the Metadata API key.
We can display a simplified version of the table by adjusting the query:
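One way to produce such a simplified view is to aggregate away the injected labels. This is a sketch, assuming the API key and client id are exported as apiKey and clientId labels:

```promql
# Keep only the apiKey and clientId labels; max is used purely to
# collapse duplicate series that differ only in injected labels.
max by (apiKey, clientId) (eemgateway_client_api_versions_gauge)
```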

This only displays the API key and client id, making things a little easier to inspect. The metric becomes more useful when we start to filter the data. For example, let’s look at what versions are being used by a particular client id:

Here, by adding the filter {clientId="producer-1"} to the query, we can see the versions for each API key used by an application with that client id. If we have clients whose ids match a pattern, for example, because they belong to the same application or group of applications, we can use a regular expression to match the ids as follows:
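A regular-expression filter of this kind might look as follows (the producer- id prefix is purely illustrative):

```promql
# Match every client id beginning with "producer-":
eemgateway_client_api_versions_gauge{clientId=~"producer-.*"}
```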

Note the use of =~ in the filter expression in the query. This defines a match using regular expressions. Full details of the Prometheus query language are outside the scope of this article, but are available in the Prometheus documentation at https://prometheus.io/docs/prometheus/latest/querying/basics/
Now we come to the goal of this article: finding clients that might be using old API versions. First, let us find what versions of a particular API key the clients are using:
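For example, a query for the versions of the Fetch API key in use might look like this (assuming the API key is exported in an apiKey label):

```promql
# One result per client, valued at the Fetch version it uses:
eemgateway_client_api_versions_gauge{apiKey="Fetch"}
```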

We can see some are operating at version 17, but there are two still using version 11. Just as an example, we might know there is a problem with clients operating at versions 11 or earlier, so how do we isolate those? The answer is by applying a filter to the API version in the query as follows:
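Such a filter compares the sample value itself, as in this sketch (again assuming an apiKey label):

```promql
# Keep only series whose value (the Fetch version in use) is 11 or lower:
eemgateway_client_api_versions_gauge{apiKey="Fetch"} <= 11
```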

By specifying <= 11 in our query, we are saying we only want results where the version of Fetch is within the problematic range. This is the information we require to pinpoint the applications that have to be upgraded with newer Kafka client libraries. Now we can use the client id to identify the application that needs updating, in this example, quotaControl.
Note that this approach relies on the client id value to identify such applications. Using meaningful client ids is therefore very important for fully understanding Kafka client activity; without them, the metric loses much of its usefulness.
Other uses
We can use the client_api_versions_gauge metric for other purposes too. Because it shows the pattern of operations performed by a client, by observing the number of instances of each API key we can get an idea of which clients are interacting efficiently with the gateway, and which are not. For instance, a badly written consuming client might perform a relatively expensive authentication and group allocation before each fetch of records. A well-written consumer would authenticate and join a group once, then perform all required fetches before finally disconnecting. We can analyse such patterns as follows:
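A sketch of such an analysis, assuming the label names used earlier and illustrative client ids:

```promql
# Count the samples recorded for each API key / client id pair
# over the last 5 minutes:
count_over_time(eemgateway_client_api_versions_gauge{clientId=~".*Consumer"}[5m])
```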

In the previous table, we are using the function count_over_time() to count instances of each API key and client id over a 5-minute period (note the time range specifier [5m]). The client labelled goodConsumer performs the authentication, setup and close operations only once for its 20 fetches. The client labelled badConsumer performs all operations for every fetch, indicating possibly questionable design in the client application software. This shows where we should focus our efforts if we want to improve application efficiency and reduce network load.
Conclusion
Hopefully in this article I have shown you how to use the Event Gateway metrics to help you get a clear picture of what operations your Kafka client applications are performing, and what version levels they are operating at. This will help you keep your applications secure and up-to-date, and avoid unwanted and costly system failures.
Notes:
Summary of Kafka API keys and pruned versions
All ranges are inclusive
Table 1: Pruned Kafka API versions
| API key | Pruned versions |
| --- | --- |
| Produce | V0-V2 |
| Fetch | V0-V3 |
| ListOffset | V0 |
| Metadata | none |
| OffsetCommit | V0-V1 |
| OffsetFetch | V0 |
| FindCoordinator | none |
| JoinGroup | none |
| Heartbeat | none |
| LeaveGroup | none |
| SyncGroup | none |
| DescribeGroups | none |
| ListGroups | none |
| SaslHandshake | none |
| ApiVersions | none |
| CreateTopics | V0-V1 |
| DeleteTopics | V0 |
| DeleteRecords | none |
| InitProducerId | none |
| OffsetForLeaderEpoch | V0-V1 |
| AddPartitionsToTxn | none |
| AddOffsetsToTxn | none |
| EndTxn | none |
| WriteTxnMarkers | none |
| TxnOffsetCommit | none |
| DescribeAcls | V0 |
| CreateAcls | V0 |
| DeleteAcls | V0 |
| DescribeConfigs | V0 |
| AlterConfigs | none |
| AlterReplicaLogDirs | V0 |
| DescribeLogDirs | V0 |
| SaslAuthenticate | none |
| CreatePartitions | none |
| CreateDelegationToken | V0 |
| RenewDelegationToken | V0 |
| ExpireDelegationToken | V0 |
| DescribeDelegationToken | V0 |
| DeleteGroups | none |
Full details: https://cwiki.apache.org/confluence/display/KAFKA/KIP-896%3A+Remove+old+client+protocol+API+versions+in+Kafka+4.0