There is a growing requirement to allow access to internal Kafka clusters from outside the enterprise, whether for selected business partners or for wider public access. IBM Event Endpoint Management allows you to open up access to your Kafka clusters via an Event Gateway. However, allowing external access means that badly written applications, or those that do not follow your software development best practices, can interact with the gateway. IBM Event Endpoint Management v11.3.0 provides new features to help with this.
External access topologies
To allow external access, the Event Gateway needs either to be deployed in the DMZ or to have a TCP proxy such as nginx or IBM DataPower forward traffic on to it. Whilst there are no issues with running the Event Gateway in the DMZ, it is only available in a containerised form factor, which means that it is more likely there is already some form of network ingress in place.
The Event Gateway works at the Kafka protocol level, which allows for finer-grained, more intelligent restrictions based on its understanding of the protocol and of Kafka behaviour.
Whether the Event Gateway is deployed in the DMZ or behind a TCP proxy, the options for securing the gateway remain the same in both scenarios.
Problematic applications
There are two broad categories of problematic application that an internet-facing gateway needs to consider:
- Badly behaved applications : Kafka applications typically hold long-lived connections and process events in batches over them, and Kafka servers have been optimised for this. This requires the application developer and client library to follow this paradigm, but that isn't always the case. One of the worst examples would be an application that connects, sends a single message, disconnects, connects, sends a single message, disconnects and so on. This kind of behaviour could impact the performance of your Kafka cluster.
- Invalid applications : Applications can be misconfigured or enter an invalid state that causes them to use excessive resources in either the gateway or the Kafka broker. This in turn can negatively impact other applications, even to the point of the gateway becoming unavailable or unresponsive. One example is an application that continually tries to log in using invalid or expired credentials, or that attempts to produce to or consume from topics it does not have permission to access.
Configuring restrictions in the gateway
The gateway can apply restrictions to client applications. These are configured using the Custom Resource (CR) (https://ibm.github.io/event-automation/eem/reference/api-reference/) for operator-based deployments, or using environment variables for container-based / standalone deployments.
When any restrictions are applied, the following should be noted
Connection restrictions
Configuring connection restrictions helps control applications that make many short-lived connections.
| CR | Environment Variable | Description | Default Value |
|---|---|---|---|
| spec.security.connection.closeDelayMs | CONNECTION_CLOSE_DELAY_MS | The minimum delay in milliseconds after you close a connection | 8000 |
| spec.security.connection.closeJitterMs | CONNECTION_CLOSE_JITTER_MS | Additional delay in milliseconds after you close a connection | 4000 |
| spec.security.connection.perSubLimit | MAX_CONNECTIONS_PER_SUBSCRIPTION | The maximum allowed TCP connections for each subscription | -1 (off) |
The close delay and jitter settings are intended to protect against applications that frequently open and close connections by introducing a delay before the gateway will close the connection. This delay will cause the application to wait until the close completes before it continues processing or attempting to re-connect. The jitter value is added to the base delay so that any potential re-connections from applications are spread out.
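As a rough illustration of how the two settings combine, the following sketch computes the total delay applied to a closing connection. It assumes the jitter is drawn uniformly between 0 and the configured jitter value; the gateway's actual distribution is not stated here, and the function names are hypothetical.

```python
import random

# Defaults taken from the table above.
CLOSE_DELAY_MS = 8000   # CONNECTION_CLOSE_DELAY_MS
CLOSE_JITTER_MS = 4000  # CONNECTION_CLOSE_JITTER_MS


def close_delay_ms(base_ms=CLOSE_DELAY_MS, jitter_ms=CLOSE_JITTER_MS, rng=random):
    """Total delay before a close completes: base delay plus random jitter.

    Assumption: the jitter is uniform in [0, jitter_ms].
    """
    if base_ms < 0 or jitter_ms < 0:
        raise ValueError("delays must be non-negative")
    return base_ms + rng.uniform(0, jitter_ms)


def max_cycles_per_minute(base_ms=CLOSE_DELAY_MS):
    """Upper bound on connect/send/disconnect cycles per minute for one
    sequential client, ignoring connection setup time and jitter."""
    if base_ms <= 0:
        raise ValueError("base delay must be positive")
    return 60_000 // base_ms


if __name__ == "__main__":
    print(f"close completes after {close_delay_ms():.0f} ms")
    print(f"at most {max_cycles_per_minute()} cycles per minute")
```

With the default values, a client that reconnects for every message is held to a handful of cycles per minute, whereas a client with a long-lived connection is unaffected.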
The maximum number of connections per subscription needs to be used with care, as it depends on how the Kafka client library manages its connections and how many brokers/partitions are serving a particular topic. For example, if a topic has partitions spread across 3 brokers then a client will need at least 3 connections, but the client library may also make additional connections in order to perform various lookups. This setting is gateway wide, so too low a value could impact not only the topics currently being accessed through the gateway but also future ones.
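A back-of-the-envelope check can help when choosing a limit. The sketch below is only an estimate: the `extra_connections` allowance for lookups is a hypothetical figure that varies by client library, and both function names are illustrative.

```python
def estimate_min_connections(brokers_with_partitions, extra_connections=2):
    """Estimate the connections one client needs for a topic.

    brokers_with_partitions: brokers hosting partitions of the topic.
    extra_connections: assumed allowance for additional lookup
    connections; this number is a guess, so measure your own clients.
    """
    if brokers_with_partitions < 1 or extra_connections < 0:
        raise ValueError("invalid broker or extra connection count")
    return brokers_with_partitions + extra_connections


def limit_is_sufficient(limit, brokers_with_partitions, extra_connections=2):
    """Return True if the per-subscription limit leaves room for one client.

    A limit of -1 means the restriction is off, as in the table above.
    """
    if limit == -1:
        return True
    return limit >= estimate_min_connections(brokers_with_partitions, extra_connections)


if __name__ == "__main__":
    print(estimate_min_connections(3))   # 3 brokers plus the assumed allowance
    print(limit_is_sufficient(3, 3))     # a limit equal to the broker count is too tight
```

Because the limit applies across the whole gateway, the estimate should be made for the topic with the widest broker spread, with headroom for topics added later.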
Authentication restrictions
Invalid or misconfigured applications can be locked out after a certain number of failures. This feature can also be used to prevent brute force attempts to derive a valid username and password.
| CR | Environment Variable | Description | Default Value |
|---|---|---|---|
| spec.security.authentication.maxRetries | AUTHN_MAX_RETRIES | The maximum number of failed authentication attempts allowed before the account is locked | -1 (off) |
| spec.security.authentication.retryBackoffMs | AUTHN_BACKOFF_DELAY_INCREMENT_MILLIS | The backoff time in milliseconds between consecutive failed authentication attempts | 0 |
| spec.security.authentication.lockoutPeriod | AUTHN_LOCKOUT_PERIOD_SECONDS | The duration in seconds for which the account is locked after unsuccessful authentication attempts | -1 (off) |
Note that locked account information is not persisted and if the gateway is restarted, then any locked accounts will be unlocked.
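To make the interaction of the three settings concrete, here is a minimal in-memory model. It is not the gateway's implementation: it assumes an account is locked once consecutive failures exceed the retry limit, that the backoff grows linearly by the increment on each failure, and that a successful login resets the count. The class and method names are hypothetical.

```python
import time


class LockoutTracker:
    """Illustrative model of retry limit, backoff and lockout.

    State lives only in memory, so a restart clears every lock,
    mirroring the note above.
    """

    def __init__(self, max_retries, backoff_increment_ms, lockout_period_s,
                 clock=time.monotonic):
        self.max_retries = max_retries                    # -1 means off
        self.backoff_increment_ms = backoff_increment_ms
        self.lockout_period_s = lockout_period_s          # -1 means off
        self._clock = clock
        self._failures = {}      # account -> consecutive failure count
        self._locked_until = {}  # account -> clock time when the lock expires

    def is_locked(self, account):
        until = self._locked_until.get(account)
        if until is None:
            return False
        if self._clock() >= until:
            # Lock expired: forget the lock and the failure history.
            del self._locked_until[account]
            self._failures.pop(account, None)
            return False
        return True

    def record_failure(self, account):
        """Record a failed attempt and return the backoff delay in ms."""
        count = self._failures.get(account, 0) + 1
        self._failures[account] = count
        enabled = self.max_retries >= 0 and self.lockout_period_s > 0
        if enabled and count > self.max_retries:
            self._locked_until[account] = self._clock() + self.lockout_period_s
        # Assumption: backoff grows linearly with consecutive failures.
        return count * self.backoff_increment_ms

    def record_success(self, account):
        self._failures.pop(account, None)
```

Injecting the clock keeps the model testable without waiting for a real lockout period to elapse.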
Resources control
This setting allows a maximum message size to be set on the gateway independently of the one set in the Kafka server. External users can therefore be given a more restrictive maximum message size than internal users of the same Kafka cluster.
| CR | Environment Variable | Description | Default Value |
|---|---|---|---|
| spec.security.request.maxSizeBytes | KAFKA_MAX_MESSAGE_LENGTH | The maximum size allowed for the request payload in bytes | -1 (off) |
Quotas
Quotas (https://ibm.github.io/event-automation/eem/describe/option-controls/#quota-consume) control how many messages or bytes a producer or consumer is allowed to send through the gateway, on a per-option basis. Quotas can be used to prevent excessive production or consumption by clients, which in turn reduces the load on both the gateway and the Kafka cluster.
The ability to control quotas and maximum message sizes builds upon the existing and well understood Kafka mechanisms, whilst providing a greater degree of control and flexibility on how they are applied. IBM Event Endpoint Management applies these restrictions at a more granular level allowing different values to be applied to the same underlying topic, whilst still adhering to the Kafka protocol. This allows developers to write their applications to respond to these restrictions as if they came directly from Kafka without needing any specialised knowledge.
Deployment of internet facing gateways
It is recommended that you have different gateway groups for internal and external Kafka applications. The gateways serving external clients should have appropriate restrictions set, whereas the internal gateways can have fewer or less restrictive policies applied. Additionally, options that are published to the external gateway group can have quotas applied so as to restrict the amount of data that those clients can produce or consume.
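One way to keep the two gateway groups distinct is to maintain a settings profile for each. The sketch below uses the environment variable names from the tables above; the internal profile simply holds the documented defaults, while every value in the external profile is an illustrative assumption rather than a recommendation. How the variables reach the container depends on your deployment, so the helper only renders `KEY=VALUE` lines.

```python
# Internal gateways: the documented default values (restrictions mostly off).
INTERNAL_GATEWAY_ENV = {
    "CONNECTION_CLOSE_DELAY_MS": 8000,
    "CONNECTION_CLOSE_JITTER_MS": 4000,
    "MAX_CONNECTIONS_PER_SUBSCRIPTION": -1,
    "AUTHN_MAX_RETRIES": -1,
    "AUTHN_BACKOFF_DELAY_INCREMENT_MILLIS": 0,
    "AUTHN_LOCKOUT_PERIOD_SECONDS": -1,
    "KAFKA_MAX_MESSAGE_LENGTH": -1,
}

# External gateways: hypothetical stricter values, for illustration only.
EXTERNAL_GATEWAY_ENV = {
    **INTERNAL_GATEWAY_ENV,
    "MAX_CONNECTIONS_PER_SUBSCRIPTION": 20,
    "AUTHN_MAX_RETRIES": 5,
    "AUTHN_BACKOFF_DELAY_INCREMENT_MILLIS": 1000,
    "AUTHN_LOCKOUT_PERIOD_SECONDS": 300,
    "KAFKA_MAX_MESSAGE_LENGTH": 1_048_576,
}


def render_env(env):
    """Render settings as KEY=VALUE lines, one per variable."""
    return "\n".join(f"{key}={value}" for key, value in env.items())


def overridden_settings(profile, baseline):
    """Return the settings in profile that differ from baseline."""
    return {k: v for k, v in profile.items() if baseline.get(k) != v}


if __name__ == "__main__":
    print(render_env(EXTERNAL_GATEWAY_ENV))
    print(sorted(overridden_settings(EXTERNAL_GATEWAY_ENV, INTERNAL_GATEWAY_ENV)))
```

Listing only the overridden settings makes it easy to review exactly how the external group is more restrictive than the internal one.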