What is a KCOP?
A Kafka Custom Operation Processor (KCOP) lets you decide what gets written to Kafka in response to an event that happens on the source database. Not only does a KCOP allow unparalleled flexibility, it can also improve the performance of writing data to Kafka!
KCOPs are pluggable Java transformation modules. The input to a KCOP is a source event’s values and metadata; its output is the Kafka producer records that IBM Data Replication’s CDC Replication Engine for Kafka writes to the Kafka cluster in response.
A KCOP is called once for every replicated source insert, update, and delete. For each event it returns a list of Kafka producer records, and in doing so it decides how many records to write to Kafka, where those records go (destination topic and partition), and what the format and content of each record’s payload are.
https://www.ibm.com/docs/en/idr/11.4.0?topic=crek-kafka-custom-operation-processor-kcop-cdc-replication-engine-kafka
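To make that contract concrete, here is a minimal sketch of the event-in, records-out shape. The interface and accessor names below are hypothetical stand-ins (the real API is in the documentation linked above); only org.apache.kafka.clients.producer.ProducerRecord is the genuine Kafka class.

```java
// Minimal sketch of the KCOP contract. SourceOperation and its accessors are
// hypothetical stand-ins for the engine-provided operation object.
import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KcopContractSketch {

    /** Hypothetical view of one replicated source operation. */
    public interface SourceOperation {
        String tableName();     // e.g. "SALES.ORDERS"
        String operationType(); // "I", "U", or "D"
        byte[] keyBytes();      // serialized key columns
        byte[] rowImageBytes(); // serialized after-image of the row
    }

    /**
     * Called once per replicated insert/update/delete. The returned list decides
     * how many records are written, which topic/partition they go to, and what
     * the payload bytes are.
     */
    public List<ProducerRecord<byte[], byte[]>> createRecords(SourceOperation op) {
        List<ProducerRecord<byte[], byte[]>> out = new ArrayList<>();
        String topic = op.tableName().toLowerCase().replace('.', '-');
        Integer partition = null; // null leaves partition selection to later stages
        out.add(new ProducerRecord<>(topic, partition, op.keyBytes(), op.rowImageBytes()));
        return out; // return an empty list to filter the event out entirely
    }
}
```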
Videos and Examples…
https://www.ibm.com/support/pages/ibm-data-replication-community-wiki#Kafka
Sounds Great, But How do I Use One?
If you are running the CDC Replication Engine for Kafka, you already are! A KCOP is always used to generate the producer records that IDR then sends to Kafka. CDC includes 6 KCOPs that can be selected for a subscription; by default, a KCOP that sends records with Avro-formatted content is selected. These “integrated KCOPs” are built in and selectable through the GUI.
The integrated KCOPs span multiple data formats (Avro, JSON, CSV) and multiple apply patterns (Kafka compaction-compliant streaming, audit messages, etc.). Each offers a myriad of additional features you can select through properties to further customize the messages: headers, destination topics, metadata fields, alternate serializers, and more.
https://www.ibm.com/docs/en/idr/11.4.0?topic=kafka-enabling-integrated-custom-operation-processors-kcop
What about CRAZY levels of CuStOmiZATion?
While many users get what they need from the integrated KCOPs, some business flows require bespoke custom KCOPs tailored to their exact needs. By providing a custom KCOP, you can control the content of records sent to Kafka down to the byte level.
Examples of leveraging custom KCOPs include:
1) One Source Event To Many Kafka Messages
Multiple Data Formats: Writing messages in two different formats (e.g. Avro and JSON) for each replication event. This helps in migration scenarios: older applications can read the JSON topic while newer ones use the Avro topic containing the same replication events.
Managing Sensitive Information: Often the source data contains sensitive information that isn’t required for aggregate analytics use cases but is required by some downstream applications. For example, a retailer doesn’t need to expose credit card numbers to its analytics applications but does need them for its accounting ones. Messages containing the full source table content are written to a topic with appropriate access control, and the same messages, minus the sensitive columns, are written to a topic that more general users can access (see the sketch after this list).
Flagging and Alerts: Examine event data to programmatically decide whether an extra event should be sent to a special notification topic. Banks use this to write a dedicated message to a topic for high-dollar transactions, notifying customers of account activity in real time.
2) Encryption and Bespoke Transformations
Encryption: Many businesses have security requirements that preclude data leaving their local network with certain column values unencrypted. KCOPs are a lifesaver here: they can call encryption APIs with horizontally scalable parallelism while streaming in real time. The CDC Replication Engine for Kafka sits on the local network and encrypts specific source column values before writing the resulting messages to a cloud-based managed Kafka solution. Some companies have integrated enterprise third-party encryption solutions, while others developed RSA-based encryption in-house.
Heavy Transformation: Kafka doesn’t dictate the format of its message payloads, so a custom KCOP can implement data formats beyond those provided (e.g. Protobuf), which gives you great flexibility.
Integration With Third-Party And Legacy Applications: Integrated KCOPs offer many property-driven options to alter the schema. When a legacy application needs an exact schema, you can write a custom KCOP that specifies the schema and data content down to the byte level. Need a datatype transformation to avoid changing a legacy app… or some extra fields added? KCOPs have you covered.
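As a flavor of what the fan-out patterns above can look like, here is an illustrative sketch. The topic names, column names, and JSON layout are invented for the example; a real KCOP would build its payloads from the replication engine’s row image rather than from method parameters.

```java
// Illustrative fan-out: one source event becomes a full record on a restricted
// topic, a redacted record on a general topic, and (conditionally) an alert.
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OrderFanOutSketch {

    public List<ProducerRecord<byte[], byte[]>> fanOut(
            String orderId, String cardNumber, BigDecimal amount) {

        List<ProducerRecord<byte[], byte[]>> records = new ArrayList<>();
        byte[] key = orderId.getBytes(StandardCharsets.UTF_8);

        // 1) Full row, including the sensitive column, to a restricted topic.
        String full = String.format(
            "{\"orderId\":\"%s\",\"cardNumber\":\"%s\",\"amount\":%s}",
            orderId, cardNumber, amount);
        records.add(new ProducerRecord<>("orders-restricted", null, key,
            full.getBytes(StandardCharsets.UTF_8)));

        // 2) Same event without the sensitive column, to a general-access topic.
        String redacted = String.format(
            "{\"orderId\":\"%s\",\"amount\":%s}", orderId, amount);
        records.add(new ProducerRecord<>("orders-analytics", null, key,
            redacted.getBytes(StandardCharsets.UTF_8)));

        // 3) Optional extra record to an alert topic for high-value transactions.
        if (amount.compareTo(new BigDecimal("10000")) >= 0) {
            records.add(new ProducerRecord<>("orders-alerts", null, key,
                redacted.getBytes(StandardCharsets.UTF_8)));
        }
        return records;
    }
}
```

Because the KCOP simply returns more (or fewer) producer records, the same mechanism covers filtering, duplication to multiple topics, and conditional alerting.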
How Easy Is Making A Custom KCOP?
Creating a custom KCOP requires writing a Java class that implements three methods. In case it has been a while since you last wrote a Java class, we have you covered: the code for every one of our integrated KCOPs is provided, and often all that a bespoke requirement needs is a few lines added to one of the existing ones. Each subscription runs one KCOP, and you provide the name of your class in the Management Console GUI.
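A minimal sketch of that three-method shape is below. The method names and argument types here are illustrative placeholders; the developer documentation linked below and the shipped sample KCOP source show the exact interface to implement.

```java
// Skeleton of a custom KCOP class. The engine's interface, coordinator, and
// operation types are represented by placeholders; see the IBM developer
// documentation and the shipped sample KCOPs for the exact signatures.
import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MyCustomKcop /* implements the engine's KCOP interface */ {

    // 1) Called once when the subscription starts: read user properties,
    //    initialize serializers, open any external resources.
    public void init(Object engineCoordinator) {
        // e.g. load an encryption key or a schema registry client here
    }

    // 2) Called once per replicated insert/update/delete: build and return
    //    the producer records to write to Kafka for this event.
    public List<ProducerRecord<byte[], byte[]>> createProducerRecords(Object sourceOperation) {
        List<ProducerRecord<byte[], byte[]>> records = new ArrayList<>();
        // transform the source operation into zero or more records here
        return records;
    }

    // 3) Called once at shutdown: flush and release anything opened in init().
    public void finish() {
    }
}
```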
If you want to get fancy and incorporate elaborate third-party libraries and the like, KCOPs even offer an optional separate classloader, so you can be sure your KCOP runs against its own dependencies.
https://www.ibm.com/docs/en/idr/11.4.0?topic=kcopkcrek-developing-user-defined-kafka-custom-operation-processor-unix-linux
A video example of a quick custom KCOP using an integrated KCOP as the starting point…
Build a user-defined KCOP
Fancier KCOP creation with a custom class loader for external jars
https://ibm.box.com/shared/static/0tvt9nqsh9m45nz93wr2eho10z1uh1sq.mov
Better Performance By Design:
Transformation Scalability:
KCOPs are horizontally scalable by design. By default, every source replication event is processed by a single thread executing the KCOP, which can achieve many thousands of operations per second. However, some mainframe workloads are particularly heavy, and some custom KCOPs perform very complex transformations. The CDC Replication Engine for Kafka therefore lets you define how many parallel threads execute KCOPs at the same time.
Setting the target_num_image_builder_threads property enables that number of parallel executors of the KCOP module. The engine still preserves the final order in which records are sent, and transformation performance improves significantly in heavy workloads. Just make sure you have the cores to run the number of threads you’ve asked for.
Offload The Kafka Producer:
The Kafka producer is the open-source client library that writes data to the actual cluster. For convenience, it traditionally also transforms the data through its serializers and interceptors. The CDC Replication Engine for Kafka does all of that transformation work upstream and in parallel; what its Kafka producers see is bytes to be sent as-is. What you see is better performance!
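In plain Kafka client terms, a producer that only handles pre-built bytes looks like this. This is an illustration of the idea using the standard Kafka client API, not the engine’s internal code:

```java
// A producer whose serializers are pass-through: all transformation cost
// stays upstream, and the producer just ships bytes.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class ByteProducerSketch {
    public static KafkaProducer<byte[], byte[]> build(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        // Key and value are already serialized by the KCOP; send them untouched.
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        return new KafkaProducer<>(props);
    }
}
```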
Orchestrate Many Producers:
Traditionally, to send in-order data to a partition of a Kafka topic, you can only use one Kafka producer per topic to avoid race conditions. That single producer carries the burden of determining which partition each message should go to and of sending every message in the topic, across all of that topic’s partitions. Meanwhile, what are your other producers doing?
The KCOP partitions data upstream of the producer. This allows the replication engine to balance the workload and have multiple producers writing in an ordered manner to different partitions of the same topic! The engine-orchestrated Kafka producers don’t need to perform partitioning because each one only sees the data meant for the partition it is writing to (sketched below). This can multiply throughput in demanding workloads with uneven distribution across destination topics, and it also helps with batching and compression.
https://www.ibm.com/docs/en/idr/11.4.0?topic=kafka-enabling-partitioning-data-topics-kcops
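Here is a small sketch of what partitioning upstream of the producer means at the record level. The hash scheme is illustrative only; the engine’s actual partitioning options are described in the documentation linked above.

```java
// Sketch of upstream partitioning: the KCOP computes the partition itself, so
// the record arrives with an explicit partition number and the producer does
// no partitioning of its own.
import java.util.Arrays;
import org.apache.kafka.clients.producer.ProducerRecord;

public class UpstreamPartitioningSketch {
    public static ProducerRecord<byte[], byte[]> assign(
            String topic, int numPartitions, byte[] key, byte[] value) {
        // A deterministic key hash keeps every key on a stable partition,
        // which preserves per-key ordering across parallel producers.
        int partition = Math.floorMod(Arrays.hashCode(key), numPartitions);
        return new ProducerRecord<>(topic, partition, key, value);
    }
}
```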
In Conclusion:
The Kafka Custom Operation Processor is a powerful tool that extends the capabilities of the CDC Replication Engine for Kafka. It allows you to control the records that are written to Kafka topics, enabling you to transform, filter, or enrich the data as needed. This makes KCOP an invaluable tool for anyone looking to unlock the performance and flexibility of Kafka.