AIOps

CP4AIOps - v4.10 Feature Update: Improved Observability for Large Jobs via Bulk Loading and Kafka Tuning

By Krishna Kodali posted 2 days ago

Optimizing Observability for Large Workloads: Reducing Data Loading Time Without Impacting Smaller Jobs

This blog explores techniques for improving observability data loading performance when working with Observers such as the IBM Network Manager Observer, the File Observer (for multiple large text files), Instana, and Dynatrace. It focuses on options that reduce observation time for large-scale workloads while ensuring smaller jobs remain unaffected.

When creating an Observation in CP4AIOps, you'll notice a new "Bulk load job" option under the "Additional Parameters (optional)" section. This option is disabled by default (set to false). To enable it for large data loads, set the parameter to true.

Below is a graphical representation of the behavior in v4.9.x and earlier, where large jobs were handled by a single Topology Service instance, so smaller, regular jobs were delayed behind them:

The following illustrates the updated approach introduced in v4.10: large jobs are now processed by multiple Topology Service instances, which consume from multiple Kafka partitions, for improved scalability and performance.
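
To make the partitioning idea concrete, here is a minimal Python sketch. It is not CP4AIOps code. It assumes that records are keyed per device, that keys map to partitions through a stable hash (CRC32 here, as a stand-in for Kafka's own key hashing), and that partitions are spread round-robin across Topology Service instances. The functions `partition_for`, `assign_partitions`, and `per_instance_load` are hypothetical names used only for illustration.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    # Stable key -> partition mapping (CRC32 stands in for Kafka's key hash).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions: int, num_instances: int) -> dict[int, int]:
    # Hypothetical round-robin assignment: partition p goes to instance p % N.
    return {p: p % num_instances for p in range(num_partitions)}


def per_instance_load(keys: list[str], num_partitions: int, num_instances: int) -> dict[int, int]:
    # Count how many records each (modelled) Topology Service instance processes.
    owner = assign_partitions(num_partitions, num_instances)
    load: dict[int, int] = defaultdict(int)
    for key in keys:
        load[owner[partition_for(key, num_partitions)]] += 1
    return dict(load)


if __name__ == "__main__":
    devices = [f"device-{i}" for i in range(15_000)]
    single = per_instance_load(devices, num_partitions=1, num_instances=3)
    spread = per_instance_load(devices, num_partitions=6, num_instances=3)
    print("one partition:  ", single)  # a single instance receives every record
    print("six partitions: ", spread)  # records are shared by all three instances
    print("busiest instance:", max(single.values()), "vs", max(spread.values()))
```

With one partition, a single instance processes all 15,000 records no matter how many instances are running; with six partitions, the work is split across all three. In Kafka, each partition is consumed by at most one consumer in a group, so the partition count caps how much the load can be parallelized.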

In short, for optimal performance, run regular jobs without the Bulk load option and enable it only for large jobs. Enabling it for every job is not recommended, as it may slow down some operations.

Because bulk-loaded data is processed by all Topology Service instances, the number of instances matters. A default production-sized AIOps deployment typically includes three Cassandra nodes and two Topology Service instances. If sufficient hardware resources are available, it's recommended to add a third Topology Service instance, giving one per Cassandra node, for improved performance and balanced processing.

Here's an example: loading over 15,000 network devices with the IBM Network Manager Observer took approximately 8 hours without the Bulk load option. With Bulk load enabled, the observation completed in under 3 hours, a saving of nearly 5 hours.
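
The figures above can be turned into throughput and speedup with a little arithmetic. The sketch below is illustrative only; `summarize_speedup` is a hypothetical helper. Because the bulk run finished in under 3 hours, plugging in 3 hours gives a conservative (lower-bound) speedup.

```python
def summarize_speedup(items: int, baseline_hours: float, bulk_hours: float) -> dict[str, float]:
    """Compare two runs of the same load by throughput, time saved, and speedup."""
    return {
        "baseline_items_per_hour": items / baseline_hours,
        "bulk_items_per_hour": items / bulk_hours,
        "hours_saved": baseline_hours - bulk_hours,
        "speedup": baseline_hours / bulk_hours,
    }


if __name__ == "__main__":
    # Figures from the IBM Network Manager Observer example above; 3 h is an
    # upper bound on the bulk-load run, so the speedup here is a lower bound.
    result = summarize_speedup(15_000, baseline_hours=8, bulk_hours=3)
    for name, value in result.items():
        print(f"{name}: {value:.2f}")
```

This works out to about 1,875 devices per hour without Bulk load versus 5,000 per hour with it, a speedup of roughly 2.7x or better.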

Last but not least, if you're using the new multi-file option for the File Observer (that is, loading multiple files), the job automatically defaults to bulk load mode unless you manually disable it.
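
The defaulting rules described in this post can be summarized in a few lines. This is a hypothetical model of the documented behavior, not CP4AIOps code: `effective_bulk_load` and its parameters are illustrative names. An explicit "Bulk load job" setting always wins; otherwise bulk load is on only for multi-file File Observer jobs.

```python
from typing import Optional


def effective_bulk_load(multi_file_option: bool, bulk_load_job: Optional[bool] = None) -> bool:
    """Hypothetical model of the documented "Bulk load job" defaults."""
    if bulk_load_job is not None:
        # The user explicitly set the parameter, so that choice is used as-is.
        return bulk_load_job
    # Not set: multi-file File Observer jobs default to bulk load; all others to False.
    return multi_file_option


if __name__ == "__main__":
    print(effective_bulk_load(multi_file_option=False))                      # regular job
    print(effective_bulk_load(multi_file_option=True))                       # multi-file job
    print(effective_bulk_load(multi_file_option=True, bulk_load_job=False))  # manually disabled
```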

With that, happy observing!