AIOps

Training-Level Filtering in AIOps

By Vinayaka Hanumanthappa posted Mon May 19, 2025 03:28 AM

  

Overview

In environments where AIOps is used to detect patterns and anomalies or to group events, training-level filtering provides a non-destructive way to refine the training dataset without altering or deleting the underlying data stored in Cassandra.

This document outlines the purpose, configuration, and impact of training-level filtering. While training filters are not enabled by default, efforts are underway to embed this capability more seamlessly into the platform.

Training-level filtering allows you to exclude specific types of events from being used in AI model training, particularly noisy, low-relevance, or operationally expected alerts, while retaining those events within the system for visibility, historical tracking, or other analytical purposes.

Use Case

Training-level filtering is especially valuable when certain types of alerts are frequent, low-value, or operationally expected, and may negatively affect model accuracy or performance if included in training. These filters refine the dataset without deleting the events or removing them from system visibility or historical tracking.

Common Scenarios for Filtering

  • Severity-Based Exclusion
    In many environments, low-severity alerts (e.g., severity 1 or 2) generate high event volumes but have low operational relevance for AI models. These alerts often originate from non-critical components such as:

    • PING

    • CD/DVD devices

    • Floppy drives

Excluding such alerts improves training efficiency, reduces noise, and helps avoid out-of-memory (OOM) issues caused by high-volume, low-value data.

 Example Filter Expression:

value: "not(array_contains(severity,2)) and not(array_contains(severity,1))"

  • Excluding Operationally Expected Events
    Some events, although legitimate, are predictable and recurring, making them unsuitable for anomaly detection or event grouping. These often involve automated infrastructure actions or scheduled operational tasks.

    Examples of Events to Exclude from Training:

    • Virtual machine movements or resizes that are part of standard operational tasks:

      • Move Virtual Machine vprox48 from vmw-hostdl9-1-10-kn.cl to vmw-hostdl13-1-4-kn.cl

      • Resize up VCPU for Virtual Machine UPMDEK8MAS02 from 16 to 24 vCPUs

    • Automated provisioning of new machines:

      • Provision Virtual Machine similar to upcoqmas04

    • Volume migration across storage arrays:

      • Move Volume mwc-ihsfog1 Disk 1 of Virtual Machine mwc-ihsfog1 from CQN_HTCH_Srv_BAW_FOGAPE to CQN_VMAX_Srv_Veritran_Core

 Example Filter Expression:

value: 'summary not like "Move Virtual Machine % from % to %"
and summary not like "Resize % VCPU for Virtual Machine % from % to % vCPUs"
and summary not like "Provision Virtual Machine similar to %"
and summary not like "Move Volume % Disk % of Virtual Machine % from % to %"'
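
To check which summaries a set of patterns would exclude before deploying a filter, the matching can be approximated offline. The following is a minimal sketch in plain Python, assuming the expression follows standard SQL LIKE semantics (% matches any run of characters, _ matches a single character, matching is case-sensitive) and that events satisfying the whole expression are the ones kept for training. The helper names like_to_regex and kept_for_training are hypothetical and not part of AIOps; the last sample summary is invented for illustration.

```python
import re

def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into an anchored regular expression.

    Assumes standard LIKE semantics: % = any run of characters,
    _ = exactly one character, everything else is literal.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

# Patterns copied from the example filter expression above.
EXCLUDE_PATTERNS = [
    "Move Virtual Machine % from % to %",
    "Resize % VCPU for Virtual Machine % from % to % vCPUs",
    "Provision Virtual Machine similar to %",
    "Move Volume % Disk % of Virtual Machine % from % to %",
]
COMPILED = [like_to_regex(p) for p in EXCLUDE_PATTERNS]

def kept_for_training(summary):
    """Return True if the summary matches none of the exclusion patterns."""
    return not any(rx.fullmatch(summary) for rx in COMPILED)

samples = [
    "Move Virtual Machine vprox48 from vmw-hostdl9-1-10-kn.cl to vmw-hostdl13-1-4-kn.cl",
    "Resize up VCPU for Virtual Machine UPMDEK8MAS02 from 16 to 24 vCPUs",
    "Provision Virtual Machine similar to upcoqmas04",
    "CPU utilization above threshold on host db01",  # hypothetical summary
]
for s in samples:
    print("keep" if kept_for_training(s) else "drop", "|", s)
```

This only emulates the pattern matching; the filter itself is evaluated by the training component, so results should be confirmed against a real training run.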

Notes:

  • Training-level filtering is designed to exclude data from model training only, not to modify or delete any data already stored in the system.

  • Training-level filtering currently requires manual configuration through environment variables set in a custom ConfigMap.

How Training-Level Filtering Works

Training filters are implemented as environment variables on the training component (spark-pipeline-composer). They are disabled by default and must be explicitly enabled and configured.

Enabling Training-Level Filtering

Step-by-Step Configuration

  1. Enable Filtering
    Set the EA_TRAINING_FILTER_ENABLED environment variable to "true".

  2. Define the Filter Logic
    Use the EA_TRAINING_FILTER variable to specify filter conditions as expressions against event attributes (e.g., severity, source).

Both variables can be set through a custom ConfigMap.

Example: Exclude Severity 1 & 2 Events

apiVersion: v1
data:
  profiles: |
    generatedfor: HA
    operandconfigs:
    - name: ir-ai-operator
      spec:
        aiopsanalyticsorchestrator:
          customEnv:
          - containers:
            - env:
              - name: EA_TRAINING_FILTER_ENABLED
                value: "true"
              - name: EA_TRAINING_FILTER
                value: "not(array_contains(severity,2)) and not(array_contains(severity,1))"
              name: spark-pipeline-composer
            kind: Deployment
            name: spark-pipeline-composer
kind: ConfigMap
metadata:

This example excludes all alerts with severity = 1 or 2 from the training process.
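
When the list of excluded severities changes often, the expression and the two environment entries can be generated rather than typed by hand. The sketch below is an illustration under stated assumptions: the function names build_severity_filter, training_filter_env, and kept_for_training are hypothetical, and the use of array_contains in the example is taken to mean that severity is array-valued in the training data. It reproduces the expression from the ConfigMap above and emulates the predicate in plain Python.

```python
def build_severity_filter(severities):
    """Build an EA_TRAINING_FILTER expression that drops the given severities."""
    if not severities:
        raise ValueError("at least one severity is required")
    return " and ".join(
        f"not(array_contains(severity,{int(s)}))" for s in severities
    )

def training_filter_env(expression):
    """Return the two env entries for the spark-pipeline-composer container."""
    return [
        {"name": "EA_TRAINING_FILTER_ENABLED", "value": "true"},
        {"name": "EA_TRAINING_FILTER", "value": expression},
    ]

def kept_for_training(event_severities, excluded=(2, 1)):
    """Plain-Python stand-in for the predicate (assumes an array of severities).

    True when none of the excluded severities appear in the array.
    """
    return not any(s in event_severities for s in excluded)

expression = build_severity_filter([2, 1])
env = training_filter_env(expression)
print(expression)
for entry in env:
    print(entry["name"], "=", entry["value"])
```

The generated values still have to be placed under customEnv in the ConfigMap as shown above; this sketch does not apply anything to a cluster.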

Conclusion

Training-level filtering is a strategic feature for teams looking to improve model quality and avoid irrelevant event groupings without sacrificing access to historical or operational data. It provides fine-grained control over the training dataset, allowing you to reduce noise and restrict training to the relevant data.

While currently a manual configuration, this capability will be more tightly integrated into future versions of the AIOps platform.

Comments

Wed May 21, 2025 05:49 AM

@Sateesh Racharla No, the temporal grouping logic in CP4AIOps is not solely based on the Summary field. The platform supports flexible filtering mechanisms using various ObjectServer fields within the alert payload. For example, to group alerts where eventType equals "problem", you can define a condition such as type.eventType LIKE "%problem%". This behaviour is configurable and supported in the current AIOps implementation, allowing fields like AlertKey, Node, or any other relevant attributes to be used based on the use

Tue May 20, 2025 12:02 PM

Hi @Vinayaka Hanumanthappa,

Could you confirm which ObjectServer field is used for temporal group training in CP4AIOps? Is the grouping logic currently based on the Summary field?
If so, is it possible to configure or override this behavior to use a different field such as AlertKey instead of Summary?