Overview
In environments where AIOps is used to detect patterns and anomalies or to group events, training-level filtering provides a non-destructive way to refine the training dataset without altering or deleting the underlying data stored in Cassandra.
This document outlines the purpose, configuration, and impact of training-level filtering. While training filters are not enabled by default, efforts are underway to embed this capability more seamlessly into the platform.
Training-level filtering allows you to exclude specific types of events from being used in AI model training, particularly noisy, low-relevance, or operationally expected alerts, while retaining those events within the system for visibility, historical tracking, or other analytical purposes.
Use Case
Training-level filtering is especially valuable when certain types of alerts are frequent, low-value, or operationally expected, and may negatively affect model accuracy or performance if included in training. These filters help refine the dataset without deleting or excluding the events from system visibility or historical tracking.
Common Scenarios for Filtering
- Severity-Based Exclusion
In many environments, low-severity alerts (e.g., severity 1 or 2) generate high event volumes but have low operational relevance for AI models. These alerts often originate from non-critical components such as:
- PING
- CD/DVD devices
- Floppy drives
Excluding such alerts improves training efficiency, reduces noise, and helps avoid out-of-memory (OOM) issues caused by high-volume, low-value data.
Example Filter Expression:
value: "not(array_contains(severity,2)) and not(array_contains(severity,1))"
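The filter expression uses Spark SQL-style functions such as array_contains. As an illustration only (this is not the platform's implementation, and the event field names are assumptions), the exclusion logic is equivalent to this Python sketch:

```python
# Illustrative sketch: emulates the expression
#   not(array_contains(severity,2)) and not(array_contains(severity,1))
# on plain Python dicts. The "severity" field holding a list of values
# is an assumption for demonstration purposes.

def keep_for_training(event):
    """Return True if the event should be included in model training."""
    severities = event.get("severity", [])
    # Exclude any event whose severity values include 1 or 2.
    return 1 not in severities and 2 not in severities

events = [
    {"id": "a", "severity": [1]},     # low severity: filtered out
    {"id": "b", "severity": [5]},     # kept
    {"id": "c", "severity": [2, 4]},  # contains 2: filtered out
]

training_set = [e for e in events if keep_for_training(e)]
print([e["id"] for e in training_set])  # -> ['b']
```

Events "a" and "c" remain in the system for visibility and historical tracking; they are simply excluded from the training dataset.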
- Excluding Operationally Expected Events
Some events, although legitimate, are predictable and recurring, making them unsuitable for anomaly detection or event grouping. These often involve automated infrastructure actions or scheduled operational tasks.
Examples of Events to Exclude from Training:
- Virtual machine movements or resizes that are part of standard operational tasks
- Automated provisioning of new machines
- Volume migration across storage arrays
Example Filter Expression:
value: 'summary not like "Move Virtual Machine % from % to %"
and summary not like "Resize % VCPU for Virtual Machine % from % to % vCPUs"
and summary not like "Provision Virtual Machine similar to %"
and summary not like "Move Volume % Disk % of Virtual Machine % from % to %"'
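The "not like" conditions follow SQL pattern syntax, where % matches any sequence of characters. As an illustration only (an approximation, not the platform's code), the matching behavior can be sketched in Python by translating each pattern to a regular expression:

```python
import re

# Illustrative sketch: emulates SQL-style 'summary not like "..."' filters.
# The patterns are taken from the example expression above; the matching
# logic here is an approximation for demonstration, not the real engine.

EXCLUDE_PATTERNS = [
    "Move Virtual Machine % from % to %",
    "Resize % VCPU for Virtual Machine % from % to % vCPUs",
    "Provision Virtual Machine similar to %",
    "Move Volume % Disk % of Virtual Machine % from % to %",
]

def like_to_regex(pattern):
    # Escape regex metacharacters in the literal parts, then
    # replace SQL's % wildcard with .*
    parts = (re.escape(p) for p in pattern.split("%"))
    return re.compile("^" + ".*".join(parts) + "$")

_COMPILED = [like_to_regex(p) for p in EXCLUDE_PATTERNS]

def keep_summary_for_training(summary):
    """True if the summary matches none of the excluded patterns."""
    return not any(rx.match(summary) for rx in _COMPILED)

print(keep_summary_for_training("Move Virtual Machine vm01 from hostA to hostB"))  # False
print(keep_summary_for_training("Disk latency threshold exceeded on hostA"))       # True
```

An event whose summary matches any of the patterns is excluded from training; all other events pass through unchanged.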
Notes:
- Training-level filtering is designed to exclude data from model training only; it does not modify or delete any data already stored in the system.
- Filtering requires manual configuration via environment variables in the ConfigMap.
How Training-Level Filtering Works
Training filters are implemented via environment variables within the training component (spark-pipeline-composer). These filters are disabled by default and must be explicitly enabled and configured.
Enabling Training-Level Filtering
Step-by-Step Configuration
- Enable Filtering
Set the EA_TRAINING_FILTER_ENABLED environment variable to "true".
- Define the Filter Logic
Use the EA_TRAINING_FILTER variable to specify filter conditions as expressions against event attributes (e.g., severity, source).
Both variables can be set using the custom ConfigMap.
Example: Exclude Severity 1 & 2 Events
apiVersion: v1
data:
  profiles: |
    generatedfor: HA
    operandconfigs:
    - name: ir-ai-operator
      spec:
        aiopsanalyticsorchestrator:
          customEnv:
          - containers:
            - env:
              - name: EA_TRAINING_FILTER_ENABLED
                value: "true"
              - name: EA_TRAINING_FILTER
                value: "not(array_contains(severity,2)) and not(array_contains(severity,1))"
              name: spark-pipeline-composer
            kind: Deployment
            name: spark-pipeline-composer
kind: ConfigMap
metadata:
This example excludes all alerts with severity = 1 or 2 from the training process.
Conclusion
Training-level filtering is a strategic feature for teams looking to improve model quality and avoid irrelevant grouping of events without sacrificing access to historical or operational data. It provides fine-grained control over the training dataset, allowing you to reduce noise and tailor training to the most relevant data.
While currently a manual configuration, this capability will be more tightly integrated into future versions of the AIOps platform.