Whitepaper: Applying Evolutionary Computation to SMF-Derived Enterprise Insights on z/OS
Author: Bruce McKnight
Teaching decades-old mainframes some new AI tricks.
Executive Summary
This paper presents a prototype methodology that applies genetic algorithms (GA) and cellular automata (CA) to legacy mainframe data and application performance metrics, showing how AI can extract insights that help enterprises make smarter decisions while preserving operational stability. These evolutionary computing techniques can identify anomalous patterns and anticipate potential system failures. Using SMF data and system performance metrics for illustration purposes, the methods uncover insights that traditional deterministic monitoring may miss.
These approaches remain at the prototype stage, serving as proofs-of-concept rather than finalized solutions. While the examples focus on SMF data, the methodology can also be applied to application performance, providing a flexible framework for exploring performance anomalies and operational trends across the enterprise.
GA and CA are well-established techniques with decades of engineering applications. Applying them to mainframe systems demonstrates how even long-standing platforms can learn new AI “tricks”, enabling enterprises to bridge legacy systems with next-generation analytics while maintaining operational stability and compliance.
Key outcomes:
- Optimize workload scheduling and resource allocation
- Detect anomalies in system behavior before they impact operations
- Identify emergent patterns in system usage for strategic decision-making
Disclaimer: This whitepaper describes a research and prototyping approach intended for exploratory and educational purposes only. It is not intended for production use, and considerable development, validation, and testing are required before applying these methods to real enterprise data. The author assumes no responsibility for any losses, damages, or operational impacts arising from decisions, actions, or implementations based on the ideas, methodology, or code samples presented herein. Use of the concepts and code is at the reader’s own risk, and appropriate professional judgment should be exercised before applying any of the described techniques to operational systems.
Introduction to Evolutionary Computing Concepts
Genetic Algorithms (GA)
Genetic algorithms are optimization and search techniques inspired by the principles of natural selection and evolution. First popularized by John Holland in the 1970s, GAs use mechanisms such as selection, crossover, and mutation to evolve solutions to complex problems over successive generations (Holland, 1975). Candidate solutions are encoded as “chromosomes”, and a fitness function evaluates how well each candidate meets the desired objective. Through iterative recombination and mutation, the algorithm “breeds” increasingly effective solutions, making GAs particularly well-suited for problems with large, complex, or poorly understood solution spaces. In this paper, GAs are applied to detect anomalous patterns in SMF and application performance data, identifying behaviors that may elude traditional deterministic monitoring.
Cellular Automata (CA)
Cellular automata are discrete computational systems that model complex behavior through simple, local rules applied to a grid of cells. Each cell’s state evolves over time based on the states of neighboring cells, allowing global patterns to emerge from local interactions. Originally introduced by John von Neumann and Stanislaw Ulam in the 1940s, CAs have been used to simulate everything from fluid dynamics to biological growth patterns (Wolfram, 2002). In the context of mainframe environments, CAs can model how small deviations in system or application behavior propagate, helping anticipate potential failures while the signals are still too weak to be picked up by conventional monitoring.
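To make the local-rule mechanism concrete, here is a minimal Python sketch (illustrative only, not part of the prototype) of Wolfram’s elementary Rule 30, where each cell’s next state depends only on itself and its two immediate neighbors:

```python
# Minimal elementary cellular automaton (Wolfram Rule 30), for illustration.
# Each cell's next state is determined solely by itself and its two neighbors.

def step(cells, rule=30):
    """Advance a 1-D binary CA one time step under the given Wolfram rule."""
    n = len(cells)
    nxt = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (center << 1) | right  # neighborhood as a 3-bit number
        nxt.append((rule >> index) & 1)              # look up the new state in the rule bits
    return nxt

# Start with a single active cell and watch complexity emerge from local rules.
cells = [0] * 15
cells[7] = 1
for _ in range(5):
    print("".join("#" if c else "." for c in cells))
    cells = step(cells, rule=30)
```

Even from one active cell, the triangle-like patterns that appear illustrate how global structure can emerge from purely local interactions.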
Together, GA and CA offer complementary perspectives: GA searches for optimal patterns in complex datasets, while CA simulates the dynamic evolution of system states over time (Mitchell, 2009). By combining these techniques, enterprises using mainframes can explore both anomaly detection and predictive modeling, leveraging decades-old mathematical tools to extract new insights from operating systems and legacy applications.
Methodology / Approach
Overview of the workflow:
· SMF records are exported into Python running natively in z/OS UNIX System Services (USS) for preprocessing, feature engineering, and algorithm prototyping.
· Evolutionary computation techniques—Genetic Algorithms (GA) and Cellular Automata (CA)—are applied to generate insights for performance optimization, anomaly detection, and emergent behavior analysis.
Using a vendor-neutral Python approach ensures that the project focuses on the problem, not on mastering a proprietary toolset.
Conceptual Bridge: From Theory to Practice
To illustrate how these evolutionary computing techniques can be applied to mainframe data, consider a conceptual workflow: GA and CA operate on system and application metrics in complementary ways. The genetic algorithm treats each snapshot of SMF records or application performance metrics as a candidate solution, evaluating them against a fitness function designed to detect anomalous patterns.
Through iterative “evolution,” the GA highlights behaviors or configurations that deviate from expected norms. Meanwhile, the cellular automata component models the dynamic evolution of system states over time, simulating how small deviations propagate and interact, revealing potential failures long before they are detectable through conventional monitoring.

Fig. 1: Approaches to Evolving Enterprise Insights from SMF Data
Visual Analogy: Making Evolutionary Computing Tangible
Think of genetic algorithms as a master gardener cultivating a new plant variety. Each candidate solution is like a seed: some grow stronger, some weaker, and only the fittest survive and propagate. Through cycles of selection, recombination, and mutation, the garden evolves toward optimal solutions—revealing hidden patterns in large, complex datasets.
Cellular automata, by contrast, are like ripples spreading across a pond. Each cell’s behavior influences its neighbors, and small local changes can cascade into global patterns over time. By observing these interactions, we can anticipate how subtle deviations in system or application behavior might evolve into performance issues or failures before conventional monitoring detects them.
Together, GA “discovers” what to watch for, while CA “simulates” how it might unfold—offering a complementary lens on enterprise systems. Hopefully, this analogy provides a conceptual bridge that helps readers visualize the logic behind the prototype methodology before engaging with Python code, SMF data, or GA/CA simulations.
SMF Data as Input
SMF collects extensive system, application, and network metrics. For enterprise-focused AI/ML, the following record types are most valuable:
| SMF Type | Description | Potential Analysis / ML Use |
|---|---|---|
| 70-79 | CPU usage, I/O activity, paging, and system resource metrics (RMF) | GA optimization of workload scheduling; detect resource bottlenecks |
| 30 | Job accounting (start/stop times, CPU, elapsed time) | Predictive workload optimization, anomaly detection, performance tuning |
| 42 | I/O activity (DASD, tape, channels) | Identify high-latency paths, optimize throughput |
| 80, 81, 83 | Security events (RACF failures, logons, authority violations; ACF2 uses an installation-defined type) | CA simulation of threat propagation, anomaly detection |
| 118-119 | Network activity (TCP/IP; VTAM tuning statistics use type 50) | Pattern detection for peak usage and potential failures |
| 100-102 | Db2 statistics, accounting, and performance trace | GA to optimize queries or transaction scheduling |
Dataset size: Tens of millions of records per month for a typical enterprise mainframe; includes structured fields (numeric, categorical) ideal for Python-based prototyping.
Genetic Algorithm Approach (Pseudo-Code)

Fitness function example (pseudo-code):
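The original fitness-function pseudo-code did not survive formatting, so the sketch below offers one plausible stand-in. The job fields (cpu, io) and the slot-based schedule encoding are assumptions for illustration, not actual SMF record layouts:

```python
# Hedged sketch of a GA fitness function for workload scheduling.
# A "schedule" assigns each job to a time slot; fitness is higher when
# CPU and I/O load are spread evenly across slots.

from statistics import pstdev

def fitness(schedule, jobs, num_slots):
    """schedule[i] = slot assigned to jobs[i]; jobs[i] = {'cpu': ..., 'io': ...}."""
    cpu_load = [0.0] * num_slots
    io_load = [0.0] * num_slots
    for job, slot in zip(jobs, schedule):
        cpu_load[slot] += job["cpu"]
        io_load[slot] += job["io"]
    # Lower standard deviation across slots means a more even spread;
    # invert so that higher fitness = better balance.
    imbalance = pstdev(cpu_load) + pstdev(io_load)
    return 1.0 / (1.0 + imbalance)
```

A perfectly balanced schedule scores 1.0; heavier imbalance drives the score toward 0.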

Key outcomes:
· Optimize batch job schedules
· Reduce CPU and I/O bottlenecks
· Identify emergent workload patterns
Genetic Algorithm (GA) Code Block Narrative
Purpose: The GA code block simulates an optimization process to improve mainframe workload scheduling. It tries to find the best way to assign jobs to system resources, balance CPU and I/O usage, and minimize performance bottlenecks.
TLDR Summary: This section demonstrates a method for iteratively improving mainframe workload scheduling using evolutionary principles. The algorithm starts with a set of possible schedules, evaluates how well each balances CPU and I/O usage, and then selectively combines and tweaks the best-performing schedules to create a new generation. Over multiple cycles, this process produces schedules that better optimize system resource utilization. Readers unfamiliar with Python or programming can focus on the overall concept: by simulating “natural selection” of schedules, we can discover more efficient ways to manage workload without manually testing every possibility. The reader can safely skip the step-by-step narrative without losing continuity.
Step-by-Step Narrative:
- Generate initial options (population):
The program starts by creating a group of “candidate schedules” randomly. Each schedule is a possible arrangement of jobs that the mainframe could execute.
- Evaluate performance (fitness):
Each candidate schedule is tested to see how well it balances system load. Schedules that spread the CPU and I/O usage more evenly get higher scores.
- Select the best options:
Only the top-performing schedules are chosen to move to the next step. Think of it like selecting the fittest animals in a population for breeding.
- Create new schedules (crossover and mutation):
The chosen schedules are combined and slightly changed to create a new generation of candidate schedules. This introduces variety and explores new ways to improve performance.
- Repeat over generations:
This process repeats multiple times, each cycle producing a better set of schedules, until the “best” schedule emerges.
- Output the best schedule:
After all generations, the program selects the single best schedule to use as a prototype for optimization analysis.
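The six steps above can be sketched end to end in Python. All specifics here (synthetic job data, a population of 30, 20 generations, a 10% mutation rate) are illustrative assumptions rather than the prototype’s actual parameters:

```python
import random
from statistics import pstdev

random.seed(42)

NUM_JOBS, NUM_SLOTS = 12, 4
# Synthetic jobs standing in for SMF-derived workload metrics.
JOBS = [{"cpu": random.uniform(1, 10), "io": random.uniform(1, 10)}
        for _ in range(NUM_JOBS)]

def fitness(schedule):
    """Higher fitness = CPU and I/O load spread more evenly across slots."""
    cpu = [0.0] * NUM_SLOTS
    io = [0.0] * NUM_SLOTS
    for job, slot in zip(JOBS, schedule):
        cpu[slot] += job["cpu"]
        io[slot] += job["io"]
    return 1.0 / (1.0 + pstdev(cpu) + pstdev(io))

def crossover(a, b):
    """Single-point crossover: splice two parent schedules."""
    point = random.randrange(1, NUM_JOBS)
    return a[:point] + b[point:]

def mutate(schedule, rate=0.1):
    """Randomly reassign some jobs to new slots."""
    return [random.randrange(NUM_SLOTS) if random.random() < rate else s
            for s in schedule]

# 1. Generate an initial population of random candidate schedules.
population = [[random.randrange(NUM_SLOTS) for _ in range(NUM_JOBS)]
              for _ in range(30)]

for generation in range(20):                  # 5. Repeat over generations.
    # 2-3. Evaluate fitness and keep the top half as survivors.
    population.sort(key=fitness, reverse=True)
    survivors = population[:15]
    # 4. Breed a new generation via crossover and mutation.
    children = [mutate(crossover(random.choice(survivors), random.choice(survivors)))
                for _ in range(15)]
    population = survivors + children

best = max(population, key=fitness)           # 6. Output the best schedule.
print("Best schedule:", best, "fitness:", round(fitness(best), 3))
```

The evolved schedule reliably beats naive baselines such as putting every job in one slot, which is the whole point of the selection pressure.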
Analogy: Imagine a city planner experimenting with different traffic patterns. They start with random traffic light schedules, observe which patterns reduce congestion, mix the best patterns together, adjust slightly, and iterate until traffic flows optimally.
Cellular Automata Approach (Pseudo-Code)

Key outcomes:
· Models that detect propagation of anomalies (job failures, security violations) across system components
· Emergent behavior informs proactive monitoring and mitigation strategies
Cellular Automata (CA) Code Block Narrative
Purpose: The CA code block models system behavior as a grid of interacting elements, where each element represents a resource, job, or system node. The simulation helps identify areas at risk for failures or bottlenecks.
TLDR Summary: This section illustrates a technique for modeling how system states and workloads propagate across a mainframe environment. Each element of the system is represented as a cell in a grid, and simple rules determine how a cell’s state changes based on its neighbors. By simulating these interactions over time, patterns emerge that reveal which nodes are at higher risk of congestion or failure. Readers do not need to follow the Python details to understand the key idea: this approach provides a dynamic way to visualize potential problem areas and understand how small issues can ripple through the system. The reader can safely skip the step-by-step narrative without losing continuity.
Step-by-Step Narrative:
- Create a grid representation:
The mainframe system is represented as a grid. Each square in the grid holds information about a system component (e.g., CPU, storage, job queue) and its current state (like load, risk of failure, or delay).
- Apply rules iteratively:
The program repeatedly updates each square based on simple rules, taking into account the state of neighboring squares. For example, if neighboring nodes are overloaded, a node may also become high-risk.
- Simulate over time:
Each iteration represents a time step. Over multiple iterations, patterns emerge showing how problems or bottlenecks might spread through the system.
- Identify high-risk nodes:
After the simulation, the program identifies nodes that consistently show high risk. These nodes are candidates for monitoring, intervention, or resource reallocation.
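The four steps above can be sketched as a simple binary CA. The 8x8 grid, the seeded overload cluster, and the "two or more overloaded neighbors" rule are illustrative assumptions, not values from the prototype:

```python
# Hedged sketch of the CA simulation described above. A cell value of 1
# means the component is overloaded/at risk; 0 means normal.

SIZE = 8

def step(grid):
    """One CA time step: an overloaded cell stays overloaded, and a normal
    cell becomes overloaded if two or more of its neighbors are."""
    new = [[0] * SIZE for _ in range(SIZE)]
    for r in range(SIZE):
        for c in range(SIZE):
            overloaded = sum(grid[nr][nc]
                             for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                             if 0 <= nr < SIZE and 0 <= nc < SIZE)
            new[r][c] = 1 if grid[r][c] or overloaded >= 2 else 0
    return new

# 1. Grid of system components, with a small cluster of initial overloads.
grid = [[0] * SIZE for _ in range(SIZE)]
for r, c in ((2, 2), (2, 3), (3, 2)):
    grid[r][c] = 1

# 2-3. Iterate the rule and count how often each cell is overloaded.
STEPS = 6
risk_count = [[0] * SIZE for _ in range(SIZE)]
for _ in range(STEPS):
    grid = step(grid)
    for r in range(SIZE):
        for c in range(SIZE):
            risk_count[r][c] += grid[r][c]

# 4. Cells overloaded in nearly every time step are flagged as high risk.
hotspots = [(r, c) for r in range(SIZE) for c in range(SIZE)
            if risk_count[r][c] >= STEPS - 1]
print("High-risk nodes:", hotspots)
```

With this particular seed cluster the overload spreads one step (closing the cluster into a 2x2 block) and then stabilizes, showing how the rule both propagates and contains risk.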
Analogy: Think of a forest fire simulation. Each tree (node) can catch fire depending on its neighbors. By running the simulation over time, you can see which areas are most likely to burn and take preventive action. Similarly, the CA code predicts which parts of a mainframe system might experience cascading problems.
Illustrative Results (Conceptual)
GA Scheduling Simulation:
- Objective: Reduce peak CPU utilization
- Result: 12–15% improvement in CPU balance across jobs
- Emergent insights: Certain low-priority jobs could be delayed to reduce resource spikes
Figure 2 (below) shows fitness gradually improving over 20 generations, giving a sense of how the GA “evolves” solutions over time. The data is illustrative and does not represent an actual environment. Note that fitness improves rapidly in the early generations, after which gains from one generation to the next diminish. Around Generation 11, the curve levels off, indicating that the model is effectively optimized and that further evolution may not deliver benefits that justify the additional expense and effort. “Humans in the loop” (HITL) should determine what fitness level aligns with the organization’s strategy; it will likely vary with each use case.

Fig 2. Genetic Algorithm: Fitness Over Generations. (No real SMF data was used. Data shown is purely illustrative of the concept.)
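One way to operationalize this stopping point is a simple plateau check on the fitness history. The five-generation window, the 0.01 tolerance, and the sample history below are assumptions for illustration, not prototype values:

```python
# Illustrative plateau check: stop evolving once best fitness has not
# improved by more than `tol` over the last `window` generations.

def has_plateaued(history, window=5, tol=0.01):
    """history = best fitness per generation, in order."""
    if len(history) < window + 1:
        return False  # not enough generations to judge
    recent_gain = history[-1] - history[-1 - window]
    return recent_gain <= tol

# Example: rapid early gains, then diminishing returns (synthetic data).
fitness_history = [0.40, 0.55, 0.66, 0.74, 0.79, 0.82, 0.84,
                   0.85, 0.855, 0.857, 0.858, 0.8585, 0.859]
for gen in range(1, len(fitness_history) + 1):
    if has_plateaued(fitness_history[:gen]):
        print("Plateau detected at generation", gen)
        break
```

In practice the HITL reviewer would tune the window and tolerance per use case rather than relying on any fixed values.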
CA Anomaly Propagation:
- Objective: Identify nodes at high risk for cascading failures
- Result: Predicted hotspots matched historical minor outage data
- Observation: CA visualization helps operations teams proactively allocate monitoring and redundancy

Fig 3. Cellular Automata Propagation Grid to visualize emerging anomalies. (No real SMF data was used. Data shown is purely illustrative of the concept.)
The grid in Figure 3 is a bit more nuanced and requires deeper explanation. It represents a simplified, illustrative model of system or application behavior derived from SMF and performance metrics. Each cell reflects the state of a discrete metric—such as CPU utilization, I/O activity, or transaction counts—where an “active” cell signals an anomaly or deviation from normal behavior. Only active cells appear on the grid.
Unlike traditional threshold-based deterministic monitoring dashboards that indicate a particular KPI’s health, the grid emphasizes relationships and propagation: the state of each cell evolves based on its own value and the states of neighboring cells, capturing how small deviations can ripple across the system.
Clusters of active cells highlight potential problem areas, while the evolution of patterns over iterations provides early warning of issues before conventional alerts are triggered. If the CA system consumes SMF data in near real time, an interactive display can provide live monitoring with drill-down capability, and could even explain its rationale for each square’s rating.
By observing both the formation of clusters and the propagation trends, even among seemingly unrelated cells, human operators or AI agents can prioritize attention and anticipate subsystem or application failures, turning raw SMF data into actionable insight. This approach demonstrates how decades-old mainframe systems can be “taught” to reveal hidden signals using simple, visualizable AI models.
Prototype Methodology Takeaways
- Adaptive Resource Management: GA-based schedules enable dynamic optimization of batch jobs and system workloads.
- Proactive Anomaly Detection: CA modeling of SMF-derived system behavior allows early intervention.
- Safe AI Experimentation: Native Python prototyping minimizes risk and contains research costs while exploring high-value insights.
- Strategic Modernization: Combining mainframe operational knowledge with AI/ML signals capability for executive-level modernization initiatives.
Conclusion
In this paper, we explored two distinct applications of evolutionary computing: genetic algorithms (GA) to identify anomalous patterns in SMF data, and cellular automata (CA) to anticipate potential system failures before signals become detectable by traditional deterministic performance monitors. While both concepts have been used for decades in multiple applications, the approaches in this paper remain at the prototype stage, serving as proofs-of-concept rather than finalized solutions.
While our examples focus on SMF data, the methodology can also be applied to analyzing application performance, providing a flexible framework for uncovering insights across the mainframe ecosystem.
Even at this exploratory stage, the methodology demonstrates how enterprises can leverage AI intelligently—bridging legacy systems with next-generation analytics—while maintaining operational stability and compliance. GA and CA are well-established techniques with decades of engineering applications, yet applying them to mainframe performance shows that even the most established systems can learn new AI tricks, revealing insights that were previously hidden.
References
- Mitchell, Melanie. Complexity: A Guided Tour. Oxford University Press, 2009
- Holland, John H. Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975
- Wolfram, Stephen. A New Kind of Science. Wolfram Media, 2002