Fans of IBM Z Hub


Harnessing the power of IBM LinuxONE to combat electricity theft – A Datathon winning team’s journey!

By Simanga Mchunu posted Thu April 24, 2025 01:26 PM

  


Authors: Simanga Mchunu (Team Leader) - Software Engineer graduate, ALX Africa, simacoder@hotmail.com; Nkosinathi Nhlapo - Data Science graduate, ALX Africa, pronkosinathi@mail.com; Kagiso Leboka - Data Analytics graduate, ALX Africa, kagisogrant@gmail.com; Bongani Baloyi - Software Engineer, ALX Africa, bonganibaloyi94@gmail.com

Mentors: Ajit-Samuel John - Software Development Manager, AI on IBM Z, IBM India Systems Development Lab, Bangalore, India, Ajit.SamuelJohn@ibm.com; Saurabh Srivastava - AI Architect, AI on IBM Z and LinuxONE, IBM India Systems Development Lab, Bangalore, India, Saurabh.srivastava4@ibm.com

Abstract

Meter fraud and electricity theft pose significant financial and operational challenges for utility providers. Fraudulent activities such as meter tampering and illegal connections lead to substantial revenue losses and an increased burden on legitimate consumers. This study leverages machine learning techniques, including Isolation Forest, Random Forest, and XGBoost, to identify anomalies in energy consumption patterns. The findings highlight that XGBoost provides superior accuracy in detecting fraudulent behavior. The study also incorporates geospatial mapping and trend analysis to enhance fraud detection efforts. This research contributes to the ongoing efforts to mitigate electricity theft using AI-driven solutions. The project was supported by IBM Z and LinuxONE machines and was made possible through collaboration with the Shooting Stars Foundation and dedicated mentors.

Figure 1: The data pipeline of our project, illustrating the end-to-end flow from raw smart meter data ingestion through preprocessing, feature engineering, model training, and real-time fraud detection deployment.


Introduction

Electricity theft remains a significant issue in South Africa, leading to billions of rands in annual losses. Fraudulent consumers manipulate their smart meters to reduce their bills, which hurts both utility providers and honest customers. Traditional fraud detection relies on manual inspections and rule-based systems, which are inefficient and prone to human error. This study proposes an automated, machine-learning-based approach to detecting fraudulent energy consumption patterns, built on the IBM LinuxONE Community Cloud (L1CC). The high-performance computing capabilities of L1CC allow us to process large datasets, train models efficiently, and detect anomalies in real time.

L1CC offers a powerful, secure, and cost-effective solution for running complex computations, such as fraud detection algorithms. Its sustainability, scalability, and performance enhancements make it an ideal choice for enterprises dealing with vast data-processing requirements. By adopting L1CC, we not only improve operational efficiency but also contribute to a more secure and sustainable IT infrastructure.

Problem Statement

Electricity theft in South Africa poses a significant challenge to utility providers, with economic losses estimated at approximately R20 billion annually (Netwerk24, 2024; Mujuzi, 2020). According to Mujuzi (2020), the leading cause of blackouts in South Africa is the collapse of Eskom infrastructure as a result of illegal connections. Figure 2 shows the increase in load-shedding hours over the years, which points to degrading Eskom infrastructure, possibly due to illegal connections.

Traditional manual detection methods are proving increasingly ineffective and time-consuming in combating this pervasive issue. This necessitates the exploration and implementation of automated, AI-driven solutions capable of real-time anomaly detection and prevention to mitigate the escalating financial burden and ensure the sustainable provision of electricity services.

Figure 2: Load-shedding trend in hours, by stage (Wikipedia, South African Energy Crisis, 2025)


In South Africa, load shedding is implemented in six stages, depending on the demand that the power grid has to meet. The stages are defined below:

     Stage 1: 2 hours of power cuts per day (1000 MW shed in selected areas)

     Stage 2: 4 hours of power cuts per day (2000 MW shed across the whole country)

     Stage 3: 4 hours of power cuts per day (3000 MW shed across the whole country)

     Stage 4: 8 hours of power cuts per day (4000 MW shed across the whole country)

     Stage 5: at least 8 hours of power cuts per day (5000 MW shed across the whole country)

     Stage 6: at least 8 hours of power cuts per day (6000 MW shed across the whole country)

Data Collection & Preprocessing

Dataset Features

The dataset includes:

     Customer Data: Meter ID, Timestamp, Province, City, GPS Coordinates

     Energy Metrics: Energy Consumption (kWh), Solar Generation (kWh), Voltage, Frequency, Power Factor

     Fraud Label: Binary classification (Fraud/No Fraud)

     Load Shedding: The impact of load shedding on the readings is taken into account

Figure 3: Distribution of energy consumption for normal versus suspicious users. Suspicious cases show irregular patterns, including sudden drops or spikes, suggesting potential meter tampering or illegal connections.


Preprocessing Steps

     Timestamp Conversion: Extract hour, day, month, and weekday for trend analysis.

     Feature Engineering: Calculate power efficiency and energy consumption per voltage.

     Scaling & Normalization: Standardize features for improved machine learning performance.
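The preprocessing steps above can be sketched in a few lines of dependency-free Python. This is only an illustration, not the code from the team's notebook: the field names, the sample readings, and in particular the formulas for power efficiency (energy weighted by power factor) and energy per volt are assumptions, since the article does not define them.

```python
from datetime import datetime
from statistics import mean, pstdev

# Hypothetical raw readings; field names and values are illustrative only.
readings = [
    {"timestamp": "2024-03-04 02:15:00", "energy_kwh": 1.2, "voltage": 229.0, "power_factor": 0.71},
    {"timestamp": "2024-03-04 13:30:00", "energy_kwh": 3.8, "voltage": 231.5, "power_factor": 0.95},
    {"timestamp": "2024-03-09 19:45:00", "energy_kwh": 5.1, "voltage": 228.2, "power_factor": 0.93},
]


def add_features(row):
    """Return a copy of the row with time parts and two derived ratios."""
    ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
    out = dict(row)
    # Timestamp conversion: hour, day, month, weekday.
    out["hour"] = ts.hour
    out["day"] = ts.day
    out["month"] = ts.month
    out["weekday"] = ts.weekday()  # Monday = 0 ... Sunday = 6
    # Feature engineering. Both formulas are assumed definitions.
    out["power_efficiency"] = row["energy_kwh"] * row["power_factor"]
    out["energy_per_volt"] = row["energy_kwh"] / row["voltage"]
    return out


def standardize(rows, column):
    """Add a z-scored copy of one column: (value - mean) / std."""
    values = [r[column] for r in rows]
    mu, sigma = mean(values), pstdev(values)
    for r in rows:
        r[column + "_z"] = (r[column] - mu) / sigma if sigma else 0.0


features = [add_features(r) for r in readings]
for col in ("energy_kwh", "power_efficiency", "energy_per_volt"):
    standardize(features, col)
```

After this runs, each row carries the time parts used for trend analysis plus standardized columns (suffix `_z`) with zero mean and unit variance, which is the form most models expect.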

Figure 4: SHAP value analysis showing the impact of each feature on the output of the Isolation Forest model. Higher absolute SHAP values indicate greater influence on the model’s anomaly detection decisions.


Machine Learning Models & Evaluation

Implemented Models:

     Isolation Forest: An unsupervised anomaly detection model.

     Random Forest: A robust supervised classification algorithm.

     XGBoost: A gradient-boosting algorithm that excels in fraud detection.

     Logistic Regression: A baseline model for comparison.

Performance Metrics and Best Model:

Figure 5: Model accuracy performance comparison across various algorithms. XGBoost achieved the highest accuracy and recall, establishing it as the most effective model for electricity fraud detection in this study.
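Because fraud cases are rare compared with normal readings, accuracy alone can look good even when a model misses every fraudster, which is why recall is reported alongside it. The sketch below uses made-up labels (not the study's results) to show how the metrics are computed and how they can diverge.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, recall and precision for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal left alone
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Hypothetical labels: 8 normal meters, 2 fraudulent ones.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_normal = [0] * 10                         # never flags anyone
catches_fraud = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]   # one false alarm

print(classification_metrics(y_true, always_normal))
# accuracy 0.8, recall 0.0, precision 0.0
print(classification_metrics(y_true, catches_fraud))
# accuracy 0.9, recall 1.0, precision 2/3
```

The first "model" scores 80% accuracy while detecting nothing, so a comparison for fraud detection should weigh recall (and precision, which drives inspection cost) and not accuracy alone.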


Isolation Forest: Theory and Application

Understanding Isolation Forest

Isolation Forest is an unsupervised anomaly detection algorithm based on the concept of Isolation Trees (iTrees). The key concepts include:

     Isolation Trees: The dataset is recursively partitioned by randomly selecting a feature and a split point.

     Anomaly Score: The number of splits required to isolate a data point determines its anomaly score. Fewer splits mean the point is easier to isolate, which indicates a higher likelihood of fraud.

     From iTree to Isolation Forest: The results of multiple isolation trees are aggregated to improve detection accuracy.

In this study, Isolation Forest helps identify fraudulent users by isolating anomalous energy consumption patterns.
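To make the idea concrete, here is a minimal from-scratch sketch of an isolation forest in plain Python. It is not the implementation used in the study (a library version would normally be used in practice); the synthetic readings, the function names, and the parameter defaults are all assumptions chosen for illustration. The score follows the standard formulation, 2 to the power of minus the average path length divided by a normalizing constant c(n), so values closer to 1 mean a point was isolated in fewer splits.

```python
import math
import random


def c(n):
    """Normalizer: average path length of an unsuccessful search in a binary tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:  # nothing left to split on this feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Number of splits to reach a leaf, plus an estimate for unsplit points."""
    if "feature" not in node:
        return depth + c(node["size"])
    side = "left" if point[node["feature"]] < node["split"] else "right"
    return path_length(point, node[side], depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=64, seed=0):
    """Score each point in (0, 1); assumes at least two points."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = c(sample_size)
    return [
        2.0 ** (-sum(path_length(p, t) for t in trees) / (n_trees * norm))
        for p in points
    ]


# Synthetic data: 60 typical (energy kWh, power factor) pairs and one odd meter.
data_rng = random.Random(42)
normal = [(data_rng.gauss(10.0, 1.0), data_rng.gauss(0.95, 0.02)) for _ in range(60)]
suspicious = (0.5, 0.40)
points = normal + [suspicious]
scores = anomaly_scores(points)
```

In this toy dataset the last point sits far from the cluster, so a single random split usually separates it, and it receives the highest score. Note that some libraries report scores with the opposite sign convention, so the direction should always be checked before thresholding.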

Figure 6: Anomaly score distribution for normal versus anomalous instances. Higher scores indicate a greater likelihood of fraudulent behavior, effectively separating typical consumption from suspicious activity.


Figure 7: Relationship between power factor and energy consumption (kWh), annotated by anomaly scores. Outliers with abnormal power factors and consumption levels are flagged as potential fraud cases based on elevated anomaly scores.


Harnessing the Power of IBM LinuxONE

Figure 8: Harnessing the power of IBM LinuxONE for scalable, high-performance model training and deployment. The enterprise-grade infrastructure enabled rapid prototyping, real-time inference, and seamless integration of fraud detection APIs.


Like our peers in Code Catalysts, we experienced the transformative benefits of using IBM LinuxONE. The power of LinuxONE’s enterprise-grade hardware, combined with the ease of use of the LinuxONE Community Cloud, substantially accelerated our experiments. Here’s how LinuxONE played a key role:

     Rapid Prototyping: Our initial model training and tuning, which would have taken considerably longer on standard CPUs, were completed swiftly thanks to LinuxONE’s advanced processing capabilities.

     Docker and Open-Source Tools: Our environment was set up using Docker containers running Ubuntu Linux, which kept it consistent with our local development environments. Our favorite machine learning libraries, including PyTorch, TensorFlow, and scikit-learn, were pre-installed and optimized for the s390x architecture.

     Scaling Inference: Once trained, deploying the model on LinuxONE allowed us to achieve inference speeds that far outpaced our local systems. This low latency is critical when monitoring millions of meter readings in real time, equipping utility companies to detect and respond to theft instantly.

Deployment and Beyond


After thorough training and validation, we deployed our model as a REST API on the LinuxONE environment. This API now processes live data feeds from smart meters and flags suspicious consumption patterns in under a second, transforming what was once a laborious manual investigation into an automated, scalable solution. Together with what we learned from our IBM mentors and the contributions of the open-source community, LinuxONE’s capabilities reinvigorated our approach to large-scale ML deployments.
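The article does not describe the API's routes or payloads, so the following is only a sketch of what a scoring endpoint of this kind might look like, written with Python's standard library. The `/score` path, the JSON field names, the threshold, and the `score_reading` rule (a simple stand-in for the trained model) are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

THRESHOLD = 0.6  # hypothetical cut-off on the anomaly score


def score_reading(reading):
    """Stand-in for the trained model; returns a score in [0, 1].

    Illustrative rule only: a large drop against the meter's usual
    consumption and a low power factor both raise the score.
    """
    drop = max(0.0, 1.0 - reading["energy_kwh"] / reading["baseline_kwh"])
    pf_gap = max(0.0, 0.9 - reading["power_factor"])
    return min(1.0, 0.7 * drop + 3.0 * pf_gap)


def handle_payload(body):
    """Parse one JSON reading and return the response as a dict."""
    reading = json.loads(body)
    score = score_reading(reading)
    return {
        "meter_id": reading["meter_id"],
        "score": round(score, 3),
        "suspicious": score >= THRESHOLD,
    }


class ScoreHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/score":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            result = handle_payload(self.rfile.read(length))
        except (KeyError, TypeError, ValueError, ZeroDivisionError):
            self.send_error(400, "invalid reading")
            return
        payload = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(port=8000):
    """Create the server; call .serve_forever() on the result to run it."""
    return HTTPServer(("127.0.0.1", port), ScoreHandler)
```

Keeping the scoring logic in `handle_payload`, separate from the HTTP handler, means the rule can be swapped for a real model and tested without starting a server.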

Looking Ahead


Our journey with the IBM Z Datathon has been both challenging and immensely rewarding. The support from IBM representatives and mentors, combined with the power of IBM LinuxONE, has set the stage for further research. We envision integrating advanced data pipelines, real-time analytics, and even more refined anomaly detection techniques in future deployments. For now, we celebrate our win at the IBM Z Datathon and look forward to transforming the landscape of energy theft detection in South Africa.

Results & Insights

Trend Analysis

     Fraudulent activity peaks between 12 AM and 5 AM, so unusual night-time readings are a warning sign.

     Fraudsters exhibit sudden drops in consumption, suggesting meter tampering.

     The five highest-risk cities are Rustenburg, Johannesburg, Cape Town, Durban, and Bloemfontein.
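The first two trends above can be checked with very little code. The sketch below is an assumption-laden illustration with invented timestamps and readings: it counts flagged readings per hour to measure the night-time share, and it marks a "sudden drop" wherever a reading falls below half of the average of the preceding three readings (the window and the ratio are arbitrary choices, not values from the study).

```python
from collections import Counter
from datetime import datetime


def flags_by_hour(flagged_timestamps):
    """Count flagged readings per hour of day."""
    return Counter(
        datetime.strptime(ts, "%Y-%m-%d %H:%M").hour for ts in flagged_timestamps
    )


def night_share(hour_counts, start=0, end=5):
    """Fraction of flagged readings with start <= hour < end."""
    total = sum(hour_counts.values())
    night = sum(n for h, n in hour_counts.items() if start <= h < end)
    return night / total if total else 0.0


def sudden_drops(series, window=3, ratio=0.5):
    """Indices where a reading is below ratio * mean of the previous `window` readings."""
    hits = []
    for i in range(window, len(series)):
        baseline = sum(series[i - window:i]) / window
        if baseline > 0 and series[i] < ratio * baseline:
            hits.append(i)
    return hits


# Invented examples.
flagged = ["2024-05-01 01:10", "2024-05-01 03:40", "2024-05-02 02:05", "2024-05-02 14:20"]
hourly = flags_by_hour(flagged)
print(night_share(hourly))   # 0.75

consumption_kwh = [4.0, 4.2, 3.9, 4.1, 1.0, 0.9, 4.0]
print(sudden_drops(consumption_kwh))   # [4, 5]
```

A trailing window keeps the check usable on a live feed, since it only needs readings that have already arrived.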

Figure 9: Top 10 cities ranked by likelihood of electricity fraud, identified using the Local Outlier Factor (LOF) method. Cities with higher LOF scores exhibit greater deviation from normal consumption patterns, indicating elevated fraud risk.


Geospatial Fraud Mapping

     Used GeoPandas & Shapefiles to visualize fraud hotspots.

     High-risk areas were mapped, enabling targeted inspections.

Figure 10: Geospatial map highlighting areas with a high likelihood of electricity fraud, created using shapefiles and GeoPandas. Red dots represent specific locations with elevated anomaly scores, signaling potential instances of meter tampering or illegal connections.


Actionable Recommendations

For Utility Companies

     Deploy XGBoost-based fraud detection models for real-time monitoring.

     Increase meter inspections in high-risk areas.

     Implement automated alerts for anomaly detection.

For Policy Makers

     Strengthen laws against electricity theft (BusinessTech, 2024).

     Introduce stricter penalties for fraudsters.

For Consumers

     Raise awareness about the risks of illegal connections.

     Provide incentives for legal energy use and solar adoption.

Conclusion & Future Work

Conclusion:

This study, based on our winning project at IBM Z Datathon 2024, demonstrates that machine learning can effectively detect fraud in smart meter data. XGBoost outperformed the other models and can be deployed for real-time monitoring.

The models were trained and validated using the approach documented in our notebook on GitHub.

Future Work:

     Develop a real-time fraud detection API (FastAPI).

     Integrate deep learning models (LSTMs) for advanced pattern recognition.

     IoT integration with smart meters for proactive fraud prevention.

Acknowledgments

This project was the first-place winning solution at IBM Z Datathon 2024, hosted by the Shooting Stars Foundation. The project was made possible thanks to the dedicated support of our IBM mentors and the power of IBM LinuxONE machines.

References

Netwerk24 (2024). Meter Tampering: The Facts. Netwerk24. Available at: https://www.netwerk24.com/netwerk24/za/weskusnuus/nuus/meter-tampering-the-facts-20240909-2

BusinessTech (2024). D-Day for Electricity Theft in South Africa. BusinessTech. Available at: https://businesstech.co.za/news/energy/800437/d-day-coming-for-south-africans-stealing-electricity/

Council for Scientific and Industrial Research (CSIR) (2024). CSIR Releases Statistics on Power Generation in South Africa. CSIR. Available at: https://www.csir.co.za/csir-releases-statistics-on-power-generation-south-africa-2024

Cape Business News (CBN) (2024). The Growing Threat of Electricity Meter Tampering in South Africa: A Call for Smart Solutions. CBN. Available at: https://www.cbn.co.za/featured/the-growing-threat-of-electricity-meter-tampering-in-south-africa-a-call-for-smart-solutions/

Liu, F. T., Ting, K. M., and Zhou, Z. H. (2022). Isolation-Based Anomaly Detection. arXiv. Available at: https://arxiv.org/abs/2206.06602

Wikipedia (2025). South African Energy Crisis. Wikipedia. Available at: https://en.wikipedia.org/wiki/South_African_energy_crisis

Mujuzi, J. D. (2020). Electricity theft in South Africa: Examining the need to clarify the offence and pursue private prosecution? Obiter. Available at: https://www.scielo.org.za/scielo.php?script=sci_arttext&pid=S1682-58532020000100005
