
Unlocking Deep Network Visibility with the Network Observability Operator on OpenShift on IBM Z and IBM LinuxONE

By Jitendra Singh posted 8 days ago

  

Authors: Jitendra Singh, Rishika Kedia, Niha Tahoor, Ayana Rukasar

Enterprise Problem Statement and Use Case

Modern enterprises, especially in regulated sectors like banking, telecommunications, and government, face increasing challenges in ensuring secure and reliable communication between microservices. As cloud-native architectures scale in complexity, traditional observability tools often fall short in diagnosing issues like DNS delays, network policy misconfigurations, and unauthorized traffic patterns.

Real-World Challenge

Consider a global financial institution that operates hundreds of microservices on Red Hat® OpenShift® across multiple regions. During peak trading hours, end users report timeouts and latency spikes. Despite healthy metrics and clean logs, the operations team is unable to pinpoint the issue.

Root Cause

After extensive troubleshooting, the issue was traced to intermittent DNS resolution failures and misconfigured network policies, problems that traditional monitoring solutions failed to detect. This resulted in extended downtime, compliance concerns, and reputational damage.

The Need for Deep Network Visibility

In today’s dynamic, cloud-native environments, understanding the flow and behavior of network traffic is critical to maintaining performance, security, and reliability. As microservices interact across nodes and namespaces, visibility into how they communicate becomes essential. This is where the Network Observability Operator for Red Hat® OpenShift® comes in—a Kubernetes-native solution that delivers real-time insights into network traffic by collecting, enriching, and visualizing flow data across your OpenShift clusters.

What Is the Network Observability Operator?

The Network Observability Operator is a Kubernetes-native tool designed specifically for OpenShift clusters. It captures and analyzes network flow data using an eBPF (extended Berkeley Packet Filter)-based agent deployed as a privileged DaemonSet across cluster nodes. This agent hooks into the Linux Traffic Control (tc) layer to passively observe ingress and egress packets. The resulting flow records are then processed by the flowlogs-pipeline component, which enriches the data with Kubernetes metadata and forwards it to observability backends such as Grafana Loki, Kafka, and Prometheus.

Key Capabilities:

·       Tracks communication between Pods, Deployments, Services, and Routes

·       Enables real-time traffic analysis and topology mapping

·       Supports both flow visibility and network performance troubleshooting


Figure: Network Observability Architecture
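The enrichment step described above can be pictured with a small sketch. This is not the Operator's implementation: the `Workload` type, the `IP_INDEX` lookup table, and the flow field names are all hypothetical, chosen only to show how a raw flow keyed by IP addresses might gain Kubernetes context before it is stored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of the Kubernetes object behind an IP address."""
    namespace: str
    name: str
    kind: str


# Hypothetical IP-to-workload index. A real pipeline would keep this in sync
# with the Kubernetes API instead of hard-coding it.
IP_INDEX = {
    "10.128.0.12": Workload("payments", "checkout-7d9f", "Pod"),
    "10.128.2.40": Workload("payments", "ledger-5c4b", "Pod"),
}


def enrich(flow: dict, index: dict) -> dict:
    """Return a copy of a raw flow with source and destination metadata attached."""
    enriched = dict(flow)
    for side in ("src", "dst"):
        workload: Optional[Workload] = index.get(flow[f"{side}_addr"])
        # Unknown addresses (for example, external traffic) get None values.
        enriched[f"{side}_namespace"] = workload.namespace if workload else None
        enriched[f"{side}_name"] = workload.name if workload else None
        enriched[f"{side}_kind"] = workload.kind if workload else None
    return enriched


raw_flow = {
    "src_addr": "10.128.0.12",
    "dst_addr": "10.128.2.40",
    "dst_port": 8443,
    "proto": "TCP",
    "bytes": 1840,
}
enriched_flow = enrich(raw_flow, IP_INDEX)
print(enriched_flow)
```

With names and namespaces attached, a flow can be filtered or grouped by workload rather than by IP address, which is what makes the console views readable.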

Core Functional Areas of the Operator

The Network Observability Operator does more than visualize traffic. It delivers six key functional areas:

           

                       

Figure: Functional Areas and Use Cases

Installing the Network Observability Operator

You can install the Network Observability Operator using the OpenShift Container Platform web console OperatorHub.

When you install the Operator, it provides the FlowCollector custom resource definition (CRD). You can set specifications in the web console when you create the FlowCollector.

Prerequisites

·       If you choose to use Loki, install the Loki Operator version 5.7+.

·       You must have cluster-admin privileges.

·       One of the following supported architectures is required: amd64, ppc64le, arm64, or s390x.

·       Any CPU supported by Red Hat® Enterprise Linux® (RHEL) 9.

·       The cluster must be configured with OVN-Kubernetes as the primary network plugin. Optionally, you can use Multus and SR-IOV to configure secondary interfaces.
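As a rough illustration, the prerequisites above can be written down as a pre-flight check. The `cluster` dictionary and its keys are hypothetical: this sketch does not query a real cluster, it only encodes the listed rules so they can be checked against facts you have gathered yourself.

```python
SUPPORTED_ARCHITECTURES = {"amd64", "ppc64le", "arm64", "s390x"}
MIN_LOKI_OPERATOR = (5, 7)  # "5.7+" from the prerequisites list


def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '5.8.1' into (5, 8, 1)."""
    return tuple(int(part) for part in text.split("."))


def unmet_prerequisites(cluster: dict) -> list:
    """Return a human-readable list of prerequisites that the cluster misses."""
    problems = []
    if cluster.get("architecture") not in SUPPORTED_ARCHITECTURES:
        problems.append("unsupported architecture")
    if cluster.get("network_plugin") != "OVN-Kubernetes":
        problems.append("primary network plugin is not OVN-Kubernetes")
    if not cluster.get("cluster_admin", False):
        problems.append("cluster-admin privileges are missing")
    # A missing Loki version means Loki is not used, which is allowed.
    loki_version = cluster.get("loki_operator_version")
    if loki_version is not None and parse_version(loki_version)[:2] < MIN_LOKI_OPERATOR:
        problems.append("Loki Operator is older than 5.7")
    return problems


# Hypothetical facts for an s390x (IBM Z / LinuxONE) cluster.
cluster_facts = {
    "architecture": "s390x",
    "network_plugin": "OVN-Kubernetes",
    "cluster_admin": True,
    "loki_operator_version": "5.8.1",
}
print(unmet_prerequisites(cluster_facts) or "all prerequisites met")
```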

Procedure

1.   In the OpenShift Container Platform web console, click Operators → OperatorHub.

2.   Choose Network Observability Operator from the list of available Operators in the OperatorHub, and click Install.


3.   Navigate to the Flow Collector tab, and click Create FlowCollector. Make the following selections in the form view:


  • spec.agent.ebpf.sampling: Specify a sampling size for flows. Lower sampling sizes can have a higher impact on resource utilization. For more information, see spec.agent.ebpf.sampling in the "FlowCollector API reference".
  • If you are not using Loki, click Loki client settings and change Enable to False. The setting is True by default.
  • If you are using Loki, set the following specifications:
    • spec.loki.mode: Set this to the LokiStack mode, which automatically sets URLs, TLS, cluster roles and a cluster role binding, as well as the authToken value. Alternatively, the Manual mode allows more control over configuration of these settings.
    • spec.loki.lokiStack.name: Enter the name of your LokiStack resource. In this documentation, loki is used as the example LokiStack name.
  • Optional: If you are in a large-scale environment, consider configuring the FlowCollector with Kafka for forwarding data in a more resilient, scalable way. See "Configuring the Flow Collector resource with Kafka storage" in the "Important Flow Collector configuration considerations" section.
  • Optional: Configure other optional settings before the next step of creating the FlowCollector. For example, if you choose not to use Loki, then you can configure exporting flows to Kafka or IPFIX. See "Export enriched network flow data to Kafka and IPFIX" and more in the "Important Flow Collector configuration considerations" section.

4.   Click Create.
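For readers who prefer to keep the FlowCollector definition in version control, the form selections above can also be expressed as a manifest. The sketch below builds one as a Python dictionary and prints it as JSON. Several details are assumptions rather than facts from this article: the `flows.netobserv.io/v1beta2` API version, the resource name `cluster`, the `eBPF` agent type string, the `lokiStack` field spelling, and the sampling value of 50. Check them against the FlowCollector API reference for your Operator version before using the output.

```python
import json


def flow_collector(sampling: int = 50, use_loki: bool = True,
                   lokistack_name: str = "loki") -> dict:
    """Build a FlowCollector manifest that mirrors the form-view selections."""
    loki = {"enable": use_loki}
    if use_loki:
        # LokiStack mode lets the Operator derive URLs, TLS, and roles itself.
        loki["mode"] = "LokiStack"
        loki["lokiStack"] = {"name": lokistack_name}
    return {
        "apiVersion": "flows.netobserv.io/v1beta2",  # assumed API version
        "kind": "FlowCollector",
        "metadata": {"name": "cluster"},  # assumed resource name
        "spec": {
            "agent": {"type": "eBPF", "ebpf": {"sampling": sampling}},
            "loki": loki,
        },
    }


manifest = flow_collector(sampling=50, use_loki=True, lokistack_name="loki")
print(json.dumps(manifest, indent=2))
```

Because `oc apply -f` accepts JSON as well as YAML, the printed document could be saved to a file and applied from the command line instead of using the form view.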


Verification

To verify successful installation, navigate to the Observe section in the OpenShift web console. You should see Network Traffic listed as an available option.
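Beyond the console check, the Operator's pods can also be inspected from the command line. The sketch below assumes the JSON was captured with a command such as `oc get pods -n netobserv -o json`. The `netobserv` namespace is an assumption that depends on how the FlowCollector was configured, and the pod names in the sample are made up.

```python
import json


def pods_not_running(pod_list: dict) -> list:
    """Return the names of pods whose status.phase is not 'Running'."""
    return [
        item["metadata"]["name"]
        for item in pod_list.get("items", [])
        if item.get("status", {}).get("phase") != "Running"
    ]


# Made-up sample in the shape of a Kubernetes PodList.
SAMPLE_OUTPUT = """
{"items": [
  {"metadata": {"name": "flowlogs-pipeline-example-1"}, "status": {"phase": "Running"}},
  {"metadata": {"name": "netobserv-plugin-example-1"}, "status": {"phase": "Pending"}}
]}
"""

not_running = pods_not_running(json.loads(SAMPLE_OUTPUT))
print(not_running or "all pods are Running")
```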


Other Powerful Visualizations in OpenShift Console

  • NetFlow Table – Displays detailed pod-to-pod network traffic. Supports filtering, exporting, and sorting.
  • NetFlow Overview – Offers top talkers, packet drop rates, DNS latency and trends.
  • NetFlow Topology – Provides a dynamic graph-based visualization of communication paths within the cluster.


Figure: NetFlow Visuals in OpenShift Console
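To make the "top talkers" idea concrete, here is a minimal sketch of the kind of aggregation such a view implies: summing bytes per source and destination pair and ranking the totals. The flow records and their field names are hypothetical, and the console's actual queries are not shown in this article.

```python
from collections import Counter


def top_talkers(flows: list, n: int = 3) -> list:
    """Rank (source, destination) pairs by total bytes exchanged."""
    totals = Counter()
    for flow in flows:
        totals[(flow["src"], flow["dst"])] += flow["bytes"]
    return totals.most_common(n)


# Hypothetical enriched flows, identified as namespace/workload.
flows = [
    {"src": "payments/checkout", "dst": "payments/ledger", "bytes": 1800},
    {"src": "payments/checkout", "dst": "payments/ledger", "bytes": 1200},
    {"src": "web/frontend", "dst": "payments/checkout", "bytes": 900},
    {"src": "web/frontend", "dst": "openshift-dns/dns-default", "bytes": 120},
]

for (src, dst), total in top_talkers(flows, n=2):
    print(f"{src} -> {dst}: {total} bytes")
```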

Why Use the Network Observability Operator?

  • As illustrated in the image below, the Network Observability Operator plays a critical role in OpenShift environments. In complex Kubernetes setups with thousands of microservices, traditional tools often leave administrators blind to the root causes of network issues like latency, DNS failures, and unknown traffic flows.
  • The Network Observability Operator addresses these challenges by offering deep, real-time visibility into cluster network activity, all from within the OpenShift console. It captures pod-to-pod traffic using lightweight eBPF probes and enriches it with Kubernetes metadata, enabling context-aware insights.
  • With features like flow graphs, latency heatmaps, DNS monitoring, and protocol-level summaries, teams can immediately visualize how services interact and spot anomalies before they escalate.
  • The dynamic topology view uncovers the entire communication map across workloads, helping administrators and SREs trace issues faster, validate policy enforcement, and confidently troubleshoot in seconds—not days. It transforms reactive firefighting into proactive network management.

[Image: the role of the Network Observability Operator in OpenShift environments]

Benefits include:

·       Rapid identification of network issues

·       Latency and performance bottleneck detection

·       Dynamic topology view that enriches dependency mapping

·       Visibility and data needed for traffic optimization and auditing

·       DNS tracking and resolution monitoring

·       Compliance and policy enforcement

Conclusion

The Network Observability Operator brings a modern, Kubernetes-native approach to network monitoring, enabling DevOps, SRE, and platform teams to see how services communicate in OpenShift environments. If you're looking to move beyond black-box monitoring and toward full-spectrum observability, this Operator is a strong place to start.
