In the fast-paced world of cloud-native development, a proliferation of CI/CD pipelines is both a sign of progress and a significant operational challenge. As teams independently build, test, and deploy services, the overall landscape can become fragmented and opaque. Questions that are critical for engineering leadership, security auditors, and DevOps teams become nearly impossible to answer:
- Are our engineering standards being met across all projects?
- Where are the hidden bottlenecks slowing down our release velocity?
- Are we exposed to security risks from misconfigured toolchains or unmanaged secrets?
- Which teams are excelling, and which might need more support?
Answering these questions requires transforming raw operational data from services like IBM Cloud Tekton Pipelines into a centralized source of truth. As organizations scale their CI/CD practices with Tekton on IBM Cloud, the number of pipelines, triggers, and runs quickly becomes overwhelming, and understanding performance bottlenecks, tracking key metrics, and ensuring security compliance takes more than raw logs.
This post details the architecture and implementation of a secure, multi-user, persona-driven insights dashboard built in Python. We'll focus on a security-first approach to user authentication, walk through several analytical views (including DORA and DevSecOps metrics) that serve different roles, from a high-level executive summary to granular failure analysis for developers, and finally cover how to deploy the entire application on Kubernetes.
Core Architecture
Our dashboard will consist of three main components:
- Python Backend Logic: A set of Python functions responsible for authenticating with IBM Cloud, fetching data via its REST APIs, and processing it into a usable format. We'll use libraries like requests for API calls and pandas for data manipulation.
- Streamlit Frontend: A powerful, open-source Python library for building and sharing data apps. We'll use it to build the interactive UI, together with plotly for rich, dynamic visualizations.
- Kubernetes Deployment: A robust, scalable environment to host our application, making it accessible to users across the organization.
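To make the stack concrete, a minimal requirements.txt covering these libraries (the Dockerfile shown later installs from this file; pin versions as appropriate for your environment) might look like this:
streamlit
pandas
plotly
requests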
Before diving into the implementation, it's important to establish the core principles that guide the dashboard's design, ensuring it's not just functional but also secure, usable, and effective.
- Security First: The application is built on the premise that user credentials must never be compromised. API keys are treated as ephemeral, in-memory assets used only to bootstrap a session, and are never persisted to disk on the server.
- Persona-Driven Views: A one-size-fits-all dashboard is rarely useful. This application is designed with distinct user roles in mind. An executive needs a high-level summary of risk and performance, while a DevOps engineer needs to drill down into pipeline logs and failure reasons. The UI is structured to provide tailored views that answer specific questions for each persona.
- Actionable Intelligence, Not Just Data: The goal is to move beyond simply displaying metrics. By calculating composite health scores, flagging unvaulted secrets, and highlighting the most common failure reasons, the dashboard provides clear, actionable next steps for improving processes.
- Performant and Non-Blocking: Fetching data from hundreds of pipelines can be time-consuming. The architecture uses background subprocesses for data collection, presenting a responsive loading screen to the user. This ensures the UI remains interactive and doesn't time out while the backend does the heavy lifting.
1. Secure Authentication & Session Management
Handling user credentials is the most critical aspect of a multi-user application. The goal is to allow users to access data from their own IBM Cloud accounts without ever storing their sensitive API keys on our server's disk. We can achieve this with a secure, session-based workflow.
The Challenge: API Key Security
A user's IBM Cloud API key grants significant access to their account. Storing these keys in a database or even in plaintext files on a server is a major security risk. Our design must ensure API keys are only held ephemerally.
The Solution: A Token-Based Session Flow
- Initial Login: The user visits the dashboard and is presented with a login screen asking for their IBM Cloud API Key and region.
- In-Memory Validation: When the user submits the form, our Python backend receives the key. It does not save it. Instead, it immediately uses the key to make a call to the IBM Cloud IAM token endpoint (https://iam.cloud.ibm.com/identity/token). If the call succeeds and a bearer token is returned, the API key is valid.
- Session Creation: Upon successful validation, the server performs two actions:
  - It generates a cryptographically secure, unique session ID using a library like uuid.
  - It initiates the data fetching process (covered in the next section) for that user.
- Server-Side Session Cache: Once the pipeline data is fetched, it's saved to a JSON file on the server's local disk. The file is named using the unique session ID (e.g., session_a1b2c3d4-e5f6-....json). Crucially, the API key is never written to this file. The file contains only the fetched pipeline data and some metadata, such as a timestamp.
- Client-Side Token: The unique session ID is sent back to the user's browser, where a small piece of JavaScript stores it in the browser's localStorage.
- Session Resumption: When the user refreshes the page or returns later, the Streamlit app retrieves the session ID from localStorage and looks for the corresponding session file on the server. If found, it loads the data and reconstructs the user's session instantly, without requiring another login.
- Logout & Expiration: The logout button explicitly deletes the server-side session file and clears the session ID from localStorage. A background task can also be configured to clean up stale session files that haven't been accessed in a set period (e.g., 24 hours).
This architecture ensures that sensitive API keys only exist in the application's memory for the brief period required to authenticate and are never persisted to disk, mitigating the risk of a leak.
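To make the flow concrete, here is a minimal sketch of the validation and session-creation steps. The IAM endpoint and grant type are IBM Cloud's documented API key exchange; the function names and file layout are illustrative rather than the application's exact code, and the .streamlit_sessions directory matches the path mounted in the Kubernetes deployment later.
from typing import Optional
import json
import time
import uuid
from pathlib import Path

import requests

IAM_TOKEN_URL = "https://iam.cloud.ibm.com/identity/token"
SESSION_DIR = Path(".streamlit_sessions")  # same directory the PVC mounts later

def validate_api_key(api_key: str) -> Optional[str]:
    """Exchange the API key for a bearer token; return None if the key is invalid."""
    resp = requests.post(
        IAM_TOKEN_URL,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        data={"grant_type": "urn:ibm:params:oauth:grant-type:apikey", "apikey": api_key},
        timeout=30,
    )
    if resp.status_code != 200:
        return None
    return resp.json().get("access_token")

def create_session(pipeline_data: dict) -> str:
    """Persist the fetched data (never the API key) under a random session ID."""
    session_id = str(uuid.uuid4())
    SESSION_DIR.mkdir(exist_ok=True)
    payload = {"created_at": time.time(), "data": pipeline_data}
    (SESSION_DIR / f"session_{session_id}.json").write_text(json.dumps(payload))
    return session_id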
2. Fetching and Processing Tekton Data
With a secure authentication model in place, we can now fetch the data. Interacting with the IBM Cloud API to discover all Tekton pipelines is a multi-step process.
- Get a Bearer Token: Use the validated user API key to request a temporary JWT bearer token from the IAM service. This token will be included in the Authorization header for all subsequent API calls.
- Discover Toolchains: The entry point to finding pipelines is through toolchains. Make a GET request to the IBM Cloud Resource Controller API to list all toolchain instances for the user's specified region.
- Identify Tekton Pipeline Services: Iterate through the tools within each toolchain. We're looking for tools with a tool_type_id of tekton-pipeline.
- Fetch Pipeline Details: For each Tekton pipeline service you find, use its unique ID to query the Continuous Delivery service API. This will give you a list of all defined Pipelines, their Triggers, and, most importantly, their execution history (PipelineRuns).
- Enrich and Consolidate: The raw API responses are often siloed. A crucial step is to process and enrich this data. For example, when you fetch a list of PipelineRuns, the response may not include the user-friendly pipeline name. Your Python code should map run data back to its parent pipeline and trigger to create a single, comprehensive data structure. This makes building the dashboard much easier.
To prevent the UI from freezing during this potentially lengthy fetch operation, it's best to run this entire logic in a background subprocess. The Streamlit frontend can display a loading spinner and periodically check if the final session JSON file has been created, loading the dashboard only when the data is ready.
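One way to wire this together is sketched below: launch a worker process and have the Streamlit page poll for the session file. The worker script name (fetch_pipelines.py), its CLI flags, and the environment variable are assumptions for illustration, not the application's actual interface.
import json
import os
import subprocess
import sys
import time
from pathlib import Path

import streamlit as st

SESSION_DIR = Path(".streamlit_sessions")

def start_background_fetch(api_key: str, region: str, session_id: str) -> None:
    """Launch the data collection in a separate process so the UI stays responsive."""
    # fetch_pipelines.py is a hypothetical worker script that writes
    # SESSION_DIR / f"session_{session_id}.json" when it finishes.
    subprocess.Popen(
        [sys.executable, "fetch_pipelines.py", "--region", region, "--session-id", session_id],
        env={**os.environ, "IBMCLOUD_API_KEY": api_key},  # pass the key via the environment, not argv
    )

def wait_for_session(session_id: str, poll_seconds: int = 5) -> dict:
    """Show a spinner and poll until the worker has written the session file."""
    session_file = SESSION_DIR / f"session_{session_id}.json"
    with st.spinner("Fetching pipeline data from IBM Cloud..."):
        while not session_file.exists():
            time.sleep(poll_seconds)
    return json.loads(session_file.read_text())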
3. Building a Suite of Insightful Dashboards
With a consolidated data model, we can build a suite of dashboards tailored to different stakeholders. Each view is designed to answer a specific set of questions and drive targeted improvements.
Executive Summary & Pipeline Health
- Target Audience: Engineering Leadership, Product Managers, and Stakeholders.
- Usability: This view provides a high-level, "at-a-glance" understanding of the entire CI/CD landscape. It answers questions like, "What is our overall operational health?" and "Where should we focus our attention?"
- Key Features:
  - Executive KPIs: High-level metrics like overall success rate, total applications monitored, and the number of high-risk applications.
  - Action Items: Dynamically generated recommendations, such as "Critical: Review the 5 high-risk applications" or "Medium: Investigate systemic reliability issues."
  - Performance Rankings: A leaderboard showing the top 5 best-performing and bottom 5 worst-performing applications, fostering a culture of excellence and identifying teams that may need support.
  - Health Scorecard: A table ranking every pipeline by a composite "Health Score" (calculated from success rate, run frequency, and duration), making it easy to spot unhealthy or abandoned pipelines. A sketch of such a score follows this list.
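For illustration, here is one way a composite score could be computed with pandas, assuming a DataFrame of runs with pipeline_name, status, and duration_minutes columns. The column names and weights are illustrative, not the dashboard's exact formula.
import pandas as pd

def health_scores(runs: pd.DataFrame) -> pd.DataFrame:
    """Rank pipelines by a composite score built from success rate, activity, and speed."""
    grouped = runs.groupby("pipeline_name").agg(
        success_rate=("status", lambda s: (s == "succeeded").mean()),
        run_count=("status", "size"),
        avg_duration=("duration_minutes", "mean"),
    )
    # Normalize each signal to 0..1 so the signals can be weighted and combined.
    activity = (grouped["run_count"] / grouped["run_count"].max()).fillna(0)
    speed = 1 - (grouped["avg_duration"] / grouped["avg_duration"].max()).fillna(0)
    grouped["health_score"] = (
        100 * (0.6 * grouped["success_rate"] + 0.25 * activity + 0.15 * speed)
    ).round(1)
    return grouped.sort_values("health_score", ascending=False)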
DORA Metrics Dashboard
- Target Audience: DevOps Leads, Engineering Managers.
- Usability: This dashboard implements the four industry-standard DORA (DevOps Research and Assessment) metrics to benchmark engineering team velocity and stability against elite performers. It helps quantify the impact of process improvements over time.
- Key Features:
  - Deployment Frequency: Counts successful deployments to production environments (identified by pipelines named with "cd-" or "deploy"), showing the daily average release cadence.
  - Lead Time for Changes: Approximates the time from commit to deployment by measuring the average duration of successful deployment pipelines.
  - Change Failure Rate: Calculates the percentage of deployment runs that failed, providing a clear metric for release stability.
  - Time to Restore Service (MTTR): Measures the average time it takes to recover from a failed deployment by calculating the duration between a failure and the next successful deployment of the same pipeline. A sketch of these calculations follows this list.
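The sketch below shows one simplified way to derive the four metrics from a DataFrame of runs. The "cd-"/"deploy" naming heuristic comes from the description above; the column names and the rest of the logic are assumptions for illustration.
import pandas as pd

def dora_metrics(runs: pd.DataFrame, window_days: int = 30) -> dict:
    """Compute simplified DORA metrics for deployment pipelines over a time window."""
    runs = runs.copy()
    runs["started_at"] = pd.to_datetime(runs["started_at"])
    cutoff = runs["started_at"].max() - pd.Timedelta(days=window_days)
    deploys = runs[
        (runs["started_at"] >= cutoff)
        & runs["pipeline_name"].str.contains("cd-|deploy", case=False, na=False)
    ].sort_values("started_at")

    succeeded = deploys[deploys["status"] == "succeeded"]
    failed = deploys[deploys["status"] == "failed"]

    # Time to restore: gap between each failure and the next success of the same pipeline.
    restore_hours = []
    for _, fail in failed.iterrows():
        later = succeeded[
            (succeeded["pipeline_name"] == fail["pipeline_name"])
            & (succeeded["started_at"] > fail["started_at"])
        ]
        if not later.empty:
            restore_hours.append(
                (later["started_at"].iloc[0] - fail["started_at"]).total_seconds() / 3600
            )

    return {
        "deployment_frequency_per_day": len(succeeded) / window_days,
        "lead_time_minutes": succeeded["duration_minutes"].mean(),
        "change_failure_rate_pct": 100 * len(failed) / max(len(deploys), 1),
        "mttr_hours": sum(restore_hours) / len(restore_hours) if restore_hours else None,
    }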
DevSecOps & Compliance Dashboard
- Target Audience: Security Teams, Compliance Officers, and DevOps Engineers.
- Usability: This is a crucial security-focused view that automates the auditing of toolchain configurations, instantly highlighting compliance gaps and security risks.
- Key Features:
  - Toolchain Compliance Matrix: A table that automatically scans every toolchain and verifies the presence of critical security tools like SonarQube, Secrets Manager, and evidence/incident repositories.
  - Secret Health Analysis: A powerful and critical check that iterates through every tool's configuration parameters. It flags any API token or secret stored in plain text that is not vaulted via Secrets Manager (i.e., does not use the {vault::...} syntax), providing an immediate, actionable list of critical vulnerabilities. A sketch of this check follows this list.
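A minimal sketch of the unvaulted-secret check, assuming each toolchain is a dict containing tools with a parameters mapping; the key-name heuristics and field names are illustrative.
def find_unvaulted_secrets(toolchains: list) -> list:
    """Return parameters that look like secrets but are not vaulted references."""
    suspicious_keys = ("apikey", "api_key", "token", "password", "secret")
    findings = []
    for toolchain in toolchains:
        for tool in toolchain.get("tools", []):
            for key, value in (tool.get("parameters") or {}).items():
                if not isinstance(value, str) or not value:
                    continue
                looks_secret = any(hint in key.lower() for hint in suspicious_keys)
                is_vaulted = value.startswith("{vault::")
                if looks_secret and not is_vaulted:
                    findings.append({
                        "toolchain": toolchain.get("name"),
                        "tool": tool.get("name"),
                        "parameter": key,
                    })
    return findings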
Application & User-Centric Insights
- Target Audience: Developers, Team Leads, and DevOps Engineers.
- Usability: These views allow for deep dives into the performance of specific applications and the activity patterns of individual users. They are designed for root cause analysis and identifying recurring issues.
- Key Features:
  - Application Drill-Down: Select a single application to see its specific KPIs, success rate trends, and a breakdown of its most common failure reasons, aggregated from run logs.
  - User Performance Summary: A table showing every user's total number of runs, their personal success rate, average pipeline duration, and last activity time. This can help identify power users or individuals who may be struggling with pipeline failures.
  - Failure Correlation: Pinpoints which users have the highest number of failed builds, helping to correlate specific changes or activities with pipeline instability. A sketch of the per-user aggregation follows this list.
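A sketch of the per-user aggregation with pandas, again using assumed column names (triggered_by, status, duration_minutes, started_at):
import pandas as pd

def user_summary(runs: pd.DataFrame) -> pd.DataFrame:
    """Aggregate run history per user: volume, success rate, speed, recency, failures."""
    runs = runs.copy()
    runs["started_at"] = pd.to_datetime(runs["started_at"])
    summary = runs.groupby("triggered_by").agg(
        total_runs=("status", "size"),
        success_rate=("status", lambda s: round(100 * (s == "succeeded").mean(), 1)),
        avg_duration_min=("duration_minutes", "mean"),
        last_activity=("started_at", "max"),
        failed_runs=("status", lambda s: int((s == "failed").sum())),
    )
    return summary.sort_values("failed_runs", ascending=False)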
4. Deploying on Kubernetes
Once the application is built, we need to deploy it in a scalable and reliable way. Kubernetes is the perfect fit.
Step 1: Containerize the Application
First, create a Dockerfile to package your Streamlit app into a container image.
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "sps_dashboard_multi_user.py", "--server.port=8501", "--server.address=0.0.0.0"]
Build and push this image to a container registry like Docker Hub or IBM Cloud Container Registry.
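For example, pushing to IBM Cloud Container Registry might look like the following; the us.icr.io region and the namespace are placeholders to adjust for your account.
docker build -t us.icr.io/<your-namespace>/sps-dashboard:latest .
ibmcloud cr login
docker push us.icr.io/<your-namespace>/sps-dashboard:latest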
Step 2: Create Kubernetes Resources
We'll need three key Kubernetes resources: a PersistentVolumeClaim to store our session files, a Deployment to run our app pods, and a Service to expose them.
- PersistentVolumeClaim (pvc.yaml): Session data must persist even if a pod restarts. A PVC provides durable storage for our .streamlit_sessions directory.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dashboard-session-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
- Deployment (deployment.yaml): This defines how to run our application pods. We mount the PVC at the path where we will store the session files. Note that a ReadWriteOnce PVC can be attached to only one node at a time; with replicas: 2, both pods must schedule to that node, so either use a storage class that supports ReadWriteMany or run a single replica.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sps-dashboard-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sps-dashboard
  template:
    metadata:
      labels:
        app: sps-dashboard
    spec:
      containers:
        - name: dashboard
          image: your-registry/sps-dashboard:latest
          ports:
            - containerPort: 8501
          volumeMounts:
            - name: session-storage
              mountPath: /app/.streamlit_sessions
      volumes:
        - name: session-storage
          persistentVolumeClaim:
            claimName: dashboard-session-pvc
- Service (service.yaml): This creates a stable internal network endpoint for our deployment.
apiVersion: v1
kind: Service
metadata:
  name: sps-dashboard-service
spec:
  selector:
    app: sps-dashboard
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8501
Apply these files to your cluster: kubectl apply -f pvc.yaml -f deployment.yaml -f service.yaml.
Step 3: Expose via Ingress
Finally, use an Ingress controller to expose the service to the internet securely with a user-friendly URL.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sps-dashboard-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
    - host: "sps-dashboard.yourcompany.com"
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: sps-dashboard-service
                port:
                  number: 80
Apply the Ingress manifest (kubectl apply -f ingress.yaml), and after your DNS propagates, users can access the dashboard at https://sps-dashboard.yourcompany.com.
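The manifest above only routes plain HTTP; to serve the dashboard over HTTPS as in the URL above, you would also add a tls block under the existing spec. The secret name here is a placeholder, and how the certificate is provisioned (for example via cert-manager or an uploaded certificate) depends on your cluster setup.
  tls:
    - hosts:
        - "sps-dashboard.yourcompany.com"
      secretName: sps-dashboard-tls  # placeholder: must contain the TLS certificate and key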
5. Beyond Visualization: Reporting and Exports
While an interactive dashboard is invaluable for daily operations, the ability to share insights with a broader audience and integrate data into other systems is equally important. The application includes a dedicated reporting and export module to address this need.
- Automated PDF Reports: For stakeholders who may not log into the live dashboard, the system can generate comprehensive PDF reports. These reports are designed for clarity and impact, including:
  - A title page with the report type, generation date, and account information.
  - An executive summary with dynamically generated key findings.
  - A list of top-level KPIs.
  - Embedded charts and visualizations (e.g., trend graphs, performance comparisons) for a rich, data-driven narrative.
- Flexible Data Exports: To enable further analysis or data warehousing, the dashboard allows users to export processed data in common formats like JSON or CSV. Users can select the scope of the data they wish to export, choosing from:
  - All raw pipeline and trigger data.
  - Processed application-level metrics.
  - A complete history of all pipeline runs.
This functionality ensures that the insights dashboard is not a data silo but a powerful hub that can feed into broader business intelligence and reporting workflows.
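In Streamlit, the export path can be as simple as converting the selected dataset and offering it through st.download_button. The sketch below assumes the datasets have already been loaded into pandas DataFrames; the function name and widget labels are illustrative.
import pandas as pd
import streamlit as st

def render_export_controls(datasets: dict) -> None:
    """Let the user pick an export scope and download it as CSV or JSON."""
    scope = st.selectbox("Export scope", list(datasets.keys()))
    fmt = st.radio("Format", ["CSV", "JSON"], horizontal=True)
    df: pd.DataFrame = datasets[scope]
    if fmt == "CSV":
        payload, mime, ext = df.to_csv(index=False), "text/csv", "csv"
    else:
        payload, mime, ext = df.to_json(orient="records"), "application/json", "json"
    st.download_button(
        label=f"Download {scope} ({fmt})",
        data=payload,
        file_name=f"{scope.lower().replace(' ', '_')}.{ext}",
        mime=mime,
    )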
See it in Action: Try the Live Dashboard
You can explore a live, deployed version of this application to see how these concepts come together in a functional dashboard. Simply navigate to the URL below and use your own IBM Cloud API key to securely load the analytics for your account's Tekton pipelines.
Live Application URL: https://xforce-devops.us-east.containers.appdomain.cloud/sps-dashboard/
As described in the security model, your API key is only used for your current browser session to authenticate with IBM Cloud and fetch your data. It is never stored, logged, or persisted on the server, ensuring your credentials remain secure.
Conclusion
Building a custom CI/CD dashboard provides unparalleled visibility into your engineering processes. By using Python with Streamlit and Plotly, you can create a rich, interactive experience. Most importantly, by adopting a secure, session-based authentication model, you can provide this value to your users without compromising the security of their cloud accounts. Deploying this solution on Kubernetes ensures that your dashboard is as resilient and scalable as the cloud-native pipelines it's designed to monitor.