Cloud Pak for Data uses the certificate manager to manage the lifecycle of its internal certificates. These certificates are configured to renew automatically; for example, internal-tls-certificate is renewed once every 60 days. Whenever a certificate is renewed, the pods that mount the corresponding secret are restarted automatically so that the applications pick up the new certificate. This can affect the availability of your applications, causing downtime until the pods have restarted.
This article demonstrates a method for monitoring the certificate renewal process by enabling alert notifications when a certificate is due to expire. This gives users more control over planning for downtime and deciding whether to bring the renewal forward with manual intervention.
Prerequisites
- The cert-manager Operator for Red Hat OpenShift is installed on the cluster.
- As a cluster administrator, enable user workload monitoring in your OpenShift monitoring configuration by running the following command:
cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
EOF
- Verify that the monitoring components for user workloads are running:
oc get pods -n openshift-user-workload-monitoring
Enable a ServiceMonitor for the certificate manager
The cert-manager Operator for Red Hat OpenShift operands expose metrics by default on port 9402 at the /metrics service endpoint. You can configure metrics collection for the cert-manager operands by creating a ServiceMonitor custom resource (CR) that enables the Prometheus Operator to collect custom metrics.
Run the following command as a cluster administrator to create the ServiceMonitor:
cat <<EOF | oc apply -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: cert-manager
    app.kubernetes.io/instance: cert-manager
    app.kubernetes.io/name: cert-manager
  name: cert-manager
  namespace: cert-manager
spec:
  endpoints:
    - honorLabels: false
      interval: 60s
      path: /metrics
      scrapeTimeout: 30s
      targetPort: 9402
  selector:
    matchExpressions:
      - key: app.kubernetes.io/name
        operator: In
        values:
          - cainjector
          - cert-manager
          - webhook
      - key: app.kubernetes.io/instance
        operator: In
        values:
          - cert-manager
      - key: app.kubernetes.io/component
        operator: In
        values:
          - cainjector
          - controller
          - webhook
EOF
After the ServiceMonitor CR is created, the user workload Prometheus instance begins collecting metrics from the cert-manager Operator for Red Hat OpenShift operands.
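One way to sanity-check what is being collected is to look for the expiry metric in the text served by the /metrics endpoint. The following is a minimal sketch, not part of the product tooling: days_remaining is a hypothetical helper that reads Prometheus text-format output on standard input and prints the label set and the whole days remaining for each certmanager_certificate_expiration_timestamp_seconds sample. The sample input and its label values are made up, and how you fetch the text from port 9402 is left out of the sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Prometheus text-format metrics on stdin and print,
# for each cert-manager expiry sample, its label set and whole days remaining.
# Usage: days_remaining NOW_EPOCH < metrics.txt
days_remaining() {
  awk -v now="$1" '
    /^certmanager_certificate_expiration_timestamp_seconds[{]/ {
      value = $NF                     # sample value is the last field
      labels = $0
      sub(/^[^{]*/, "", labels)       # strip the metric name
      sub(/[}][^}]*$/, "}", labels)   # strip everything after the closing brace
      printf "%s %d\n", labels, (value - now) / 86400
    }'
}

# Made-up sample input; real output carries more labels and other metrics.
days_remaining 1700000000 <<'EOF'
# TYPE certmanager_certificate_expiration_timestamp_seconds gauge
certmanager_certificate_expiration_timestamp_seconds{name="example-cert",namespace="example-ns"} 1.7002592e+09
EOF
# prints: {name="example-cert",namespace="example-ns"} 3
```

In practice you would pass the current time, for example days_remaining "$(date +%s)", instead of the fixed timestamp used here. The sketch assumes label values contain no spaces or closing braces.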
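Finally, create a PrometheusRule that notifies users whenever certificates are due to expire. For example, the following rule raises an alert for every certificate that expires within the next 7 days. To change the window, adjust the 86400 * 7 term in the expression. Save the manifest to a file and apply it with oc apply -f; if you pipe it through a heredoc as in the earlier steps, quote the delimiter (<<'EOF') so that the shell does not expand $labels.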
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cert-manager-alerts
  namespace: cert-manager
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  groups:
    - name: cert-manager.rules
      rules:
        - alert: CertificateExpiringSoon
          expr: |
            certmanager_certificate_expiration_timestamp_seconds - time() < 86400 * 7 # 7 days
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Certificate '{{ $labels.name }}' is expiring soon."
            description: "The certificate '{{ $labels.name }}' will expire in less than 7 days."
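The comparison in the alert expression can be checked by hand before you change the window. The following is a minimal sketch in shell, assuming Unix epoch seconds on both sides as in the rule; the function name expires_within_days and the sample timestamps are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the rule's comparison:
#   expiry - now < 86400 * days
# All arguments are integers; timestamps are Unix epoch seconds.
expires_within_days() {
  local expiry_epoch="$1" now_epoch="$2" days="$3"
  local remaining=$(( expiry_epoch - now_epoch ))
  local threshold=$(( 86400 * days ))
  [ "$remaining" -lt "$threshold" ]
}

# Made-up example: a certificate that expires 3 days from "now".
now=1700000000
expiry=$(( now + 3 * 86400 ))
if expires_within_days "$expiry" "$now" 7; then
  echo "alert condition is true"
else
  echo "alert condition is false"
fi
# prints "alert condition is true"
```

Changing the last argument from 7 to 14 corresponds to changing 86400 * 7 to 86400 * 14 in the rule. Because the rule also sets for: 5m, the condition has to hold for 5 minutes before the alert fires.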
The following notification is sent out for every certificate that matches the corresponding Prometheus rule.