The new Linux VM-based deployment option for IBM Cloud Pak for AIOps (AIOps) makes it quick and easy to stand up an AIOps system for evaluation purposes. This blog outlines the high-level steps required to do so, with links to the official product documentation.
The example scenario this blog covers is a deployment of AIOps in a Proof-of-Concept (POC) environment.
PROVISION YOUR ENVIRONMENT
The Linux VM-based AIOps installer supports three operating system flavours. Choose one of the following:
- Red Hat® Enterprise Linux® 8.10
- Red Hat® Enterprise Linux® 9.4
- Ubuntu 24.04 LTS
The minimum hardware requirements are:
- 9 nodes (3 control plane nodes and 6 worker nodes)
- 148 CPUs
- 358 GB RAM
- 120 GB disk per node
Note that these numbers are minimums, so uplifting them somewhat is recommended. The documentation also specifies an additional 1 CPU and 3 GB RAM per integration you deploy.
You will also need a load-balancer to deploy AIOps. If you don't have access to one, simply provision an additional VM and run haproxy on it instead.
If you want to deploy any Netcool components, also provision a VM to run them.
SCENARIO:
Barbara consults the documentation and comes up with the following VM profile for the proposed AIOps deployment. This includes a load-balancer host to run haproxy, and a Netcool VM to run Netcool/OMNIbus and Netcool/Impact:
- Red Hat® Enterprise Linux® 9.4
- 1 x load-balancer host: 4 CPU, 16 GB RAM, 300 GB disk
- 3 x control plane nodes: 16 CPU, 32 GB RAM, 300 GB disk + (2 x 300 GB disks)
- 8 x worker nodes: 16 CPU, 64 GB RAM, 300 GB disk + (1 x 300 GB disk)
- 1 x Netcool host: 16 CPU, 32 GB RAM, 300 GB disk
She opts to increase the number of worker nodes from the minimum of 6 to 8. This allows for node failure scenarios and provides additional resources to accommodate upgrades.
NOTE: During upgrades, old pods are only terminated once the new pods have started and are in a Ready state, so more than the minimum amount of resources is needed to accommodate upgrades. A suggested uplift is 20%, or one to two additional worker nodes. It also pays to double the hardware resources on the control plane nodes so that the control plane can continue to operate if one of the three nodes fails.
Documentation reference: https://www.ibm.com/docs/en/cloud-paks/cloud-pak-aiops/latest?topic=linux-planning#platforms
PREPARE YOUR HOSTS
The next step is to update the set of VMs so that they are at the latest patch levels. Run the following on all your VMs to update and reboot them:
yum update -y
shutdown -r now
Install the following package on the load-balancer host:
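For the load-balancer host this is haproxy, which the rest of this blog relies on. On RHEL, for example, you can install it with:
yum install -y haproxy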
Install the following package on the control plane and worker nodes:
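The exact package list for the cluster nodes is in the product documentation for your chosen operating system. As an illustration only, if the LVM tooling used for the disk preparation in the next step is not already present, it would be installed like this:
# Assumption: lvm2 is the package required -- check the documentation for your OS
yum install -y lvm2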
The control plane nodes are provisioned with two additional disks and the worker nodes with one additional disk. These must be prepared for use before attempting any installation. Follow the instructions at the link below to set these volumes up.
Documentation reference: https://www.ibm.com/docs/en/cloud-paks/cloud-pak-aiops/latest?topic=requirements-configuring-local-volumes
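As a rough, illustrative sketch only (the exact device names, volume group names, and mount points you must use are in the linked documentation), preparing one of the extra disks with LVM typically looks like this:
# Illustrative placeholders only -- substitute the device and volume names from the documentation
pvcreate /dev/vdb                          # initialise the additional disk for LVM
vgcreate aiops-vg /dev/vdb                 # create a volume group on it
lvcreate -l 100%FREE -n aiops-lv aiops-vg  # create a logical volume using all free space
mkfs.xfs /dev/aiops-vg/aiops-lv            # format the logical volume with XFS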
UPDATE YOUR LOCAL HOSTS FILE
If your set of VMs is not registered in DNS, update the /etc/hosts file on all of your hosts to include the IP addresses of the load-balancer VM and the control plane and worker node VMs. This will help ensure there are no name resolution issues later on.
150.100.200.50 aiops-linux-loadbalancer.ibmcloud.com
150.100.200.51 aiops-linux-control-1.ibmcloud.com
150.100.200.52 aiops-linux-control-2.ibmcloud.com
150.100.200.53 aiops-linux-control-3.ibmcloud.com
150.100.200.54 aiops-linux-worker-1.ibmcloud.com
150.100.200.55 aiops-linux-worker-2.ibmcloud.com
150.100.200.56 aiops-linux-worker-3.ibmcloud.com
150.100.200.57 aiops-linux-worker-4.ibmcloud.com
150.100.200.58 aiops-linux-worker-5.ibmcloud.com
150.100.200.59 aiops-linux-worker-6.ibmcloud.com
150.100.200.60 aiops-linux-worker-7.ibmcloud.com
150.100.200.61 aiops-linux-worker-8.ibmcloud.com
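With the entries in place, a quick way to check that a name resolves correctly on any given host is:
getent hosts aiops-linux-worker-1.ibmcloud.com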
SET UP AN SSH KEY ON PRIMARY CONTROL PLANE
The installation of AIOps works through the primary control plane node. As such, you need to create an SSH key on the primary control plane node, and then add it to the /root/.ssh/authorized_keys file on each of the other nodes. Run the following command on your primary control plane node to create an SSH key:
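A typical invocation, which produces the id_rsa.pub file referenced below without a passphrase (so that later connections are non-interactive), is:
ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa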
You should see a new file created called id_rsa.pub. Copy the contents of this file to your clipboard.
COPY SSH KEY TO OTHER NODES AND TEST
Next, SSH as the root user to each of the other control plane nodes as well as each of the worker nodes and append this copied key to the end of the /root/.ssh/authorized_keys file on each one. After you have done so, test that you can SSH to each control plane and worker node from the primary control plane node without having to enter a password.
NOTE: it is important to connect at least once to each control plane and worker node from the primary control plane node, and answer "yes" when prompted. If you skip this step, the AIOps deployment will likely fail later, as the installer will not be able to connect seamlessly to each node from the primary node; it will be waiting for an answer to this prompt behind the scenes.
The following is an example first connection where the user is prompted to add the target host to the list of known hosts:
[root@aiops-linux-control-1 ~]# ssh root@aiops-linux-control-2.ibmcloud.com
The authenticity of host 'aiops-linux-control-2.ibmcloud.com (::1)' can't be established.
ED25519 key fingerprint is SHA256:bOcJ27pUHfc/BVomiCixBsJK2ys0Z+LLBm3CdRMUy5o.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'aiops-linux-control-2.ibmcloud.com' (ED25519) to the list of known hosts.
[root@aiops-linux-control-2 ~]#
Subsequent connections will not require an answer (which is what you want):
[root@aiops-linux-control-1 ~]# ssh root@aiops-linux-control-2.ibmcloud.com
Last login: Tue Nov 5 09:36:40 2024 from ::1
[root@aiops-linux-control-2 ~]#
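As a final sanity check from the primary control plane node, you can loop over the other hosts and confirm that each connection completes without any password or host-key prompt; adjust the host list to match your environment:
for host in aiops-linux-control-2 aiops-linux-control-3 \
            aiops-linux-worker-1 aiops-linux-worker-2 aiops-linux-worker-3 \
            aiops-linux-worker-4 aiops-linux-worker-5 aiops-linux-worker-6 \
            aiops-linux-worker-7 aiops-linux-worker-8; do
  # BatchMode makes ssh fail immediately if a prompt would be required
  ssh -o BatchMode=yes root@${host}.ibmcloud.com hostname
done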
SET UP AN SSH KEY ON YOUR DEPLOYMENT MACHINE
This step involves creating an SSH key on the machine from which you will do the deployment from. This might be your laptop, for example. You should follow the same steps outlined above to append your local id_rsa.pub
key contents to the /root/.ssh/authorized_keys
file on the primary control plane node. You should then be able to SSH to the primary control plane node from your laptop without the need to enter a password. This is important since the installation steps you'll run later to deploy AIOps are of the following format:
ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} ssh ${TARGET_USER}@${CP_NODE} curl -LO "${AIOPSCTL_INSTALL_URL}"
Here, the SSH command connects first to the control plane node, then to the target node from there. Hence you need to be able to seamlessly SSH from your deployment location to the primary control plane node, then from there to all the other AIOps cluster nodes.
As before, test out your SSH connection to the primary control plane node and answer "yes" when prompted.
SET UP YOUR LOAD-BALANCER
For resiliency, AIOps is designed to run using a load-balancer to proxy connections to its control plane nodes. If you are setting up AIOps in an evaluation environment where there is no load-balancer available, you can simply provision an extra VM and run haproxy on it. In our scenario, we already have a load-balancer host provisioned, with haproxy already installed. Next we need to set up the configuration file, then start up the service.
The following is a sample haproxy.cfg file you can use. The only things you need to modify are the three IP addresses, which appear in three places: they are simply the addresses of the three control plane nodes. Use the file as-is apart from those three IP addresses; you do not need to change the hostnames or any other details.
Sample haproxy.cfg file:
global
    log 127.0.0.1 local2
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy
    daemon
    stats socket /var/lib/haproxy/stats

defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 3000

frontend aiops-frontend-plaintext
    bind *:80
    mode tcp
    option tcplog
    default_backend aiops-backend-plaintext

frontend aiops-frontend
    bind *:443
    mode tcp
    option tcplog
    default_backend aiops-backend

frontend k3s-frontend
    bind *:6443
    mode tcp
    option tcplog
    default_backend k3s-backend

backend aiops-backend
    mode tcp
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s
    server server0 150.100.200.51:443 check
    server server1 150.100.200.52:443 check
    server server2 150.100.200.53:443 check

backend k3s-backend
    mode tcp
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s
    server server0 150.100.200.51:6443 check
    server server1 150.100.200.52:6443 check
    server server2 150.100.200.53:6443 check

backend aiops-backend-plaintext
    mode tcp
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s
    server server0 150.100.200.51:80 check
    server server1 150.100.200.52:80 check
    server server2 150.100.200.53:80 check
Copy your haproxy.cfg file to your load-balancer box. Then, after backing up the original configuration file, copy your replacement file into place. Finally, enable and start the service:
cd /etc/haproxy/
mv haproxy.cfg haproxy.cfg.orig
cp /tmp/haproxy.cfg .
systemctl enable haproxy
systemctl start haproxy
systemctl status haproxy
NOTE: Don't worry if you see some errors in the status output at this point, since the AIOps endpoints don't exist yet.
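If you want to rule out simple mistakes in the file itself, haproxy can validate the configuration for you:
haproxy -c -f /etc/haproxy/haproxy.cfg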
FOLLOW THE AIOPS INSTALLATION STEPS
Your VM environment is now ready to start the AIOps deployment steps. The deployment itself should take around an hour to complete.
Documentation reference: https://www.ibm.com/docs/en/cloud-paks/cloud-pak-aiops/latest?topic=linux-online-installation
ADDITIONAL NOTES
At the time of writing, there is a known issue whereby the AIOps deployment may get stuck deploying the aimanager component. This is because an internal process tries to connect to the AIOps console via its FQDN instead of its internal service name. If the hosts are not resolvable via DNS, it gets stuck in a retry loop and the deployment eventually fails.
To resolve this, simply update the aiopsctl coredns service before starting step 5 (Install IBM Cloud Pak for AIOps) of the documented procedure.
To do this, SSH to your primary control plane node after you have installed aiopsctl (steps 1 to 4) and run the following, using the IP address and FQDN of your load-balancer:
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  default.server: |
    cp-console-aiops.aiops-linux-loadbalancer.ibmcloud.com {
        hosts {
            150.100.200.50 cp-console-aiops.aiops-linux-loadbalancer.ibmcloud.com
            fallthrough
        }
    }
EOF
kubectl -n kube-system rollout restart deployment coredns
NOTE: Keep a copy of the above in case you need to modify or add to your DNS settings. Each time you apply the above, it will replace the existing configuration.
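You can confirm the custom configuration is in place at any time by inspecting the ConfigMap:
kubectl -n kube-system get configmap coredns-custom -o yaml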
If you subsequently update the coredns service when setting up either of the Netcool Connectors as mentioned in this blog, you should add your Netcool server details to the above configuration and reapply it. An example cumulative file might look like the following:
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  default.server: |
    cp-console-aiops.aiops-linux-loadbalancer.ibmcloud.com {
        hosts {
            150.100.200.50 cp-console-aiops.aiops-linux-loadbalancer.ibmcloud.com
            fallthrough
        }
    }
    netcool-server.ibmcloud.com {
        hosts {
            150.100.200.62 netcool-server.ibmcloud.com
            fallthrough
        }
    }
    netcool-server {
        hosts {
            150.100.200.62 netcool-server
            fallthrough
        }
    }
EOF
kubectl -n kube-system rollout restart deployment coredns
SUMMARY
The new Linux-based deployment option for AIOps provides a quick way to get up and running with AIOps within a couple of hours.
After completing the deployment, run aiopsctl status from the primary control plane node and you should see something like the following:
[root@aiops-linux-control-1 ~]# aiopsctl status
o- [05 Nov 24 09:24 CST] Getting cluster status
Control Plane Node(s):
aiops-linux-control-1.ibmcloud.com Ready
aiops-linux-control-2.ibmcloud.com Ready
aiops-linux-control-3.ibmcloud.com Ready
Worker Node(s):
aiops-linux-worker-1.ibmcloud.com Ready
aiops-linux-worker-2.ibmcloud.com Ready
aiops-linux-worker-3.ibmcloud.com Ready
aiops-linux-worker-4.ibmcloud.com Ready
aiops-linux-worker-5.ibmcloud.com Ready
aiops-linux-worker-6.ibmcloud.com Ready
aiops-linux-worker-7.ibmcloud.com Ready
aiops-linux-worker-8.ibmcloud.com Ready
o- [05 Nov 24 09:24 CST] Checking AIOps installation status
15 Ready Components
aiopsanalyticsorchestrator
asm
baseui
rediscp
cluster
kafka
zenservice
aimanager
issueresolutioncore
elasticsearchcluster
lifecycletrigger
aiopsedge
aiopsui
commonservice
lifecycleservice
AIOps installation healthy
[root@aiops-linux-control-1 ~]#
Run the following to obtain the login page URL and login credentials:
[root@aiops-linux-control-1 ~]# aiopsctl server info --show-secrets
Cluster Access Details
URL: aiops-cpd.aiops-linux-loadbalancer.ibmcloud.com
Username: cpadmin
Password: rFUU53vr2b0u7KgesyzfJArlowIsAGoodBoy
[root@aiops-linux-control-1 ~]#
Congratulations! You're now ready to start working with AIOps.
Next steps:
- Install Netcool/OMNIbus and Netcool/Impact onto your Netcool VM in under an hour using the steps in this blog.
- Set up the AIOps Netcool Connectors using the steps in this blog.