AIOps


Cloud Pak for AIOps 4 tips: install Linux-based AIOps in under 2 hours

By Zane Bray posted 15 days ago

  

The new Linux VM-based deployment option for IBM Cloud Pak for AIOps (AIOps) makes deploying an AIOps system for evaluation purposes quick and easy. This blog outlines the high-level steps required to do so, with links to the official product documentation.

The example scenario in this blog covers deploying AIOps in a Proof-of-Concept (POC) environment.

PROVISION YOUR ENVIRONMENT

The Linux VM-based AIOps installer works on three operating system flavours. Choose one of the following:

  • Red Hat® Enterprise Linux® 8.10
  • Red Hat® Enterprise Linux® 9.4
  • Ubuntu 24.04 LTS

The minimum hardware requirements are:

  • 9 nodes (3 control plane nodes and 6 worker nodes)
  • 148 CPUs
  • 358 GB RAM
  • 120 GB disk per node

Note that these quoted numbers are minimums, so uplifting them by some margin is recommended. In addition, the documentation specifies an additional 1 CPU and 3 GB RAM per integration you deploy.
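For example, a deployment with 10 integrations would need at least 148 + (10 x 1) = 158 CPUs and 358 + (10 x 3) = 388 GB RAM in total.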

You will also need a load-balancer to deploy AIOps. If you don't have access to one, simply provision an additional VM and run haproxy instead.

If you want to deploy any Netcool components, also provision a VM to run them.

SCENARIO

Barbara consults the documentation and comes up with the following VM profile for the proposed AIOps deployment. This includes a load-balancer host to run haproxy, and a Netcool VM to run Netcool/OMNIbus and Netcool/Impact:

  • Red Hat® Enterprise Linux® 9.4
  • 1 x load-balancer host: 4 CPU, 16 GB RAM, 300 GB disk
  • 3 x control plane nodes: 16 CPU, 32 GB RAM, 300 GB disk + (2 x 300 GB disks)
  • 8 x worker nodes: 16 CPU, 64 GB RAM, 300 GB disk + (1 x 300 GB disk)
  • 1 x Netcool host: 16 CPU, 32 GB RAM, 300 GB disk

She opts to increase the number of worker nodes from the minimum of 6 to 8. This allows for node failure scenarios and provides additional resources to accommodate upgrades.

--

NOTE: During upgrades, old pods are only terminated once the new pods have started and are in a Ready state. More than the minimum resources are therefore needed to accommodate upgrades. A suggested uplift is 20%, or one to two additional worker nodes. It also pays to double the hardware resources on the control plane nodes so that the control plane processes can continue in the event of one of the three failing.

Documentation reference: https://www.ibm.com/docs/en/cloud-paks/cloud-pak-aiops/latest?topic=linux-planning#platforms

PREPARE YOUR HOSTS

The next step is to update the set of VMs so that they are at the latest patch levels. Run the following on all your VMs to update and reboot them (the examples in this blog use yum for RHEL, per the scenario; on Ubuntu, use the equivalent apt commands):

yum update -y
shutdown -r now

Install the following package on the load-balancer host:

yum install -y haproxy

Install the following package on the control plane and worker nodes:

yum install -y lvm2

The control plane nodes are provisioned with two additional disks and the worker nodes with one additional disk. These must be prepared for use before attempting any installation. Follow the instructions at the link below to set these volumes up.
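As a rough sketch, preparing an extra disk with LVM typically follows the pattern below. The device name (/dev/sdb), volume group name, and logical volume name here are hypothetical placeholders; use the exact names and sizes specified in the documentation link below.

# Hypothetical example: register the extra disk as a physical volume,
# create a volume group on it, then carve out a logical volume.
pvcreate /dev/sdb
vgcreate data-vg /dev/sdb
lvcreate -l 100%FREE -n data-lv data-vg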

Documentation reference: https://www.ibm.com/docs/en/cloud-paks/cloud-pak-aiops/latest?topic=requirements-configuring-local-volumes

UPDATE YOUR LOCAL HOSTS FILE

If your set of VMs is not registered in DNS, update the /etc/hosts file on all of your hosts to include the IP addresses of the load-balancer VM and the control plane and worker node VMs. This will help ensure there are no name resolution issues later on.

150.100.200.50 aiops-linux-loadbalancer.ibmcloud.com
150.100.200.51 aiops-linux-control-1.ibmcloud.com
150.100.200.52 aiops-linux-control-2.ibmcloud.com
150.100.200.53 aiops-linux-control-3.ibmcloud.com
150.100.200.54 aiops-linux-worker-1.ibmcloud.com
150.100.200.55 aiops-linux-worker-2.ibmcloud.com
150.100.200.56 aiops-linux-worker-3.ibmcloud.com
150.100.200.57 aiops-linux-worker-4.ibmcloud.com
150.100.200.58 aiops-linux-worker-5.ibmcloud.com
150.100.200.59 aiops-linux-worker-6.ibmcloud.com
150.100.200.60 aiops-linux-worker-7.ibmcloud.com
150.100.200.61 aiops-linux-worker-8.ibmcloud.com

SET UP AN SSH KEY ON PRIMARY CONTROL PLANE

The installation of AIOps works through the primary control plane node. As such, you need to create an SSH key on the primary control plane node, and then add it to the /root/.ssh/authorized_keys file on each of the other nodes. Run the following command on your primary control plane node to create an SSH key:

cd ~/.ssh
ssh-keygen -o

You should see a new file created called id_rsa.pub. Copy the contents of this file to your clipboard.

COPY SSH KEY TO OTHER NODES AND TEST

Next, SSH as the root user to each of the other control plane nodes as well as each of the worker nodes and append this copied key to the end of the /root/.ssh/authorized_keys file on each one. After you have done so, test that you can SSH to each control plane and worker node from the primary control plane node without having to enter a password.
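If the ssh-copy-id utility is available on the primary control plane node, it can automate the append step; for example, for the second control plane node (repeat for each of the other nodes):

ssh-copy-id root@aiops-linux-control-2.ibmcloud.com
ssh root@aiops-linux-control-2.ibmcloud.com hostname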

NOTE: it is important to connect at least once to each control plane and worker node from the primary control plane node, and answer "yes" when prompted. If you miss out this step, the AIOps deployment will likely fail later, as the installer will not be able to seamlessly connect to each node from the primary; it will be waiting on an answer to this question behind the scenes.

The following is an example first connection where the user is prompted to add the target host to the list of known hosts:

[root@aiops-linux-control-1 ~]# ssh root@aiops-linux-control-2.ibmcloud.com
The authenticity of host 'aiops-linux-control-2.ibmcloud.com (::1)' can't be established.
ED25519 key fingerprint is SHA256:bOcJ27pUHfc/BVomiCixBsJK2ys0Z+LLBm3CdRMUy5o.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'aiops-linux-control-2.ibmcloud.com' (ED25519) to the list of known hosts.
[root@aiops-linux-control-2 ~]# 

Subsequent connections will not require an answer (which is what you want):

[root@aiops-linux-control-1 ~]# ssh root@aiops-linux-control-2.ibmcloud.com
Last login: Tue Nov  5 09:36:40 2024 from ::1
[root@aiops-linux-control-2 ~]# 
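To run this first-connection test against every node in one pass, a small loop such as the following can be used (the accept-new option requires OpenSSH 7.6 or later, which ships with all three supported operating systems):

for h in aiops-linux-control-2 aiops-linux-control-3 aiops-linux-worker-{1..8}; do
    ssh -o StrictHostKeyChecking=accept-new root@${h}.ibmcloud.com hostname
done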

SET UP AN SSH KEY ON YOUR DEPLOYMENT MACHINE

This step involves creating an SSH key on the machine from which you will run the deployment. This might be your laptop, for example. Follow the same steps outlined above to append your local id_rsa.pub key contents to the /root/.ssh/authorized_keys file on the primary control plane node. You should then be able to SSH to the primary control plane node from your laptop without entering a password. This is important since the installation steps you'll run later to deploy AIOps are of the following format:

ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} ssh ${TARGET_USER}@${CP_NODE} curl -LO "${AIOPSCTL_INSTALL_URL}"

Here, the SSH command connects first to the control plane node, then to the target node from there. Hence you need to be able to seamlessly SSH from your deployment location to the primary control plane node, then from there to all the other AIOps cluster nodes.

As before, test out your SSH connection to the primary control plane node and answer "yes" when prompted.
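For example, the following should complete without any password or host-key prompts once both hops are set up (hostnames as per the scenario above):

ssh root@aiops-linux-control-1.ibmcloud.com ssh root@aiops-linux-worker-1.ibmcloud.com hostname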

SET UP YOUR LOAD-BALANCER

For resiliency, AIOps is designed to run using a load-balancer to proxy connections to its control plane nodes. If you are setting up AIOps in an evaluation environment where there is no load-balancer available, you can simply provision an extra VM and run haproxy on it. In our scenario, we already have a load-balancer host provisioned, with haproxy already installed. Next we need to set up the configuration file, then start up the service.

The following is a sample haproxy.cfg file you can use. The only values you need to modify are the three control plane IP addresses, which appear in each of the three backend sections. Use the rest of the file as-is: you do not need to change the hostnames or any other details in the file.

Sample haproxy.cfg file:

global
	log         127.0.0.1 local2
	chroot      /var/lib/haproxy
	pidfile     /var/run/haproxy.pid
	maxconn     4000
	user        haproxy
	group       haproxy
	daemon
	stats socket /var/lib/haproxy/stats

defaults
	mode                    http
	log                     global
	option                  httplog
	option                  dontlognull
	option http-server-close
	option forwardfor       except 127.0.0.0/8
	option                  redispatch
	retries                 3
	timeout http-request    10s
	timeout queue           1m
	timeout connect         10s
	timeout client          1m
	timeout server          1m
	timeout http-keep-alive 10s
	timeout check           10s
	maxconn                 3000

frontend aiops-frontend-plaintext
    bind *:80
    mode tcp
    option tcplog
    default_backend aiops-backend-plaintext

frontend aiops-frontend
    bind *:443
    mode tcp
    option tcplog
    default_backend aiops-backend

frontend k3s-frontend
    bind *:6443
    mode tcp
    option tcplog
    default_backend k3s-backend

backend aiops-backend
    mode tcp
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s
    server server0 150.100.200.51:443 check
    server server1 150.100.200.52:443 check
    server server2 150.100.200.53:443 check

backend k3s-backend
    mode tcp
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s
    server server0 150.100.200.51:6443 check
    server server1 150.100.200.52:6443 check
    server server2 150.100.200.53:6443 check

backend aiops-backend-plaintext
    mode tcp
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s
    server server0 150.100.200.51:80 check
    server server1 150.100.200.52:80 check
    server server2 150.100.200.53:80 check
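If you saved the sample above as /tmp/haproxy.cfg, a quick way to swap in your own control plane addresses is a sed pass (the replacement addresses below are hypothetical examples):

sed -i -e 's/150.100.200.51/10.0.0.11/g' \
       -e 's/150.100.200.52/10.0.0.12/g' \
       -e 's/150.100.200.53/10.0.0.13/g' /tmp/haproxy.cfg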

Copy your haproxy.cfg file to your load-balancer box. Then, after backing up the original configuration file, copy your replacement file into place. Finally, enable and start the service:

cd /etc/haproxy/
mv haproxy.cfg haproxy.cfg.orig
cp /tmp/haproxy.cfg .
systemctl enable haproxy
systemctl start haproxy
systemctl status haproxy

NOTE: Don't worry if you see some errors in the status output at this point, since the AIOps endpoints don't exist yet.
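You can also verify the configuration file syntax with haproxy's built-in check mode. On RHEL with SELinux enforcing, haproxy may additionally need permission to bind non-standard ports such as 6443; the setsebool line below is a commonly needed tweak, only required if the status output shows bind errors:

haproxy -c -f /etc/haproxy/haproxy.cfg
setsebool -P haproxy_connect_any 1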

FOLLOW THE AIOPS INSTALLATION STEPS

Your VM environment is now ready to start the AIOps deployment steps. The deployment itself should take around an hour to complete.

Documentation reference: https://www.ibm.com/docs/en/cloud-paks/cloud-pak-aiops/latest?topic=linux-online-installation

ADDITIONAL NOTES 

At the time of writing, there is a known issue whereby the AIOps deployment may get stuck on the deployment of the aimanager component. This is because an internal process tries to connect to the AIOps console via its FQDN instead of its internal service name. If the hosts are not resolvable via DNS, it gets stuck in a retry loop and the deployment eventually fails.

To resolve this, simply update the aiopsctl coredns service before starting step 5 ("Install IBM Cloud Pak for AIOps") of the documented installation procedure.

To do this, SSH to your primary control plane node after you have installed aiopsctl (steps 1 to 4) and run the following using the IP address and FQDN of your load-balancer:

kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  default.server: |
    cp-console-aiops.aiops-linux-loadbalancer.ibmcloud.com {
        hosts {
              150.100.200.50 cp-console-aiops.aiops-linux-loadbalancer.ibmcloud.com
              fallthrough
        }
    }
EOF
kubectl -n kube-system rollout restart deployment coredns

NOTE: Keep a copy of the above in case you need to modify or add to your DNS settings. Each time you apply the above, it will replace the existing configuration.
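To see what is currently applied before overwriting it, dump the existing ConfigMap:

kubectl -n kube-system get configmap coredns-custom -o yaml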

If you subsequently update the coredns service when setting up either of the Netcool Connectors as mentioned in this blog, you should add your Netcool server details to the above configuration and reapply it. An example cumulative file might look like the following:

kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  default.server: |
    cp-console-aiops.aiops-linux-loadbalancer.ibmcloud.com {
        hosts {
              150.100.200.50 cp-console-aiops.aiops-linux-loadbalancer.ibmcloud.com
              fallthrough
        }
    }
    netcool-server.ibmcloud.com {
        hosts {
              150.100.200.62 netcool-server.ibmcloud.com
              fallthrough
        }
    }
    netcool-server {
        hosts {
              150.100.200.62 netcool-server
              fallthrough
        }
    }
EOF
kubectl -n kube-system rollout restart deployment coredns

SUMMARY

The new Linux-based deployment option for AIOps provides a quick way to get up and running with AIOps within a couple of hours.

After completing the deployment, run aiopsctl status from the primary control plane node and you should see something like the following:

[root@aiops-linux-control-1 ~]# aiopsctl status
o- [05 Nov 24 09:24 CST] Getting cluster status
Control Plane Node(s):
    aiops-linux-control-1.ibmcloud.com Ready
    aiops-linux-control-2.ibmcloud.com Ready
    aiops-linux-control-3.ibmcloud.com Ready

Worker Node(s):
    aiops-linux-worker-1.ibmcloud.com Ready
    aiops-linux-worker-2.ibmcloud.com Ready
    aiops-linux-worker-3.ibmcloud.com Ready
    aiops-linux-worker-4.ibmcloud.com Ready
    aiops-linux-worker-5.ibmcloud.com Ready
    aiops-linux-worker-6.ibmcloud.com Ready
    aiops-linux-worker-7.ibmcloud.com Ready
    aiops-linux-worker-8.ibmcloud.com Ready

o- [05 Nov 24 09:24 CST] Checking AIOps installation status

  15 Ready Components
    aiopsanalyticsorchestrator
    asm
    baseui
    rediscp
    cluster
    kafka
    zenservice
    aimanager
    issueresolutioncore
    elasticsearchcluster
    lifecycletrigger
    aiopsedge
    aiopsui
    commonservice
    lifecycleservice

  AIOps installation healthy
[root@aiops-linux-control-1 ~]# 

Run the following to obtain the login page URL and login credentials:

[root@aiops-linux-control-1 ~]# aiopsctl server info --show-secrets
Cluster Access Details
URL:      aiops-cpd.aiops-linux-loadbalancer.ibmcloud.com
Username: cpadmin
Password: rFUU53vr2b0u7KgesyzfJArlowIsAGoodBoy
[root@aiops-linux-control-1 ~]#

Congratulations! You're now ready to start working with AIOps.

Next steps:

  • Install Netcool/OMNIbus and Netcool/Impact onto your Netcool VM in under an hour using the steps in this blog.
  • Set up the AIOps Netcool Connectors using the steps in this blog.