Setting up High Availability (HA) for IBM Db2 Data Management Console (DMC) using Pacemaker is a crucial step in ensuring continuous system operation and minimizing downtime in the event of a server failure. The following outlines the steps involved in configuring a high-availability environment for Db2 DMC using Pacemaker and shared storage.
Architecture
Host       Set up role
DMC 1      Primary
DMC 2      Secondary
Node 3     VIP
Node 4     Shared storage
Environment
The following products/technologies are used:
· Red Hat Enterprise Linux (RHEL)
· Data Management Console (DMC)
· Pacemaker
· pcs (pacemaker configuration system)
· shared storage
Overview
1. High Availability Setup Overview:
· In this High Availability (HA) configuration, two nodes are dedicated to DMC servers and share a Virtual IP (VIP) that serves as a floating IP. All IPs reside within the same subnet, eliminating the need for additional port routing.
2. Installation of DMC and Pacemaker:
· DMC and Pacemaker are installed on both nodes of the HA environment.
3. Active/Passive Mode for DMC Servers:
· Both DMC servers share a common repository DB, but only one DMC server is active (online) at any given time, operating in an active/passive mode. The repository itself is hosted on a distinct independent node.
4. Virtual IP Usage for Console Access:
· A Virtual IP (VIP) is utilized as part of the URL to access the console, ensuring seamless access regardless of which DMC server is currently online.
5. Pacemaker Monitoring and Failover:
· Pacemaker performs periodic monitoring of the online DMC server status using the 'status.sh' script. Upon detecting a server outage, Pacemaker initiates an attempt to restart the DMC server using the 'restart.sh' script. If the DMC server cannot be revived within a specified timeframe, Pacemaker triggers a failover to the other node, bringing the DMC server online on that node.
6. Shared Storage for File Access:
· Shared storage facilitates the sharing of files between both DMC servers, ensuring synchronized access to the same set of files.
This setup ensures continuous availability of the DMC servers, offering redundancy and failover capabilities to maintain uninterrupted operation in the event of a server failure.
Set up hostnames
- On each node, set the hostname if the nodes do not already have distinct hostnames, for example:
hostnamectl --static --transient set-hostname prudery2.fyre.ibm.com
- On each node, edit /etc/hosts to map the hostnames of all servers, for example:
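A minimal sketch of the /etc/hosts entries, using placeholder addresses (substitute the real IP of each node):
# Hypothetical addresses; replace with the actual IPs of your cluster nodes
10.11.12.101   prudery1.fyre.ibm.com   prudery1
10.11.12.102   prudery2.fyre.ibm.com   prudery2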
Set up Pacemaker
Installation
- On each node in the cluster, install the Red Hat High Availability Add-On software packages along with all available fence agents from the High Availability channel.
yum install pcs corosync pacemaker fence-agents-all
- Set a password on each node for the user ID hacluster, which is the default pcs administration account created by the pcs install. It is recommended that the password for user hacluster be the same on both nodes.
passwd hacluster
- On each node in the cluster, execute the following commands to start the pcsd service (a pcs daemon which operates as a remote server for pcs) and to enable it at system start:
systemctl start pcsd.service
systemctl enable pcsd.service
- On the node(s) from which you will be running pcs commands, authenticate the pcs user hacluster. Enter username “hacluster” and password when prompted.
pcs host auth prudery1.fyre.ibm.com prudery2.fyre.ibm.com
- You can manage the Pacemaker HA cluster via the command line or the PCSD Web UI at https://<server-ip>:2224/login or https://<server-ip>:2224/ui
Cluster creation
- Create the two-node cluster named “my_cluster” that consists of nodes prudery1.fyre.ibm.com and prudery2.fyre.ibm.com. This will propagate the cluster configuration files to both nodes in the cluster.
pcs cluster setup --start my_cluster prudery1.fyre.ibm.com prudery2.fyre.ibm.com
- Enable the cluster services to run on each node in the cluster when the node is booted.
pcs cluster enable --all
pcs cluster status
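The steps above do not configure fencing (STONITH). With the default stonith-enabled=true and no fence devices defined, Pacemaker will not start resources, so in a lab or test setup you may choose to disable it; production clusters should configure proper fence agents instead. A sketch for a test environment only:
# Test environments only; configure real fencing for production clusters
pcs property set stonith-enabled=false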
Setting up shared storage
a. Configure iSCSI Target
· Install administration tool
dnf -y install targetcli
· Create a directory
mkdir /var/lib/iscsi_disks
· Enter the admin console
targetcli
· Create a 10G disk image named disk01 at /var/lib/iscsi_disks/disk01.img
cd backstores/fileio
create disk01 /var/lib/iscsi_disks/disk01.img 10G
· Create a target
· Naming rule: iqn.(year)-(month).(reversed domain name):(any name you like)
cd /iscsi
create iqn.2019-10.world.srv:dlp.target01
· Set up LUNs
cd iqn.2019-10.world.srv:dlp.target01/tpg1/luns
create /backstores/fileio/disk01
· Set ACLs (the IQNs of the initiators you permit to connect)
· You will need two initiators because two servers will connect
cd ../acls
create iqn.2019-10.world.srv:node01.initiator01
create iqn.2019-10.world.srv:node01.initiator02
· set UserID and Password for authentication for initiator 1
cd iqn.2019-10.world.srv:node01.initiator01
set auth userid=admin
set auth password=password
· set UserID and Password for authentication for initiator 2
cd ../iqn.2019-10.world.srv:node01.initiator02
set auth userid=username
set auth password=password
exit
Note: The userid and password can be changed, but make sure the same userid and password are used later when setting up the initiators.
After configuring targetcli, confirm that the target is listening on port 3260 and enable the target service so that the configuration persists across reboots:
ss -napt | grep 3260
systemctl enable target
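For reference, the same target configuration can also be scripted by passing each command to targetcli non-interactively. This is only a sketch of the interactive session above, reusing the same IQNs, paths, and credentials:
# Non-interactive equivalent of the interactive targetcli session above
targetcli /backstores/fileio create disk01 /var/lib/iscsi_disks/disk01.img 10G
targetcli /iscsi create iqn.2019-10.world.srv:dlp.target01
targetcli /iscsi/iqn.2019-10.world.srv:dlp.target01/tpg1/luns create /backstores/fileio/disk01
targetcli /iscsi/iqn.2019-10.world.srv:dlp.target01/tpg1/acls create iqn.2019-10.world.srv:node01.initiator01
targetcli /iscsi/iqn.2019-10.world.srv:dlp.target01/tpg1/acls create iqn.2019-10.world.srv:node01.initiator02
targetcli /iscsi/iqn.2019-10.world.srv:dlp.target01/tpg1/acls/iqn.2019-10.world.srv:node01.initiator01 set auth userid=admin password=password
targetcli /iscsi/iqn.2019-10.world.srv:dlp.target01/tpg1/acls/iqn.2019-10.world.srv:node01.initiator02 set auth userid=username password=password
targetcli saveconfig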
b. Configure iSCSI Initiator
· Configure the iSCSI initiator to connect to the iSCSI target (to be done on both nodes that host Pacemaker)
dnf -y install iscsi-initiator-utils
· Set the InitiatorName to the IQN you permitted in the ACL on the iSCSI target server
· On one node
vi /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.2019-10.world.srv:node01.initiator01
· On the other node
vi /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.2019-10.world.srv:node01.initiator02
· On both the nodes enter
vi /etc/iscsi/iscsid.conf
· Uncomment the following lines
· line number 58
node.session.auth.authmethod = CHAP
· line numbers 62 and 63. Enter the same username and password that you configured on the target
node.session.auth.username = username
node.session.auth.password = password
· Discover the target (use the IP address of the iSCSI target server)
iscsiadm -m discovery -t sendtargets -p <your_IP>
· confirm status after discovery
iscsiadm -m node -o show
· login to the target
iscsiadm -m node --login
· confirm the established session
iscsiadm -m session -o show
· confirm the partitions
cat /proc/partitions
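Note that the LVM steps below use a partition /dev/sdb1. If the new iSCSI disk shows up as a bare device (assumed here to be /dev/sdb; check the output of cat /proc/partitions), you would first create that partition, for example with this hedged sketch:
# Assumes the shared iSCSI disk appeared as /dev/sdb; verify the device name before running
parted -s /dev/sdb mklabel gpt
parted -s /dev/sdb mkpart primary 0% 100%
# The partition then appears as /dev/sdb1, which is used in the LVM steps below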
c. On one node in the cluster, set up LVM on the shared storage.
· Enter the below command
vi /etc/lvm/lvm.conf
· Set the system_id_source option (around line 1217) as below
system_id_source = "uname"
· Verify that the LVM system ID on the node matches the uname of the node
lvm systemid
uname -n
· Create the LVM volume and create an XFS file system on that volume. Since the /dev/sdb1 partition is storage that is shared, you perform this part of the procedure on one node only.
· Create an LVM physical volume on partition /dev/sdb1
pvcreate /dev/sdb1
· Create the volume group my_vg that consists of the physical volume /dev/sdb1
· For RHEL 8.5 and later,
vgcreate --setautoactivation n my_vg /dev/sdb1
· For RHEL 8.4 and earlier,
vgcreate my_vg /dev/sdb1
· Verify that the new volume group has the system ID of the node on which you are running
vgs -o+systemid
· Create a logical volume using the volume group my_vg
lvcreate -L450 -n my_lv my_vg
· You can use the lvs command to display the logical volume
lvs
· Create an XFS file system on the logical volume my_lv
mkfs.xfs /dev/my_vg/my_lv
· Determine which volume groups are currently configured on your local storage with the following command
vgs --noheadings -o vg_name
· Add the volume groups other than my_vg (the volume group you have just defined for the cluster) as entries to auto_activation_volume_list in the /etc/lvm/lvm.conf configuration file
vi /etc/lvm/lvm.conf
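For example, if the node also has a local volume group for its root filesystem (a hypothetical name rhel is used here), the entry might look as follows; my_vg is deliberately left out so that only the cluster activates it:
auto_activation_volume_list = [ "rhel" ]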
· Rebuild the initramfs boot image to guarantee that the boot image will not try to activate a volume group controlled by the cluster
dracut -H -f /boot/initramfs-$(uname -r).img $(uname -r)
· Reboot the node
· Create a folder named /share
mkdir /share
· Mount the logical volume at /share
mount /dev/my_vg/my_lv /share
· Set up the shared storage as cluster resources
pcs resource create My_VG ocf:heartbeat:LVM-activate vgname=my_vg activation_mode=exclusive vg_access_mode=system_id --group HA-LVM
pcs resource create My_FS Filesystem device="/dev/my_vg/my_lv" directory="/share" fstype="xfs" --group HA-LVM
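To verify that the storage group came up, you can check the cluster status and the mount on the active node, for example:
pcs status         # My_VG and My_FS in group HA-LVM should be Started on one node
findmnt /share     # confirms the XFS volume is mounted at /share on that node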
Configuring DMC with shared storage
· Move the files and directories listed below to the shared storage folder (paths are relative to the DMC installation directory):

Type        Path
Directory   /Config
Directory   /logs
Directory   /wlp/usr/servers/dsweb/resources/security
File        /wlp/usr/servers/dsweb/bootstrap.properties
File        /wlp/usr/servers/dsweb/server.env
File        /wlp/usr/servers/dsweb/jvm.options
File        /addons/drs/drs-agent/.env
File        /addons/drs/drs-agent/config.yaml
Directory   /addons/drs/drs-agent/insightdb
Directory   /addons/drs/drs-agent/logs
File        /addons/job-scheduler/config/config.json
Directory   /addons/job-scheduler/logs
Note: You can change the copy interval of the log directories as needed in the syncup.sh script. For more information on the data types to be synced, refer to this doc.
· In the DMC installation directory, replace each moved file/folder with a symbolic link so that DMC uses the copies on shared storage
ln -s <path to the file/folder to be linked> <the path of the link to be created>
Eg: ln -s /opt/ibm/ibm-datamgmtconsole/logs /shared
Note: In this example, DMC is installed under /opt/ibm and the shared storage is mounted at /shared.
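A hedged sketch of the move-and-link step for the items in the table above, assuming DMC is installed under /opt/ibm/ibm-datamgmtconsole and the shared storage is mounted at /shared (adjust the paths, and the link direction if your setup differs):
# Assumed paths; adjust to your installation and mount point
DMC_HOME=/opt/ibm/ibm-datamgmtconsole
SHARE=/shared
for item in Config logs wlp/usr/servers/dsweb/resources/security \
            wlp/usr/servers/dsweb/bootstrap.properties   # ...repeat for the rest of the table
do
    mkdir -p "$SHARE/$(dirname "$item")"     # keep the same relative layout on shared storage
    mv "$DMC_HOME/$item" "$SHARE/$item"      # move the original to shared storage
    ln -s "$SHARE/$item" "$DMC_HOME/$item"   # leave a symbolic link behind in the DMC tree
done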
· Adding Pacemaker resources
· Create a file named dmc to be used as the resource agent that manages high availability and failover for the DMC server; it implements the start, stop, and monitor actions. The dmc script is provided in the Provided scripts section below.
· On both nodes, place the file in /usr/lib/ocf/resource.d/heartbeat/
· Change file permission
chmod +x dmc
- Create the dmc resource and specify values for the required parameters pinstallpath, sinstallpath, and datasyncpath. These are the paths where the DMC server is installed on the primary node, where it is installed on the secondary node, and where the sync scripts are placed, respectively. Change these paths based on your environment.
pcs resource create dmc ocf:heartbeat:dmc pinstallpath="<dmc_installed_path>/ibm-datamgmtconsole" sinstallpath="<dmc_installed_path>/ibm-datamgmtconsole" datasyncpath="<synch_scripts_path>"
· Add the VIP resource for Pacemaker
pcs resource create vip ocf:heartbeat:IPaddr2 ip=<VIP_node>
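The steps above do not show colocation or ordering rules. If you want the VIP, the shared filesystem, and the DMC server to always run on the same node and start in order, one possible sketch (using the resource names created above) is:
pcs constraint colocation add vip with dmc INFINITY       # keep the VIP on the node running DMC
pcs constraint colocation add dmc with My_FS INFINITY     # DMC needs the shared filesystem locally
pcs constraint order start My_FS then start dmc           # mount shared storage before starting DMC
pcs constraint order start dmc then start vip             # bring up the VIP after DMC is online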
Testing the HA setup
During the test, you can use either the command line or the PCSD GUI where applicable. Make sure that DMC is running on the primary node and has been stopped on the standby node.
· Auto recover/restart DMC on active node
· Manually stop DMC
· On the online node, run the stop.sh script to stop the server.
· Run the status.sh script a few times to verify that the server is stopped and then automatically restarted within a couple of minutes.
· Kill DMC process
· Look for DMC process:
ps ax | grep -v "grep" | grep java
kill -9 <process ID>
· Run the status.sh script a few times to check that the server is stopped and then automatically restarted within a couple of minutes.
· Control failover to passive node
· On both nodes, verify which one is online and offline by running:
pcs status
status.sh
· Manually put the online node into standby:
pcs node standby <online_node_url>
· On both nodes, verify that the DMC server is switched from one node to the other, e.g., DMC is stopped on the online node and started on the standby node.
· Run the following command to return the primary node to normal before continuing to the next test.
pcs node unstandby prudery1.fyre.ibm.com
· Auto failover to passive node
· By stopping the currently active DMC node.
· Run this command on the active node
pcs cluster stop <online_node_URL>
· On both nodes, check the cluster and DMC server status to verify that the DMC server fails over to the standby node
· Restart the stopped node to return the cluster to normal before continuing to the next test.
pcs cluster start <offline_node_URL>
· By rebooting the active node.
· Run this command on the active node:
reboot
· While the active node is rebooting, the DMC server fails over to the other (standby) node.
· Further tests
· Set up events or tasks such as alerts, jobs, and blackouts on the primary server and verify that these continue to work after the failover.
How to upgrade DMC
1. Put both nodes in maintenance mode (see the sketch after this list)
2. Download the DMC package into the same folder on DMC 1
3. Unzip the file on that node (Note: make sure that the DMC shared storage is connected to this node)
4. Run the setup.sh file
5. Stop DMC on that node
6. Move the shared storage to the other node
7. Download the DMC 3.1.11 package and unzip it in the same folder
8. Run the setup.sh file
9. Take Pacemaker out of maintenance mode
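A hedged sketch of the maintenance-mode step, assuming you toggle it for the whole cluster (pcs also supports per-node maintenance via pcs node maintenance / pcs node unmaintenance):
# Before the upgrade: stop Pacemaker from restarting or failing over resources
pcs property set maintenance-mode=true
# ...perform the upgrade steps above on both nodes...
# After the upgrade: let Pacemaker manage the resources again
pcs property set maintenance-mode=false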
Provided scripts
#!/bin/sh
#
# dmc
# Resource agent that manages high availability/failover of
# IBM Db2 Data Management Console
#
# Copyright (c) 2004 SUSE LINUX AG, Lars Marowsky-Bree
# All Rights Reserved.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of version 2 of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# Further, this software is distributed without any warranty that it is
# free of the rightful claim of any third person regarding infringement
# or the like. Any license provided herein, whether implied or
# otherwise, applies only to this software file. Patent licenses, if
# any, provided herein do not apply to combinations of this program with
# other software, or any other product whatsoever.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
#
#######################################################################
# Initialization:
: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
#######################################################################
PINSTALLPATH=""
SINSTALLPATH=""
#DATASYNCPATH=""
USER=""
meta_data() {
cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="dmc">
<version>1.0</version>
<longdesc lang="en">
Resource agent for Db2 Data Management Console to handle high availability and failover.
</longdesc>
<shortdesc lang="en">Resource agent for Db2 Data Management Console</shortdesc>
<parameters>
<parameter name="pinstallpath" unique="1" required="1">
<longdesc lang="en">
Install path of DMC console on Primary
</longdesc>
<shortdesc lang="en">Primary DMC Install path</shortdesc>
<content type="string" default="" />
</parameter>
<parameter name="sinstallpath" unique="1" required="1">
<longdesc lang="en">
Install path of DMC console on Standby
</longdesc>
<shortdesc lang="en">Standby DMC Install path</shortdesc>
<content type="string" default="" />
</parameter>
<parameter name="user" unique="1" required="1">
<longdesc lang="en">
User that installed DMC and which will be used to start/stop/monitor console status
</longdesc>
<shortdesc lang="en">User</shortdesc>
<content type="string" default="" />
</parameter>
</parameters>
<actions>
<action name="start" timeout="300s" />
<action name="stop" timeout="180s" />
<action name="monitor" timeout="20s" interval="5s" depth="0" />
<action name="meta-data" timeout="5s" />
<action name="validate-all" timeout="20s" />
</actions>
</resource-agent>
END
}
#######################################################################
dmc_usage() {
cat <<END
usage: $0 {start|stop|monitor|validate-all|meta-data}
action:
start start DMC server
stop stop DMC server
monitor return status of DMC server
meta-data show meta data info
validate-all validate the instance parameters
END
}
dmc_start() {
ocf_log info "$(date): starting dmc server ..."
${PINSTALLPATH}/bin/startup.sh --clean >/dev/null 2>&1
ocf_log info "$(date): dmc - server started"
# chmod 644 ${PINSTALLPATH}/logs/*
# chmod 644 ${PINSTALLPATH}/addons/job-scheduler/logs/*
# chmod 644 ${PINSTALLPATH}/addons/drs/drs-agent/logs/*
# su - ${USER} -c "${DATASYNCPATH}/datasync start ${PINSTALLPATH} ${SINSTALLPATH} ${USER} >/dev/null 2>&1"
# ocf_log info "$(date): dmc - datasync started"
return $OCF_SUCCESS
}
dmc_stop() {
ocf_log info "$(date): stopping dmc server ..."
${PINSTALLPATH}/bin/stop.sh >/dev/null 2>&1
ocf_log info "$(date): dmc - server stopped"
# su - ${USER} -c "${DATASYNCPATH}/datasync stop ${PINSTALLPATH} ${SINSTALLPATH} ${USER} >/dev/null 2>&1"
# ocf_log info "$(date): dmc - datasync stopped"
return $OCF_SUCCESS
}
dmc_monitor() {
process=`ps -ef | grep java`
status=`${PINSTALLPATH}/bin/status.sh`
ocf_log info "process ****** $(date): ${process} "
ocf_log info "status ****** $(date): ${status} "
# active=`${PINSTALLPATH}/bin/status.sh | grep "no" | wc -l `
${PINSTALLPATH}/bin/status.sh
if [ $? -eq 1 ]
then
ocf_log info "$(date): dmc server is not running. Return code $OCF_NOT_RUNNING."
return $OCF_NOT_RUNNING
else
ocf_log info "$(date): dmc server is running. Return code $OCF_SUCCESS."
return $OCF_SUCCESS
fi
}
dmc_validate() {
check_parm_err=0
# check required dmc installpath parameter
if [ -z "$OCF_RESKEY_pinstallpath" ]
then
ocf_log err "Primary DMC required parameter pinstallpath is not set!"
#return $OCF_ERR_CONFIGURED
check_parm_err=1
fi
# check required dmc installpath parameter
if [ -z "$OCF_RESKEY_sinstallpath" ]
then
ocf_log err "Secondary DMC required parameter sinstallpath is not set!"
#return $OCF_ERR_CONFIGURED
check_parm_err=1
fi
# # check required data sync script path parameter
# if [ -z "$OCF_RESKEY_datasyncpath" ]
# then
# ocf_log err "Required parameter datasyncpath is not set!"
# #return $OCF_ERR_CONFIGURED
# check_parm_err=1
# fi
# check required parameter user
if [ -z "$OCF_RESKEY_user" ]
then
ocf_log err "Required parameter user is not set!"
#return $OCF_ERR_CONFIGURED
check_parm_err=1
fi
if [ $check_parm_err -eq 1 ]
then
# $$$ Temp - Set paths for testing by calling script directly
PINSTALLPATH="/data/Build/ibm-datamgmtconsole"
SINSTALLPATH="/data/Build/ibm-datamgmtconsole"
# DATASYNCPATH="/data/syncup"
# USER="root"
return $OCF_ERR_CONFIGURED
fi
PINSTALLPATH="$OCF_RESKEY_pinstallpath"
SINSTALLPATH="$OCF_RESKEY_sinstallpath"
# DATASYNCPATH="$OCF_RESKEY_datasyncpath"
USER="$OCF_RESKEY_user"
return $OCF_SUCCESS
}
case $__OCF_ACTION in
meta-data) meta_data
exit $OCF_SUCCESS
;;
start) dmc_validate
dmc_start
;;
stop) dmc_validate
dmc_stop
;;
monitor) dmc_validate
dmc_monitor
;;
validate|validate-all) dmc_validate
;;
esac
rc=$?
ocf_log debug "${OCF_RESOURCE_INSTANCE} $__OCF_ACTION : $rc"
exit $rc