Setting up High Availability for IBM Data Management Console with Pacemaker using Shared Storage

By Narayanan V Eswar posted Mon November 27, 2023 02:56 AM

  

Setting up High Availability (HA) for IBM Db2 Data Management Console (DMC) with Pacemaker is a crucial step in ensuring continuous system operation and minimizing downtime in the event of a server failure. The following outlines the steps involved in configuring a high-availability environment for Db2 DMC with Pacemaker and shared storage.

Architecture

Host    | Set up role
--------|----------------
DMC 1   | Primary
DMC 2   | Secondary
Node 3  | VIP
Node 4  | Shared storage

Environment

The following products/technologies are used:

· Red Hat Enterprise Linux (RHEL)

· Data Management Console (DMC)

· Pacemaker

· pcs (Pacemaker configuration system)

· Shared storage

Overview

1.     High Availability Setup Overview:

·      In this High Availability (HA) configuration, two nodes are dedicated to DMC servers, with a single Virtual IP (VIP) serving as a floating IP between them. All IPs reside within the same subnet, eliminating the need for additional port routing.

2.     Installation of DMC and Pacemaker:

·      DMC and Pacemaker are installed on both nodes of the HA environment.

3.     Active/Passive Mode for DMC Servers:

·      Both DMC servers share a common repository DB, but only one DMC server is active (online) at any given time, operating in an active/passive mode. The repository itself is hosted on a distinct independent node.

4.     Virtual IP Usage for Console Access:

·      A Virtual IP (VIP) is utilized as part of the URL to access the console, ensuring seamless access regardless of which DMC server is currently online.

5.     Pacemaker Monitoring and Failover:

·      Pacemaker performs periodic monitoring of the online DMC server status using the 'status.sh' script. Upon detecting a server outage, Pacemaker initiates an attempt to restart the DMC server using the 'restart.sh' script. If the DMC server cannot be revived within a specified timeframe, Pacemaker triggers a failover to the other node, bringing the DMC server online on that node.

6.     Shared Storage for File Access:

·      Shared storage facilitates the sharing of files between both DMC servers, ensuring synchronized access to the same set of files.

This setup ensures continuous availability of the DMC servers, offering redundancy and failover capabilities to maintain uninterrupted operation in the event of a server failure.

Setup hostnames

  • If the nodes do not already have suitable hostnames, set the hostname on each node, for example:

hostnamectl --static --transient set-hostname prudery2.fyre.ibm.com

  • On each node, edit /etc/hosts to map the hostnames of all servers, for example:
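A minimal sketch, assuming the two cluster nodes are prudery1.fyre.ibm.com and prudery2.fyre.ibm.com (replace the placeholder addresses with the real IP addresses of your nodes):

<node1_IP>   prudery1.fyre.ibm.com   prudery1

<node2_IP>   prudery2.fyre.ibm.com   prudery2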

 

Set up Pacemaker

Installation

  • On each node in the cluster, install the Red Hat High Availability Add-On software packages along with all available fence agents from the High Availability channel.

 yum install pcs corosync pacemaker fence-agents-all

  • Set a password on each node for the user ID hacluster, which is the default pcs administration account created by the pcs install. It is recommended that the password for user hacluster be the same on both nodes.

passwd hacluster

  • On each node in the cluster, execute the following commands to start the pcsd service (a pcs daemon which operates as a remote server for pcs) and to enable it at system start:

systemctl start pcsd.service

systemctl enable pcsd.service
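If firewalld is active on the nodes, the cluster components also need their ports opened. A hedged sketch using the predefined high-availability firewalld service (skip this if you manage the firewall differently):

firewall-cmd --permanent --add-service=high-availability

firewall-cmd --reload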

  • On the node(s) from which you will be running pcs commands, authenticate the pcs user hacluster.  Enter username “hacluster” and password when prompted.

pcs host auth prudery1.fyre.ibm.com prudery2.fyre.ibm.com

  • You can manage the Pacemaker HA cluster via the command line or the PCSD Web UI at https://<server-ip>:2224/login (or https://<server-ip>:2224/ui).

Cluster creation

  • Create the two-node cluster named “my_cluster” that consists of nodes prudery1.fyre.ibm.com and prudery2.fyre.ibm.com. This will propagate the cluster configuration files to both nodes in the cluster.

pcs cluster setup --start my_cluster prudery1.fyre.ibm.com prudery2.fyre.ibm.com

  • Enable the cluster services to run on each node in the cluster when the node is booted.

pcs cluster enable all

pcs cluster status
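Note: on a test cluster with no fence devices configured, Pacemaker will not start resources while STONITH is enabled. A hedged sketch for disabling it in a lab environment only (production clusters should configure proper fencing instead):

pcs property set stonith-enabled=false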

Setting up shared storage

a. Configure iSCSI Target

·     Install administration tool

dnf -y install targetcli

·     Create a directory

mkdir /var/lib/iscsi_disks

·     Enter the admin console

targetcli

·     Create a 10 GB disk image named disk01, backed by the file /var/lib/iscsi_disks/disk01.img

cd backstores/fileio

create disk01 /var/lib/iscsi_disks/disk01.img 10G

·     Create a target

·     Naming rule: iqn.(year)-(month).(reversed domain name):(any name you like)

cd /iscsi

create iqn.2019-10.world.srv:dlp.target01

·     Set up the LUNs

cd iqn.2019-10.world.srv:dlp.target01/tpg1/luns

create /backstores/fileio/disk01

·     Set the ACLs (the IQNs of the initiators you permit to connect)

·     You will need two initiator IQNs because two servers will connect to this target

cd ../acls

create iqn.2019-10.world.srv:node01.initiator01

create iqn.2019-10.world.srv:node01.initiator02

·     set UserID and Password for authentication for initiator 1

cd iqn.2019-10.world.srv:node01.initiator01

set auth userid=admin

set auth password=password

·     set UserID and Password for authentication for initiator 2

cd ../iqn.2019-10.world.srv:node01.initiator02

set auth userid=username

set auth password=password

exit

Note: You can choose any user ID and password here. Make sure the same credentials are used later when setting up the initiators.

After configuring targetcli, verify that the target is listening on port 3260 and enable the target service so the configuration is restored at boot:

ss -napt | grep 3260

systemctl enable target
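To double-check the resulting configuration (backstore, target, LUN, and ACLs) and to open the iSCSI port if firewalld is active on the target node, a quick sketch:

targetcli ls

firewall-cmd --permanent --add-port=3260/tcp

firewall-cmd --reload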

b. Configure iSCSI Initiator

·     Configure the iSCSI initiator to connect to the iSCSI target (to be done on both nodes that host Pacemaker)

dnf -y install iscsi-initiator-utils

·     Set each node's initiator name to one of the IQNs you added to the target's ACL

·     On one node

vi /etc/iscsi/initiatorname.iscsi

InitiatorName=iqn.2019-10.world.srv:node01.initiator01

·     On the other node

vi /etc/iscsi/initiatorname.iscsi

InitiatorName=iqn.2019-10.world.srv:node01.initiator02

·     On both the nodes enter

vi /etc/iscsi/iscsid.conf

·     Uncomment the following lines

·     Around line 58:

node.session.auth.authmethod = CHAP

·     Around lines 62-63; also enter the user name and password you configured on the target

node.session.auth.username = username

node.session.auth.password = password

·     Discover the target (use the IP address of the iSCSI target server)

iscsiadm -m discovery -t sendtargets -p <target_server_IP>

·       confirm status after discovery

iscsiadm -m node -o show

·       login to the target

iscsiadm -m node --login

·       confirm the established session

iscsiadm -m session -o show

·       confirm the partitions

cat /proc/partitions
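The LVM steps in the next section use a partition named /dev/sdb1, but the freshly attached iSCSI disk will not have a partition yet. A minimal sketch for creating one, assuming the disk appears as /dev/sdb (check cat /proc/partitions or lsblk for the actual device name on your nodes):

parted -s /dev/sdb mklabel gpt

parted -s /dev/sdb mkpart primary 0% 100%

parted -s /dev/sdb set 1 lvm on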

c. On one node in the cluster, set up LVM on the shared storage

·       Edit the LVM configuration file

vi /etc/lvm/lvm.conf

·       Set the system_id_source option (around line 1217) as follows

system_id_source = "uname"

·       Verify that the LVM system ID on the node matches the uname for the node

lvm systemid

uname -n

·       Create the LVM volume and create an XFS file system on that volume. Since the /dev/sdb1 partition is storage that is shared, you perform this part of the procedure on one node only.

·       Create an LVM physical volume on partition /dev/sdb1

pvcreate /dev/sdb1

·       Create the volume group my_vg that consists of the physical volume /dev/sdb1

·       For RHEL 8.5 and later,

vgcreate --setautoactivation n my_vg /dev/sdb1

·       For RHEL 8.4 and earlier,

vgcreate my_vg /dev/sdb1

·       Verify that the new volume group has the system ID of the node on which you are running

vgs -o+systemid

·       Create a logical volume using the volume group my_vg

lvcreate -L450 -n my_lv my_vg

·       You can use the lvs command to display the logical volume

lvs

·       Create an XFS file system on the logical volume my_lv

mkfs.xfs /dev/my_vg/my_lv

·       Determine which volume groups are currently configured on your local storage with the following command

vgs --noheadings -o vg_name

·       Add the volume groups other than my_vg (the volume group you have just defined for the cluster) as entries to auto_activation_volume_list in the /etc/lvm/lvm.conf configuration file

vi /etc/lvm/lvm.conf
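A hedged example of the entry, assuming your local (non-clustered) volume group is named rhel_vg; list every local volume group except my_vg:

auto_activation_volume_list = [ "rhel_vg" ]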

·       Rebuild the initramfs boot image to guarantee that the boot image will not try to activate a volume group controlled by the cluster

dracut -H -f /boot/initramfs-$(uname -r).img $(uname -r)

·       Reboot the node

·       Create a mount point named /share

mkdir /share

·       Mount the file system on /share

mount /dev/my_vg/my_lv /share

·       Configure the shared storage as cluster resources

pcs resource create My_VG ocf:heartbeat:LVM-activate vgname=my_vg activation_mode=exclusive vg_access_mode=system_id --group HA-LVM

pcs resource create My_FS Filesystem device="/dev/my_vg/my_lv" directory="/share" fstype="xfs" --group HA-LVM
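Note: if you mounted the file system manually in the earlier step, unmount it and deactivate the volume group before handing control to the cluster, then verify that the cluster activates and mounts it on the active node. A quick sketch:

umount /share

vgchange -an my_vg

pcs resource status

df -h /share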

Configuring the DMC servers with shared storage

·       Move the following files and directories to the shared storage folder

Type      | Path
----------|-----------------------------------------------
Directory | /Config
Directory | /logs
Directory | /wlp/usr/servers/dsweb/resources/security
File      | /wlp/usr/servers/dsweb/bootstrap.properties
File      | /wlp/usr/servers/dsweb/server.env
File      | /wlp/usr/servers/dsweb/jvm.options
File      | /addons/drs/drs-agent/.env
File      | /addons/drs/drs-agent/config.yaml
Directory | /addons/drs/drs-agent/insightdb
Directory | /addons/drs/drs-agent/logs
File      | /addons/job-scheduler/config/config.json
Directory | /addons/job-scheduler/logs

Note that you can change the copy interval of the log directories as needed in the syncup.sh script. For more information on the data types to be synced, refer to the DMC documentation.

·       Replace each item in the DMC installation directory with a symbolic link that points to the copy on shared storage (a helper sketch follows below)

ln -s <target file/folder on shared storage> <path of the link to create in the DMC installation>

Eg: ln -s /share/logs /opt/ibm/ibm-datamgmtconsole/logs

Note: In this example, DMC is installed under /opt/ibm and the shared storage is mounted at /share.
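A hedged helper sketch that moves the items listed in the table above to shared storage and leaves symbolic links behind, assuming DMC is installed in /opt/ibm/ibm-datamgmtconsole and the shared file system is mounted at /share (stop DMC first and adjust the paths to your environment):

#!/bin/sh
# Hypothetical helper: relocate DMC state to shared storage and symlink it back.
DMC_HOME=/opt/ibm/ibm-datamgmtconsole
SHARE=/share

for item in \
    Config \
    logs \
    wlp/usr/servers/dsweb/resources/security \
    wlp/usr/servers/dsweb/bootstrap.properties \
    wlp/usr/servers/dsweb/server.env \
    wlp/usr/servers/dsweb/jvm.options \
    addons/drs/drs-agent/.env \
    addons/drs/drs-agent/config.yaml \
    addons/drs/drs-agent/insightdb \
    addons/drs/drs-agent/logs \
    addons/job-scheduler/config/config.json \
    addons/job-scheduler/logs
do
    mkdir -p "$SHARE/$(dirname "$item")"      # keep the same directory layout on shared storage
    mv "$DMC_HOME/$item" "$SHARE/$item"       # move the original to shared storage
    ln -s "$SHARE/$item" "$DMC_HOME/$item"    # leave a symlink in the DMC installation
done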

·       Adding Pacemaker resources

·       Create a file named dmc to be used as the resource agent that manages high availability and failover for the DMC server. It includes start, stop, and monitor functions; the dmc script is provided below under Provided scripts.

·       On both nodes, place the file in /usr/lib/ocf/resource.d/heartbeat/

·       Change file permission

chmod +x dmc

  • Create the dmc resource and specify values for the required parameters pinstallpath, sinstallpath, and datasyncpath. These are the paths where the DMC server is installed on the primary node, where it is installed on the secondary node, and where the sync scripts are placed, respectively. Change these paths based on your environment.

pcs resource create dmc ocf:heartbeat:dmc pinstallpath="<dmc_installed_path>/ibm-datamgmtconsole" sinstallpath="<dmc_installed_path>/ibm-datamgmtconsole" datasyncpath="<synch_scripts_path>"

·       Add the VIP resource to Pacemaker

pcs resource create vip ocf:heartbeat:IPaddr2 ip=<VIP_node>
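The storage group (HA-LVM), the dmc resource, and the vip resource must run on the same node and start in order (storage, then DMC, then the VIP). The original commands do not show these constraints; a hedged sketch of one way to add them with pcs (adjust names to match the resources you created):

pcs constraint colocation add dmc with HA-LVM INFINITY

pcs constraint colocation add vip with dmc INFINITY

pcs constraint order start HA-LVM then dmc

pcs constraint order start dmc then vip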

Testing the HA setup

During the test, you can use either the command line or the PCSD GUI where applicable.  Make sure that DMC is running on the primary node and has been stopped on the standby node.

·       Auto recover/restart DMC on active node

·       Manually stop DMC

·       On the online node, run the stop.sh script to stop the server.

·       Run the status.sh script a few times to verify that the server is stopped and then automatically restarted within a couple of minutes.

·       Kill DMC process

·       Look for DMC process:

ps ax | grep -v "grep" | grep java

kill -9 <process ID>

·       Run the status.sh script a few times to check that the server is stopped and then automatically restarted within a couple of minutes.

·       Control failover to passive node

·       On both nodes, verify which one is online and which is offline by running:

pcs status

status.sh

·       Manually put the online node into standby mode:

pcs node standby <online_node_url>

·       On both nodes, verify that the DMC server has switched from one node to the other, i.e., DMC is stopped on the previously online node and started on the former standby node.

·       Run the following command to return the primary node to normal before continuing to the next test.

pcs node unstandby prudery1.fyre.ibm.com

·       Auto failover to passive node

·       By stopping the currently active DMC node.

·       Run this command on the active node

pcs cluster stop <online_node_URL>

·       On both nodes, check the cluster and DMC server status to verify that the DMC server fails over to the standby node

·       Restart the stopped node to return the cluster to normal before continuing to the next test.

pcs cluster start <offline_node_URL>

·       By rebooting the active node.

·       Run this command on the active node:

reboot

·       While the active node is rebooting, the DMC server fails over to the other node.

·       Further tests

·       Set up events or tasks such as alerts, jobs, and blackouts on the primary server and verify that they continue to work after the failover.

How to upgrade DMC

1.    Put the cluster into maintenance mode (see the example commands after this list)

2.    Download the new DMC package into the same folder on DMC 1

3.    Unzip the package on that node (Note: make sure the DMC shared storage is mounted on this node)

4.    Run the setup.sh file

5.    Stop DMC on that node

6.    Move the shared storage to the other node

7.    Download the same DMC package (for example, DMC 3.1.11) on the other node

8.    Unzip it in the same folder

9.    Run the setup.sh file

10.  Take Pacemaker out of maintenance mode
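A hedged sketch of the maintenance-mode commands referenced in steps 1 and 10, assuming you toggle maintenance cluster-wide with a pcs property (pcs node maintenance <node> / pcs node unmaintenance <node> can be used per node instead):

# put the whole cluster into maintenance mode before upgrading
pcs property set maintenance-mode=true

# ... perform the upgrade steps above ...

# take the cluster out of maintenance mode afterwards
pcs property set maintenance-mode=false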

Provided scripts

#!/bin/sh

#

# dmc

# Resource agent that manages high availability/failover of

# IBM Db2 Data Management Console

#

# Copyright (c) 2004 SUSE LINUX AG, Lars Marowsky-Bree

#                    All Rights Reserved.

#

# This program is free software; you can redistribute it and/or modify

# it under the terms of version 2 of the GNU General Public License as

# published by the Free Software Foundation.

#

# This program is distributed in the hope that it would be useful, but

# WITHOUT ANY WARRANTY; without even the implied warranty of

# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

#

# Further, this software is distributed without any warranty that it is

# free of the rightful claim of any third person regarding infringement

# or the like.  Any license provided herein, whether implied or

# otherwise, applies only to this software file.  Patent licenses, if

# any, provided herein do not apply to combinations of this program with

# other software, or any other product whatsoever.

#

# You should have received a copy of the GNU General Public License

# along with this program; if not, write the Free Software Foundation,

# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.

#

#######################################################################

# Initialization:

: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}

. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs

#######################################################################

PINSTALLPATH=""

SINSTALLPATH=""

#DATASYNCPATH=""

USER=""

meta_data() {

            cat <<END

<?xml version="1.0"?>

<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">

<resource-agent name="dmc">

<version>1.0</version>

<longdesc lang="en">

Resource agent for Db2 Data Management Console to handle high availability and failover.

</longdesc>

<shortdesc lang="en">Resource agent for Db2 Data Management Console</shortdesc>

<parameters>

<parameter name="pinstallpath" unique="1" required="1">

<longdesc lang="en">

Install path of DMC console on Primary

</longdesc>

<shortdesc lang="en">Primary DMC Install path</shortdesc>

<content type="string" default="" />

</parameter>

<parameter name="sinstallpath" unique="1" required="1">

<longdesc lang="en">

Install path of DMC console on Standby

</longdesc>

<shortdesc lang="en">Standby DMC Install path</shortdesc>

<content type="string" default="" />

</parameter>

<parameter name="user" unique="1" required="1">

<longdesc lang="en">

User that installed DMC and which will be used to start/stop/monitor console status

</longdesc>

<shortdesc lang="en">User</shortdesc>

<content type="string" default="" />

</parameter>

</parameters>

<actions>

<action name="start"        timeout="300s" />

<action name="stop"         timeout="180s" />

<action name="monitor"      timeout="20s" interval="5s" depth="0" />

<action name="meta-data"    timeout="5s" />

<action name="validate-all"   timeout="20s" />

</actions>

</resource-agent>

END

}

#######################################################################

dmc_usage() {

            cat <<END

usage: $0 {start|stop|monitor|validate-all|meta-data}

action:

  start         start DMC server

  stop          stop DMC server

  monitor       return status of DMC server

  meta-data     show meta data info

  validate-all  validate the instance parameters

END

}

dmc_start() {

    ocf_log info "$(date): starting dmc server ..."

    ${PINSTALLPATH}/bin/startup.sh --clean >/dev/null 2>&1

    ocf_log info "$(date): dmc - server started"

    # chmod 644 ${PINSTALLPATH}/logs/*

    # chmod 644 ${PINSTALLPATH}/addons/job-scheduler/logs/*

    # chmod 644 ${PINSTALLPATH}/addons/drs/drs-agent/logs/*

    # su - ${USER} -c "${DATASYNCPATH}/datasync start ${PINSTALLPATH} ${SINSTALLPATH} ${USER} >/dev/null 2>&1"

    # ocf_log info "$(date): dmc - datasync started"

    return $OCF_SUCCESS

}

dmc_stop() {

    ocf_log info "$(date): stopping dmc server ..."

    ${PINSTALLPATH}/bin/stop.sh >/dev/null 2>&1

    ocf_log info "$(date): dmc - server stopped"

    # su - ${USER} -c "${DATASYNCPATH}/datasync stop ${PINSTALLPATH} ${SINSTALLPATH} ${USER} >/dev/null 2>&1"

    # ocf_log info "$(date): dmc - datasync stopped"

    return $OCF_SUCCESS

}

dmc_monitor() {

    process=`ps -ef | grep java`

    status=`${PINSTALLPATH}/bin/status.sh`

   

    ocf_log info "process ****** $(date): ${process} "

    ocf_log info "status ****** $(date): ${status} "

#    active=`${PINSTALLPATH}/bin/status.sh | grep "no" | wc -l `

    ${PINSTALLPATH}/bin/status.sh   

    if [ $? -eq 1 ]

    then

      ocf_log info "$(date): dmc server is not running.  Return code $OCF_NOT_RUNNING."

      return $OCF_NOT_RUNNING

    else

      ocf_log info "$(date): dmc server is running.  Return code $OCF_SUCCESS."

      return $OCF_SUCCESS

    fi

}

dmc_validate() {

    check_parm_err=0

    # check required dmc installpath parameter

    if [ -z "$OCF_RESKEY_pinstallpath" ]

    then

        ocf_log err "Primary DMC required parameter pinstallpath is not set!"

        #return $OCF_ERR_CONFIGURED

            check_parm_err=1

    fi

    # check required dmc installpath parameter

    if [ -z "$OCF_RESKEY_sinstallpath" ]

    then

        ocf_log err "Secondary DMC required parameter sinstallpath is not set!"

        #return $OCF_ERR_CONFIGURED

            check_parm_err=1

    fi

    # # check required data sync script path parameter

    # if [ -z "$OCF_RESKEY_datasyncpath" ]

    # then

    #     ocf_log err "Required parameter datasyncpath is not set!"

    #     #return $OCF_ERR_CONFIGURED

    #     check_parm_err=1

    # fi

           

            # check required parameter user

    if [ -z "$OCF_RESKEY_user" ]

    then

        ocf_log err "Required parameter user is not set!"

        #return $OCF_ERR_CONFIGURED

        check_parm_err=1

    fi

    if [ $check_parm_err -eq 1 ]

    then

            # $$$ Temp -  Set paths for testing by calling script directly

            PINSTALLPATH="/data/Build/ibm-datamgmtconsole"

            SINSTALLPATH="/data/Build/ibm-datamgmtconsole"

            # DATASYNCPATH="/data/syncup"

            # USER="root"

            return $OCF_ERR_CONFIGURED

    fi 

    PINSTALLPATH="$OCF_RESKEY_pinstallpath"

    SINSTALLPATH="$OCF_RESKEY_sinstallpath"

    # DATASYNCPATH="$OCF_RESKEY_datasyncpath"

            USER="$OCF_RESKEY_user"

    return $OCF_SUCCESS

}

case $__OCF_ACTION in

meta-data)          meta_data

                        exit $OCF_SUCCESS

                        ;;

start)                       dmc_validate

                        dmc_start

                        ;;

stop)                        dmc_validate

                        dmc_stop

                        ;;

monitor)   dmc_validate

                        dmc_monitor

                        ;;

validate-all)   dmc_validate

                        ;;

*)                      dmc_usage

                        exit $OCF_ERR_UNIMPLEMENTED

                        ;;

esac

rc=$?

ocf_log debug "${OCF_RESOURCE_INSTANCE} $__OCF_ACTION : $rc"

exit $rc
