If the cluster is behaving correctly, as you say, then it would certainly make sense to open a new case with IBM Support and let them check the erroneous db2cm report.
I can't think of anything else we (end users) could do from our side, as we don't really know how this report is created :-(
I was going to try and replicate your issue in my own R&D Pacemaker cluster, but then realised it has been scrapped since... if I find the time to recreate it, I'll check and send an update here.
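In the meantime, the mismatch can be spotted mechanically by comparing the node listed under "Masters" in db2cm -status with the "HADR Primary Node" field in db2cm -list. Below is a minimal sketch, assuming the output layout is the same as in the listings quoted in this thread; the function names are made up for illustration and are not part of db2cm.

```shell
#!/usr/bin/env bash
# Sketch only: the field layout is assumed from the db2cm output posted in this thread.

# Read `db2cm -status` text on stdin, print the node listed under "Masters:".
status_primary() {
    awk '/Masters:/ { print $4; exit }'
}

# Read `db2cm -list` text on stdin, print the value of "HADR Primary Node" (may be empty).
list_primary() {
    awk -F'=' '/HADR Primary Node/ { gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2; exit }'
}

# $1: text of `db2cm -status`, $2: text of `db2cm -list`.
# Returns 0 when both reports name the same primary node, 1 otherwise.
check_primary_consistency() {
    local from_status from_list
    from_status=$(printf '%s\n' "$1" | status_primary)
    from_list=$(printf '%s\n' "$2" | list_primary)
    if [ -n "$from_status" ] && [ "$from_status" = "$from_list" ]; then
        echo "consistent: primary is $from_status"
    else
        echo "MISMATCH: status says '${from_status}', list says '${from_list}'"
        return 1
    fi
}

# Typical use on a cluster node (as root):
#   check_primary_consistency "$(db2cm -status)" "$(db2cm -list)"
```

With the output quoted below, this would report a mismatch, because db2cm -list leaves the primary node empty while db2cm -status names machine-2. That output could be attached to the support case.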
Original Message:
Sent: Tue April 22, 2025 10:00 AM
From: Jean-Bernard Ngomiraronka
Subject: db2 pacemake setup has issue while doing
Hi Damir,
Thank you for getting back to me.
The cluster is behaving correctly. It seems to be a bug in the db2cm -list report, as you can see by comparing the db2cm -list and db2cm -status output below.
I ran the takeover on the standby (machine-2): machine-2 became the primary (Master) and machine-1 became the standby (Slave).
machine-1:~ # db2cm -status
2025-04-22-09.40.10
-------------------
Cluster Summary:
* Stack: corosync (Pacemaker is running)
* Current DC: machine-2 (version 2.1.7+20240411.81041cf0b-1.1.db2pcmk-2.1.7+20240411.81041cf0b) - partition with quorum
* Last updated: Tue Apr 22 09:40:10 2025 on machine-1
* Last change: Mon Apr 21 16:52:10 2025 by root via root on machine-1
* 2 nodes configured
* 8 resource instances configured
Node List:
* Online: [ machine-1 machine-2 ]
Full List of Resources:
* db2_ethmonitor_machine-1_eth0 (ocf::heartbeat:db2ethmon): Started machine-1
* db2_ethmonitor_machine-2_eth0 (ocf::heartbeat:db2ethmon): Started machine-2
* db2_machine-1_db2inst1_0 (ocf::heartbeat:db2inst): Started machine-1
* db2_machine-2_db2inst1_0 (ocf::heartbeat:db2inst): Started machine-2
* Clone Set: db2_db2inst1_db2inst1_SAMPLE-clone [db2_db2inst1_db2inst1_SAMPLE] (promotable):
* Masters: [ machine-2 ]
* Slaves: [ machine-1 ]
* db2_db2inst1_db2inst1_SAMPLE-primary-VIP (ocf::heartbeat:IPaddr2): Started machine-2
* db2_db2inst1_db2inst1_SAMPLE-standby-VIP (ocf::heartbeat:IPaddr2): Started machine-1
QDevice Information:
* Connected: [ easi-sh2-sit-qdevice-1:5403 ]
machine-1:~ # db2cm -list
HA Model: HADR
Domain Information:
Domain name = SAMPLE_DOMAIN
Cluster Manager = Corosync
Cluster Manager Version = 3.1.8
Resource Manager = Pacemaker
Resource Manager Version = 2.1.7+20240411.81041cf0b-1.1.db2pcmk
Current domain leader = machine-2
Number of nodes = 2
Number of resources = 7
Host Information:
HOSTNAME STATE
------------------------ -----------
machine-1 ONLINE
machine-2 ONLINE
Fencing Information:
Fencing Configured: Not configured
Quorum Information:
Quorum Type: Qdevice with LMS Algorithm
Total Votes: 3
Quorum Votes: 2
Quorum Nodes:
----------------
machine-1
machine-2
Resource Information:
Resource Name = db2_db2inst1_db2inst1_SAMPLE
Resource Type = Database
DB Name = SAMPLE
Managed = True
HADR Primary Instance = db2inst1
HADR Primary Node =
HADR Primary State = Offline
HADR Standby Instance = db2inst1
HADR Standby Node = machine-1
HADR Standby State = Online
Thank you for your help.
------------------------------
Jean-Bernard Ngomiraronka
Original Message:
Sent: Tue April 22, 2025 03:31 AM
From: Damir Wilder
Subject: db2 pacemake setup has issue while doing
Hi Jean-Bernard,
Apart from the HADR Primary State showing "Offline" in the output of the db2cm -list command, is your cluster otherwise behaving as expected?
I.e. can you perform failovers between the HADR nodes without any issues?
Also, does Pacemaker automatically detect problems (when they occur) and perform a failover (if needed)?
Have you tried any of that (for example, shut down the primary and see what happens to the cluster)?
I'm wondering whether the wrong status in the db2cm -list report is just a simple bug in the report, or whether it has real implications for the cluster, which otherwise seems correctly set up.
Regards, Damir
------------------------------
Damir Wilder
Senior Consultant
Triton Consulting
London
Original Message:
Sent: Mon April 21, 2025 11:44 AM
From: Jean-Bernard Ngomiraronka
Subject: db2 pacemake setup has issue while doing
I need help setting up my Db2 Pacemaker cluster.
I set up the cluster successfully using the db2cm utility (Db2 12.1.0), and the resources were created successfully.
However, db2cm -list shows the HADR Primary State of the Database resource as Offline. Why?
The HADR database is connected and in PEER state.
db2pd -d SAMPLE -hadr | egrep HADR_STATE
HADR_STATE = PEER
db2cm -status looks good.
machine-1:~ # db2cm -status
2025-04-21-11.23.32
-------------------
Cluster Summary:
* Stack: corosync (Pacemaker is running)
* Current DC: machine-2 (version 2.1.7+20240411.81041cf0b-1.1.db2pcmk-2.1.7+20240411.81041cf0b) - partition with quorum
* Last updated: Mon Apr 21 11:23:32 2025 on machine-1
* Last change: Wed Apr 16 14:39:00 2025 by root via root on machine-1
* 2 nodes configured
* 8 resource instances configured
Node List:
* Online: [ machine-1 machine-2 ]
Full List of Resources:
* db2_ethmonitor_machine-1_eth0 (ocf::heartbeat:db2ethmon): Started machine-1
* db2_ethmonitor_machine-2_eth0 (ocf::heartbeat:db2ethmon): Started machine-2
* db2_machine-2_nginst_0 (ocf::heartbeat:db2inst): Started machine-2
* db2_machine-1_nginst_0 (ocf::heartbeat:db2inst): Started machine-1
* Clone Set: db2_nginst_nginst_SAMPLE-clone [db2_nginst_nginst_SAMPLE] (promotable):
* Masters: [ machine-1 ]
* Slaves: [ machine-2 ]
* db2_nginst_nginst_SAMPLE-primary-VIP (ocf::heartbeat:IPaddr2): Started machine-1
* db2_nginst_nginst_SAMPLE-standby-VIP (ocf::heartbeat:IPaddr2): Started machine-2
QDevice Information:
* Connected: [ easi-sh2-sit-qdevice-1:5403 ]
===========================================================
machine-1:~ # db2cm -list
HA Model: HADR
Domain Information:
Domain name = SAMPLE_DOMAIN
Cluster Manager = Corosync
Cluster Manager Version = 3.1.8
Resource Manager = Pacemaker
Resource Manager Version = 2.1.7+20240411.81041cf0b-1.1.db2pcmk
Current domain leader = machine-2
Number of nodes = 2
Number of resources = 7
Host Information:
HOSTNAME STATE
------------------------ -----------
machine-1 ONLINE
machine-2 ONLINE
Fencing Information:
Fencing Configured: Not configured
Quorum Information:
Quorum Type: Qdevice with LMS Algorithm
Total Votes: 3
Quorum Votes: 2
Quorum Nodes:
----------------
machine-1
machine-2
Resource Information:
Resource Name = db2_db2inst1_db2inst1_SAMPLE
Resource Type = Database
DB Name = SAMPLE
Managed = True
HADR Primary Instance = db2inst1
HADR Primary Node =
HADR Primary State = Offline
HADR Standby Instance = db2inst1
HADR Standby Node = machine-2
HADR Standby State = Online
Resource Name = db2_db2inst1_db2inst1_SAMPLE-primary-VIP
State = Online
Managed = True
Resource Type = IP
DB Name = SAMPLE
Role = Primary
Ip Address = 10.YYY.XXX.21
Current Host = machine-1
Resource Name = db2_db2inst1_db2inst1_SAMPLE-standby-VIP
State = Online
Managed = True
Resource Type = IP
DB Name = SAMPLE
Role = Standby
Ip Address = 10.YYY.XXX.22
Current Host = machine-2
Resource Name = db2_machine-1_db2inst1_0
State = Online
Managed = True
Resource Type = Instance
Node = machine-1
Instance = db2inst1
Resource Name = db2_machine-2_db2inst1_0
State = Online
Managed = True
Resource Type = Instance
Node = machine-2
Instance = db2inst1
Resource Name = db2_ethmonitor_machine-1_eth0
State = Online
Managed = True
Resource Type = Network Interface
Node = machine-1
Interface Name = eth0
Resource Name = db2_ethmonitor_machine-2_eth0
State = Online
Managed = True
Resource Type = Network Interface
Node = machine-2
Interface Name = eth0
Thank you for your advice.
------------------------------
Jean-Bernard Ngomiraronka
Original Message:
Sent: Mon October 07, 2024 12:29 PM
From: Gerry Sommerville
Subject: db2 pacemake setup has issue while doing
The interesting error is the following...
2024-10-04-10.11.58.746394 [execCmd][2848] Start /usr/sbin/corosync-qdevice-net-certutil -Q -n db2.prd.oppdeam.com prdmas-vm-db2-quorum-gd prdmas-vm-db2-node1-rl prdmas-vm-db2-node2-xw
root@prdmas-vm-db2-node1-rl: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
root@prdmas-vm-db2-node1-rl: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
Node prdmas-vm-db2-node1-rl doesn't have /usr/sbin/corosync-qdevice-net-certutil installed
Specifically, the last line does not come from the db2cm script; it comes from https://github.com/corosync/corosync-qdevice/blob/main/qdevices/corosync-qdevice-net-certutil.sh#L237, and it's a little misleading.
Reading the code, it appears the node is attempting to ssh to itself, which results in the above "permission denied" error. Enabling root passwordless ssh for the local node should get you past this error.
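To confirm this before rerunning db2cm, key-based login can be tested against every host, including the local one. This is a minimal sketch, assuming it is run as root on each cluster node; the function names and the SSH_BIN override are made up for illustration, and the key file name in the comments depends on how the key pair was generated.

```shell
#!/usr/bin/env bash
# Sketch only: run as root on each cluster node.

# Return 0 if key-based SSH to the given host works without prompting.
# SSH_BIN may be overridden; it defaults to the real ssh client.
can_ssh() {
    "${SSH_BIN:-ssh}" -o BatchMode=yes -o PreferredAuthentications=publickey "$1" /bin/true
}

# Print OK/FAIL for every host given; return 1 if any host refuses the login.
report_ssh_failures() {
    local host rc=0
    for host in "$@"; do
        if can_ssh "$host" 2>/dev/null; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            rc=1
        fi
    done
    return "$rc"
}

# Example with the host names from this thread (note that the local host is included):
#   report_ssh_failures "$(hostname)" prdmas-vm-db2-node2-xw prdmas-vm-db2-quorum-gd
#
# If only the local host fails, appending root's own public key is one way to fix it
# (assumption: the key is /root/.ssh/id_rsa.pub):
#   cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
#   chmod 600 /root/.ssh/authorized_keys
```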
------------------------------
Gerry Sommerville
Original Message:
Sent: Fri October 04, 2024 10:06 AM
From: Omkar Singh
Subject: db2 pacemake setup has issue while doing
Is it possible for you to help me?
------------------------------
Omkar Singh
Original Message:
Sent: Fri October 04, 2024 08:47 AM
From: Ivan Milojevic
Subject: db2 pacemake setup has issue while doing
Is passwordless SSH between all three servers (in both directions) working?
------------------------------
Ivan Milojevic
Belgrade
Original Message:
Sent: Fri October 04, 2024 06:56 AM
From: Omkar Singh
Subject: db2 pacemake setup has issue while doing
Yes, it is open on the quorum server but not on the Db2 server.
[root@prdmas-vm-db2-quorum-gd qnetd]# ss -tulpn | grep LISTEN | grep 5403
tcp LISTEN 0 10 *:5403 *:* users:(("corosync-qnetd",pid=185075,fd=8))
------------------------------
Omkar Singh
Original Message:
Sent: Fri October 04, 2024 06:47 AM
From: Damir Wilder
Subject: db2 pacemake setup has issue while doing
There's this in your dump:
corosync-qdevice-tool: Can't connect to QDevice socket
Did you check if the TCP port 5403 is open between the cluster hosts (including the QDevice host)?
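Since corosync-qnetd listens on the QDevice host, the check that matters is run from each cluster node towards that host. A minimal sketch, assuming bash (for its /dev/tcp pseudo-device) and the coreutils timeout command are available; port_open is a made-up helper name.

```shell
#!/usr/bin/env bash
# Sketch only: relies on bash's /dev/tcp pseudo-device and coreutils `timeout`.

# Return 0 if a TCP connection to host ($1) and port ($2) can be opened within 3 seconds.
port_open() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Run on each cluster node against the QDevice host (5403 is the port from this thread):
#   port_open prdmas-vm-db2-quorum-gd 5403 && echo "open" || echo "closed or filtered"
```

A non-zero result does not distinguish between a firewall dropping the traffic and nothing listening on the port, so it is worth pairing with a listener check on the QDevice host itself.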
------------------------------
Damir Wilder
Senior Consultant
Triton Consulting
London
Original Message:
Sent: Fri October 04, 2024 06:32 AM
From: Omkar Singh
Subject: db2 pacemake setup has issue while doing
Hi,
Yes, I've tried it.
OS level:
[root@prdmas-vm-db2-node1-rl qnetd]# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.6 (Ootpa)
Error log in /tmp:
2024-10-04-10.11.56.179280 [execCmd][1744] Start rpm -q corosync | grep db2pcmk
corosync-3.0.4-2.db2pcmk.el8.x86_64
2024-10-04-10.11.56.186590 [execCmd][1744] End
2024-10-04-10.11.56.187635 [execCmd][1744] Start rpm -q pacemaker | grep db2pcmk
pacemaker-2.0.5-10.db2pcmk.el8.x86_64
2024-10-04-10.11.56.194673 [execCmd][1744] End
2024-10-04-10.11.56.195708 [db2cm] Start db2cm -create -qdevice prdmas-vm-db2-quorum-gd
2024-10-04-10.11.56.198812 [execCmd][938] Start crm_node -l
1 prdmas-vm-db2-node1-rl member
2 prdmas-vm-db2-node2-xw member
2024-10-04-10.11.56.205657 [execCmd][938] End
2024-10-04-10.11.56.213096 [execCmd][2805] Start corosync-qdevice-tool -s
corosync-qdevice-tool: Can't connect to QDevice socket (is QDevice running?): No such file or directory
2024-10-04-10.11.56.215384 [execCmd][2805] End - Failed
2024-10-04-10.11.56.216270 [execCmd][1171] Start ssh -o PreferredAuthentications=publickey prdmas-vm-db2-node2-xw /bin/true
2024-10-04-10.11.56.410512 [execCmd][1171] End
2024-10-04-10.11.56.411631 [execCmd][1171] Start ssh -o PreferredAuthentications=publickey prdmas-vm-db2-quorum-gd /bin/true
2024-10-04-10.11.56.609958 [execCmd][1171] End
2024-10-04-10.11.56.610871 [execCmd][2820] Start rpm -qa | grep corosync-qdevice
corosync-qdevice-3.0.3-1.db2pcmk.el8.x86_64
corosync-qdevice-debuginfo-3.0.3-1.db2pcmk.el8.x86_64
corosync-qdevice-debugsource-3.0.3-1.db2pcmk.el8.x86_64
corosync-qdevice-devel-3.0.3-1.db2pcmk.el8.x86_64
2024-10-04-10.11.57.077182 [execCmd][2820] End
2024-10-04-10.11.57.078107 [execCmd][2825] Start ssh prdmas-vm-db2-node2-xw "rpm -qa | grep corosync-qdevice"
corosync-qdevice-3.0.3-1.db2pcmk.el8.x86_64
corosync-qdevice-debuginfo-3.0.3-1.db2pcmk.el8.x86_64
corosync-qdevice-debugsource-3.0.3-1.db2pcmk.el8.x86_64
corosync-qdevice-devel-3.0.3-1.db2pcmk.el8.x86_64
2024-10-04-10.11.57.742634 [execCmd][2825] End
2024-10-04-10.11.57.743470 [execCmd][2830] Start ssh prdmas-vm-db2-quorum-gd "rpm -qa | grep corosync-qnetd"
corosync-qnetd-3.0.3-1.db2pcmk.el8.x86_64
corosync-qnetd-debuginfo-3.0.3-1.db2pcmk.el8.x86_64
2024-10-04-10.11.58.449365 [execCmd][2830] End
totem {
    version: 2
    cluster_name: db2.prd.oppdeam.com
    transport: knet
    token: 10000
    crypto_cipher: aes256
    crypto_hash: sha256
}
nodelist {
    node {
        ring0_addr: prdmas-vm-db2-node1-rl
        name: prdmas-vm-db2-node1-rl
        nodeid: 1
    }
    node {
        ring0_addr: prdmas-vm-db2-node2-xw
        name: prdmas-vm-db2-node2-xw
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}
logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
    timestamp: on
    function_name: on
    fileline: on
}
2024-10-04-10.11.58.573237 [execCmd][601] End
2024-10-04-10.11.58.577021 [execCmd][2837] Start ssh prdmas-vm-db2-quorum-gd "test -f /etc/corosync/qnetd/nssdb/cluster-db2.prd.oppdeam.com.crt"
2024-10-04-10.11.58.735273 [execCmd][2837] End - Failed
2024-10-04-10.11.58.736404 [execCmd][2843] Start /bin/cp -f /etc/corosync/corosync.conf /tmp/db2cm.tmp.corosync.conf
2024-10-04-10.11.58.739187 [execCmd][2843] End
2024-10-04-10.11.58.746394 [execCmd][2848] Start /usr/sbin/corosync-qdevice-net-certutil -Q -n db2.prd.oppdeam.com prdmas-vm-db2-quorum-gd prdmas-vm-db2-node1-rl prdmas-vm-db2-node2-xw
root@prdmas-vm-db2-node1-rl: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
root@prdmas-vm-db2-node1-rl: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
Node prdmas-vm-db2-node1-rl doesn't have /usr/sbin/corosync-qdevice-net-certutil installed
2024-10-04-10.11.58.897935 [execCmd][2848] End - Failed
2024-10-04-10.11.58.898949 [db2cm] End execution with exit code 1 on line 2850
Regards
______________________________
Omkar Singh
DevSecOps WW – Internal Delivery
Mobile: +91-9448344241
e-mail: omkar.singh@in.ibm.com
slack: @omkar.singh
Original Message:
Sent: 10/4/2024 6:07:00 AM
From: Damir Wilder
Subject: RE: db2 pacemake setup has issue while doing
Omkar,
What are your OS and DB2 levels?
Just wondering, because when I installed Pacemaker (DB2 v11.5.9 on RHEL9), I got the exact same packages as yours above, except my packages had "el9" in the name (whereas yours have "el8").
Also, what happens if you try executing the QDevice setup command using the IP address of the QDevice server, instead of the server's name?
Regards, Damir
------------------------------
Damir Wilder
Senior Consultant
Triton Consulting
London
Original Message:
Sent: Fri October 04, 2024 05:49 AM
From: Omkar Singh
Subject: db2 pacemake setup has issue while doing
[root@prdmas-vm-db2-node1-rl ~]# rpm -qa | grep corosync-qdevice
corosync-qdevice-3.0.3-1.db2pcmk.el8.x86_64
corosync-qdevice-debuginfo-3.0.3-1.db2pcmk.el8.x86_64
corosync-qdevice-debugsource-3.0.3-1.db2pcmk.el8.x86_64
corosync-qdevice-devel-3.0.3-1.db2pcmk.el8.x86_64
[root@prdmas-vm-db2-quorum-gd ~]# rpm -qa | grep corosync-qnetd
corosync-qnetd-3.0.3-1.db2pcmk.el8.x86_64
corosync-qnetd-debuginfo-3.0.3-1.db2pcmk.el8.x86_64
------------------------------
Omkar Singh
Original Message:
Sent: Fri October 04, 2024 04:06 AM
From: Damir Wilder
Subject: db2 pacemake setup has issue while doing
Also worth checking if the required packages are installed on the (relevant) cluster nodes:
1. corosync-qdevice - on all HADR nodes (primary, standby):
rpm -qa | grep corosync-qdevice
2. corosync-qnetd - on the QDevice host:
rpm -qa | grep corosync-qnetd
Hope this helps.
------------------------------
Damir Wilder
Senior Consultant
Triton Consulting
London
Original Message:
Sent: Fri October 04, 2024 03:29 AM
From: Omkar Singh
Subject: db2 pacemake setup has issue while doing
Yes, you are right.
Db2 node 1: prdmas-vm-db2-node1-rl
Db2 node 2: prdmas-vm-db2-node2-xw
Quorum device: prdmas-vm-db2-quorum-gd
[root@prdmas-vm-db2-node1-rl bin]# ./db2cm -create -qdevice prdmas-vm-db2-quorum-gd
Error: Could not create qdevice via corosync-qdevice-net-certutil
[root@prdmas-vm-db2-node1-rl bin]# systemctl status corosync-qnetd
● corosync-qnetd.service - Corosync Qdevice Network daemon
Loaded: loaded (/usr/lib/systemd/system/corosync-qnetd.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:corosync-qnetd
------------------------------
Omkar Singh
Original Message:
Sent: Fri October 04, 2024 02:50 AM
From: Peter Schurr
Subject: db2 pacemake setup has issue while doing
Hi Omkar,
Is the service "corosync-qnetd" running on the qdevice host "prdmas-vm-db2-quorum-gd"?
You can check with "systemctl status corosync-qnetd".
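The check has to be answered by systemd on the qdevice host itself, not on one of the Db2 nodes. A minimal sketch that runs it over ssh from a cluster node, assuming root ssh access to that host already works; the function names and the SSH_BIN override are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: assumes key-based ssh from the cluster node to the qdevice host.

# Ask systemd on the given host whether corosync-qnetd is active.
# SSH_BIN may be overridden; it defaults to the real ssh client.
qnetd_state() {
    "${SSH_BIN:-ssh}" "$1" systemctl is-active corosync-qnetd
}

# Print a one-line summary; falls back to "unknown" if nothing came back.
describe_qnetd() {
    local state
    state=$(qnetd_state "$1" 2>/dev/null) || true
    echo "corosync-qnetd on $1: ${state:-unknown}"
}

# Example with the host name from this thread:
#   describe_qnetd prdmas-vm-db2-quorum-gd
```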
Regards, Peter
------------------------------
Peter Schurr
Original Message:
Sent: Thu October 03, 2024 10:52 PM
From: Omkar Singh
Subject: db2 pacemake setup has issue while doing
Hi Ivan, good morning.
Yes, this is the initial creation, on a fresh installation.
------------------------------
Omkar Singh
Original Message:
Sent: Thu October 03, 2024 01:58 PM
From: Ivan Milojevic
Subject: db2 pacemake setup has issue while doing
Hi Omkar,
Is this the initial creation of the qdevice, or an existing cluster configuration where a qdevice existed before and you are trying to add it again?
Ivan
------------------------------
Ivan Milojevic
Belgrade
Original Message:
Sent: Thu October 03, 2024 03:50 AM
From: Omkar Singh
Subject: db2 pacemake setup has issue while doing
Hi All,
I'm getting the error below while setting up the cluster:
[root@prdmas-vm-db2-node1-rl bin]# ./db2cm -create -qdevice prdmas-vm-db2-quorum-gd
Error: Could not create qdevice via corosync-qdevice-net-certutil
Logs in /tmp:
2024-10-03-07.27.37.261991 [execCmd][1744] Start rpm -q corosync | grep db2pcmk
corosync-3.0.4-2.db2pcmk.el8.x86_64
2024-10-03-07.27.37.272158 [execCmd][1744] End
2024-10-03-07.27.37.273321 [execCmd][1744] Start rpm -q pacemaker | grep db2pcmk
pacemaker-2.0.5-10.db2pcmk.el8.x86_64
2024-10-03-07.27.37.282157 [execCmd][1744] End
2024-10-03-07.27.37.283364 [db2cm] Start db2cm -create -qdevice prdmas-vm-db2-quorum-gd
2024-10-03-07.27.37.286677 [execCmd][938] Start crm_node -l
1 prdmas-vm-db2-node1-rl member
2 prdmas-vm-db2-node2-xw member
2024-10-03-07.27.37.294999 [execCmd][938] End
2024-10-03-07.27.37.302045 [execCmd][2805] Start corosync-qdevice-tool -s
corosync-qdevice-tool: Can't connect to QDevice socket (is QDevice running?): No such file or directory
2024-10-03-07.27.37.303859 [execCmd][2805] End - Failed
2024-10-03-07.27.37.304734 [execCmd][1171] Start ssh -o PreferredAuthentications=publickey prdmas-vm-db2-node2-xw /bin/true
2024-10-03-07.27.37.523086 [execCmd][1171] End
2024-10-03-07.27.37.524672 [execCmd][1171] Start ssh -o PreferredAuthentications=publickey prdmas-vm-db2-quorum-gd /bin/true
2024-10-03-07.27.37.741133 [execCmd][1171] End
2024-10-03-07.27.37.742178 [execCmd][2820] Start rpm -qa | grep corosync-qdevice
corosync-qdevice-3.0.3-1.db2pcmk.el8.x86_64
corosync-qdevice-debuginfo-3.0.3-1.db2pcmk.el8.x86_64
corosync-qdevice-debugsource-3.0.3-1.db2pcmk.el8.x86_64
corosync-qdevice-devel-3.0.3-1.db2pcmk.el8.x86_64
2024-10-03-07.27.38.231010 [execCmd][2820] End
2024-10-03-07.27.38.231982 [execCmd][2825] Start ssh prdmas-vm-db2-node2-xw "rpm -qa | grep corosync-qdevice"
corosync-qdevice-3.0.3-1.db2pcmk.el8.x86_64
corosync-qdevice-debuginfo-3.0.3-1.db2pcmk.el8.x86_64
corosync-qdevice-debugsource-3.0.3-1.db2pcmk.el8.x86_64
corosync-qdevice-devel-3.0.3-1.db2pcmk.el8.x8
------------------------------
Omkar Singh
------------------------------