IBM db2 pureScale cluster peer domain

jaison chipuka
Posted Sun November 02, 2025 10:46 AM

The GPFS (IBM Spectrum Scale) cluster is operational: all file systems are mounted and the mmgetstate -a output shows every node as active.
However, the underlying RSCT (Reliable Scalable Cluster Technology) layer, which GPFS depends on for communication, quorum, and cluster membership, is not fully healthy.

What's Working

GPFS daemons (mmfsd) are running on all nodes.

File systems (db2fs1, logfs) are mounted across all nodes.

Cluster manager and filesystem managers are correctly assigned.

Data access and I/O are functional.
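
The "every node is active" check can be scripted rather than read by eye. The following is a minimal sketch, assuming mmgetstate -a prints a three-column table (node number, node name, GPFS state). The sample text is illustrative and not captured from this cluster, and the function name is hypothetical.

```python
def inactive_gpfs_nodes(mmgetstate_output: str) -> list[str]:
    """Return the names of nodes whose GPFS state is not 'active'."""
    inactive = []
    for line in mmgetstate_output.splitlines():
        fields = line.split()
        # Data rows start with the numeric node number; this skips the
        # column header and the dashed rule under it.
        if len(fields) >= 3 and fields[0].isdigit():
            name, state = fields[1], fields[2]
            if state != "active":
                inactive.append(name)
    return inactive


# Illustrative sample only; assumed layout of `mmgetstate -a` output.
SAMPLE_MMGETSTATE = """\
 Node number  Node name            GPFS state
----------------------------------------------
       1      node1-database       active
       2      node1-database-cf    active
       3      node2-cbs-database   active
       4      node2-database-cf    active
"""

if __name__ == "__main__":
    print(inactive_gpfs_nodes(SAMPLE_MMGETSTATE))
```

An empty list means every node reported active. On a live system the text would come from running mmgetstate -a as root instead of from the sample string.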

What's Broken

The RSCT peer domain (db2domain_20241208010205) is online only on node1-database and node1-database-cf.

The two node2 systems (node2-cbs-database, node2-database-cf) are offline in RSCT - lsrpnode shows them as Offline.

When attempting to rejoin or start them using startrpnode, the system returns:

2610-441 Permission is denied to access the resource class specified in this command. Network Identity UNAUTHENT requires 's' permission for the resource class IBM.PeerDomain on node node2-cbs-database.

This means RSCT security trust is broken between the node1 and node2 pairs.

Root Cause

During RSCT reconfiguration (recfgct) on the node2 systems:

Their RMC security credentials (the host key pair and trusted host list kept by RSCT cluster security services) were regenerated under /var/ct/cfg/.

The domain owner (node1) still holds the old node2 public keys.

As a result, node1 and node2 can't authenticate each other in the RSCT peer domain.

RSCT therefore treats requests between the node1 and node2 systems as unauthenticated (network identity UNAUTHENT) and refuses to start the node2 systems in the peer domain.

Impact

RSCT cluster membership is partial - quorum and event management are unreliable.

GPFS continues to run using cached configuration, but:

No automatic failover or fencing will occur.

If a node restarts or GPFS restarts, it may fail to rejoin until RSCT is fixed.

DB2 pureScale or PowerHA services relying on RSCT communication will also fail to detect node state properly.

Next Steps (Resolution Path)

On the two offline nodes, reset RSCT security and configuration:

Stop RMC (rmcctrl -z)

Remove /var/ct/cfg/ctrmc.* files

Rebuild RSCT (recfgct)

Restart RMC (rmcctrl -A)

On all nodes (including the node1 systems), restart RMC so the new trust keys are picked up.

Use startrpnode from node1 to rejoin the node2 systems.

Verify that lsrpnode shows all four nodes Online and that GPFS remains healthy.
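
The final verification step can be automated with a small check on the lsrpnode output. This is a minimal sketch, assuming lsrpnode prints a header row followed by one row per node with the columns Name, OpState and RSCTVersion. The sample rows and version numbers are illustrative, not taken from this cluster, and the function name is hypothetical.

```python
def nodes_not_online(lsrpnode_output: str) -> list[tuple[str, str]]:
    """Return (node name, OpState) for every node that is not Online."""
    problems = []
    for line in lsrpnode_output.splitlines():
        fields = line.split()
        # Skip blank lines and the column header row.
        if len(fields) < 2 or fields[0] == "Name":
            continue
        name, op_state = fields[0], fields[1]
        if op_state != "Online":
            problems.append((name, op_state))
    return problems


# Illustrative sample only; the version numbers are made up.
SAMPLE_LSRPNODE = """\
Name               OpState RSCTVersion
node1-database     Online  3.2.6.1
node1-database-cf  Online  3.2.6.1
node2-cbs-database Offline 3.2.6.1
node2-database-cf  Offline 3.2.6.1
"""

if __name__ == "__main__":
    for name, state in nodes_not_online(SAMPLE_LSRPNODE):
        print(f"{name} is {state}")
```

The check passes when the returned list is empty, meaning all four nodes are Online. Running it before and after startrpnode gives a simple before/after record for the problem log.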

Summary Statement for Report

The GPFS cluster is operational, but RSCT domain membership is partially degraded because RMC authentication is broken between the node1 and node2 pairs. This occurred after the RSCT reconfiguration regenerated the RMC security credentials on the node2 systems, invalidating trust with the domain owner.
Corrective action involves reinitializing RMC security and rejoining the node2 systems to the RSCT peer domain to restore full cluster quorum and event coordination.


------------------------------
jaison chipuka
------------------------------
