Automation with Power

Power Business Continuity and Automation

Connect, learn, and share your experiences using the business continuity and automation technologies and practices designed to ensure uninterrupted operations and rapid recovery for workloads running on IBM Power systems. 


  • 1.  Question on how disk heartbeat should work

    Posted Thu August 13, 2009 05:58 PM

    Originally posted by: Emittim3@Sirius


    All:
    I have a configuration with a DS3400 directly attached (no Fibre Channel switches) to two POWER6 p520 servers. For PowerHA I have a disk heartbeat network through a concurrent volume group, plus a second heartbeat network over TCP/IP. I've run through all sorts of failure-scenario testing, and everything works great except for one case. When I pull both storage cables from the active node (this is a two-node active/passive cluster), I/O hangs, and PowerHA eventually detects that the disk heartbeat network is down. However, it just sits there with hung I/O and never fails the resource group over from the disconnected node to the standby node. So my question is:

    Should PowerHA detect that the active node has lost connectivity to storage and fail the resource group over to the standby node (which does still have connectivity to storage)? Or is this type of storage connectivity loss something that can only be handled through a custom HA event?

    Thanks in advance!

    Cheers,
    Philip Greer


  • 2.  Re: Question on how disk heartbeat should work

    Posted Fri August 14, 2009 07:13 AM

    Originally posted by: j.gann


    Do you have a shared volume group configured? HACMP detects storage failure through loss-of-quorum error log entries (see "odmget errnotify").
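
    For example, a quick check (stanza contents vary by release, but the quorum-loss notification methods key on the LVM_SA_QUORCLOSE error label):

    # odmget errnotify | grep -p LVM_SA_QUORCLOSE

    If no stanza comes back, that node has no notification method registered for quorum loss.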

    Joachim Gann


  • 3.  Re: Question on how disk heartbeat should work

    Posted Fri August 14, 2009 07:18 AM

    Originally posted by: RosieK


    Hi Philip,

    Does your rootvg also reside on the DS3400 disks from which you removed access?

    Regards

    Rosie Killeen
    IBM PowerHA Specialist


  • 4.  Re: Question on how disk heartbeat should work

    Posted Mon August 17, 2009 10:49 AM

    Originally posted by: Emittim3@Sirius


    Hey all! Thanks for the responses.

    Yes, I do have a shared volume group (varied on concurrently on both systems).

    My root volume group is on two internal SAS drives (AIX LVM mirrored).

    Here's a synopsis:

    root@sallreno:/ # lsvg -o
    ezpickvg
    rootvg
    root@sallreno:/ # lspv
    hdisk0 00caf8f42e098549 rootvg active
    hdisk1 00caf8f4cd821fb1 None
    hdisk2 00caf8f4c6db0134 ezpickvg concurrent
    hdisk3 00caf8f4c6db021d ezpickvg concurrent
    hdisk4 00caf8f4c6db031c ezpickvg concurrent
    hdisk5 00caf8f4c6db03ad ezpickvg concurrent

    root@sallren1:/ # lsvg -o
    rootvg
    root@sallren1:/ # lspv
    hdisk0 00cc1ce444b00ac9 rootvg active
    hdisk1 00cc1ce4cdf06756 rootvg active
    hdisk2 00caf8f4c6db0134 ezpickvg concurrent
    hdisk3 00caf8f4c6db021d ezpickvg concurrent
    hdisk4 00caf8f4c6db031c ezpickvg concurrent
    hdisk5 00caf8f4c6db03ad ezpickvg concurrent

    So the above shows that the volume group ezpickvg is varied on concurrently on each system.

    Here are the heartbeat networks:

    root@sallreno:/usr/es/sbin/cluster/utilities # ./clhbs
    NETWORK:en2 192.168.0.1 0 0
    NETWORK:rhdisk2 255.255.10.1 87 87
    UPTIME:411032

    Here's the topology:
    root@sallreno:/usr/es/sbin/cluster/utilities # ./cltopinfo
    Cluster Name: ezpick
    Cluster Connection Authentication Mode: Standard
    Cluster Message Authentication Mode: None
    Cluster Message Encryption: None
    Use Persistent Labels for Communication: No
    There are 2 node(s) and 2 network(s) defined

    NODE sallren1:
    Network ether1
    ezreno 10.252.164.50
    sallren1 192.168.0.2
    Network net_diskhb_01
    sallren1_hdisk2_01 /dev/hdisk2

    NODE sallreno:
    Network ether1
    ezreno 10.252.164.50
    sallreno 192.168.0.1
    Network net_diskhb_01
    sallreno_hdisk2_01 /dev/hdisk2

    Resource Group ezpickrg
    Startup Policy Online On Home Node Only
    Fallover Policy Fallover To Next Priority Node In The List
    Fallback Policy Never Fallback
    Participating Nodes sallreno sallren1
    Service IP Label ezreno

    Total Heartbeats Missed: 87
    Cluster Topology Start Time: 08/12/2009 11:32:55

    Here's the resource group:
    root@sallreno:/usr/es/sbin/cluster/utilities # ./clshowres

    Resource Group Name ezpickrg
    Participating Node Name(s) sallreno sallren1
    Startup Policy Online On Home Node Only
    Fallover Policy Fallover To Next Priority Node In The List
    Fallback Policy Never Fallback
    Site Relationship ignore
    Dynamic Node Priority
    Service IP Label ezreno
    Filesystems ALL
    Filesystems Consistency Check logredo
    Filesystems Recovery Method parallel
    Filesystems/Directories to be exported (NFSv2/NFSv3) /u/easy/ep2/controller /u/diskless
    Filesystems/Directories to be exported (NFSv4)
    Filesystems to be NFS mounted
    Network For NFS Mount
    Filesystem/Directory for NFSv4 Stable Storage
    Volume Groups ezpickvg
    Concurrent Volume Groups
    Use forced varyon for volume groups, if necessary false
    Disks
    GMVG Replicated Resources
    GMD Replicated Resources
    PPRC Replicated Resources
    ERCMF Replicated Resources
    SVC PPRC Replicated Resources
    Connections Services
    Fast Connect Services
    Shared Tape Resources
    Application Servers ezpickas
    Highly Available Communication Links
    Primary Workload Manager Class
    Secondary Workload Manager Class
    Delayed Fallback Timer
    Miscellaneous Data
    Automatically Import Volume Groups false
    Inactive Takeover
    SSA Disk Fencing false
    Filesystems mounted before IP configured true
    WPAR Name
    Run Time Parameters:

    Node Name sallreno
    Debug Level high
    Format for hacmp.out Standard

    Node Name sallren1
    Debug Level high
    Format for hacmp.out Standard


  • 5.  Re: Question on how disk heartbeat should work

    Posted Mon August 17, 2009 01:28 PM

    Originally posted by: Casey_B


    Hello Philip,

    To start with, a clarifying statement: disk heartbeating is an additional communication path between nodes that removes the IP network as a single point of failure.

    It's good that you don't have your rootvg on the SAN; as Rosie mentioned, that can be problematic.

    You didn't answer Joachim's question, and in this case I think it is the most important one.

    AIX uses error notify objects to run commands when new entries appear in the error report.

    PowerHA uses those error notify objects to trigger a fallover when one of the volume groups loses quorum.
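
    A quick way to see whether that trigger ever fired during your cable-pull test is to check the error log for the quorum-loss label (assuming the standard LVM_SA_QUORCLOSE label here):

    # errpt -J LVM_SA_QUORCLOSE

    If nothing shows up after the test, LVM never declared quorum lost, so PowerHA had nothing to react to.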

    A peculiarity of LVM is that, in addition to quorum being enabled, you also need at least one mirrored logical volume on the disks; without a mirrored LV, the quorum-loss error is never logged.

    So take a look at "lsvg datavg" and "lsvg -l datavg" to see whether you have quorum enabled and also have a mirrored LV.
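
    For example, with the VG from this thread (commands only; the output format is release-dependent):

    # lsvg ezpickvg | grep -i QUORUM
    # lsvg -l ezpickvg

    In the first output, a QUORUM value of 1 (shown as "Disabled" on newer releases) means quorum checking is off; in the second, an LV whose PPs count is twice its LPs count has two copies, i.e. it is mirrored.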

    If this doesn't match your architecture or the design of the cluster, then don't worry; there are always several different ways to architect a system.

    You can use an application monitor to query some important files in your filesystem and cause a fallover if the files are inaccessible or not as you expect.

    Also, the application monitor for your application may itself help determine whether the disk has failed in some way. (For instance, you could have an application monitor that actually makes a connection to a database and counts rows in some tables...)
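
    Along those lines, a minimal sketch of the file-check variant (the probe file path is hypothetical, and this relies on PowerHA treating a non-zero exit from a custom monitor method as an application failure):

    #!/bin/ksh
    # Hypothetical PowerHA custom application monitor: prove the shared
    # filesystem is still readable. Exit 0 = healthy; non-zero tells
    # PowerHA to restart the application or fall the resource group over.
    TESTFILE=/u/easy/ep2/controller/.probe   # hypothetical file on the shared VG

    # Do the read in the background so hung I/O cannot hang the monitor itself.
    dd if="$TESTFILE" of=/dev/null bs=512 count=1 2>/dev/null &
    PID=$!
    sleep 10                     # give the read up to 10 seconds
    if kill -0 $PID 2>/dev/null; then
        kill -9 $PID             # dd is still blocked: I/O is hung
        exit 1
    fi
    wait $PID                    # reap dd and pick up its exit status
    exit $?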

    Hope this helps,
    Casey