Originally posted by: Bonzodog
We are having an issue with a newly installed HACMP system which, having been perfectly happy during testing, is now misbehaving during failover.
There are a number of weird things going on which I will work through:
1) During resource release from Machine A, as resources fail over to Machine B, the rpc.lockd daemon is not dying. Looking at hacmp.out I can see repeated entries such as:
+PRIMES3_RG:rg_move_complete +271 [ 24 -gt 0 ]
+PRIMES3_RG:rg_move_complete +267 lssrc -s rpc.lockd
+PRIMES3_RG:rg_move_complete +267 LC_ALL=C
+PRIMES3_RG:rg_move_complete +267 grep stopping
rpc.lockd        nfs              294926       stopping
+PRIMES3_RG:rg_move_complete +267 [ 0 = 0 ]
+PRIMES3_RG:rg_move_complete +270 +PRIMES3_RG:rg_move_complete +270 expr 24 - 1
COUNT=23
+PRIMES3_RG:rg_move_complete +271 sleep 1
This repeats until the counter inexorably reaches zero and the node goes into the ERROR state.
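For what it's worth, going by the line numbers in the trace, the loop in rg_move_complete appears to boil down to something like this (my own reconstruction, not the actual IBM script; the initial COUNT value is a guess):

COUNT=60                                        # retry budget, one second per try (value is a guess)
while [ $COUNT -gt 0 ]; do                      # +271 in the trace
    lssrc -s rpc.lockd | LC_ALL=C grep stopping # +267: is lockd still "stopping"?
    if [ $? = 0 ]; then                         # grep matched, so lockd not gone yet
        COUNT=`expr $COUNT - 1`                 # +270: count down
        sleep 1                                 # +271: wait and retry
    else
        break                                   # lockd finally stopped
    fi
done
# if COUNT reaches zero the event script fails and the node goes into ERROR

So the script is simply waiting for the SRC to report rpc.lockd as stopped, and it never does.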
Running the "Recover From HACMP Script Failure" command from smit causes pretty well immediate failover!
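In the meantime we can clear the stuck daemon by hand, roughly like this (the PID is the one from the trace above; adjust to suit):

lssrc -s rpc.lockd            # still shows "stopping" with PID 294926
stopsrc -f -s rpc.lockd       # forced stop via the SRC
kill -9 294926                # last resort if it still won't die
lssrc -s rpc.lockd            # should now report "inoperative"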
2) During cluster start on Machine A as the active node (this doesn't happen on B), only the file systems which have been marked for NFS export (within the cluster resources) have actually been mounted!
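To illustrate what I mean, this is how I am checking it after a start (the clshowres path assumes the standard /usr/es/sbin/cluster layout):

/usr/es/sbin/cluster/utilities/clshowres   # filesystems the resource group should mount/export
lsfs                                       # filesystems defined on the node
mount                                      # what actually got mounted
exportfs                                   # what is currently NFS-exported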
This all points to NFS issues.
What has changed?
Tivoli has been installed since we performed all the tests!
Anyone got any idea (apart from removing Tivoli!) what on earth to do?
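My next step is to see whether anything Tivoli has put on the box is sitting on the shared filesystems or the NFS daemons at release time, along these lines (/sharedfs is a placeholder for our real mount points, and the grep pattern is a guess at the process names):

fuser -cu /sharedfs          # who has files open in the filesystem being released
ps -ef | grep -i tivoli      # any Tivoli daemons running?
lssrc -g nfs                 # SRC view of the whole NFS group (nfsd, biod, rpc.lockd, ...)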