  • 1.  HA/XD 6.1

    Posted Mon July 18, 2011 05:05 PM

    Originally posted by: edgvlad


    I've configured an HA/XD cluster with two nodes and GLVM. The cluster is UP but UNSTABLE:
    the second node is trying to acquire the resource group.

    Cluster: clusterprog (1641668885)
    Mon Jul 18 15:48:40 CDT 2011
    State: UP Nodes: 2
    SubState: UNSTABLE
    Node: gsimx2 State: UP
    Interface: gsimx2 (3) Address: 108.100.100.2
    State: UP
    Interface: gsi2en2 (0) Address: 102.100.100.2
    State: UP
    Interface: gsi2en6 (1) Address: 106.100.100.2
    State: UP
    Interface: gsi2rpv (2) Address: 109.100.100.2
    State: UP
    Resource Group: rgprogress State: Acquiring (Secondary)

    hacmp.log shows:

    WARNING: Cluster clusterprog has been running recovery program 'TE_RG_MOVE_ACQUIRE_SECONDARY' for 4020 seconds. Please check cluster status.
    WARNING: Cluster clusterprog has been running recovery program 'TE_RG_MOVE_ACQUIRE_SECONDARY' for 4500 seconds. Please check cluster status.

    clstrmgr.debug shows:

    Mon Jul 18 15:59:34 PollAliasEvents: State not STABLE/RP_RUNNING or ibcasts, return
    Mon Jul 18 16:00:04 PollAliasEvents: State not STABLE/RP_RUNNING or ibcasts, return
    Mon Jul 18 16:00:34 PollAliasEvents: State not STABLE/RP_RUNNING or ibcasts, return
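
To see how long a recovery program has been stuck, the hacmp.log warnings quoted above can be scanned with a short script. This is only a minimal sketch in Python: the regular expression is based on the two warning lines in this post, the function name `longest_running_recovery` is made up for illustration, and it assumes each warning sits on a single line in the real log file.

```python
import re

# Pattern derived from the warning lines quoted in this post; the wording in
# other releases may differ (assumption).
WARNING_RE = re.compile(
    r"WARNING: Cluster (\S+) has been running recovery program "
    r"'([^']+)' for (\d+) seconds"
)


def longest_running_recovery(log_text):
    """Return {(cluster, program): max_seconds} for every warning found."""
    longest = {}
    for match in WARNING_RE.finditer(log_text):
        key = (match.group(1), match.group(2))
        seconds = int(match.group(3))
        # Keep only the largest duration reported for each recovery program.
        longest[key] = max(longest.get(key, 0), seconds)
    return longest


# Sample text copied from the warnings above, with the line wrap undone.
sample = (
    "WARNING: Cluster clusterprog has been running recovery program "
    "'TE_RG_MOVE_ACQUIRE_SECONDARY' for 4020 seconds. Please check cluster status.\n"
    "WARNING: Cluster clusterprog has been running recovery program "
    "'TE_RG_MOVE_ACQUIRE_SECONDARY' for 4500 seconds. Please check cluster status.\n"
)
result = longest_running_recovery(sample)
print(result)
```

On a live node the same function could be fed the contents of the log file instead of the sample string.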

    The AIX version is 5300-12-02-1036.
    GLVM is OK; it has no STALE partitions.
    I've rebooted my servers, and the cluster always shows UNSTABLE.
    Could you help me?
    #PowerHAforAIX
    #PowerHA-(Formerly-known-as-HACMP)-Technical-Forum


  • 2.  Re: HA/XD 6.1

    Posted Mon July 18, 2011 06:51 PM

    Originally posted by: edgvlad


    I have more information.

    This is the output from node2 (secondary):
    ================================================================
    gsimx2:/var/hacmp/log >lssrc -ls clstrmgrES
    Current state: ST_CBARRIER
    sccsid = "@(#)36 1.135.1.97 src/43haes/usr/sbin/cluster/hacmprd/main.C, hacmp.pe, 53haes_r610, 0933A_hacmp610 8/8/09 14:44:29"
    i_local_nodeid 1, i_local_siteid 2, my_handle 2
    ml_idx[1]=0 ml_idx[2]=1
    tp is 201a3378
    Events on event queue:
    te_type 1, te_nodeid 2, te_network -1
    te_type 36, te_nodeid 2, te_network 1
    te_type 10, te_nodeid 2, te_network -1
    There are 0 events on the Ibcast queue
    There are 0 events on the RM Ibcast queue
    CLversion: 11
    local node vrmf is 6100
    cluster fix level is "0"
    The following timer(s) are currently active:
    Event error node list: gsimx1
    Current DNP values
    DNP Values for NodeId - 0 NodeName - gsimx1
    PgSpFree = 0 PvPctBusy = 0 PctTotalTimeIdle = 0.000000
    DNP Values for NodeId - 0 NodeName - gsimx2
    PgSpFree = 0 PvPctBusy = 0 PctTotalTimeIdle = 0.000000
    gsimx2:/var/hacmp/log >

    =====================================================================
    This is the output from node1:

    ========================================================================

    gsimx1:/usr/es/sbin/cluster>lssrc -ls clstrmgrES
    Current state: ST_RP_FAILED
    sccsid = "@(#)36 1.135.1.97 src/43haes/usr/sbin/cluster/hacmprd/main.C, hacmp.pe, 53haes_r610, 0933A_hacmp610 8/8/09 14:44:29"
    i_local_nodeid 0, i_local_siteid 1, my_handle 1
    ml_idx[1]=0 ml_idx[2]=1
    tp is 204c1418
    Events on event queue:
    te_type 1, te_nodeid 2, te_network -1
    te_type 36, te_nodeid 1, te_network 1
    te_type 36, te_nodeid 2, te_network 1
    te_type 10, te_nodeid 2, te_network -1
    There are 0 events on the Ibcast queue
    There are 0 events on the RM Ibcast queue
    CLversion: 11
    local node vrmf is 6100
    cluster fix level is "0"
    The following timer(s) are currently active:
    Event error node list: gsimx1
    Current DNP values
    DNP Values for NodeId - 0 NodeName - gsimx1
    PgSpFree = 0 PvPctBusy = 0 PctTotalTimeIdle = 0.000000
    DNP Values for NodeId - 0 NodeName - gsimx2
    PgSpFree = 0 PvPctBusy = 0 PctTotalTimeIdle = 0.000000
    gsimx1:/usr/es/sbin/cluster>

    =====================================================================
    When I tried to stop node2, it returned the following message:

    ===========================================

    Command: failed stdout: yes stderr: no

    Before command completion, additional instructions may appear below.

    cl_clstop: ERROR: Node gsimx2 has 3 event(s) outstanding as reported by command
    'lssrc -ls clstrmgrES' and cannot be stopped until all outstanding events have
    completed. The stop request has been aborted for all nodes. Please wait for all
    nodes to stabalize before attempting to stop cluster services again.
    =============================================================================
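
The cl_clstop message refers to the events listed by `lssrc -ls clstrmgrES`, so it can help to pull the state, the queued events, and the event error node list out of that output and compare them across nodes. The following is a rough sketch in Python, assuming the output has been saved as text and looks like the listings above; the function name `summarize_clstrmgr` and the returned dictionary layout are hypothetical, and treating each `te_type` line as one outstanding event is an assumption that merely matches the count of 3 reported for gsimx2.

```python
import re


def summarize_clstrmgr(output):
    """Summarize text saved from `lssrc -ls clstrmgrES` (format as in this post)."""
    state = re.search(r"Current state:[ \t]*(\S+)", output)
    # Each queued event appears as "te_type N, te_nodeid N, te_network N".
    events = re.findall(
        r"te_type\s+(\d+),\s*te_nodeid\s+(\d+),\s*te_network\s+(-?\d+)", output
    )
    error_nodes = re.search(r"Event error node list:[ \t]*(.*)", output)
    return {
        "state": state.group(1) if state else None,
        "events": [tuple(int(field) for field in event) for event in events],
        "error_nodes": error_nodes.group(1).split() if error_nodes else [],
    }


# Excerpt of the node2 output shown earlier in this post.
node2_output = """\
Current state: ST_CBARRIER
Events on event queue:
te_type 1, te_nodeid 2, te_network -1
te_type 36, te_nodeid 2, te_network 1
te_type 10, te_nodeid 2, te_network -1
There are 0 events on the Ibcast queue
Event error node list: gsimx1
"""
summary = summarize_clstrmgr(node2_output)
print(summary["state"], len(summary["events"]), summary["error_nodes"])
```

Running the same function on the output of each node makes it easier to see which node reports a failed recovery program and which node appears in the event error node list.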
    How can I find out what is happening? Is the problem network-related?
    #PowerHA-(Formerly-known-as-HACMP)-Technical-Forum
    #PowerHAforAIX