

Cross-Site LVM Mirroring Problem

  • 1.  Cross-Site LVM Mirroring Problem

    Posted Wed April 24, 2013 04:11 PM

    Originally posted by: GregioRenato


     

    Hi everyone, 
     
    I'm asking for help with the following scenario:
    I have an AIX 6.1 environment running SAP and Oracle under PowerHA with Cross-Site LVM Mirroring. It consists of 2 LPARs at different sites on P795 hardware, with Cross-Site LVM Mirroring between two DS8700 storage systems. The sites are connected by a DWDM link, exactly as in the attached image.
     
    Cross-Site LVM Mirroring is in place and my copies are in a perfect state (every PP on a disk at Site A is mirrored on a PP on a disk at Site B), and all my VGs have quorum disabled.
    When my DWDM link fails, it takes 3 seconds to switch automatically to the alternate/redundant link. When that happens (the link to the other storage system is lost), AIX logs an error in the errpt error log, and the VG detects that I have lost access to the disks at the remote site and marks the LVs as stale.
    Last month I had this problem: the link between the sites was lost for 3 seconds, so I lost access to the redundant storage and my systems kept accessing disks on the local storage only.
    HACMP didn't detect errors, which was expected because I have Cross-Site LVM Mirroring, but I had several other problems that had a big impact on Oracle:
    LVs marked as stale (expected)
    AIX generated the error "PATH HAS FAILED" for disks at the remote site (expected)
    AIX generated the error "I/O ERROR DETECTED BY LVM" (not expected)
    Oracle couldn't access the filesystem and locked up
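
To separate the expected errors from the unexpected one, the error log can be grouped by description. The following is a minimal sketch, assuming the usual column layout of the `errpt` summary report (identifier, timestamp, type, class, resource name, description). The sample lines, identifiers, timestamps and hdisk names are made up for illustration and are not taken from this environment; `summarize_errpt` is a hypothetical helper name.

```python
from collections import defaultdict

# Illustrative text in the layout of the `errpt` summary report.
# Identifiers, timestamps and resource names are invented placeholders.
SAMPLE_ERRPT = """\
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
AAAAAAAA   0101120013 P H hdisk12        PATH HAS FAILED
AAAAAAAA   0101120013 P H hdisk13        PATH HAS FAILED
BBBBBBBB   0101120013 P H LVDD           I/O ERROR DETECTED BY LVM
"""


def summarize_errpt(text):
    """Group errpt summary lines by description.

    Returns a dict mapping each description to the list of resource
    names that reported it, in the order they appear in the report.
    """
    by_description = defaultdict(list)
    for line in text.splitlines():
        # The description may contain spaces, so split off only the
        # first five columns and keep the remainder as one field.
        fields = line.split(None, 5)
        if len(fields) < 6 or fields[0] == "IDENTIFIER":
            continue  # skip the header and blank or short lines
        resource, description = fields[4], fields[5].strip()
        by_description[description].append(resource)
    return dict(by_description)


if __name__ == "__main__":
    for description, resources in summarize_errpt(SAMPLE_ERRPT).items():
        print(f"{len(resources):3d}  {description}  ({', '.join(resources)})")
```

On a real system the input would be the saved output of `errpt` for the time window of the link failure. Comparing which resources report "PATH HAS FAILED" against where "I/O ERROR DETECTED BY LVM" is raised may help show whether the LVM error is tied to the remote copy only.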
     
    After stabilizing the environment, I opened a PMR with IBM, and I'm trying to identify why I get "I/O ERROR DETECTED BY LVM" when my Cross-Site LVM Mirroring copies are intact.
     

     

    I think this problem may be related to disk tuning parameters such as "hcheck_interval" or "rw_timeout": the disks wait a long time for a response from the second mirror copy, and Oracle can't wait that long. So I'm planning to tune these parameters, setting them to around 3 seconds.
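
A minimal sketch of how such a change could be prepared for review, assuming the attributes are set per hdisk with the standard AIX `lsattr` and `chdev` commands. The helper only builds command strings and runs nothing. The disk names and the value 3 are placeholders; whether a given driver accepts 3 for `rw_timeout` or `hcheck_interval` is not confirmed here, which is why the allowed range is listed first with `lsattr -Rl`. `build_tuning_commands` is a hypothetical name.

```python
def build_tuning_commands(disks, attrs, deferred=True):
    """Build AIX command strings to inspect and change disk attributes.

    disks    -- iterable of hdisk names (placeholders in this sketch)
    attrs    -- dict of attribute name -> desired value
    deferred -- if True, add -P so chdev only updates the ODM and the
                change applies when the device is next configured
    """
    commands = []
    for disk in disks:
        for name in attrs:
            # Current value of the attribute.
            commands.append(f"lsattr -El {disk} -a {name}")
            # Values the driver allows for this attribute.
            commands.append(f"lsattr -Rl {disk} -a {name}")
        settings = " ".join(f"-a {name}={value}" for name, value in attrs.items())
        flag = " -P" if deferred else ""
        commands.append(f"chdev -l {disk} {settings}{flag}")
    return commands


if __name__ == "__main__":
    # hdisk4 and hdisk5 stand in for the remote-site disks.
    wanted = {"hcheck_interval": 3, "rw_timeout": 3}
    for command in build_tuning_commands(["hdisk4", "hdisk5"], wanted):
        print(command)
```

Printing the commands instead of executing them keeps the change reviewable, and the `-P` flag reflects that attributes of a disk in an active volume group generally cannot be changed while the device is in use.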
     
    Can someone help me find a solution for this problem?

    Thanks,
    Renato Gregio


    #PowerHAforAIX
    #PowerHA-(Formerly-known-as-HACMP)-Technical-Forum


  • 2.  Re: Cross-Site LVM Mirroring Problem

    Posted Fri May 03, 2013 02:33 PM

    Originally posted by: bodily


    Correct on "My HACMP didn't detect errors," as HACMP does not need to do anything in that case. This is purely AIX LVM. You would have the EXACT same results w/o HACMP in this scenario.

    In testing, I have seen I/O hangs to the primary/only remaining copy in the 3-5 minute range. The PowerHA 6.1 Enterprise Edition Redbook actually documented this result:

    "The status of the resource group pokrg is still available in node Zhifa for around 5 minutes, and during that time the application appears to be hung and the users cannot write or read to the disks."

    I would like to think there are some tuning parameters to help, but I can't say I've had luck. I tried fast_fail on the FC adapter, hcheck_interval, and queue_depth, and the results were still about 90% the same.

    My inclination is that it's more fiber related. I have a long history with LVM mirroring and I don't recall seeing these significant delays in the SCSI and SSA storage days. But I MAY have selective memory these days.

    I would be very curious to hear whether support gives you some options that help with this, as I would like to make note of it and push it out in our pubs if possible.

     

     


    #PowerHAforAIX
    #PowerHA-(Formerly-known-as-HACMP)-Technical-Forum