PowerVM

  • 1.  vNIC - failover takes long time

    IBM Champion
    Posted Mon January 15, 2024 10:39 AM

    Hello,

    I wonder whether anyone in the group has experienced vNIC failovers that take a long time. We have been testing vNIC with IBM i clients for the last few years, but apparently the technology does not meet an enterprise standard. Several times we have seen the failover take too long (from 30 seconds to a few minutes), which causes many TCP jobs to crash.

    We are not able to reproduce the scenario to give IBM enough logs, but we have seen long failovers when there is an outage on a LAN switch or when an SR-IOV firmware upgrade is in progress. If we initiate a failover manually, it is smooth and fast.

    Unfortunately, this unpredictable behavior is stopping us from promoting vNIC as our main network virtualization engine.

    Has anyone else experienced the same behavior on other operating systems? I've seen this document, which seems to confirm that there is an issue somewhere.

    Why vNIC failover/failback takes long time and uses high CPU on VIOS during failover/failback?

    Bart G,



    ------------------------------
    Bartlomiej Grabowski
    ------------------------------


  • 2.  RE: vNIC - failover takes long time

    Posted Tue January 16, 2024 02:57 AM

    Hi Bart,

    Unfortunately, we saw such situations with Linux in the past as well. The topic is quite complex and depends on several factors (e.g. vNIC backing device distribution).

    IBM is already aware of it. Please see IBM Ideas PVMV-I-106.

    If you have not already, please vote.

    CHers Patrick



    ------------------------------
    Patrick Hügli
    ------------------------------



  • 3.  RE: vNIC - failover takes long time

    Posted Tue January 16, 2024 09:02 AM

    Hello Bart,

    I am seeing the same thing for a client running IBM i. The vNIC failover takes up to 1 minute. The client opened a support ticket and, as of today, there is no way to solve this failover latency.



    ------------------------------
    Gregory Vanbout
    ------------------------------



  • 4.  RE: vNIC - failover takes long time

    Posted Fri January 26, 2024 05:14 AM

    For your information, I mentioned IBM i because my other clients running AIX and Linux do not see this issue. One of my colleagues who specializes in this topic said there are several things to consider: be sure to spread the LPARs across different physical ports (to get a balanced port distribution). If one port is "too busy", this might be the reason for the delay. Do you have failover priorities set? VEPA mode also needs reflective relay enabled... So I would advise working with IBM Lab Services to help you.
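
    To make the port-distribution point concrete, here is a minimal Python sketch that counts how many vNICs prefer each physical port. Everything in it is illustrative: the function names, the tuple layout and the sample location codes are made up, the inventory would have to come from your own HMC export, and it assumes that the backing device with the lowest priority value is the preferred one.

```python
from collections import Counter

def preferred_port_load(backing_devices):
    """Count, per physical port, how many vNICs prefer that port.

    backing_devices: iterable of (vnic_name, priority, physical_port).
    Assumption: the lowest priority value marks the preferred device.
    """
    devices = list(backing_devices)
    preferred = {}
    for vnic, priority, port in devices:
        # Keep the backing device with the lowest priority value per vNIC.
        if vnic not in preferred or priority < preferred[vnic][0]:
            preferred[vnic] = (priority, port)
    # Start every known port at zero so idle ports show up in the result.
    load = Counter({port: 0 for _, _, port in devices})
    for _, port in preferred.values():
        load[port] += 1
    return load

def is_balanced(load, tolerance=1):
    """True if the busiest and the idlest port differ by at most `tolerance`."""
    counts = list(load.values())
    return not counts or max(counts) - min(counts) <= tolerance

# Hypothetical inventory: both LPARs prefer the same physical port.
example = [
    ("lpar1-vnic0", 10, "P1-C1-T1"),
    ("lpar1-vnic0", 20, "P1-C2-T1"),
    ("lpar2-vnic0", 10, "P1-C1-T1"),
    ("lpar2-vnic0", 20, "P1-C2-T1"),
]
example_load = preferred_port_load(example)
```

    In this made-up inventory both vNICs prefer P1-C1-T1, so `is_balanced(example_load)` reports an unbalanced layout. Swapping the priorities on one LPAR would even it out.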



    ------------------------------
    Gregory Vanbout
    ------------------------------



  • 5.  RE: vNIC - failover takes long time

    IBM Champion
    Posted Tue January 16, 2024 10:24 AM
    Edited by Bartlomiej Grabowski Tue January 16, 2024 10:24 AM

    It looks like the problem is well known. I wonder why tech support does not simply reply "known behavior without a permanent fix". Instead they act as if we are the first company to complain, and keep asking us to collect never-ending logs.

    I should be at IBM Rochester in a few weeks; would you mind sharing PMR numbers? Maybe we can get more attention from management.



    ------------------------------
    Bartlomiej Grabowski
    ------------------------------



  • 6.  RE: vNIC - failover takes long time

    Posted Tue January 16, 2024 01:12 PM
    Could this be spanning tree on the switch? 30-60 seconds sounds like
    spanning tree pauses I have seen on other platforms.
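
    For reference, the 30-60 second figure lines up with the default timers of classic spanning tree (IEEE 802.1D): a port spends one forward-delay interval in listening and one in learning, and an indirect failure is only noticed after max age expires. The small Python sketch below just does that arithmetic. The function name is made up for illustration, the numbers apply only if the switch runs classic STP with default timers (RSTP converges much faster), and nothing in this thread confirms that spanning tree is actually involved.

```python
# Default classic STP (IEEE 802.1D) timers, in seconds.
MAX_AGE = 20
FORWARD_DELAY = 15

def stp_convergence_seconds(indirect_failure,
                            max_age=MAX_AGE,
                            forward_delay=FORWARD_DELAY):
    """Approximate worst-case time before a port forwards again.

    A port spends one forward-delay interval in listening and one in
    learning. For an indirect failure the switch first has to wait for
    max age to expire before it reacts at all.
    """
    listening_and_learning = 2 * forward_delay
    waiting_for_max_age = max_age if indirect_failure else 0
    return waiting_for_max_age + listening_and_learning

direct = stp_convergence_seconds(indirect_failure=False)    # 30
indirect = stp_convergence_seconds(indirect_failure=True)   # 50
```

    So a direct link failure costs about 30 seconds and an indirect one about 50 seconds with default timers, which is in the same range as the failover times reported here.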

    ------------------------------------------------------------------
    Russell Adams Russell.Adams@AdamsSystems.nl
    Principal Consultant Adams Systems Consultancy
    https://adamssystems.nl/




  • 7.  RE: vNIC - failover takes long time

    IBM Champion
    Posted Wed January 17, 2024 03:38 AM

    I assume not, given that there is an RFE which clearly describes what must be improved, and the IBM dev team is asking for "time allocation" through the RFE.



    ------------------------------
    Bartlomiej Grabowski
    ------------------------------