IBM i Global

  • 1.  IBM i 7.3 multipathing to external SAN disk

    Posted Thu April 20, 2023 02:41 PM

    We have multiple IBM i 7.3 partitions using an EMC PowerMax SAN for disk, which was implemented by an IBM business partner.  We are seeing poor performance during peak hours, and our internal networking staff are now looking at the IBM i server-to-disk connectivity through the SAN fabric switches.  They are asking questions I cannot answer, because I cannot find an article that discusses multipathing in the detail I need.  Some questions:
    1. Is multipathing native to IBM i, and does it need any configuration to be used efficiently?
    2. We have configured the SAN switch so that 2 different IBM i fiber ports for one partition go to the same port on the PowerMax unit.  Would that be a point of contention that could cause performance issues?
    Thanks,
    Mark Waring



    ------------------------------
    Mark Waring
    ------------------------------


  • 2.  RE: IBM i 7.3 multipathing to external SAN disk

    IBM Champion
    Posted Thu April 20, 2023 09:14 PM
    Edited by Satid Singkorapoom Thu April 20, 2023 09:31 PM

    Dear Mark

    I have an article that discusses one specific case of a SAN disk performance issue that may or may not apply to your case: https://www.itjungle.com/author/satid-singkorapoom/ . Look at case 3 – When Performance Issues Come From Without.  Case 5 may also give you a few more ideas to chew on. I hope they help you decide how to approach your problem.  For example, I mention comparing the disk response time report from the SAN box itself with the disk response time chart from the IBM i PDI tool (for the same date and time period) to check whether the SAN switch contributes to the overall disk performance problem. (In the case in my article, its contribution was not significant.)

    The answer to your first question is yes, multipathing is native, and I see little to configure there for performance. (The exception is a POWER7 or older machine, but I'm assuming you use a POWER8 or newer machine, right?)   In my past experience with SAN disk performance issues at IBM i customers, the cause has more to do with the SAN disk configuration. The SAN switch is sometimes involved as well, but mostly to a lesser degree.

    As for your 2nd question, the setup you describe is generally used when the server's fiber ports run at a slower speed than the SAN box's port, say 8 Gbps vs. 16 Gbps. Is this the case for you?  If so, it should not be much of a performance concern, but I would say it is more sensible to look at a performance report from the SAN switch itself if it can provide one.

    Do you use NPIV (through VIOS) for all your IBM i LPARs to connect to the SAN box? If so, be aware that another possible contributor to SAN disk response time degradation for client LPARs is excessive CPU utilization in VIOS.  You can check VIOS CPU utilization in the PDI chart named Physical System --> Logical Partitions Overview, as shown in the sample below:


    You can see this chart only when you enable the check box for the feature named Allow Performance Data Collection in the profile of the IBM i LPAR from which you display the chart. When I compare the disk response time chart with this chart, I notice that the disk response time seen by the IBM i LPAR degrades the most in the same period in which VIOS CPU is at its highest % Busy (almost 400% in the sample above; it uses Uncapped Partitioning here).
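
    The visual comparison described above can also be done numerically. The following is a minimal sketch, assuming you have exported per-interval VIOS CPU % Busy and IBM i disk response time for the same collection intervals. The sample values and variable names are hypothetical and are not taken from the charts in this post. A coefficient close to 1 suggests the two move together; it does not prove that VIOS CPU is the cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical samples, one value per collection interval.
vios_cpu_busy_pct = [120, 180, 390, 250, 140]  # uncapped, so it can exceed 100%
disk_resp_ms = [1.8, 2.4, 7.9, 4.1, 2.0]       # as seen by the IBM i LPAR

r = pearson(vios_cpu_busy_pct, disk_resp_ms)
print(f"correlation between VIOS CPU busy and disk response time: {r:.2f}")
```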

    In summary, when your SAN disk connection environment is complex, there are more factors to consider that can affect SAN disk access performance. Simplicity makes problem analysis easier, but sometimes you are not in control of this!

    Just curious, how did you confirm that the performance problem in your peak workload period comes from disk response time?  Did you use the PDI tool's disk-related charts?  If not, I strongly recommend it.   The PDI charts Wait Overview and Wait by Generic Jobs or Tasks are also useful in such a case, to see which group of jobs suffers most from disk-related wait time.

    I wish you success in addressing your problem.

    ------------------------------
    Education is not the learning of facts but the training of the mind to think. -- Albert Einstein.
    ------------------------------
    Satid S.
    ------------------------------



  • 3.  RE: IBM i 7.3 multipathing to external SAN disk

    Posted Fri April 21, 2023 09:54 AM

    Hi Satid!
    Thanks for your information!  We are using Power9 and a Dell PowerMax SAN with NPIV and SAN switches that run at 16 Gbps.  We were using VIOS, but due to major performance issues we changed to not using VIOS and going directly to the SAN (via the switch).  We are using a combination of tools to review response times: PDI, Performance Navigator, charts from the PowerMax SAN, and now data gathered by SolarWinds from the switches.  The reason for question #2 was strictly whether 2 ports on the IBM i (not using VIOS) going directly (via the switch) to one port on the EMC PowerMax SAN storage would cause contention.  We do not see that the speed of 16 Gbps has been reached or exceeded on the switches, but we still see a performance hit when running Mimix audit jobs that we did not see when we had internal disk.  Any thoughts there?
    Thanks,
    Mark Waring



    ------------------------------
    Mark Waring
    ------------------------------



  • 4.  RE: IBM i 7.3 multipathing to external SAN disk

    IBM Champion
    Posted Fri April 21, 2023 09:28 PM
    Edited by Satid Singkorapoom Fri April 21, 2023 09:48 PM

    Dear Mark

    >>>> We do not see that the speed of 16 Gbps has been reached or exceeded on the switches but we still see a performance hit when running Mimix audit jobs that we did not see when we had internal disk.   <<<<

    Based on your information, I would say there is no need to worry about contention, unless the average disk response time reported by the PDI chart is much higher than that reported by the SAN box itself.
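
    As a rough way to put numbers on the contention question, here is a minimal sketch that adds up the peak throughput of the host ports zoned to one storage port and compares the sum with that port's capacity. It assumes a nominal figure of about 1600 MB/s per direction for a 16 Gbps Fibre Channel port. The per-port throughput values are hypothetical and would come from your switch statistics.

```python
# Assumption: nominal throughput of a 16 Gbps Fibre Channel port, per direction.
NOMINAL_16GFC_MB_PER_S = 1600.0


def target_port_utilization(initiator_mb_per_s,
                            target_capacity_mb_per_s=NOMINAL_16GFC_MB_PER_S):
    """Fraction of the storage port's capacity used by all host ports zoned to it."""
    if target_capacity_mb_per_s <= 0:
        raise ValueError("target capacity must be positive")
    return sum(initiator_mb_per_s) / target_capacity_mb_per_s


# Hypothetical peak throughput (MB/s) of the 2 IBM i ports that share one storage port.
peak_throughput_mb_per_s = [450.0, 520.0]

utilization = target_port_utilization(peak_throughput_mb_per_s)
print(f"storage port utilization at peak: {utilization:.0%}")
```

    Averages over long intervals can hide short bursts, so a low figure here does not rule out brief saturation. The response time comparison between PDI and the SAN box remains the more direct check.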

    Internal disk is private to one system and is therefore easy to handle in terms of performance: easy to analyze, and easy to fix. The same holds if the SAN box is dedicated to one or two LPARs, but not when it serves too many LPARs.

    I forgot to ask whether you use spinning disk (HDD) or SSD in the SAN box.   As I mentioned in my article, for HDD an average disk response time of 5 millisec. or less is considered good. For SSD, it's 2.5 millisec.   What is the average disk response time that IBM i sees at peak workload?   If this value is near the average disk response time reported by the SAN box for the same period, and it is significantly higher than the guideline above, then you should focus on SAN box performance alone.
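
    The reasoning above can be written down as a small decision helper. This is only a sketch: the 5 ms and 2.5 ms guidelines come from this post, while the two cut-off factors (what counts as "significantly higher" and what counts as "near") are hypothetical values chosen for illustration.

```python
# Guideline average disk response times from this post, in milliseconds.
GUIDELINE_MS = {"HDD": 5.0, "SSD": 2.5}


def assess_response_time(ibmi_ms, san_ms, media,
                         significant_factor=1.5, near_ratio=1.25):
    """Suggest where to look, given average response times for the same peak period.

    significant_factor and near_ratio are hypothetical cut-offs, not published values.
    """
    guideline = GUIDELINE_MS[media]
    if ibmi_ms <= guideline * significant_factor:
        return "at or near guideline"
    if ibmi_ms <= san_ms * near_ratio:
        # IBM i sees about what the SAN box reports: the delay is inside the SAN box.
        return "focus on SAN box performance"
    # IBM i sees much more than the SAN box reports: the delay is added on the way.
    return "check the path between IBM i and the SAN box"


# Hypothetical peak-period averages (ms): as seen by IBM i (PDI) and by the SAN box.
print(assess_response_time(ibmi_ms=8.0, san_ms=7.5, media="SSD"))
print(assess_response_time(ibmi_ms=8.0, san_ms=2.0, media="SSD"))
```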

    As for MIMIX jobs running in IBM i, there is one group of jobs named CMPFIL* (I never knew their full job name; are these the audit jobs you mentioned?) that accumulates a lot of disk page fault wait time in each 24-hour period. I was told these jobs scan the entire database that MIMIX serves to check some kind of data integrity, so it is natural for them to accumulate a lot of disk page fault wait, because they access data on disk heavily.  The evidence is clear in these PDI charts:




    I normally suggest that my customers run these MIMIX jobs less frequently (once a month?) or manually (is that possible? I do not remember) during a low-workload period, to help reduce the total disk IO workload on systems that perform badly because of poor disk response time.   If you see a similar pattern to the charts above with CMPFIL* jobs and you cannot do much to improve disk response time yet, I suggest the same.  If you run these MIMIX jobs in more than one IBM i LPAR connected to the same SAN box, running them less often should significantly reduce the disk IO burden on the SAN box, and you may expect the performance problem to ease to a certain degree.


    If you can post the PDI charts Wait Overview, Wait by Generic Job, and Wait by Subsystem, and let me know the average disk response time reported by IBM i and by the SAN box, all for the peak workload day, I may have more comments for you.

    ------------------------------
    Education is not the learning of facts but the training of the mind to think. -- Albert Einstein.
    ------------------------------
    Satid S.
    ------------------------------