IBM i Global

 View Only
  • 1.  DB2 optimizer and number of disk arms - still relevant?

    Posted Tue December 13, 2022 10:19 AM
    In the days when IBM advised that we should use internal disk drives on IBM i for best performance, the DB2 optimizer would take in to consideration the number of disk controllers and disk arms available and would optimize SQL based up these resources (and other resources too).

    Most of my customers are now running on FlashSystem storage with FCMs but still have many small volumes mapped to their IBM i LPARs. For example, one customer has a total of 18TB storage assigned to an LPAR with 78 volume mappings.

    I'm looking at reducing the number of volumes to 36 x 500GB or 18 x 1TB. There's a requirement to add more storage in the near future so this LPAR may well grow to 22TB. I'd prefer to add storage in 1TB increments as and when they need this.

    My concern is that reducing the number of "disk arms" (volumes) will impact their SQL queries IF the optimizer still tales in to account the number of "disk arms" available. Or has the optimizer been updated to allow for today's faster storage? I can't see the latter as a volume from SVC\Storwze\FlashSystem is always presented as a type 2145.

    ------------------------------
    Glenn Robinson
    ------------------------------


  • 2.  RE: DB2 optimizer and number of disk arms - still relevant?

    IBM Champion
    Posted Tue December 13, 2022 02:11 PM
    Since we're in the process of migrating from internal disk to SAN I'd like to know this also.

    ------------------------------
    Robert Berendt
    ------------------------------



  • 3.  RE: DB2 optimizer and number of disk arms - still relevant?

    IBM Champion
    Posted Tue December 13, 2022 08:25 PM
    Edited by Satid Singkorapoom Wed December 14, 2022 05:09 AM
    Dear Glenn

    I try to provide my response based on my past experience in troubleshooting IBM i performance problem in the past 15 years or so.

    >>>>  DB2 optimizer would take in to consideration the number of disk controllers and disk arms available and would optimize SQL based up these resources <<<<

    This is NOT DB2i default behavior. You need to manually set the system value QQRYDEGREE to a value of *IO and parallel read will happen. Its performance benefit is limited to jobs that read large amount of data spanning multiple internal disk units or LUNs. (DB2 SMP is NOT needed for this to work)  Since flash disk's response time is at least 10 times faster than HDD (say, 0.2 msec  VS 2 msec), I found that *IO is no  longer very impactful when no HDD is used at all. 


    >>>> My concern is that reducing the number of "disk arms" (volumes) will impact their SQL queries IF the optimizer still tales in to account the number of "disk arms" available.  <<<<

    When using SAN disk, IBM i only sees and therefore takes into account the number of LUNs it is allocated. But the number of flash disk units is also important in disk IO performance but they are outside of IBM i's purview. The designer of SAN disk must handle this side of the matter and using Disk Magic tool to model its performance is the way to go.

    I used to engage in 2 IBM i performance problem cases that FlashSystem was used with POWER8 servers and both cases had only a few flash disk units (IIRC, not more than 10 units) but hundred of LUNs were created out of them. Disk response time was good at low disk IOPS workload but degraded fast when IOPS increased beyond some 5,000 IOPS (during nightly batch process) which was weird for FlashSystem.  

    Originally, I suspected the small number of flask disk units to be the cause but my IBM Storage Specialist colleague informed me that it should not have been a concern because IBM used SAN disk sizing tool named Disk Magic (hope you ever heard of this tool before) that can roughly predict disk response time based on SAN's HW and logical config and expected peak disk IOPS we want the SAN box to handle.  You need to look at PDI chart named "Physical Disk I/O Overview - Basic" to see the current peak disk IOPS of the LPAR (make sure it is one from a high workload day) and use this value for Disk Magic.  Increase this peak by any percentage you see fit for some headroom. 

    It turns out that what caused negatively dramatic disk response time degradation was the activation of a feature named DRP (Data Reduction Pool).  This is a disk space saving feature that imposes processing overhead to SAN box controllers. The higher the disk IOPS workload, the more the overhead. And DIsk Magic tool did not incorporate the performance overhead of DRP at that time to its performance modeling of the resulting response time of SAN HW config. (Please check if it does now or not.)  So, a golden rule for you is that when using FlashSystem SAN and the best possible disk response time is the utmost desired goal,

    • Ensure the vendor uses Disk Magic tool to do SAN box sizing and all input parameters reflect what will be used and see the resulting disk response time is desireable. Take time to compare one design with another (such as different number of flash disk units that you mentioned).
    • Do NOT use DRP.  Consult IBM DRP best practice information if DRP is really needed but be reminded that best and consistent disk response time can be obtained only without DRP activated.  Also check if Disk Magic tool now takes into account DRP performance impact.(I suspect not but am open to being wrong.)  If so, check to see how the resulting response time looks like with DRP.
    • Do NOT mix different size of disk units  (I suspect this may have less severe negative effect for flash disk but do not play with a match as it can burst into a fire when near accelerant)
    If you are interested in learning more, I discussed about FlashSystem and DRP in case 3 of my published articles here :  https://www.itjungle.com/author/satid-singkorapoom/    The article serves as a good sleeping pill if you read before bedtime :-)


    >>>> Or has the optimizer been updated to allow for today's faster storage?  <<<<

    Yes indeed !  In IBM i 7.2, there was an enhancement that SQE can sample the actual disk access time rather than using a hard-coded value that is far from actual time.



    ------------------------------
    Right action is better than knowledge; but in order to do what is right, we must know what is right.
    -- Charlemagne

    Satid Singkorapoom
    ------------------------------



  • 4.  RE: DB2 optimizer and number of disk arms - still relevant?

    IBM Champion
    Posted Wed December 14, 2022 03:24 AM

     

    Hi Glenn,

     As always, when we are talking about performance problem, the correct answer would be "it depends". However, there is an aspect that we have to be aware: disk queue and adapter queue. There is a limit for the IO's that can be queued for every LUN – it depends on the storage type. So, more LUN's means more queues, then more simultaneous IO's could be sent to the storage system. Also, there is a queue in the virtual adapter (NPIV configurations). Even the limit is 127 LUN's for every adapter, from the performance perspective, is better to have around 32 LUNs in each one. There are 2 improvements in newer TR's that help to improve the performance:

    May 2022 - IBM i 7.4 and IBM i 7.3

    IBM i leverages VIOS NPIV multiple queues

    November 2020 - IBM i 7.4 and IBM i 7.3 and IBM i 7.2

    Multiple Mapping of NPIV ports for Live Partition Mobility (LPM) operations

    IBM i Virtualization Details

    This article explains in details the concept of multiple queue:

    Modernization of PowerVM NPIV stack: from Single to Multiple Queue Support (ibm.com)

    Even those features improve the performance of external storage my conclusion is that it is better to have a large number of LUN's. To my knowledge, you already are in the top high size of LUN's (250 GB) in terms of performance.  Have you done a performance study to see the disk response time? For flash system with FCM, from my experience, everything above 2 ms in read and 0.5 ms in write (if you don't have remote sync mirror) is a bad signal and it must be analyzed for possible full queue problem.

     



    ------------------------------
    Nicolae Chirea
    System architect
    Saytel Informatica SL
    Madrid
    +34607868409
    ------------------------------



  • 5.  RE: DB2 optimizer and number of disk arms - still relevant?

    Posted Wed December 14, 2022 12:19 PM
    Hi Satid,

    Very good and thorough response. The only thing I would add is that your statement about setting QQRYDEGREE to *IO is no longer important now that almost all queries are processed through SQE (SQL query engine.) In the "old days", when queries went through CQE, setting QQRYDEGREE to *IO was necessary, but with SQE, the optimizer always uses all of the available disk arms for processing. The more disk arms/LUNs that a given file (table/index) is spread across, the more I/O parallelism the optimizer will try to drive. (Of course, as you point out, the SAN storage behind the LUNs has to be able to handle the I/O volume, too, so it's not a simple matter of allocating as many arms as possible.)

    Tim Clark
    SQE Developer

    ------------------------------
    TIMOTHY CLARK
    ------------------------------



  • 6.  RE: DB2 optimizer and number of disk arms - still relevant?

    IBM Champion
    Posted Wed December 14, 2022 08:01 PM
    Edited by Satid Singkorapoom Wed December 14, 2022 08:04 PM
    Dear Timothy

    Thanks a lot for your update that helps free some of my neurons from keeping this obsolete bit of information :-)  

    BTW, I wonder if Mr. Robert Bestgen still leads SQE developer team or not? If so, please convey to him my best regards.

    ------------------------------
    Right action is better than knowledge; but in order to do what is right, we must know what is right.
    -- Charlemagne

    Satid Singkorapoom
    ------------------------------



  • 7.  RE: DB2 optimizer and number of disk arms - still relevant?

    Posted Thu December 15, 2022 01:02 AM
    I think I'm still in a situation where I'll have to test this out to see the impact.

    ------------------------------
    Glenn Robinson
    ------------------------------



  • 8.  RE: DB2 optimizer and number of disk arms - still relevant?

    IBM Champion
    Posted Thu December 15, 2022 07:01 AM
    Dear Glenn

    I would agree totally with testing to see the impact and I see that DB2 SMP may help you do the test in varying ways as well if you are dealing with SQL workload. To see the impact on disk performance, the use of the following PDI charts should help you see and compare the impact :

    - Physical Disk I/O Overview - Basic
    - Disk Throughput Overview for Disk Pools
    - Read/Write Rate and Response Time
    - Disk Overview by Disk Unit
    - Disk Overview for Disk Pool  (if you have more than one ASP)

    ------------------------------
    Right action is better than knowledge; but in order to do what is right, we must know what is right.
    -- Charlemagne

    Satid Singkorapoom
    ------------------------------