I'll provide my response based on my experience troubleshooting IBM i performance problems over the past 15 years or so.
>>>> DB2 optimizer would take into consideration the number of disk controllers and disk arms available and would optimize SQL based upon these resources <<<<
This is NOT DB2 for i's default behavior. You need to manually set the system value QQRYDEGREE to *IO for parallel read to happen. Its performance benefit is limited to jobs that read large amounts of data spanning multiple internal disk units or LUNs. (DB2 SMP is NOT needed for this to work.) Since flash disk response time is at least 10 times faster than HDD (say, 0.2 msec vs. 2 msec), I have found that *IO is no longer very impactful when no HDD is used at all.
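For reference, the system value can be changed with a standard CL command, or overridden for just the current job with CHGQRYA (test in a non-production job first, since this affects optimizer behavior):

```
CHGSYSVAL SYSVAL(QQRYDEGREE) VALUE(*IO)  /* system-wide: allow parallel I/O   */
CHGQRYA   DEGREE(*IO)                    /* override for the current job only */
```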
>>>> My concern is that reducing the number of "disk arms" (volumes) will impact their SQL queries IF the optimizer still tales in to account the number of "disk arms" available. <<<<
When using SAN disk, IBM i sees, and therefore takes into account, only the number of LUNs allocated to it. The number of physical flash disk units also matters for disk I/O performance, but those are outside IBM i's purview. The designer of the SAN configuration must handle that side of the matter, and using the Disk Magic tool to model its performance is the way to go.
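If you want to see exactly what IBM i itself perceives as "disk arms", the QSYS2.SYSDISKSTAT IBM i service lists one row per configured disk unit (i.e., per LUN on SAN storage). A sketch, assuming a recent IBM i release; verify the column names against the documentation for your release:

```sql
-- One row per disk unit (LUN) as seen by the LPAR
SELECT UNIT_NUMBER, UNIT_TYPE, UNIT_STORAGE_CAPACITY, PERCENT_USED
  FROM QSYS2.SYSDISKSTAT
 ORDER BY UNIT_NUMBER;
```

This is a handy sanity check before and after a re-layout such as the 78-volume to 36- or 18-volume consolidation being considered.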
I was once engaged in two IBM i performance problem cases where FlashSystem was used with POWER8 servers. Both cases had only a few flash disk units (IIRC, not more than 10 units), but hundreds of LUNs were created out of them. Disk response time was good at low disk IOPS workload but degraded fast once IOPS increased beyond some 5,000 (during the nightly batch process), which was weird for FlashSystem.
Originally, I suspected the small number of flash disk units to be the cause, but my IBM Storage Specialist colleague informed me that it should not have been a concern, because IBM uses a SAN disk sizing tool named Disk Magic (I hope you have heard of this tool before) that can roughly predict disk response time based on the SAN's hardware and logical configuration and the expected peak disk IOPS we want the SAN box to handle. You need to look at the PDI chart named "Physical Disk I/O Overview - Basic" to see the current peak disk IOPS of the LPAR (make sure it is from a high-workload day) and use this value for Disk Magic. Increase this peak by any percentage you see fit for some headroom.
It turned out that what caused the dramatic disk response time degradation was the activation of a feature named DRP (Data Reduction Pool). This is a disk space saving feature that imposes processing overhead on the SAN box controllers; the higher the disk IOPS workload, the greater the overhead. And at that time, the Disk Magic tool did not incorporate the performance overhead of DRP into its modeling of the resulting response time of a SAN hardware configuration. (Please check whether it does now.) So, a golden rule for you, when using FlashSystem SAN and the best possible disk response time is the utmost goal:
- Ensure the vendor uses the Disk Magic tool to do the SAN box sizing, that all input parameters reflect what will actually be used, and that the resulting disk response time is desirable. Take time to compare one design with another (such as the different numbers of flash disk units that you mentioned).
- Do NOT use DRP. Consult IBM's DRP best practice information if DRP is really needed, but be reminded that the best and most consistent disk response time can be obtained only without DRP activated. Also check whether the Disk Magic tool now takes DRP's performance impact into account (I suspect not, but am open to being wrong). If it does, check how the resulting response time looks with DRP.
- Do NOT mix different sizes of disk units. (I suspect this may have a less severe negative effect for flash disk, but do not play with a match, as it can burst into a fire when near an accelerant.)
If you are interested in learning more, I discussed FlashSystem and DRP in case 3 of my published articles here: https://www.itjungle.com/author/satid-singkorapoom/
The article serves as a good sleeping pill if you read it before bedtime :-)
>>>> Or has the optimizer been updated to allow for today's faster storage? <<<<
Yes indeed! In IBM i 7.2, there was an enhancement: SQE can sample the actual disk access time rather than using a hard-coded value that is far from the actual time.
Right action is better than knowledge; but in order to do what is right, we must know what is right.
Sent: Tue December 13, 2022 10:18 AM
From: Glenn Robinson
Subject: DB2 optimizer and number of disk arms - still relevant?
In the days when IBM advised that we should use internal disk drives on IBM i for best performance, the DB2 optimizer would take into consideration the number of disk controllers and disk arms available and would optimize SQL based upon these resources (and other resources too).
Most of my customers are now running on FlashSystem storage with FCMs but still have many small volumes mapped to their IBM i LPARs. For example, one customer has a total of 18TB storage assigned to an LPAR with 78 volume mappings.
I'm looking at reducing the number of volumes to 36 x 500GB or 18 x 1TB. There's a requirement to add more storage in the near future so this LPAR may well grow to 22TB. I'd prefer to add storage in 1TB increments as and when they need this.
My concern is that reducing the number of "disk arms" (volumes) will impact their SQL queries IF the optimizer still takes into account the number of "disk arms" available. Or has the optimizer been updated to allow for today's faster storage? I can't see the latter, as a volume from SVC/Storwize/FlashSystem is always presented as a type 2145.