IBM i Global

Connect, learn, share, and engage with IBM Power.

  • 1.  slow response time

    Posted Mon May 05, 2025 09:36 AM

    Hello,

    Three days after installing new disks (a new RAID cluster) on an 8286-41A, I'm experiencing latency when connecting to the DB2 database. The added disks are the same type as the old ones: 5B13. How can I check whether the response time problem is linked to this addition?

    Thanks



    ------------------------------
    Philippe Jolly
    ------------------------------


  • 2.  RE: slow response time

    Posted Mon May 05, 2025 11:12 AM

    Hi Philippe,

    You can use the performance tools in Navigator for i or the WRKDSKSTS command (press F5 to refresh the screen data). One disk may be far busier than the others, which would explain the latency you are observing. I had a similar problem after adding a disk to a RAID cluster on the same type of server, with the option to balance data selected when adding the disk. In the disk status screen, one disk was heavily used compared with the others, with 60% utilization, while the other disks showed 20%. Unfortunately, the only solution I found was to reinstall the partition. The STRASPBAL command didn't help me.
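
As a rough illustration of what "one disk far busier than the others" means in numbers, here is a minimal Python sketch. It assumes the % busy figures have been copied by hand from the WRKDSKSTS screen into a dictionary; the function name `find_busy_outliers`, the `ratio` threshold of 2.0, and the sample readings are all illustrative assumptions, not part of any IBM tool.

```python
from statistics import median


def find_busy_outliers(busy_pct, ratio=2.0):
    """Return unit numbers whose % busy is at least `ratio` times the median.

    busy_pct maps a disk unit number to its % busy reading.
    The ratio threshold is an arbitrary assumption; tune it to taste.
    """
    typical = median(busy_pct.values())
    if typical == 0:
        # With an idle median, any unit doing work at all stands out.
        return sorted(unit for unit, busy in busy_pct.items() if busy > 0)
    return sorted(unit for unit, busy in busy_pct.items() if busy >= ratio * typical)


# Hypothetical readings (unit number -> % busy), modeled on the 60% vs 20% case.
sample = {1: 20, 2: 21, 3: 19, 4: 60, 5: 20}
print(find_busy_outliers(sample))  # -> [4]
```

Taking several readings during the slow period, rather than a single one, gives a more reliable picture, since a single refresh can catch a short burst.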

    Regards



    ------------------------------
    Virgile VATIN
    Head of IT Infrastructure
    DRP SOFTWARE
    Avelin
    ------------------------------



  • 3.  RE: slow response time

    Posted Tue May 06, 2025 02:09 AM

    Hi Virgile,

    Thanks for your help.



    ------------------------------
    Philippe Jolly
    ------------------------------



  • 4.  RE: slow response time

    Posted Tue May 06, 2025 02:52 AM
    Edited by Satid S Tue May 06, 2025 03:02 AM

    Dear Virgile 

    >>>> one disk was heavily used compared with the others, with 60% utilization, while the other disks showed 20%. <<<<

    When one disk unit (or sometimes two or three) is far busier than the others, as you described, one possible cause in my 32 years of experience is a job (or two or three jobs, matching the number of busy units) that generates error messages continuously, say hundreds or thousands per second. This happens because IBM i normally writes all of a job's job log messages to a single disk unit.

    You can identify such a job with the WRKACTJOB command: press F11 until you see the "Aux IO" column, move the cursor to that column, and press Shift+F4 (F16) to sort the entries in descending order. Then press F5 to accumulate more I/O counts for each job. Finally, look at the job log of the job with the high Aux IO count to see whether it contains an abnormally high number of recurring error messages, and address the cause.
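
The idea of refreshing and watching which job's Aux IO count grows fastest can be sketched in Python as a comparison of two readings. This is only an illustration: it assumes the Aux IO values were noted by hand from WRKACTJOB at two points in time, and the job names, counts, and the function name `rank_by_aux_io_delta` are hypothetical.

```python
def rank_by_aux_io_delta(before, after, top=3):
    """Rank jobs by how much their Aux IO count grew between two readings.

    before/after map a job identifier to its Aux IO count.
    Jobs missing from `before` are treated as having started at 0.
    """
    deltas = {job: after[job] - before.get(job, 0) for job in after}
    ranked = sorted(deltas.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


# Hypothetical readings taken a minute apart (job name/number -> Aux IO).
before = {"QZDASOINIT/123456": 1000, "NIGHTLY/123460": 5000, "QPADEV0001/123470": 200}
after = {"QZDASOINIT/123456": 1100, "NIGHTLY/123460": 95000, "QPADEV0001/123470": 210}

for job, delta in rank_by_aux_io_delta(before, after):
    print(f"{job}: +{delta} aux I/O")
```

The job at the top of the list is the first one whose job log is worth inspecting.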



    ------------------------------
    Satid S
    ------------------------------



  • 5.  RE: slow response time

    Posted Tue May 06, 2025 03:54 AM

    Hi Satid,

    Thanks for the clarification. I'll make a note of it for future reference; I'm sure it will be very useful.

    Virgile



    ------------------------------
    Virgile VATIN
    Head of IT Infrastructure
    DRP SOFTWARE
    Avelin
    ------------------------------



  • 6.  RE: slow response time

    Posted Mon May 05, 2025 11:53 AM

    Hello,
    You may want to proceed in two steps:

    1. Run STRASPBAL TYPE(*CAPACITY) so that all disks have the same used capacity.
    2. Run TRCASPBAL SET(*ON) during a normal workload time frame, followed by SET(*OFF), then run STRASPBAL TYPE(*USAGE) so that all disks have the same I/O usage.

    Sometimes this helps to solve or reduce the impact of this kind of issue.



    ------------------------------
    Marc Rauzier
    ------------------------------



  • 7.  RE: slow response time

    Posted Tue May 06, 2025 02:09 AM

    Hello Marc,

    Thanks a lot.



    ------------------------------
    Philippe Jolly
    ------------------------------



  • 8.  RE: slow response time

    Posted Tue May 06, 2025 02:35 AM
    Edited by Satid S Tue May 06, 2025 03:05 AM

    Dear Philippe

    Are the added disk units spinning hard disks or SSDs? Did you use RAID-5 or RAID-6?

    When you mentioned a "new RAID cluster", did you add a new disk controller for the new set of disk units? Or did you use the existing disk controller and add a new RAID set (cluster)? If the latter, be aware that adding another RAID set to a disk controller increases the workload on the controller's processing unit and can contribute to overall disk response time degradation. This negative effect is more pronounced for spinning hard disks than for SSDs.

    Another point: compared with RAID-5, RAID-6 lowers the disk I/O throughput at which good response time can be maintained. RAID-6 is not a good choice for heavy disk I/O workloads because it degrades disk response time during periods of high disk I/O.

    In the IBM i Performance Data Investigator (PDI) tool, look at the chart named "Disk Response Time by Disk Unit" (or possibly "for Disk Unit"; I do not remember precisely) to see whether all disk units report similar response times. If not, check whether the units with high response times are the ones just added; the unit number shown in the chart is the same unit number that you see on the WRKDSKSTS screen. If so, STRASPBAL *CAPACITY should solve the issue.
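
The comparison described above (do the newly added units respond more slowly than the existing ones?) can be sketched in Python as follows. This assumes the per-unit response times have been read off the PDI chart by hand; the unit numbers, the millisecond values, and the function name `compare_unit_groups` are hypothetical and only illustrate the check.

```python
from statistics import mean


def compare_unit_groups(resp_ms, new_units):
    """Return (mean response time of new units, mean of existing units).

    resp_ms maps a disk unit number to its average response time in ms.
    new_units is the set of unit numbers that were just added.
    Both groups must contain at least one unit.
    """
    new = [ms for unit, ms in resp_ms.items() if unit in new_units]
    old = [ms for unit, ms in resp_ms.items() if unit not in new_units]
    return mean(new), mean(old)


# Hypothetical chart readings (unit number -> response time in ms).
readings = {1: 2.0, 2: 2.2, 3: 1.8, 4: 6.0, 5: 6.4}
new_mean, old_mean = compare_unit_groups(readings, new_units={4, 5})
print(f"new units: {new_mean:.1f} ms, existing units: {old_mean:.1f} ms")
```

If the new units stand out clearly, the imbalance points toward data placement on those units, which is the case where a capacity balance is expected to help.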



    ------------------------------
    Satid S
    ------------------------------