Db2

 View Only
Expand all | Collapse all

DMC - Physical Reads metrics discrepancy between Database / Bufferpools level

  • 1.  DMC - Physical Reads metrics discrepancy between Database / Bufferpools level

    Posted Tue November 02, 2021 02:19 PM

    Investigating  a high IO condition here and noticed that DMC has an issue to calculate the Physical Reads/min metrics at database level. 

    It does not match with real OS metrics observations,  and do not match the metrics reported at buffer-pool level as well. 

    On this particular system we are hitting a 24k IOPS condition in every 2 minutes in average.  

    When we look at I/O Buffer-Pools metrics,  the values seems to be ok.  


    From the above screenshot , the Windows Performance Monitor shows 8k IOPs for 3 different diks (The 3 db storage paths)   giving us a 24k IOPs in total. 

    DMC reports 1,369,855 for one Bufferpool, and 318,901 for another one, summing them up, we have a total of   1,688,756 IOs/min, which give us a pretty similar value of 24 IOPs from windows metrics. 

    But,  when looking at Database level,  physical reads/min  metrics sows a way lower value of just few k reads per min. 
    I have reduced the monitor cycle to capture values for each 1 min. and the physical reads/min metric never breaches 70k reads/min, 

    This graph is from Monitor, Database, Database usage page. 
    The particular point from the same minute 11:45 from the previous view,  shows only 56k/min while the BPs one shows a value higher than 1M /min. 

    Something is definitely wrong here at the database level metric. 

    Regards

    Samuel Pizarro
    Db2 Database Administrator
    Kyndryl 





    ------------------------------
    Samuel Pizarro
    ------------------------------

    #Db2


  • 2.  RE: DMC - Physical Reads metrics discrepancy between Database / Bufferpools level

    Posted Tue November 02, 2021 08:35 PM
    The problem seems to be at DMC graph metric for db-level. 


    I have taken regular captures using MON__GET_DATABASE() table function and the values matches with the BP metrics.  

    Time Physical Reads Delta time Delta Phys Reads Phys Reads/s Phys Reads/min
    2021-11-02-16.49.31.956000 97778141163
    2021-11-02-16.50.05.515000 97778394170 00:00:34 253007                7.441               446.483
    2021-11-02-16.50.58.266000 97779829437 00:00:53 1435267              27.081             1.624.831
    2021-11-02-16.51.15.383000 97780214039 00:00:17 384602              22.624             1.357.419
    2021-11-02-16.51.41.878000 97780541235 00:00:26 327196              12.584               755.068
    2021-11-02-16.52.04.635000 97781105454 00:00:23 564219              24.531             1.471.876
    2021-11-02-16.52.30.769000 97781597971 00:00:26 492517              18.943             1.136.578
    2021-11-02-16.52.46.750000 97781638103 00:00:16 40132                2.508                 50.495
    2021-11-02-16.53.10.228000 97781658940 00:00:24 20837                   868                 52.093
    2021-11-02-16.53.25.714000 97781759275 00:00:15 100335                6.689               401.340
    2021-11-02-16.53.42.381000 97782173913 00:00:17 414638              24.390             1.463.428
    2021-11-02-16.53.56.961000 97782549313 00:00:14 375400              26.814             1.608.857
    2021-11-02-16.54.11.159000 97782901073 00:00:15 351760              23.451             1.407.040
    2021-11-02-16.54.26.416000 97783279778 00:00:15 378705              25.247             1.514.820
    2021-11-02-16.54.40.803000 97783627012 00:00:14 347234              24.802             1.488.146
    2021-11-02-16.55.01.092000 97784130825 00:00:21 503813              23.991             1.439.466
    2021-11-02-16.55.22.196000 97784659885 00:00:21 529060              25.193             1.511.600
    2021-11-02-16.55.41.034000 97784711088 00:00:19 51203                2.695               161.694
    2021-11-02-16.55.57.545000 97784951228 00:00:16 240140              15.009               900.525
    2021-11-02-16.56.14.615000 97784984262 00:00:17 33034               1.943                 16.591
    2021-11-02-16.56.26.452000 97785032395 00:00:12 48133                4.011               240.665

    peaks near to 24k read/s or 1.300 Millions/min  

    Regards

    ------------------------------
    Samuel Pizarro
    Db2 DBA
    Kyndryl
    Brazil / Sumaré - SP
    ------------------------------



  • 3.  RE: DMC - Physical Reads metrics discrepancy between Database / Bufferpools level

    Posted Tue February 15, 2022 12:33 AM
    Edited by System Fri January 20, 2023 04:17 PM
    Hi, Samuel, The default value of workloads does not contain CONSOLE's for Monitor, Database, Database usage page. If you want to check whole database, please switch from 'All user workloads' to 'All workloads' on the right side of page.




    ------------------------------
    Zhen Ling Yu
    ------------------------------



  • 4.  RE: DMC - Physical Reads metrics discrepancy between Database / Bufferpools level

    Posted Tue February 22, 2022 11:28 AM

    Sorry @Zhen Ling Yu 

    But that is not the case..   Do you really believe that the console (DMC monitoring workload)  would be contributing with this huge GAP ?

    DMC workload is negligible compared with the user workload here. 

    The metrics between 'All user workloads' and  'All Workloads" filters are pretty much the same. 

    User workload:   peak of 105,000/min   



    All workloads:  same graph,  and same values:  



    And again,  if I take manual queries from mon_Get_database() table function, in the same interval  , physical reads goes ​​up to 1,300 Millions/min  





    ------------------------------
    Samuel Pizarro
    Db2 DBA
    Kyndryl
    Brazil / Sumaré - SP
    ------------------------------



  • 5.  RE: DMC - Physical Reads metrics discrepancy between Database / Bufferpools level

    Posted Fri March 04, 2022 08:11 PM
    Hello Samuel,

    The Database Usage page gets the data by calling mon_get_workload function and data is stored in IBMCONSOLE.WORKLOAD table.  The charts for "All user workload" and "All workloads" is differed by data coming from CONSOLE_WORKLOAD.  

    Console collects data every 5 mins by default.  You can compare the data in IBMCONSOLE.WORKLOAD and the data you collected by MON_GET_DATABASE and see if there are discrepancies.

    ------------------------------
    Jason Sizto
    ------------------------------



  • 6.  RE: DMC - Physical Reads metrics discrepancy between Database / Bufferpools level

    Posted Mon March 07, 2022 11:27 AM
    Hi @Jason Sizto

    Thanks for that info. It helped a lot to determine the gap.

    It seems mon_get_workload() table function is not working properly.  if you sum all workloads from mon_get_workload,   it does not match metrics for the entire db  reported by mon_get_database().   Even making deltas. ​ 

    I have attached two files. Chek out values captured from both table functions in same interval.  The values from mon_get_workload matches what is saved in reposotiry  IBMCONSOLE.WORKLOAD,   but unfortunatelly,  the values aere not reflecting the overall database reality. 

    Can you engage Db2  support internally for this ?  

    Thanks

    ------------------------------
    Samuel Pizarro
    Db2 DBA
    Kyndryl
    Brazil / Sumaré - SP
    ------------------------------

    Attachment(s)

    txt
    mon_get_database.txt   1 KB 1 version
    txt
    mon_get_workload.txt   2 KB 1 version


  • 7.  RE: DMC - Physical Reads metrics discrepancy between Database / Bufferpools level

    Posted Mon March 07, 2022 09:28 PM
    Thank you @Samuel Pizarro

    I will check with Db2 team to understand more.  ​

    ------------------------------
    Jason Sizto
    ------------------------------



  • 8.  RE: DMC - Physical Reads metrics discrepancy between Database / Bufferpools level

    Posted Thu March 10, 2022 02:57 PM

    Hi @Samuel Pizarro

    ​I checked with Db2 team and for the metrics you are looking related to bufferpool read and write metrics. 

    There are differences between how they are collected from MON_GET_DATABASE and MON_GET_WORKLOAD.  ​For MON_GET_DATABASE, it includes metrics from asynchronous read and write operations, e.g. from page cleaners and prefetchers.  And for MON_GET_WORKLOAD, it only includes metrics from fore ground agents.  See more in https://www.ibm.com/docs/en/db2/11.5?topic=elements-time-spent-monitor-element-hierarchy for asynchronous metrics for MON_GET_DATABASE.




    ------------------------------
    Jason Sizto
    ------------------------------



  • 9.  RE: DMC - Physical Reads metrics discrepancy between Database / Bufferpools level

    Posted Fri March 11, 2022 02:49 PM
    Hi @Jason Sizto

    Oh my God..   I can't believe that..   That is really , really a bad thing !  I would prefer to hear that it was bugged then hearing something like this.  ​​

    This means that Db2 dev team deliberately removed the prefetching (async IOs)  metrics from workload/management classes level ?  If we have an IO intensive system,   where the db is performing too much IO we can't use the Workload/service-classe monitoring metrics to determine which of them are performing too much IO in a consistent manner.. 

    This lack of consistence between the SAME metrics,  should be better documented..   In fact it should never exist at 1sr place..  I can't see a reason why  physical_reads would account asynch metrics in one dimension,  but not in others..   this makes troubleshooting even harder than what is already is..  

    Anyway...  this is not a DMC problem..   But it seems that DMC team was not aware of that GAP as well.   

    Based on this..   I believe when we choose "All workloads"  in this page,   DMC should pull metrics from mon_get_database instead,  in order to provide better and consistent metrics..  Or , add a new option like "Entire Database"  or something like that.  

    Regards

    ------------------------------
    Samuel Pizarro
    Db2 DBA
    Kyndryl
    Brazil / Sumaré - SP
    ------------------------------



  • 10.  RE: DMC - Physical Reads metrics discrepancy between Database / Bufferpools level

    Posted Sat March 12, 2022 02:04 PM
    Hi @Samuel Pizarro

    ​Regarding how mon_get_database and mon_get_workload bufferpool metrics difference in Db2, it is likely due to how the metrics is being accounted for with db2 agents vs page cleaner, prefetcher agents etc.  I agree, it is not intuitive for DMC Database page to use workload data. 

    For DMC, we understand the Database page is the high level page for users to first eyeball and identify problems.  We will look into your suggestion on how we can distinguish mon_get_database and mon_get_workload discrepancies and how to make the representation clearer.  Thank you for your suggestion.

    I see you are already looking at the monitor > I/O > bufferpool and prefetcher pages for I/O metrics.  These are the pages where you find further breakdown of async I/O, page cleaner and prefetcher ​stats.  Let us know if I/O pages have information you are looking for.

    ------------------------------
    Kind Regards,
    Jason Sizto
    ------------------------------



  • 11.  RE: DMC - Physical Reads metrics discrepancy between Database / Bufferpools level

    Posted Mon March 14, 2022 03:28 PM

    Hi @Jason Sizto

    ​For DMC, we understand the Database page is the high level page for users to first eyeball and identify problems.  We will look into your suggestion on how we can distinguish mon_get_database and mon_get_workload discrepancies and how to make the representation clearer. 

    Yeah, maybe in the right-upper corner combo, to select the workloads,,  have a new option to grab metrics from mon_ge_database instead..   "Entire Database" ,  or something like that.   
    And..   as we now know,  that metrics from workloads does not have assync IOs ,  we should add a new plot just for that metric,  beyond physical IO. 

    The the I/O pages,  the metrics are good.  

    Only thing that I noticed,  is that  for Prefetchers tab,  the option to look back is blocked..   just realtime values are available..   

    Regards 



    ------------------------------
    Samuel Pizarro
    Db2 DBA
    Kyndryl
    Brazil / Sumaré - SP
    ------------------------------



  • 12.  RE: DMC - Physical Reads metrics discrepancy between Database / Bufferpools level

    Posted Tue April 05, 2022 09:12 AM
    Edited by System Fri January 20, 2023 04:12 PM
    Hi @Jason Sizto

    In a Case discussion with IBM Db2 support regarding this..  TS008647857
    ​They suggested to use other set of metrics to track workload async requests..  

    Different agents in different service classes or workloads could request same data page be prefetched. Prefetcher would load page once and this page could be used by multiple workloads/service classes. And for page cleaners; similar thing; 2 or more workload could touch/modify a page. Due to this, the decision was made to not track it workload/service class level.

    Note that for prefetching there are metrics in mon_get_workload (e.g. pool_queued_async_data_pages/requests) that are counted; tracking pages requested; so you can tell at a workload level (also service class) which workload is driving the prefetching.


    So,  I would suggest to add a new graphic for summing up these pool_queued_async_*_reqs,   , so we can have a better idea , how the workload is contributing to the overall IO (physical reads/writes ).   

    This would be a nice addition..   Maybe we can plot both lines into the same graphic , using different colors..  Just an idea..   but for sure, these set of metrics should be added there.  

    And , regarding the previous suggestion,  to add a new option to see the entire DB,   if that happens,  the async IOS will be already counted in the physical reads/writes..  so in that case,  we won't  have this new metric.   

    Thanks 



    ------------------------------
    Samuel Pizarro
    Db2 DBA
    Kyndryl
    Brazil / Sumaré - SP
    ------------------------------