High Performance Computing Group

  • 1.  GPU usage in bacct

    Posted Mon June 07, 2021 09:58 AM
    Hi,
    we are running one of our A100 nodes in a MIG setup, and wanted to look at GPU resource usage with 'bacct -gpu -l JOBID'.  All the GPU information we get looks like this (sorry for the formatting, but this editor doesn't allow indenting):


    Host based accounting information about this job:
    HOST CPU_T MEM SWAP
    hostA 409.00 76M 0M
    GPU ID: 259:1
    Total Execution Time: -
    Energy Consumed: -
    SM Utilization (%): -
    Memory Utilization (%): -
    Max GPU Memory Used: -
    GPU ID: 1
    Total Execution Time: -
    Energy Consumed: -
    SM Utilization (%): -
    Memory Utilization (%): -
    Max GPU Memory Used: -
    GPU ID: 149:10:n
    Total Execution Time: -
    Energy Consumed: -
    SM Utilization (%): -
    Memory Utilization (%): -
    Max GPU Memory Used: -
    GPU Energy Consumed: -
    First of all: why do we see three GPUs, even though the job was running in only one CI?  It also looks like we do not get any usage information. Are we missing something here?
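
    To make the two observations easier to check across many jobs, here is a minimal Python sketch that splits text in the layout quoted above into one record per "GPU ID:" block and reports which blocks carry no usage data. The function names (parse_gpu_sections, gpus_without_data) are hypothetical helpers and not part of LSF, and the parser only assumes the exact labels shown in this post; real bacct output may differ.

```python
# Metric labels exactly as they appear in the output quoted in this post.
METRICS = (
    "Total Execution Time",
    "Energy Consumed",
    "SM Utilization (%)",
    "Memory Utilization (%)",
    "Max GPU Memory Used",
)

# Sample copied from the post; in practice this text would come from bacct.
SAMPLE = """\
GPU ID: 259:1
Total Execution Time: -
Energy Consumed: -
SM Utilization (%): -
Memory Utilization (%): -
Max GPU Memory Used: -
GPU ID: 1
Total Execution Time: -
Energy Consumed: -
SM Utilization (%): -
Memory Utilization (%): -
Max GPU Memory Used: -
GPU ID: 149:10:n
Total Execution Time: -
Energy Consumed: -
SM Utilization (%): -
Memory Utilization (%): -
Max GPU Memory Used: -
GPU Energy Consumed: -
"""


def parse_gpu_sections(text):
    """Return one dict per 'GPU ID:' block; a '-' value is stored as None."""
    gpus = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("GPU ID:"):
            # The ID itself may contain colons (e.g. '149:10:n'), so split once.
            current = {"GPU ID": line.split(":", 1)[1].strip()}
            gpus.append(current)
            continue
        if current is None:
            continue
        key, sep, value = line.partition(":")
        key = key.strip()
        # Lines with other labels (e.g. the trailing total) are ignored.
        if sep and key in METRICS:
            value = value.strip()
            current[key] = None if value == "-" else value
    return gpus


def gpus_without_data(gpus):
    """List the IDs of blocks in which every known metric is missing."""
    return [
        g["GPU ID"]
        for g in gpus
        if all(g.get(metric) is None for metric in METRICS)
    ]


if __name__ == "__main__":
    records = parse_gpu_sections(SAMPLE)
    print(len(records), "GPU blocks;", "no data for:", gpus_without_data(records))
```

    Run on the sample, this finds three GPU blocks and no usage data in any of them, which matches what we see by eye.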

    ------------------------------
    Bernd Dammann
    ------------------------------

    #SpectrumComputingGroup


  • 2.  RE: GPU usage in bacct

    Posted Mon June 07, 2021 06:51 PM
    GPU usage info requires LSF integration with DCGM.  DCGM 2.1.7 support was released through an LSF patch last quarter. I suggest opening a support case for more info.

    ------------------------------
    YI SUN
    ------------------------------



  • 3.  RE: GPU usage in bacct

    Posted Tue June 08, 2021 02:39 AM
    Thanks!  We have applied the patches (we were one of the customers requesting them), and we tried with DCGM 2.1.7 and also the newer 2.2.3, as 2.1.7 produces a lot of warnings.  The result is the same in both cases, as shown above: no usage information, but 3 GPU IDs ...
    I will open a support case then!

    ------------------------------
    Bernd Dammann
    ------------------------------



  • 4.  RE: GPU usage in bacct

    Posted Tue June 08, 2021 11:02 AM
    BTW, only GPU jobs with one of the following settings will have GPU resource usage collected:
     mode=exclusive_process, or "mode=shared:j_exclusive=yes"

    ------------------------------
    YI SUN
    ------------------------------



  • 5.  RE: GPU usage in bacct

    Posted Wed June 09, 2021 07:35 AM
    We used

    #BSUB -gpu "num=1:mig=1:mode=exclusive_process:aff=no"

    for our test, and that produced the accounting information shown above!
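
    As a cross-check, the request string can be tested against the condition from the previous reply (mode=exclusive_process, or shared mode with j_exclusive=yes). The following is only a sketch: parse_gpu_request and usage_collected are hypothetical helper names, the parser assumes the colon-separated key=value form used in the #BSUB line above, and LSF defaults for omitted keys are deliberately not modeled.

```python
def parse_gpu_request(spec):
    """Turn 'key=value:key=value' into a dict; parts without '=' are skipped."""
    fields = {}
    for part in spec.split(":"):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def usage_collected(spec):
    """Apply the condition quoted in this thread to an explicit request string.

    Assumption: only keys written out in the string are considered, so a
    string without 'mode=' returns False here regardless of LSF defaults.
    """
    fields = parse_gpu_request(spec)
    mode = fields.get("mode")
    if mode == "exclusive_process":
        return True
    return mode == "shared" and fields.get("j_exclusive") == "yes"


if __name__ == "__main__":
    request = "num=1:mig=1:mode=exclusive_process:aff=no"
    print(parse_gpu_request(request))
    print("meets the stated condition:", usage_collected(request))
```

    By this check our request does meet the stated condition, so the mode setting alone does not explain the missing usage data.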

    ------------------------------
    Bernd Dammann
    ------------------------------