LSF can collect GPU job memory usage and power consumption through DCGM integration, but it seems Nvidia doesn't yet have a workable solution for MIG in DCGM mode.
Original Message:
Sent: Thu June 20, 2024 03:01 AM
From: Bernd Dammann
Subject: GPUs: MIG mode and accounting information
Thanks! That did the trick! ">>8 & 0xFF" for the placement, and "& 0xFF" for the size, respectively!
Now we can get information about the MIG sizes at runtime, but we still lack that information in the accounting; we can probably use the EFFECTIVE_GPU_REQ field in the accounting records to get it. So far, so good - but what we really lack is more detailed information, like GPU memory usage, power consumption, etc., when in MIG mode. The latter might be an issue, as Nvidia doesn't provide job-based power usage for MIG instances, but what about the memory usage? It is accessible via nvidia-smi, and thus via libnvml, too. Can this be added as a feature to LSF?
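For reference, per-MIG-instance memory usage is indeed reachable through libnvml, the same library nvidia-smi builds on. A minimal sketch (not LSF code) that queries each MIG device handle; per-process figures would come from nvmlDeviceGetComputeRunningProcesses instead, which is the gap per-job accounting would have to fill. Link with -lnvidia-ml:

#include <stdio.h>
#include <nvml.h>

int main(void)
{
    unsigned int devCount = 0, maxMig = 0, i, m;

    if (nvmlInit_v2() != NVML_SUCCESS)
        return 1;
    nvmlDeviceGetCount_v2(&devCount);

    for (i = 0; i < devCount; i++) {
        nvmlDevice_t dev, mig;
        nvmlMemory_t mem;

        if (nvmlDeviceGetHandleByIndex_v2(i, &dev) != NVML_SUCCESS)
            continue;
        /* Fails (or yields 0) on GPUs without MIG support or MIG mode off. */
        if (nvmlDeviceGetMaxMigDeviceCount(dev, &maxMig) != NVML_SUCCESS)
            continue;

        for (m = 0; m < maxMig; m++) {
            /* Each active MIG instance shows up as its own device handle. */
            if (nvmlDeviceGetMigDeviceHandleByIndex(dev, m, &mig) != NVML_SUCCESS)
                continue;
            if (nvmlDeviceGetMemoryInfo(mig, &mem) == NVML_SUCCESS)
                printf("GPU %u MIG %u: used %llu MiB of %llu MiB\n", i, m,
                       (unsigned long long)(mem.used >> 20),
                       (unsigned long long)(mem.total >> 20));
        }
    }

    nvmlShutdown();
    return 0;
}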
------------------------------
Bernd Dammann
Original Message:
Sent: Tue June 18, 2024 07:54 PM
From: YI SUN
Subject: GPUs: MIG mode and accounting information
Try converting the data this way, e.g.
giIdSize>>8&0xFF
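In other words, the field packs the placement into bits 15..8 and the size into bits 7..0. A minimal decode sketch, using a raw value like those Bernd reported (1027 = 0x0403 decodes to the 4/3 pair that bjobs prints):

#include <stdio.h>

/* Decode the packed giIdSize/ciIdSize values: bits 15..8 hold the
 * placement, bits 7..0 hold the size. */
static unsigned int mig_placement(unsigned int packed) { return (packed >> 8) & 0xFF; }
static unsigned int mig_size(unsigned int packed)      { return packed & 0xFF; }

int main(void)
{
    unsigned int giIdSize = 1027;  /* 0x0403: one of the raw values seen in this thread */

    printf("GI %u/%u\n", mig_placement(giIdSize), mig_size(giIdSize));  /* prints "GI 4/3" */
    return 0;
}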
------------------------------
YI SUN
Original Message:
Sent: Tue June 18, 2024 03:41 AM
From: Bernd Dammann
Subject: GPUs: MIG mode and accounting information
Thanks! I had already looked at the lsbatch.h file and dug my way through the nested structures. I also found two variables in the migJobInfo struct, giIdSize and ciIdSize, but when I print them, their values are sometimes in the range of 1025-1028, and not in the range I would expect, e.g. when comparing with the bjobs/bhosts output. Is there some mask that needs to be applied? That's not at all clear when looking at the header file alone!
------------------------------
Bernd Dammann
Original Message:
Sent: Mon June 17, 2024 12:47 PM
From: YI SUN
Subject: GPUs: MIG mode and accounting information
With regard to bjobs and the C API, you may look into struct jobInfoEnt -> (struct extJobInfoEnt). Details can be found in lsbatch.h, shipped with the LSF package.
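A minimal sketch of walking job records through the batch C API; the exact dereference chain from struct jobInfoEnt down to the MIG fields (extJobInfoEnt / migJobInfo / giIdSize, ciIdSize) is an assumption here and must be checked against the lsbatch.h shipped with your LSF version. Link against the LSF batch libraries (e.g. -lbat -llsf):

#include <stdio.h>
#include <lsf/lsbatch.h>

int main(void)
{
    struct jobInfoEnt *job;
    int num, more = 0, i;

    if (lsb_init("miginfo") < 0) {
        lsb_perror("lsb_init");
        return 1;
    }
    /* All current (unfinished) jobs of all users. */
    num = lsb_openjobinfo(0, NULL, "all", NULL, NULL, CUR_JOB);
    if (num < 0) {
        lsb_perror("lsb_openjobinfo");
        return 1;
    }
    for (i = 0; i < num; i++) {
        job = lsb_readjobinfo(&more);
        if (job == NULL)
            break;
        printf("job %s\n", lsb_jobid2str(job->jobId));
        /* Hypothetical access path -- verify the member names in lsbatch.h:
         *   unsigned int packed = job->ext->migJobInfo->giIdSize;
         *   printf("  GI %u/%u\n", (packed >> 8) & 0xFF, packed & 0xFF);
         */
    }
    lsb_closejobinfo();
    return 0;
}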
------------------------------
YI SUN
Original Message:
Sent: Mon June 17, 2024 04:30 AM
From: Bernd Dammann
Subject: GPUs: MIG mode and accounting information
Hi,
To increase the throughput on our GPU cluster, we run some GPUs with 'MIG mode' enabled. When it comes to accounting, we face some problems:
- when using MIG mode, the GPU usage information of a job is lost, i.e. energy and memory usage (we are aware that this is a limitation of Nvidia's DCGM implementation)
- we lack a simple way to extract information about the assigned MIG in bacct (or via the C API), to be able to tell the user that the job didn't run on a full GPU, but only on a part of it. If we were to do "real" billing, users would probably not want to pay for a full H100 if their jobs ran on only 1/7 of the H100, or on another partition size!
We are aware that we can get some information from bacct via the '-gpu' option, e.g. per-task information like this:
GPU_ALLOCATION:
HOST   TASK  GPU_ID  GI_PLACEMENT/SIZE  CI_PLACEMENT/SIZE  MODEL         MTOTAL  FACTOR  MRSV  SOCKET  NVLINK/XGMI
hostA  0     0       4/3                4/3                NVIDIAH100PC  79.6G   9.0     0M    0       -
There is no 'JSON' option for bacct, and parsing the line-based output above can be difficult.
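To illustrate the fragility: a naive fixed-format parse of the data line above works only as long as no column value contains whitespace and the column order never changes between LSF versions:

#include <stdio.h>

int main(void)
{
    /* The allocation line from the bacct -gpu output above. */
    const char *line = "hostA 0 0 4/3 4/3 NVIDIAH100PC 79.6G 9.0 0M 0 -";
    char host[64], model[64], mtotal[16], mrsv[16], nvlink[16];
    int task, gpu, giP, giS, ciP, ciS, socket;
    double factor;

    if (sscanf(line, "%63s %d %d %d/%d %d/%d %63s %15s %lf %15s %d %15s",
               host, &task, &gpu, &giP, &giS, &ciP, &ciS,
               model, mtotal, &factor, mrsv, &socket, nvlink) == 13)
        printf("%s task %d: GPU %d, GI %d/%d, CI %d/%d\n",
               host, task, gpu, giP, giS, ciP, ciS);
    return 0;
}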
We have our own in-house implementation of 'bacct', though, which produces 'JSON' output that we can then feed into our accounting setup. However, there is no API documentation on how to access this GPU/MIG information via the C API!
BTW, it would also be nice to get this information while the job is running, i.e. via bjobs!
Anybody else having this issue? We can't be the only HPC site using MIG mode with LSF and running into this problem.
Any hints from the IBM LSF experts/developers on how to get access to this information via the C API?
Thanks!
------------------------------
Bernd Dammann
------------------------------