High Performance Computing Group

  • 1.  LSF: decoding GPU information in lsb.acct

    Posted Wed July 17, 2024 07:16 AM

    Hi,

    Is there any documentation or, even better, an API call for decoding the GPU_ALLOC_COMPAT string in the lsb.acct file?  We have our own internal version of 'bacct' that produces, e.g., JSON output, which we feed into our internal accounting/billing tool.  Since the GPU queues contain different GPU models, we need to extract at least the GPU model, and eventually the 'mig' value, from this string.

    Thanks!



    ------------------------------
    Bernd Dammann
    ------------------------------


  • 2.  RE: LSF: decoding GPU information in lsb.acct

    Posted Thu July 18, 2024 09:13 AM

    Please refer to the structure "struct gpuJobData" in lsbatch.h. This string is generated from gpuJobData, but adds some more information.



    ------------------------------
    Ji Shan Xing
    ------------------------------



  • 3.  RE: LSF: decoding GPU information in lsb.acct

    Posted Tue July 23, 2024 08:15 AM

    Hi,

    Thanks for the hint, but I don't think it helps!  I can recognize some of the information from 'struct gpuJobData' and the other data structures it contains, but the "add some more information" part is exactly the problem.  How can this extra information be decoded if there is no documentation of how it was created?  Example: this is what I get from the "struct jobFinishLog" when parsing the lsb.acct file, for two different GPU jobs.  The first one was dispatched to a GPU in MIG mode, the second to a GPU in non-MIG mode:

    	GPU_ALLOC_COMPAT: 480:1,1,370:10:hostA,5,0,0,4,79:0,0,0,1,68:1,4,513,513,-1,-1,0,1,43:NVIDIAA100_PCIE_40GB,40960,8,0,40239,0,1,1,79:1,0,0,1,68:1,4,513,513,-1,-1,0,1,43:NVIDIAA100_PCIE_40GB,40960,8,0,40239,0,1,1,79:2,0,0,1,68:1,4,513,513,-1,-1,0,1,43:NVIDIAA100_PCIE_40GB,40960,8,0,40239,0,1,1,79:3,0,0,1,68:1,4,513,513,-1,-1,0,1,43:NVIDIAA100_PCIE_40GB,40960,8,0,40239,0,1,1,1,HOST_NVIDIA_CNT:2,1,EFFECTIVE_GPU_REQ:num=1:mode=exclusive_process:mps=no:j_exclusive=yes:aff=no:gvendor=nvidia:mig=1/1,
    
    	GPU_ALLOC_COMPAT: 454:0,1,352:9:hostB,1,4,0,4,75:0,1,50:0,1,41:NVIDIAA10080GBPCIe,81920,8,0,81038,0,0,0,0,1,GPU_MEM_RSV:0,0,75:1,1,50:0,1,41:NVIDIAA10080GBPCIe,81920,8,0,81038,0,0,0,0,1,GPU_MEM_RSV:0,0,75:2,1,50:0,1,41:NVIDIAA10080GBPCIe,81920,8,0,81038,0,0,0,0,1,GPU_MEM_RSV:0,0,75:3,1,50:0,1,41:NVIDIAA10080GBPCIe,81920,8,0,81038,0,0,0,0,1,GPU_MEM_RSV:0,0,1,HOST_NVIDIA_CNT:2,1,EFFECTIVE_GPU_REQ:num=1:mode=exclusive_process:mps=no:j_exclusive=yes:aff=no:gvendor=nvidia,

    How should these two very different lines be interpreted?



    ------------------------------
    Bernd Dammann
    ------------------------------
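
Pending documentation of the format, the two values asked for (GPU model and 'mig') can possibly be pulled out of the sample lines above with pattern matching. The following Python is only a rough sketch based on those two samples, not a decoder of the real format. The function names are hypothetical, and it rests on three unconfirmed assumptions: the trailing EFFECTIVE_GPU_REQ field is a colon-separated list of key=value pairs terminated by a comma, GPU model names directly follow a "<digits>:" prefix, and model names start with the vendor string "NVIDIA".

```python
import re


def parse_effective_gpu_req(alloc):
    """Return the EFFECTIVE_GPU_REQ key=value pairs as a dict.

    Assumption: the field runs up to the next comma and its items are
    separated by ':' (this is what the two sample lines look like).
    """
    match = re.search(r"EFFECTIVE_GPU_REQ:([^,]*)", alloc)
    if match is None:
        return {}
    req = {}
    for item in match.group(1).split(":"):
        key, sep, value = item.partition("=")
        if sep:  # skip anything that is not key=value
            req[key] = value
    return req


def gpu_models(alloc, vendor_prefix="NVIDIA"):
    """Return the distinct GPU model names, in order of first appearance.

    Assumption: a model name follows a '<digits>:' prefix and starts with
    vendor_prefix; the prefix also keeps the host name from matching.
    """
    pattern = r"\d+:(" + re.escape(vendor_prefix) + r"[^,:]*),"
    return list(dict.fromkeys(re.findall(pattern, alloc)))


# Abridged from the first (MIG) sample line: only one GPU entry is kept.
sample = (
    "480:1,1,370:10:hostA,5,0,0,4,79:0,0,0,1,68:1,4,513,513,-1,-1,0,1,"
    "43:NVIDIAA100_PCIE_40GB,40960,8,0,40239,0,1,1,1,HOST_NVIDIA_CNT:2,1,"
    "EFFECTIVE_GPU_REQ:num=1:mode=exclusive_process:mps=no:j_exclusive=yes:"
    "aff=no:gvendor=nvidia:mig=1/1,"
)

req = parse_effective_gpu_req(sample)
print(gpu_models(sample))  # model name(s) found in the string
print(req.get("mig"))      # None when the job did not request MIG
```

For the non-MIG sample line, the EFFECTIVE_GPU_REQ field has no 'mig' key, so `req.get("mig")` returns `None`. This approach ignores all the numeric fields, so it cannot tell which of the listed GPUs was actually allocated to the job; that still needs the real field layout.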