Extending the Spectrum LSF GUI to display job GPU metrics

By Gábor Samu posted Tue September 08, 2020 01:37 PM

  

I’ve previously written about accounting for GPU workloads in Spectrum LSF using Nvidia DCGM to collect granular metrics including energy consumed, memory used, and overall GPU utilization. Spectrum LSF collects this information and makes it available through the familiar bhist and bacct commands. But if you rely on the web-based user interface (UI) of the Spectrum LSF Suite or Spectrum LSF Application Center, how can you view job GPU utilization metrics without leaving the comfort of the UI? It turns out that this is quite straightforward to achieve by customizing the Spectrum LSF web-based UI. Here is a simple example showing how to:

  • Add custom tabs to the Spectrum LSF web-based UI as an administrator
  • Display GPU accounting information (on a per-job basis) in the Spectrum LSF web-based UI

Note: The following assumes that Nvidia DCGM support has been enabled in Spectrum LSF and that you are running an edition of the Spectrum LSF Suite or Spectrum LSF Application Center.

The Spectrum LSF web-based UI lets GUI administrators create new tabs that display the content of a user-specified URL or the output of a command. Here we will create a new tab that runs a script, which in turn runs the Spectrum LSF bhist command to display the GPU metrics for a given job. The sample script contains some simple logic to distinguish between GPU and non-GPU jobs.

A.  To begin, we’ll need a simple script that uses the Spectrum LSF bhist command to display the detailed historical data of a given job ID, including GPU metrics. The example script below is saved as gpu_acct.sh.

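#!/bin/sh
# Show the detailed history of a job, including GPU metrics if it was a GPU job.
if [ -z "$1" ]
then
    echo "Usage: $0 <jobID>"
else
    # Run bhist once and reuse its output.
    OUTPUT=$(bhist -a -l -gpu "$1")
    if printf '%s\n' "$OUTPUT" | grep -q 'GPU Energy Consumed'
    then
        printf '%s\n' "$OUTPUT"
    else
        echo "Not a GPU job."
    fi
fi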

As the Spectrum LSF administrator, create the above script in the $LSF_BINDIR directory with permissions 755.
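
The installation step can be sketched as a small shell function. This is only a sketch: the function name install_tab_script is hypothetical, and the example call assumes that gpu_acct.sh is in the current directory and that LSF_BINDIR is set in your shell environment.

```shell
#!/bin/sh
# install_tab_script SRC DESTDIR
# Copy SRC into DESTDIR and set its mode to 755 (rwxr-xr-x).
# The function name is hypothetical and used for illustration only.
install_tab_script() {
    src=$1
    destdir=$2
    if [ ! -f "$src" ] || [ ! -d "$destdir" ]; then
        echo "Usage: install_tab_script <script> <directory>" >&2
        return 1
    fi
    cp "$src" "$destdir/" && chmod 755 "$destdir/$(basename "$src")"
}

# Example call, run as the Spectrum LSF administrator:
#   install_tab_script gpu_acct.sh "$LSF_BINDIR"
```

Mode 755 lets every user read and execute the script while only the owner can modify it.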


B.  Next, log in to the Spectrum LSF web-based job management interface as a user with the Application Center Administrator privilege, then navigate to Workload > Workload.




C.  Select one of the jobs in the job list to open the job detail view. This is the page where we will add the GPU accounting tab.



D.  Click the edit (pencil) dropdown at the top right of the Spectrum LSF web-based interface and select Edit Page.



This displays the Create New Tab window, which we will fill in during the next step.


E.  In the Create New Tab window, specify the following:

  • Tab Label: GPU accounting
  • Content From: Command, with the command gpu_acct.sh %J

Click the Apply button to complete the addition of the new tab on the job detail page.
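
To make the role of the %J placeholder concrete, here is a minimal sketch of the substitution. It assumes that the UI replaces %J with the ID of the selected job before running the command, which is how the sample script would receive its argument. The function expand_tab_command is hypothetical and is not part of Spectrum LSF.

```shell
#!/bin/sh
# expand_tab_command TEMPLATE JOBID
# Print TEMPLATE with every %J replaced by JOBID.
# Hypothetical helper: it only imitates the substitution that the UI
# is assumed to perform for a tab command.
expand_tab_command() {
    template=$1
    job_id=$2
    printf '%s\n' "$template" | sed "s/%J/$job_id/g"
}

# Example: the command line that would be run for job 12345.
expand_tab_command 'gpu_acct.sh %J' 12345
```

Running the expanded command by hand in a shell is a quick way to confirm that the script behaves as expected before adding the tab.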




F.  Finally, click the Edit Page dropdown at the top right corner of the interface and select Apply and exit Pages Editing for the changes to take effect. You will now see a new GPU accounting tab in the job detail view. Here I’ve selected a GPU job that was previously run through Spectrum LSF. The full bhist output is displayed, including the detailed GPU accounting.

 






For jobs that have not requested a GPU resource through Spectrum LSF, we will see the message “Not a GPU job.” displayed when the GPU accounting tab is selected.
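
The check behind this message can be tried without a Spectrum LSF cluster. The sketch below applies the same test as gpu_acct.sh, a search for the string “GPU Energy Consumed”, to text read from standard input. The function name classify_job_output is hypothetical, and the sample input lines are placeholders rather than real bhist output.

```shell
#!/bin/sh
# classify_job_output
# Read bhist-style text on stdin. Print it unchanged if it contains the
# GPU marker string used by gpu_acct.sh; otherwise print the notice.
# Hypothetical helper for experimenting with the detection logic.
classify_job_output() {
    text=$(cat)
    if printf '%s\n' "$text" | grep -q 'GPU Energy Consumed'; then
        printf '%s\n' "$text"
    else
        echo "Not a GPU job."
    fi
}

# Placeholder input, not real bhist output:
printf 'Job <1>\nGPU Energy Consumed: placeholder\n' | classify_job_output
printf 'Job <2>\n' | classify_job_output
```

Because the check depends on a single marker string, it would need updating if the wording of the bhist output were to change.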




That concludes this simple example showing how the Spectrum LSF web-based interface can be customized.  Find your use case today and get customizing!


    ​​​