High Performance Computing Group

  • 1.  GPU models: short names, full name, etc

    Posted Tue May 18, 2021 09:39 AM
    According to the bsub documentation, one can request a GPU model with the gmodel keyword. From the man page:

    The gmodel keyword supports the following formats:

    gmodel=model_name
    Requests GPUs with the specified brand and
    model name (for example, TeslaK80).

    gmodel=short_model_name
    Requests GPUs with a specific brand name (for
    example, Tesla, Quadro, NVS) or model type name
    (for example, K80, P100).

    gmodel=model_name-mem_size
    Requests GPUs with the specified brand name and
    total GPU memory size. The GPU memory size
    consists of the number and its unit, which
    includes M, G, T, MB, GB, and TB (for example, 12G).
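    To make the three formats above concrete, here is a minimal Python sketch of a hypothetical helper, parse_gmodel, that splits a gmodel value into a model name and an optional memory size. It only follows the syntax in the man page excerpt. It is not part of LSF and says nothing about whether LSF would actually match the name against a host's GPUs.

```python
import re

# Memory-size suffix per the man page: a number followed by M, G, T, MB, GB, or TB.
_MEM_SIZE = re.compile(r"(\d+)(MB|GB|TB|M|G|T)")

def parse_gmodel(value):
    """Split a gmodel value into (model, size, unit); size and unit are None if absent.

    Hypothetical helper: it checks syntax only, not whether any host has such a GPU.
    """
    model, sep, suffix = value.rpartition("-")
    if sep and model:
        match = _MEM_SIZE.fullmatch(suffix)
        if match:
            return model, int(match.group(1)), match.group(2)
    # No "-<size><unit>" suffix: the whole value is a model or short model name.
    return value, None, None

print(parse_gmodel("TeslaK80"))      # ('TeslaK80', None, None)
print(parse_gmodel("TeslaK80-12G"))  # ('TeslaK80', 12, 'G')
print(parse_gmodel("K80"))           # ('K80', None, None)
```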

    What are the valid model_name and short_model_name values, e.g., for our A100 PCIE 40GB GPUs? In the bhosts -gpu output they appear as "UnknownNVIDIAA100_PCIE_40GB", and the only two strings LSF accepts for gmodel are that long name and "NVIDIAA100_PCIE_40GB". I would have expected something like "NVIDIAA100" or "A100" to work too, but such jobs never start; they stay pending with the reason "Not enough GPUs with the required GPU model on the hosts". Is there a way to make LSF show the valid model names?
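    For reference, a job asking for one of these GPUs by model would pass a -gpu requirement such as "num=1:gmodel=NVIDIAA100_PCIE_40GB" to bsub. The Python sketch below uses a hypothetical helper, bsub_gpu_argv, that only builds that command line and submits nothing; nvidia-smi is a placeholder job command, and the model string is one of the two reported above as accepted.

```python
import shlex

def bsub_gpu_argv(gmodel, num=1, command=("nvidia-smi",)):
    """Build (but do not run) a bsub argv requesting `num` GPUs of model `gmodel`."""
    # -gpu keywords are colon-separated, e.g. "num=1:gmodel=<name>".
    gpu_req = f"num={num}:gmodel={gmodel}"
    return ["bsub", "-gpu", gpu_req, *command]

# One of the two model strings reported as accepted by gmodel.
argv = bsub_gpu_argv("NVIDIAA100_PCIE_40GB")
print(shlex.join(argv))  # bsub -gpu num=1:gmodel=NVIDIAA100_PCIE_40GB nvidia-smi
```

    Swapping in "NVIDIAA100" or "A100" as the gmodel argument reproduces the requests that stayed pending.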



    ------------------------------
    Bernd Dammann
    ------------------------------



  • 2.  RE: GPU models: short names, full name, etc

    Posted Tue May 18, 2021 12:51 PM
    I found this page about gmodel: https://www.ibm.com/support/pages/node/888403.

    For A100 support, you should install the following patches if you haven't already done so.

    http://www.ibm.com/support/fixcentral/swg/selectFixes?product=ibm/Other+software/IBM+Spectrum+LSF&release=All&platform=All&function=fixId&fixids=lsf-10.1-build600061&includeSupersedes=0

    http://www.ibm.com/support/fixcentral/swg/selectFixes?product=ibm/Other+software/IBM+Spectrum+LSF&release=All&platform=All&function=fixId&fixids=lsf-10.1-build600212&includeSupersedes=0



    ------------------------------
    YI SUN
    ------------------------------



  • 3.  RE: GPU models: short names, full name, etc

    Posted Wed May 19, 2021 04:47 AM
    Thanks, this page is very useful. It looks like the short name contains more "information" than I expected. I'd suggest updating the man pages, because the models used there (K80, etc.) are no longer state-of-the-art GPUs.

    ------------------------------
    Bernd Dammann
    ------------------------------