According to the bsub documentation, one can request a GPU model with the gmodel keyword. From the man page:
The gmodel keyword supports the following formats:

gmodel=model_name
    Requests GPUs with the specified brand and model name (for example, TeslaK80).

gmodel=short_model_name
    Requests GPUs with a specific brand name (for example, Tesla, Quadro, NVS) or model type name (for example, K80, P100).

gmodel=model_name-mem_size
    Requests GPUs with the specified brand name and total GPU memory size. The GPU memory size consists of the number and its unit, which includes M, G, T, MB, GB, and TB (for example, 12G).

What are the valid values for model_name and short_model_name, e.g. for our A100 PCIE 40GB GPUs? In the output of bhosts -gpu they appear as "UnknownNVIDIAA100_PCIE_40GB", and the only two strings LSF accepts for gmodel are that long name, exactly as shown by bhosts, and "NVIDIAA100_PCIE_40GB". I would have expected something like "NVIDIAA100" or "A100" to work too, but jobs submitted with those values never start; they stay pending with the reason "Not enough GPUs with the required GPU model on the hosts". Is there a way to make LSF show the valid model names?
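
For reference, a minimal sketch of the kind of submission meant here. It is a dry run that only prints the bsub commands. The job script name ./my_gpu_job.sh, the num=1 GPU count, and the gpu_opt helper are placeholders for illustration and are not taken from our actual setup; only the two model strings are the ones that get accepted.

```shell
#!/bin/sh
# Dry run: print the bsub commands instead of submitting them.
# JOB and num=1 are placeholders (assumptions), not site-specific values.
JOB=./my_gpu_job.sh

# Hypothetical helper that builds the -gpu option string for one model name.
gpu_opt() {
    printf 'num=1:gmodel=%s' "$1"
}

# The two model strings that LSF accepts on our cluster.
for model in UnknownNVIDIAA100_PCIE_40GB NVIDIAA100_PCIE_40GB; do
    echo bsub -gpu "\"$(gpu_opt "$model")\"" "$JOB"
done
```

Removing the echo submits the jobs for real. Substituting "NVIDIAA100" or "A100" in the loop reproduces the pending jobs described above.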
------------------------------
Bernd Dammann
------------------------------