You are right, those parameters are now enabled by default. So I guess lshosts -gpu doesn't show anything about the AMD GPUs in your cluster. You should probably contact LSF support to debug the LSF lim service. Is there any indication in the lim log?
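For a first look at the lim log before opening a support case, something like the following Python sketch could help narrow things down. It only filters log lines by keyword; the keyword list (gpu, rocm, amd, smi) is a guess at what GPU-related lim messages might contain, not a documented message format, and the function name is made up for illustration.

```python
import re

# ASSUMPTION: these keywords are only a guess at what GPU-related
# lim messages might contain; adjust them to what your log shows.
GPU_HINTS = re.compile(r"gpu|rocm|amd|smi", re.IGNORECASE)


def gpu_related_lines(lines, pattern=GPU_HINTS):
    """Return (line_number, text) pairs for lines that match the pattern.

    `lines` can be any iterable of strings, e.g. an open file object.
    """
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]
```

The lim log is normally written as lim.log.<hostname> in the directory set by LSF_LOGDIR. Opening that file on the AMD GPU host and passing the file object to gpu_related_lines() returns the matching lines with their line numbers. Broad keywords will produce some false hits, so treat the result as a starting point only.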
Original Message:
Sent: Tue August 29, 2023 09:12 AM
From: Bernd Dammann
Subject: LSF and AMD GPUs
To be on the safe side, we added the parameter to lsf.conf and restarted the cluster. Still no success!
Is there any guide or documentation on what one should be aware of when using AMD GPUs? How does LSF find them, and where does it look for them?
------------------------------
Bernd Dammann
Original Message:
Sent: Tue August 29, 2023 02:45 AM
From: Bernd Dammann
Subject: LSF and AMD GPUs
Thanks! We will try to add this to lsf.conf. The ROCM library is installed.
BTW, the documentation for LSF_GPU_RESOURCE_IGNORE is not very clear: it says "Default: Y", which made us believe it is set to 'Y' if not present in lsf.conf, which is obviously not true!
------------------------------
Bernd Dammann
Original Message:
Sent: Mon August 28, 2023 05:17 PM
From: YI SUN
Subject: LSF and AMD GPUs
Add LSF_GPU_RESOURCE_IGNORE=Y to lsf.conf, then restart the cluster. Also make sure you have installed the ROCm SMI library on the AMD GPU node.
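To double-check which of the GPU parameters from this thread are set explicitly in lsf.conf (as opposed to relying on a default), a small Python sketch along these lines may be useful. It assumes the usual NAME=value layout with # comments; the function name and the parameter tuple are made up for illustration and are not part of LSF.

```python
import re

# Parameters mentioned in this thread; extend the tuple as needed.
GPU_PARAMS = (
    "LSF_GPU_AUTOCONFIG",
    "LSB_GPU_NEW_SYNTAX",
    "LSF_GPU_RESOURCE_IGNORE",
)

# ASSUMPTION: one NAME=value assignment per line.
ASSIGNMENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$")


def explicit_gpu_params(conf_text, names=GPU_PARAMS):
    """Map each name to its value in conf_text, or None if not set explicitly."""
    found = {name: None for name in names}
    for raw in conf_text.splitlines():
        # ASSUMPTION: everything after '#' is a comment.
        line = raw.split("#", 1)[0].strip()
        match = ASSIGNMENT.match(line)
        if match and match.group(1) in found:
            found[match.group(1)] = match.group(2).strip().strip('"')
    return found


# Example with the two settings quoted in the original post.
sample = "LSB_GPU_NEW_SYNTAX=extend\nLSF_GPU_AUTOCONFIG=Y\n"
for name, value in explicit_gpu_params(sample).items():
    print(name, "=", value if value is not None else "(not set explicitly)")
```

To run it against a real cluster, read the contents of lsf.conf (in the directory given by LSF_ENVDIR) and pass the text to explicit_gpu_params(). A None value only means the parameter is absent from the file; what LSF then uses as the effective default is a separate question, which is exactly the ambiguity raised in this thread.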
------------------------------
YI SUN
Original Message:
Sent: Sun August 27, 2023 12:23 PM
From: Bernd Dammann
Subject: LSF and AMD GPUs
We run the latest fix pack (14):
$ lim -V
EGO 3.4.0 build 601547, April 20 2023
$ sbatchd -V
IBM Spectrum LSF 10.1.0.0 build 601547, April 20 2023
binary type: linux3.10-glibc2.17-x86_64
------------------------------
Bernd Dammann
Original Message:
Sent: Sun August 27, 2023 12:16 PM
From: YI SUN
Subject: LSF and AMD GPUs
What version of LSF is in use? E.g. run lim -V and sbatchd -V.
------------------------------
YI SUN
Original Message:
Sent: Fri August 25, 2023 08:54 AM
From: Bernd Dammann
Subject: LSF and AMD GPUs
Hi,
we added a new host with two AMD GPUs to our cluster, but they are not detected by LSF! Is that supported at all by LSF? We can find a lot of references to AMD GPUs in the documentation for bsub, etc., but nothing on how to configure LSF to support/detect AMD GPUs. We use
LSB_GPU_NEW_SYNTAX=extend
LSF_GPU_AUTOCONFIG=Y
in lsf.conf.
Any hints as to what we are missing?
------------------------------
Bernd Dammann