I'm running Spectrum LSF Community Ed 10.1.0 and a number of my users are seeing their jobs pending (often for an hour or more) despite the requested resources (typically memory) being available on one or more hosts in the cluster.
Running bjobs -lp on one of the affected job IDs just provides the following:
Job requirements for reserving resource (mem) not satisfied: 4 hosts;
despite sufficient memory being available on a number of hosts (determined as physical RAM minus max(RAM in use, sum of LSF requested RAM)) to satisfy their resource requirement.
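As a minimal sketch of the estimate described above (the function name and the example figures are hypothetical, not taken from the cluster in question), the calculation is:

```python
def estimated_free_mem_gb(physical_gb, in_use_gb, lsf_requested_gb):
    """Estimate from the post: physical RAM minus
    max(RAM in use, sum of LSF requested RAM)."""
    return physical_gb - max(in_use_gb, sum(lsf_requested_gb))


# Hypothetical host: 64 GB of RAM, 20 GB in use,
# running jobs that requested 16 GB and 8 GB.
print(estimated_free_mem_gb(64, 20, [16, 8]))  # 64 - max(20, 24) = 40
```

This is only the host-wide approximation used in the question; it is not necessarily how LSF itself accounts for memory.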
How can I debug the issue further? The vast majority of jobs submitted with resource requirements run correctly with little or no time spent pending. It's a seemingly sporadic issue that I'm struggling to diagnose.
Any help would be appreciated.
One possibility is that a running job requesting memory uses rusage[mem=n] in its resource requirement, so LSF reserves memory (requested minus currently in use) for that running job even though the job is not currently consuming that much. Reserved memory cannot be used by other jobs. You can use bhosts -l <host name> to check whether there is a memory reservation and how much memory is actually available on the host for scheduling.
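To illustrate why a per-job reservation can leave less schedulable memory than a host-wide estimate suggests, here is a rough sketch. It assumes a simplified model in which each running job holds a reservation of (requested minus in use, never below zero); the function name and the numbers are hypothetical, and real LSF accounting may differ, so treat bhosts -l as the authoritative source.

```python
def schedulable_mem_gb(physical_gb, jobs, other_use_gb=0):
    """Simplified model: free memory minus per-job reservations.

    jobs is a list of (requested_gb, in_use_gb) pairs for running jobs.
    """
    in_use = other_use_gb + sum(used for _, used in jobs)
    free = physical_gb - in_use
    # Assumed reservation per job: whatever it requested but has not yet used.
    reserved = sum(max(req - used, 0) for req, used in jobs)
    return free - reserved


# Hypothetical 64 GB host: job A requested 16 GB but uses 30 GB,
# job B requested 20 GB but uses only 2 GB.
jobs = [(16, 30), (20, 2)]

per_job = schedulable_mem_gb(64, jobs)

# Host-wide estimate: physical - max(total in use, total requested).
total_used = sum(used for _, used in jobs)
total_requested = sum(req for req, _ in jobs)
host_wide = 64 - max(total_used, total_requested)

print(per_job, host_wide)  # 14 28
```

Under these assumptions a new job asking for 20 GB looks as if it fits by the host-wide estimate (28 GB) but not once per-job reservations are counted (14 GB), which would match the "not satisfied" pending reason.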