I'm running Spectrum LSF Community Ed 10.1.0 and a number of my users are seeing their jobs pending (often for an hour or more) despite the requested resources (typically memory) being available on one or more hosts in the cluster.
Running bjobs -lp on one of the affected job IDs just returns the following:
Job requirements for reserving resource (mem) not satisfied: 4 hosts;
This is despite a number of hosts having enough free memory to satisfy the resource requirement, where I calculate free memory as physical RAM minus max(RAM in use, sum of LSF-requested RAM).
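To be explicit about how I'm working out free memory, here is a rough Python sketch of the calculation. The host names and all the figures are made up for illustration (they are not real output from my cluster), and the function name is just something I picked for this example.

```python
def available_mem_mb(physical_mb, in_use_mb, lsf_requested_mb):
    """Free memory on one host, in MB, per the calculation described above.

    physical_mb      -- installed RAM on the host
    in_use_mb        -- RAM actually in use right now
    lsf_requested_mb -- list of memory requests from LSF jobs on the host
    """
    return physical_mb - max(in_use_mb, sum(lsf_requested_mb))


# Hypothetical hosts: (physical RAM, RAM in use, per-job LSF memory requests)
hosts = {
    "hostA": (65536, 20000, [8192, 8192]),
    "hostB": (65536, 60000, [4096]),
}

job_request_mb = 16384  # hypothetical memory request of the pending job

# Hosts that, by this calculation, should be able to take the job
eligible = [
    name
    for name, (physical, in_use, requests) in hosts.items()
    if available_mem_mb(physical, in_use, requests) >= job_request_mb
]
print(eligible)
```

By this reckoning at least one host should be able to take the job, which is why the pending reason confuses me.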
How can I debug the issue further? The vast majority of jobs submitted with resource requirements run correctly with little or no time spent pending. It's a seemingly sporadic issue that I'm struggling to diagnose.
Any help would be appreciated.