I've seen this problem with several tools that run as multi-core LSF jobs.
Basically, these tools have an option to specify the number of cores desired, and if it is left blank, the tool will use however many cores it thinks is right for the job. Also within the tool, you can specify how many cores to request from LSF, and that forms the resource string for the job submission.
When the job is submitted, LSF will place it on a machine with enough cores to satisfy the LSF request. Once on that machine, however, the tool will use however many cores it wants, unless the user has specifically set the tool's core count to match the core count requested from LSF.
Even worse, LSF does not know that the tool is using more cores than it was allocated, so LSF will continue to dispatch jobs to that grid machine even though the job already running there is using all of the available cores. As a result, all of the jobs on that machine run much slower.
Is there any option in LSF to prevent a job from using more cores than specified in its LSF job request?
Use LSF's CPU affinity and cgroup integration to prevent jobs from using more cores than they were allocated.
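As far as I know, on Linux hosts this usually means enabling affinity scheduling (`AFFINITY=Y` in lsb.hosts), turning on cgroup enforcement (`LSB_RESOURCE_ENFORCE="cpu"` together with `LSF_PROCESS_TRACKING=Y` and `LSF_LINUX_CGROUP_ACCT=Y` in lsf.conf), and submitting with an affinity resource requirement such as `bsub -n 4 -R "affinity[core(4)]"`. Please check those parameter names against the documentation for your LSF version before relying on them. To confirm from inside a job that the restriction took effect, or to let a wrapper pass the right core count to the tool, something like the Python sketch below may help. The function name `allowed_cores` is hypothetical, and it assumes LSF exports `LSB_DJOB_NUMPROC` (the number of slots allocated to the job) into the job environment.

```python
import os


def allowed_cores(environ=None):
    """Return a conservative count of cores this process should use.

    Hypothetical helper: takes the smaller of the LSF slot count and the
    number of CPUs the operating system lets this process run on.
    """
    env = os.environ if environ is None else environ
    candidates = []

    # Assumption: LSF sets LSB_DJOB_NUMPROC to the number of allocated slots.
    numproc = env.get("LSB_DJOB_NUMPROC", "")
    if numproc.isdigit() and int(numproc) > 0:
        candidates.append(int(numproc))

    # On Linux the affinity mask reflects any cpuset/affinity binding
    # applied to the job; elsewhere fall back to the host CPU count.
    if hasattr(os, "sched_getaffinity"):
        candidates.append(len(os.sched_getaffinity(0)))
    else:
        candidates.append(os.cpu_count() or 1)

    return min(candidates)


if __name__ == "__main__":
    print(f"usable cores: {allowed_cores()}")
```

If affinity and cgroup enforcement are active, the affinity mask seen inside the job should be no larger than the number of cores requested. A wrapper script could pass the returned value to the tool's own core-count option so the two requests always match.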