

 bjobs customized output: slots vs. nalloc_slot and number of processes?

Frank Thommen posted Tue March 18, 2025 12:25 PM

Hi,

I have two questions regarding bjobs' customized output (`bjobs -o...`) fields:

  • Does anyone know what the difference between the fields "slots" and "nalloc_slot" is? "nalloc_slot" sounds like "number of allocated slots", i.e. the number of slots that my job got from LSF. But what is "slots"? In all cases I tried, both fields show the same value.
  • Does anyone know how to list the number of processes (not threads, that would be "nthreads") started by a job? I need a number, hence "pids" or "alloc_slot" don't help in my case.

Thanks for any pointers

Frank

YI SUN

My guess is that for a resizable job, nalloc_slot may represent the original allocation when the job starts, and slots may represent the number of slots actually used by the job at that time.

PID information is very dynamic. If you want to limit the number of processes a job can run at any time, PROCESSLIMIT or bsub -p can be used. To distinguish processes from threads, you may have to go through /proc/<pid> and /proc/<pid>/task (you will need to do this periodically, as processes and threads may change while the job is running).
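
A minimal sketch of the /proc approach, assuming a Linux execution host and that you already have the job's PIDs as integers (for example, taken from the "pids" field). The function names and the `proc_root` parameter are made up for illustration; only processes on the host where the script runs are visible to it.

```python
import os


def count_threads(pid, proc_root="/proc"):
    """Return the number of threads of one process, or 0 if it has exited."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    try:
        # Each thread of the process appears as a numeric entry under task/.
        return sum(1 for name in os.listdir(task_dir) if name.isdigit())
    except OSError:
        # The process ended (or is not readable) between listing and lookup.
        return 0


def summarize_job(pids, proc_root="/proc"):
    """Count live processes and their threads for a list of PIDs."""
    per_pid = {pid: count_threads(pid, proc_root) for pid in pids}
    alive = {pid: n for pid, n in per_pid.items() if n > 0}
    return {"nprocs": len(alive), "nthreads": sum(alive.values())}
```

Calling `summarize_job([...])` periodically gives one snapshot per call; because processes come and go, two consecutive snapshots may differ.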

Bernd Dammann

Hi,

"slots" reflects what you request with the '-n' option of bsub, and for most jobs, "slots" and "nalloc_slot" will be the same, unless you work with ranges for '-n'.

If you use "affinity",  where you can request a number of cores per slot, e.g. for MPI-OpenMP jobs, then you will see the difference:

bsub -n 4 -R "affinity[core(4)]" ...
bjobs -o "jobid: slots: nalloc_slot:"
JOBID      SLOTS NALLOC_SLO
    5249   4     16
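
The arithmetic behind this output can be sketched as follows. This only models the simple `affinity[core(N)]` form from the example (4 slots times 4 cores per slot gives 16); it is an illustration of the observed numbers, not LSF's actual accounting, and the function names are hypothetical.

```python
import re


def cores_per_slot(res_req):
    """Extract N from an 'affinity[core(N)]' resource string; default to 1."""
    match = re.search(r"affinity\[core\((\d+)", res_req)
    return int(match.group(1)) if match else 1


def expected_nalloc_slot(n_slots, res_req=""):
    """Slots requested with -n multiplied by the cores requested per slot."""
    return n_slots * cores_per_slot(res_req)


# The job from the example: bsub -n 4 -R "affinity[core(4)]"
print(expected_nalloc_slot(4, "affinity[core(4)]"))
```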

Regards, Bernd

Frank Thommen

Thanks Yi Sun and Bernd Dammann,

for some reason I don't seem able to respond to your answers :-(, hence I reply to my own question...

I think what reflects the requested number of slots in a job (`bsub -n`) is nreq_slot, not slots, isn't it? Or what would then nreq_slot be? Since slot ranges are not used in our environment, I will probably never see a difference between slots and nalloc_slot :-). But thanks for the interesting example with the affinity, which also explains possible differences in these numbers (in our cluster, we have a fixed and enforced affinity to threads (affinity[thread(1)]) for all the queues).

Generally, I think that most of the fields which can be used in bjobs' customized output are unfortunately poorly documented, even though they could be extremely valuable for monitoring and optimizing LSF. There is also not much consistency in their names. I'm still hoping for improvement in future LSF versions. :-)

(Bernd, we are still using your bstat. Thanks!)

Frank

Bernd Dammann

Hi Frank,

yes - the web interface here is broken, when it comes to replying to messages!  Also the draft function doesn't work - I lost my first answer yesterday, and had to type it again! :-(

About nreq_slot: that was actually added to bjobs at my request! :-)  When a job is pending, slots and nalloc_slot are empty (or zero), like here:

$ bjobs -a -o "jobid: slots: nalloc_slot: nreq_slot:"
JOBID      SLOTS NALLOC_SLO NREQ_SLOT
24471380     -   0          16
24471381     -   0          4

The first job is an affinity job with 4 slots and core(4) affinity, the second a simple '-n 4' job.  While LSF is trying to find suitable hosts, nalloc_slot might change from zero to a number that is smaller than the number of requested slots, as it reflects the number of slots already allocated for the job on suitable hosts before the job has started.  Let's say you need 2 hosts to satisfy the request, and LSF is already blocking one node with 8 cores/slots and waiting for another node to be able to dispatch; then nalloc_slot will be 8 during that period.

nreq_slot, however, reflects the total number of slots/cores the job has requested at submission time.   This is useful information that we needed in our monitoring tools, and therefore we opened an RFE to get this added to LSF some time ago.  nreq_slot works well only for simple jobs and affinity, and has its limitations when people use ranges with '-n' or compound requirements.  The limitations can be found here: https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=notes-limitations
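
For monitoring, the three fields can be combined roughly like this. It is a sketch that parses the whitespace-separated table exactly as printed above (including the truncated NALLOC_SLO header) and assumes every row carries a numeric nreq_slot; the helper names are hypothetical.

```python
def parse_bjobs_table(text):
    """Parse whitespace-separated bjobs -o output into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = [name.lower() for name in lines[0].split()]
    rows = []
    for line in lines[1:]:
        row = {}
        for name, value in zip(header, line.split()):
            # Non-numeric values such as "-" (no value yet) become None.
            row[name] = int(value) if value.isdigit() else None
        rows.append(row)
    return rows


def outstanding_slots(row):
    """Requested slots not yet allocated: nreq_slot minus nalloc_slot."""
    allocated = row.get("nalloc_slo") or 0
    return row["nreq_slot"] - allocated


sample = """\
JOBID      SLOTS NALLOC_SLO NREQ_SLOT
24471380     -   0          16
24471381     -   0          4
"""
for job in parse_bjobs_table(sample):
    print(job["jobid"], outstanding_slots(job))
```

For the two pending jobs above, nothing is allocated yet, so the outstanding count equals nreq_slot. In the two-host scenario with 8 slots already blocked out of 16 requested, the same function would report 8.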

I agree with you on the naming scheme for the output options of bjobs, but as they have been added over time, e.g. by customer/user requests, the names are somewhat "random".  On the other hand, now that they are there, you cannot change them easily, as this would break a lot of customized scripts, like ours!  What could be done, though: add a more consistent naming scheme as aliases for the existing ones.  

Regards, Bernd