Original Message:
Sent: Tue April 29, 2025 07:16 AM
From: Colin Rudakiewicz
Subject: mem=xxx/job (per job) resource request string problem with RESOURCE_RESERVE_PER_TASK=Y
Hi Yi,
Thank you for replying.
The behaviour we observed is on a single host; in our test lab here, Fix Pack 10 (build 545500) added the /job and /task sub-options for the mem resource in the request string.
Can you please clarify? Our understanding is that -R "rusage[mem=20000/job]" should override RESOURCE_RESERVE_PER_TASK=Y.
------------------------------
Colin Rudakiewicz
Original Message:
Sent: Mon April 28, 2025 05:27 PM
From: YI SUN
Subject: mem=xxx/job (per job) resource request string problem with RESOURCE_RESERVE_PER_TASK=Y
For a parallel job spanning nodes, rusage[mem/job] is ambiguous, as LSF doesn't know how much of the memory should be reserved on each node. Or you may argue that memory should be reserved on each node in proportion to the number of tasks on it; in that case, could you use rusage[mem] (since you set RESOURCE_RESERVE_PER_TASK=Y), or rusage[mem/task]? For the odd behaviour you are observing, it may be worth creating a support case for more clarification.
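To make the suggested workaround concrete, here is a minimal sketch (plain shell arithmetic only; the numbers are illustrative and the equivalence assumes the per-task interpretation described above): a per-job total can be restated as an unambiguous per-task amount by dividing by the task count.

```shell
#!/bin/sh
# Illustrative arithmetic only: convert a desired per-job memory total (MB)
# into the per-task amount that rusage[mem=N] would reserve when
# RESOURCE_RESERVE_PER_TASK=Y is set. All values are hypothetical.
TOTAL_MB=20000   # desired total reservation for the whole job (20 GB)
NTASKS=4         # matches bsub -n 4

PER_TASK_MB=$((TOTAL_MB / NTASKS))
echo "$PER_TASK_MB"   # per-task amount to put in rusage[mem=...]

# The equivalent per-task submissions would then look like:
#   bsub -n 4 -R "rusage[mem=${PER_TASK_MB}]" ...       (per task, with RESOURCE_RESERVE_PER_TASK=Y)
#   bsub -n 4 -R "rusage[mem=${PER_TASK_MB}/task]" ...  (explicit per-task suffix)
```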
------------------------------
YI SUN
Original Message:
Sent: Mon April 28, 2025 03:16 AM
From: Colin Rudakiewicz
Subject: mem=xxx/job (per job) resource request string problem with RESOURCE_RESERVE_PER_TASK=Y
Hi all,
We need some help, please, with IBM Spectrum LSF 10.1.0.10 (we plan to install Fix Pack 14 in the next maintenance window).
Using the following example: host1 has 300 GB. If we submit a job with
bsub -n 4 -R "rusage[mem=20000/job]" -R "span[stripe]" -R "hname==host1" sleep 60
and with RESOURCE_RESERVE_PER_TASK=Y set in confdir/lsb.params, the scheduler _initially_ wants to allocate 80000 MB (80 GB) of memory (4 x 20 GB). This can be seen by watching the mem resource with bhosts -l (under Linux watch); curiously, about 10 seconds into execution, mem drops back to 20000 (20 GB).
The main problem: if we increase the request to -R "rusage[mem=200000/job]" (an extra 0, i.e. 200 GB), the job will PEND because the scheduler cannot initially allocate 800 GB (4 x 200 GB), even though the job only needs 200 GB to run.
If we set RESOURCE_RESERVE_PER_TASK=N instead, the problem does not occur. It appears that the scheduler initially reserves memory per task, as per RESOURCE_RESERVE_PER_TASK=Y, and ignores the per-job mem=xxx/job in the job's RES_REQ until up to 10 seconds into execution; this initial per-task/per-job mismatch can cause jobs to PEND incorrectly.
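The mismatch described above can be sketched with plain shell arithmetic (this models the hypothesis being reported, i.e. an initial demand of tasks x the per-job amount; it is not documented LSF semantics, and the numbers come from the example):

```shell
#!/bin/sh
# Sketch of the apparent scheduling arithmetic behind the PEND.
# Hypothesis: with RESOURCE_RESERVE_PER_TASK=Y, the initial demand is
# computed per task even when the /job suffix is used.
HOST_MEM_MB=300000   # host1 has ~300 GB
NTASKS=4             # bsub -n 4
PER_JOB_MB=200000    # rusage[mem=200000/job], i.e. 200 GB for the whole job

# Apparent initial demand (per-task interpretation):
INITIAL_MB=$((NTASKS * PER_JOB_MB))
echo "initial demand: $INITIAL_MB MB"   # 800000 MB (800 GB): exceeds the host, so the job PENDs

# Demand the job actually needs under the /job interpretation:
echo "per-job demand: $PER_JOB_MB MB"   # 200000 MB (200 GB): fits within 300 GB
```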
Please advise: is this a bug?
Many Thanks,
Colin
------------------------------
Colin Rudakiewicz
------------------------------