  • 1.  LSF is not giving correct used slots info

    Posted Tue June 15, 2021 08:15 AM

    I have submitted jobs with rusage[cpu=4] running a stress job (stress.o -c 4), and top on the execution host shows 4 vCPUs in use. However, bhosts, bslots and bjobs all show only 1 slot allocated/used.

    I expected LSF to block 4 slots, with bhosts showing 4 slots in the RUN column.

    Below is the job submission command along with the bjobs and bslots output:

    Job submission command:

    [ec2-user~]$ bsub -q 4core "/usr/local/share/LSF_TOP/tools/stress.o -c 4 -m 1 --vm-bytes 240000 --timeout 5m"
    Job <165175> is submitted to queue <4core>.

    [ec2-user~]$ bjobs -w 165175
    JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
    165175 ec2-user RUN 4core ip-10-0-1-167.us-east-2.compute.internal ip-178-1-255-150.us-east-2.compute.internal stress.o -c 4 -m 1 --vm-bytes 240000 --timeout 5m Jun 15 08:05

    [ec2-user~]$ bslots -l
    SLOTS: 10094
    RUNTIME: UNLIMITED
    HOSTS: 3*ip-178-1-255-150.us-east-2.compute.internal

    [ec2-user~]$ bjobs -l 165175
    Job <165175>, User <ec2-user>, Project <default>, Status <RUN>, Queue <4core>,
                         Command </usr/local/share/LSF_TOP/tools/stress.o -c 4 -m 1 --vm-bytes 240000 --timeout 5m>
    Tue Jun 15 08:05:55: Submitted from host <ip-10-0-1-167.us-east-2.compute.internal>, CWD <$HOME>;
    Tue Jun 15 08:05:56: Started 1 Task(s) on Host(s) <ip-178-1-255-150.us-east-2.compute.internal>, Allocated 1 Slot(s) on Host(s) <ip-178-1-255-150.us-east-2.compute.internal>, Execution Home </home/ec2-user>, Execution CWD </home/ec2-user>;

    Thanks


    #Support
    #SupportMigration
    #Spectrum


  • 2.  RE: LSF is not giving correct used slots info

    Posted Wed June 16, 2021 06:17 PM
    I presume you have LSB_ENABLE_HPC_ALLOCATION set. Do you use the LSF CPU affinity feature? Please share the complete bjobs -l output so we can see the combined and effective resource requirements. The output of bqueues -l 4core would also be useful.

    I don't think rusage[cpu=4] reserves cores. In your example the job itself can use up to 4 cores (stress -c 4), but LSF doesn't know that. You should let LSF reserve 4 cores for you, either through bsub -n 4 and/or the job CPU affinity feature (requested with -R "affinity[...]").
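    A minimal sketch of the submission this reply recommends, written in Python only to make each flag explicit: it builds the bsub argument list that asks LSF for four slots instead of relying on rusage[cpu=4]. The helper name build_bsub_cmd is hypothetical. The -q and -n flags come from the thread; span[hosts=1] (keep all slots on one host, since a single multithreaded process cannot span hosts) and affinity[core(1)] (bind each task to one core) are standard LSF resource requirement sections added here as assumptions, so check them against your LSF version and queue configuration.

```python
import shlex


def build_bsub_cmd(queue, ncores, command, bind_cores=False):
    """Return a bsub argument list that requests `ncores` slots on one host.

    span[hosts=1] keeps every slot on the same host, because a single
    multithreaded process cannot span hosts. With bind_cores=True,
    affinity[core(1)] additionally asks LSF to bind each task to one core.
    Both resource requirement sections are assumptions added for this sketch.
    """
    rsrc = "span[hosts=1]"
    if bind_cores:
        rsrc += " affinity[core(1)]"
    return ["bsub", "-q", queue, "-n", str(ncores), "-R", rsrc, command]


if __name__ == "__main__":
    stress = ("/usr/local/share/LSF_TOP/tools/stress.o -c 4 -m 1 "
              "--vm-bytes 240000 --timeout 5m")
    cmd = build_bsub_cmd("4core", 4, stress)
    # Print the equivalent shell command line for review.
    print(shlex.join(cmd))
    # To submit on a host with LSF installed:
    # import subprocess; subprocess.run(cmd, check=True)
```

    With -n 4 in the submission, bjobs -l should report 4 allocated slots and bhosts should count them against the host's MAX, which is the behaviour the original post expected.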




  • 3.  RE: LSF is not giving correct used slots info

    Posted Fri June 18, 2021 06:47 PM

    Thanks for your reply.

    Yes, LSB_ENABLE_HPC_ALLOCATION is set to Y, and we are not using the CPU affinity feature.

    Here is the complete bjobs -l output and the CPU usage on the execution host. We are trying to get information on the total available slots, the currently used slots (not as requested with -n, but the actual slot usage on the execution host) and the blocked slots in a given cluster.


    [ec2-user~]$ bjobs -l 165260
    Job <165260>, User <ec2-user>, Project <default>, Status <RUN>, Queue <4core>,
                         Command </usr/local/share/LSF_TOP/tools/stress.o -c 4 -m 1 --vm-bytes 240000 --timeout 5m>
    Fri Jun 18 18:37:56: Submitted from host <ip-10-0-1-167.us-east-2.compute.internal>, CWD <$HOME>;
    Fri Jun 18 18:37:57: Started 1 Task(s) on Host(s) <ip-178-1-113-245.us-east-2.compute.internal>, Allocated 1 Slot(s) on Host(s) <ip-178-1-113-245.us-east-2.compute.internal>, Execution Home </home/ec2-user>, Execution CWD </home/ec2-user>;
    Fri Jun 18 18:38:18: Resource usage collected.
                         The CPU time used is 108 seconds.
                         MEM: 2 Mbytes; SWAP: 0 Mbytes; NTHREAD: 11
                         PGID: 13727; PIDs: 13727 13728 13730 13731 13732 13733 13734 13735 13736 13737

     MEMORY USAGE:
     MAX MEM: 2 Mbytes; AVG MEM: 2 Mbytes

     SCHEDULING PARAMETERS:
                 r15s   r1m  r15m    ut    pg    io    ls    it   tmp   swp   mem
     loadSched      -     -     -     -     -     -     -     -     -     -     -
     loadStop       -     -     -     -     -     -     -     -     -     -     -

     RESOURCE REQUIREMENT DETAILS:
     Combined: select[type == local] order[r15s:pg]
     Effective: select[type == local] order[r15s:pg]


    [ec2-user~]$ bhosts -w ip-178-1-113-245
    HOST_NAME                                    STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
    ip-178-1-113-245.us-east-2.compute.internal  ok      -     4    1      1    0      0      0
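    For the goal stated above (totals of available, used and blocked slots in a cluster), one rough approach is to sum the slot columns of bhosts -w across hosts. The helper below is a hypothetical Python sketch, not an LSF tool. Column meanings are taken from the bhosts reference: MAX is the slot limit, NJOBS counts tasks of all dispatched jobs (running and suspended), RUN counts tasks of running jobs, and RSV counts slots reserved for pending jobs. It only reflects what LSF has allocated, so for job 165260 it reports 1 used slot, not the 4 cores the process keeps busy; the job has to request those slots (for example with -n 4) before they appear here.

```python
def summarize_bhosts(text):
    """Sum the slot columns of `bhosts -w` output across all listed hosts.

    Expected columns: HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV.
    A MAX of "-" means no slot limit, so such hosts are counted separately
    and left out of the capacity and free totals.
    """
    totals = {"max": 0, "njobs": 0, "run": 0, "rsv": 0, "free": 0,
              "unlimited_hosts": 0}
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields or fields[0] == "HOST_NAME":
            continue  # skip blank lines and the header row
        _host, _status, _jlu, max_, njobs, run, _ssusp, _ususp, rsv = fields[:9]
        totals["njobs"] += int(njobs)
        totals["run"] += int(run)
        totals["rsv"] += int(rsv)
        if max_ == "-":
            totals["unlimited_hosts"] += 1
        else:
            totals["max"] += int(max_)
            # Free as LSF sees it: limit minus dispatched minus reserved slots.
            totals["free"] += int(max_) - int(njobs) - int(rsv)
    return totals


if __name__ == "__main__":
    # bhosts -w output for the execution host of job 165260 in this thread.
    sample = """\
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
ip-178-1-113-245.us-east-2.compute.internal ok - 4 1 1 0 0 0
"""
    # free is 3 here: MAX 4 minus NJOBS 1 minus RSV 0.
    print(summarize_bhosts(sample))
```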




  • 4.  RE: LSF is not giving correct used slots info

    Posted Mon June 21, 2021 03:14 PM

    The bjobs output shows that you only told LSF your job uses one slot; LSF doesn't know that your job will use up to four cores. In your case, you should either submit with bsub -n 4 or use CPU affinity so that LSF binds your job to 4 cores.
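    LSF's own accounting already hints at the mismatch described here. A rough, hypothetical check (the helper effective_cores is not part of LSF) divides the CPU time reported by bjobs -l by the wall-clock time between the Started line and the Resource usage collected line. For job 165260 that is 108 seconds of CPU over 21 seconds, about 5.1 cores, while LSF allocated 1 slot. Assuming stress.o is the standard stress tool, that figure is consistent with -c 4 -m 1 running four CPU workers plus one memory worker. bjobs -l prints timestamps without a year, so the caller supplies it, and because resource usage is sampled periodically the result is only an estimate.

```python
import re
from datetime import datetime

# Timestamp prefix used by bjobs -l, e.g. "Fri Jun 18 18:37:57" (no year).
_STAMP = r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) ([A-Z][a-z]{2})\s+(\d{1,2}) (\d{2}:\d{2}:\d{2})"


def _parse_stamp(match, year):
    """Turn a matched bjobs timestamp into a datetime; the caller supplies the year."""
    month, day, clock = match.groups()
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def effective_cores(bjobs_l_text, year):
    """Estimate cores kept busy: CPU seconds divided by wall-clock seconds since start."""
    started = re.search(_STAMP + r": Started", bjobs_l_text)
    sampled = re.search(_STAMP + r": Resource usage collected", bjobs_l_text)
    cpu = re.search(r"The CPU time used is ([\d.]+) seconds", bjobs_l_text)
    if not (started and sampled and cpu):
        raise ValueError("bjobs -l text lacks a start time, usage sample or CPU time")
    elapsed = (_parse_stamp(sampled, year) - _parse_stamp(started, year)).total_seconds()
    if elapsed <= 0:
        raise ValueError("usage sample is not later than the start time")
    return float(cpu.group(1)) / elapsed


if __name__ == "__main__":
    # Lines excerpted from the bjobs -l output of job 165260 in this thread.
    sample = """\
Fri Jun 18 18:37:57: Started 1 Task(s) on Host(s) <ip-178-1-113-245.us-east-2.compute.internal>, Allocated 1 Slot(s)
Fri Jun 18 18:38:18: Resource usage collected.
                     The CPU time used is 108 seconds.
"""
    print(f"{effective_cores(sample, 2021):.1f} cores busy vs. 1 slot allocated")
```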

