High Performance Computing Group

  • 1.  Suspended jobs do not display suspending reasons with bjobs customized output

    Posted Wed January 17, 2024 01:37 PM
    I have downloaded and installed LSF Community Edition 10.1.0.12 (Jun 10 2021) on CentOS 8.
    lsfadmin ~ $ lsid
    IBM Spectrum LSF Community Edition 10.1.0.12, Jun 10 2021
    Copyright IBM Corp. 1992, 2016. All rights reserved.
    I have found that `bjobs -l` and `bjobs -o 'jobid stat pend_reason suspend_reason'` behave differently for suspended jobs (SSUSP, USUSP, and PSUSP): the customized output does not display the suspending reason.
    lsfadmin ~ $ bsub sleep 1000
    Job <102> is submitted to default queue <normal>.
    lsfadmin ~ $ bjobs -o 'jobid stat'
    JOBID STAT
    102 RUN
    lsfadmin ~ $ bstop 102
    Job <102> is being stopped
    lsfadmin ~ $ bjobs -l 102
    Job <102>, User <lsfadmin>, Project <default>, Status <USUSP>, Queue <normal>,
                         Command <sleep 1000>, Share group charged </lsfadmin>
    ... ...
     SUSPENDING REASONS:
     Job was suspended by an administrator or root;
    ... ...
    
    lsfadmin ~ $ bjobs -o 'jobid stat pend_reason suspend_reason'
    JOBID STAT PEND_REASON SUSPEND_REASON
    102 USUSP - -
    With `bstop -C '...' <jobId>`, the customized output displays the custom reason, but I would like to get the reason that `bjobs -l` reports, because administrators sometimes run only `bstop <jobId>` (without -C). The same happens when a regular user suspends a job: no reason is displayed.
    lsfadmin ~ $ bstop -C 'testreason' 103
    Job <103> is being stopped
    lsfadmin ~ $ bjobs -l 103
    Job <103>, User <lsfadmin>, Project <default>, Status <USUSP>, Queue <normal>,
    ... ...
     SUSPENDING REASONS:
     Job was suspended by an administrator or root;
    ... ...
    lsfadmin ~ $ bjobs -o 'jobid stat pend_reason suspend_reason'
    JOBID STAT PEND_REASON SUSPEND_REASON
    103 USUSP - testreason
    For a PEND job (resource requirements not met, e.g. `-R 'affinity[core(100)]'`), `bjobs -l` and `bjobs -o ...` display the same reason message.
     
    I have the following questions:
    1) Is this a bug or a feature of bjobs customized output for suspended jobs? Is there any documentation that explains this behavior?
    2) How can I display the suspending reason shown by `bjobs -l` using a specific bjobs output field?
    Thanks


    ------------------------------
    t4tkq t4tkq
    ------------------------------


  • 2.  RE: Suspended jobs do not display suspending reasons with bjobs customized output

    Posted Wed January 17, 2024 07:00 PM

    Could you try bjobs -p1 -o 'jobid stat pend_reason suspend_reason'?



    ------------------------------
    YI SUN
    ------------------------------



  • 3.  RE: Suspended jobs do not display suspending reasons with bjobs customized output

    Posted Wed January 17, 2024 11:18 PM

    I have already tried it. According to the command output, `-p1` filters for PEND jobs, not [SUP]SUSP jobs.

    lsfadmin ~ $  bstop 1033
    Job <1033> is being stopped
    lsfadmin ~ $ bjobs -p1 -o 'jobid stat pend_reason suspend_reason'
    No pending job found
    lsfadmin ~ $ bjobs -s -o 'jobid stat pend_reason suspend_reason'
    JOBID STAT PEND_REASON SUSPEND_REASON
    1033 USUSP - -
    



    ------------------------------
    t4tkq t4tkq
    ------------------------------



  • 4.  RE: Suspended jobs do not display suspending reasons with bjobs customized output

    Posted Thu January 18, 2024 02:39 PM

    Understood. It seems the LSF command behavior is inconsistent; I suggest opening a case with Support.
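
    Until Support clarifies the behavior, one possible workaround is to read the reason out of the `bjobs -l` text instead of relying on the `suspend_reason` field. The following is only a minimal sketch, not an official LSF interface. It assumes the layout shown earlier in this thread: a `SUSPENDING REASONS:` line followed by reason lines that each end with `;`. The function names `suspending_reasons` and `reasons_for_job` are hypothetical helpers, and long reasons that wrap across lines are not handled.

    ```python
    import subprocess


    def suspending_reasons(bjobs_long_output: str) -> list[str]:
        """Return the reasons listed under 'SUSPENDING REASONS:' in `bjobs -l` text.

        Assumption: each reason sits on its own line and ends with ';',
        as in the transcript in this thread.
        """
        reasons = []
        in_section = False
        for raw in bjobs_long_output.splitlines():
            line = raw.strip()
            if line == "SUSPENDING REASONS:":
                in_section = True
                continue
            if in_section:
                # Stop at the first blank line or line that is not a ';'-terminated reason.
                if not line or not line.endswith(";"):
                    break
                reasons.append(line.rstrip(";"))
        return reasons


    def reasons_for_job(job_id: str) -> list[str]:
        """Run `bjobs -l <job_id>` and extract the suspending reasons (needs LSF on PATH)."""
        result = subprocess.run(
            ["bjobs", "-l", job_id], capture_output=True, text=True, check=True
        )
        return suspending_reasons(result.stdout)


    # Sample text copied from the `bjobs -l 102` output in the first post.
    SAMPLE = """Job <102>, User <lsfadmin>, Project <default>, Status <USUSP>, Queue <normal>,
                         Command <sleep 1000>, Share group charged </lsfadmin>
     SUSPENDING REASONS:
     Job was suspended by an administrator or root;
    """

    print(suspending_reasons(SAMPLE))
    ```

    On a host with LSF installed, `reasons_for_job("102")` would apply the same parsing to live output. Parsing human-readable text is fragile across LSF versions, so this is a stopgap rather than a replacement for a working `suspend_reason` field.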



    ------------------------------
    YI SUN
    ------------------------------