High Performance Computing Group

Connect with HPC subject matter experts and discuss how hybrid cloud HPC Solutions from IBM meet today's business needs.

LSF bsub -Q maximal number of automatic requeue

  • 1.  LSF bsub -Q maximal number of automatic requeue

    Posted Mon August 23, 2021 09:40 AM

    Hi all,

    I would like to know how to set the maximum number of automatic requeues for a job when using bsub -Q.
    I saw here [link] that "Specifying a job-level exit value using bsub -Q overrides all MAX_JOB_REQUEUE settings."  

    I am using the following command to submit jobs:

    bsub -Q "all ~0" MyCommand

    Is there a way to set MAX_JOB_REQUEUE to 2, for example, for a specific job?

    Many thanks in advance for your help!
    Best,



    ------------------------------
    Romain Bouquet
    ------------------------------

    #SpectrumComputingGroup


  • 2.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted Mon August 23, 2021 11:02 AM
    As the document says, you can also configure MAX_JOB_REQUEUE in lsb.applications for an application profile, e.g.:
    Begin Application
    NAME = myrequeuejob
    DESCRIPTION = only requeue two times
    MAX_JOB_REQUEUE = 2
    End Application

    Then submit your job like this: bsub -Q "all ~0" -app myrequeuejob <myjob>

    ------------------------------
    YI SUN
    ------------------------------



  • 3.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted Tue August 24, 2021 05:32 AM
    Edited by System Admin Fri January 20, 2023 04:35 PM

    Hi @YI SUN,

    Thanks for your reply. I had a look, but it seems to me that one has to be an administrator to create an application profile [link].
    Unfortunately, I am just a regular user.

    Is there a way to create an application profile as a regular user?

    Thanks again,
    Best



    ------------------------------
    Romain Bouquet
    ------------------------------



  • 4.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted Tue August 24, 2021 06:20 PM
    LSF doesn't seem to have job-level control over this. You will have to ask your admin to add this for you.


    ------------------------------
    YI SUN
    ------------------------------



  • 5.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted Thu August 26, 2021 03:21 AM

    Hi @YI SUN,

    Alright, that is unfortunate, but thanks a lot for your help!

    Best,



    ------------------------------
    Romain Bouquet
    ------------------------------



  • 6.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted Thu August 26, 2021 11:30 AM
    Romain,

    It's simple to do.  Here is a section from the man pages for lsb.queues

    MAX_JOB_REQUEUE

    Specifies the maximum number of times to requeue a job automatically.

    Syntax
    MAX_JOB_REQUEUE=integer

    Valid values
    0 < MAX_JOB_REQUEUE < INFINIT_INT

    INFINIT_INT is defined in lsf.h.

    Default
    Not defined. The number of requeue times is unlimited

    ------------------------------
    Larry Adams
    ------------------------------



  • 7.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted Fri August 27, 2021 03:43 AM
    Edited by System Admin Fri January 20, 2023 04:08 PM

    Hi @Larry Adams,

    Maybe there is something I do not understand:
    according to this link, it seems to me that one has to be an administrator to change the lsb.queues file. Am I wrong?
    Sadly, I am just a regular user.

    Would there be a way for a regular user to use MAX_JOB_REQUEUE?
    For instance, something like this (or a similar syntax to retry jobs only twice):

    export MAX_JOB_REQUEUE=2
    bsub -Q "all ~0" <myjob>

    so that the bsub command takes the value of MAX_JOB_REQUEUE into account?

    Thanks in advance,

    Best,



    ------------------------------
    Romain Bouquet
    ------------------------------



  • 8.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted Fri August 27, 2021 12:32 PM
    Edited by System Admin Fri January 20, 2023 04:26 PM
    Romain,

    That would be easy enough to test. Just have the script exit: make the script command be "exit 1". That should cause a re-queue loop if the setting is taken. I know that several variables can be re-read from the environment, but I'm not too certain about this one, as it's more of an mbatchd thing vs. an sbatchd thing. I could be wrong though. Like I said, easy enough to test.
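A minimal sketch of that test, assuming the job is a small shell script. The file name `always_fail.sh` and the helper `make_failing_job` are hypothetical, and exporting `MAX_JOB_REQUEUE` is only the idea being tested here, not something LSF is known to honor.

```shell
#!/bin/sh
# Sketch of the test described above; names are hypothetical.

# Write a job script whose only action is to fail with exit code 1.
make_failing_job() {
    printf '#!/bin/sh\nexit 1\n' > "$1"
    chmod +x "$1"
}

make_failing_job ./always_fail.sh

# Submit only where LSF is available.
if command -v bsub >/dev/null 2>&1; then
    # Assumption under test: bsub may or may not read this variable.
    export MAX_JOB_REQUEUE=2
    # Requeue on every non-zero exit code, as in the original question.
    bsub -Q "all ~0" ./always_fail.sh
fi
```

If the exported value is not picked up, the job can be expected to requeue repeatedly, so it is worth being ready to remove it with bkill.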

    Larry

    ------------------------------
    Larry Adams
    ------------------------------



  • 9.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted Fri August 27, 2021 12:38 PM
    Just tested, that did not go over too well:

    [lsfadmin@vmhost6 configdir]$ bhist -l 101 | grep "Running with" | wc -l
    80

    But as soon as I set the max value in the queue, the job was suspended. I would suggest you ask your LSF admin to add that setting, as it will also prevent requeue loops for other users.
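The count above can be wrapped in a small helper for repeated checks. This is only a sketch: the function name `count_dispatches` is hypothetical, and it assumes, as the command in this post does, that each run of the job produces one line containing "Running with" in the bhist -l output.

```shell
#!/bin/sh
# Count the lines containing "Running with" in `bhist -l` output read from stdin.
# grep -c prints the number of matching lines, like `grep ... | wc -l`.
count_dispatches() {
    grep -c "Running with"
}

# Usage on a cluster (101 is the job ID from the post):
#   bhist -l 101 | count_dispatches
```

A count that keeps growing between checks would indicate that no requeue limit is in effect for the job.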



    ------------------------------
    Larry Adams
    ------------------------------



  • 10.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted Mon August 30, 2021 06:06 AM

    Hi @Larry Adams,

    Alright, thanks a lot for your help.

    Best,




    ------------------------------
    Romain Bouquet
    ------------------------------