IBM Spectrum Computing Group

LSF bsub -Q maximal number of automatic requeue

  • 1.  LSF bsub -Q maximal number of automatic requeue

    Posted Mon August 23, 2021 09:40 AM

    Hi all,

    I would like to know how to set the maximum number of automatic requeues for a job when using bsub -Q.
    I saw here [link] that "Specifying a job-level exit value using bsub -Q overrides all MAX_JOB_REQUEUE settings."  

    I am using the following command to submit jobs:

    bsub -Q "all ~0" MyCommand

    Is there a way to set MAX_JOB_REQUEUE to 2, for example, for a specific job?

    Many thanks in advance for your help!
    Best,



    ------------------------------
    Romain Bouquet
    ------------------------------


  • 2.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted Mon August 23, 2021 11:02 AM
    As the document says, you can also configure MAX_JOB_REQUEUE in lsb.applications for an application profile, for example:
    Begin Application
    NAME =  myrequeuejob
    DESCRIPTION = only requeue two times
    MAX_JOB_REQUEUE = 2
    End Application

    Then submit your job like this: bsub -Q "all ~0" -app myrequeuejob <myjob>

    ------------------------------
    YI SUN
    ------------------------------



  • 3.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted 30 days ago
    Edited by Romain Bouquet 30 days ago

    Hi @YI SUN,

    Thanks for your reply. I had a look, but it seems that one has to be an administrator to create an application profile [link].
    Unfortunately, I am just a regular user.

    Is there a way to create an application profile as a regular user?

    Thanks again,
    Best



    ------------------------------
    Romain Bouquet
    ------------------------------



  • 4.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted 29 days ago
    It doesn't seem that LSF has job-level control over this. You will have to ask your admin to add this for you.
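
Since the cap cannot be set per job, one user-side workaround is to drop -Q and let a small wrapper do the retries inside the job itself. The following is a minimal sketch under that assumption, not an LSF feature: run_with_retries is a hypothetical helper name, and unlike a real requeue, every retry happens within the same job on the same host.

```shell
#!/bin/sh
# run_with_retries MAX_RETRIES COMMAND [ARGS...]
# Hypothetical helper (not part of LSF): runs COMMAND once and, if it
# exits non-zero, reruns it up to MAX_RETRIES more times.
# Returns 0 on the first success, otherwise the last exit status.
run_with_retries() {
    max_retries=$1
    shift
    attempt=0
    while :; do
        if "$@"; then
            return 0
        else
            status=$?
        fi
        # Stop once the retry budget is used up.
        if [ "$attempt" -ge "$max_retries" ]; then
            return "$status"
        fi
        attempt=$((attempt + 1))
        echo "exit status $status, starting retry $attempt of $max_retries" >&2
    done
}
```

For example, if this function is saved in a script (say retry.sh, a made-up name) that ends with run_with_retries "$@", then bsub ./retry.sh 2 MyCommand would run MyCommand at most three times: the first attempt plus two retries.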


    ------------------------------
    YI SUN
    ------------------------------



  • 5.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted 28 days ago

    Hi @YI SUN,

    Alright, that is unfortunate, but thanks a lot for your help!

    Best,



    ------------------------------
    Romain Bouquet
    ------------------------------



  • 6.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted 28 days ago
    Romain,

    It's simple to do.  Here is a section from the man pages for lsb.queues:

    MAX_JOB_REQUEUE

    Specifies the maximum number of times to requeue a job automatically.

    Syntax
    MAX_JOB_REQUEUE=integer

    Valid values
    0 < MAX_JOB_REQUEUE < INFINIT_INT

    INFINIT_INT is defined in lsf.h.

    Default
    Not defined. The number of requeue times is unlimited.

    ------------------------------
    Larry Adams
    ------------------------------



  • 7.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted 27 days ago
    Edited by Romain Bouquet 27 days ago

    Hi @Larry Adams,

    Maybe there is something I do not understand, but
    according to this link it seems that one has to be an administrator to change the lsb.queues file. Am I wrong?
    Sadly, I am just a regular user.

    Is there a way for a regular user to use MAX_JOB_REQUEUE?
    For instance something like (or similar syntax to retry jobs only twice)

    export MAX_JOB_REQUEUE=2
    bsub -Q "all ~0" <myjob>

    so that the bsub command takes the MAX_JOB_REQUEUE value into account?

    Thanks in advance,

    Best,



    ------------------------------
    Romain Bouquet
    ------------------------------



  • 8.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted 26 days ago
    Edited by Larry Adams 26 days ago
    Romain,

    That would be easy enough to test.  Just have the script exit: make the script command "exit 1".  That should cause a requeue loop if the setting is not picked up.  I know that several variables can be re-read from the environment, but I'm not too certain about this one, as it's more of an mbatchd thing than an sbatchd thing.  I could be wrong though.  Like I said, easy enough to test.

    Larry

    ------------------------------
    Larry Adams
    ------------------------------



  • 9.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted 26 days ago
    Just tested, that did not go over too well:

    [lsfadmin@vmhost6 configdir]$ bhist -l 101 | grep "Running with" | wc -l
    80

    But as soon as I set the max value in the queue, the job was suspended.  I suggest you ask your LSF admin to add that setting, as it will also prevent requeue loops for other users.
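
For repeated checks, the count above can be wrapped in a small helper. This is only a sketch: it assumes, as the command above does, that each dispatch adds one line containing "Running with" to the bhist -l output, and count_run_attempts is a hypothetical name.

```shell
#!/bin/sh
# count_run_attempts: reads job history text on stdin and prints how many
# lines contain "Running with" (assumed here to mean one line per dispatch).
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# helper's exit status at 0 in that case.
count_run_attempts() {
    grep -c "Running with" || true
}
```

Usage would look like bhist -l 101 | count_run_attempts, with the job ID replaced by your own.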



    ------------------------------
    Larry Adams
    ------------------------------



  • 10.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted 24 days ago

    Hi @Larry Adams,

    Alright, thanks a lot for your help,

    Best,




    ------------------------------
    Romain Bouquet
    ------------------------------