High Performance Computing Group


LSF bsub -Q maximal number of automatic requeue

  • 1.  LSF bsub -Q maximal number of automatic requeue

    Posted Mon August 23, 2021 09:40 AM

    Hi all,

    I would like to know how to set the maximum number of automatic requeues for a job when using bsub -Q.
    I saw here [link] that "Specifying a job-level exit value using bsub -Q overrides all MAX_JOB_REQUEUE settings."  

    I am using the following command to submit jobs

    bsub -Q "all ~0" MyCommand

    Is there a way to set MAX_JOB_REQUEUE to 2, for example, for a specific job?

    Many thanks in advance for your help!
    Best,



    ------------------------------
    Romain Bouquet
    ------------------------------

    #SpectrumComputingGroup


  • 2.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted Mon August 23, 2021 11:02 AM
    As the document says, you can also configure MAX_JOB_REQUEUE in lsb.applications for an application profile, e.g.
    Begin Application
    NAME = myrequeuejob
    DESCRIPTION = only requeue two times
    MAX_JOB_REQUEUE = 2
    End Application

    Then submit your job like this: bsub -Q "all ~0" -app myrequeuejob <myjob>

    ------------------------------
    YI SUN
    ------------------------------



  • 3.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted Tue August 24, 2021 05:32 AM
    Edited by System Fri January 20, 2023 04:35 PM

    Hi @YI SUN,

    Thanks for your reply. I had a look, but it seems to me that one has to be an administrator to create an application profile [link].
    Unfortunately, I am just a regular user.

    Is there a way to create an application profile as a regular user?

    Thanks again,
    Best



    ------------------------------
    Romain Bouquet
    ------------------------------



  • 4.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted Tue August 24, 2021 06:20 PM
    LSF doesn't seem to offer job-level control over this. You will have to ask your admin to add this for you.
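For a regular user who cannot edit lsb.queues or lsb.applications, one possible workaround is to cap the retries inside the job itself and requeue only on a dedicated exit code. Nobody in this thread has tried it on a real cluster, so treat it as a minimal sketch under stated assumptions: exit code 99 is not used by the command itself, the names run_with_requeue_cap and REQUEUE_STATE_DIR are hypothetical, the state directory sits on a filesystem shared by all execution hosts, and a requeued job keeps the same LSB_JOBID.

```shell
#!/bin/sh
# Assumed exit code reserved for "please requeue me"; must not clash with
# exit codes the wrapped command can return on its own.
REQUEUE_EXIT=99

# Usage: run_with_requeue_cap MAX_REQUEUES command [args...]
run_with_requeue_cap() {
    max=$1
    shift

    # One counter file per job. LSB_JOBID is set by LSF inside a job;
    # "local" is used when running outside LSF. The directory defaults to
    # $HOME on the assumption that it is visible from every execution host.
    state="${REQUEUE_STATE_DIR:-$HOME}/requeue_count.${LSB_JOBID:-local}"
    count=$(cat "$state" 2>/dev/null || echo 0)

    # Run the real command and capture its exit code.
    rc=0
    "$@" || rc=$?

    if [ "$rc" -eq 0 ]; then
        rm -f "$state"            # success: forget the counter
        return 0
    fi

    if [ "$count" -lt "$max" ]; then
        echo $((count + 1)) > "$state"
        return "$REQUEUE_EXIT"    # ask LSF for another attempt
    fi

    rm -f "$state"                # cap reached: report the real failure
    return "$rc"
}

# In a wrapper script (say requeue_cap.sh, a hypothetical name) the last
# lines would be:
#   run_with_requeue_cap "$@"
#   exit $?
```

The job would then be submitted with something like bsub -Q "99" ./requeue_cap.sh 2 MyCommand, so that only exit code 99 triggers a requeue. With a cap of 2 the command runs at most three times: the first attempt plus two requeues.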


    ------------------------------
    YI SUN
    ------------------------------



  • 5.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted Thu August 26, 2021 03:21 AM

    Hi @YI SUN,

    Alright, that is unfortunate, but thanks a lot for your help!

    Best,



    ------------------------------
    Romain Bouquet
    ------------------------------



  • 6.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted Thu August 26, 2021 11:30 AM
    Romain,

    It's simple to do.  Here is a section from the man page for lsb.queues:

    MAX_JOB_REQUEUE

    Specifies the maximum number of times to requeue a job automatically.

    Syntax
    MAX_JOB_REQUEUE=integer

    Valid values
    0 < MAX_JOB_REQUEUE < INFINIT_INT

    INFINIT_INT is defined in lsf.h.

    Default
    Not defined. The number of requeue times is unlimited.

    ------------------------------
    Larry Adams
    ------------------------------



  • 7.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted Fri August 27, 2021 03:43 AM
    Edited by System Fri January 20, 2023 04:08 PM

    Hi @Larry Adams,

    Maybe there is something I do not understand:
    according to this link, it seems one has to be an administrator to change the lsb.queues file. Am I wrong?
    Sadly, I am just a regular user.

    Would there be a way to use MAX_JOB_REQUEUE as a regular user?
    For instance, something like the following (or a similar syntax to retry jobs only twice):

    export MAX_JOB_REQUEUE=2
    bsub -Q "all ~0" <myjob>

    so that the bsub command takes the value of MAX_JOB_REQUEUE into account?

    Thanks in advance,

    Best,



    ------------------------------
    Romain Bouquet
    ------------------------------



  • 8.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted Fri August 27, 2021 12:32 PM
    Edited by System Fri January 20, 2023 04:26 PM
    Romain,

    That would be easy enough to test.  Just make the job script the single command "exit 1".  That should cause a requeue loop, which should stop after two requeues if the setting is picked up.  I know that several variables can be re-read from the environment, but I'm not too certain about this one, as it's more of an mbatchd thing than an sbatchd thing.  I could be wrong, though.  Like I said, easy enough to test.
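The test described here could be set up roughly as follows. This is a sketch, not a verified procedure: the file name always_fail.sh is hypothetical, the submission step is skipped when bsub is not on the PATH, and whether an exported MAX_JOB_REQUEUE has any effect on bsub is the open question being tested, not an established feature.

```shell
#!/bin/sh
# Create a job script that always fails, so every run is a requeue candidate.
cat > always_fail.sh <<'EOF'
#!/bin/sh
exit 1
EOF
chmod +x always_fail.sh

# Submit only where LSF is installed.
if command -v bsub >/dev/null 2>&1; then
    # Experiment: does bsub pick this value up from the environment?
    export MAX_JOB_REQUEUE=2
    bsub -Q "all ~0" ./always_fail.sh
else
    echo "bsub not found; skipping submission"
fi
```

If the limit is not honoured, the job may keep requeuing indefinitely, so be ready to remove it with bkill and the job ID.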

    Larry

    ------------------------------
    Larry Adams
    ------------------------------



  • 9.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted Fri August 27, 2021 12:38 PM
    Just tested, and that did not go over too well:

    [lsfadmin@vmhost6 configdir]$ bhist -l 101 | grep "Running with" | wc -l
    80

    But as soon as I set the max value in the queue, the job was suspended.  I would recommend talking to your LSF admin and suggesting they add that setting, as it will also prevent requeue loops for other users.
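The check in this post counts the "Running with" lines in the job history, which appear to be logged once per execution attempt. A small helper makes that reusable; count_runs is a hypothetical name, and it reads bhist -l output on standard input so it can be tried without a cluster.

```shell
#!/bin/sh
# Count execution attempts in `bhist -l` output supplied on stdin.
# grep -c prints the number of matching lines (0 when there are none).
count_runs() {
    grep -c "Running with"
}

# Typical use on a cluster (job ID 101 is the one from the post):
#   bhist -l 101 | count_runs
```

A count that keeps growing between calls suggests the job is stuck in a requeue loop.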



    ------------------------------
    Larry Adams
    ------------------------------



  • 10.  RE: LSF bsub -Q maximal number of automatic requeue

    Posted Mon August 30, 2021 06:06 AM

    Hi @Larry Adams,

    Alright, thanks a lot for your help.

    Best,




    ------------------------------
    Romain Bouquet
    ------------------------------