IBM Spectrum Computing Group

[Question] ngpus_excl_p in LSF

  • 1.  [Question] ngpus_excl_p in LSF

    Posted Mon June 22, 2020 04:46 PM
    Hi,
    I am new to the user group. If this is not the appropriate place to ask questions, please direct me to the correct section of this community.

    ngpus_excl_p is the number of GPUs in exclusive process mode.
    When we use this configuration, can processes from different users share a single GPU?


    ------------------------------
    Abhishek Malvankar
    ------------------------------


  • 2.  RE: [Question] ngpus_excl_p in LSF

    Posted Mon June 22, 2020 05:16 PM
    Abhishek,

    The only way to share a GPU with other users is to submit the job with the GPU option in shared mode:

    bsub -gpu "num=x:mode=shared" ./a.out

    You can also reserve memory on the GPU, but the reservation is not enforced, so it is important that it matches what your code actually uses.  If you put two user processes on the same GPU without a proper reservation, your jobs will crash.
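
    A minimal sketch of composing such a request, including a GPU memory reservation. The helper name gpu_req is hypothetical, and the gmem keyword and its value format (4G here) are assumptions that depend on the LSF 10.1 fix pack level, so check the documentation for your installation. The script only prints the command; it does not submit a job.

```shell
#!/bin/sh
# Hypothetical helper (not part of LSF): builds a shared-mode -gpu string.
# Usage: gpu_req NUM [GMEM]
gpu_req() {
    num=$1
    gmem=${2:-}
    req="num=${num}:mode=shared"
    # Assumption: gmem reserves GPU memory for scheduling purposes only.
    # As noted above, the reservation is not enforced at run time.
    if [ -n "$gmem" ]; then
        req="${req}:gmem=${gmem}"
    fi
    printf '%s\n' "$req"
}

# Dry run: print the bsub command instead of submitting it.
echo bsub -gpu "\"$(gpu_req 1 4G)\"" ./a.out
```

    Running it prints the bsub line for review. Remove the echo to submit for real, once the reservation matches what the job actually allocates.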

    ------------------------------
    Larry Adams
    ------------------------------



  • 3.  RE: [Question] ngpus_excl_p in LSF

    Posted Mon June 22, 2020 06:28 PM
    Documentation for LSF GPU job submission:
    https://www.ibm.com/support/knowledgecenter/SSWRJV_10.1.0/lsf_gpu/lsf_gpu_config.html

    ------------------------------
    YI SUN
    ------------------------------



  • 4.  RE: [Question] ngpus_excl_p in LSF

    Posted Mon June 22, 2020 05:26 PM
    Thanks. 

    Assuming a user might not use the GPU to the fullest, I was thinking about a scenario where we place multiple processes on the same GPU.

    I believe placing multiple processes on the same GPU is done through MPS. Does LSF make sure to start MPS according to the configuration requested?

    ------------------------------
    Abhishek Malvankar
    ------------------------------



  • 5.  RE: [Question] ngpus_excl_p in LSF

    Posted Mon June 22, 2020 05:40 PM
    To my knowledge, the MPS server is for a single user and not multiple users, but I could be wrong, and have been known to be in the past.  The Knowledge Center is a good place to start.
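
    On whether LSF starts MPS for a job: the -gpu string in LSF 10.1 accepts an mps keyword, which is one way to request it at submission time. A hedged sketch follows. The helper name with_mps is hypothetical, and whether mps=yes is available and how it interacts with multiple users depends on your LSF version and configuration. The script prints the command rather than submitting it.

```shell
#!/bin/sh
# Hypothetical helper (not part of LSF): appends an MPS setting to a
# -gpu requirement string. Usage: with_mps REQ [yes|no]
with_mps() {
    setting=${2:-yes}
    case "$setting" in
        yes|no)
            printf '%s:mps=%s\n' "$1" "$setting"
            ;;
        *)
            echo "with_mps: expected yes or no, got '$setting'" >&2
            return 1
            ;;
    esac
}

# Dry run: print the bsub command instead of submitting it.
echo bsub -gpu "\"$(with_mps 'num=1:mode=shared')\"" ./a.out
```

    This only shows how the request string is put together. The NVIDIA MPS documentation remains the reference for what the MPS server itself supports.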

    ------------------------------
    Larry Adams
    ------------------------------



  • 6.  RE: [Question] ngpus_excl_p in LSF

    Posted Mon June 22, 2020 05:42 PM

    Please read the following:


    https://docs.nvidia.com/deploy/mps/index.html



    ------------------------------
    Larry Adams
    ------------------------------