IBM Spectrum Computing Group

LSF with PowerAI docker from dockerhub

  • 1.  LSF with PowerAI docker from dockerhub

    Posted Wed February 26, 2020 10:28 AM

    Hi,

    I can run the container directly with docker without issues:
    $docker run -ti --env LICENSE=yes ibmcom/powerai:1.7.0-snap-ml-ubuntu18.04-py36-x86_64 bash

    I have been trying to get this working via LSF.

    I followed the steps and configured docker in lsb.applications

    Begin Application
    NAME = powerai
    DESCRIPTION = Example PowerAI application
    CONTAINER = docker[image(docker.io/ibmcom/powerai:1.7.0-snap-ml-ubuntu18.04-py36-x86_64) \
    options(--rm --net=host --ipc=host --env LICENSE=yes \
    -v MLDL_TOP:MLDL_TOP \
    -v /opt/mldl:/opt/mldl \
    /opt/mldl/scripts/dockerPasswd.sh \
    ) starter(root) ]
    EXEC_DRIVER = context[user(gilbert)] \
    starter[/opt/ibm/lsf/10.1/linux3.10-glibc2.17-x86_64/etc/docker-starter.py] \
    controller[/opt/ibm/lsf/10.1/linux3.10-glibc2.17-x86_64/etc/docker-control.py] \
    monitor[/opt/ibm/lsf/10.1/linux3.10-glibc2.17-x86_64/etc/docker-monitor.py]
    End Application

    Also set up LSF as specified in here - https://www.ibm.com/support/knowledgecenter/SSWRJV_10.1.0/lsf_docker/lsf_docker_prepare.html

    But when I submit a job, it exits immediately with exit code 127:

    $bsub -Is -app powerai bash

    (base) gilbert@gilbert-aa:/opt/mldl/scripts$ bjobs -d -l 418

    Job <418>, User <gilbert>, Project <default>, Application <powerai>, Status <EX
    IT>, Queue <interactive>, Interactive pseudo-terminal shel
    l mode, Command <bash>, Share group charged </gilbert>
    Wed Feb 26 23:02:30: Submitted from host <gilbert-aa>, CWD </opt/mldl/scripts>;
    Wed Feb 26 23:02:30: Started 1 Task(s) on Host(s) <gilbert-aa>, Allocated 1 Slo
    t(s) on Host(s) <gilbert-aa>;
    Wed Feb 26 23:02:30: Exited with exit code 127. The CPU time used is 0.0 second
    s.
    Wed Feb 26 23:02:30: Completed <exit>.


    SCHEDULING PARAMETERS:
               r15s   r1m  r15m   ut   pg   io   ls   it   tmp   swp   mem
    loadSched   -      -    -     -    -    -    -    -    -     -     -
    loadStop    -      -    -     -    -    -    -    -    -     -     -

    RESOURCE REQUIREMENT DETAILS:
    Combined: select[(defined(docker)) && (type == any)] order[r15s:pg]
    Effective: select[(defined(docker)) && (type == any)] order[r15s:pg]

    What am I doing wrong? 

    gilbert is also the lsf admin. 



    ------------------------------
    GILBERT THOMAS
    ------------------------------


  • 2.  RE: LSF with PowerAI docker from dockerhub

    Posted Wed February 26, 2020 12:38 PM
    Be sure to perform the setup step related to the EXEC_DRIVER at this URL:

    https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_config_ref/lsb.queues.5.html#lsb_queues_exec_driver

    Try running docker as lsfadmin:

     docker run -ti docker.io/ibmcom/centos:latest bash

    1) lsfadmin must be able to run Docker commands on each machine with Docker. Consult the Docker documentation on Managing Docker as a non-root user at the following website:

     https://docs.docker.com/install/linux/linux-postinstall/

    2) Check that the docker scripts in $LSF_SERVERDIR are owned by lsfadmin and have permission 700 or 500:

     $LSF_SERVERDIR/docker*.py
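
    The ownership and permission check in step 2 could be scripted roughly as follows. This is only a sketch: the function name check_docker_scripts is made up for illustration, it assumes GNU stat (Linux), and "lsfadmin" in the usage comment stands for whatever account is the primary LSF administrator on your cluster.

```shell
# Hypothetical helper (not part of LSF): report docker*.py scripts in a
# directory whose owner or mode differs from what step 2 asks for.
# Assumes GNU stat, as found on Linux hosts.
check_docker_scripts() {
    dir=$1      # directory to scan, e.g. "$LSF_SERVERDIR"
    owner=$2    # expected owner, e.g. lsfadmin
    rc=0
    for f in "$dir"/docker*.py; do
        # If the glob matched nothing, $f is the literal pattern.
        [ -e "$f" ] || { echo "no docker*.py scripts found in $dir"; return 1; }
        actual_owner=$(stat -c '%U' "$f")
        mode=$(stat -c '%a' "$f")
        if [ "$actual_owner" != "$owner" ]; then
            echo "BAD owner $actual_owner (want $owner): $f"
            rc=1
        fi
        case $mode in
            700|500) ;;
            *) echo "BAD mode $mode (want 700 or 500): $f"; rc=1 ;;
        esac
    done
    return $rc
}

# Usage (account name is an assumption):
#   check_docker_scripts "$LSF_SERVERDIR" lsfadmin
```

    The function prints one line per problem and returns non-zero if anything is off, so it can be dropped into a host health check.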



    ------------------------------
    John Welch
    ------------------------------



  • 3.  RE: LSF with PowerAI docker from dockerhub

    Posted Thu February 27, 2020 01:25 PM
    You need an "@" in front of the script path in the options section:
    @/opt/mldl/scripts/dockerPasswd.sh \
    Alternatively, you can just mount /etc/passwd and /etc/group into the container.
     
    For example
     
    -v /etc/passwd:/etc/passwd \
    -v /etc/group:/etc/group \
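
    To try the passwd/group mounts outside LSF first, something like the sketch below may help. It reuses the image and LICENSE setting from the original post. The function name run_powerai_as_user, the DRY_RUN switch, the read-only ":ro" suffix, and the --user option are my additions for illustration, on the assumption that the job should run in the container under the submitting user's UID and GID.

```shell
# Hypothetical helper: start the PowerAI image as the current user with the
# host's passwd and group files mounted read-only.
# Set DRY_RUN=1 to print the docker command instead of running it.
run_powerai_as_user() {
    image=ibmcom/powerai:1.7.0-snap-ml-ubuntu18.04-py36-x86_64
    # Build the command line in the positional parameters.
    set -- docker run --rm -ti \
        --env LICENSE=yes \
        --user "$(id -u):$(id -g)" \
        -v /etc/passwd:/etc/passwd:ro \
        -v /etc/group:/etc/group:ro \
        "$image" bash
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage:
#   DRY_RUN=1; run_powerai_as_user    # show the command only
#   DRY_RUN=0; run_powerai_as_user    # actually start the container
```

    If the shell starts cleanly this way and the user name resolves inside the container, the same two -v lines should be safe to add to the options section of the application profile.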

    For more details on the subject, please see this article/blog:

    https://community.ibm.com/community/user/imwuc/blogs/john-welch/2018/11/26/adding-docker-container-options-to-an-ibm-spectrum
     
    John


    ------------------------------
    John Welch
    ------------------------------