Compiling OpenMPI with IBM Spectrum LSF in a Docker container image

By John Welch posted Mon November 26, 2018 03:39 PM

  
Since 2016, LSF 10.1 has provided deep container integration, which makes it easier to build and maintain environments running containerized workloads.  Additionally, LSF integrates with most MPI implementations, including Open MPI, through an adaptable and scalable distributed application framework.  LSF job submissions are extremely flexible, with affinity, topology directives, and other resource requirements that are integrated with Open MPI.  This blog explores compiling Open MPI with LSF inside an existing Docker container. More specifically, you will learn how to extend an existing Docker container image and compile Open MPI to work with LSF.  Beyond this example, the same basic steps can be applied to other container images, such as the NVIDIA Deep Learning TensorFlow container images.

Prerequisites

This blog assumes you have installed IBM Spectrum LSF and Docker, and that both are up and running on nodes in your cluster. To start, you will need the following:

Component           Version     Edition
IBM Spectrum LSF    10.1.0.8+   Standard Edition or Suite
Docker                          Community or Enterprise Edition
Docker Engine       1.12+
Verify your Docker Engine version with this command:
$ docker version | grep API
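
You can likewise confirm the LSF version on a node; a quick check with the standard lsid command, which prints the LSF version, cluster name, and master host:
$ lsid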

Build a new Docker container image with Open MPI compiled with LSF

Log in as a user with the ability to run docker commands. The steps below assume your working directory (pwd) remains the same throughout.

Prepare minimal LSF files for Open MPI compile

The goal is to prepare the minimal files from your LSF environment needed to compile Open MPI with LSF inside a Docker container. Get a copy of mktmplsf.sh from github and save it in your working directory. The script generates a directory called "lsf" with the LSF libraries, include files and configuration file; the files in the "lsf" directory will be used in the next step.
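
If you want to see the general idea before running it, the sketch below is not the published script but captures what it does. It assumes LSF_ENVDIR and LSF_LIBDIR are already set in your shell (for example by sourcing profile.lsf) and that your installation follows the typical LSF 10.1 directory layout:

#!/bin/bash
# Rough sketch only -- use the real mktmplsf.sh from github.
# Assumes LSF_ENVDIR and LSF_LIBDIR are set (e.g. source $LSF_TOP/conf/profile.lsf).
set -e
DEST=./lsf
VER=10.1

mkdir -p "$DEST/$VER/include/lsf" "$DEST/$VER/lib" "$DEST/conf"

# LSF headers needed for Open MPI's --with-lsf build
# (the typical layout places them under $LSF_TOP/$VER/include/lsf)
cp "$LSF_LIBDIR/../../include/lsf/lsf.h" \
   "$LSF_LIBDIR/../../include/lsf/lsbatch.h" "$DEST/$VER/include/lsf/"

# LSF static and shared libraries
cp "$LSF_LIBDIR"/liblsf.* "$LSF_LIBDIR"/libbat.* "$DEST/$VER/lib/"

# Cluster configuration file read via LSF_ENVDIR
cp "$LSF_ENVDIR/lsf.conf" "$DEST/conf/"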

Here are the steps to run the script and see the directories and files created:
$ chmod +x mktmplsf.sh
$ ./mktmplsf.sh
$ ls -R lsf
lsf:
10.1 conf

lsf/10.1:
include lib

lsf/10.1/include:
lsf

lsf/10.1/include/lsf:
lsbatch.h lsf.h

lsf/10.1/lib:
libbat.a libbat.so liblsf.a liblsf.so

lsf/conf:
lsf.conf
$

Create a Dockerfile

Copy the text below and paste into a file called Dockerfile.
FROM ubuntu:bionic

# Update and install missing packages
# Note: the packages listed here are specific to this Ubuntu image and will differ
# for other base images.
RUN export DEBIAN_FRONTEND=noninteractive && apt-get update && \
apt-get install -y tzdata wget git lsb-release tk8.6 debhelper chrpath tcl tcl8.5 flex gfortran dpatch libgfortran3 automake bison m4 autoconf tk autotools-dev graphviz net-tools iproute2 && \
rm -rf /var/lib/apt/lists/*

SHELL ["/bin/bash", "-c"]

# Copy in minimal LSF components
ADD lsf /tmp/lsf

# Setup the temporary LSF paths
ENV LSF_ENVDIR /tmp/lsf/conf
ENV LSF_LIBDIR /tmp/lsf/10.1/lib

ARG OMPI_VER="4.0.3"
# Compile openmpi with LSF
RUN mkdir /tmp/openmpi && \
cd /tmp/openmpi && \
wget https://www.open-mpi.org/software/ompi/v4.0/downloads/openmpi-$OMPI_VER.tar.gz && \
tar zxf openmpi-$OMPI_VER.tar.gz && \
cd openmpi-$OMPI_VER && \
./configure --prefix=/usr/local/mpi --enable-orterun-prefix-by-default --disable-getpwuid --with-lsf && \
make -j $(nproc) all && \
make install && \
ldconfig && \
rm -rf /tmp/openmpi

# Additional configure options to consider, provided the appropriate packages are present
# in the base image or are added to this container:
#   for CUDA, use --with-cuda
#   for OpenFabrics (i.e. Mellanox), use --with-verbs
#

# Note, when running an LSF job in a container, your LSF_LIBDIR will be set according to your LSF installation
ENV LD_LIBRARY_PATH $LD_LIBRARY_PATH:$LSF_LIBDIR

RUN ldconfig

RUN apt-get update

# Install hello_world as a test app
RUN mkdir /tmp/hello-world
WORKDIR /tmp/hello-world
ENV PATH /usr/local/mpi/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
RUN git clone https://github.com/wesleykendall/mpitutorial && \
cd mpitutorial/tutorials/mpi-hello-world/code && \
make && \
cp /tmp/hello-world/mpitutorial/tutorials/mpi-hello-world/code/mpi_hello_world /usr/local/bin


# Clean up
RUN rm -rf /tmp/lsf
RUN rm -rf /tmp/hello-world
RUN cat /dev/null > /usr/local/mpi/etc/openmpi-mca-params.conf

WORKDIR /

Build a new Docker container

Use the command below to create a new container image with Open MPI compiled with LSF. It will take several minutes to perform all the steps; the resulting image will be called "ubuntu:bionic-openmpi-lsf". Note that both the Dockerfile and the lsf directory must be in your current working directory.

$ docker build -t ubuntu:bionic-openmpi-lsf .
Sending build context to Docker daemon 92.24MB
Step 1/19 : FROM ubuntu:bionic
---> 172193eb06e7



Step 19/19 : WORKDIR /
---> 6125eafb9b1b
Removing intermediate container 73c379285294
Successfully built 6125eafb9b1b
Now run docker images; your new container image should be listed:
$ docker images | grep bionic
ubuntu bionic-openmpi-lsf 6125eafb9b1b 8 minutes ago 771 MB
docker.io/ubuntu bionic 172193eb06e7 4 months ago 106 MB
$

You can repeat the docker build process above on every Docker-enabled compute node in your LSF cluster, or distribute the image another way, such as publishing it to your internal Docker registry or using the docker save and docker load commands.
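
For example, here is a sketch of the save and load approach; the host name compute01 is a placeholder:

# On the build node, export the image to a tar archive
$ docker save -o bionic-openmpi-lsf.tar ubuntu:bionic-openmpi-lsf

# Copy the archive to a compute node (compute01 is a placeholder)
$ scp bionic-openmpi-lsf.tar compute01:/tmp/

# On the compute node, load the image into the local Docker engine
$ docker load -i /tmp/bionic-openmpi-lsf.tar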

Setting up LSF with Docker

1) Prepare IBM Spectrum LSF to run jobs in Docker containers by following these steps: LSF docker integration instruction.

2) Configure an LSF Docker application profile for the new Docker container image by adding the following lines to the end of the lsb.applications file (changing LSF_SERVERDIR to your specific $LSF_SERVERDIR directory location), and then run badmin reconfig or badmin mbdrestart on the LSF Master:

Begin Application
NAME = test
DESCRIPTION = Example Test for OpenMPI
CONTAINER = docker[image(ubuntu:bionic-openmpi-lsf) \
   options(--rm --net=host --ipc=host \
     -v /etc/passwd:/etc/passwd \
     -v /etc/group:/etc/group \
   ) starter(root) ]
EXEC_DRIVER = context[user(lsfadmin)] \
   starter[LSF_SERVERDIR/docker-starter.py] \
   controller[LSF_SERVERDIR/docker-control.py] \
   monitor[LSF_SERVERDIR/docker-monitor.py]
End Application
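
After saving lsb.applications, a quick sketch of applying and verifying the change on the LSF Master; bapp is the standard command for listing application profiles:

$ badmin reconfig
$ bapp -l test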

Testing the new container with LSF

$ bsub -app test -Is echo "should be inside a container"
Job <19773> is submitted to default queue <interactive>.
<<Waiting for dispatch ...>>
<<Starting on ac922c>>

should be inside a container
$
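
To further confirm that the job really ran inside the Ubuntu bionic image, a quick additional check (output not shown) that prints the OS identification from the standard /etc/os-release file:

$ bsub -app test -Is cat /etc/os-release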


Testing the new container with MPI Hello World

Make sure MPI hello world works as expected before attempting to run other MPI-based applications.

Example of running MPI hello world on a single node with 1 message

$ bsub -app test -I mpirun /usr/local/bin/mpi_hello_world
Job <19774> is submitted to default queue <interactive>.
<<Waiting for dispatch …>>
<<Starting on ac922c>>
Hello world from processor ac922c, rank 0 out of 1 processors
$

Example of running MPI hello world on a single node with 2 job slots, producing 2 messages.

$ bsub -app test -I -n 2 -R "span[hosts=1]" mpirun /usr/local/bin/mpi_hello_world
Job <19775> is submitted to default queue <interactive>.
<<Waiting for dispatch …>>
<<Starting on ac922c>>
Hello world from processor ac922c, rank 0 out of 2 processors
Hello world from processor ac922c, rank 1 out of 2 processors
$

Example of running MPI hello world on 2 nodes with 1 message per node.

$ bsub -app test -I -n 2 -R "span[ptile=1]" mpirun /usr/local/bin/mpi_hello_world
Job <19776> is submitted to default queue <interactive>.
<<Waiting for dispatch …>>
<<Starting on ac922c>>
Hello world from processor ac922c, rank 0 out of 2 processors
Hello world from processor ac922b, rank 1 out of 2 processors
$
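
Combining the two resource strings above, a sketch of running 4 ranks across 2 nodes with 2 ranks per node (output not shown):

$ bsub -app test -I -n 4 -R "span[ptile=2]" mpirun /usr/local/bin/mpi_hello_world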

Conclusion

Now, you have a new container image that is ready to run MPI jobs submitted to LSF. 


#SpectrumComputingGroup