Data and AI on Power

IBM Power systems provide a robust and scalable platform for a wide range of data and AI workloads, offering benefits in performance, security, and ease of use.

Install and use RocketCE in a Linux LPAR

By Sebastian Lehrig posted Fri February 09, 2024 05:33 AM

I regularly get questions like "How do I set up a simple data science environment on IBM Power?", "How can I use OpenCE/RocketCE libraries?", "How can I make use of IBM Power-optimized libraries?", and "How do I get started with AI on IBM Power easily?". Time to put the answer on (e)paper, because it really is not that hard. Just follow this step-by-step guide for a quick start. Enjoy trying it out, and leave a comment if you have additional tips and tricks!

Step-by-Step Guide

Spawn a Linux LPAR. You can freely choose the OS (RHEL, Ubuntu, ...). I typically use the latest RHEL.
SSH into your LPAR.

Install micromamba (note there's an interactive configuration script you have to run through):

dnf install bzip2 libxcrypt-compat vim -y
"${SHELL}" <(curl -L micro.mamba.pm/install.sh)
echo "micromamba activate base" >> ${HOME}/.bashrc
source ${HOME}/.bashrc

Configure micromamba to use RocketCE:

cat > ~/.condarc <<'EOF'
# Conda configuration see https://conda.io/projects/conda/en/latest/configuration.html
auto_update_conda: false
show_channel_urls: true
channel_priority: flexible
channels:
  - rocketce
  - defaults
EOF

Install Power-optimized Python packages (e.g., Python, PyTorch, ...):
```
micromamba install --yes python=3.10 pytorch-cpu mamba conda pip
```

Give it a try:

python -c 'import torch; print(torch.__version__)'
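
If you want a slightly broader check than the one-liner, the following is a minimal sketch (not part of RocketCE; the function name `environment_report` is made up for illustration). It reports the CPU architecture, which should be `ppc64le` on a little-endian Linux on Power LPAR, and whether each listed package can be found in the active environment:

```python
import importlib.util
import platform


def environment_report(packages=("torch",)):
    """Return the CPU architecture and whether each package is installed."""
    report = {"machine": platform.machine()}
    for name in packages:
        # find_spec returns None when a top-level package cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


print(environment_report())
```

Pass further package names (e.g., `("torch", "numpy")`) to check more of your environment in one go.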

Caution: Adding Anaconda's defaults channel to the configuration above requires an Anaconda licence if you use it in a commercial context.

Tip: Using conda-forge

Some packages might be unavailable via the defaults and rocketce Conda channels. In this case, you can try installing them via the conda-forge channel. Be aware that conda-forge provides community-built packages, whereas the defaults and rocketce channels provide enterprise-grade builds and support.

Install a package from conda-forge (e.g., accelerate):
```
mamba install --yes 'conda-forge::accelerate'
```

Tip: Add pip to the mix

Some packages might still be unavailable via Conda channels. In those cases, I recommend installing them via pip (but keep pip usage to a minimum so that your Conda environment stays clean and well-managed). I typically preconfigure pip to use prebuilt Python wheels from a repository by Power champion Marvin Gießing, who precompiled some useful wheels; this speeds up package installations:

Optional: configure pip with Marvin's repos (recommended for rapid testing):

mkdir -p ~/.pip && \
echo "[global]" >> ~/.pip/pip.conf && \
echo "extra-index-url = https://repo.fury.io/mgiessing" >> ~/.pip/pip.conf
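
Because the commands above append to pip.conf, running them twice leaves duplicate entries. As an alternative, here is a minimal Python sketch that can be re-run safely. The helper name `add_extra_index` is hypothetical, and note two assumptions: it replaces any existing `extra-index-url` value, and rewriting the file drops comments it contained.

```python
import configparser
from pathlib import Path

EXTRA_INDEX_URL = "https://repo.fury.io/mgiessing"  # repository from this post


def add_extra_index(conf_path, url=EXTRA_INDEX_URL):
    """Set extra-index-url in the [global] section of a pip config file."""
    conf_path = Path(conf_path)
    parser = configparser.ConfigParser()
    parser.read(conf_path)  # a missing file is silently skipped
    if not parser.has_section("global"):
        parser.add_section("global")
    # Overwrites an existing value instead of appending a second entry.
    parser.set("global", "extra-index-url", url)
    conf_path.parent.mkdir(parents=True, exist_ok=True)
    with conf_path.open("w") as handle:
        parser.write(handle)
    return conf_path
```

To apply it to the same file as above, call `add_extra_index(Path.home() / ".pip" / "pip.conf")`.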

Install prerequisites from Conda channels (in this example, they are needed for librosa):
```
mamba install --yes 'conda-forge::msgpack-python' 'conda-forge::soxr-python'
```

Install packages:

pip install --prefer-binary \
    "librosa" \
    "openai-whisper"

Tip: Use JupyterLab as IDE

JupyterLab is a great IDE for rapid prototyping and often comes in handy:

Install JupyterLab:

mamba install --yes jupyterlab
mkdir -p ${HOME}/notebooks

Start JupyterLab:

nohup jupyter lab --notebook-dir=${HOME}/notebooks --ip=0.0.0.0 --no-browser --allow-root --port=8888 --NotebookApp.allow_origin='*' --NotebookApp.token='' --NotebookApp.password='' &

Connect to JupyterLab:
```
http(s)://<server>:8888/lab
```
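
Note that the start command above sets an empty token and password while listening on all interfaces, which is convenient on a throwaway demo LPAR but leaves the server open to anyone who can reach the port. If that is a concern, one option is to keep token authentication on. The following is a minimal sketch under that assumption; the helper name `jupyter_command` is hypothetical, and it only builds the command line, using the same options as above plus a random token:

```python
import secrets
from pathlib import Path


def jupyter_command(notebook_dir, port=8888, token=None, allow_root=False):
    """Build a `jupyter lab` command line that keeps token authentication on."""
    # Generate a random token unless the caller supplies one.
    token = token or secrets.token_urlsafe(24)
    command = [
        "jupyter", "lab",
        f"--notebook-dir={notebook_dir}",
        "--ip=0.0.0.0",
        "--no-browser",
        f"--port={port}",
        f"--NotebookApp.token={token}",  # same option as above, but non-empty
    ]
    if allow_root:
        command.append("--allow-root")
    return command, token


command, token = jupyter_command(Path.home() / "notebooks", allow_root=True)
print(" ".join(command))
print(f"http://<server>:8888/lab?token={token}")
```

Run the printed command (for example via nohup, as above) and open the printed URL with your server name filled in.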

Technologies Used

logical partition (LPAR): a virtualized, separate computer built from a subset of a machine's hardware resources; similar to, but more efficient than, a virtual machine.
RHEL: Red Hat Enterprise Linux, an enterprise-grade Linux distribution.
micromamba: a tool for managing and resolving Python packages; Mamba is implemented in C++ and is typically much faster than the alternative, Conda.
OpenCE: an open-source community for build scripts of 300+ Python packages for AI (such as PyTorch, TensorFlow, and even Python itself), ensuring packages are compatible and optimized for IBM Power where possible.
RocketCE: a free enterprise-grade distribution of OpenCE; commercial support options are available.
JupyterLab: a web-based IDE for (Python) notebooks, code, and data.

Permalink

https://community.ibm.com/community/user/blogs/sebastian-lehrig/2024/02/08/rocketce

Comments

Sebastian Lehrig

Tue March 26, 2024 01:39 PM

For sizing & configuration, I have just published a dedicated blog: https://community.ibm.com/community/user/powerdeveloper/blogs/sebastian-lehrig/2024/03/26/sizing-for-ai

Sebastian Lehrig

Mon March 25, 2024 06:25 AM

@K_G: This will be sufficient for POCs/demos. I have often deployed my GenAI demos in such environments, which already allow you to showcase a lot (e.g., using Llama2 and Mistral models on Power10): https://github.com/lehrig/genai-on-ibm-power-demos/

Kevin Gee

Fri March 22, 2024 01:25 PM

@Sebastian - I have two 24-way S1024 servers running a mix of things for customer POCs and customer-facing demos. We've got a bunch of RHEL, AIX, Rocky Linux, and an OCP cluster. I can probably devote 6 right now. 1TB of memory available.

Sebastian Lehrig

Mon March 04, 2024 08:27 AM

@K G: For AI workloads, my main recommendation is to use a NUMA setup to optimize memory<->core bandwidth:

1. Confirm the P10 module (e.g., a 2x12-core DCM and hence 6 cores per chip).
2. Set up an LPAR that allocates the maximum number of cores available on one chip (so if you have 12 cores on the socket with a dual-chip module, allocate 6 dedicated cores to the LPAR). This LPAR then corresponds to a so-called "NUMA node". Configure the LPAR as dedicated (not shared) via the HMC. Enable Power Mode in the HMC (for full frequency exploitation). Set SMT to 2 (but also try experimenting with 4 and 8). I have also seen the recommendation of using NUMA distance=10 but have not experimented with it. It would be great if you could cross-check and document how you executed those steps.
3. (Re)start the machine while ensuring that the LPAR from step 2 is started first; other LPARs should follow later (VIO does not seem to cause conflicts here). This ensures that the LPAR is allocated cores from a single chip only.
4. Test via lscpu (or numactl --show) whether it worked. Ideally you have only one NUMA node (node0) with its assigned CPUs.
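
The lscpu check in the last step can also be scripted. This is only a rough sketch: the function names and the sample text are made up for illustration (the sample assumes 6 cores at SMT 2, i.e. 12 logical CPUs), and real lscpu output may be formatted slightly differently depending on the util-linux version.

```python
import re
import subprocess

# Hypothetical lscpu excerpt for a 6-core LPAR running with SMT=2.
SAMPLE_LSCPU = """\
Architecture:        ppc64le
CPU(s):              12
Thread(s) per core:  2
NUMA node(s):        1
NUMA node0 CPU(s):   0-11
"""


def numa_node_count(lscpu_text):
    """Return the number of NUMA nodes reported by lscpu, or None if absent."""
    # Allow leading whitespace because newer lscpu versions indent this line.
    match = re.search(r"^\s*NUMA node\(s\):\s*(\d+)", lscpu_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_lscpu():
    """Run lscpu on the current machine and return its output as text."""
    result = subprocess.run(["lscpu"], capture_output=True, text=True, check=True)
    return result.stdout


print(numa_node_count(SAMPLE_LSCPU))
```

On the LPAR itself, `numa_node_count(read_lscpu())` should return 1 if the allocation worked as intended.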

Given that, the best option for cores is a 12- or 15-core SCM (E1080); a 24-core DCM (E1050/S1024/L1024) is the second-best option, followed by a 20-core DCM (S1022/L1022).

Memory just needs to be sufficient for what you want to do. If you are planning to work with LLMs, I've seen demands around 80 GB for a 20B parameter model. So I often size LPARs for 256 GB. However, it is important to populate all slots with DIMMs for maximizing memory<->core bandwidth (so rather use a few smaller DIMMs than a single big one).
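
As a back-of-the-envelope check on that figure: the 80 GB matches what you get if you assume 4 bytes per parameter (32-bit floats) and count only the model weights. That precision is my assumption for this sketch, and the helper name is made up; runtime overheads such as activations and caches come on top.

```python
def weight_memory_gb(parameters, bytes_per_parameter=4):
    """Estimate the memory needed for model weights alone, in GB (10**9 bytes)."""
    return parameters * bytes_per_parameter / 1e9


print(weight_memory_gb(20e9))     # 20B parameters at 4 bytes each -> 80.0
print(weight_memory_gb(20e9, 2))  # same model at 2 bytes each -> 40.0
```

Sizing an LPAR at 256 GB therefore leaves generous headroom for a model of this size plus the OS and other workloads.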

1 TB disk space will typically easily suffice.

Hope that helps. I should probably write a blog on that sooner rather than later :)

Edit: done - https://community.ibm.com/community/user/powerdeveloper/blogs/sebastian-lehrig/2024/03/26/sizing-for-ai

Kevin Gee

Sat March 02, 2024 12:47 AM

For grins... how big did you make your Linux VM? Core/memory/disk space... any other considerations for swap and file system config?
