I regularly get questions like "How do I set up a simple data science environment on IBM Power?", "How can I use OpenCE/RocketCE libraries?", "How can I make use of IBM Power-optimized libraries?", and "How do I get started with AI on IBM Power easily?". Time to put the answer to (e)paper, because it is really not that hard. Just follow this step-by-step guide for a quick start. Enjoy trying it out, and leave a comment if you have additional tips and tricks!
Step-by-Step Guide
- Spawn a Linux LPAR. You can freely choose the OS (RHEL, Ubuntu, ...). I typically use the latest RHEL.
- SSH into your LPAR.
- Install micromamba (note that there's an interactive configuration script you have to run through):
dnf install bzip2 libxcrypt-compat vim -y
"${SHELL}" <(curl -L micro.mamba.pm/install.sh)
echo "micromamba activate base" >> ${HOME}/.bashrc
source ${HOME}/.bashrc
- Configure micromamba to use RocketCE:
cat > ~/.condarc <<'EOF'
# Conda configuration see https://conda.io/projects/conda/en/latest/configuration.html
auto_update_conda: false
show_channel_urls: true
channel_priority: flexible
channels:
- rocketce
- defaults
EOF
- Install Power-optimized Python packages (e.g., Python, PyTorch, ...):
micromamba install --yes python=3.10 pytorch-cpu mamba conda pip
- Give it a try:
python -c 'import torch; print(torch.__version__)'
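If you want a slightly more telling smoke test than printing the version, a small script along these lines can confirm that you are on the expected architecture and that the package can be found. This is only a sketch: check_environment is a hypothetical helper name of my own, it uses nothing but the standard library, and I'm assuming that Linux on Power reports ppc64le as its machine type.

```python
import importlib.util
import platform


def check_environment(packages=("torch",)):
    """Report the CPU architecture and which packages can be located."""
    report = {"machine": platform.machine()}
    for name in packages:
        # find_spec only locates a top-level package; it does not import it.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

On an LPAR set up as described above, the machine entry should show the Power architecture and the torch entry should be True; if torch shows False, the install step most likely did not complete in the active environment.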
Caution: Adding Anaconda's defaults channel to the above configuration requires an Anaconda licence if you use it in a commercial context.
Tip: Using conda-forge
Some packages might be unavailable via the defaults and rocketce Conda channels. In this case, you can try installing them via the conda-forge channel. Be aware that the conda-forge channel includes community-built packages, whereas the defaults and rocketce channels provide enterprise-grade builds and support.
- Install a package from conda-forge (e.g., accelerate):
mamba install --yes 'conda-forge::accelerate'
Tip: Add pip to the mix
Some packages might still be unavailable via Conda channels. In those cases, I recommend installing them via pip (but keep pip usage to a minimum so that your Conda environment stays clean and well-managed). I typically preconfigure pip to use pre-built Python wheels from a repository maintained by Power champion Marvin Gießing, which speeds up package installations:
- Optional: configure pip with Marvin's repos (recommended for rapid testing):
mkdir -p ~/.pip && \
echo "[global]" >> ~/.pip/pip.conf && \
echo "extra-index-url = https://repo.fury.io/mgiessing" >> ~/.pip/pip.conf
- Install prerequisites from Conda channels (in this example, these are needed for librosa):
mamba install --yes 'conda-forge::msgpack-python' 'conda-forge::soxr-python'
- Install packages:
pip install --prefer-binary \
"librosa" \
"openai-whisper"
Tip: Use JupyterLab as IDE
JupyterLab is a great IDE for rapid prototyping and often comes in handy:
- Install JupyterLab:
mamba install --yes jupyterlab
mkdir -p ${HOME}/notebooks
- Start JupyterLab:
nohup jupyter lab --notebook-dir=${HOME}/notebooks --ip=0.0.0.0 --no-browser --allow-root --port=8888 --NotebookApp.allow_origin='*' --NotebookApp.token='' --NotebookApp.password='' &
- Connect to JupyterLab:
http(s)://<server>:8888/lab
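If the page does not load, it helps to find out first whether the port is reachable at all (a firewall between you and the LPAR is a common culprit). Here is a small standard-library sketch for that; is_port_open is a hypothetical helper name, and the host and port in the example are placeholders for your own server and the port used above.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    # Replace "localhost" with your server's hostname when testing remotely.
    print("JupyterLab port reachable:", is_port_open("localhost", 8888))
```

Run it once on the LPAR itself and once from your workstation: if it succeeds locally but fails remotely, the problem is in the network path rather than in JupyterLab.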
Technologies Used
- Logical partition (LPAR): a virtualized, separate computer built from a subset of a machine's hardware resources; similar to a virtual machine, but more efficient.
- RHEL: Red Hat Enterprise Linux, an enterprise-grade Linux distribution.
- micromamba: a tool for installing and resolving Python (Conda) packages; Mamba is implemented in C++ and is typically much faster than the alternative, Conda.
- OpenCE: an open-source community for build scripts of 300+ Python packages for AI (such as PyTorch, TensorFlow, and even Python itself), ensuring packages are compatible and optimized for IBM Power where possible.
- RocketCE: a free enterprise-grade distribution of OpenCE; commercial support options are available.
- JupyterLab: a web-based IDE for (Python) notebooks, code, and data.