InstructLab on IBM Power Servers
Introduction
InstructLab, an open-source project developed by IBM and Red Hat, enhances large language models (LLMs) for generative AI applications. It offers a cost-effective solution for aligning LLMs with business needs, allowing those with minimal machine learning experience to contribute. InstructLab's innovative approach uses fewer resources by leveraging synthetic data generation and iterative tuning, enabling continuous improvement from community contributions. Notably, InstructLab can be run on IBM Power machines, providing powerful infrastructure for regular model retraining and enhancing AI capabilities.[1]
In this blog post, we'll focus on serving large language models with InstructLab and explore the ilab chat and ilab serve capabilities.
IBMers & Business Partners can actually try this out themselves by reserving a Power10 instance on TechZone!
--> Reserve Power10 LPAR on TechZone!
For the best performance use 16 vCPUs, 32 GB RAM and RHEL 9.x. Additional storage is not required.
Run on IBM Power
To run InstructLab on an IBM Power system we have two options:
- Run 'bare metal' on a Linux LPAR (tested with RHEL 8.x and 9.x)
- Run within a container
This blog post focuses on the latter, as a container is already provided and compiled with all the required libraries and frameworks, optimized for Power10 hardware. It therefore makes use of hardware acceleration for vector operations (VSX/AltiVec) for encoding and decoding, as well as matrix operations (MMA) through the OpenBLAS library (used for prompt processing only).
The backend InstructLab uses is llama.cpp, particularly the Python binding llama-cpp-python.
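If you want to verify that the container's build actually picked up these optimizations, here is a minimal sketch, assuming you run it inside the container where llama-cpp-python is installed (llama_print_system_info is part of llama.cpp's C API exposed by the Python binding). On Power10 with the OpenBLAS build you would expect entries such as VSX and BLAS:
# Run inside the container: print the features llama.cpp was compiled with
import llama_cpp

print("llama-cpp-python:", llama_cpp.__version__)
print(llama_cpp.llama_print_system_info().decode())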
Step 1: Setup
Log in to your IBM Power10 LPAR using your terminal and the provided credentials, and install podman:
ssh cecuser@<IP>
# Install podman if not available
sudo dnf install podman wget jq -y
On TechZone we don't have many options for configuring the LPAR to our needs, but one thing we can control is the threading mode (SMT, simultaneous multithreading). AI workloads perform best with SMT=2, so we set that:
sudo ppc64_cpu --smt=2
# Check that the setting was applied
ppc64_cpu --smt
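If you'd rather double-check from Python, a small sketch works as well; the only assumption is that the SMT threads taken offline by ppc64_cpu disappear from the scheduler's view:
import os

# Logical CPUs this process may run on; after 'ppc64_cpu --smt=2'
# this should be twice the number of cores assigned to the LPAR
print("Online logical CPUs:", len(os.sched_getaffinity(0)))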
If you have access to the Hardware Management Console (HMC), ensure an ideal NUMA setup to maximize performance. For detailed guidance, refer to Dr. Sebastian Lehrig's excellent blog post: Sizing for AI.
Step 2: Download IBM Granite 7b model
Next, we'll download a sample model to our system and mount it into the container, so it doesn't have to be re-downloaded after the container is stopped :) In our case we go with the IBM Granite-7b-lab model, already converted to the GGUF format:
mkdir -p ${HOME}/models
wget "https://huggingface.co/instructlab/granite-7b-lab-GGUF/resolve/main/granite-7b-lab-Q4_K_M.gguf?download=true" -O ${HOME}/models/granite-7b-lab-Q4_K_M.gguf
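Optionally, you can sanity-check the download before starting the container. Every GGUF file begins with the 4-byte magic "GGUF", so a minimal Python sketch (using the path from the step above) is enough to spot a truncated or broken download:
import os

path = os.path.expanduser("~/models/granite-7b-lab-Q4_K_M.gguf")

with open(path, "rb") as f:
    magic = f.read(4)

# Every GGUF file starts with the ASCII magic bytes 'GGUF'
print("Size (MiB):", os.path.getsize(path) // 1024**2)
print("Valid GGUF header:", magic == b"GGUF")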
Step 3: Run InstructLab in chat mode
Now, let's start the ilab instance and use the interactive chat mode:
export MODEL=${HOME}/models/granite-7b-lab-Q4_K_M.gguf
podman run -ti --rm \
-p 8000:8000 \
-v ${MODEL}:/opt/models/granite-7b-lab-Q4_K_M.gguf \
quay.io/mgiessing/ilab
ilab init --non-interactive --model-path /opt/models/granite-7b-lab-Q4_K_M.gguf
ilab chat
╭───────────────────────────────────────────────────────── system ──────────────────────────────────────────────────────────╮
│ Welcome to InstructLab Chat w/ GRANITE-7B-LAB-Q4_K_M (type /h for help) │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
>>> INFO 2024-05-21 18:21:31,845 server.py:206 Starting server process, press CTRL+C to shutdown server... [S][default]
INFO 2024-05-21 18:21:31,845 server.py:207 After application startup complete see http:
>>> [S][default]
>>> Write a python program calculating nth fibonacci using recursion! [S][default]
╭─────────────────────────────────────────────────────────────────────── granite-7b-lab-Q4_K_M ────────────────────────────────────────────────────────────────────────╮
│ │
│ Certainly! Here's an example Python program that calculates the nth Fibonacci number recursively: │
│ │
│ ```python │
│ def fib_recursive(n): │
│ """ │
│ Calculate the nth Fibonacci number recursively. │
│ │
│ :param n: The position of the Fibonacci number in the sequence. │
│ :return: The nth Fibonacci number. │
│ """ │
│ if n <= 0: │
│ raise ValueError("Input should be a positive integer.") │
│ elif n == 1: │
│ return 0 │
│ elif n == 2: │
│ return 1 │
│ else: │
│ return fib_recursive(n - 1) + fib_recursive(n - 2) │
│ │
│ # Example usage: │
│ n = 10 │
│ print(f"The {n}th Fibonacci number is: {fib_recursive(n)}") │
│ ``` │
│ │
│ In this program, the `fib_recursive` function takes an integer `n` as input and returns the nth Fibonacci number. The function first checks if `n` is less than or │
│ equal to 0, in which case it raises a `ValueError`. If `n` is 1 or 2, it returns 0 or 1, respectively. Otherwise, it calculates the sum of the (n-1)th and (n-2)th │
│ Fibonacci numbers, recursively calls itself with `n - 1`, and stores the result in a variable. The function then returns the calculated Fibonacci number. │
│ │
│ The example usage at the end of the program demonstrates how to use the `fib_recursive` function to calculate the 10th Fibonacci number. │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── elapsed 29.977 seconds ─╯
You can stop the interactive chat using Ctrl+D.
Step 4: Run InstructLab in serve mode
Another option is to create an endpoint that can be used by other applications. To do this, we'll start an OpenAI-compatible web server using the ilab serve command. First, ensure the model can be accessed from outside the container by setting the host_port entry in config.yaml to 0.0.0.0. Use the one-liner below, then run the ilab serve command:
sed -i "s/127\.0\.0\.1/0\.0\.0\.0/g" config.yaml
ilab serve
Since this gives us an OpenAI-compatible web server, we can now access our model via the IP of our LPAR with tools like cURL, Python requests, or the Python openai library.
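Before running the examples below, you can quickly confirm the endpoint is reachable from your machine, for example with a small Python requests sketch (the IP is a placeholder; this assumes the server exposes the standard OpenAI /v1/models listing, which the llama-cpp-python server provides):
import requests

IP = "<IP of your LPAR>"  # placeholder, replace with your LPAR's address

# List the models the endpoint serves; the returned id is the value
# to pass as 'model' in the completion requests below
resp = requests.get(f"http://{IP}:8000/v1/models", timeout=10)
resp.raise_for_status()
for m in resp.json().get("data", []):
    print(m["id"])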
- Using cURL on the same system in a different terminal session:
export MODEL="granite-7b-lab-Q4_K_M.gguf"
export IP="<IP of your LPAR>"
curl http://${IP}:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"${MODEL}\", \"prompt\": \"Why is the sky blue?\", \"temperature\": 0, \"max_tokens\": 100}" | jq
## Response should look like this
{
  "id": "cmpl-...",
  "object": "text_completion",
  "created": 1716325732,
  "model": "granite-7b-lab-Q4_K_M.gguf",
  "choices": [
    {
      "text": "...",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 6,
    "completion_tokens": 100,
    "total_tokens": 106
  }
}
- Using Python and the openai library from our local laptop
Create a file called ilab_stream.py (make sure you change the IP accordingly):
from openai import OpenAI

IP = "129.40.95.113"

client = OpenAI(
    api_key="EMPTY",
    base_url=f"http://{IP}:8000/v1",
)

response = client.chat.completions.create(
    model='granite-7b-lab-Q4_K_M.gguf',
    messages=[
        {'role': 'user', 'content': 'Why is the sky blue?'}
    ],
    temperature=0,
    stream=True
)

for chunk in response:
    # The final streamed chunk may carry no content, so guard against None
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
Install the OpenAI client and run the program:
(watsonx) mgiessing@Marvins-MBP ~ % pip3 install -U openai
(watsonx) mgiessing@Marvins-MBP ~ % python3 ilab_stream.py
The blue color of the sky is primarily due to a phenomenon called Rayleigh scattering. When sunlight, which is made up of different colors, encounters molecules in Earth's atmosphere (such as nitrogen and oxygen), it causes the light to scatter in various directions. Shorter wavelengths (blue and violet light) are scattered more than longer wavelengths (red, orange, and yellow light). However, our eyes are more sensitive to blue light and less sensitive to violet light, and sunlight reaches us more abundantly in the blue part of the spectrum. As a result, we perceive the sky as blue during the daytime.
At sunrise and sunset, the sunlight has to pass through a greater thickness of the Earth's atmosphere, causing the shorter blue and violet wavelengths to scatter even more, allowing the longer wavelengths like red, orange, and yellow to reach our eyes and make the sky appear redder or orange. This fascinating display of color is a result of the interaction between sunlight, Earth's atmosphere, and our perception.
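And since it is plain HTTP, the requests library mentioned earlier works just as well. Here is a minimal non-streaming sketch mirroring the cURL call above (the IP is again a placeholder):
import requests

IP = "<IP of your LPAR>"  # placeholder

payload = {
    "model": "granite-7b-lab-Q4_K_M.gguf",
    "prompt": "Why is the sky blue?",
    "temperature": 0,
    "max_tokens": 100,
}

# Same /v1/completions endpoint as the cURL example, just called from Python
resp = requests.post(f"http://{IP}:8000/v1/completions", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])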
Congratulations - you're now an InstructLab expert for serving models!
In the next blog post we'll have a look at how to fine-tune a model with new knowledge added to the taxonomy :)
[1] https://www.redhat.com/en/topics/ai/what-is-instructlab