Data and AI on Power


InstructLab on IBM Power10 Servers

By Marvin Gießing posted Fri May 24, 2024 04:02 AM

  


Introduction

InstructLab, an open-source project developed by IBM and Red Hat, enhances large language models (LLMs) for generative AI applications. It offers a cost-effective solution for aligning LLMs with business needs, allowing those with minimal machine learning experience to contribute. InstructLab's innovative approach uses fewer resources by leveraging synthetic data generation and iterative tuning, enabling continuous improvement from community contributions. Notably, InstructLab can be run on IBM Power machines, providing powerful infrastructure for regular model retraining and enhancing AI capabilities.[1]

In this blog post, we'll focus on serving large language models with InstructLab, exploring the ilab chat and ilab serve capabilities.

IBMers & Business Partners can actually try this out themselves by reserving a Power10 instance on TechZone!

--> Reserve Power10 LPAR on TechZone!

For the best performance use 16vCPUs, 32GB RAM & RHEL9.x. Additional storage is not required.

Run on IBM Power

To run InstructLab on an IBM Power system, there are two options:

  • Running 'bare metal' on a Linux LPAR (tested with RHEL 8.x & 9.x)
  • Running within a container

This blog post focuses on the latter, as a container image has already been built with all the required libraries and frameworks optimized for Power10 hardware. It therefore makes use of hardware acceleration for vector operations (VSX/AltiVec) for encoding and decoding, as well as matrix operations (MMA) through the OpenBLAS library (used for prompt processing).
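If you're curious whether your LPAR reports these capabilities, you can inspect /proc/cpuinfo. The sketch below parses the `cpu` line; note that the sample string and the exact line format are assumptions — the output of /proc/cpuinfo varies by kernel version, so adjust the parsing for your system.

```python
# Sketch: check the CPU generation and AltiVec support reported by the
# kernel on a ppc64le LPAR. The sample text below is illustrative only.

def parse_power_cpuinfo(text):
    """Return (cpu_model, altivec_supported) parsed from /proc/cpuinfo content."""
    model, altivec = None, False
    for line in text.splitlines():
        if line.startswith("cpu"):
            # e.g. "cpu : POWER10 (architected), altivec supported"
            value = line.split(":", 1)[1].strip()
            model = value.split("(")[0].strip()
            altivec = "altivec supported" in value
            break
    return model, altivec

if __name__ == "__main__":
    sample = "processor\t: 0\ncpu\t\t: POWER10 (architected), altivec supported\n"
    print(parse_power_cpuinfo(sample))  # ('POWER10', True)
    # On a real LPAR you would read the file instead:
    # with open("/proc/cpuinfo") as f:
    #     print(parse_power_cpuinfo(f.read()))
```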

The backend InstructLab uses is llama.cpp, particularly the Python binding llama-cpp-python.

Step 1: Setup

Log in to your IBM Power10 LPAR using your terminal and the provided credentials, and install podman:

ssh cecuser@<IP>

# Install podman if not available
sudo dnf install podman wget jq -y

In TechZone we don't have many options for configuring the LPAR to our needs, but we can control the threading mode (SMT => simultaneous multithreading). AI workloads perform best with SMT=2, so we set that:

sudo ppc64_cpu --smt=2

# Check if everything is correctly applied
ppc64_cpu --smt
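If you want to verify the SMT mode from a script rather than by eye, the small helper below parses the `ppc64_cpu --smt` output. The "SMT=2" / "SMT is off" output formats are an assumption based on common powerpc-utils behavior; adjust if your version prints differently.

```python
# Sketch: turn the output of `ppc64_cpu --smt` into an integer SMT level.

def smt_mode(output):
    """Return the SMT level as an int (1 if SMT is off)."""
    output = output.strip()
    if "off" in output:
        return 1
    # expected format: "SMT=2"
    return int(output.split("=", 1)[1])

print(smt_mode("SMT=2"))  # 2
```

On the LPAR you would feed it the real command output, e.g. `smt_mode(subprocess.run(["ppc64_cpu", "--smt"], capture_output=True, text=True).stdout)`.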

If you have access to the Hardware Management Console (HMC), ensure an ideal NUMA setup to maximize performance. For detailed guidance, refer to the excellent blog of Dr. Sebastian Lehrig: Sizing for AI.

Step 2: Download IBM Granite 7b model

Next, we'll download a sample model to our system and mount it into the container, so it doesn't have to be redownloaded after the container is stopped :) In our case we go with the IBM Granite-7b-lab model, already converted to the GGUF format:

mkdir -p ${HOME}/models
wget https://huggingface.co/instructlab/granite-7b-lab-GGUF/resolve/main/granite-7b-lab-Q4_K_M.gguf?download=true -O ${HOME}/models/granite-7b-lab-Q4_K_M.gguf

Step 3: Run InstructLab in chat mode

Now, let's start the ilab instance and use the interactive chat mode:

export MODEL=${HOME}/models/granite-7b-lab-Q4_K_M.gguf

podman run -ti --rm \
    -p 8000:8000 \
    -v ${MODEL}:/opt/models/granite-7b-lab-Q4_K_M.gguf \
    quay.io/mgiessing/ilab

ilab init --non-interactive --model-path /opt/models/granite-7b-lab-Q4_K_M.gguf

ilab chat

╭───────────────────────────────────────────────────────── system ──────────────────────────────────────────────────────────╮
│ Welcome to InstructLab Chat w/ GRANITE-7B-LAB-Q4_K_M (type /h for help)                                                   │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
>>> INFO 2024-05-21 18:21:31,845 server.py:206 Starting server process, press CTRL+C to shutdown server...       [S][default]
INFO 2024-05-21 18:21:31,845 server.py:207 After application startup complete see http://127.0.0.1:57595/docs for API.
>>>                                                                                                              [S][default]
>>> Write a python program calculating nth fibonacci using recursion!                                                                                       [S][default]
╭─────────────────────────────────────────────────────────────────────── granite-7b-lab-Q4_K_M ────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                                                      │
│ Certainly! Here's an example Python program that calculates the nth Fibonacci number recursively:                                                                    │
│                                                                                                                                                                      │
│ ```python                                                                                                                                                            │
│ def fib_recursive(n):                                                                                                                                                │
│     """                                                                                                                                                              │
│     Calculate the nth Fibonacci number recursively.                                                                                                                  │
│                                                                                                                                                                      │
│     :param n: The position of the Fibonacci number in the sequence.                                                                                                  │
│     :return: The nth Fibonacci number.                                                                                                                               │
│     """                                                                                                                                                              │
│     if n <= 0:                                                                                                                                                       │
│         raise ValueError("Input should be a positive integer.")                                                                                                      │
│     elif n == 1:                                                                                                                                                     │
│         return 0                                                                                                                                                     │
│     elif n == 2:                                                                                                                                                     │
│         return 1                                                                                                                                                     │
│     else:                                                                                                                                                            │
│         return fib_recursive(n - 1) + fib_recursive(n - 2)                                                                                                           │
│                                                                                                                                                                      │
│ # Example usage:                                                                                                                                                     │
│ n = 10                                                                                                                                                               │
│ print(f"The {n}th Fibonacci number is: {fib_recursive(n)}")                                                                                                          │
│ ```                                                                                                                                                                  │
│                                                                                                                                                                      │
│ In this program, the `fib_recursive` function takes an integer `n` as input and returns the nth Fibonacci number. The function first checks if `n` is less than or   │
│ equal to 0, in which case it raises a `ValueError`. If `n` is 1 or 2, it returns 0 or 1, respectively. Otherwise, it calculates the sum of the (n-1)th and (n-2)th   │
│ Fibonacci numbers, recursively calls itself with `n - 1`, and stores the result in a variable. The function then returns the calculated Fibonacci number.            │
│                                                                                                                                                                      │
│ The example usage at the end of the program demonstrates how to use the `fib_recursive` function to calculate the 10th Fibonacci number.                             │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── elapsed 29.977 seconds ─╯

You can stop the interactive chat using Ctrl+D.

Step 4: Run InstructLab in serve mode

Another option is to create an endpoint for use in other applications. To do this, we'll start an OpenAI-compatible web server using the ilab serve command. First, ensure the server can be reached from outside the container by changing the host address in the host_port setting of our config.yaml from 127.0.0.1 to 0.0.0.0. Use the one-liner below, then run the ilab serve command:

sed -i "s/127\.0\.0\.1/0\.0\.0\.0/g" config.yaml
ilab serve
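Client applications may want to wait until the server is actually answering before sending requests. The sketch below polls the OpenAI-style /v1/models route; the route choice and the retry timings are assumptions you may need to adjust. The probe is injectable so the wait logic can be reused and tested without a live server.

```python
# Sketch: poll the OpenAI-compatible endpoint until `ilab serve` is ready.
import time
import urllib.request

def http_probe(url):
    """Return True if the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except OSError:
        return False

def wait_for_server(url, probe=http_probe, retries=30, delay=2):
    """Return True once `probe(url)` succeeds, False after `retries` attempts."""
    for _ in range(retries):
        if probe(url):
            return True
        time.sleep(delay)
    return False

# wait_for_server("http://<LPAR-IP>:8000/v1/models")
```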

Now, we can access our model using the IP of our LPAR with tools like cURL, Python requests, or the Python OpenAI library, as we have an OpenAI-compatible web server.

  • Using cURL on the same system in a different terminal session:
export MODEL="granite-7b-lab-Q4_K_M.gguf"
export IP="localhost"

curl http://${IP}:8000/v1/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer 'EMPTY'" \
-d "{\"model\": \"${MODEL}\", \"prompt\": \"Why is the sky blue?\", \"temperature\": 0, \"max_tokens\": 100}" | jq

## Response should look like this
{
  "id": "cmpl-7d3b38cd-d9f6-4918-8689-dadcd9b99485",
  "object": "text_completion",
  "created": 1716325732,
  "model": "granite-7b-lab-Q4_K_M.gguf",
  "choices": [
    {
      "text": "\nThe sky appears blue due to a phenomenon called Rayleigh scattering. When sunlight reaches Earth's atmosphere, it is scattered in all directions by molecules and particles. Shorter-wavelength light, such as blue and violet light, is scattered more than longer-wavelength light, like red, orange, and yellow. However, our eyes are more sensitive to blue light and less sensitive to violet light. Additionally, sunlight reaches us with a greater",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 6,
    "completion_tokens": 100,
    "total_tokens": 106
  }
}
  • Using Python and the openai library from our local laptop

Create a file called ilab_stream.py (make sure you change the IP accordingly):

from openai import OpenAI
import os

IP="129.40.95.113" # IP of your TechZone LPAR
client = OpenAI(
    api_key="EMPTY",
    base_url=f"http://{IP}:8000/v1",
)

# send a streaming ChatCompletion request
response = client.chat.completions.create(
    model='granite-7b-lab-Q4_K_M.gguf',
    messages=[
        {'role': 'user', 'content': 'Why is the sky blue?'}
    ],
    temperature=0,
    stream=True
)

for chunk in response:
    # the final chunk's delta.content is None, so guard before printing
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

Install the OpenAI client and run the program:

(watsonx) mgiessing@Marvins-MBP ~ % pip3 install -U openai
(watsonx) mgiessing@Marvins-MBP ~ % python3 ilab_stream.py 

The blue color of the sky is primarily due to a phenomenon called Rayleigh scattering. When sunlight, which is made up of different colors, encounters molecules in Earth's atmosphere (such as nitrogen and oxygen), it causes the light to scatter in various directions. Shorter wavelengths (blue and violet light) are scattered more than longer wavelengths (red, orange, and yellow light). However, our eyes are more sensitive to blue light and less sensitive to violet light, and sunlight reaches us more abundantly in the blue part of the spectrum. As a result, we perceive the sky as blue during the daytime.

At sunrise and sunset, the sunlight has to pass through a greater thickness of the Earth's atmosphere, causing the shorter blue and violet wavelengths to scatter even more, allowing the longer wavelengths like red, orange, and yellow to reach our eyes and make the sky appear redder or orange. This fascinating display of color is a result of the interaction between sunlight, Earth's atmosphere, and our perception.
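If you use the non-streaming /v1/completions route shown in the cURL step instead, the interesting fields are buried in the response JSON. This small sketch pulls out the generated text, token count, and finish reason; the sample dict below is abbreviated from the response shown earlier.

```python
# Sketch: extract text and token usage from a /v1/completions response dict.

def summarize_completion(resp):
    """Return (text, total_tokens, finish_reason) from a completions response."""
    choice = resp["choices"][0]
    return choice["text"], resp["usage"]["total_tokens"], choice["finish_reason"]

# Abbreviated sample matching the response structure shown above
sample = {
    "model": "granite-7b-lab-Q4_K_M.gguf",
    "choices": [{"text": "\nThe sky appears blue...", "index": 0,
                 "logprobs": None, "finish_reason": "length"}],
    "usage": {"prompt_tokens": 6, "completion_tokens": 100, "total_tokens": 106},
}

text, tokens, reason = summarize_completion(sample)
print(tokens, reason)  # 106 length
```

A finish_reason of "length" (as in the sample) means the model hit the max_tokens limit rather than completing its answer, so you may want to raise max_tokens in that case.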

Congratulations - you're now an InstructLab expert for serving models!

In the next blog we'll have a look at how to fine-tune a model with new knowledge added to the taxonomy :)


[1] https://www.redhat.com/en/topics/ai/what-is-instructlab
