How to run a quantized MPT Model on POWER10 Using ONNXRuntime

By RAJALAKSHMI SRINIVASARAGHAVAN posted Tue February 13, 2024 11:42 AM

  

ONNX MPT on POWER10

 
This blog details the steps required to run inference with ONNX Runtime on IBM POWER10 systems using an MPT model.
 

Prerequisites

 
This blog assumes the user already has conda installed.
 

Environment Setup

 
Create a new conda environment.
 
conda create --name your-env-name-here python=3.11
 
This will create a new environment and install Python 3.11 along with its required dependencies.
 
Activate the newly created environment.
 
conda activate your-env-name-here
 
Once the environment is active, install onnxruntime and the other required dependencies.
 
conda install onnxruntime -c rocketce
 
conda install onnx -c conda-forge
 
conda install pyarrow -c rocketce
 
conda install libopenblas pytorch-cpu -c rocketce
 
pip install transformers==4.36.1
 
pip install optimum[onnxruntime]==1.16.2
 
pip install chardet
 
When using the conda install command with the -c argument, packages are installed from the specified channel when available there. Packages installed from the rocketce channel are built with POWER10 MMA (Matrix-Multiply Assist) optimizations.
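 
To confirm the installation, you can query the runtime from Python. A minimal sanity check (onnxruntime exposes its version string and the list of available execution providers):
 
import onnxruntime as ort
print(ort.__version__)
print(ort.get_available_providers())  # expect CPUExecutionProvider on a CPU-only POWER10 system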
 

Project Setup

 
Navigate to your desired project directory and export the model to ONNX format.
 
optimum-cli export onnx --model mosaicml/mpt-7b mpt-7b_onnx
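 
If you prefer to drive the export from Python, the same conversion can be done through optimum's exporter API. A minimal sketch, intended to be equivalent to the optimum-cli command above (same model ID and output directory):
 
from optimum.exporters.onnx import main_export
 
# Export mosaicml/mpt-7b to ONNX; the graphs are written into mpt-7b_onnx/
main_export("mosaicml/mpt-7b", output="mpt-7b_onnx")
 
Note that the export downloads the model weights from Hugging Face on the first run, so it can take a while for a 7B-parameter model.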
 
Create a new Python script named llm.py with any text editor or IDE (vi, vim, nano, VS Code, etc.) and paste the following code.
 
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM
 
# Load model and tokenizer
model_id = "/home/user/mpt/mpt-onnx-32/mpt-7b_onnx"
 
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
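# MPT's tokenizer does not define a pad token, so reuse EOS for batch padding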
tokenizer.pad_token = tokenizer.eos_token
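# use_merged=False keeps separate decoder and decoder-with-past graphs;
# use_io_binding=False because IO binding mainly benefits GPU execution providers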
model = ORTModelForCausalLM.from_pretrained(model_id, use_cache=True, use_merged=False, use_io_binding=False)
 
# Sample texts
#texts = ["The weather today will be"]
texts = [
    "The best football player in the world is ", "My next vacation is",
    "The Belgian national football team ", "Once there was a man ",
    "The weather today will be ", "The best football player in the world is ",
    "My next vacation is", "The Belgian national football team ",
    "When I was in  school", " My first day of office",
    " I am scared of", "Life is a dream",
    "What is credit by exam?", "weather prediction for December",
    "Once there was a man ", "The weather today will be ",
    "The best football player in the world is ", "My next vacation is",
    "The Belgian national football team ", "Once there was a man ",
    "The weather today will be ", "The best football player in the world is ",
    "My next vacation is", "The Belgian national football team ",
    "When I was in  school", " My first day of office",
    " I am scared of", "Life is a dream",
    "What is credit by exam?", "weather prediction for December",
    "Once there was a man ", "The weather today will be ",
    "The best football player in the world is ", "My next vacation is",
    "The Belgian national football team ", "Once there was a man ",
    "The weather today will be ", "The best football player in the world is ",
    "My next vacation is", "The Belgian national football team ",
    "When I was in  school", " My first day of office",
    " I am scared of", "Life is a dream",
    "What is credit by exam?", "weather prediction for December",
    "The Belgian national football team ", "Time just flies",
]
 
# Tokenize input
inp = tokenizer(texts, padding=True, return_tensors='pt')
 
# Run model
res = model.generate(**inp, do_sample=True, max_new_tokens=50, temperature=0.7, top_k=40, top_p=0.95)
 
# Decode results
out = tokenizer.batch_decode(res, skip_special_tokens=True)
 
for text in out:
  print("---------")
  print(text)
 
Be sure to change the `model_id` path to the location of the exported ONNX model directory (mpt-7b_onnx in this example).
 

Execution

 
Once the script is saved, run it to generate text and view the results.
 
python3 llm.py
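 
To get a rough throughput figure, you can time the generation step. A minimal sketch to add to llm.py around the existing model.generate call (variable names match the script above; the token count is approximate because it includes padding):
 
import time
 
t0 = time.time()
res = model.generate(**inp, do_sample=True, max_new_tokens=50, temperature=0.7, top_k=40, top_p=0.95)
elapsed = time.time() - t0
 
# Rough count of newly generated tokens across the whole batch
new_tokens = res.numel() - inp["input_ids"].numel()
print(f"~{new_tokens / elapsed:.1f} tokens/sec")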
 

Conclusion

The sample script above shows how to convert a Hugging Face LLM to ONNX format and run it on POWER10. The batch size can be changed by modifying the `texts` list in the script.
