ONNX MPT on POWER10
This blog details the steps required to run inference with ONNX Runtime on POWER10 systems using an MPT model.
Prerequisites
This blog assumes the user already has conda installed.
Environment Setup
Create a new conda environment.
conda create --name your-env-name-here python=3.11
This creates a new environment with Python 3.11 and its required dependencies.
Activate the newly created environment.
conda activate your-env-name-here
Once the environment is active, install onnxruntime and the other required dependencies.
conda install onnxruntime -c rocketce
conda install onnx -c conda-forge
conda install pyarrow -c rocketce
conda install libopenblas pytorch-cpu -c rocketce
pip install transformers==4.36.1
pip install optimum[onnxruntime]==1.16.2
pip install chardet
When the conda install command is run with the -c argument, conda attempts to install packages from the specified channel. Packages installed from the rocketce channel include MMA optimizations for POWER10.
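Before moving on, it can help to confirm that the packages installed above actually resolve in the active environment. The following is a minimal sketch using only the standard library; the helper name check_missing is our own, not part of any of the installed packages.

```python
import importlib.util

def check_missing(modules):
    """Return the subset of module names that cannot be imported."""
    return [m for m in modules if importlib.util.find_spec(m) is None]

# Packages installed in the steps above
missing = check_missing(["onnxruntime", "onnx", "transformers", "optimum"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages found.")
```

If any package is reported missing, re-run the corresponding conda or pip command before continuing.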
Project Setup
Navigate to a desired project directory and export the model to ONNX format.
optimum-cli export onnx --model mosaicml/mpt-7b mpt-7b_onnx
Create a new Python script (e.g. llm.py) with any text editor or IDE (vi, vim, nano, VS Code, etc.) and paste the following code.
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM
# Load model and tokenizer
model_id = "/home/user/mpt/mpt-onnx-32/mpt-7b_onnx"
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token
model = ORTModelForCausalLM.from_pretrained(model_id, use_cache=True, use_merged=False, use_io_binding=False)
# Sample texts
# Sample texts; the batch size equals the number of entries.
# Duplicate or add prompts to test larger batches.
texts = [
    "The best football player in the world is ",
    "My next vacation is",
    "The Belgian national football team ",
    "Once there was a man ",
    "The weather today will be ",
    "When I was in school",
    " My first day of office",
    " I am scared of",
    "Life is a dream",
    "What is credit by exam?",
    "weather prediction for December",
    "Time just flies",
]
# Tokenize input
inp = tokenizer(texts, padding=True, return_tensors='pt')
# Run model
res = model.generate(**inp, do_sample=True, max_new_tokens=50, temperature=0.7, top_k=40, top_p=0.95)
# Decode results
out = tokenizer.batch_decode(res, skip_special_tokens=True)
for text in out:
    print("---------")
    print(text)
Be sure to change the `model_id` path to the location of the exported mpt-7b_onnx directory.
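A quick way to confirm that `model_id` points at a valid export is to check that the directory contains .onnx files before loading it. This is a minimal sketch; the helper name onnx_files_in and the relative path "mpt-7b_onnx" are our own illustrative choices.

```python
from pathlib import Path

def onnx_files_in(model_dir):
    """List the .onnx files in an exported model directory."""
    return sorted(p.name for p in Path(model_dir).glob("*.onnx"))

# Replace with the path used for model_id in llm.py
files = onnx_files_in("mpt-7b_onnx")
if files:
    print("Found ONNX files:", files)
else:
    print("No .onnx files found; run the optimum-cli export first")
```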
Execution
Once the script is saved, run it and view the results.
python3 llm.py
Conclusion
The sample script above shows how to convert a Hugging Face LLM into ONNX format and run it on POWER10. The batch size can be changed by modifying the `texts` list in the script.
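To make the batch-size point concrete, the batch handed to the tokenizer is simply the `texts` list, so its length is the batch size. A minimal sketch (the prompts and repetition factor here are illustrative):

```python
base_prompts = [
    "The weather today will be ",
    "Once there was a man ",
    "My next vacation is",
]

# Repeating the prompt list grows the batch; len(texts) is the batch size
texts = base_prompts * 4
print(len(texts))  # batch size of 12
```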