ONNX MPT on POWER10
This blog details the steps required to run inference with ONNX Runtime on POWER10 systems using an MPT model.
Prerequisites
This blog assumes the user already has conda installed.
Environment Setup
Create a new conda environment.
conda create --name your-env-name-here python=3.11
This creates a new environment with Python 3.11 and its required dependencies.
Activate the newly created environment.
conda activate your-env-name-here
Once the environment is active, install onnxruntime and the other required dependencies.
conda install onnxruntime -c rocketce
conda install onnx -c conda-forge
conda install pyarrow -c rocketce
conda install libopenblas pytorch-cpu -c rocketce
pip install transformers==4.36.1
pip install optimum[onnxruntime]==1.16.2
pip install chardet
When the conda install command is run with the -c argument, conda attempts to install packages from the specified channel. Packages installed from the rocketce channel include MMA optimizations for POWER10.
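Before moving on, it can help to confirm that the packages installed above actually resolve in the active environment. The following is a minimal sketch using only the standard library; the helper name check_missing is our own, not part of any of the installed packages.

```python
import importlib.util

def check_missing(modules):
    """Return the subset of module names that cannot be imported."""
    return [m for m in modules if importlib.util.find_spec(m) is None]

# Packages installed in the steps above
missing = check_missing(["onnxruntime", "onnx", "transformers", "optimum"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages found.")
```

If any package is reported missing, re-run the corresponding conda or pip command before continuing.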
Project Setup
Navigate to a desired project directory and export the model to ONNX format.
optimum-cli export onnx --model mosaicml/mpt-7b mpt-7b_onnx
Create a new Python script (e.g. llm.py) with any text editor or IDE (vi, vim, nano, VS Code, etc.) and paste the following code.
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM
# Load model and tokenizer
model_id = "/home/user/mpt/mpt-onnx-32/mpt-7b_onnx"
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token
model = ORTModelForCausalLM.from_pretrained(model_id, use_cache=True, use_merged=False, use_io_binding=False)
# Sample texts
# Sample texts; the batch size equals the number of entries.
# Duplicate or add prompts to test larger batches.
texts = [
    "The best football player in the world is ",
    "My next vacation is",
    "The Belgian national football team ",
    "Once there was a man ",
    "The weather today will be ",
    "When I was in school",
    " My first day of office",
    " I am scared of",
    "Life is a dream",
    "What is credit by exam?",
    "weather prediction for December",
    "Time just flies",
]
# Tokenize input
inp = tokenizer(texts, padding=True, return_tensors='pt')
# Run model
res = model.generate(**inp, do_sample=True, max_new_tokens=50, temperature=0.7, top_k=40, top_p=0.95)
# Decode results
out = tokenizer.batch_decode(res, skip_special_tokens=True)
for text in out:
    print("---------")
    print(text)
Be sure to change the `model_id` path to the location of the exported mpt-7b_onnx directory.
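A quick way to confirm that `model_id` points at a valid export is to check that the directory contains .onnx files before loading it. This is a minimal sketch; the helper name onnx_files_in and the relative path "mpt-7b_onnx" are our own illustrative choices.

```python
from pathlib import Path

def onnx_files_in(model_dir):
    """List the .onnx files in an exported model directory."""
    return sorted(p.name for p in Path(model_dir).glob("*.onnx"))

# Replace with the path used for model_id in llm.py
files = onnx_files_in("mpt-7b_onnx")
if files:
    print("Found ONNX files:", files)
else:
    print("No .onnx files found; run the optimum-cli export first")
```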
Execution
Once the script is saved, run it and view the results.
python3 llm.py
Conclusion
The sample script above shows how to convert a Hugging Face LLM into ONNX format and run it on POWER10. The batch size can be changed by modifying the `texts` list in the script.
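To make the batch-size point concrete, the batch handed to the tokenizer is simply the `texts` list, so its length is the batch size. A minimal sketch (the prompts and repetition factor here are illustrative):

```python
base_prompts = [
    "The weather today will be ",
    "Once there was a man ",
    "My next vacation is",
]

# Repeating the prompt list grows the batch; len(texts) is the batch size
texts = base_prompts * 4
print(len(texts))  # batch size of 12
```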