This blog details the steps required to run inference with ONNX Runtime on IBM Power10 systems using a BERT model. BERT stands for Bidirectional Encoder Representations from Transformers; it is a natural language processing model pre-trained on vast amounts of text data. It learns contextualized representations of words by considering both left and right context simultaneously, enabling it to capture rich semantic information and achieve impressive results across a wide range of language understanding tasks.
Prerequisites
This blog assumes the user already has conda installed. If needed, follow Sebastian Lehrig's blog post on getting conda set up on Power.
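If you are unsure whether conda is already available, a quick check from the shell will tell you:
conda --version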
Environment Setup
Create a new conda environment.
conda create --name your-env-name-here python=3.11
This will create a new environment and install Python 3.11 along with its required dependencies.
Activate the newly created environment.
conda activate your-env-name-here
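With the environment active, you can optionally confirm that the expected interpreter is in use; the reported version should be 3.11.x:
python --version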
Once the environment is active, install openblas, onnxruntime, and their dependencies.
conda install libopenblas -c rocketce
conda install onnxruntime -c rocketce
conda install transformers -c rocketce
When using the conda install command with the -c argument, packages are installed from the specified channel. Packages installed from the rocketce channel include MMA (Matrix Math Accelerator) optimizations for Power10.
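To verify where the packages came from, conda list reports the source channel in its final column; a quick optional check:
conda list | grep -E 'onnxruntime|openblas|transformers'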
Project Setup
Navigate to a desired project directory and download the model from the ONNX Model Zoo.
wget https://github.com/onnx/models/raw/main/validated/text/machine_comprehension/bert-squad/model/bertsquad-12.onnx
The specific model being used is the BERT-Squad model with opset version 12 from the validated models section of the model zoo.
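Optionally, before writing the full script, you can confirm the download loads cleanly and inspect the model's inputs using ONNX Runtime's standard get_inputs() API. The nonstandard input names printed here (such as unique_ids_raw_output___9:0) come from this particular export and are handled in the script below. A minimal sketch:

import onnxruntime

# Load the downloaded model and print each input's name, shape, and element type.
sess = onnxruntime.InferenceSession('bertsquad-12.onnx')
for inp in sess.get_inputs():
    print(inp.name, inp.shape, inp.type)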
Create a new Python script inside the project directory.
touch onnx_bert.py
Open the Python script with any text editor or IDE (vi, vim, nano, VS Code, etc.) and paste the following code.
import argparse
import sys

import numpy as np
import onnxruntime
from transformers import BertTokenizer, logging


def answer_question(context, question, debug):
    """Answer a question based on a given context.

    Input: context string and question string
    Output: answer to the given question
    Example: Context: 'Albany is the capital of New York.'
             Question: 'What is the capital of New York?'
             Answer: 'Albany'
    """
    # Get a tokenizer and set the model name
    logging.set_verbosity_error()
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    onnx_model = 'bertsquad-12.onnx'
    max_seq_length = 256
    # Tokenize input
    inputs = tokenizer(context, question)
    # Ensure input does not exceed max_seq_length
    if len(inputs['input_ids']) > max_seq_length:
        seq_length = len(inputs['input_ids'])
        print(f'Tokenized input of length {seq_length} exceeds maximum sequence length. Please shorten context and question input.')
        sys.exit(1)
    # Zero-pad up to the maximum sequence length.
    while len(inputs['input_ids']) < max_seq_length:
        inputs['input_ids'].append(0)
        inputs['attention_mask'].append(0)
        inputs['token_type_ids'].append(0)
    # Organize the data as int64 numpy arrays with a batch dimension.
    # This ONNX model contains an extra input (unique_ids_raw_output___9:0)
    # not found in other BERT models, so it must be fed as well.
    data = {
        "unique_ids_raw_output___9:0": np.array([0], dtype=np.int64),
        "input_ids:0": np.array([inputs["input_ids"]], dtype=np.int64),
        "input_mask:0": np.array([inputs["attention_mask"]], dtype=np.int64),
        "segment_ids:0": np.array([inputs["token_type_ids"]], dtype=np.int64)
    }
    # Create an ONNX Runtime inference session and load the model
    sess = onnxruntime.InferenceSession(onnx_model)
    # Run the session and capture the start/end logits
    result = sess.run(["unstack:0", "unstack:1"], data)
    if debug:
        print(f'Raw result tensor: {result}')
        print(f'Answer found from index {np.argmax(result[0])} to index {np.argmax(result[1]) + 1}')
    # Extract the answer span: argmax of the start logits to argmax of the end logits
    answer = tokenizer.decode(data["input_ids:0"][0][np.argmax(result[0]):np.argmax(result[1]) + 1])
    print(f'Answer: {answer}')


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--context', help='Context to be provided to the model', required=True)
    parser.add_argument('-q', '--question', help='Question to ask the model based on the context', required=True)
    parser.add_argument('-d', '--debug', help='Enable debug mode', required=False, action='store_true')
    args = parser.parse_args()
    answer_question(args.context, args.question, args.debug)
The script parameters should be used as follows.
- -c/--context is a required parameter and should be followed by a sequence/string. This parameter will be taken as the context and should provide background information to the model.
- -q/--question is a required parameter and should be followed by another sequence/string. This parameter will be taken as the question that the model will attempt to answer based on the information extracted from the context.
- -d/--debug is an optional parameter that can be passed to enable debug mode. With debug mode enabled, there will be additional logging of intermediate results.
Execution
Once the script is complete, run the model and view the results.
python3 onnx_bert.py -c 'Albany is the capital of New York.' -q 'What is the capital of New York?' -d
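With -d set, the raw output tensors and the selected start/end indices are printed first, followed by the extracted answer. Note that the bert-base-uncased tokenizer lowercases its input, so the final line should resemble:
Answer: albany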