How to Run a BERT Model on IBM Power10 Using ONNX Runtime

By Daniel Schenker posted Fri February 16, 2024 03:55 PM

This blog details the steps required to run inferencing with ONNX Runtime on IBM Power10 systems using a BERT model. BERT stands for Bidirectional Encoder Representations from Transformers; it is a natural language processing model pre-trained on vast amounts of text data. It learns contextualized representations of words by considering both left and right contexts simultaneously, which lets it capture rich semantic information and achieve strong results across a wide range of language understanding tasks.

Prerequisites

This blog assumes conda is already installed. If it is not, follow the blog post by Sebastian Lehrig to get conda set up on Power.

Environment Setup

Create a new conda environment.

conda create --name your-env-name-here python=3.11

This will create a new environment and install Python 3.11 along with its required dependencies.

Activate the newly created environment.

conda activate your-env-name-here

Once the environment is active, install libopenblas, onnxruntime, transformers, and their dependencies.

conda install libopenblas -c rocketce

conda install onnxruntime -c rocketce

conda install transformers -c rocketce

When using the conda install command with the -c argument, conda will attempt to install packages from the specified channel. Packages installed from the rocketce channel include MMA (Matrix Math Accelerator) optimizations.
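
To verify the installation, a quick sanity check can be run from the active environment. The following is a minimal sketch; it only prints the installed versions and the execution providers ONNX Runtime detects (CPUExecutionProvider is the expected provider on Power10).

import onnxruntime
import transformers

# Print installed versions to confirm the packages resolved correctly.
print(f'onnxruntime {onnxruntime.__version__}')
print(f'transformers {transformers.__version__}')

# List the execution providers available to ONNX Runtime on this system.
print(onnxruntime.get_available_providers())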

Project Setup

Navigate to a desired project directory and download the model from the ONNX Model Zoo.

wget https://github.com/onnx/models/raw/main/validated/text/machine_comprehension/bert-squad/model/bertsquad-12.onnx

The specific model being used is the BERT-Squad model with opset version 12 from the validated models section of the model zoo.
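
Before writing the inference script, the model's input and output signature can optionally be inspected. The sketch below (assuming the model file downloaded above is in the current directory) prints the names, shapes, and types the script must match, including the extra unique_ids_raw_output___9:0 input and the unstack:0 and unstack:1 outputs used later.

import onnxruntime

# Load the downloaded model and report its expected inputs and outputs.
sess = onnxruntime.InferenceSession('bertsquad-12.onnx')
for i in sess.get_inputs():
    print('input:', i.name, i.shape, i.type)
for o in sess.get_outputs():
    print('output:', o.name, o.shape, o.type)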

Create a new Python script inside the project directory.

touch onnx_bert.py

Open the Python script with any text editor or IDE (vi, vim, nano, VS Code, etc.) and paste the following code.

import onnxruntime
import argparse
import numpy as np
from transformers import BertTokenizer, logging

##
# Answer a question based off of a given context.
# Input: Context string and question string
# Output: Answer to the given question
# Example: Context: 'Albany is the capital of New York.'
#          Question: 'What is the capital of New York?'
#          Answer: 'Albany'
##
def answerQuestion(context, question, debug):
    # Get a tokenizer and set the model name
    logging.set_verbosity_error()
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    onnx_model = 'bertsquad-12.onnx'
    max_seq_length = 256

    # Tokenize input
    inputs = tokenizer(context, question)

    # Ensure input does not exceed max_seq_length
    if len(inputs['input_ids']) > max_seq_length:
        seq_length = len(inputs['input_ids'])
        print(f'Tokenized input of length {seq_length} exceeds maximum sequence length. Please shorten context and question input.')
        exit()

    # Zero-pad up to the maximum sequence length.
    while len(inputs['input_ids']) < max_seq_length:
        inputs['input_ids'].append(0)
        inputs['attention_mask'].append(0)
        inputs['token_type_ids'].append(0)

    # Organize data
    # The ONNX model expects an extra input (unique_ids_raw_output___9:0) not found in other BERT models, so it is supplied here
    data = {
        "unique_ids_raw_output___9:0": np.array([0], dtype=np.int64),
        "input_ids:0" : [inputs["input_ids"]],
        "input_mask:0" : [inputs["attention_mask"]],
        "segment_ids:0" : [inputs["token_type_ids"]]
    }

    # Create onnx runtime inference session and load the model
    sess = onnxruntime.InferenceSession(onnx_model)

    # Run the session and capture the output
    result = sess.run(["unstack:0", "unstack:1"], data)

    if debug:
        print(f'Raw result tensor: {result}')
        print(f'Answer found from index {np.argmax(result[0])} to index {np.argmax(result[1]) + 1}')

    # Extract answer
    answer = tokenizer.decode(data["input_ids:0"][0][np.argmax(result[0]):np.argmax(result[1]) + 1])
    print(f'Answer: {answer}')

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--context', help='Context to be provided to the model', required=True)
    parser.add_argument('-q', '--question', help='Question to ask the model based on the context', required=True)
    parser.add_argument('-d', '--debug', help='Enable debug mode', required=False, action='store_true')
    args = parser.parse_args()

    answerQuestion(args.context, args.question, args.debug)

The script parameters should be used as follows.

  • -c/--context is a required parameter and should be followed by a sequence/string. This parameter will be taken as the context and should provide background information to the model.
  • -q/--question is a required parameter and should be followed by another sequence/string. This parameter will be taken as the question that the model will attempt to answer based on the information extracted from the context.
  • -d/--debug is an optional parameter that can be passed to enable debug mode. With debug mode enabled, there will be additional logging of intermediate results.
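
Since the entry point is guarded by if __name__ == '__main__', the answerQuestion function can also be imported and called from other Python code in the same directory. A minimal sketch:

from onnx_bert import answerQuestion

# Same example as the command line invocation below, with debug mode off.
answerQuestion('Albany is the capital of New York.',
               'What is the capital of New York?',
               False)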

Execution

Once the script is complete, run the model and view the results.

python3 onnx_bert.py -c 'Albany is the capital of New York.' -q 'What is the capital of New York?' -d

  • Output: Answer: Albany (with -d enabled, the raw result tensor and answer index range are also printed)