Data and AI on Power


How to Run a BERT Model on IBM Power10 Using PyTorch

By Daniel Schenker posted Fri February 09, 2024 04:32 PM

  

This blog details the steps required to run inferencing with PyTorch on IBM Power10 systems using various BERT models. BERT stands for Bidirectional Encoder Representations from Transformers and is a natural language processing model that is pre-trained on vast amounts of text data. It learns contextualized representations of words by considering both left and right contexts simultaneously, enabling it to capture rich semantic information and achieve impressive results across a wide range of language understanding tasks.

Prerequisites

This blog assumes the user already has conda installed. Utilize the following blog post by Sebastian Lehrig to get conda set up on Power if needed.

Environment Setup

Create a new conda environment.

conda create --name your-env-name-here python=3.11

This will create a new environment and install Python version 3.11 and its required dependencies.

Activate the newly created environment.

conda activate your-env-name-here

Once the environment is active, install all required libraries and their dependencies.

pip install tqdm boto3 requests regex sentencepiece sacremoses chardet

conda install libopenblas -c rocketce

conda install pytorch-cpu -c rocketce

conda install huggingface_hub -c rocketce

conda install transformers -c rocketce

When using the conda install command with the -c argument, packages will be installed from the specified channel. Packages installed via the rocketce channel include MMA optimizations.
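
Once the installs complete, a quick sanity check can confirm that the environment is picking up the PyTorch and Transformers builds. The snippet below is a minimal check that prints the installed versions and runs a small matrix multiply on the CPU; the exact version numbers will depend on what the rocketce channel currently provides.

import torch
import transformers

# Print the versions pulled in from the conda channels
print(f'PyTorch version: {torch.__version__}')
print(f'Transformers version: {transformers.__version__}')

# Run a small matrix multiply to confirm the CPU backend works
a = torch.rand(4, 4)
b = torch.rand(4, 4)
print(torch.matmul(a, b))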

Project Setup

This section is broken up into subsections based on what the specific BERT model is meant to do. The first subsection provides a base and a modified BERT model for masked token prediction. The second provides a base and a modified BERT model for sentence paraphrase detection. The third provides a base and a modified BERT model for question answering.

The initial steps for all subsections are the same. Start by navigating to a desired project directory and create a new Python script inside the directory. Each of the following code snippets was put together using the PyTorch Transformers page as a guide.

BERT For Masked Token Prediction

Paste the following code into the newly created Python script.

import torch

# Get a tokenizer to encode inputs
tokenizer = torch.hub.load('huggingface/pytorch-transformers', 'tokenizer', 'bert-base-cased')

# Sample input sequences
text_1 = "Who was Jim Henson ?"
text_2 = "Jim Henson was a puppeteer"

# Tokenized input with special tokens around it (for BERT: [CLS] at the beginning and [SEP] at the end)
indexed_tokens = tokenizer.encode(text_1, text_2, add_special_tokens=True)

# Define sentence A and B indices associated to 1st and 2nd sentences (see paper)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]

# Convert inputs to PyTorch tensors
segments_tensors = torch.tensor([segments_ids])

# Mask a token that we will try to predict back with `BertForMaskedLM`
masked_index = 8
indexed_tokens[masked_index] = tokenizer.mask_token_id
tokens_tensor = torch.tensor([indexed_tokens])

# Load the pre-trained model
masked_lm_model = torch.hub.load('huggingface/pytorch-transformers', 'modelForMaskedLM', 'bert-base-cased')

# Run the model in inference mode
with torch.no_grad():
    predictions = masked_lm_model(tokens_tensor, token_type_ids=segments_tensors)

# Get the predicted token
predicted_index = torch.argmax(predictions[0][0], dim=1)[masked_index].item()
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]

print(predicted_token, predicted_index)

The script above is the basic implementation of BERT for masked token prediction. It contains hard-coded input sequences and a fixed masked token index. The script masks the token 'Jim' in text_2 for the model to predict back. The following Python script is a modified version of the basic script that allows the user to input any sequence, with the mask on any token, for the model to attempt to predict back.

import torch
import argparse
from transformers import BertTokenizer, logging

##
# Predict a masked token in a sequence
# Input: A string/sequence containing exactly one '[MASK]' token to be predicted
# Output: The prediction of the masked token
# Example: '[MASK] is the capital of France.' -> Paris
##
def predictMaskedToken(sequence, debug):
    # Verify user input
    if(not verifyInput(sequence)):
        print('Invalid input sequence. Ensure the input sequence is a string and contains exactly one instance of the mask token ([MASK]). Example: \'[MASK] is the capital of France.\'')
        exit()

    # Get a pre-trained BERT tokenizer and model
    logging.set_verbosity_error()
    tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
    masked_lm_model = torch.hub.load('huggingface/pytorch-transformers', 'modelForMaskedLM', 'bert-base-cased')
    
    # Generate model input from sequence using tokenizer
    input = tokenizer(sequence)
    # Save encoded version of sequence from input dictionary
    encoded_sequence = input["input_ids"]
    # Save ids of sequence from the input dictionary
    sequence_ids = input['token_type_ids']
    # Convert inputs to PyTorch tensors
    tokens_tensor = torch.tensor([encoded_sequence])
    segments_tensors = torch.tensor([sequence_ids])
    
    # Verbose debugging output
    if debug:
        print(f'Raw Sequence: {sequence}')
        tokenized_sequence = tokenizer.tokenize(sequence)
        print(f'Tokenized sequence: {tokenized_sequence}')
        print(f'Encoded Sequence: {encoded_sequence}')
        decoded_sequence = tokenizer.decode(encoded_sequence)
        print(f'Decoded Sequence: {decoded_sequence}')
        print(f'Sequence Ids: {sequence_ids}')
        print(f'Masked Index: {encoded_sequence.index(tokenizer.mask_token_id)}')

    # Run the model
    with torch.no_grad():
        predictions = masked_lm_model(tokens_tensor, token_type_ids=segments_tensors)

    # Get the predicted token
    predicted_index = torch.argmax(predictions[0][0], dim=1)[encoded_sequence.index(tokenizer.mask_token_id)].item()
    predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]

    print(predicted_token)

def verifyInput(sequence):
    if type(sequence) is not str:
        return False
    return sequence.count('[MASK]') == 1

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-s', '--sequence', help='Sequence with one [MASK] token to predict', required=True)
    parser.add_argument('-d', '--debug', help='Enable debug mode', required=False, action='store_true')
    args = parser.parse_args()

    predictMaskedToken(args.sequence, args.debug)

The script parameters should be used as follows.

  • -s/--sequence is a required parameter and should be followed by a sequence/string with exactly one instance of the [MASK] token. The BERT model will attempt to predict the word that is masked by this token.
  • -d/--debug is an optional parameter that can be passed to enable debug mode. With debug mode enabled, there will be additional logging of intermediate results such as the tokenized and encoded input.
  • example: python3 bert_masked_token.py -s '[MASK] is the capital of France.' -d
    • output: Paris
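
The modified script above only reports the single most likely token via argmax. If multiple candidates are of interest, the same logits can be ranked with torch.topk instead. The following is a minimal sketch of that idea, assuming the same bert-base-cased checkpoint and an input containing exactly one [MASK] token.

import torch
from transformers import BertTokenizer, BertForMaskedLM

# Load the same pre-trained tokenizer and masked language model
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
model = BertForMaskedLM.from_pretrained('bert-base-cased')

sequence = '[MASK] is the capital of France.'
inputs = tokenizer(sequence, return_tensors='pt')

# Locate the masked position in the encoded sequence
masked_index = inputs['input_ids'][0].tolist().index(tokenizer.mask_token_id)

with torch.no_grad():
    logits = model(**inputs).logits

# Rank the vocabulary at the masked position and keep the five best candidates
top_ids = torch.topk(logits[0, masked_index], k=5).indices.tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))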

BERT For Paraphrase Detection

Paste the following code into the newly created Python script.

import torch

# Load the tokenizer and the model
sequence_classification_model = torch.hub.load('huggingface/pytorch-transformers', 'modelForSequenceClassification', 'bert-base-cased-finetuned-mrpc')
sequence_classification_tokenizer = torch.hub.load('huggingface/pytorch-transformers', 'tokenizer', 'bert-base-cased-finetuned-mrpc')

# Sample input 
text_1 = "Jim Henson was a puppeteer"
text_2 = "Who was Jim Henson ?"

# Tokenized input with special tokens around it (for BERT: [CLS] at the beginning and [SEP] at the end)
indexed_tokens = sequence_classification_tokenizer.encode(text_1, text_2, add_special_tokens=True)
# Define sentence A and B indices associated to 1st and 2nd sentences (see paper)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
# Convert inputs to PyTorch tensors
segments_tensors = torch.tensor([segments_ids])
tokens_tensor = torch.tensor([indexed_tokens])

# Predict the sequence classification logits
with torch.no_grad():
    seq_classif_logits = sequence_classification_model(tokens_tensor, token_type_ids=segments_tensors)

# Get prediction with highest confidence
predicted_labels = torch.argmax(seq_classif_logits[0]).item()

# In MRPC dataset label 0 means the two sentences are not paraphrasing each other
if(predicted_labels == 0):
    print('Sequences are not paraphrasing each other.')
else:
    print('Sequences are paraphrasing each other.')

# Or get the sequence classification loss (set model to train mode before if used for training)
labels = torch.tensor([1])
seq_classif_loss = sequence_classification_model(tokens_tensor, token_type_ids=segments_tensors, labels=labels)

The script above is the basic implementation of BERT for sentence paraphrase detection. It contains hard-coded input sequences, so it is mainly useful for confirming that the model works as intended. The following Python script is a modified version of the basic script that allows the user to input any two sentences. The model will attempt to determine whether the sentences are paraphrasing each other.

import torch
import argparse
from transformers import BertTokenizer, logging

##
# Determine if two strings/sequences are paraphrasing each other
# Input: Two strings/sequences 
# Output: Boolean value
# Example: SequenceA: 'The company headquarters is located in New York City.'
#          SequenceB: 'The headquarters of the company resides in Manhattan.'
#          Output: 'Sequences are paraphrasing each other.'
##
def detectParaphrase(sequence_a, sequence_b, debug):
    # Get a pre-trained BERT tokenizer and model
    logging.set_verbosity_error()
    tokenizer = BertTokenizer.from_pretrained('bert-base-cased-finetuned-mrpc')
    sequence_classification_model = torch.hub.load('huggingface/pytorch-transformers', 'modelForSequenceClassification', 'bert-base-cased-finetuned-mrpc')
    
    # Generate model input from sequence using tokenizer
    input = tokenizer(sequence_a, sequence_b)
    # Save encoded version of sequence from input dictionary
    encoded_sequence = input["input_ids"]
    # Save ids of sequence from the input dictionary
    sequence_ids = input['token_type_ids']
    # Convert inputs to PyTorch tensors
    tokens_tensor = torch.tensor([encoded_sequence])
    segments_tensors = torch.tensor([sequence_ids])
    
    # Verbose debugging output
    if debug:
        print(f'Raw Sequence A: {sequence_a}')
        tokenized_sequence_a = tokenizer.tokenize(sequence_a)
        print(f'Tokenized sequence A: {tokenized_sequence_a}')
        print(f'Raw Sequence B: {sequence_b}')
        tokenized_sequence_b = tokenizer.tokenize(sequence_b)
        print(f'Tokenized sequence B: {tokenized_sequence_b}')
        print(f'Encoded Sequence: {encoded_sequence}')
        decoded_sequence = tokenizer.decode(encoded_sequence)
        print(f'Decoded Sequence: {decoded_sequence}')
        print(f'Sequence Ids: {sequence_ids}')

    # Run the model
    with torch.no_grad():
        seq_classif_logits = sequence_classification_model(tokens_tensor, token_type_ids=segments_tensors)

    # Extract the predicted label (0 or 1)
    predicted_labels = torch.argmax(seq_classif_logits[0]).item()

    # In MRPC dataset, label 0 means the two sentences are not paraphrasing each other
    if(predicted_labels == 0):
        print('Sequences are not paraphrasing each other.')
    else:
        print('Sequences are paraphrasing each other.')

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-sa', '--sequenceA', help='Sequence A for paraphrase detection', required=True)
    parser.add_argument('-sb', '--sequenceB', help='Sequence B for paraphrase detection', required=True)
    parser.add_argument('-d', '--debug', help='Enable debug mode', required=False, action='store_true')
    args = parser.parse_args()

    detectParaphrase(args.sequenceA, args.sequenceB, args.debug)

The script parameters should be used as follows.

  • -sa/--sequenceA is a required parameter and should be followed by a sequence/string.
  • -sb/--sequenceB is a required parameter and should be followed by another sequence/string.
    • sequenceA and sequenceB are the two strings that will be compared by the model for paraphrase detection.
  • -d/--debug is an optional parameter that can be passed to enable debug mode. With debug mode enabled, there will be additional logging of intermediate results such as the tokenized and encoded inputs.
  • example: python3 bert_paraphrase_detection.py -sa 'The company headquarters is located in New York City.' -sb 'The headquarters of the company resides in Manhattan.' -d
    • output: Sequences are paraphrasing each other.
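
The modified script above only prints the predicted label. If a confidence score is also useful, the same logits can be passed through a softmax. The snippet below is a minimal sketch of that idea, assuming the same bert-base-cased-finetuned-mrpc checkpoint and the example sentence pair from above.

import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load the same MRPC fine-tuned tokenizer and sequence classification model
tokenizer = BertTokenizer.from_pretrained('bert-base-cased-finetuned-mrpc')
model = BertForSequenceClassification.from_pretrained('bert-base-cased-finetuned-mrpc')

inputs = tokenizer('The company headquarters is located in New York City.',
                   'The headquarters of the company resides in Manhattan.',
                   return_tensors='pt')

with torch.no_grad():
    logits = model(**inputs).logits

# Convert the two class logits into probabilities (label 1 means paraphrase)
probs = torch.softmax(logits, dim=1)[0]
print(f'not paraphrase: {probs[0].item():.3f}, paraphrase: {probs[1].item():.3f}')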

BERT For Q&A

Paste the following code into the newly created Python script.

import torch

# Load the tokenizer and the model
question_answering_model = torch.hub.load('huggingface/pytorch-transformers', 'modelForQuestionAnswering', 'bert-large-uncased-whole-word-masking-finetuned-squad')
question_answering_tokenizer = torch.hub.load('huggingface/pytorch-transformers', 'tokenizer', 'bert-large-uncased-whole-word-masking-finetuned-squad')

# The format is paragraph first and then question
text_1 = "Jim Henson was a puppeteer"
text_2 = "Who was Jim Henson ?"

# Tokenized input with special tokens around it (for BERT: [CLS] at the beginning and [SEP] at the end)
indexed_tokens = question_answering_tokenizer.encode(text_1, text_2, add_special_tokens=True)
# Define sentence A and B indices associated to 1st and 2nd sentences (see paper)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
# Convert inputs to PyTorch tensors
segments_tensors = torch.tensor([segments_ids])
tokens_tensor = torch.tensor([indexed_tokens])

# Predict the start and end positions logits
with torch.no_grad():
    out = question_answering_model(tokens_tensor, token_type_ids=segments_tensors)

# Get the highest prediction
answer = question_answering_tokenizer.decode(indexed_tokens[torch.argmax(out.start_logits):torch.argmax(out.end_logits)+1])
print(f'Answer: {answer}')

# Or get the total loss which is the sum of the CrossEntropy loss for the start and end token positions (set model to train mode before if used for training)
start_positions, end_positions = torch.tensor([12]), torch.tensor([14])
question_answering_loss = question_answering_model(tokens_tensor, token_type_ids=segments_tensors, start_positions=start_positions, end_positions=end_positions)

The script above is the basic implementation of BERT for Q&A. It contains hard-coded input sequences, so it is mainly useful for confirming that the model works as intended. The following Python script is a modified version of the basic script that allows the user to input any two sequences: the first is the context to give to the model, and the second is the question the model will attempt to answer based on that context.

import torch
import argparse
from transformers import BertTokenizer, logging

##
# Answer a question based off of a given context.
# Input: Context string and question string
# Output: Answer to the given question
# Example: Context: 'Paris is the capital of France.'
#          Question: 'What is the capital of France?'
#          Answer: 'Paris'
##
def answerQuestion(context, question, debug):
    # Get a pre-trained BERT tokenizer and model
    logging.set_verbosity_error()
    tokenizer = BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
    question_answering_model = torch.hub.load('huggingface/pytorch-transformers', 'modelForQuestionAnswering', 'bert-large-uncased-whole-word-masking-finetuned-squad')
    
    # Generate model input from sequence using tokenizer
    input = tokenizer(context, question)
    # Save encoded version of sequence from input dictionary
    encoded_sequence = input["input_ids"]
    # Save ids of sequence from the input dictionary
    sequence_ids = input['token_type_ids']
    # Convert inputs to PyTorch tensors
    tokens_tensor = torch.tensor([encoded_sequence])
    segments_tensors = torch.tensor([sequence_ids])
    
    # Verbose debugging output
    if debug:
        print(f'Raw Context: {context}')
        tokenized_context = tokenizer.tokenize(context)
        print(f'Tokenized Context: {tokenized_context}')
        print(f'Raw Question: {question}')
        tokenized_question = tokenizer.tokenize(question)
        print(f'Tokenized Question: {tokenized_question}')
        print(f'Encoded Sequence: {encoded_sequence}')
        decoded_sequence = tokenizer.decode(encoded_sequence)
        print(f'Decoded Sequence: {decoded_sequence}')
        print(f'Sequence Ids: {sequence_ids}')

    # Run the model
    with torch.no_grad():
        output = question_answering_model(tokens_tensor, token_type_ids=segments_tensors)

    # Extract the answer
    answer = tokenizer.decode(encoded_sequence[torch.argmax(output.start_logits):torch.argmax(output.end_logits)+1])

    print(f'Answer: {answer}')

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--context', help='Context to be provided to the model', required=True)
    parser.add_argument('-q', '--question', help='Question to ask the model based on the context', required=True)
    parser.add_argument('-d', '--debug', help='Enable debug mode', required=False, action='store_true')
    args = parser.parse_args()

    answerQuestion(args.context, args.question, args.debug)

The script parameters should be used as follows.

  • -c/--context is a required parameter and should be followed by a sequence/string. This parameter will be taken as the context and should provide background information to the model. It can be anything from a single sentence to an entire story.
  • -q/--question is a required parameter and should be followed by another sequence/string. This parameter will be taken as the question that the model will attempt to answer based on the information extracted from the context.
  • -d/--debug is an optional parameter that can be passed to enable debug mode. With debug mode enabled, there will be additional logging of intermediate results such as the tokenized and encoded inputs.
  • example: python3 bert_q_and_a.py -c 'Paris is the capital of France.' -q 'What is the capital of France?' -d
    • output: Paris
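
The modified script above extracts the answer by taking the argmax of the start and end logits. The snippet below is a minimal sketch that also reports how confident the model is in the chosen span, assuming the same bert-large-uncased-whole-word-masking-finetuned-squad checkpoint and the example context and question from above.

import torch
from transformers import BertTokenizer, BertForQuestionAnswering

# Load the same SQuAD fine-tuned tokenizer and question answering model
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

# The format is context first and then question, as in the scripts above
inputs = tokenizer('Paris is the capital of France.', 'What is the capital of France?', return_tensors='pt')

with torch.no_grad():
    output = model(**inputs)

# Decode the highest scoring answer span
start = torch.argmax(output.start_logits)
end = torch.argmax(output.end_logits)
answer = tokenizer.decode(inputs['input_ids'][0][start:end + 1])

# Probability of the chosen start and end positions
start_prob = torch.softmax(output.start_logits, dim=1)[0, start].item()
end_prob = torch.softmax(output.end_logits, dim=1)[0, end].item()
print(f'Answer: {answer} (start prob {start_prob:.3f}, end prob {end_prob:.3f})')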

Conclusion

This blog detailed the steps required to run inferencing with PyTorch on IBM Power10 systems using various BERT models. BERT models for masked token prediction, paraphrase detection, and Q&A were explored. The basic implementation of each model setup was improved upon for better usability. Each of the models was either trained on a general dataset or fine-tuned to the specific use case detailed in this blog. One should take this blog as a starting point and further fine-tune the models to better fit the desired use case. See the PyTorch Transformers page for more information.
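
As a hypothetical illustration of that next step, the sketch below runs a few steps of fine-tuning on a sequence classification head using two made-up sentence pairs. It is only a starting point; a real fine-tuning run would use a proper labeled dataset such as MRPC, batching, and an evaluation split.

import torch
from torch.optim import AdamW
from transformers import BertTokenizer, BertForSequenceClassification

# Start from the base cased checkpoint with a fresh two-label classification head
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
model = BertForSequenceClassification.from_pretrained('bert-base-cased', num_labels=2)

# Toy training data: sentence pairs with paraphrase labels (1 = paraphrase, 0 = not)
pairs = [
    ('The company is based in New York.', 'The firm is headquartered in New York.', 1),
    ('The company is based in New York.', 'The weather was cold yesterday.', 0),
]

optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()

for epoch in range(3):
    for text_a, text_b, label in pairs:
        inputs = tokenizer(text_a, text_b, return_tensors='pt')
        outputs = model(**inputs, labels=torch.tensor([label]))
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        print(f'epoch {epoch} loss {outputs.loss.item():.4f}')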
