How to load pre-trained Models with Transformers on your computer

By Ruslan Idelfonso Magaña Vsevolodovna posted Thu October 05, 2023 06:28 AM

  

If you are interested in working locally to develop some tests, Hugging Face provides pre-trained models for a wide range of natural language processing (NLP) tasks, including language translation, question answering, and text classification.

Here's a general overview of how to load a pre-trained Hugging Face model in Python, along with a little theory on how these models work.

In order to work with pre-trained models, it is important to understand the parameters needed to run them.

Introduction 


The Hugging Face Transformers library is an open-source library that provides a wide range of pre-trained models and tools for natural language processing (NLP) tasks. It is built on top of the PyTorch and TensorFlow frameworks and offers a unified API for working with various state-of-the-art transformer models.

The Hugging Face Transformers library allows users to easily access and utilize pre-trained transformer models for tasks like text generation, text classification, named entity recognition, and more. It also provides functionalities for fine-tuning these models on custom datasets.
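For example, a quick way to try a pre-trained model is the library's high-level pipeline API. The following is a minimal sketch; the checkpoint name is just one common sentiment-analysis model, and any compatible checkpoint would work:

from transformers import pipeline

# The pipeline downloads the model and its tokenizer on first use and
# wires them together behind a single callable.
classifier = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english")

result = classifier("Hugging Face Transformers makes NLP easy!")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99}]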


AutoModelForCausalLM


AutoModelForCausalLM is a class in the Hugging Face Transformers library, which is a popular open-source library for natural language processing tasks.

This class is specifically designed for causal language modeling tasks, where the model generates text one token at a time, with each new token conditioned only on the tokens that came before it.
It is part of the AutoModel family, which provides a unified interface for various pre-trained models.

AutoModelForCausalLM automatically selects the appropriate model architecture based on the pre-trained checkpoint you load, so the same code works across different causal language models. It can be used to generate responses, chatbot interactions, and other conversational outputs.
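As a small illustration, using the publicly available gpt2 checkpoint (chosen here only because it is small and requires no access request):

from transformers import AutoModelForCausalLM

# AutoModelForCausalLM inspects the checkpoint's configuration and
# instantiates the matching architecture class automatically.
model = AutoModelForCausalLM.from_pretrained("gpt2")
print(type(model).__name__)  # GPT2LMHeadModel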

AutoTokenizer


AutoTokenizer is a class in the Hugging Face Transformers library. It is designed to automatically select and load the appropriate tokenizer for a given pre-trained model. Tokenizers are used to convert raw text into numerical tokens that can be understood by machine learning models.

AutoTokenizer simplifies the process of selecting the correct tokenizer by automatically identifying the tokenizer associated with a specific pre-trained model. It eliminates the need for manually specifying and loading the tokenizer separately for each model.

By using AutoTokenizer.from_pretrained, you can easily load the tokenizer associated with a specific pre-trained model without explicitly specifying the tokenizer's name or type. This allows for a more streamlined and convenient workflow when working with different models and tasks in natural language processing.
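For instance, again using the small gpt2 checkpoint as an illustration:

from transformers import AutoTokenizer

# Loads the byte-level BPE tokenizer that ships with the gpt2 checkpoint,
# without having to name the tokenizer class explicitly.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

encoded = tokenizer("Hello, world!")
print(encoded["input_ids"])                    # e.g. [15496, 11, 995, 0]
print(tokenizer.decode(encoded["input_ids"]))  # Hello, world!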

Hello World Example 

Let us assume that you have Python 3.10 installed on your computer and an Nvidia GPU with at least 8GB of memory. In this example, I will use Llama 2, for which you need a Hugging Face account. If you don't have one, you can create it here, and you must request access to the model from Meta here.
1. Install CUDA 11.8.0 from the Nvidia site here.

2. Install the huggingface-cli tool. You can find the installation instructions here. Then log in to your account:

huggingface-cli login

After running the command, you'll be prompted to enter a Hugging Face access token, which you can create in your account settings. Make sure to use a token associated with the account that has access to the model.
3. Install the Hugging Face Transformers library by running the following command:

pip install transformers 


4. Install PyTorch and its companion libraries with CUDA 11.8 support:

pip install torchvision  torchaudio torch --index-url https://download.pytorch.org/whl/cu118


5. Run the following Python code:

from transformers import AutoModelForCausalLM, AutoTokenizer
 
# Load the model and tokenizer
model_name = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Define the prompt
prompt = "Hello, how are you today?"

# Tokenize the prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Generate text
output = model.generate(input_ids, max_length=100, num_return_sequences=1)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
After running it, you will see output something like:

[Figure: Example of Hello World LLM output]
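Note that a 7B-parameter model loaded in full precision needs far more memory than an 8GB GPU provides, so by default the model above runs on the CPU. A common variant, assuming the accelerate package is also installed, is to load the weights in half precision and let Transformers place them across the available devices automatically (a sketch, not the only option):

import torch
from transformers import AutoModelForCausalLM

# Half-precision weights roughly halve the memory footprint;
# device_map="auto" spreads the layers across available GPU and CPU memory.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# With device_map, move the inputs to the model's device before generating:
# input_ids = input_ids.to(model.device)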
It is important to know that the generate method allows for more customized behavior in new application code.
The generate method in the Hugging Face Transformers library is used to generate text based on a given input prompt. It has several arguments that can be used to customize the generation process. Here are the commonly used arguments (a short example follows the list):
 
- input_ids: This argument represents the input sequence of tokens encoded as IDs. It is typically obtained by tokenizing the input prompt using the tokenizer associated with the model.

- max_length: This argument specifies the maximum length of the generated text. The generation process will stop once this length is reached.

- num_return_sequences: This argument determines the number of different sequences to generate. By default, it generates a single sequence.

- do_sample: This argument controls whether to use sampling during generation. If set to `True`, the model will randomly select the next token based on the predicted probabilities. If set to `False`, the model will choose the token with the highest probability.

- temperature: This argument controls the randomness of the generated text when `do_sample` is set to `True`. Higher values (e.g., 1.0) make the output more random, while lower values (e.g., 0.2) make it more deterministic.
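For example, reusing the model, tokenizer, and input_ids from the Hello World code above, a sampling-based call might look like this (the values shown are illustrative, not recommendations):

# Generate two sampled continuations of the prompt
output = model.generate(
    input_ids,
    max_length=100,          # stop once the sequence reaches 100 tokens
    num_return_sequences=2,  # produce two alternative continuations
    do_sample=True,          # sample from the predicted probabilities
    temperature=0.7,         # values below 1.0 make the output less random
)

for seq in output:
    print(tokenizer.decode(seq, skip_special_tokens=True))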
 
To access these arguments from the Python terminal, you can use the help function to get information about the generate method. 
help(model.generate)
This is useful when switching between models, since the supported arguments can vary.
In conclusion, the Hugging Face Transformers library can help us run custom, local tests of LLMs. With simple, easy-to-use models, we can generate high-quality text for various applications. Whether it's chatbots, language translation, or content generation, Transformers provides a versatile and efficient solution that continues to push the boundaries of natural language processing.


#watsonx.ai
#GenerativeAI
