Exploring Lengthy YouTube Video Summarization with IBM watsonx.ai and LangChain

By Ravi Bansal posted Mon March 18, 2024 10:20 AM

  
Introduction
The IBM watsonx.ai platform brings together new generative AI capabilities powered by foundation models and traditional machine learning (ML) into a powerful studio spanning the AI lifecycle. With watsonx, clients have access to IBM-selected open-source models from Hugging Face and a family of IBM-trained foundation models such as GRANITE_13B_CHAT for a variety of enterprise NLP generative AI tasks.
 
Watsonx.ai Prompt Lab
The watsonx.ai platform features a Prompt Lab where AI builders can work with foundation models and build prompts using prompt engineering. Within the Prompt Lab, users can experiment with zero-shot, one-shot, or few-shot prompting to support a range of Natural Language Processing (NLP) tasks, including question answering, content generation, and summarization. An example of the Prompt Lab showing a summarization use case is shown in the picture below. The instruction prompt is entered in the Instruction section. This prompt, along with the "earnings call" text, is provided under the Document section. The LLM is set to "flan-ul2-20b" at the top right. On clicking the "Generate" button, the summary appears at the bottom right in the light blue panel.
LLM Token Limitation
A challenge often encountered when using LLMs with simple prompt engineering, as in the Prompt Lab, is the token limitation. A token is a unit of text, typically equivalent to a word. Every generative AI LLM has a maximum token limit, or context limit, also known as the "context window". Token limits represent the maximum amount of text that can be processed by an AI model. Once the total number of tokens in an LLM call and response (i.e., including all prompts and completions) exceeds the maximum token limit for the model, the LLM might generate an error, or the context from the beginning of the conversation is lost. If a user requests a summarization of a large amount of text, the LLM is likely to omit part of the text once the token limit is exceeded. The resulting summary delivered by the LLM can, therefore, be factually inaccurate.
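To make this concrete, here is a minimal sketch of the kind of feasibility check that motivates chunking. It uses the rough rule of thumb of ~4 characters per token and an assumed 4,096-token context window; the actual limit depends on the foundation model selected.

def fits_in_context(text, context_window=4096, reserved_for_output=500):
    # Approximate the token count using ~4 characters per token (rule of thumb).
    approx_tokens = len(text) // 4
    # Leave room for the tokens the model will generate in its response.
    return approx_tokens + reserved_for_output <= context_window

transcript = "..."  # the full video transcript
if not fits_in_context(transcript):
    print("Text exceeds the context window; split it into chunks before summarizing.")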
LangChain
LangChain is a comprehensive framework for developing applications powered by large language models. It is designed to streamline and enhance text analysis tasks. It encompasses a diverse set of tools and functionalities, including text summarization, question answering, sentiment analysis, and document classification. With its modular architecture and extensive collection of integrations, LangChain provides developers with a flexible and scalable framework for building NLP solutions tailored to specific use cases.
Map-Reduce Earnings Call Summarization from a YouTube Video
In this article, we will summarize an hour-long YouTube video using an LLM. We will discover that the video transcript is too long to be passed into the context window of the LLM. Therefore, we will split the transcript into smaller chunks using LangChain. This step is essential for processing large amounts of text efficiently. Following that, we will use a LangChain summarization chain of the type "Map Reduce". This method first generates a summary of each of the smaller chunks that fit within the token limit, and then generates a "summary of the summaries".

Watsonx.ai Python Notebook

The code that follows can be found at the following link:

https://github.com/rbansal100/ai/blob/main/LongTextSummarizationWithLangChain.ipynb

Please note that, if running this notebook as part of your own watsonx project, you will need to provide your own PROJECT_ID and API_KEY. You will also need to associate a Watson Machine Learning service with the project. This can be done by going to the project's Manage tab and selecting the "Services and integrations" page. The more powerful (GPU-enabled) the Watson Machine Learning instance selected, the faster the LLM inferencing response will be.

Let's get started!

1. Install Required Packages:

The script begins by installing several Python packages using the pip package manager. These packages include langchain, langchain_ibm, ibm-watsonx-ai, and youtube-transcript-api.

!pip install langchain -q
!pip install langchain_ibm -q 
!pip install ibm-watsonx-ai -q
!pip install youtube-transcript-api -q

2. Create the credentials dictionary with the HTTPS endpoint of the Watson Machine Learning service and the IBM Cloud API key. Also set the project ID.

credentials = {
    #Using the endpoint of the Watson runtime in Dallas as that is associated with the project
    "url": "https://us-south.ml.cloud.ibm.com",      

     #Use your own API key. Create a new API key using IBM Cloud IAM if one is not available.
    "apikey": "KGzXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"  
}

#Use your own project_id. This can be found on the "Manage" tab of the watsonx project page, under the "Details" section
project_id = "2036XXXXXXXXXXX" 

3. LangChain provides a YoutubeLoader class in the langchain_community.document_loaders module to load a YouTube video's transcript. The YouTube video's URL is provided as input, and the loader loads the transcript into Document objects. This only works with YouTube videos that have transcripts; Speech-to-Text can be used for videos missing a transcript (a defensive variant of the load is sketched after the code below).

from langchain_community.document_loaders import YoutubeLoader
loader = YoutubeLoader.from_youtube_url( "https://www.youtube.com/watch?v=txOv_pi-_R4", add_video_info=False)
list_of_doc_objects = loader.load()
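If the video might lack a transcript, a defensive variant of the load avoids failing later in the notebook. This is only a sketch; the exact exception type raised by the loader when no transcript exists may vary.

try:
    list_of_doc_objects = loader.load()
except Exception as e:
    # The loader surfaces transcript problems (e.g., transcripts disabled) as exceptions.
    list_of_doc_objects = []
    print("Could not fetch a transcript: " + str(e))

if not list_of_doc_objects:
    print("No transcript available; consider a Speech-to-Text service instead.")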

4. Display the video:

from IPython.display import YouTubeVideo

YouTubeVideo('txOv_pi-_R4', width=400, height=300)

Link to video

5. Extract the text from the loaded Document object.

print("**** Number of document objects ****")
print(str(len(list_of_doc_objects)))

print("**** Text of document ****")
text = list_of_doc_objects[0].page_content
print(text)
**** Number of document objects ****
1
**** Text of document ****
good afternoon my name Rob and I'll be your conference operator today at this time I would like to welcome everyone to the nvidia's fourth quarter earnings call all lines have been placed on mute to prevent any background noise after the speaker's remarks there will be a question and answer session if you would like to ask a question during this time simply press star followed by the number one on your telephone keypad if you would like to 

6. Analyze the text for the number of characters and tokens. If the text is too long to fit in the context window (token limit), it will need splitting or chunking.

words = text.split()
word_count = len(words)
print ("Number of words = " + str(word_count))
char_count = len(text)
print ("Number of characters = " + str(char_count))
num_tokens = int(char_count / 4)  # Using a rule of thumb of 4 characters per token
print ("Approx Number of tokens = " + str(num_tokens))
Number of words = 8429
Number of characters = 48711
Approx Number of tokens = 12177

The above data tells us that, with more than 8,000 words and roughly 12,000 tokens, the text cannot be sent to the LLM in a single call, as it would exceed the token limit of most foundation models.
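A quick back-of-the-envelope check shows how many chunks we will need. This assumes the roughly 4,096-token context window of the LLAMA_2_70B_CHAT model used below and about 2,000 input tokens per chunk, leaving headroom for the prompt and the generated summary.

import math

tokens_per_chunk = 2000   # assumed input budget per chunk (out of a ~4k context window)
min_chunks = math.ceil(num_tokens / tokens_per_chunk)
print("Minimum number of chunks needed: " + str(min_chunks))   # ~7 for ~12k tokens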

7. The text will be split into smaller chunks using the RecursiveCharacterTextSplitter class from the langchain.text_splitter module. This step is essential for processing large text efficiently. Text can be split in multiple ways, such as by sentence or paragraph. We want to split the text so that each chunk retains enough summarizable content without exceeding the token limit. Since we plan to use the LLAMA_2_70B_CHAT model with a limit of 4k tokens, we will split the text into chunks of 2,000 tokens, or approximately 8,000 characters, each. An overlap of 800 characters helps maintain context continuity between chunks when splitting the text.

from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=8000, chunk_overlap=800)  #8000 characters or 2000 tokens
chunks = text_splitter.split_documents(list_of_doc_objects)
print("Number of Chunks: " + str(len(chunks)))

for x in range (0, len(chunks)):
    print("Chunk Number: "+str(x) +" " + chunks[x].page_content + "\n")
Number of Chunks: 7
Chunk Number: 0 good afternoon my name Rob and I'll be your conference operator today at this time I would like to welcome everyone to the nvidia's fourth quarter earnings call all lines have been placed on mute to prevent any background noise after the speaker's r

We see the text split into 7 chunks.
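To see that the 800-character overlap is actually carrying context across chunk boundaries, we can inspect the end of each chunk against the start of the next. This uses plain string slicing only, no LangChain internals.

# Inspect the boundary between consecutive chunks to see the carried-over overlap.
for i in range(len(chunks) - 1):
    print("End of chunk " + str(i) + ":   ..." + chunks[i].page_content[-100:])
    print("Start of chunk " + str(i + 1) + ": " + chunks[i + 1].page_content[:100] + "...\n")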

8. An instance of the IBM WatsonxLLM wrapper is initialized using the WatsonxLLM class from the langchain_ibm module. We set parameters for the AI model, specifying the decoding method, the minimum and maximum number of new tokens, and stop sequences. The model ID, URL, API key, project ID, and parameters are provided as inputs. Please note that the instance of WatsonxLLM gives us a handle to the "LLAMA_2_70B_CHAT" foundation model so we can run inferences against it in the next step.

from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import DecodingMethods
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes


#model parameters
parameters = {   
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.MAX_NEW_TOKENS: 500,
    GenParams.STOP_SEQUENCES: ["<|endoftext|>"]
}

from langchain_ibm import WatsonxLLM

model_id = ModelTypes.LLAMA_2_70B_CHAT  #LLM Model selected

llm = WatsonxLLM(
    model_id=model_id.value,
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=project_id,
    params=parameters
)
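As an optional sanity check before running the full chain, the model can be invoked directly through the standard LangChain interface. This is just a sketch; the prompt text is arbitrary and the generated response will vary.

# Quick test inference to confirm the credentials, project, and model are wired up correctly.
print(llm.invoke("Summarize in one sentence: NVIDIA reported record data center revenue this quarter."))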

9. Now we load a summarization chain using the load_summarize_chain function from the langchain.chains.summarize module. This chain is responsible for generating summaries based on the input text. The chain is of the type "Map Reduce". Map-Reduce works by first "mapping" each document or document chunk to an individual summary. This is guided by the "map_prompt" and is done using a separate LLM call for each document or text chunk. Once all of the individual summaries have been generated, they are then "reduced", i.e., combined into a single global summary. This is guided by the "combine_prompt" and is also done using a separate LLM call.

from langchain.chains.summarize import load_summarize_chain
from langchain.prompts import PromptTemplate

map_prompt = "Write a concise summary of the following:'{text}' CONCISE SUMMARY: "
map_prompt_template = PromptTemplate(template=map_prompt, input_variables=["text"])

combine_prompt = "Write a concise abstractive summary of the following:'{text}' Summary should include financial numbers."
combine_prompt_template = PromptTemplate(template=combine_prompt, input_variables=["text"])

summary_chain = load_summarize_chain(llm=llm,
                                     chain_type='map_reduce',
                                     map_prompt=map_prompt_template,
                                     combine_prompt=combine_prompt_template,
                                     verbose=False
                                    )

10. Finally, we invoke the summarization chain on the text chunks obtained earlier. The generated summary is printed to the console and is less than 300 words.

LLM_response = summary_chain.invoke(chunks)
print(LLM_response.get("output_text"))
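To verify the length claim above, a one-line word count on the generated text is enough, since output_text is a plain string.

summary_words = len(LLM_response.get("output_text").split())
print("Summary length: " + str(summary_words) + " words")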

We can try different foundation models and vary chunk sizes and prompts to get the best results. The values chosen for these could even be calculated dynamically based on characteristics of the input text, such as its length and format. As an example, see the different output obtained below by modifying the combine prompt.

map_prompt = "Write a concise summary of the following:'{text}' CONCISE SUMMARY: "
map_prompt_template = PromptTemplate(template=map_prompt, input_variables=["text"])

combine_prompt = "Write a concise bullet point summary of the following:'{text}'. The summary should include financial numbers."
combine_prompt_template = PromptTemplate(template=combine_prompt, input_variables=["text"])

summary_chain = load_summarize_chain(llm=llm,
                                     chain_type='map_reduce',
                                     map_prompt=map_prompt_template,
                                     combine_prompt=combine_prompt_template,
                                     verbose=False
                                    )

                                    
LLM_response = summary_chain.invoke(chunks)
print(LLM_response.get("output_text"))

Conclusion

This blog article demonstrated a workflow for leveraging IBM watsonx.ai and LangChain to automatically generate summaries from YouTube video transcripts. It showcased the integration of various Python libraries and modules to streamline the process of text processing and summarization.


#watsonx.ai
