
Using Embedding Models to improve text retrieval with watsonx.ai

By Kevin Macdonald posted Thu April 18, 2024 07:18 PM

  

watsonx.ai is an enterprise studio to train, validate, tune, and deploy AI models and generative AI applications for your business use cases. Our studio provides every tool imaginable to help you achieve your AI goals. The latest addition is the Embeddings Service API, which allows you to generate text embeddings for your data and develop popular use cases such as retrieval-augmented generation (RAG).

The magic behind this API is powered by embedding models which are encoder-only foundation models that create text embeddings. Text embeddings capture the meaning of a sentence or passage and are commonly used to help with document comparison, question-answering, and retrieval-augmented generation tasks.

What is a text embedding?

A text embedding is a numerical representation – or vector – of a sentence or passage. By converting sentences to vectors, comparing and searching text becomes a mathematical operation, which is something computers can do quickly and do well.

An embedding model creates a vector representation of a sentence by assigning numeric values that capture the semantic meaning of the sentence. The model also positions the vector within a multidimensional space based on its assigned values. While the size of this dimensional space varies by model, all models position vectors so that sentences with similar meanings are closer to one another in the space.
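
For example, the closeness of two embeddings is typically measured with cosine similarity. The sketch below uses hypothetical three-dimensional vectors purely for illustration (real embeddings from models like slate-30m-english-rtrvr have hundreds of dimensions), but it shows the core idea: sentences about the same topic score higher than unrelated ones.

import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 means more similar."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 3-dimensional embeddings, for illustration only.
emb_rag = [0.8, 0.1, 0.3]        # "What is retrieval-augmented generation?"
emb_grounding = [0.7, 0.2, 0.4]  # "RAG grounds a model with external documents."
emb_weather = [0.1, 0.9, 0.0]    # "It will rain in Boston tomorrow."

print(cosine_similarity(emb_rag, emb_grounding))  # high score: related sentences
print(cosine_similarity(emb_rag, emb_weather))    # low score: unrelated sentences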

Which models support text embeddings in watsonx.ai?

The following models from the IBM Slate family of foundation models are included with the Embeddings Service API:

- slate-30m-english-rtrvr
- slate-125m-english-rtrvr

For complete details, see the Supported embedding models page in the IBM watsonx documentation.

Note: Additional open-source embedding models will be added in the near future to provide more choice and flexibility.

A quick overview of the embeddings API

Below we’ll cover how to use the watsonx.ai Embeddings Service API to convert text passages into vectors. You can also review the full details in the documentation.

Generate embeddings

To generate embeddings, send the following POST request to the REST API to convert text input into vectors.

To try this technique, replace eyJhbGciOiJSUzUxM... with your bearer token. You will also need a model_id and your watsonx.ai project_id. The example below uses the us-south regional endpoint; substitute the endpoint for your region if your instance is hosted elsewhere.
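
If you do not already have a bearer token, you can exchange an IBM Cloud API key for one through the IAM token service. A minimal Python sketch, assuming the standard IAM endpoint and the requests library (replace the API key placeholder with your own):

import requests

# Exchange an IBM Cloud API key for a short-lived IAM bearer token.
response = requests.post(
    "https://iam.cloud.ibm.com/identity/token",
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    data={
        "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
        "apikey": "<YOUR_IBM_CLOUD_API_KEY>",  # placeholder: your API key
    },
)
response.raise_for_status()
bearer_token = response.json()["access_token"]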

POST request details:

curl -X POST 'https://us-south.ml.cloud.ibm.com/ml/v1/text/embeddings?version=2024-04-04' \
  -H 'Authorization: Bearer eyJhbGciOiJSUzUxM...' \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -d '{
    "inputs": [
      "A foundation model is a large generative AI model.",
      "Retrieval-augmented generation (RAG) is a technique in which a foundation model is augmented with knowledge from external sources to generate text."
    ],
    "model_id": "ibm/slate-30m-english-rtrvr",
    "project_id": "81823e98-c691-48a2-9bcc-e637a84db410"
  }'

Example response

The request returns a JSON response containing an array of embedding results, one for each input string.

Note that each input becomes a separate embedding object in the output, and each object contains as many values as the model's vector dimension. For example, the slate-30m-english-rtrvr model produces 384-dimensional embeddings, so each embedding object contains 384 values.

response:

{
  "model_id": "ibm/slate-30m-english-rtrvr",
  "created_at": "2024-04-04T14:50:23.923Z",
  "results": [
    {
      "embedding": [
        -0.020545086,
        . . .
      ]
    },
    {
      "embedding": [
        -0.020545086,
        . . .
      ]
    }
  ],
  "input_token_count": 45
}
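
The same request can be made from Python, which makes it easy to pull the vectors out of the response for further processing. A minimal sketch using the requests library; the regional endpoint, bearer token, and project ID are placeholders to replace with your own values:

import requests

EMBEDDINGS_URL = "https://us-south.ml.cloud.ibm.com/ml/v1/text/embeddings?version=2024-04-04"

payload = {
    "inputs": [
        "A foundation model is a large generative AI model.",
        "Retrieval-augmented generation (RAG) is a technique in which a foundation model "
        "is augmented with knowledge from external sources to generate text.",
    ],
    "model_id": "ibm/slate-30m-english-rtrvr",
    "project_id": "<YOUR_PROJECT_ID>",  # placeholder: your watsonx.ai project ID
}

response = requests.post(
    EMBEDDINGS_URL,
    headers={
        "Authorization": "Bearer <YOUR_BEARER_TOKEN>",  # placeholder: IAM bearer token
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    json=payload,
)
response.raise_for_status()

# One embedding per input; each is a list of 384 floats for slate-30m-english-rtrvr.
vectors = [result["embedding"] for result in response.json()["results"]]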

 

Storing vectors in a vector database

Vectors that are returned as responses from the API can be used to compare passages in real time or can be stored in a database. Once you have stored vectorized text passages, you can reference the store when you need to answer a question, or to ground a prompt with relevant information.

While you can store the returned embedding vectors in any vector database of your choice, IBM provides an integrated vector database, based on the open-source Milvus vector store, as part of watsonx.data. See the Working with Milvus documentation for full details.

Once you have created a Milvus service, you can create a collection in Milvus. The collection's dimension must match the output dimension of your embedding model (384 for slate-30m-english-rtrvr):

curl -X POST \
  "${MILVUS_HOST}:${MILVUS_PORT}/v1/vector/collections/create" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
       "dbName": "default",
       "collectionName": "text_inputs",
       "dimension": 384,
       "metricType": "L2",
       "primaryField": "id",
       "vectorField": "vector"
      }'

Next, upload your vector data to the collection. In practice, each "vector" field holds the full embedding returned by the Embeddings API (384 values for slate-30m-english-rtrvr); the three-value vectors below are shortened for readability:

curl --request POST \
     --url "${MILVUS_HOST}:${MILVUS_PORT}/v1/vector/insert" \
     --header "Authorization: Bearer ${TOKEN}" \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     -d '{
         "collectionName": "text_inputs",
         "data": [
             {
                "vector": [0.1, 0.2, 0.3],
                "name": "Person1",
                "email": "person1@company.com",
                "date": "2024-04-13"
             },{
                "vector": [0.1, 0.2, 0.3],
                "name": "Person2",
                "email": "person2@company.com",
                "date": "2024-04-11"
             }
         ]
     }'
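
With vectors in the collection, you can retrieve the stored entries closest to a new query embedding. A sketch using the pymilvus MilvusClient; the connection details and query vector are placeholders, and the field names follow the insert example above:

from pymilvus import MilvusClient

# Connect to the Milvus service provisioned in watsonx.data.
client = MilvusClient(uri="https://<MILVUS_HOST>:<MILVUS_PORT>", token="<TOKEN>")

# In practice this is the full embedding of the user's question,
# generated with the Embeddings API shown earlier.
query_vector = [0.1, 0.2, 0.3]

results = client.search(
    collection_name="text_inputs",
    data=[query_vector],
    limit=3,                                  # top 3 nearest neighbors by L2 distance
    output_fields=["name", "email", "date"],  # fields to return with each hit
)
for hit in results[0]:
    print(hit["distance"], hit["entity"])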

Using your generated embeddings

As stated earlier, a common use of embeddings – vectors – is in Retrieval Augmented Generation (RAG) patterns with your foundation model. RAG is a technique for enhancing the accuracy and reliability of AI models with facts fetched from external sources. For more information, see Retrieval-augmented generation in the IBM watsonx documentation.

A common approach with RAG is to create dense vector representations of your data, in order to calculate the semantic similarity to a given user query. The IBM Embeddings API then allows you to take your data, embed it using an embedding model, load the data into a vector database, and then query that data.
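
Putting the pieces together, a typical RAG flow embeds the user's question, retrieves the most similar stored passages, and prepends them to the prompt sent to a foundation model. A condensed, illustrative sketch; the retrieved passages are hard-coded here to stand in for the output of the vector search:

# Suppose the vector search returned these passages as the closest matches.
retrieved_passages = [
    "Retrieval-augmented generation (RAG) is a technique in which a foundation model "
    "is augmented with knowledge from external sources to generate text.",
    "A foundation model is a large generative AI model.",
]

question = "What is retrieval-augmented generation?"

# Ground the prompt with the retrieved passages before sending it to a
# foundation model, for example through the watsonx.ai text generation API.
prompt = (
    "Answer the question using only the context below.\n\n"
    "Context:\n" + "\n".join(f"- {p}" for p in retrieved_passages) + "\n\n"
    "Question: " + question
)
print(prompt)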

For a great example of this use case, see the sample Python notebook on GitHub.

In this article, you learned how to get started with the embeddings service now available in watsonx.ai. We covered how to generate embeddings, store them, and leverage them in a RAG use case. To learn more about watsonx.ai and all the services and foundation models it provides, visit the product page or start a free trial today.


#watsonx.ai