
Deploying custom foundation models to watsonx.ai

By Saloni Saluja posted Thu March 28, 2024 01:17 PM

  


Since its release, IBM watsonx.ai has been enabling businesses to train, validate, tune, and deploy AI models in a fraction of the time and with a fraction of the data. Starting with the watsonx.ai 1.1.4 software release, you can do even more with the enterprise studio: upload and deploy your own custom foundation models.

There are many reasons to import a custom foundation model, all driven by the unique needs of your organization. Ultimately, it boils down to a specific foundation model that is optimal for the task at hand but resides outside of watsonx.ai. For instance, you may need support for a language that is not currently available in the watsonx.ai foundation model library. Or, your organization may have invested resources to fine-tune a model to optimize it for your specific industry or business domain.  This “bring your own model” approach provides greater flexibility in how you select and utilize the right foundation model to meet your specific generative AI use case.  

In this article, we will describe how to install and deploy a custom foundation model in watsonx.ai (version 1.1.4 or later) using the example of a foundation model from Hugging Face. 

 

What kind of model can I deploy? 

 

A custom foundation model:  

  • must be built with an architecture supported by watsonx.ai (see the supported architectures in the product documentation) 

  • can be the original base version or a fine-tuned version 

  • must include a config.json file  

The product documentation lists the supported architectures, along with the quantization methods and tensor parallelism settings supported for each architecture. 

You must check the architecture type of a model before registering the model with watsonx.ai. For example, in this article we will be using the Falcon-40b model obtained from Hugging Face. To check the model architecture type for that model, follow these steps: 

  1. Open the Falcon-40b model on the Hugging Face website and click Files and Versions. 

  2. Open the config.json file and check the model_type field. For Falcon-40b, the model type is falcon, which is one of the supported architectures.   
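
     A quick way to check this from the command line is to fetch the raw config.json and look at its model_type field. This is a minimal sketch; the raw-file URL pattern and the tiiuae/falcon-40b repository name are Hugging Face conventions, not part of watsonx.ai:

     curl -sL https://huggingface.co/tiiuae/falcon-40b/raw/main/config.json | grep '"model_type"'

     For this model, the output should include "model_type": "falcon".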
     
     

Deployment process overview  

 

Whether you are deploying a model obtained from Hugging Face or a model from your own environment, the steps are similar. In this article, we will walk through the deployment process for a model obtained from Hugging Face. Hugging Face is one of the largest repositories of open-source models and a place where you can find a wide variety of foundation models. If you select a model from Hugging Face that suits your needs, you can manually import and deploy it into watsonx.ai, provided it is based on one of the supported architecture types. You can then use the Prompt Lab or the API to inference the model and embed it into your generative AI applications, just as you would any other model that is available off the shelf with watsonx.ai. 

 

The variations for deploying a model from your own environment are described in the product documentation. In this example, we will install the Falcon-40b model obtained from Hugging Face. 

 

To deploy a custom foundation model, you must have administrator rights on the underlying Red Hat OpenShift cluster where watsonx.ai is installed.  

 

The steps to deploy a custom model are divided between system administrator tasks and watsonx.ai user tasks, as follows: 

Admin tasks: 

  • Task 1: Prepare the model and upload it to PVC storage 

  • Task 2: Register the model with watsonx.ai 

watsonx.ai user tasks: 

  • Task 3: Create the deployment for the custom model 

  • Task 4: Prompt the custom foundation model 

Prerequisites   

Before you start deploying your custom model to watsonx.ai, make sure you follow these guidelines:  

  • You must have Git Large File Storage (Git LFS) installed on your cluster. To download Git LFS, follow the steps described in the GitHub guide to installing Git LFS (a short installation sketch follows this list). 

  • You must get the digest for the fmaas-runtime-wisdom-ansible image from the CASE bundle ibm-watsonx-ai-ifm/inventory/watsonxaiifm/resources.yaml. 

  • If your model is hosted on Hugging Face, you must have a Hugging Face account. To create a new account, go to the Hugging Face website. After creating a new account, generate a new Hugging Face token. To generate a token, see Hugging Face's guide to creating a token. 
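
For reference, here is a minimal Git LFS installation sketch. It assumes a RHEL-family host where the git-lfs package is available from a configured repository; your distribution and package source may differ:

    sudo dnf install -y git-lfs
    git lfs install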

 

How to deploy a custom foundation model to watsonx.ai  

Task 1: Prepare the model and upload it to PVC storage 

 

In this first task, you will set up storage, clone the model, prepare the model, create the PVC, and run the job to upload the model to the PVC storage.  

 

To set up storage and upload a model that is located on Hugging Face, follow these steps: 

 

  1. Find the name of the model on the Hugging Face website.

  2. Set up basic environment variables: 

    export MODEL_NAME="<Hugging Face model name>" 
    export HF_TOKEN="<Hugging Face token>"
    export MODEL_PATH="<Path to the directory where you want to download your model>"
  3. Clone the model: 

    git clone --no-checkout https://<Hugging Face username>:<Hugging Face token>@huggingface.co/${MODEL_NAME} 

After cloning the model, check the name of the created folder. 

  4. Navigate to the folder that contains the model and then check the model size:

    cd <folder that contains the cloned model> 
    git lfs ls-files -s
    Example output:
    root@wmlubntu1:~/falcon-40b# git lfs ls-files -s 
    1c4b989693 - pytorch_model-00001-of-00002.bin (10 GB)
    11822397cd - pytorch_model-00002-of-00002.bin (4.5 GB)

    Calculate the total size of the model and add a 100% buffer to the result. For example, if the model size is 14.5 GB, the size of the PVC that you create is 29 GB. 
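
     If you prefer to compute this in the shell, here is a rough sketch. It assumes that every size reported by git lfs ls-files -s is in GB; adjust accordingly if some files are reported in MB:

    git lfs ls-files -s | awk -F'[()]' '{split($2, a, " "); sum += a[1]} END {printf "Suggested PVC size: %.0f Gi\n", sum * 2}'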

  5. Create the pvc.yaml file that contains the details of a PVC to upload your model to: 

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: <pvc-name>
      namespace: <namespace-name>
    spec:
      storageClassName: <storage-class>
      accessModes:
      - ReadWriteMany
      resources:
        requests:
          storage: <model size + buffer>Gi

    For example:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: model-copy-pvc
      namespace: cpd-instance
    spec:
      storageClassName: ocs-storagecluster-cephfs
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 29Gi
  6. Create the PVC in your cluster:

    oc create -f pvc.yaml
  • After creating the PVC, wait for two minutes and then run this command to verify that the PVC is bound:

    oc get pvc <pvc-name> -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath='{.status.phase}'
  7. Encode your Hugging Face token to base64:

    echo -n ${HF_TOKEN} | base64
  8. Create the secret.yaml file: 

    apiVersion: v1
    kind: Secret
    metadata:
      name: <secret-name>
      namespace: ${PROJECT_CPD_INST_OPERANDS}
    type: Opaque
    data:
      TOKEN: <base64-encoded Hugging Face token>
  9. Create the secret in your cluster:

    oc apply -f secret.yaml
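  • To confirm that the secret was created with the expected key, you can decode it back (a quick check using the placeholder names from the previous steps):

    oc get secret <secret-name> -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath='{.data.TOKEN}' | base64 -d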

 

  10. Get the digest for the fmaas-runtime-wisdom-ansible image: 

    oc get pods -n ${PROJECT_CPD_INST_OPERATORS} | grep ibm-cpd-watsonx-ai-ifm-operator
    oc exec -it <ibm-cpd-watsonx-ai-ifm-operator-pod-from-last-cmd> -n ${PROJECT_CPD_INST_OPERATORS} -- /bin/bash
    cat /opt/ansible/8.4.0/digests.yaml | grep -A3 fmaas_runtime_wisdom_ansible_image:
    • For example:
      fmaas_runtime_wisdom_ansible_image:
        name: fmaas-runtime-wisdom-ansible@sha256
        tag: 2cc673a6066cab686eed7b8e6998bc453a49be4ccb993674ab0ff81f099f807f
        tag_metadata: 0.39.0_ubi9_py311_cpd 

  11. Create a job that downloads the model from Hugging Face and then, if needed, converts it to safetensors and fast-tokenizer formats:

    First, create the job.yaml file:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: <job-name>
      namespace: ${PROJECT_CPD_INST_OPERANDS}
    spec:
      template:
       spec:
          containers:
          - name: models-convertor
            image: cp.icr.io/cp/cpd/fmaas-runtime-wisdom-ansible@sha256:<tag>
            env:
            - name: MODEL_PATH
              value: <path where the PVC is mounted in the container, for example /model>
            - name: MODEL_NAME
              value: <Hugging Face model name from step 1>
            - name: HF_TOKEN
              valueFrom:
                secretKeyRef:
                  name: <secret-name>
                  key: TOKEN
            command: ["/bin/sh", "-c"]
            args:
            - |
              huggingface-cli login --token ${HF_TOKEN}
              huggingface-cli download ${MODEL_NAME} --local-dir ${MODEL_PATH} --cache-dir ${MODEL_PATH}
              text-generation-server convert-to-safetensors ${MODEL_PATH}
              text-generation-server convert-to-fast-tokenizer ${MODEL_PATH}
            volumeMounts:
            - mountPath: /model
              name: byom-model
          restartPolicy: Never
          volumes:
          - name: byom-model
            persistentVolumeClaim:
              claimName: <pvc-name>
  • For example: 

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: model-copy-job
      namespace: cpd-instance
    spec:
      template:
        spec:
          containers:
          - name: models-convertor
            image: cp.icr.io/cp/cpd/fmaas-runtime-wisdom-ansible@sha256:2cc673a6066cab686eed7b8e6998bc453a49be4ccb993674ab0ff81f099f807f
            env:
            - name: MODEL_PATH
              value: "/model"
            - name: MODEL_NAME
              value: "google/mt5-base"
            - name: HF_TOKEN
              valueFrom:
                secretKeyRef:
                  name: huggingface-token
                  key: TOKEN
            command: ["/bin/sh", "-c"]
            args:
            - |
              huggingface-cli login --token ${HF_TOKEN}
              huggingface-cli download ${MODEL_NAME} --local-dir ${MODEL_PATH} --cache-dir ${MODEL_PATH}
              text-generation-server convert-to-safetensors ${MODEL_PATH}
              text-generation-server convert-to-fast-tokenizer ${MODEL_PATH}
            volumeMounts:
            - mountPath: /model
              name: byom-model
          restartPolicy: Never
          volumes:
          - name: byom-model
            persistentVolumeClaim:
              claimName: model-copy-pvc 
  12. Create and run the job in your cluster: 

    oc apply -f job.yaml
  • Verify that the job was created: 
    oc get job <job-name> -n <pvc-namespace>

 

  • Expected output: 
    NAME          COMPLETIONS   DURATION   AGE
    <job-name>    1/1           xx         xx
  • Check the job status: 
    oc get job <job-name> -n <namespace> -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}'
  • Expected output:
    True
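  • If the job does not reach the Complete state, you can inspect its progress by following the job's pod logs. A minimal sketch, using the same placeholders as above:

    oc logs job/<job-name> -n ${PROJECT_CPD_INST_OPERANDS} -f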
     

Task 2: Register the model with watsonx.ai 

Once you have uploaded your custom foundation model, you must register the model to make it available with watsonx.ai. 

To register the model with watsonx.ai, follow these steps: 

  1. Log in to OpenShift and then edit the CR: 

    oc edit Watsonxaiifm
  2. Add a model entry under spec.custom_foundation_models, specifying the model ID, the PVC where the model is stored, optional tags, and any deployment parameters: 

  • For example:

        apiVersion: watsonxaiifm.cpd.ibm.com/v1beta1
        kind: Watsonxaiifm
        metadata:
          name: watsonxaiifm-cr
          ......
        spec:
          ignoreForMaintenance: false
          .......
          custom_foundation_models:
          - model_id: example_model_70b
            location:
              pvc_name: example_model_pvc
            tags:
            - example_model
            - 70b
            parameters:
            - name: dtype
              default: float16
              options:
              - float16
              - bfloat16
            - name: max_batch_size
              default: 256
              min: 16
              max: 512
            - name: max_concurrent_requests
              default: 64
              min: 0
              max: 128
            - name: max_sequence_length
              default: 2048
              min: 256
              max: 8192
            - name: max_new_tokens
              default: 2048
              min: 512
              max: 4096
          - model_id: example_model_13b
            location:
              pvc_name: example_model_pvc_13b
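
  • After saving the CR, you can confirm that the model registration was picked up. A minimal check, run in the project where watsonx.ai is installed and using the resource and field names from the example above:

    oc get Watsonxaiifm watsonxaiifm-cr -o jsonpath='{.spec.custom_foundation_models[*].model_id}'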

Task 3: Create the deployment for your custom model 

 

After the model is stored and registered, you can add the model asset to a space, and then create an online deployment.  

 

Note: The steps in this section describe how to create the deployment using API commands. To create the deployment from a deployment space, see Creating a deployment from space.  

 

To create a deployment programmatically, you must first get the model asset ID and then create the deployment.  
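
The API calls below pass a bearer token in the Authorization header ($TOKEN). How you obtain the token depends on your environment; the following is a minimal sketch that assumes the Cloud Pak for Data authorization endpoint and an API key, with jq used only to extract the token field:

export TOKEN=$(curl -ks -X POST "https://<cluster_url>/icp4d-api/v1/authorize" \
-H "content-type: application/json" \
--data '{"username": "<your username>", "api_key": "<your API key>"}' | jq -r '.token')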

View available custom foundation models 

Begin by running a command to list the deployable foundation models you have in watsonx.ai. To view the list of available custom foundation models by using the watsonx API, run this code: 

 

curl --location "https://<cluster_url>/ml/v4/custom_foundation_models" \
--header "Authorization: Bearer $TOKEN"

 

Example output: 

{
    "first": {
        "href": "/ml/v4/custom_foundation_models?limit=100"
    },
    "limit": 100,
    "resources": [
        {
            "model_id": "example_model_13b",
            "parameters": [
                {
                    "default": "float16",
                    "display_name": "Data Type",
                    "name": "dtype",
                    "options": [
                        "float16",
                        "bfloat16"
                    ],
                    "type": "string"
                },
                {
                    "default": 256,
                    "display_name": "Max Batch Size",
                    "name": "max_batch_size",
                    "type": "number"
                },
                {
                    "default": 1024,
                    "display_name": "Max Concurrent Requests",
                    "name": "max_concurrent_requests",
                    "type": "number"
                },
                {
                    "default": 2048,
                    "display_name": "Max New Tokens",
                    "name": "max_new_tokens",
                    "type": "number"
                },
                {
                    "default": 2048,
                    "display_name": "Max Sequence Length",
                    "name": "max_sequence_length",
                    "type": "number"
                }
            ]
        },
        {
            "model_id": "example_model_70b",
            "parameters": [
                {
                    "default": "float16",
                    "display_name": "Data Type",
                    "name": "dtype",
                    "options": [
                        "float16",
                        "bfloat16"
                    ],
                    "type": "string"
                },
                {
                    "default": 256,
                    "display_name": "Max Batch Size",
                    "max": 512,
                    "min": 16,
                    "name": "max_batch_size",
                    "type": "number"
                },
                {
                    "default": 64,
                    "display_name": "Max Concurrent Requests",
                    "max": 128,
                    "min": 0,
                    "name": "max_concurrent_requests",
                    "type": "number"
                },
                {
                    "default": 2048,
                    "display_name": "Max New Tokens",
                    "max": 4096,
                    "min": 512,
                    "name": "max_new_tokens",
                    "type": "number"
                },
                {
                    "default": 2048,
                    "display_name": "Max Sequence Length",
                    "max": 8192,
                    "min": 256,
                    "name": "max_sequence_length",
                    "type": "number"
                }
            ],
            "tags": [
                "example_model",
                "70b"
            ]
        }
    ],
    "total_count": 2
}

 

Note: If you access the model list programmatically, you can access all the parameters that you can set for the selected model. For models deployed through the UI, the parameters are available at the online deployment creation phase. See the description of parameters for custom foundation models. 
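
If you only need the model IDs from this response, you can filter it on the command line. A minimal sketch, assuming jq is installed:

curl -s "https://<cluster_url>/ml/v4/custom_foundation_models" \
--header "Authorization: Bearer $TOKEN" | jq -r '.resources[].model_id'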

The next step is to create the custom foundation model asset. You can create a model asset in two contexts: project context and space context.  

  • If you create the model asset in project context, you can then import the model to your project and then promote it to a deployment space. 

  • If you create the model asset in space context, you can import the model and then deploy it online. A model deployed from a space is also accessible from Prompt Lab in the project scope. 

 

To create a model asset for your custom foundation model in space context, run this code: 

curl -X POST "https://<cluster_url>/ml/v4/models?version=2024-01-29" \
-H "Authorization: Bearer $TOKEN" \
-H "content-type: application/json" \
--data '{
            "name": "<a meaningful name>",
            "space_id": "<your space id>",
            "foundation_model": {
                "model_id": "<your model id>"
            },
            "type": "custom_foundation_model_1.0",
            "software_spec": {
                "name": "watsonx-cfm-caikit-1.0"
            }
        }'

 

Note: The model type must be custom_foundation_model_1.0. The software specification name must be watsonx-cfm-caikit-1.0. You cannot customize the software specification. 
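
The response to the model-creation call describes the new model asset. To capture the asset ID for the deployment step, one option is to pipe the response through jq; this is a sketch that assumes the ID is returned under metadata.id, as it is for other /ml/v4 resources:

export ASSET_ID=$(curl -s -X POST "https://<cluster_url>/ml/v4/models?version=2024-01-29" \
-H "Authorization: Bearer $TOKEN" \
-H "content-type: application/json" \
--data '{
            "name": "<a meaningful name>",
            "space_id": "<your space id>",
            "foundation_model": { "model_id": "<your model id>" },
            "type": "custom_foundation_model_1.0",
            "software_spec": { "name": "watsonx-cfm-caikit-1.0" }
        }' | jq -r '.metadata.id')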

Create online deployment

When the custom foundation model asset has been created, you are ready to create the online deployment.  

For instance, here is example code that creates a deployment with some of the default parameters overridden. 

curl -X POST "https://<cluster_url>/ml/v4/deployments?version=2024-01-29" \
-H "Authorization: Bearer $TOKEN" \
-H "content-type: application/json" \
--data '{
  "asset":{
    "id": "<your custom foundation model id>"  // WML custom foundation model asset
  },
  "online":{
    "parameters":{
      "serving_name":"test_custom_fm",
      "foundation_model": {
           "max_sequence_length": 4096
      }
    }
  },
  "hardware_spec": {                        // Only one of "id" or "name" must be set.
    "id": "<your custom hardware spec id>",
    "num_nodes": 1
  },
  "description": "Testing deployment using custom foundation model",
  "name":"custom_fm_deployment",
  "project_id": "<your project id>"  // Either "project_id" or "space_id"; only one is allowed.
}'


View the status for a deployment 

You can view the status for an existing deployment by running this command: 

curl -X GET "https://<cluster_url>/ml/v4/deployments/<your deployment ID>?version=2024-01-29&project_id=<your project ID>" \
-H "Authorization: Bearer $TOKEN"

Note: The deployed_asset_type is returned as custom_foundation_model. 
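
To check only the deployment state, for example while waiting for the deployment to become ready, you can filter the same response. A sketch assuming jq and the standard entity.status.state field of the deployments API:

curl -s "https://<cluster_url>/ml/v4/deployments/<your deployment ID>?version=2024-01-29&project_id=<your project ID>" \
-H "Authorization: Bearer $TOKEN" | jq -r '.entity.status.state'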

Task 4: Run a prompt using your custom model  

Now that you have stored and deployed your custom foundation model, you can start using it. You can use the Prompt Lab to prompt the model and generate responses or create a prompt programmatically. 

To run a prompt with the custom model using the API, run this code:  

curl -X POST "https://<cluster_url>/ml/v1/deployments/<your deployment ID>/text/generation?version=2024-01-29" \
-H "Authorization: Bearer " \
-H "content-type: application/json" \
--data '{
 "input": "Hello, what is your name",
 "parameters": {
    "max_new_tokens": 200,
    "min_new_tokens": 20
 }
}'
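
The generation response returns the model output under results. To print only the generated text, a minimal sketch assuming jq is installed:

curl -s -X POST "https://<cluster_url>/ml/v1/deployments/<your deployment ID>/text/generation?version=2024-01-29" \
-H "Authorization: Bearer $TOKEN" \
-H "content-type: application/json" \
--data '{ "input": "Hello, what is your name", "parameters": { "max_new_tokens": 200, "min_new_tokens": 20 } }' \
| jq -r '.results[0].generated_text'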

 Congratulations, your model is now fully deployed and ready to use! 

Summary 

By deploying a custom foundation model to watsonx.ai, you are able to work with a model that best fits your project and business needs. A custom model can be any model that is built with an architecture supported by watsonx.ai, which greatly expands your options and flexibility in terms of the models that best fit your specific use case.   

In this article, we covered the steps necessary to install and deploy a custom foundation model, using the example of a foundation model from Hugging Face. To learn about other deployment options, see the Deploying foundation model documentation. 


#watsonx.ai
#MachineLearning
#PromptLab
#GenerativeAI
