Deploying custom foundation models to watsonx.ai
Since its release, IBM watsonx.ai has been enabling businesses to train, validate, tune, and deploy AI models in a fraction of the time and with a fraction of the data. Starting with the watsonx.ai 1.1.4 software release, you can do even more with the enterprise studio: upload and deploy your own custom foundation models.
There are many reasons to import a custom foundation model, all driven by the unique needs of your organization. Ultimately, it boils down to a specific foundation model that is optimal for the task at hand but resides outside of watsonx.ai. For instance, you may need support for a language that is not currently available in the watsonx.ai foundation model library. Or, your organization may have invested resources to fine-tune a model to optimize it for your specific industry or business domain. This “bring your own model” approach provides greater flexibility in how you select and utilize the right foundation model to meet your specific generative AI use case.
In this article, we will describe how to install and deploy a custom foundation model in watsonx.ai (version 1.1.4 or later) using the example of a foundation model from Hugging Face.
What kind of model can I deploy?
A custom foundation model:
-
must be built with an architecture supported by watsonx.ai (see the table below)
-
can be the original base version or a fine-tuned version
-
must include a config.json file
The following table lists the supported architectures and, for each architecture, the supported quantization methods and parallel tensors:
You must check the architecture type of the model you plan to deploy. For example, in this article we will be using the Falcon-40b model obtained from Hugging Face. To check the architecture type for that model, follow these steps:
-
Open the model's page on the Hugging Face website and click Files and versions.
-
Open the config.json file and check the model_type value. For Falcon-40b, it shows that the model is built with the supported Falcon architecture.
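For reference, the architecture is identified by the model_type value in config.json. For Falcon-40b, the relevant entry looks like this (other fields omitted; the exact contents vary by model and revision):
{
  "model_type": "falcon",
  ...
}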
Deployment process overview
Whether you are deploying a model from Hugging Face or deploying a model from your own environment, the steps are similar. In this article, we will go through the steps for deploying a model from Hugging Face. Hugging Face is one of the largest repositories of open-source models and a place where you can find a wide variety of foundation models. If you select a model from Hugging Face that suits your needs, you can manually import and deploy it into watsonx.ai, provided it is based on one of the supported architecture types. You can then use the Prompt Lab or the API to inference the model and embed it into your generative AI applications, just as you would any other model that is available off-the-shelf with watsonx.ai.
The variations for deploying a model from your own environment are described in the product documentation. In this example, we will install a model obtained from Hugging Face.
The steps to deploy a custom model are divided between system administrator tasks and watsonx.ai user tasks, as follows:
Admin tasks: prepare the custom model and upload it to cluster storage (Step 1), and register the model with watsonx.ai (Step 2).
watsonx.ai user tasks: create the online deployment for the registered model (Step 3), and prompt the deployed model (Step 4).
Prerequisites
Before you start deploying your custom model to watsonx.ai, make sure you meet these prerequisites:
-
You must have Git Large File Storage (git lfs) installed on your cluster. To download git lfs, follow the steps described in the GitHub guide to installing git lfs.
-
You must get the digest for the fmaas-runtime-wisdom-ansible image from the case bundle ibm-watsonx-ai-ifm/inventory/watsonxaiifm/resources.yaml.
-
If your model is hosted on Hugging Face, you must have a Hugging Face account. To create a new account, go to the Hugging Face website. After creating a new account, generate a new Hugging Face token. To generate a token, see Hugging Face's guide to creating a token.
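You can quickly confirm the git lfs prerequisite from the machine where you will clone the model:
git lfs version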
How to deploy a custom foundation model to watsonx.ai
Step 1: Prepare to deploy a custom foundation model
In this first task, you will set up storage, clone the model, prepare the model, create the PVC, and run the job to upload the model to the PVC storage.
To set up storage and upload a model that is located on Hugging Face, follow these steps:
-
Find the name of the model on the Hugging Face website.
-
Set up basic environment variables:
export MODEL_NAME="<Hugging Face model name>"
export HF_TOKEN="<Hugging Face token>"
export MODEL_PATH="<Path to the directory where you want to download your model>"
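Then clone the model repository from Hugging Face. The exact URL depends on where the model is hosted; for a public model on huggingface.co, a typical clone looks like this (for gated or private models, Hugging Face also accepts the token embedded in the clone URL):
git lfs install
git clone https://huggingface.co/${MODEL_NAME}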
After cloning the model, check the name of the created folder.
-
Navigate to the folder that contains the model and then check the model size:
cd <folder that contains the cloned model>
git lfs ls-files -s
Example output:
root@wmlubntu1:~/falcon-40b# git lfs ls-files -s
1c4b989693 - pytorch_model-00001-of-00002.bin (10 GB)
11822397cd - pytorch_model-00002-of-00002.bin (4.5 GB)
Calculate the total size of the model and add a 100% buffer to the result. For example, if the model size is 14.5 GB, the size of the PVC that you create is 29 GB.
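One quick way to confirm the total on-disk size of the cloned model (once the LFS files have been pulled) is:
du -sh <folder that contains the cloned model>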
-
Create the pvc.yaml file that contains the details of a PVC to upload your model to:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: <pvc-name>
  namespace: <namespace-name>
spec:
  storageClassName: <storage-class>
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: <model size + buffer>Gi
For example:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-copy-pvc
  namespace: cpd-instance
spec:
  storageClassName: ocs-storagecluster-cephfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 29Gi
-
Create the PVC in your cluster:
oc create -f pvc.yaml
After creating the PVC, wait for two minutes and then verify that the PVC is bound:
oc get pvc <pvc-name> -n <namespace-name>
-
Encode your Hugging Face token to base64. Use echo -n so that a trailing newline is not included in the encoded value:
echo -n "${HF_TOKEN}" | base64
-
Create the secret.yaml file:
apiVersion: v1
kind: Secret
metadata:
  name: <secret-name>
  namespace: ${PROJECT_CPD_INST_OPERANDS}
type: Opaque
data:
  TOKEN: <base64-encoded Hugging Face token>
-
Create the secret in your cluster:
oc apply -f secret.yaml
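Optionally, verify that the token stored in the secret decodes back to the original value:
oc get secret <secret-name> -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath='{.data.TOKEN}' | base64 -d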
-
Get the digest for the fmaas-runtime-wisdom-ansible image:
oc get pods -n ${PROJECT_CPD_INST_OPERATORS} | grep ibm-cpd-watsonx-ai-ifm-operator
oc exec -it <ibm-cpd-watsonx-ai-ifm-operator-pod-from-last-cmd> -n ${PROJECT_CPD_INST_OPERATORS} -- /bin/bash
cat /opt/ansible/8.4.0/digests.yaml | grep -A3 fmaas_runtime_wisdom_ansible_image:
-
Create a job that downloads the model from Hugging Face and then (if needed) converts it to the safetensors and fast-tokenizer formats:
First, create the job.yaml file:
apiVersion: batch/v1
kind: Job
metadata:
  name: <job-name>
  namespace: ${PROJECT_CPD_INST_OPERANDS}
spec:
  template:
    spec:
      containers:
      - name: models-convertor
        image: cp.icr.io/cp/cpd/fmaas-runtime-wisdom-ansible@sha256:<digest>
        env:
        - name: MODEL_PATH
          value: <model-path from step 1>
        - name: MODEL_NAME
          value: <model-name from step 1>
        - name: HF_TOKEN
          valueFrom:
            secretKeyRef:
              name: <secret-name>
              key: TOKEN
        command: ["/bin/sh", "-c"]
        args:
          - |
            huggingface-cli login --token ${HF_TOKEN}
            huggingface-cli download ${MODEL_NAME} --local-dir ${MODEL_PATH} --cache-dir ${MODEL_PATH}
            text-generation-server convert-to-safetensors ${MODEL_PATH}
            text-generation-server convert-to-fast-tokenizer ${MODEL_PATH}
        volumeMounts:
        - mountPath: /model
          name: byom-model
      restartPolicy: Never
      volumes:
      - name: byom-model
        persistentVolumeClaim:
          claimName: <pvc-name>
For example:
apiVersion: batch/v1
kind: Job
metadata:
  name: model-copy-job
  namespace: cpd-instance
spec:
  template:
    spec:
      containers:
      - name: models-convertor
        image: cp.icr.io/cp/cpd/fmaas-runtime-wisdom-ansible@sha256:2cc673a6066cab686eed7b8e6998bc453a49be4ccb993674ab0ff81f099f807f
        env:
        - name: MODEL_PATH
          value: "/model"
        - name: MODEL_NAME
          value: "google/mt5-base"
        - name: HF_TOKEN
          valueFrom:
            secretKeyRef:
              name: huggingface-token
              key: TOKEN
        command: ["/bin/sh", "-c"]
        args:
          - |
            huggingface-cli login --token ${HF_TOKEN}
            huggingface-cli download ${MODEL_NAME} --local-dir ${MODEL_PATH} --cache-dir ${MODEL_PATH}
            text-generation-server convert-to-safetensors ${MODEL_PATH}
            text-generation-server convert-to-fast-tokenizer ${MODEL_PATH}
        volumeMounts:
        - mountPath: /model
          name: byom-model
      restartPolicy: Never
      volumes:
      - name: byom-model
        persistentVolumeClaim:
          claimName: model-copy-pvc
-
Create and run the job in your cluster:
oc apply -f job.yaml
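The download and conversion can take a while for large models. You can monitor the job and confirm that it completes successfully (names from the example above):
oc get job model-copy-job -n cpd-instance
oc logs job/model-copy-job -n cpd-instance -f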
Step 2: Register the model with watsonx.ai
Once you have uploaded your custom foundation model, you must register the model to make it available with watsonx.ai.
To register the model with watsonx.ai, follow these steps:
-
Log in to OpenShift and then edit the CR:
oc edit Watsonxaiifm
-
Add a model entry under spec.custom_foundation_models and enter the details for your model, as described in the product documentation.
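As a rough, hedged sketch only (field names such as location.pvc_name are assumptions based on the watsonx.ai documentation for this release, and the parameters mirror the example model shown in the next step), a registered entry might look like this:
spec:
  custom_foundation_models:
  - model_id: example_model_70b
    location:
      pvc_name: model-copy-pvc
    tags:
      - example_model
      - 70b
    parameters:
      - name: dtype
        default: float16
        options:
          - float16
          - bfloat16
      - name: max_new_tokens
        default: 2048
        min: 512
        max: 4096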
Step 3: Create the deployment for your custom model
After the model is stored and registered, you can add the model asset to a space, and then create an online deployment.
Note: The steps in this section describe how to create the deployment using API commands. To create the deployment from a deployment space, see Creating a deployment from space.
To create a deployment programmatically, you must first get the model asset ID and then create the deployment.
View available custom foundation models
Begin by running a command to list the deployable foundation models you have in watsonx.ai. To view the list of available custom foundation models by using the watsonx API, run this code:
curl --location 'https://<cluster_url>/ml/v4/custom_foundation_models' \
--header "Authorization: Bearer $TOKEN"
Example response:
{
"first": {
"href": "/ml/v4/custom_foundation_models?limit=100"
},
"limit": 100,
"resources": [
{
"model_id": "example_model_13b",
"parameters": [
{
"default": "float16",
"display_name": "Data Type",
"name": "dtype",
"options": [
"float16",
"bfloat16"
],
"type": "string"
},
{
"default": 256,
"display_name": "Max Batch Size",
"name": "max_batch_size",
"type": "number"
},
{
"default": 1024,
"display_name": "Max Concurrent Requests",
"name": "max_concurrent_requests",
"type": "number"
},
{
"default": 2048,
"display_name": "Max New Tokens",
"name": "max_new_tokens",
"type": "number"
},
{
"default": 2048,
"display_name": "Max Sequence Length",
"name": "max_sequence_length",
"type": "number"
}
]
},
{
"model_id": "example_model_70b",
"parameters": [
{
"default": "float16",
"display_name": "Data Type",
"name": "dtype",
"options": [
"float16",
"bfloat16"
],
"type": "string"
},
{
"default": 256,
"display_name": "Max Batch Size",
"max": 512,
"min": 16,
"name": "max_batch_size",
"type": "number"
},
{
"default": 64,
"display_name": "Max Concurrent Requests",
"max": 128,
"min": 0,
"name": "max_concurrent_requests",
"type": "number"
},
{
"default": 2048,
"display_name": "Max New Tokens",
"max": 4096,
"min": 512,
"name": "max_new_tokens",
"type": "number"
},
{
"default": 2048,
"display_name": "Max Sequence Length",
"max": 8192,
"min": 256,
"name": "max_sequence_length",
"type": "number"
}
],
"tags": [
"example_model",
"70b"
]
}
],
"total_count": 2
}
Note: If you access the model list programmatically, you can access all the parameters that you can set for the selected model. For models deployed through the UI, the parameters are available at the online deployment creation phase. See the description of parameters for custom foundation models.
The next step is to create the custom foundation model asset. You can create a model asset in two contexts: a project context or a space context.
-
If you create the model asset in a project context, you can import the model to your project and then promote it to a deployment space.
-
If you create the model asset in a space context, you can import the model and then deploy it online. A model deployed from a space is also accessible from the Prompt Lab in the project scope.
To create a model asset for your custom foundation model in space context, run this code:
curl -X POST "https://<cluster_url>/ml/v4/models?version=2024-01-29" \
-H "Authorization: Bearer $TOKEN" \
-H "content-type: application/json" \
--data '{
  "name": "<a meaningful name>",
  "space_id": "<your space id>",
  "foundation_model": {
    "model_id": "<your model id>"
  },
  "type": "custom_foundation_model_1.0",
  "software_spec": {
    "name": "watsonx-cfm-caikit-1.0"
  }
}'
Note: The model type must be custom_foundation_model_1.0. The software specification name must be watsonx-cfm-caikit-1.0. You cannot customize the software specification.
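The response to this call includes the ID of the new model asset, which you will need when you create the deployment. In a typical WML v4 response the asset ID is returned under metadata.id; assuming jq is available, you can pipe the response from the command above through jq -r '.metadata.id' to print only that ID.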
Create online deployment
When the custom foundation model asset has been created, you are ready to create the online deployment. Here is an example that creates a deployment with some of the parameters overridden:
curl -X POST "https://<cluster_url>/ml/v4/deployments?version=2024-01-29" \
-H "Authorization: Bearer $TOKEN" \
-H "content-type: application/json" \
--data '{
  "asset": {
    "id": "<your custom foundation model id>"   // WML custom foundation model asset
  },
  "online": {
    "parameters": {
      "serving_name": "test_custom_fm",
      "foundation_model": {
        "max_sequence_length": 4096
      }
    }
  },
  "hardware_spec": {   // Only one of "id" or "name" must be set
    "id": "<your custom hardware spec id>",
    "num_nodes": 1
  },
  "description": "Testing deployment using custom foundation model",
  "name": "custom_fm_deployment",
  "project_id": "<your project id>"   // Either "project_id" or "space_id"; only one is allowed
}'
View the status for a deployment
You can view the status for an existing deployment by running this command:
curl -X GET "https://<cluster_url>/ml/v4/deployments/<your deployment ID>?version=2024-01-29&project_id=<your project ID>" \
-H "Authorization: Bearer $TOKEN"
Note: The deployed_asset_type is returned as custom_foundation_model.
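The deployment can take several minutes to initialize while the model loads. In WML v4 responses the deployment state is reported under entity.status.state (for example, initializing and then ready), so assuming jq is available you can poll just the state:
curl -s -X GET "https://<cluster_url>/ml/v4/deployments/<your deployment ID>?version=2024-01-29&project_id=<your project ID>" \
-H "Authorization: Bearer $TOKEN" | jq -r '.entity.status.state'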
Step 4: Run a prompt using your custom model
Now that you have stored and deployed your custom foundation model, you can start using it. You can use the Prompt Lab to prompt the model and generate responses or create a prompt programmatically.
To prompt the custom model using the API, run this code:
curl -X POST "https://<cluster_url>/ml/v1/deployments/<your deployment ID>/text/generation?version=2024-01-29" \
-H "Authorization: Bearer " \
-H "content-type: application/json" \
--data '{
"input": "Hello, what is your name",
"parameters": {
"max_new_tokens": 200,
"min_new_tokens": 20
}
}'
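In current watsonx.ai responses the generated text is typically returned under results[0].generated_text, so assuming jq is available you can print just the completion by appending a filter to the command above:
curl -s -X POST "https://<cluster_url>/ml/v1/deployments/<your deployment ID>/text/generation?version=2024-01-29" \
-H "Authorization: Bearer $TOKEN" \
-H "content-type: application/json" \
--data '{"input": "Hello, what is your name", "parameters": {"max_new_tokens": 200, "min_new_tokens": 20}}' | jq -r '.results[0].generated_text'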
Congratulations, your model is now fully deployed and ready to use!
Summary
By deploying a custom foundation model to watsonx.ai, you are able to work with a model that best fits your project and business needs. A custom model can be any model that is built with an architecture supported by watsonx.ai, which greatly expands your options and flexibility in terms of the models that best fit your specific use case.
In this article, we covered the steps necessary to install and deploy a custom foundation model, using the example of a foundation model from Hugging Face. To learn about other deployment options, see the Deploying foundation model documentation.
#watsonx.ai
#MachineLearning
#PromptLab
#GenerativeAI