
From Training to Model Serving with Red Hat OpenShift Data Science - Part 2

By Alexei Karve posted Sun April 30, 2023 06:23 PM

MNIST handwritten digits, Fashion MNIST and CIFAR10 data sets

Introduction

In Part 1, we saw how to use CodeFlare/Ray to fine-tune a Hugging Face model on the IMDB dataset for sentiment analysis using the @ray.remote annotation, and deployed the model to ModelMesh. In this Part 2, we will use TorchRun to train on the mnist_784 and Fashion-MNIST datasets of 28x28 images with the CodeFlare DDPJobDefinition, using two methods for the AppWrapper: "KubeRay with MCAD" and "MCAD with pods". We will first build and use a custom image for the Ray cluster, create the Ray Cluster, train the model using three worker pods with GPUs belonging to the Ray Cluster, export the model in onnx format, copy multiple models from the Ray pod to the notebook pod, test the models locally within the notebook, copy them to the S3 bucket, deploy them to ModelMesh using the InferenceService, and finally test requests to the mnist model using HTTP REST and gRPC. We will also see how to convert and serve the model in OpenVINO IR format. Next, we will use the method of AppWrapper with pods, without a Ray Cluster. We will build a custom image within the cluster and push it to OpenShift's internal image registry. We will use this image from the local registry for training with Fashion-MNIST, deploy the model, and submit remote REST and gRPC requests. Finally, we will work with the CIFAR10 3x32x32 dataset.

Ray is designed to be a general-purpose library that can run a broad array of distributed compute workloads with high performance. If your application is written in Python, you can scale it with Ray; no other infrastructure is required. KubeRay is an open-source toolkit to run Ray applications on Kubernetes, and it provides several tools to simplify managing Ray clusters there. The KubeRay operator converts your Ray configuration into a Ray cluster consisting of one or more Ray nodes; each Ray node occupies its own pod. KubeRay provides three custom resource definitions: RayCluster, RayJob, and RayService. The Multi-Cluster-App-Dispatcher (MCAD) is a Kubernetes controller providing mechanisms for applications to manage batch jobs in a single Kubernetes cluster or a multi-Kubernetes-cluster environment. MCAD uses AppWrappers, represented as a custom resource (CR), to wrap any Kubernetes object the user provides. Wrapping objects means appending the user's yaml definitions at the “.Spec.GenericItems” level inside the AppWrapper. User objects within an AppWrapper are queued until the aggregated resources are available in one of the Kubernetes clusters. When there are not enough aggregate resources available in the cluster to deploy the Kubernetes objects wrapped by the AppWrapper job, the MCAD Controller will queue the entire job (no partial deployments will be created). This is useful for batch workloads that require all resources to be deployed to make progress; for example, some distributed AI deep learning jobs define job parameters requiring all learners to be deployed, process, and communicate in a synchronous manner.

Using KubeRay with the Multi-Cluster-App-Dispatcher (MCAD) Kubernetes controller helps avoid situations where your ML workload is blocked with pods stuck in a pending state. Specifically, MCAD queues each of your Ray workloads (the resources required by the AppWrapper) until the resource availability requirements are met. With MCAD, your Ray cluster's pods will only be created once there is a guarantee that all the pods can be scheduled. To schedule a Ray cluster, we need to place a head pod and worker pods. To accommodate the requirements of these pods, the AppWrapper reserves aggregate resources.
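
For illustration, a minimal hand-written AppWrapper that wraps a single Kubernetes Job is sketched below; the field names follow the MCAD v1beta1 CRD, while the wrapped object and resource values are placeholders rather than anything used later in this blog. MCAD will only create the wrapped Job once the requested resources can all be satisfied.

apiVersion: mcad.ibm.com/v1beta1
kind: AppWrapper
metadata:
  name: hello-appwrapper
  namespace: batch-mnist
spec:
  resources:
    GenericItems:
      - replicas: 1
        generictemplate:            # any user-provided Kubernetes object can be wrapped here
          apiVersion: batch/v1
          kind: Job
          metadata:
            name: hello
            namespace: batch-mnist
          spec:
            template:
              spec:
                restartPolicy: Never
                containers:
                  - name: hello
                    image: busybox
                    command: ["echo", "hello from MCAD"]
                    resources:
                      requests:
                        cpu: "1"
                        memory: 1G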

Setting up the Notebook

We use a cluster with OpenShift Server Version 4.10.53 in this blog post. We assume that you have installed the GPU Operator, Red Hat OpenShift Data Science, and the CodeFlare Stack, and created a Data Science Project as described in Part 1. We can continue to use the same project as in Part 1. Using the git menu, clone https://github.com/thinkahead/rhods-notebooks.git and navigate to rhods-notebooks/batch-job.

Custom image for Ray Cluster

As of this writing, the default image used by the codeflare-sdk Cluster is ghcr.io/foundation-model-stack/base:ray2.1.0-py38-gpu-pytorch1.12.0cu116-20221213-193103. We used this default image in Part 1. Listing the available tags with skopeo, using the command shown below, returns only images built with cu116; there isn't any image with PyTorch cu117.

brew install skopeo
#skopeo list-tags docker://ghcr.io/foundation-model-stack/base
skopeo list-tags docker://ghcr.io/foundation-model-stack/base | jq '.Tags[] | select( index("ray") )'

There are two ways to install the libraries required for our job: one is to update the requirements.txt, and the other is to create a custom image for Ray.

  1. Updating the requirements.txt means that the required libraries are installed every time the Ray Cluster is created, which makes the start of the training process slow.

    requirements.txt

    --extra-index-url https://download.pytorch.org/whl/nightly/cu117
    torch
    torchvision
    pytorch_lightning==1.5.10
    ray_lightning
    torchmetrics==0.9.1
    onnx
    
  2. The alternative is to build a custom image, so that the requirements.txt does not need to be passed to scheduler_args.

    Dockerfile.cu117

    FROM ghcr.io/foundation-model-stack/base:ray2.1.0-py38-gpu-pytorch1.12.0cu116-20221213-193103
    RUN pip uninstall torch torchvision -y;pip install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cu117
    RUN pip install pytorch_lightning==1.5.10 ray_lightning torchmetrics==0.9.1 onnx
    RUN chmod 777 /home/ray
    

    Build and push the image

    podman build --format docker -f Dockerfile.cu117 -t quay.io/thinkahead/base:ray2.1.0-py38-gpu-pytorch1.12.0cu117-20230419-1 . --tls-verify=false
    podman push quay.io/thinkahead/base:ray2.1.0-py38-gpu-pytorch1.12.0cu117-20230419-1
    

    The job definition without the scheduler_args then looks as follows. We create the Ray cluster as we did in Part 1 by running the relevant sections in the notebook.

    jobdef = DDPJobDefinition(
        name="mnisttest",
        script="mnist.py",
        #scheduler_args={"requirements": "requirements.txt"}
    )
    job = jobdef.submit(cluster)
    

Running the batch_mnist_ray sample - The batch_mnist_ray.ipynb has been modified for training on the Ray cluster in the batch-mnist namespace with 3 worker nodes, each with 1 GPU, and the ClusterConfiguration shows the image built above being used.

You can create the new namespace.

oc new-project batch-mnist

We will continue using this batch-mnist namespace for training in the rest of the examples, but will continue to use the huggingface namespace for the notebook and model serving. Now start running the notebook. Update the token and server, and create the Ray Cluster; a sketch of the relevant cells is shown below.
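
A sketch of the cluster-creation cells, assuming the codeflare-sdk API used in Part 1 (the token, server, and resource values below are placeholders; check your notebook for the exact parameter names in your sdk version):

from codeflare_sdk.cluster.auth import TokenAuthentication
from codeflare_sdk.cluster.cluster import Cluster, ClusterConfiguration

# Authenticate to the OpenShift cluster (placeholder token and server)
auth = TokenAuthentication(token="sha256~XXXX", server="https://api.mini2.mydomain.com:6443", skip_tls=True)
auth.login()

# Three workers, each with one GPU, using the custom cu117 image built above
cluster = Cluster(ClusterConfiguration(
    name="mnisttest",
    namespace="batch-mnist",
    min_worker=3,
    max_worker=3,
    min_cpus=8,
    max_cpus=8,
    min_memory=16,
    max_memory=16,
    gpu=1,
    instascale=False,
    image="quay.io/thinkahead/base:ray2.1.0-py38-gpu-pytorch1.12.0cu117-20230419-1",
))
cluster.up()
cluster.status()  # check until the RayCluster reports ready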

From your laptop, watch the AppWrapper create the RayCluster with the Deployment and the corresponding ReplicaSet and pods. Also create the https route for the Ray Dashboard; you will need to do this every time you recreate the cluster.

watch oc get AppWrapper,RayCluster,svc,routes,deployment,rs,pods -n batch-mnist
oc project batch-mnist
oc get routes -n batch-mnist ray-dashboard-mnisttest -o yaml | sed "s/ray-dashboard-mnisttest/route-ray-dashboard-mnisttest/g" | oc apply -f -
oc -n batch-mnist patch route.route.openshift.io/route-ray-dashboard-mnisttest -p '{"spec":{"tls":{"termination":"edge"}}}'

Output:

NAME                                AGE
appwrapper.mcad.ibm.com/mnisttest   5m49s

NAME                          DESIRED WORKERS   AVAILABLE WORKERS   STATUS   AGE
raycluster.ray.io/mnisttest   3                 3                   ready    5m48s

NAME                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                AGE
service/mnisttest-head-svc   ClusterIP   172.30.112.20   <none>        8265/TCP,10001/TCP,8080/TCP,6379/TCP   5m48s

NAME                                               HOST/PORT                                                     PATH   SERVICES             PORT        TERMINATION   WILDCARD
route.route.openshift.io/ray-dashboard-mnisttest   ray-dashboard-mnisttest-batch-mnist.apps.mini2.mydomain.com          mnisttest-head-svc   dashboard                 None

NAME                                               READY   STATUS    RESTARTS   AGE
pod/mnisttest-head-2dlzz                           1/1     Running   0          5m49s
pod/mnisttest-worker-small-group-mnisttest-4fnq4   1/1     Running   0          5m49s
pod/mnisttest-worker-small-group-mnisttest-5dkng   1/1     Running   0          5m49s
pod/mnisttest-worker-small-group-mnisttest-ksw9r   1/1     Running   0          5m49s

The source mnist.py shows PyTorch convolutions that operate on (batch, channel, height, width). The MNIST images are black and white and therefore don't need three different color channels to represent the final color; instead they use one channel. The mnist.py has been modified as follows to convert and export the model into onnx format after the training is completed.

# Train the model
trainer.fit(model)
model.eval()

#torch.save(model, '/tmp/model.pt')

dummy_input = torch.randn(1, 1, 28, 28)
input_names = [ "input_0" ]
output_names = [ "output_0" ]
dynamic_axes={'input_0' : {0 : 'batch_size'},'output_0' : {0 : 'batch_size'}}

#model.to_onnx('/tmp/mnist4.onnx', dummy_input, input_names=input_names, output_names=output_names, dynamic_axes=dynamic_axes)
torch.onnx.export(model, dummy_input, '/tmp/mnist3.onnx', verbose=True, input_names=input_names, output_names=output_names, dynamic_axes=dynamic_axes)
#model.to_onnx('/tmp/mnist2.onnx', dummy_input, input_names=input_names, output_names=output_names)
#torch.onnx.export(model, dummy_input, '/tmp/mnist1.onnx', verbose=True, input_names=input_names, output_names=output_names)

Four exports, mnist1.onnx, mnist2.onnx, mnist3.onnx and mnist4.onnx, can be seen. The default method uses torch.onnx.export; PyTorch Lightning has its own model.to_onnx method for exporting the model. We export with and without dynamic_axes; the dynamic_axes entry for dimension 0 allows providing a batch of inputs for inferencing. Because we set self.example_input_array, the dummy_input does not need to be provided to model.to_onnx.

self.example_input_array = torch.randn(1, 1, 28, 28)

You can increase the max_epochs in the mnist.py if you want higher accuracy. We will look at early stopping in a later example; a brief sketch follows the Trainer snippet below.

trainer = Trainer(
    accelerator="auto",
    # devices=1 if torch.cuda.is_available() else None,  # limiting for iPython runs
    max_epochs=20,
    callbacks=[TQDMProgressBar(refresh_rate=20)],
    num_nodes=int(os.environ.get("GROUP_WORLD_SIZE", 1)),
    devices=int(os.environ.get("LOCAL_WORLD_SIZE", 1)),
    strategy="ddp",
) 
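
As a preview, a hedged sketch of adding PyTorch Lightning's EarlyStopping callback to the same Trainer (this is not part of the current mnist.py):

from pytorch_lightning.callbacks.early_stopping import EarlyStopping

trainer = Trainer(
    accelerator="auto",
    max_epochs=20,
    callbacks=[TQDMProgressBar(refresh_rate=20),
               # stop when val_loss has not improved for 3 consecutive validation epochs
               EarlyStopping(monitor="val_loss", mode="min", patience=3)],
    num_nodes=int(os.environ.get("GROUP_WORLD_SIZE", 1)),
    devices=int(os.environ.get("LOCAL_WORLD_SIZE", 1)),
    strategy="ddp",
)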

While the training is in progress, we can check the dashboard using the https route route-ray-dashboard-mnisttest we created above (as shown in Part 1), or check the output of nvidia-smi on the command line of any Ray worker pod to watch the GPU usage.

oc exec -it pod/mnisttest-worker-small-group-mnisttest-cmldq -n batch-mnist -- bash

Defaulted container "machine-learning" out of: machine-learning, init-myservice (init), wait-gcs-ready (init)
(base) 1000990000@mnisttest-worker-small-group-mnisttest-cmldq:~/workspace$ nvidia-smi
Thu Apr 20 06:26:52 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13    Driver Version: 525.60.13    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A2           On   | 00000000:41:00.0 Off |                    0 |
|  0%   46C    P0    23W /  60W |    970MiB / 15356MiB |     28%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
(base) 1000990000@mnisttest-worker-small-group-mnisttest-cmldq:~/workspace$ watch nvidia-smi
(base) 1000990000@mnisttest-worker-small-group-mnisttest-cmldq:~/workspace$ exit
Ray Dashboard when training mnist handwritten dataset

The job.status() returns PENDING or RUNNING and, when done, SUCCEEDED. If there are any problems, it returns FAILED. The next call, job.logs(), is used to fetch the logs.
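
A simple polling loop around these two calls can look like this (a sketch; the exact rendering of the status object depends on your codeflare-sdk/TorchX version):

import time

status = job.status()
while "PENDING" in str(status) or "RUNNING" in str(status):
    time.sleep(30)
    status = job.status()
print(status)       # SUCCEEDED or FAILED
print(job.logs())   # fetch the logs of the completed job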

When the job status shows SUCCEEDED, copy the model from the Ray worker pod to the notebook server pod using the “oc cp” command. You will need to run the oc login command in the Terminal window of Jupyter Hub (File->New->Terminal).

oc get all -n batch-mnist # Get one worker pod name
cd rhods-notebooks/batch-job/
oc -n batch-mnist cp mnisttest-worker-small-group-mnisttest-cmldq:/tmp/mnist3.onnx ./mnist3.onnx

Run the section to copy the model to the S3 bucket using boto3. I created the models 1 through 4 for each of the two export methods (standard torch.onnx.export and PyTorch Lightning model.to_onnx), with and without the dynamic_axes, and copied them by running the section multiple times while changing the model_name. The dynamic_axes are for providing a batch input. Once we are done with the training and copying, bring down the cluster with:

cluster.down()

Going back to the OpenShift Data Science projects, we select our project and configure the Model Mesh Server (under Models and Model Server) with one replica and Medium size. You can also select the Custom size and set the resource limits as desired. Set the check mark for the Model route. This creates the OpenVINO™ Model Server (ovms) that provides an inference service via gRPC or REST API. It is implemented in C++ to take full advantage of high-performance Intel Xeon CPUs or AI accelerators for inference over a network interface. OpenVINO Intermediate Representation (OpenVINO IR, .bin & .xml) and Open Neural Network Exchange (ONNX) format models are supported. The OpenVINO™ toolkit also supports Intel® Integrated Graphics.

If you want to change the resources for your Model Server, you can also edit the ServingRuntime; this will restart the pod(s).

oc edit ServingRuntime -n huggingface

Create the InferenceService by adding the exported models using the UI or using yaml. Deployments for mnist1.onnx and mnist4.onnx are shown below:

Deploy mnist1.onnx
Deploy mnist4.onnx

The picture below shows the output of deployed models if you deploy all four models:

Deployed four mnist models

The yaml for the InferenceService looks as follows for mnist3 model:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  annotations:
    openshift.io/display-name: mnist3
    serving.kserve.io/deploymentMode: ModelMesh
  labels:
    name: mnist3
    opendatahub.io/dashboard: "true"
  name: mnist3
  namespace: huggingface
spec:
  predictor:
    model:
      modelFormat:
        name: onnx
        version: "1"
      runtime: model-server-huggingface
      storage:
        key: aws-connection-my-object-store
        path: mnist3.onnx

We can check the ServingRuntime and InferenceServices (isvc):

oc get servingruntime -n huggingface
oc get inferenceservices -n huggingface

Output:

oc get servingruntime -n huggingface

NAME                       DISABLED   MODELTYPE     CONTAINERS   AGE
model-server-huggingface              openvino_ir   ovms         3d22h

oc get inferenceservices -n huggingface

NAME     URL                                         READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION   AGE
mnist1   grpc://modelmesh-serving.huggingface:8033   True                                                                  6m29s
mnist2   grpc://modelmesh-serving.huggingface:8033   True                                                                  20h
mnist3   grpc://modelmesh-serving.huggingface:8033   True                                                                  4h35m
mnist4   grpc://modelmesh-serving.huggingface:8033   True                                                                  3h42m

We fetch the mnist_784 dataset for the 28x28 images in the notebook pod. This MNIST database of handwritten digits consists of 784 features. The raw data is available at http://yann.lecun.com/exdb/mnist/ and consists of 70,000 examples.

from sklearn.datasets import fetch_openml
X, y = fetch_openml('mnist_784', return_X_y=True, parser='auto')

We can use the downloaded onnx models to test requests locally. Batch input will only work with the models that were exported with dynamic_axes for dimension 0.

Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession.

session = onnxruntime.InferenceSession(model_file_name, providers=['CUDAExecutionProvider']) # providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
Local requests to onnx model
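
A minimal sketch of this local test, assuming X was loaded with fetch_openml as above and mnist3.onnx was copied into the working directory; we use the CPU provider here so it also runs without a GPU:

import numpy as np
import onnxruntime

session = onnxruntime.InferenceSession("mnist3.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name   # "input_0"

# Single sample: reshape the 784 pixel features to (1, 1, 28, 28)
single = np.array(X.iloc[0].values.tolist(), dtype=np.float32).reshape(1, 1, 28, 28)
print("predicted digit:", np.argmax(session.run(None, {input_name: single})[0]))

# A batch of 5 only works for the exports with dynamic_axes on dimension 0
batch = np.array(X[0:5].values.tolist(), dtype=np.float32).reshape(5, 1, 28, 28)
print("predicted digits:", np.argmax(session.run(None, {input_name: batch})[0], axis=1))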

The output from submitting a remote HTTP REST request to the model for batch and single request from the notebook is shown below.

Submit single and batch requests to mnist onnx modelserver
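
A hedged sketch of such a request using the KServe V2 REST protocol; the route host is a placeholder for the external route created by the Model route checkbox:

import numpy as np
import requests

infer_url = "https://<modelmesh-serving-route>/v2/models/mnist3/infer"   # placeholder host
sample = np.array(X.iloc[0].values.tolist(), dtype=np.float32).reshape(1, 1, 28, 28)
payload = {"inputs": [{"name": "input_0",
                       "shape": list(sample.shape),
                       "datatype": "FP32",
                       "data": sample.flatten().tolist()}]}
response = requests.post(infer_url, json=payload, verify=False)
output = response.json()["outputs"][0]
scores = np.array(output["data"]).reshape(output["shape"])   # (1, 10)
print("predicted digit:", int(np.argmax(scores, axis=1)[0]))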

Next, we use the V2 Inference Protocol over the gRPC interface. The ModelInferRequest.InferInputTensor payload, with the data as InferTensorContents, is used for the prediction. The V2 Inference Protocol is an industry-wide effort to provide a standardized protocol to communicate with different inference servers (e.g. MLServer, Triton, etc.) and orchestrating frameworks (e.g. Seldon Core, KServe, etc.). For gRPC, running python -m grpc_tools.protoc against the protocol's proto file generates kfs_inference_v2_pb2.py and kfs_inference_v2_pb2_grpc.py. We use the GRPCInferenceServiceStub with an insecure channel to submit a ModelInfer request. The content for the request is “fp32_contents” with shape (batch_size,1,28,28), and the response is unpacked to an FP32 array of shape (batch_size,10). The output from submitting a gRPC request to the model, for single and batch requests from the notebook, is shown below.

gRPC request to model server with single sample
gRPC request to model server with batch of samples
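
Putting this together, a sketch of the gRPC client is shown below; the module and message names come from the files generated by grpc_tools.protoc, so adjust them to match your generated code:

import grpc
import numpy as np
import kfs_inference_v2_pb2 as pb
import kfs_inference_v2_pb2_grpc as pb_grpc

channel = grpc.insecure_channel("modelmesh-serving.huggingface:8033")
stub = pb_grpc.GRPCInferenceServiceStub(channel)

sample = np.array(X.iloc[0].values.tolist(), dtype=np.float32).reshape(1, 1, 28, 28)
tensor = pb.ModelInferRequest.InferInputTensor(
    name="input_0", shape=sample.shape, datatype="FP32",
    contents=pb.InferTensorContents(fp32_contents=sample.flatten().tolist()))
request = pb.ModelInferRequest(model_name="mnist3", inputs=[tensor])
response = stub.ModelInfer(request)

# ModelMesh returns the FP32 results as bytes in raw_output_contents
scores = np.frombuffer(response.raw_output_contents[0], dtype=np.float32).reshape(-1, 10)
print("predicted digit:", int(np.argmax(scores, axis=1)[0]))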

You can run this on your Macbook/laptop. Convert the model from onnx to OpenVINO IR:

pip install openvino-dev
mo --input_model mnist3.onnx

It creates the files:

mnist3.bin  mnist3.mapping  mnist3.xml

Import the mnist_784 dataset

python
>>> from sklearn.datasets import fetch_openml
>>> import numpy as np
>>> import json
>>> import requests
>>> X, y = fetch_openml('mnist_784', return_X_y=True, parser='auto')

Read and compile the openvino model

>>> from openvino.runtime import Core
>>> ie = Core()
>>> model = ie.read_model(model="mnist3.xml")
>>> compiled_model = ie.compile_model(model=model, device_name="CPU")

Output and input layer

>>> output_layer = compiled_model.output(0)
>>> output_layer
<ConstOutput: names[output_0] shape[?,10] type: f32>
>>> compiled_model.input(0)
<ConstOutput: names[input_0] shape[?,1,28,28] type: f32>

Single input

>>> result_infer = compiled_model(np.array(X.iloc[0].values.tolist()).reshape(1,1,28,28))[output_layer]
>>> result_infer
array([[-1013.5224 , -1070.5754 ,  -999.4851 ,  -371.72485, -1805.959  ,
            0.     , -1309.7092 , -1083.8599 , -1093.3772 ,  -867.76056]],
      dtype=float32)
>>> result_index = np.argmax(result_infer)
>>> result_index
5

Batch input

>>> result_infer = compiled_model(np.array(X[0:5].values.tolist()).reshape(5,1,28,28))[output_layer]
>>> result_index = np.argmax(result_infer,axis=1)
>>> result_index
array([5, 0, 4, 1, 9])

Upload the bin and xml files to S3 from the Notebook - Copy the mnist3.bin and mnist3.xml to a mnist3 directory, then run the following section to upload the files to the S3 endpoint.

import os
import boto3

model_name = 'mnist3'  # base name of the converted model and its folder
key_id = os.environ.get('AWS_ACCESS_KEY_ID')
secret_key = os.environ.get('AWS_SECRET_ACCESS_KEY')
endpoint_url = os.environ.get('AWS_S3_ENDPOINT')
s3_client = boto3.client('s3', aws_access_key_id=key_id, aws_secret_access_key=secret_key, endpoint_url=endpoint_url, verify=False)
bucket = s3_client.list_buckets()['Buckets'][0]  # the workbench data connection bucket
s3_client.upload_file(model_name+"/"+model_name+".bin", bucket['Name'], model_name+"/"+model_name+".bin")
s3_client.upload_file(model_name+"/"+model_name+".xml", bucket['Name'], model_name+"/"+model_name+".xml")
[item.get("Key") for item in s3_client.list_objects_v2(Bucket=bucket['Name']).get("Contents")]

Output:

['mnist3/mnist3.bin',
 'mnist3/mnist3.xml']

Deploy the model to ModelMesh - Select the Folder where the bin and xml files for the model were uploaded

Deploy OpenVINO model

Test the REST HTTP Request - The response with the new model name “mnist3openvino” is the same as with the onnx model.

REST request to OpenVINO model

Fashion-MNIST

We can use the Fashion-MNIST dataset as a drop-in replacement for the MNIST handwritten digits dataset. MNIST is quite trivial with neural networks, and we used a fully connected neural network to classify handwritten digits from the MNIST dataset. Fashion-MNIST is a set of 28x28 greyscale images of clothes. It is more complex than MNIST and a better representation of datasets used in the real world. Although images from the same classes share the same fundamental features, those features can be found at different locations and in different sizes. This requires a technique that accounts for the local relationship of pixels in an image, which is where a Convolutional Neural Network (CNN) becomes useful. We can run exactly the same steps as above using the batch_mnist_fashion_ray.ipynb that uses mnist_fashion.py. The model uses a convolutional layer that applies sliding convolutional filters to the 2-D input, followed by a max-pooling layer that downsamples the input representation, keeping the most active pixels from the previous layer. Linear and dropout layers avoid overfitting and produce the 10 outputs. A negative log likelihood loss function and an Adam optimizer with a learning rate of 2e-4 are set up.
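
A minimal sketch of such a CNN, written as a plain PyTorch module for clarity (the actual layer sizes and the LightningModule wrapper live in mnist_fashion.py):

import torch.nn as nn
import torch.nn.functional as F

class FashionCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.convlayer1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),  # sliding convolutional filters
            nn.ReLU(),
            nn.MaxPool2d(2, 2))                          # keep the most active pixels
        self.convlayer2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3),
            nn.ReLU(),
            nn.MaxPool2d(2))
        self.fc1 = nn.Linear(64 * 6 * 6, 600)
        self.drop = nn.Dropout(0.25)                     # guard against overfitting
        self.fc2 = nn.Linear(600, 10)                    # 10 clothing classes

    def forward(self, x):
        x = self.convlayer2(self.convlayer1(x))
        x = self.drop(F.relu(self.fc1(x.flatten(1))))
        # log_softmax pairs with F.nll_loss; train with Adam(lr=2e-4)
        return F.log_softmax(self.fc2(x), dim=1)

The GPU usage during training is shown below: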

Ray Dashboard GPU usage when training with Fashion Mnist dataset

The actual and expected outputs for the model are pictured below:

Local requests to onnx model for Fashion mnist dataset

AppWrapper using pods for MCAD without Ray Cluster

We can run the Job with the AppWrapper without the Ray Cluster, as seen in batch_mnist_mcad.ipynb. The AppWrapper directly creates the pods and runs torchrun in the command with the correct arguments: rdzv_backend c10d, rdzv_endpoint, rdzv_id, nnodes, nproc_per_node, node_rank and the selected python file (an illustrative command is sketched below). We want to use our own python source code in the image, so we first need to build an image.
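
For node rank 0 of a 3-node, 1-GPU-per-node job, the generated command looks roughly like the following; the rendezvous host, port and rank are filled in per pod by the scheduler, so the values here are purely illustrative:

torchrun --rdzv_backend c10d --rdzv_endpoint $RDZV_HOST:29500 --rdzv_id mnistjob \
         --nnodes 3 --nproc_per_node 1 --node_rank 0 mnist_fashion.py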

The following command in the notebook starts a new image build:

!oc -n huggingface new-build --name custom-mnist-image --code https://github.com/thinkahead/rhods-notebooks --context-dir batch-job/custom-image
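
Judging from the build log shown later in this section (FROM the foundation-model-stack base image, then COPY mnist_fashion.py), the Dockerfile in batch-job/custom-image looks roughly like the sketch below; the pip dependencies are an assumption, not the exact contents:

FROM ghcr.io/foundation-model-stack/base:ray2.1.0-py38-gpu-pytorch1.12.0cu116-20221213-193103
COPY mnist_fashion.py mnist_fashion.py
# assumed training/export/upload dependencies; the real Dockerfile has 10 steps
RUN pip install pytorch_lightning==1.5.10 torchmetrics==0.9.1 onnx boto3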

If you get the following error, delete the buildconfig and run the build again.

error: buildconfigs.build.openshift.io "custom-mnist-image" already exists

When the new build is started, we can check the buildconfig and the build resources:

oc get bc,build,is -n huggingface -l build=custom-mnist-image

NAME                                                TYPE     FROM   LATEST
buildconfig.build.openshift.io/custom-mnist-image   Docker   Git    1

NAME                                            TYPE     FROM          STATUS    STARTED         DURATION
build.build.openshift.io/custom-mnist-image-1   Docker   Git@96d0029   Running   6 minutes ago

We can check the pod where the image is being built

oc get pod -n huggingface -l openshift.io/build.name=custom-mnist-image-1

NAME                         READY   STATUS    RESTARTS   AGE
custom-mnist-image-1-build   1/1     Running   0          8m25s

When the image is built and pushed to the internal image registry, the status of build shows Complete:

oc get bc,build,is,pods -n huggingface
 
NAME                                                TYPE     FROM   LATEST
buildconfig.build.openshift.io/custom-mnist-image   Docker   Git    1


NAME                                            TYPE     FROM          STATUS     STARTED          DURATION
build.build.openshift.io/custom-mnist-image-1   Docker   Git@96d0029   Complete   11 minutes ago   9m28s



NAME                                                IMAGE REPOSITORY                                                                                 TAGS                              UPDATED
imagestream.image.openshift.io/base                 default-route-openshift-image-registry.apps.mini2.mydomain.com/huggingface/base                 pytorch-latest-nightly-20230426   18 hours ago
imagestream.image.openshift.io/custom-mnist-image   default-route-openshift-image-registry.apps.mini2.mydomain.com/huggingface/custom-mnist-image   latest                            2 minutes ago


NAME                                                              READY   STATUS      RESTARTS   AGE
pod/custom-mnist-image-1-build                                    0/1     Completed   0          11m

We can wait for the build to be completed in the notebook with:

!oc wait --for=condition=complete build.build.openshift.io/custom-mnist-image-1 -n huggingface --timeout=600s

The logs for the build pod show that the base image is downloaded along with the Dockerfile and sources as specified with “--code https://github.com/thinkahead/rhods-notebooks --context-dir batch-job/custom-image”. You can update the code and context-dir parameters to point to your source repository location.

oc logs custom-mnist-image-1-build -n huggingface

Output:

time="2023-04-27T16:58:57Z" level=info msg="Not using native diff for overlay, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled"
I0427 16:58:57.461980       1 defaults.go:102] Defaulting to storage driver "overlay" with options [mountopt=metacopy=on].
Caching blobs under "/var/cache/blobs".

Pulling image ghcr.io/foundation-model-stack/base@sha256:048c0ec287a7e849f22a4847dc925ae628ff7a2771dc509c267c27b801e1f379 ...
Trying to pull ghcr.io/foundation-model-stack/base@sha256:048c0ec287a7e849f22a4847dc925ae628ff7a2771dc509c267c27b801e1f379... 
…
STEP 1/10: FROM ghcr.io/foundation-model-stack/base@sha256:048c0ec287a7e849f22a4847dc925ae628ff7a2771dc509c267c27b801e1f379
STEP 2/10: COPY mnist_fashion.py mnist_fashion.py
…
COMMIT temp.builder.openshift.io/huggingface/custom-mnist-image-1:5bc33886
--> b3a72e98c0b
Successfully tagged temp.builder.openshift.io/huggingface/custom-mnist-image-1:5bc33886
b3a72e98c0b61865b99a7eea3cd9b79fd02ce985ccc7b3ecdc1efebbb5e7ed6f 

Pushing image image-registry.openshift-image-registry.svc:5000/huggingface/custom-mnist-image:latest ...
Getting image source signatures 
…
Writing manifest to image destination
Storing signatures
Successfully pushed image-registry.openshift-image-registry.svc:5000/huggingface/custom-mnist-image@sha256:1d49659286100e540f44d86cfddc9b407e57552d3344f2e12390d48b15d9c781
Push successful
Create custom image for use by pods created using MCAD

After the image is created, we can delete the buildconfig, build and the relevant pod with:

oc delete bc -n huggingface --selector build=custom-mnist-image

The image stream will contain the image for use by the AppWrapper. Now we can submit the job with the DDPJobDefinition. We also pass the S3 secrets so that mnist_fashion.py can export and write the onnx model directly to the OUTPUT_PATH in the specified S3 bucket.

import os
jobdef = DDPJobDefinition(
    name="mnistjob",
    script="mnist_fashion.py",
    scheduler_args={"namespace": "huggingface"},
    j="3x1",
    gpu=1,
    cpu=1,
    memMB=8000,
    env={'AWS_ACCESS_KEY_ID':os.environ.get('AWS_ACCESS_KEY_ID'),
         'AWS_SECRET_ACCESS_KEY':os.environ.get('AWS_SECRET_ACCESS_KEY'),
         'AWS_S3_ENDPOINT':os.environ.get('AWS_S3_ENDPOINT'),
         'OUTPUT_PATH':'saved/mymodel.onnx'},
    image="image-registry.openshift-image-registry.svc:5000/huggingface/custom-mnist-image:latest"
    #image="quay.io/michaelclifford/mnist-test:latest"
)
job = jobdef.submit()

The above definition creates 3 pods, each with one GPU, and runs the training using custom-image/mnist_fashion.py, which also copies the model to the S3 bucket.

if trainer.global_rank==0:
    modelfile='/tmp/mnist3.onnx'
    print("Copying",modelfile)
    import os
    import boto3
    from boto3 import session

    key_id = os.environ.get('AWS_ACCESS_KEY_ID')
    secret_key = os.environ.get('AWS_SECRET_ACCESS_KEY')
    endpoint_url = os.environ.get('AWS_S3_ENDPOINT')
    uploaded_file_name = os.environ.get('OUTPUT_PATH',os.uname()[1])
    session = boto3.session.Session(aws_access_key_id=key_id, aws_secret_access_key=secret_key)
    s3_client = boto3.client('s3', aws_access_key_id=key_id, aws_secret_access_key=secret_key,endpoint_url=endpoint_url,verify=False)
    buckets=s3_client.list_buckets()
    for bucket in buckets['Buckets']: print(bucket['Name'])
    # bucket still holds the last bucket printed above; upload there
    s3_client.upload_file(modelfile, bucket['Name'], uploaded_file_name)
    print('uploaded_file_name',uploaded_file_name)
    print([item.get("Key") for item in s3_client.list_objects_v2(Bucket=bucket['Name']).get("Contents")])

The pod with global rank 0 shows the model being copied:

oc logs mnistjob-nm65qm0x66z6r-0 -f -n huggingface

Output:

…
[0]:Epoch 4: 100%|██████████| 79/79 [00:11<00:00,  7.18it/s, loss=0.246, v_num=0, val_loss=0.252, val_acc=0.913]
[0]:GLOBAL_RANK: is  0
…
[0]:Copying /tmp/mnist3.onnx
[0]:uploaded_file_name saved/mymodel.onnx

Now we can create the Model Server and deploy the model using the model file that was uploaded to the S3 bucket using the RHODS Console.

Deploy Fashion mnist saved/mymodel

Wait for the Green checkmark for the deployed Model “mymodel”

Wait for green checkmark for mymodel

The custom image is no longer required and can be deleted:

!oc delete is -n huggingface --selector build=custom-mnist-image

Now, we can run the predictions using the REST API:

Fashion mnist mymodel predictions using REST API

We can also run predictions using gRPC:

gRPC requests to mymodel Fashion mnist dataset

CIFAR10

The CIFAR10 dataset (Canadian Institute for Advanced Research) has 10 classes: [0]airplane, [1]automobile, [2]bird, [3]cat, [4]deer, [5]dog, [6]frog, [7]horse, [8]ship and [9]truck. The images in CIFAR-10 are of size 3x32x32, i.e., 3-channel color images of 32x32 pixels. PyTorch makes it easy for us to load CIFAR10 directly from the torchvision datasets. We make use of a pl.LightningDataModule to download the data and create the training and validation DataLoaders. The ToTensor transform turns NumPy arrays and PIL images into tensors and takes care to lay out the dimensions of the output tensor as C × H × W (channel, height, width), so each image becomes a 3-channel (RGB) 3 × 32 × 32 tensor. In other words, ToTensor converts a PIL Image or numpy.ndarray of shape (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]. A minimal sketch of such a data module follows.
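
A minimal sketch of such a pl.LightningDataModule, assuming plain torchvision defaults (the real module in cifar10.py adds normalization and the test split):

import pytorch_lightning as pl
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
from torchvision.datasets import CIFAR10

class CIFAR10DataModule(pl.LightningDataModule):
    def __init__(self, data_dir="/tmp/cifar10", batch_size=256):
        super().__init__()
        self.data_dir, self.batch_size = data_dir, batch_size
        # ToTensor: HxWxC uint8 in [0,255] -> CxHxW float in [0.0,1.0]
        self.transform = transforms.ToTensor()

    def prepare_data(self):
        # download once on rank 0
        CIFAR10(self.data_dir, train=True, download=True)
        CIFAR10(self.data_dir, train=False, download=True)

    def setup(self, stage=None):
        full = CIFAR10(self.data_dir, train=True, transform=self.transform)
        self.train_set, self.val_set = random_split(full, [45000, 5000])

    def train_dataloader(self):
        return DataLoader(self.train_set, batch_size=self.batch_size, shuffle=True)

    def val_dataloader(self):
        return DataLoader(self.val_set, batch_size=self.batch_size)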

We can run exactly the same steps as in mnist handwritten digits example using the batch_cifar10_ray.ipynb that uses the cifar10.py. The AppWrapper creates the RayCluster with the Deployment and the corresponding ReplicaSet and pods. You will need to create the route with tls if you want to see the Ray Dashboard.

The logs in the Ray dashboard show the torchrun command as the job is executed, the epochs and the validation accuracy as the model is trained, the final export, conversion of the model to onnx and copying to S3 bucket.

[RayActor(name='cifar10', command=['bash', '-c', "torchrun --rdzv_backend static --rdzv_endpoint $TORCHX_RANK0_HOST:49782 --rdzv_id 'cifar10-zwhwm41q4nw9h' --nnodes 3 --nproc_per_node 1 --node_rank '0' --tee 3 --role '' cifar10.py"], env={'AWS_ACCESS_KEY_ID': 'xxxxxx', 'AWS_SECRET_ACCESS_KEY': 'xxxxxx', 'AWS_S3_ENDPOINT': 'http://s3.openshift-storage.svc', 'OUTPUT_PATH': 'saved/cifar10.onnx', 'TORCHX_TRACKING_EXPERIMENT_NAME': 'default-experiment', 'LOGLEVEL': 'WARNING', 'TORCHX_JOB_ID': 'ray://torchx/cifar10-zwhwm41q4nw9h'}, num_cpus=8, num_gpus=1, min_replicas=3), RayActor(name='cifar10', command=['bash', '-c', "torchrun --rdzv_backend static --rdzv_endpoint $TORCHX_RANK0_HOST:49782 --rdzv_id 'cifar10-zwhwm41q4nw9h' --nnodes 3 --nproc_per_node 1 --node_rank '1' --tee 3 --role '' cifar10.py"], env={'AWS_ACCESS_KEY_ID': 'xxxxxx', 'AWS_SECRET_ACCESS_KEY': 'xxxxxx', 'AWS_S3_ENDPOINT': 'http://s3.openshift-storage.svc', 'OUTPUT_PATH': 'saved/cifar10.onnx', 'TORCHX_TRACKING_EXPERIMENT_NAME': 'default-experiment', 'LOGLEVEL': 'WARNING', 'TORCHX_JOB_ID': 'ray://torchx/cifar10-zwhwm41q4nw9h'}, num_cpus=8, num_gpus=1, min_replicas=3), RayActor(name='cifar10', command=['bash', '-c', "torchrun --rdzv_backend static --rdzv_endpoint $TORCHX_RANK0_HOST:49782 --rdzv_id 'cifar10-zwhwm41q4nw9h' --nnodes 3 --nproc_per_node 1 --node_rank '2' --tee 3 --role '' cifar10.py"], env={'AWS_ACCESS_KEY_ID': 'xxxxxx', 'AWS_SECRET_ACCESS_KEY': 'xxxxxx', 'AWS_S3_ENDPOINT': 'http://s3.openshift-storage.svc', 'OUTPUT_PATH': 'saved/cifar10.onnx', 'TORCHX_TRACKING_EXPERIMENT_NAME': 'default-experiment', 'LOGLEVEL': 'WARNING', 'TORCHX_JOB_ID': 'ray://torchx/cifar10-zwhwm41q4nw9h'}, num_cpus=8, num_gpus=1, min_replicas=3)]
2023-04-30 09:37:51,992	INFO worker.py:1230 -- Using address 10.130.0.232:6379 set in the environment variable RAY_ADDRESS
2023-04-30 09:37:51,992	INFO worker.py:1342 -- Connecting to existing Ray cluster at address: 10.130.0.232:6379...
2023-04-30 09:37:52,000	INFO worker.py:1519 -- Connected to Ray cluster. View the dashboard at http://10.130.0.232:8265
Waiting for minimum placement group to start.
Successfully created placement groups
rdzv_endpoint set to 10.130.0.231 for actor 77cc879f31c37e5c74adc45c04000000
rdzv_endpoint set to 10.130.0.231 for actor 1a4f83a567ee364ec3693a5a04000000
rdzv_endpoint set to 10.130.0.231 for actor e087f0f000f73df633e5bd4604000000
Successfully placed command actors
Entering main loop, start executing the script on worker nodes
…
(CommandActor pid=221, ip=10.130.0.231) [0]:initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/3
(CommandActor pid=221, ip=10.130.0.231) [0]:----------------------------------------------------------------------------------------------------
(CommandActor pid=221, ip=10.130.0.231) [0]:distributed_backend=nccl
(CommandActor pid=221, ip=10.130.0.231) [0]:All distributed processes registered. Starting with 3 processes
(CommandActor pid=221, ip=10.130.0.231) [0]:----------------------------------------------------------------------------------------------------
(CommandActor pid=221, ip=10.130.0.231) [0]:
(CommandActor pid=221, ip=10.130.0.231) [0]:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
(CommandActor pid=221, ip=10.130.0.231) [0]:Missing logger folder: /tmp/ray/session_2023-04-30_08-46-48_980230_8/runtime_resources/working_dir_files/_ray_pkg_485a26e680218c33/lightning_logs
(CommandActor pid=221, ip=10.130.0.231) [0]:
(CommandActor pid=221, ip=10.130.0.231) [0]:  | Name          | Type       | Params | In sizes        | Out sizes      
(CommandActor pid=221, ip=10.130.0.231) [0]:---------------------------------------------------------------------------------
(CommandActor pid=221, ip=10.130.0.231) [0]:0 | val_accuracy  | Accuracy   | 0      | ?               | ?              
(CommandActor pid=221, ip=10.130.0.231) [0]:1 | test_accuracy | Accuracy   | 0      | ?               | ?              
(CommandActor pid=221, ip=10.130.0.231) [0]:2 | convlayer1    | Sequential | 1.9 K  | [1, 3, 32, 32]  | [1, 64, 16, 16]
(CommandActor pid=221, ip=10.130.0.231) [0]:3 | convlayer2    | Sequential | 74.1 K | [1, 64, 16, 16] | [1, 128, 7, 7] 
(CommandActor pid=221, ip=10.130.0.231) [0]:4 | convlayer3    | Sequential | 295 K  | [1, 128, 7, 7]  | [1, 256, 2, 2] 
(CommandActor pid=221, ip=10.130.0.231) [0]:5 | fc1           | Linear     | 4.2 M  | [1, 1024]       | [1, 4096]      
(CommandActor pid=221, ip=10.130.0.231) [0]:6 | drop          | Dropout    | 0      | [1, 4096]       | [1, 4096]      
(CommandActor pid=221, ip=10.130.0.231) [0]:7 | fc2           | Linear     | 4.2 M  | [1, 4096]       | [1, 1024]      
(CommandActor pid=221, ip=10.130.0.231) [0]:8 | drop2         | Dropout    | 0      | [1, 1024]       | [1, 1024]      
(CommandActor pid=221, ip=10.130.0.231) [0]:9 | fc3           | Linear     | 10.2 K | [1, 1024]       | [1, 10]        
(CommandActor pid=221, ip=10.130.0.231) [0]:---------------------------------------------------------------------------------
(CommandActor pid=221, ip=10.130.0.231) [0]:8.8 M     Trainable params
(CommandActor pid=221, ip=10.130.0.231) [0]:0         Non-trainable params
(CommandActor pid=221, ip=10.130.0.231) [0]:8.8 M     Total params
(CommandActor pid=221, ip=10.130.0.231) [0]:35.103    Total estimated model params size (MB)
…
(CommandActor pid=221, ip=10.130.0.231) [0]:Epoch 63: 100%|██████████| 66/66 [00:10<00:00,  6.55it/s, loss=0.447, v_num=0, val_loss=0.582, val_acc=0.797]
…
(CommandActor pid=221, ip=10.130.0.231) [0]:GLOBAL_RANK: is  0
(CommandActor pid=221, ip=10.130.0.231) [0]:Copying /tmp/cifar10.onnx
(CommandActor pid=221, ip=10.130.0.231) [0]:uploaded_file_name saved/cifar10.onnx

You may copy the model from the worker pod to the Notebook pod and directly run the onnx model in the Notebook as shown in the “Load the onnx model” section and then the “Inference using the onnx model” section.

oc -n batch-mnist cp mnisttest-worker-small-group-mnisttest-5crf6:/tmp/cifar10.onnx cifar10.onnx 
CIFAR10 onnx local inferencing output

In this sample, we also copy the onnx model to the S3 bucket directly using the environment variables passed from the notebook to the cifar10.py. This model can be served by the Model Mesh using the path OUTPUT_PATH=saved/cifar10.onnx. You may also copy the onnx model from the worker pod as shown in Part 1 and deploy the Model.

Deploy CIFAR10 onnx model

Alternatively, convert the model from onnx to OpenVINO IR and copy to S3 bucket by specifying the folder where the bin and xml files are copied.

Deploy CIFAR10 OpenVINO model

We can alternatively deploy the InferenceService using the following yaml where the storage points to the aws-connection-my-object-store secret and the path cifar10 is the folder with the model.

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  annotations:
    openshift.io/display-name: cifar10
    serving.kserve.io/deploymentMode: ModelMesh
  labels:
    name: cifar10
    opendatahub.io/dashboard: "true"
  name: cifar10
  namespace: huggingface
spec:
  predictor:
    model:
      modelFormat:
        name: openvino_ir
        version: opset1
      runtime: model-server-huggingface
      storage:
        key: aws-connection-my-object-store
        path: cifar10

Then, test the prediction requests using REST or gRPC requests to the model.

Inferencing requests using CIFAR10

Conclusion

In this Part 2, we saw how to train the MNIST handwritten digits, Fashion-MNIST, and CIFAR10 batch samples using two mechanisms with the CodeFlare AppWrapper: a RayCluster and directly with pods. We also saw how to build a custom image in OpenShift from our sources on GitHub and run the sample using PyTorch Distributed Data Parallel with torchrun when running directly with pods. Finally, we deployed and exercised the models in onnx and OpenVINO IR formats. In Part 3, we will look into running the onnx model with GPU and quantization.

Hope you have enjoyed this article. Share your thoughts in the comments or engage in the conversation with me on Twitter @aakarve.

References

  1. Converting Machine Learning Models to ONNX format
  2. Convert Pytorch lightning model to ONNX
  3. Load and predict with ONNX Runtime and a very simple model
  4. Deploy Models Into Production
  5. Dynamic axes
  6. Older batch_mnist sample
  7. Onnx export with device gpu
  8. Convert ONNX to OpenVINO
  9. Normalize Images
  10. Mean and Std of CIFAR10 dataset
  11. Model Checkpoint
  12. InferTensorContents - gRPC v2 Inference Protocol

#RedHatOpenShift #DataScienceExperience #Jupyter #grpc #MachineLearning #Notebook #rhods #onnx #openvino #cifar10 #mnist 


#infrastructure-highlights
