# Instructions to run the end-to-end demo

## Chapters

I. Installation of KServe & its dependencies
II. Setting up local MinIO S3 storage
III. Setting up your OpenShift AI workbench
IV. Train model and evaluate
V. Convert model to Caikit format and save to S3 storage
VI. Deploy model onto Caikit-TGIS Serving Runtime
VII. Model inference
## Prerequisites

- To support training and inference, your cluster needs a node with sufficient CPUs and memory and 4 GPUs. Instructions to add GPU support to RHOAI can be found here.
- You have cluster administrator permissions
- You have installed the OpenShift CLI (`oc`)
- You have installed the Red Hat OpenShift Service Mesh Operator
- You have installed the Red Hat OpenShift Serverless Operator
- You have installed the Red Hat OpenShift AI Operator and created a `DataScienceCluster` object
## Installation of KServe & its dependencies

Instructions adapted from Manually installing KServe.

Git clone this repository:

```shell
git clone https://github.com/trustyai-explainability/trustyai-detoxify-sft.git
```

Login to your OpenShift cluster as a cluster administrator:

```shell
oc login --token=<token>
```

Create the required namespace for Red Hat OpenShift Service Mesh:

```shell
oc create ns istio-system
```

Create a `ServiceMeshControlPlane` object:

```shell
oc apply -f manifests/kserve/smcp.yaml -n istio-system
```
Sanity check to verify creation of the service mesh instance:

```shell
oc get pods -n istio-system
```

Expected output:

```
NAME                                       READY   STATUS    RESTARTS   AGE
istio-egressgateway-7c46668687-fzsqj       1/1     Running   0          22h
istio-ingressgateway-77f94d8f85-fhsp9      1/1     Running   0          22h
istiod-data-science-smcp-cc8cfd9b8-2rkg4   1/1     Running   0          22h
```
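Alternatively, you can block until the control plane reports ready instead of eyeballing pod status. This sketch assumes the `ServiceMeshControlPlane` created by `manifests/kserve/smcp.yaml` is named `data-science-smcp`, as the `istiod-data-science-smcp-*` pod name above suggests:

```shell
# Wait (up to 5 minutes) for the ServiceMeshControlPlane to report Ready
oc wait smcp/data-science-smcp -n istio-system --for condition=Ready --timeout=300s
```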
Create the required namespace for a `KnativeServing` instance:

```shell
oc create ns knative-serving
```

Create a `ServiceMeshMember` object:

```shell
oc apply -f manifests/kserve/default-smm.yaml -n knative-serving
```

Create and define a `KnativeServing` object:

```shell
oc apply -f manifests/kserve/knativeserving-istio.yaml -n knative-serving
```

Sanity check to validate creation of the Knative Serving instance:

```shell
oc get pods -n knative-serving
```
Expected output:

```
NAME                                      READY   STATUS    RESTARTS   AGE
activator-7586f6f744-nvdlb                2/2     Running   0          22h
activator-7586f6f744-sd77w                2/2     Running   0          22h
autoscaler-764fdf5d45-p2v98               2/2     Running   0          22h
autoscaler-764fdf5d45-x7dc6               2/2     Running   0          22h
autoscaler-hpa-7c7c4cd96d-2lkzg           1/1     Running   0          22h
autoscaler-hpa-7c7c4cd96d-gks9j           1/1     Running   0          22h
controller-5fdfc9567c-6cj9d               1/1     Running   0          22h
controller-5fdfc9567c-bf5x7               1/1     Running   0          22h
domain-mapping-56ccd85968-2hjvp           1/1     Running   0          22h
domain-mapping-56ccd85968-lg6mw           1/1     Running   0          22h
domainmapping-webhook-769b88695c-gp2hk    1/1     Running   0          22h
domainmapping-webhook-769b88695c-npn8g    1/1     Running   0          22h
net-istio-controller-7dfc6f668c-jb4xk     1/1     Running   0          22h
net-istio-controller-7dfc6f668c-jxs5p     1/1     Running   0          22h
net-istio-webhook-66d8f75d6f-bgd5r        1/1     Running   0          22h
net-istio-webhook-66d8f75d6f-hld75       1/1     Running   0          22h
webhook-7d49878bc4-8xjbr                  1/1     Running   0          22h
webhook-7d49878bc4-s4xx4                  1/1     Running   0          22h
```
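As a scripted alternative to scanning the pod list, you can wait on the custom resource itself. This assumes the `KnativeServing` object defined in `manifests/kserve/knativeserving-istio.yaml` is named `knative-serving`:

```shell
# Wait for the KnativeServing instance to become Ready
oc wait knativeserving/knative-serving -n knative-serving --for condition=Ready --timeout=300s
```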
From the web console, install KServe by going to Operators -> Installed Operators and clicking on the Red Hat OpenShift AI Operator.

Click on the DSC Initialization tab and click on the default-dsci object.

Click on the YAML tab and in the `spec` section, change `serviceMesh.managementState` to `Unmanaged`:

```yaml
spec:
  serviceMesh:
    managementState: Unmanaged
```

Click Save.
Click on the Data Science Cluster tab and click on the default-dsc object.

Click on the YAML tab and in the `spec` section, change `components.kserve.managementState` and `components.kserve.serving.managementState` to `Managed`:

```yaml
spec:
  components:
    kserve:
      managementState: Managed
      serving:
        managementState: Managed
```

Click Save.
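If you prefer the CLI over the web console, the same two edits can be applied with `oc patch`. This is a sketch assuming the default object names `default-dsci` and `default-dsc` used above:

```shell
# Set serviceMesh.managementState to Unmanaged on the DSCInitialization
oc patch dscinitialization default-dsci --type merge \
  -p '{"spec":{"serviceMesh":{"managementState":"Unmanaged"}}}'

# Set the KServe component and its serving sub-component to Managed
oc patch datasciencecluster default-dsc --type merge \
  -p '{"spec":{"components":{"kserve":{"managementState":"Managed","serving":{"managementState":"Managed"}}}}}'
```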
## Setting up local MinIO S3 storage

Create a namespace for your project called "detoxify-sft":

```shell
oc create namespace detoxify-sft
```

Set up your local MinIO S3 storage in your newly created namespace:

```shell
oc apply -f manifests/minio/setup-s3.yaml -n detoxify-sft
```

Run the following sanity checks:

```shell
oc get pods -n detoxify-sft | grep "minio"
```
Expected output:

```
NAME                    READY   STATUS    RESTARTS   AGE
minio-7586f6f744-nvdl   1/1     Running   0          22h
```
```shell
oc get route -n detoxify-sft | grep "minio"
```

Expected output:

```
NAME        STATUS     LOCATION               SERVICE
minio-api   Accepted   https://minio-api...   minio-service
minio-ui    Accepted   https://minio-ui...    minio-service
```
Get the MinIO UI location URL and open it in a web browser:

```shell
oc get route minio-ui -n detoxify-sft
```

Login using the credentials in `manifests/minio/setup-s3.yaml`:

- user: `minio`
- password: `minio123`

Click on Create a Bucket, choose a name for your bucket, and click on Create Bucket.
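If you would rather create the bucket from a terminal than the UI, here is a sketch using the AWS CLI pointed at the MinIO API route, with the credentials from `manifests/minio/setup-s3.yaml`. The bucket name `llm-models` is only an example; any name works:

```shell
# Look up the MinIO API endpoint exposed by the route
MINIO_API=https://$(oc get route minio-api -n detoxify-sft -o jsonpath='{.spec.host}')

# Create a bucket against the MinIO S3-compatible API
AWS_ACCESS_KEY_ID=minio AWS_SECRET_ACCESS_KEY=minio123 \
  aws --endpoint-url "$MINIO_API" s3 mb s3://llm-models
```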
## Setting up your OpenShift AI workbench

Go to Red Hat OpenShift AI from the web console.

Click on Data Science Projects and then click on Create data science project.

Give your project a name and then click Create.

Click on the Workbenches tab and then create a workbench with a PyTorch notebook image, set the container size to Large, and select a single NVIDIA GPU. Click on Create Workbench.

Click on Add data connection to create a matching data connection for MinIO.

Fill out the required fields and then click on Add data connection.

Once your workbench status changes from Starting to Running, click on Open to open JupyterHub in a web browser.

In your JupyterHub environment, launch a terminal and clone this project:

```shell
git clone https://github.com/trustyai-explainability/trustyai-detoxify-sft.git
```

Go into the `notebooks` directory.
## Train model and evaluate

Open the `01-sft.ipynb` file and run each cell in the notebook.

Once the model is trained and uploaded to the Hugging Face Hub, open the `02-eval.ipynb` file and run each cell to compare the model trained on raw input-output pairs vs. the one trained on detoxified prompts.
## Convert model to Caikit format and save to S3 storage

Open the `03-save_convert_model.ipynb` file and run each cell in the notebook to convert the model to Caikit format and save it to a MinIO bucket.
## Deploy model onto Caikit-TGIS Serving Runtime

In the OpenShift AI dashboard, navigate to the project details page and click the Models tab.

In the Single-model serving platform tile, click on Deploy model. Provide the following values:

- Model Name: `opt-350m-caikit`
- Serving Runtime: `Caikit-TGIS Serving Runtime`
- Model framework: `caikit`
- Existing data connection: `My Storage`
- Path: `models/opt-350m-caikit`

Click Deploy.
Increase the `initialDelaySeconds`:

```shell
oc patch template caikit-tgis-serving-template --type=merge \
  -p '{"spec":{"containers":[{"readinessProbe":{"initialDelaySeconds":300},"livenessProbe":{"initialDelaySeconds":300}}]}}'
```
Wait for the model Status to show a green checkmark
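You can also watch the rollout from the CLI instead of the dashboard. This sketch assumes the model was deployed into the `detoxify-sft` namespace; the `InferenceService` name matches the model name entered above:

```shell
# READY turns True once the predictor pod passes its probes
oc get inferenceservice opt-350m-caikit -n detoxify-sft -w
```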
## Model inference

Return to the JupyterHub environment to test out the deployed model.

Click on `03-inference_request.ipynb` and run each cell to make an inference request to the detoxified model.
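If you want to exercise the endpoint outside the notebook, here is a sketch using `grpcurl` against the Caikit-TGIS gRPC service. The `<inference-host>` placeholder is the host from the model's inference URL in the dashboard (not defined in this repo), and `opt-350m-caikit` is the model name deployed above:

```shell
# Hypothetical endpoint taken from the InferenceService status URL
ENDPOINT=<inference-host>:443

grpcurl -insecure \
  -d '{"text": "Write a short greeting."}' \
  -H "mm-model-id: opt-350m-caikit" \
  "$ENDPOINT" \
  caikit.runtime.Nlp.NlpService/TextGenerationTaskPredict
```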