## Instructions to run end-to-end demo

## Chapters
[I. Installation of KServe & its dependencies](#installation-of-kserve--its-dependencies)

[II. Setting up local MinIO S3 storage](#setting-up-local-minio-s3-storage)

[III. Setting up your OpenShift AI workbench](#setting-up-your-openshift-ai-workbench)

[IV. Train model and evaluate](#train-model-and-evaluate)

[V. Convert model to Caikit format and save to S3 storage](#convert-model-to-caikit-format-and-save-to-s3-storage)

[VI. Deploy model onto Caikit-TGIS Serving Runtime](#deploy-model-onto-caikit-tgis-serving-runtime)

[VII. Model inference](#model-inference)

**Prerequisites**
* To support training and inference, your cluster needs a node with sufficient CPUs and memory and 4 GPUs. Instructions to add GPU support to RHOAI can be found [here](https://docs.google.com/document/d/1T2oc-KZRMboUVuUSGDZnt3VRZ5s885aDRJGYGMkn_Wo/edit#heading=h.9xmhoufikqid).
* You have cluster administrator permissions
* You have installed the OpenShift CLI (`oc`)
* You have installed the `Red Hat OpenShift Service Mesh Operator`
* You have installed the `Red Hat OpenShift Serverless Operator`
* You have installed the `Red Hat OpenShift AI Operator` and created a **DataScienceCluster** object

### Installation of KServe & its dependencies
Instructions adapted from [Manually installing KServe](https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2-latest/html/serving_models/serving-large-models_serving-large-models#manually-installing-kserve_serving-large-models)
1. Git clone this repository
```
git clone https://github.com/trustyai-explainability/trustyai-detoxify-sft.git
```

2. Login to your OpenShift cluster as a cluster administrator
```
oc login --token=<token>
```
3. Create the required namespace for Red Hat OpenShift Service Mesh
```
oc create ns istio-system
```

4. Create a `ServiceMeshControlPlane` object
```
oc apply -f manifests/kserve/smcp.yaml -n istio-system
```
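The contents of `manifests/kserve/smcp.yaml` are not reproduced here, but a minimal `ServiceMeshControlPlane` for KServe typically looks like the sketch below. The instance name (`minimal`) and the disabled addons are assumptions, so defer to the file in the repository:
```
apiVersion: maistra.io/v2
kind: ServiceMeshControlPlane
metadata:
  name: minimal
  namespace: istio-system
spec:
  security:
    dataPlane:
      mtls: true        # encrypt traffic between mesh workloads
    identity:
      type: ThirdParty  # required for Knative's net-istio integration
  addons:               # observability addons are not needed for this demo
    grafana:
      enabled: false
    kiali:
      enabled: false
    prometheus:
      enabled: false
  gateways:
    ingress:
      service:
        metadata:
          labels:
            knative: ingressgateway  # lets Knative route through this gateway
```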

5. Sanity check to verify creation of the service mesh instance
```
oc get pods -n istio-system
```
Expected output:
```
NAME                                       READY   STATUS    RESTARTS   AGE
istio-egressgateway-7c46668687-fzsqj       1/1     Running   0          22h
istio-ingressgateway-77f94d8f85-fhsp9      1/1     Running   0          22h
istiod-data-science-smcp-cc8cfd9b8-2rkg4   1/1     Running   0          22h
```

6. Create the required namespace for a `KnativeServing` instance
```
oc create ns knative-serving
```

7. Create a `ServiceMeshMember` object
```
oc apply -f manifests/kserve/default-smm.yaml -n knative-serving
```
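For reference, `manifests/kserve/default-smm.yaml` should contain little more than a `ServiceMeshMember` that enrolls the `knative-serving` namespace in the mesh; the control-plane name below (`minimal`) is an assumption and must match the `ServiceMeshControlPlane` created earlier:
```
apiVersion: maistra.io/v1
kind: ServiceMeshMember
metadata:
  name: default            # ServiceMeshMember objects must be named "default"
  namespace: knative-serving
spec:
  controlPlaneRef:
    name: minimal          # name of the ServiceMeshControlPlane instance
    namespace: istio-system
```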

8. Create and define a `KnativeServing` object
```
oc apply -f manifests/kserve/knativeserving-istio.yaml -n knative-serving
```
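As a rough guide, `manifests/kserve/knativeserving-istio.yaml` defines a `KnativeServing` instance wired to the Istio ingress; the exact fields in the repository's file may differ, so treat this as an illustrative sketch:
```
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  ingress:
    istio:
      enabled: true   # route Knative traffic through the Service Mesh gateway
```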

9. Sanity check to validate creation of the Knative Serving instance
```
oc get pods -n knative-serving
```
Expected output:
```
NAME                                     READY   STATUS    RESTARTS   AGE
activator-7586f6f744-nvdlb               2/2     Running   0          22h
activator-7586f6f744-sd77w               2/2     Running   0          22h
autoscaler-764fdf5d45-p2v98              2/2     Running   0          22h
autoscaler-764fdf5d45-x7dc6              2/2     Running   0          22h
autoscaler-hpa-7c7c4cd96d-2lkzg          1/1     Running   0          22h
autoscaler-hpa-7c7c4cd96d-gks9j          1/1     Running   0          22h
controller-5fdfc9567c-6cj9d              1/1     Running   0          22h
controller-5fdfc9567c-bf5x7              1/1     Running   0          22h
domain-mapping-56ccd85968-2hjvp          1/1     Running   0          22h
domain-mapping-56ccd85968-lg6mw          1/1     Running   0          22h
domainmapping-webhook-769b88695c-gp2hk   1/1     Running   0          22h
domainmapping-webhook-769b88695c-npn8g   1/1     Running   0          22h
net-istio-controller-7dfc6f668c-jb4xk    1/1     Running   0          22h
net-istio-controller-7dfc6f668c-jxs5p    1/1     Running   0          22h
net-istio-webhook-66d8f75d6f-bgd5r       1/1     Running   0          22h
net-istio-webhook-66d8f75d6f-hld75       1/1     Running   0          22h
webhook-7d49878bc4-8xjbr                 1/1     Running   0          22h
webhook-7d49878bc4-s4xx4                 1/1     Running   0          22h
```

10. From the web console, install KServe by going to **Operators -> Installed Operators** and clicking on the **Red Hat OpenShift AI Operator**

11. Click on the **DSC Initialization** tab and click on the **default-dsci** object

12. Click on the **YAML** tab and, in the `spec` section, change `serviceMesh.managementState` to `Unmanaged`
```
spec:
  serviceMesh:
    managementState: Unmanaged
```

13. Click **Save**

14. Click on the **Data Science Cluster** tab and click on the **default-dsc** object

15. Click on the **YAML** tab and, in the `spec` section, change `components.kserve.managementState` and `components.kserve.serving.managementState` to `Managed`
```
spec:
  components:
    kserve:
      managementState: Managed
      serving:
        managementState: Managed
```
16. Click **Save**

### Setting up local MinIO S3 storage
1. Create a namespace for your project called "detoxify-sft"
```
oc create namespace detoxify-sft
```
2. Set up your local MinIO S3 storage in your newly created namespace
```
oc apply -f manifests/minio/setup-s3.yaml -n detoxify-sft
```
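`manifests/minio/setup-s3.yaml` bundles everything MinIO needs (Deployment, Service, Routes, credentials). The Deployment at its core is roughly the following sketch; the image tag, ports, and arguments are illustrative assumptions, not a copy of the repository's file:
```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
        - name: minio
          image: quay.io/minio/minio:latest   # illustrative image reference
          args: ["server", "/data", "--console-address", ":9090"]
          env:
            - name: MINIO_ROOT_USER
              value: minio         # matches the login used in step 5 below
            - name: MINIO_ROOT_PASSWORD
              value: minio123
          ports:
            - containerPort: 9000   # S3 API (minio-api route)
            - containerPort: 9090   # web console (minio-ui route)
```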
3. Run the following sanity checks
```
oc get pods -n detoxify-sft | grep "minio"
```
Expected output:
```
NAME                    READY   STATUS    RESTARTS   AGE
minio-7586f6f744-nvdl   1/1     Running   0          22h
```

```
oc get route -n detoxify-sft | grep "minio"
```
Expected output:
```
NAME        STATUS     LOCATION               SERVICE
minio-api   Accepted   https://minio-api...   minio-service
minio-ui    Accepted   https://minio-ui...    minio-service
```
4. Get the MinIO UI location URL and open it in a web browser
```
oc get route minio-ui -n detoxify-sft
```
5. Login using the credentials in `manifests/minio/setup-s3.yaml`

   **user**: `minio`

   **password**: `minio123`

6. Click on **Create a Bucket**, choose a name for your bucket, and click on **Create Bucket**

### Setting up your OpenShift AI workbench
1. Go to Red Hat OpenShift AI from the web console

2. Click on **Data Science Projects** and then click on **Create data science project**

3. Give your project a name and then click **Create**

4. Click on the **Workbenches** tab and then create a workbench with a PyTorch notebook image, set the container size to Large, and select a single NVIDIA GPU. Click on **Create workbench**

5. Click on **Add data connection** to create a matching data connection for MinIO

6. Fill out the required fields and then click on **Add data connection**
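Under the hood, a data connection is stored as a Kubernetes `Secret` in the project namespace. If you prefer `oc` over the dashboard form, the secret looks roughly like this; the secret name, display name, labels, and bucket placeholder are assumptions chosen to match the values used later in this demo:
```
apiVersion: v1
kind: Secret
metadata:
  name: aws-connection-my-storage
  namespace: detoxify-sft
  labels:
    opendatahub.io/dashboard: "true"
  annotations:
    opendatahub.io/connection-type: s3
    openshift.io/display-name: My Storage
stringData:
  AWS_ACCESS_KEY_ID: minio
  AWS_SECRET_ACCESS_KEY: minio123
  AWS_S3_ENDPOINT: <minio-api route URL>   # from `oc get route minio-api`
  AWS_DEFAULT_REGION: us-east-1            # arbitrary for MinIO
  AWS_S3_BUCKET: <your bucket name>
```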

7. Once your workbench status changes from **Starting** to **Running**, click on **Open** to open JupyterHub in a web browser

8. In your JupyterHub environment, launch a terminal and clone this project
```
git clone https://github.com/trustyai-explainability/trustyai-detoxify-sft.git
```
9. Go into the `notebooks` directory

### Train model and evaluate
1. Open the `01-sft.ipynb` file

2. Run each cell in the notebook

3. Once the model is trained and uploaded to the HuggingFace Hub, open the `02-eval.ipynb` file and run each cell to compare the model trained on raw input-output pairs vs. the one trained on detoxified prompts

### Convert model to Caikit format and save to S3 storage
1. Open the `03-save_convert_model.ipynb` file and run each cell in the notebook to convert the model to Caikit format and save it to a MinIO bucket

### Deploy model onto Caikit-TGIS Serving Runtime
1. In the OpenShift AI dashboard, navigate to the project details page and click the **Models** tab

2. In the **Single-model serving platform** tile, click on **Deploy model**. Provide the following values:

   **Model Name**: `opt-350m-caikit`

   **Serving Runtime**: `Caikit-TGIS Serving Runtime`

   **Model framework**: `caikit`

   **Existing data connection**: `My Storage`

   **Path**: `models/opt-350m-caikit`

3. Click **Deploy**

4. Increase the `initialDelaySeconds` of the readiness and liveness probes so the model server has enough time to load the model
```
oc patch template caikit-tgis-serving-template --type='merge' -p '{"spec":{"containers":[{"readinessProbe":{"initialDelaySeconds":300},"livenessProbe":{"initialDelaySeconds":300}}]}}'
```
5. Wait for the model **Status** to show a green checkmark

### Model inference
1. Return to the JupyterHub environment to test out the deployed model

2. Click on `03-inference_request.ipynb` and run each cell to make an inference request to the detoxified model
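
If you want to probe the endpoint outside the notebook, the request the notebook sends can be sketched in plain Python. The route URL below is a hypothetical placeholder, and the REST path (`/api/v1/task/text-generation`) is an assumption based on Caikit's REST conventions; check the notebook and your actual route (`oc get routes` in the model's namespace) before use:

```python
import json
import urllib.request

def build_generation_request(url: str, model_id: str, text: str) -> urllib.request.Request:
    """Build a POST request for a Caikit-style text-generation endpoint."""
    payload = {"model_id": model_id, "inputs": text}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_generation_request(
    "https://model-route.example.com/api/v1/task/text-generation",  # hypothetical route
    "opt-350m-caikit",
    "Write a short, friendly reply to this comment:",
)
# response = urllib.request.urlopen(req)   # run only against a live endpoint
```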