psabharwal committed on
Commit 819c358
1 Parent(s): 15d4f31

Upload instructions.md

Commit of instruction files

Files changed (1)
  1. instructions.md +224 -0
instructions.md ADDED
## Instructions to run end-to-end demo

## Chapters
[I. Installation of KServe & its dependencies](#installation-of-kserve--its-dependencies)

[II. Setting up local MinIO S3 storage](#setting-up-local-minio-s3-storage)

[III. Setting up your OpenShift AI workbench](#setting-up-your-openshift-ai-workbench)

[IV. Train model and evaluate](#train-model-and-evaluate)

[V. Convert model to Caikit format and save to S3 storage](#convert-model-to-caikit-format-and-save-to-s3-storage)

[VI. Deploy model onto Caikit-TGIS Serving Runtime](#deploy-model-onto-caikit-tgis-serving-runtime)

[VII. Model inference](#model-inference)

**Prerequisites**
* To support training and inference, your cluster needs a node with sufficient CPUs and memory, and 4 GPUs. Instructions to add GPU support to RHOAI can be found [here](https://docs.google.com/document/d/1T2oc-KZRMboUVuUSGDZnt3VRZ5s885aDRJGYGMkn_Wo/edit#heading=h.9xmhoufikqid).
* You have cluster administrator permissions
* You have installed the OpenShift CLI (`oc`)
* You have installed the `Red Hat OpenShift Service Mesh Operator`
* You have installed the `Red Hat OpenShift Serverless Operator`
* You have installed the `Red Hat OpenShift AI Operator` and created a **DataScienceCluster** object
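Before starting, the operator prerequisites can be spot-checked from the CLI. A minimal sketch, assuming typical CSV name patterns for these operators (adjust them to whatever `oc get csv -A` actually reports on your cluster):

```shell
# Spot-check that the prerequisite operators are installed.
# The CSV name patterns below are assumptions, not authoritative names.
REQUIRED_OPERATORS="servicemeshoperator serverless-operator rhods-operator"
for op in $REQUIRED_OPERATORS; do
  echo "checking for ${op}"
  # Uncomment on a live cluster:
  # oc get csv -A | grep -q "${op}" || echo "MISSING: ${op}"
done
```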

### Installation of KServe & its dependencies
Instructions adapted from [Manually installing KServe](https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2-latest/html/serving_models/serving-large-models_serving-large-models#manually-installing-kserve_serving-large-models)
1. Git clone this repository
```
git clone https://github.com/trustyai-explainability/trustyai-detoxify-sft.git
```

2. Login to your OpenShift cluster as a cluster administrator
```
oc login --token=<token>
```
3. Create the required namespace for Red Hat OpenShift Service Mesh
```
oc create ns istio-system
```

4. Create a `ServiceMeshControlPlane` object
```
oc apply -f manifests/kserve/smcp.yaml -n istio-system
```
5. Sanity check to verify creation of the service mesh instance
```
oc get pods -n istio-system
```
Expected output:
```
NAME                                       READY   STATUS    RESTARTS   AGE
istio-egressgateway-7c46668687-fzsqj       1/1     Running   0          22h
istio-ingressgateway-77f94d8f85-fhsp9      1/1     Running   0          22h
istiod-data-science-smcp-cc8cfd9b8-2rkg4   1/1     Running   0          22h
```

6. Create the required namespace for a `KnativeServing` instance
```
oc create ns knative-serving
```

7. Create a `ServiceMeshMember` object
```
oc apply -f manifests/kserve/default-smm.yaml -n knative-serving
```

8. Create and define a `KnativeServing` object
```
oc apply -f manifests/kserve/knativeserving-istio.yaml -n knative-serving
```
9. Sanity check to validate creation of the Knative Serving instance
```
oc get pods -n knative-serving
```
Expected output:
```
NAME                                     READY   STATUS    RESTARTS   AGE
activator-7586f6f744-nvdlb               2/2     Running   0          22h
activator-7586f6f744-sd77w               2/2     Running   0          22h
autoscaler-764fdf5d45-p2v98              2/2     Running   0          22h
autoscaler-764fdf5d45-x7dc6              2/2     Running   0          22h
autoscaler-hpa-7c7c4cd96d-2lkzg          1/1     Running   0          22h
autoscaler-hpa-7c7c4cd96d-gks9j          1/1     Running   0          22h
controller-5fdfc9567c-6cj9d              1/1     Running   0          22h
controller-5fdfc9567c-bf5x7              1/1     Running   0          22h
domain-mapping-56ccd85968-2hjvp          1/1     Running   0          22h
domain-mapping-56ccd85968-lg6mw          1/1     Running   0          22h
domainmapping-webhook-769b88695c-gp2hk   1/1     Running   0          22h
domainmapping-webhook-769b88695c-npn8g   1/1     Running   0          22h
net-istio-controller-7dfc6f668c-jb4xk    1/1     Running   0          22h
net-istio-controller-7dfc6f668c-jxs5p    1/1     Running   0          22h
net-istio-webhook-66d8f75d6f-bgd5r       1/1     Running   0          22h
net-istio-webhook-66d8f75d6f-hld75       1/1     Running   0          22h
webhook-7d49878bc4-8xjbr                 1/1     Running   0          22h
webhook-7d49878bc4-s4xx4                 1/1     Running   0          22h
```

10. From the web console, install KServe by going to **Operators -> Installed Operators** and clicking on the **Red Hat OpenShift AI Operator**

11. Click on the **DSC Initialization** tab and click on the **default-dsci** object

12. Click on the **YAML** tab and, in the `spec` section, change `serviceMesh.managementState` to `Unmanaged`
```
spec:
  serviceMesh:
    managementState: Unmanaged
```

13. Click **Save**

14. Click on the **Data Science Cluster** tab and click on the **default-dsc** object

15. Click on the **YAML** tab and, in the `spec` section, change `components.kserve.managementState` and `components.kserve.serving.managementState` to `Managed`
```
spec:
  components:
    kserve:
      managementState: Managed
      serving:
        managementState: Managed
```
16. Click **Save**
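The console edits to the **default-dsci** and **default-dsc** objects can also be sketched as CLI merge patches. This is an assumption-laden sketch, not the documented procedure: the resource kinds (`dscinitialization`, `datasciencecluster`) and patch bodies should be verified against your cluster (e.g. with `oc explain dscinitialization.spec`) before running:

```shell
# Hedged CLI equivalent of editing default-dsci and default-dsc in the console.
DSCI_PATCH='{"spec":{"serviceMesh":{"managementState":"Unmanaged"}}}'
DSC_PATCH='{"spec":{"components":{"kserve":{"managementState":"Managed","serving":{"managementState":"Managed"}}}}}'
# Uncomment on a live cluster after verifying the resource names:
# oc patch dscinitialization default-dsci --type merge -p "$DSCI_PATCH"
# oc patch datasciencecluster default-dsc --type merge -p "$DSC_PATCH"
echo "$DSCI_PATCH"
```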
### Setting up local MinIO S3 storage
1. Create a namespace for your project called "detoxify-sft"
```
oc create namespace detoxify-sft
```
2. Set up your local MinIO S3 storage in your newly created namespace
```
oc apply -f manifests/minio/setup-s3.yaml -n detoxify-sft
```
3. Run the following sanity checks
```
oc get pods -n detoxify-sft | grep "minio"
```
Expected output:
```
NAME                    READY   STATUS    RESTARTS   AGE
minio-7586f6f744-nvdl   1/1     Running   0          22h
```

```
oc get route -n detoxify-sft | grep "minio"
```
Expected output:
```
NAME        STATUS     LOCATION              SERVICE
minio-api   Accepted   https://minio-api...  minio-service
minio-ui    Accepted   https://minio-ui...   minio-service
```
4. Get the MinIO UI location URL and open it in a web browser
```
oc get route minio-ui -n detoxify-sft
```
5. Login using the credentials in `manifests/minio/setup-s3.yaml`

   **user**: `minio`

   **password**: `minio123`

6. Click on **Create a Bucket**, choose a name for your bucket, and click on **Create Bucket**
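The bucket can also be created from a terminal instead of the MinIO UI, using the AWS CLI pointed at the MinIO endpoint. The endpoint host and bucket name below are placeholders: substitute the `minio-api` route from `oc get route minio-api -n detoxify-sft` and whatever bucket name you prefer:

```shell
# Hedged alternative to step 6: create the bucket with the AWS CLI.
export AWS_ACCESS_KEY_ID=minio
export AWS_SECRET_ACCESS_KEY=minio123
S3_ENDPOINT="https://minio-api.example.com"   # assumption: your minio-api route URL
BUCKET="models"                               # assumption: any valid bucket name
# Uncomment with a real endpoint (requires the AWS CLI):
# aws --endpoint-url "$S3_ENDPOINT" s3 mb "s3://${BUCKET}"
echo "s3://${BUCKET}"
```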
### Setting up your OpenShift AI workbench
1. Go to Red Hat OpenShift AI from the web console

2. Click on **Data Science Projects** and then click on **Create data science project**

3. Give your project a name and then click **Create**

4. Click on the **Workbenches** tab, then create a workbench with a PyTorch notebook image, set the container size to Large, and select a single NVIDIA GPU. Click on **Create Workbench**

5. Click on **Add data connection** to create a matching data connection for MinIO

6. Fill out the required fields and then click on **Add data connection**

7. Once your workbench status changes from **Starting** to **Running**, click on **Open** to open JupyterHub in a web browser

8. In your JupyterHub environment, launch a terminal and clone this project
```
git clone https://github.com/trustyai-explainability/trustyai-detoxify-sft.git
```
9. Go into the `notebooks` directory
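For reference, the data connection created above typically needs fields along these lines. The values mirror the MinIO setup from the previous section; the endpoint and bucket are placeholders for whatever your route and bucket actually are, and the name matches the `My Storage` connection used at deploy time later:

```
Name:        My Storage
Access key:  minio
Secret key:  minio123
Endpoint:    <your minio-api route URL>
Bucket:      <the bucket you created in MinIO>
```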
### Train model and evaluate
1. Open the `01-sft.ipynb` file

2. Run each cell in the notebook

3. Once the model is trained and uploaded to the HuggingFace Hub, open the `02-eval.ipynb` file and run each cell to compare the model trained on raw input-output pairs vs. the one trained on detoxified prompts

### Convert model to Caikit format and save to S3 storage
1. Open `03-save_convert_model.ipynb` and run each cell in the notebook to convert the model to Caikit format and save it to a MinIO bucket
### Deploy model onto Caikit-TGIS Serving Runtime
1. In the OpenShift AI dashboard, navigate to the project details page and click the **Models** tab

2. In the **Single-model serving platform** tile, click on **Deploy model**. Provide the following values:

   **Model Name**: `opt-350m-caikit`

   **Serving Runtime**: `Caikit-TGIS Serving Runtime`

   **Model framework**: `caikit`

   **Existing data connection**: `My Storage`

   **Path**: `models/opt-350m-caikit`

3. Click **Deploy**

4. Increase the `initialDelaySeconds` of the runtime's readiness and liveness probes
```
oc patch template caikit-tgis-serving-template --type=merge -p '{"spec":{"containers":[{"readinessProbe":{"initialDelaySeconds":300},"livenessProbe":{"initialDelaySeconds":300}}]}}'
```
5. Wait for the model **Status** to show a green checkmark
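While waiting for the green checkmark, the deployment can also be watched from the CLI. A small sketch, assuming the model was deployed into the `detoxify-sft` namespace created earlier (adjust if your data science project lives elsewhere):

```shell
# Hedged sanity check: watch the InferenceService until it reports Ready.
NAMESPACE="detoxify-sft"    # assumption: your data science project's namespace
ISVC="opt-350m-caikit"      # the model name given at deploy time
# Uncomment on a live cluster:
# oc get inferenceservice "$ISVC" -n "$NAMESPACE" -w
echo "oc get inferenceservice ${ISVC} -n ${NAMESPACE}"
```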
### Model inference
1. Return to the JupyterHub environment to test out the deployed model

2. Click on `03-inference_request.ipynb` and run each cell to make an inference request to the detoxified model
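For reference, the notebook's request can also be made by hand with `grpcurl`. This is a hedged sketch, not the notebook's exact call: the endpoint host is a placeholder (take the real one from the model's inference endpoint in the dashboard), and the gRPC service path assumes the Caikit-TGIS runtime's text-generation API:

```shell
# Hedged sketch of a raw gRPC inference request to the Caikit-TGIS runtime.
ENDPOINT="opt-350m-caikit.example.com"   # assumption: your model's endpoint host
MODEL_ID="opt-350m-caikit"
PAYLOAD='{"text": "Write a friendly response to this customer review."}'
# Uncomment with a real endpoint (requires grpcurl):
# grpcurl -insecure -d "$PAYLOAD" -H "mm-model-id: ${MODEL_ID}" \
#   "${ENDPOINT}:443" caikit.runtime.Nlp.NlpService/TextGenerationTaskPredict
echo "$PAYLOAD"
```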