shreyajn committed on
Commit 2553552
1 Parent(s): ea41a2e

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +63 -31
README.md CHANGED
@@ -38,10 +38,10 @@ More details on model performance across various devices, can be found
 
 | Device | Chipset | Target Runtime | Inference Time (ms) | Peak Memory Range (MB) | Precision | Primary Compute Unit | Target Model
 | ---|---|---|---|---|---|---|---|
- | Samsung Galaxy S23 Ultra (Android 13) | Snapdragon® 8 Gen 2 | TFLite | 608.885 ms | 80 - 511 MB | FP16 | GPU | [WhisperEncoder.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite)
- | Samsung Galaxy S23 Ultra (Android 13) | Snapdragon® 8 Gen 2 | TFLite | 25.793 ms | 16 - 19 MB | FP16 | NPU | [WhisperDecoder.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite)
- | Samsung Galaxy S23 Ultra (Android 13) | Snapdragon® 8 Gen 2 | QNN Model Library | 1873.577 ms | 0 - 226 MB | FP16 | NPU | [WhisperEncoder.so](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.so)
- | Samsung Galaxy S23 Ultra (Android 13) | Snapdragon® 8 Gen 2 | QNN Model Library | 12.151 ms | 61 - 130 MB | FP16 | NPU | [WhisperDecoder.so](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.so)
+ | Samsung Galaxy S23 Ultra (Android 13) | Snapdragon® 8 Gen 2 | TFLite | 623.35 ms | 97 - 531 MB | FP16 | GPU | [WhisperEncoder.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite)
+ | Samsung Galaxy S23 Ultra (Android 13) | Snapdragon® 8 Gen 2 | TFLite | 54.275 ms | 122 - 125 MB | FP16 | NPU | [WhisperDecoder.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite)
+ | Samsung Galaxy S23 Ultra (Android 13) | Snapdragon® 8 Gen 2 | QNN Model Library | 1890.126 ms | 0 - 243 MB | FP16 | NPU | [WhisperEncoder.so](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.so)
+ | Samsung Galaxy S23 Ultra (Android 13) | Snapdragon® 8 Gen 2 | QNN Model Library | 11.949 ms | 61 - 128 MB | FP16 | NPU | [WhisperDecoder.so](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.so)
 
 
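The target assets linked in this table live in the qualcomm/Whisper-Small-En repo itself, so they can also be fetched programmatically with huggingface_hub; a minimal sketch (repo id and file names taken from the links above):

```python
from huggingface_hub import hf_hub_download

# Download the on-device assets listed in the table from this model repo.
encoder_path = hf_hub_download(repo_id="qualcomm/Whisper-Small-En", filename="WhisperEncoder.tflite")
decoder_path = hf_hub_download(repo_id="qualcomm/Whisper-Small-En", filename="WhisperDecoder.tflite")
print(encoder_path, decoder_path)
```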
 
@@ -103,16 +103,16 @@ python -m qai_hub_models.models.whisper_small_en.export
 ```
 Profile Job summary of WhisperEncoder
 --------------------------------------------------
- Device: SA8255 (Proxy) (13)
- Estimated Inference Time: 1882.60 ms
- Estimated Peak Memory Range: 0.52-223.35 MB
+ Device: Snapdragon X Elite CRD (11)
+ Estimated Inference Time: 1087.18 ms
+ Estimated Peak Memory Range: 0.46-0.46 MB
 Compute Units: NPU (1329) | Total (1329)
 
 Profile Job summary of WhisperDecoder
 --------------------------------------------------
- Device: SA8255 (Proxy) (13)
- Estimated Inference Time: 11.79 ms
- Estimated Peak Memory Range: 60.69-127.91 MB
+ Device: Snapdragon X Elite CRD (11)
+ Estimated Inference Time: 10.68 ms
+ Estimated Peak Memory Range: 60.75-60.75 MB
 Compute Units: NPU (2255) | Total (2255)
 
 
@@ -134,29 +134,49 @@ in memory using the `jit.trace` and then call the `submit_compile_job` API.
 import torch
 
 import qai_hub as hub
- from qai_hub_models.models.whisper_small_en import Model
+ from qai_hub_models.models.whisper_small_en import WhisperEncoder,WhisperDecoder
 
 # Load the model
- torch_model = Model.from_pretrained()
+ encoder_model = WhisperEncoder.from_pretrained()
+
+ decoder_model = WhisperDecoder.from_pretrained()
+
 
 # Device
 device = hub.Device("Samsung Galaxy S23")
 
+
 # Trace model
- input_shape = torch_model.get_input_spec()
- sample_inputs = torch_model.sample_inputs()
+ encoder_input_shape = encoder_model.get_input_spec()
+ encoder_sample_inputs = encoder_model.sample_inputs()
 
- pt_model = torch.jit.trace(torch_model, [torch.tensor(data[0]) for _, data in sample_inputs.items()])
+ traced_encoder_model = torch.jit.trace(encoder_model, [torch.tensor(data[0]) for _, data in encoder_sample_inputs.items()])
 
 # Compile model on a specific device
- compile_job = hub.submit_compile_job(
-     model=pt_model,
+ encoder_compile_job = hub.submit_compile_job(
+     model=traced_encoder_model ,
      device=device,
-     input_specs=torch_model.get_input_spec(),
+     input_specs=encoder_model.get_input_spec(),
  )
 
 # Get target model to run on-device
- target_model = compile_job.get_target_model()
+ encoder_target_model = encoder_compile_job.get_target_model()
+
+ # Trace model
+ decoder_input_shape = decoder_model.get_input_spec()
+ decoder_sample_inputs = decoder_model.sample_inputs()
+
+ traced_decoder_model = torch.jit.trace(decoder_model, [torch.tensor(data[0]) for _, data in decoder_sample_inputs.items()])
+
+ # Compile model on a specific device
+ decoder_compile_job = hub.submit_compile_job(
+     model=traced_decoder_model ,
+     device=device,
+     input_specs=decoder_model.get_input_spec(),
+ )
+
+ # Get target model to run on-device
+ decoder_target_model = decoder_compile_job.get_target_model()
 
 ```
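The new encoder and decoder blocks apply the same trace-then-compile pattern twice. A condensed sketch of that pattern, built only from the calls that appear in the diff (it is not the committed README's exact code):

```python
import torch
import qai_hub as hub
from qai_hub_models.models.whisper_small_en import WhisperEncoder, WhisperDecoder

device = hub.Device("Samsung Galaxy S23")

def trace_and_compile(model):
    # Trace the model with its own sample inputs, then compile it for the target device.
    sample_inputs = model.sample_inputs()
    traced = torch.jit.trace(model, [torch.tensor(data[0]) for _, data in sample_inputs.items()])
    compile_job = hub.submit_compile_job(
        model=traced,
        device=device,
        input_specs=model.get_input_spec(),
    )
    return compile_job.get_target_model()

encoder_target_model = trace_and_compile(WhisperEncoder.from_pretrained())
decoder_target_model = trace_and_compile(WhisperDecoder.from_pretrained())
```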
 
@@ -168,10 +188,16 @@ After compiling models from step 1. Models can be profiled model on-device using
 provisioned in the cloud. Once the job is submitted, you can navigate to a
 provided job URL to view a variety of on-device performance metrics.
 ```python
- profile_job = hub.submit_profile_job(
-     model=target_model,
-     device=device,
- )
+
+ encoder_profile_job = hub.submit_profile_job(
+     model=encoder_target_model,
+     device=device,
+ )
+
+ decoder_profile_job = hub.submit_profile_job(
+     model=decoder_target_model,
+     device=device,
+ )
 
 ```
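Because the two profile calls differ only in the target model, they can also be submitted in a loop; a small sketch assuming `encoder_target_model`, `decoder_target_model`, `hub`, and `device` from the compile step are still in scope:

```python
# Submit one profile job per compiled model and keep the job handles by name.
profile_jobs = {
    name: hub.submit_profile_job(model=target_model, device=device)
    for name, target_model in [
        ("WhisperEncoder", encoder_target_model),
        ("WhisperDecoder", decoder_target_model),
    ]
}
```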
 
@@ -180,14 +206,20 @@ Step 3: **Verify on-device accuracy**
 To verify the accuracy of the model on-device, you can run on-device inference
 on sample input data on the same cloud hosted device.
 ```python
- input_data = torch_model.sample_inputs()
- inference_job = hub.submit_inference_job(
-     model=target_model,
-     device=device,
-     inputs=input_data,
- )
-
- on_device_output = inference_job.download_output_data()
+ encoder_input_data = encoder_model.sample_inputs()
+ encoder_inference_job = hub.submit_inference_job(
+     model=encoder_target_model,
+     device=device,
+     inputs=encoder_input_data,
+ )
+ encoder_inference_job.download_output_data()
+ decoder_input_data = decoder_model.sample_inputs()
+ decoder_inference_job = hub.submit_inference_job(
+     model=decoder_target_model,
+     device=device,
+     inputs=decoder_input_data,
+ )
+ decoder_inference_job.download_output_data()
 
 ```
 With the output of the model, you can compute like PSNR, relative errors or
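For the PSNR / relative-error comparison the last context line alludes to, a minimal sketch is below. It assumes `download_output_data()` returns a mapping of output name to a list of numpy arrays (check the qai_hub documentation for the exact return type); reference outputs would come from running the PyTorch models on the same sample inputs.

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, eps: float = 1e-10) -> float:
    # Peak signal-to-noise ratio between a reference tensor and an on-device result.
    ref = reference.astype(np.float64)
    mse = np.mean((ref - test.astype(np.float64)) ** 2)
    return float(10.0 * np.log10((np.max(np.abs(ref)) ** 2 + eps) / (mse + eps)))

# Assumption: dict of output_name -> [np.ndarray, ...] per the note above.
encoder_outputs = encoder_inference_job.download_output_data()
for name, arrays in encoder_outputs.items():
    # Inspect shapes, then compare each array against the PyTorch reference with psnr(...).
    print(name, arrays[0].shape)
```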
 