qaihm-bot committed on
Commit c6bcf71
1 Parent(s): ab13cea

Upload README.md with huggingface_hub

Files changed (1): README.md +11 -73
README.md CHANGED
@@ -34,9 +34,12 @@ More details on model performance across various devices can be found
  - Model size: 12.1 MB


+
+
  | Device | Chipset | Target Runtime | Inference Time (ms) | Peak Memory Range (MB) | Precision | Primary Compute Unit | Target Model
  | ---|---|---|---|---|---|---|---|
- | Samsung Galaxy S23 Ultra (Android 13) | Snapdragon® 8 Gen 2 | TFLite | 6.971 ms | 9 - 12 MB | INT8 | NPU | [Yolo-NAS-Quantized.tflite](https://huggingface.co/qualcomm/Yolo-NAS-Quantized/blob/main/Yolo-NAS-Quantized.tflite)
+ | Samsung Galaxy S23 Ultra (Android 13) | Snapdragon® 8 Gen 2 | TFLite | 6.973 ms | 10 - 13 MB | INT8 | NPU | [Yolo-NAS-Quantized.tflite](https://huggingface.co/qualcomm/Yolo-NAS-Quantized/blob/main/Yolo-NAS-Quantized.tflite)
+


  ## Installation
@@ -94,83 +97,18 @@ device. This script does the following:
  python -m qai_hub_models.models.yolonas_quantized.export
  ```

- ## How does this work?
-
- This [export script](https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/Yolo-NAS-Quantized/export.py)
- leverages [Qualcomm® AI Hub](https://aihub.qualcomm.com/) to optimize, validate, and deploy this model
- on-device. Let's go through each step below in detail:
-
- Step 1: **Compile model for on-device deployment**
-
- To compile a PyTorch model for on-device deployment, we first trace the model
- in memory using `jit.trace` and then call the `submit_compile_job` API.
-
- ```python
- import torch
-
- import qai_hub as hub
- from qai_hub_models.models.yolonas_quantized import Model
-
- # Load the model
- torch_model = Model.from_pretrained()
- torch_model.eval()
-
- # Device
- device = hub.Device("Samsung Galaxy S23")
-
- # Trace model
- input_shape = torch_model.get_input_spec()
- sample_inputs = torch_model.sample_inputs()
-
- pt_model = torch.jit.trace(torch_model, [torch.tensor(data[0]) for _, data in sample_inputs.items()])
-
- # Compile model on a specific device
- compile_job = hub.submit_compile_job(
-     model=pt_model,
-     device=device,
-     input_specs=input_shape,
- )
-
- # Get target model to run on-device
- target_model = compile_job.get_target_model()
-
  ```
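
Before submitting the compile job, the trace itself can be sanity-checked locally. A minimal sketch, assuming the `torch_model`, `pt_model`, and `sample_inputs` defined in the snippet above:

```python
import torch

# Rebuild the example inputs exactly as they were passed to jit.trace.
example_inputs = [torch.tensor(data[0]) for _, data in sample_inputs.items()]

with torch.no_grad():
    eager_out = torch_model(*example_inputs)
    traced_out = pt_model(*example_inputs)

# assert_close recurses into tuples/lists of tensors, so this also covers
# models that return multiple outputs, as detection models typically do.
torch.testing.assert_close(traced_out, eager_out)
```
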
-
-
- Step 2: **Performance profiling on cloud-hosted device**
-
- After compiling models from step 1, models can be profiled on-device using the
- `target_model`. Note that this script runs the model on a device automatically
- provisioned in the cloud. Once the job is submitted, you can navigate to a
- provided job URL to view a variety of on-device performance metrics.
- ```python
- profile_job = hub.submit_profile_job(
-     model=target_model,
-     device=device,
- )
-
+ Profile Job summary of Yolo-NAS-Quantized
+ --------------------------------------------------
+ Device: RB5 (Proxy) (12)
+ Estimated Inference Time: 131.37 ms
+ Estimated Peak Memory Range: 14.60-23.46 MB
+ Compute Units: CPU (203) | Total (203)
+
  ```
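
Once the job finishes, the metrics rendered at the job URL can also be fetched programmatically. A minimal sketch, assuming the `profile_job` from the snippet above; `wait()` and `download_profile()` are assumed from the `qai_hub` job API and worth verifying against the client version in use:

```python
# Block until the cloud-hosted device has finished profiling.
profile_job.wait()

# Fetch the raw profiling results (assumed API); the job URL shows the
# same data rendered as a report.
profile = profile_job.download_profile()
```
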
-
- Step 3: **Verify on-device accuracy**
-
- To verify the accuracy of the model on-device, you can run on-device inference
- on sample input data on the same cloud-hosted device.
- ```python
- input_data = torch_model.sample_inputs()
- inference_job = hub.submit_inference_job(
-     model=target_model,
-     device=device,
-     inputs=input_data,
- )
-
- on_device_output = inference_job.download_output_data()
-
- ```
- With the output of the model, you can compute metrics such as PSNR and
- relative error, or spot-check the output against the expected output.
-
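
For the spot check, a small PSNR helper is enough. A minimal sketch, assuming the `torch_model`, `input_data`, and `on_device_output` from the snippet above, and that `download_output_data()` returns a name-to-list-of-arrays mapping ordered like the PyTorch model's outputs:

```python
import numpy as np
import torch

def psnr(expected: np.ndarray, actual: np.ndarray, peak: float = 1.0) -> float:
    # Peak signal-to-noise ratio in dB; higher means closer agreement.
    mse = np.mean((expected.astype(np.float64) - actual.astype(np.float64)) ** 2)
    return float("inf") if mse == 0.0 else 10.0 * np.log10(peak ** 2 / mse)

# Reference outputs: run the local PyTorch model on the same sample inputs.
torch_inputs = [torch.tensor(data[0]) for _, data in input_data.items()]
with torch.no_grad():
    reference = torch_model(*torch_inputs)

# Compare each on-device output against its PyTorch counterpart.
for ref, (name, arrays) in zip(reference, on_device_output.items()):
    print(f"{name}: PSNR = {psnr(ref.numpy(), arrays[0]):.2f} dB")
```
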
- **Note**: This on-device profiling and inference requires access to Qualcomm®
- AI Hub. [Sign up for access](https://myaccount.qualcomm.com/signup).


  ## Run demo on a cloud-hosted device
@@ -209,7 +147,7 @@ Explore all available models on [Qualcomm® AI Hub](https://aihub.qualcomm.com/)
  ## License
  - The license for the original implementation of Yolo-NAS-Quantized can be found
    [here](https://github.com/Deci-AI/super-gradients/blob/master/LICENSE.YOLONAS.md).
- - The license for the compiled assets for on-device deployment can be found [here]({deploy_license_url})
+ - The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf)

  ## References
  * [YOLO-NAS by Deci Achieves SOTA Performance on Object Detection Using Neural Architecture Search](https://deci.ai/blog/yolo-nas-object-detection-foundation-model/)
 