This model is the ONNX version of [https://huggingface.co/SamLowe/roberta-base-go_emotions](https://huggingface.co/SamLowe/roberta-base-go_emotions).

- is faster in inference than the normal Transformers model, particularly for smaller batch sizes
- in my tests, about 2x to 3x as fast for a batch size of 1 on an 8-core 11th-gen i7 CPU using ONNXRuntime
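
The kind of single-item latency comparison behind these numbers can be sketched with a small timing harness. The `predict` functions below are placeholders doing arbitrary work; neither the actual Transformers pipeline nor an ONNXRuntime session is loaded here:

```python
import time

def mean_latency_ms(predict, text, n_runs=50, n_warmup=5):
    """Mean wall-clock latency of predict(text), in milliseconds."""
    for _ in range(n_warmup):      # warm-up calls are not timed
        predict(text)
    start = time.perf_counter()
    for _ in range(n_runs):
        predict(text)
    return (time.perf_counter() - start) * 1000.0 / n_runs

# Placeholder predictors standing in for the Transformers model and the
# ONNX model; in practice each would wrap a real inference call.
def baseline_predict(text):
    return sum(ord(c) for c in text * 400)

def onnx_predict(text):
    return sum(ord(c) for c in text * 100)

sample = "I am not having a great day"
baseline_ms = mean_latency_ms(baseline_predict, sample)
onnx_ms = mean_latency_ms(onnx_predict, sample)
print(f"baseline: {baseline_ms:.4f} ms, onnx: {onnx_ms:.4f} ms")
```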

#### Metrics

Using a fixed threshold of 0.5 to convert the scores to binary predictions for each label:

- Accuracy: 0.474
- Precision: 0.575
- Recall: 0.396
- F1: 0.450
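
The fixed-threshold step can be sketched as below, on a tiny made-up score matrix rather than the real go_emotions evaluation set. The helper names, the micro-averaging, and the exact-match notion of accuracy are illustrative assumptions, since the averaging behind the numbers above is not specified here:

```python
import numpy as np

def binarize(scores, threshold=0.5):
    # Per-label sigmoid scores -> binary multi-label predictions
    return (np.asarray(scores) >= threshold).astype(int)

def micro_prf1(preds, labels):
    # Micro-averaged precision, recall and F1 over all label slots
    preds, labels = np.asarray(preds), np.asarray(labels)
    tp = int(np.sum((preds == 1) & (labels == 1)))
    fp = int(np.sum((preds == 1) & (labels == 0)))
    fn = int(np.sum((preds == 0) & (labels == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy data: 3 samples x 4 emotion labels (scores and labels made up)
scores = [[0.9, 0.2, 0.6, 0.1],
          [0.4, 0.8, 0.3, 0.7],
          [0.1, 0.1, 0.9, 0.2]]
labels = [[1, 0, 1, 0],
          [0, 1, 0, 0],
          [0, 0, 1, 1]]

preds = binarize(scores)
precision, recall, f1 = micro_prf1(preds, labels)
# Accuracy in the exact-match (all labels correct per sample) sense
accuracy = float(np.mean(np.all(preds == np.asarray(labels), axis=1)))
```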

See the [SamLowe/roberta-base-go_emotions](https://huggingface.co/SamLowe/roberta-base-go_emotions) model card for more details on the gains possible by selecting label-specific thresholds to maximise F1 scores, or another metric.
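
Such label-specific thresholds could be found with a simple per-label grid search, sketched below on made-up data; this is an illustrative assumption, not the procedure actually used for the model card's numbers:

```python
import numpy as np

def best_threshold_per_label(scores, labels, grid=None):
    # Grid-search a separate decision threshold for each label,
    # keeping the first threshold that maximises that label's F1
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    grid = np.linspace(0.05, 0.95, 19) if grid is None else grid
    best = np.full(scores.shape[1], 0.5)
    for j in range(scores.shape[1]):
        best_f1 = -1.0
        for t in grid:
            pred = (scores[:, j] >= t).astype(int)
            tp = int(np.sum((pred == 1) & (labels[:, j] == 1)))
            fp = int(np.sum((pred == 1) & (labels[:, j] == 0)))
            fn = int(np.sum((pred == 0) & (labels[:, j] == 1)))
            f1 = 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0
            if f1 > best_f1:
                best_f1, best[j] = f1, t
    return best

# Toy data: 4 samples x 2 labels
scores = [[0.32, 0.91], [0.36, 0.12], [0.18, 0.78], [0.07, 0.41]]
labels = [[1, 1], [1, 0], [0, 1], [0, 0]]
thresholds = best_threshold_per_label(scores, labels)
# With per-label thresholds, every toy sample is classified correctly
tuned_preds = (np.asarray(scores) >= thresholds).astype(int)
```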

### Quantized (INT8) ONNX version

`onnx/model_quantized.onnx` is the int8-quantized version

- is faster in inference than both the full-precision ONNX model above and the normal Transformers model
- about 2x as fast for a batch size of 1 on an 8-core 11th-gen i7 CPU using ONNXRuntime vs the full-precision model above
- which makes it circa 5x as fast as the full-precision normal Transformers model (on the above-mentioned CPU, for a batch of 1)

#### Metrics for Quantized (INT8) Model

Using a fixed threshold of 0.5 to convert the scores to binary predictions for each label:

- Accuracy: 0.475
- Precision: 0.582
- Recall: 0.398
- F1: 0.447

Note how the metrics are almost identical to the full-precision metrics above.

See the [SamLowe/roberta-base-go_emotions](https://huggingface.co/SamLowe/roberta-base-go_emotions) model card for more details on the gains possible by selecting label-specific thresholds to maximise F1 scores, or another metric.

### How to use