updated README.md (added latency table)
README.md
CHANGED
@@ -25,13 +25,40 @@ This repository contains TensorRT engines with mixed precision int8 + fp32. You
|
|
25 |
|
26 |
ONNX model generated by [ENOT-AutoDL](https://pypi.org/project/enot-autodl/) and build script will be published soon.
|
27 |
|
28 |
-
##
|
29 |
|
30 |
-
| |INT8|FP32|
|
31 |
-
|
32 |
-
| **Lambada Acc** |78.
|
33 |
-
| **Model size (GB)** |8.5|24.2|
|
34 |
|
35 |
|
36 |
## How to use
|
37 |
|
|
|
25 |
|
26 |
ONNX model generated by [ENOT-AutoDL](https://pypi.org/project/enot-autodl/) and build script will be published soon.
|
27 |
|
28 |
+
## Metrics:
|
29 |
|
30 |
+
| |TensorRT INT8+FP32|torch FP16|torch FP32|
|
31 |
+
|---|:---:|:---:|:---:|
|
32 |
+
| **Lambada Acc** |78.79%|79.17%|-|
|
33 |
+
| **Model size (GB)** |8.5|12.1|24.2|
|
34 |
|
35 |
+
### Test environment
|
36 |
+
|
37 |
+
* GPU RTX 4090
|
38 |
+
* CPU 11th Gen Intel(R) Core(TM) i7-11700K
|
39 |
+
* TensorRT 8.5.3.1
|
40 |
+
* pytorch 1.13.1+cu116
|
41 |
+
|
42 |
+
## Latency:
|
43 |
+
|
44 |
+
|Input sequence length|Number of generated tokens|TensorRT INT8+FP32 (ms)|torch FP16 (ms)|Acceleration|
|
45 |
+
|:---:|:---:|:---:|:---:|:---:|
|
46 |
+
|64|64|1040|1610|1.55|
|
47 |
+
|64|128|2089|3224|1.54|
|
48 |
+
|64|256|4236|6479|1.53|
|
49 |
+
|128|64|1060|1619|1.53|
|
50 |
+
|128|128|2120|3241|1.53|
|
51 |
+
|128|256|4296|6510|1.52|
|
52 |
+
|256|64|1109|1640|1.49|
|
53 |
+
|256|128|2204|3276|1.49|
|
54 |
+
|256|256|4443|6571|1.49|
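
The Acceleration column appears to be the ratio of the torch FP16 latency to the TensorRT INT8+FP32 latency. The following is a minimal sketch of that arithmetic, not the script used to produce the table: the row values are copied from the table above, while the names `ROWS`, `speedup`, and `ms_per_token` are hypothetical helpers introduced only for illustration. Because the published latencies are rounded to whole milliseconds, a recomputed ratio may differ from the listed one in the last digit.

```python
# Rows copied from the latency table:
# (input length, generated tokens, TensorRT INT8+FP32 ms, torch FP16 ms, listed acceleration)
ROWS = [
    (64, 64, 1040, 1610, 1.55),
    (64, 128, 2089, 3224, 1.54),
    (64, 256, 4236, 6479, 1.53),
    (128, 64, 1060, 1619, 1.53),
    (128, 128, 2120, 3241, 1.53),
    (128, 256, 4296, 6510, 1.52),
    (256, 64, 1109, 1640, 1.49),
    (256, 128, 2204, 3276, 1.49),
    (256, 256, 4443, 6571, 1.49),
]


def speedup(trt_ms: float, torch_ms: float) -> float:
    """Assumed definition of 'Acceleration': baseline latency / TensorRT latency."""
    return torch_ms / trt_ms


def ms_per_token(total_ms: float, generated_tokens: int) -> float:
    """Average latency per generated token for one table row."""
    return total_ms / generated_tokens


if __name__ == "__main__":
    for in_len, gen, trt_ms, torch_ms, listed in ROWS:
        print(
            f"in={in_len:>3} gen={gen:>3} "
            f"speedup={speedup(trt_ms, torch_ms):.2f} (listed {listed}) "
            f"TensorRT ms/token={ms_per_token(trt_ms, gen):.2f}"
        )
```

Dividing each latency by the number of generated tokens gives a per-token figure that is easier to compare across rows than the end-to-end totals.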
|
55 |
+
|
56 |
+
### Test environment
|
57 |
+
|
58 |
+
* GPU RTX 4090
|
59 |
+
* CPU 11th Gen Intel(R) Core(TM) i7-11700K
|
60 |
+
* TensorRT 8.5.3.1
|
61 |
+
* pytorch 1.13.1+cu116
|
62 |
|
63 |
## How to use
|
64 |
|