Commit f39ab7a
Parent(s): c52ae5b
Update README.md
README.md CHANGED

```diff
@@ -24,7 +24,9 @@ ______________________________________________________________________
 
 ## News 🎉
 
-- \[2023/08\]
+- \[2023/08\] Turbomind supports 4-bit inference, 2.4x faster than FP16, the fastest open-source implementation🚀.
+- \[2023/08\] LMDeploy has launched on the [HuggingFace Hub](https://huggingface.co/lmdeploy), providing ready-to-use 4-bit models.
+- \[2023/08\] LMDeploy supports 4-bit quantization using the [AWQ](https://arxiv.org/abs/2306.00978) algorithm.
 - \[2023/07\] TurboMind supports Llama-2 70B with GQA.
 - \[2023/07\] TurboMind supports Llama-2 7B/13B.
 - \[2023/07\] TurboMind supports tensor-parallel inference of InternLM.
```
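The AWQ entry added above refers to weight-only 4-bit quantization. As a rough illustration of the idea (this is a minimal plain-NumPy sketch of generic group-wise asymmetric 4-bit quantization, not LMDeploy's or AWQ's actual implementation; the function names are hypothetical):

```python
import numpy as np

def quantize_int4_groupwise(w, group_size=128):
    """Illustrative group-wise asymmetric 4-bit quantization.

    Each group of `group_size` weights gets its own scale and
    zero-point; values are mapped onto the 16 levels 0..15.
    NOT LMDeploy/AWQ's real kernel -- a sketch of the concept only.
    """
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0            # 4 bits -> 16 levels
    scale = np.where(scale == 0, 1.0, scale)  # guard constant groups
    q = np.clip(np.round((w - w_min) / scale), 0, 15).astype(np.uint8)
    return q, scale, w_min

def dequantize_int4_groupwise(q, scale, w_min):
    """Map 4-bit codes back to approximate float weights."""
    return q.astype(np.float32) * scale + w_min

# Round-trip a random weight vector and measure worst-case error.
rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scale, zp = quantize_int4_groupwise(w)
w_hat = dequantize_int4_groupwise(q, scale, zp).reshape(-1)
max_err = float(np.abs(w - w_hat).max())
```

Packed two codes per byte, the 4-bit weights take a quarter of the FP16 footprint, which is where the memory and bandwidth savings behind the speedup claim come from; AWQ's contribution is choosing per-channel scaling so that the quantization error lands away from activation-salient weights.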