yangwang92 committed
Commit 7c73a10 • 1 Parent(s): 4da29fc

Update README.md

Files changed (1)
  1. README.md +17 -5
README.md CHANGED
@@ -1,12 +1,24 @@
 ---
-title: VPTQ
-emoji: 🌖
+title: VPTQ Demo
+emoji: 🚀
 colorFrom: blue
 colorTo: green
 sdk: static
-pinned: false
+pinned: true
 license: mit
-short_description: Vector Post-Training Quantization (VPTQ) Inference Demo
+short_description: Vector Post-Training Quantization Inference Demo
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+Vector Post-Training Quantization (VPTQ) is a novel post-training quantization method that leverages vector quantization to achieve high accuracy on LLMs at extremely low bit-widths (<2 bits). VPTQ can compress 70B and even 405B models to 1-2 bits without retraining while maintaining high accuracy.
+
+* Better accuracy at 1-2 bits (405B @ <2 bits, 70B @ 2 bits)
+* Lightweight quantization algorithm: only ~17 hours to quantize the 405B Llama-3.1 model
+* Agile quantized inference: low decoding overhead, high throughput, and low TTFT (time to first token)
+
+[GitHub/Code](https://github.com/microsoft/VPTQ)
+
+[Online Demo](https://huggingface.co/spaces/microsoft/VPTQ)
+
+[Paper](https://arxiv.org/abs/2409.17066)
+
+
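
The new README description hinges on vector quantization: weight matrices are split into short vectors, each vector is replaced by the index of its nearest entry in a small codebook, and the model stores only the indices plus the codebook. The sketch below illustrates that idea only; it is not the VPTQ algorithm (see the linked paper for that), and the vector length, codebook size, plain k-means training, and the `vector_quantize`/`dequantize` helper names are illustrative assumptions.

```python
# Conceptual sketch of vector quantization of a weight matrix (not the VPTQ algorithm).
# Illustrative assumptions: length-8 vectors, a 256-entry codebook, plain k-means (Lloyd) updates.
import numpy as np

def vector_quantize(weight: np.ndarray, vec_len: int = 8, codebook_size: int = 256, iters: int = 10):
    """Split `weight` into length-`vec_len` vectors and map each to its nearest codebook centroid."""
    rng = np.random.default_rng(0)
    flat = weight.reshape(-1, vec_len)                    # (num_vectors, vec_len)

    # Initialize the codebook with randomly chosen weight vectors.
    codebook = flat[rng.choice(len(flat), size=codebook_size, replace=False)].copy()

    for _ in range(iters):
        # Assign each weight vector to its nearest centroid.
        # argmin of ||x - c||^2 only needs ||c||^2 - 2 x.c, so skip the ||x||^2 term.
        dots = flat @ codebook.T                          # (num_vectors, codebook_size)
        sq = (codebook ** 2).sum(axis=1)                  # ||c||^2 per centroid
        indices = (sq[None, :] - 2.0 * dots).argmin(axis=1)

        # Move each centroid to the mean of its assigned vectors.
        for k in range(codebook_size):
            members = flat[indices == k]
            if len(members):
                codebook[k] = members.mean(axis=0)

    # Store one 8-bit index per vector plus the codebook instead of the full weights.
    return indices.astype(np.uint8), codebook

def dequantize(indices: np.ndarray, codebook: np.ndarray, shape: tuple) -> np.ndarray:
    """Reconstruct an approximate weight matrix by looking up each index in the codebook."""
    return codebook[indices].reshape(shape)

if __name__ == "__main__":
    w = np.random.randn(512, 512).astype(np.float32)
    idx, cb = vector_quantize(w)
    w_hat = dequantize(idx, cb, w.shape)
    # One 8-bit index per 8 weights is roughly 1 bit/weight, ignoring codebook overhead.
    print("reconstruction MSE:", float(((w - w_hat) ** 2).mean()))
```

With a 256-entry codebook over 8-element vectors, each group of 8 weights is stored as a single 8-bit index, i.e. roughly 1 bit per weight before counting the codebook itself, which is the storage regime the README's "<2 bits" claim refers to.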