OpenSourceRonin committed on
Commit c7f64f9
1 Parent(s): 8453bfa

Update README.md

Files changed (1)
  1. README.md +1 -84
README.md CHANGED
@@ -36,21 +36,6 @@ Scaling model size significantly challenges the deployment and inference of Larg
 
 Read tech report at [**Tech Report**](https://github.com/microsoft/VPTQ/blob/main/VPTQ_tech_report.pdf) and [**arXiv Paper**](https://arxiv.org/pdf/2409.17066)
 
- ### Early Results from Tech Report
- VPTQ achieves better accuracy and higher throughput with lower quantization overhead across models of different sizes. The following experimental results are for reference only; VPTQ can achieve better outcomes under reasonable parameters, especially in terms of model accuracy and inference speed.
-
- <img src="assets/vptq.png" width="500">
-
- | Model       | bitwidth | W2↓  | C4↓  | AvgQA↑ | tok/s↑ | mem(GB) | cost/h↓ |
- | ----------- | -------- | ---- | ---- | ------ | ------ | ------- | ------- |
- | LLaMA-2 7B  | 2.02     | 6.13 | 8.07 | 58.2   | 39.9   | 2.28    | 2       |
- |             | 2.26     | 5.95 | 7.87 | 59.4   | 35.7   | 2.48    | 3.1     |
- | LLaMA-2 13B | 2.02     | 5.32 | 7.15 | 62.4   | 26.9   | 4.03    | 3.2     |
- |             | 2.18     | 5.28 | 7.04 | 63.1   | 18.5   | 4.31    | 3.6     |
- | LLaMA-2 70B | 2.07     | 3.93 | 5.72 | 68.6   | 9.7    | 19.54   | 19      |
- |             | 2.11     | 3.92 | 5.71 | 68.7   | 9.7    | 20.01   | 19      |
-
- ---
 
 ## Installation
 
@@ -148,72 +133,4 @@ A environment variable is available to control share link or not.
 `export SHARE_LINK=1`
 ```
 python -m vptq.app
- ```
-
- ---
-
- ## Road Map
- - [ ] Merge the quantization algorithm into the public repository.
- - [ ] Submit the VPTQ method to various inference frameworks (e.g., vLLM, llama.cpp).
- - [ ] Improve the implementation of the inference kernel.
- - [ ] **TBC**
-
- ## Project main members:
- * Yifei Liu (@lyf-00)
- * Jicheng Wen (@wejoncy)
- * Yang Wang (@YangWang92)
-
- ## Acknowledgement
-
- * We thank for **James Hensman** for his crucial insights into the error analysis related to Vector Quantization (VQ), and his comments on LLMs evaluation are invaluable to this research.
- * We are deeply grateful for the inspiration provided by the papers QUIP, QUIP#, GPTVQ, AQLM, WoodFisher, GPTQ, and OBC.
-
- ## Publication
-
- EMNLP 2024 Main
- ```bibtex
- @inproceedings{
- vptq,
- title={VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models},
- author={Yifei Liu and
- Jicheng Wen and
- Yang Wang and
- Shengyu Ye and
- Li Lyna Zhang and
- Ting Cao and
- Cheng Li and
- Mao Yang},
- booktitle={The 2024 Conference on Empirical Methods in Natural Language Processing},
- year={2024},
- }
- ```
-
- ---
-
- ## Limitation of VPTQ
- * ⚠️ VPTQ should only be used for research and experimental purposes. Further testing and validation are needed before you use it.
- * ⚠️ The repository only provides a method of model quantization algorithm. The open-source community may provide models based on the technical report and quantization algorithm by themselves, but the repository cannot guarantee the performance of those models.
- * ⚠️ VPTQ is not capable of testing all potential applications and domains, and VPTQ cannot guarantee the accuracy and effectiveness of VPTQ across other tasks or scenarios.
- * ⚠️ Our tests are all based on English texts; other languages are not included in the current testing.
-
- ## Contributing
-
- This project welcomes contributions and suggestions. Most contributions require you to agree to a
- Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
- the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
-
- When you submit a pull request, a CLA bot will automatically determine whether you need to provide
- a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
- provided by the bot. You will only need to do this once across all repos using our CLA.
-
- This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
- For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
- contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
-
- ## Trademarks
-
- This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
- trademarks or logos is subject to and must follow
- [Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
- Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
- Any use of third-party trademarks or logos are subject to those third-party's policies.
+ ```
 