OpenSourceRonin committed on
Commit c7f64f9
1 Parent(s): 8453bfa

Update README.md

Files changed (1)
  1. README.md +1 -84
README.md CHANGED
@@ -36,21 +36,6 @@ Scaling model size significantly challenges the deployment and inference of Larg
 
 Read tech report at [**Tech Report**](https://github.com/microsoft/VPTQ/blob/main/VPTQ_tech_report.pdf) and [**arXiv Paper**](https://arxiv.org/pdf/2409.17066)
 
- ### Early Results from Tech Report
- VPTQ achieves better accuracy and higher throughput with lower quantization overhead across models of different sizes. The following experimental results are for reference only; VPTQ can achieve better outcomes under reasonable parameters, especially in terms of model accuracy and inference speed.
-
- <img src="assets/vptq.png" width="500">
-
- | Model       | bitwidth | W2↓  | C4↓  | AvgQA↑ | tok/s↑ | mem(GB) | cost/h↓ |
- | ----------- | -------- | ---- | ---- | ------ | ------ | ------- | ------- |
- | LLaMA-2 7B  | 2.02     | 6.13 | 8.07 | 58.2   | 39.9   | 2.28    | 2       |
- |             | 2.26     | 5.95 | 7.87 | 59.4   | 35.7   | 2.48    | 3.1     |
- | LLaMA-2 13B | 2.02     | 5.32 | 7.15 | 62.4   | 26.9   | 4.03    | 3.2     |
- |             | 2.18     | 5.28 | 7.04 | 63.1   | 18.5   | 4.31    | 3.6     |
- | LLaMA-2 70B | 2.07     | 3.93 | 5.72 | 68.6   | 9.7    | 19.54   | 19      |
- |             | 2.11     | 3.92 | 5.71 | 68.7   | 9.7    | 20.01   | 19      |
-
- ---
 
 ## Installation
 
@@ -148,72 +133,4 @@ A environment variable is available to control share link or not.
 `export SHARE_LINK=1`
 ```
 python -m vptq.app
- ```
-
- ---
-
- ## Road Map
- - [ ] Merge the quantization algorithm into the public repository.
- - [ ] Submit the VPTQ method to various inference frameworks (e.g., vLLM, llama.cpp).
- - [ ] Improve the implementation of the inference kernel.
- - [ ] **TBC**
-
- ## Project main members:
- * Yifei Liu (@lyf-00)
- * Jicheng Wen (@wejoncy)
- * Yang Wang (@YangWang92)
-
- ## Acknowledgement
-
- * We thank for **James Hensman** for his crucial insights into the error analysis related to Vector Quantization (VQ), and his comments on LLMs evaluation are invaluable to this research.
- * We are deeply grateful for the inspiration provided by the papers QUIP, QUIP#, GPTVQ, AQLM, WoodFisher, GPTQ, and OBC.
-
- ## Publication
-
- EMNLP 2024 Main
- ```bibtex
- @inproceedings{
- vptq,
- title={VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models},
- author={Yifei Liu and
- Jicheng Wen and
- Yang Wang and
- Shengyu Ye and
- Li Lyna Zhang and
- Ting Cao and
- Cheng Li and
- Mao Yang},
- booktitle={The 2024 Conference on Empirical Methods in Natural Language Processing},
- year={2024},
- }
- ```
-
- ---
-
- ## Limitation of VPTQ
- * ⚠️ VPTQ should only be used for research and experimental purposes. Further testing and validation are needed before you use it.
- * ⚠️ The repository only provides a method of model quantization algorithm. The open-source community may provide models based on the technical report and quantization algorithm by themselves, but the repository cannot guarantee the performance of those models.
- * ⚠️ VPTQ is not capable of testing all potential applications and domains, and VPTQ cannot guarantee the accuracy and effectiveness of VPTQ across other tasks or scenarios.
- * ⚠️ Our tests are all based on English texts; other languages are not included in the current testing.
-
- ## Contributing
-
- This project welcomes contributions and suggestions. Most contributions require you to agree to a
- Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
- the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
-
- When you submit a pull request, a CLA bot will automatically determine whether you need to provide
- a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
- provided by the bot. You will only need to do this once across all repos using our CLA.
-
- This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
- For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
- contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
-
- ## Trademarks
-
- This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
- trademarks or logos is subject to and must follow
- [Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
- Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
- Any use of third-party trademarks or logos are subject to those third-party's policies.
+ ```
 