OpenSourceRonin committed on
Commit
cf6c77c
•
1 Parent(s): c7f64f9

Update README.md

Files changed (1)
  1. README.md +2 -68
README.md CHANGED
@@ -9,7 +9,7 @@ pinned: false
 
 **Disclaimer**:
 
- VPTQ-community is a open source community to reproduced models on the paper *VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models* [github](https://github.com/microsoft/vptq)
+ VPTQ-community is an open source community that reproduces models from the paper *VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models* [**github**](https://github.com/microsoft/vptq)
 
 It is intended only for experimental purposes.
 
@@ -36,31 +36,7 @@ Scaling model size significantly challenges the deployment and inference of Larg
 
 Read tech report at [**Tech Report**](https://github.com/microsoft/VPTQ/blob/main/VPTQ_tech_report.pdf) and [**arXiv Paper**](https://arxiv.org/pdf/2409.17066)
 
- 
- ## Installation
- 
- ### Dependencies
- 
- - python 3.10+
- - torch >= 2.2.0
- - transformers >= 4.44.0
- - accelerate >= 0.33.0
- - latest datasets
- 
- ### Installation
- 
- > Preparation steps that might be needed: set up the CUDA PATH.
- ```bash
- export PATH=/usr/local/cuda-12/bin/:$PATH  # adjust for your environment
- ```
- 
- *Compiling the CUDA kernels will take several minutes.*
- ```bash
- pip install git+https://github.com/microsoft/VPTQ.git --no-build-isolation
- ```
- 
- ## Evaluation
- ### Models from Open Source Community
+ ## Models from Open Source Community
 
 ⚠️ This repository only provides the model quantization algorithm.
 
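As a side note on the Installation section removed above, here is a minimal sketch (not part of the diff) for checking that an environment meets the dependency floor it listed: python 3.10+, torch >= 2.2.0, transformers >= 4.44.0, accelerate >= 0.33.0. The `meets_minimum` helper is hypothetical, written for illustration only.

```python
# Sanity check against the dependency floor listed in the removed
# Installation section: python 3.10+, torch >= 2.2.0,
# transformers >= 4.44.0, accelerate >= 0.33.0.
# `meets_minimum` is a hypothetical helper, not part of VPTQ.
import sys
from importlib.metadata import PackageNotFoundError, version


def meets_minimum(pkg: str, minimum: str) -> bool:
    """Compare dotted release versions numerically ("2.2.0+cu121" -> (2, 2, 0))."""
    try:
        installed = version(pkg)
    except PackageNotFoundError:
        print(f"{pkg}: not installed (need >= {minimum})")
        return False

    def parse(v: str) -> tuple:
        # Drop local-version suffixes and compare the first three numeric parts.
        return tuple(int(p) for p in v.split("+")[0].split(".")[:3] if p.isdigit())

    ok = parse(installed) >= parse(minimum)
    print(f"{pkg}: {installed} ({'ok' if ok else 'need >= ' + minimum})")
    return ok


assert sys.version_info >= (3, 10), "python 3.10+ required"
for pkg, minimum in [("torch", "2.2.0"), ("transformers", "4.44.0"), ("accelerate", "0.33.0")]:
    meets_minimum(pkg, minimum)
```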
@@ -92,45 +68,3 @@ pip install git+https://github.com/microsoft/VPTQ.git --no-build-isolation
 | Qwen 2.5 14B Instruct | [HF 🤗](https://huggingface.co/collections/VPTQ-community/vptq-qwen-25-14b-instruct-without-finetune-66f827f83c7ffa7931b8376c) | [4 bits](https://huggingface.co/VPTQ-community/Qwen2.5-14B-Instruct-v8-k65536-65536-woft) [3 bits](https://huggingface.co/VPTQ-community/Qwen2.5-14B-Instruct-v8-k65536-256-woft) [2 bits (1)](https://huggingface.co/VPTQ-community/Qwen2.5-14B-Instruct-v8-k256-256-woft) [2 bits (2)](https://huggingface.co/VPTQ-community/Qwen2.5-14B-Instruct-v8-k65536-0-woft) [2 bits (3)](https://huggingface.co/VPTQ-community/Qwen2.5-14B-Instruct-v16-k65536-65536-woft) |
 | Qwen 2.5 72B Instruct | [HF 🤗](https://huggingface.co/collections/VPTQ-community/vptq-qwen-25-72b-instruct-without-finetune-66f3bf1b3757dfa1ecb481c0) | [4 bits](https://huggingface.co/VPTQ-community/Qwen2.5-72B-Instruct-v8-k65536-65536-woft) [3 bits](https://huggingface.co/VPTQ-community/Qwen2.5-72B-Instruct-v8-k65536-256-woft) [2.38 bits](https://huggingface.co/VPTQ-community/Qwen2.5-72B-Instruct-v8-k1024-512-woft) [2.25 bits (1)](https://huggingface.co/VPTQ-community/Qwen2.5-72B-Instruct-v8-k512-512-woft) [2.25 bits (2)](https://huggingface.co/VPTQ-community/Qwen2.5-72B-Instruct-v8-k65536-4-woft) [2 bits (1)](https://huggingface.co/VPTQ-community/Qwen2.5-72B-Instruct-v8-k65536-0-woft) [2 bits (2)](https://huggingface.co/VPTQ-community/Qwen2.5-72B-Instruct-v16-k65536-65536-woft) [1.94 bits](https://huggingface.co/VPTQ-community/Qwen2.5-72B-Instruct-v16-k65536-32768-woft) |
 
- 
- ### Language Generation Example
- To generate text using the pre-trained model, use the following command:
- 
- The model [*VPTQ-community/Meta-Llama-3.1-70B-Instruct-v8-k65536-0-woft*](https://huggingface.co/VPTQ-community/Meta-Llama-3.1-70B-Instruct-v8-k65536-0-woft) (~2 bit) is provided by the open source community. The repository cannot guarantee the performance of these models.
- 
- ```bash
- python -m vptq --model=VPTQ-community/Meta-Llama-3.1-70B-Instruct-v8-k65536-0-woft --prompt="Explain: Do Not Go Gentle into That Good Night"
- ```
- ![Llama3 1-70b-prompt](https://github.com/user-attachments/assets/d8729aca-4e1d-4fe1-ac71-c14da4bdd97f)
- 
- ### Terminal Chatbot Example
- Launching a chatbot (note that you must use a chat model for this to work):
- 
- ```bash
- python -m vptq --model=VPTQ-community/Meta-Llama-3.1-70B-Instruct-v8-k65536-0-woft --chat
- ```
- ![Llama3 1-70b-chat](https://github.com/user-attachments/assets/af051234-d1df-4e25-95e7-17a5ce98f3ea)
- 
- ### Python API Example
- Using the Python API:
- 
- ```python
- import vptq
- import transformers
- 
- # Load the tokenizer and the VPTQ-quantized weights from the Hugging Face Hub
- tokenizer = transformers.AutoTokenizer.from_pretrained("VPTQ-community/Meta-Llama-3.1-70B-Instruct-v8-k65536-0-woft")
- m = vptq.AutoModelForCausalLM.from_pretrained("VPTQ-community/Meta-Llama-3.1-70B-Instruct-v8-k65536-0-woft", device_map='auto')
- 
- inputs = tokenizer("Explain: Do Not Go Gentle into That Good Night", return_tensors="pt").to("cuda")
- out = m.generate(**inputs, max_new_tokens=100, pad_token_id=2)
- print(tokenizer.decode(out[0], skip_special_tokens=True))
- ```
- 
- ### Gradio Web App Example
- An environment variable controls whether a public share link is created:
- `export SHARE_LINK=1`
- ```bash
- python -m vptq.app
- ```
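One illustrative note on the table above: the bit widths in the model links appear to follow the VPTQ naming convention `v{vector length}-k{centroids}-{residual centroids}-woft` ("woft" meaning without fine-tune). Under that assumed reading, the equivalent bits per weight come out to (log2(k) + log2(residual)) / v. A minimal sketch, with `equivalent_bits` as a hypothetical helper:

```python
# Sketch: estimate the equivalent bits per weight encoded in the VPTQ model
# names used in the table above. Assumed reading of the naming convention
# v{vector length}-k{centroids}-{residual centroids}-woft: each length-v
# weight vector is indexed into a codebook of k entries plus an optional
# residual codebook, so bits/weight ~= (log2(k) + log2(residual)) / v.
import math
import re


def equivalent_bits(name: str) -> float:
    # Hypothetical helper: parse v, k, and the residual centroid count.
    v, k, r = map(int, re.search(r"v(\d+)-k(\d+)-(\d+)", name).groups())
    bits = math.log2(k)
    if r > 0:  # r == 0 means no residual codebook
        bits += math.log2(r)
    return bits / v


for name in [
    "Qwen2.5-72B-Instruct-v8-k65536-65536-woft",   # labeled 4 bits
    "Qwen2.5-72B-Instruct-v8-k1024-512-woft",      # labeled 2.38 bits
    "Qwen2.5-72B-Instruct-v16-k65536-32768-woft",  # labeled 1.94 bits
]:
    print(f"{name}: {equivalent_bits(name):.2f} bits/weight")
```

This reproduces the 4, 2.38, and 1.94 bit labels in the table, which suggests the assumed reading of the names is consistent.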