OpenSourceRonin committed on
Commit
cf6c77c
•
1 Parent(s): c7f64f9

Update README.md

Files changed (1)
  1. README.md +2 -68
README.md CHANGED
@@ -9,7 +9,7 @@ pinned: false
 
 **Disclaimer**:
 
- VPTQ-community is a open source community to reproduced models on the paper *VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models* [github](https://github.com/microsoft/vptq)
+ VPTQ-community is an open source community that reproduces models from the paper *VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models* [**github**](https://github.com/microsoft/vptq)
 
 It is intended only for experimental purposes.
 
@@ -36,31 +36,7 @@ Scaling model size significantly challenges the deployment and inference of Larg
 
 Read tech report at [**Tech Report**](https://github.com/microsoft/VPTQ/blob/main/VPTQ_tech_report.pdf) and [**arXiv Paper**](https://arxiv.org/pdf/2409.17066)
 
- 
- ## Installation
- 
- ### Dependencies
- 
- - python 3.10+
- - torch >= 2.2.0
- - transformers >= 4.44.0
- - accelerate >= 0.33.0
- - latest datasets
- 
- ### Installation
- 
- > Preparation steps that might be needed: set up the CUDA PATH.
- ```bash
- export PATH=/usr/local/cuda-12/bin/:$PATH  # adjust for your environment
- ```
- 
- *Compiling the CUDA kernels will take several minutes.*
- ```bash
- pip install git+https://github.com/microsoft/VPTQ.git --no-build-isolation
- ```
- 
- ## Evaluation
- ### Models from Open Source Community
+ ## Models from Open Source Community
 
 ⚠️ This repository only provides the model quantization algorithm.
 
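As a side note on the Installation section removed above, here is a minimal sketch (not part of the diff) for checking that an environment meets the dependency floor it listed: python 3.10+, torch >= 2.2.0, transformers >= 4.44.0, accelerate >= 0.33.0. The `meets_minimum` helper is hypothetical, written for illustration only.

```python
# Sanity check against the dependency floor listed in the removed
# Installation section: python 3.10+, torch >= 2.2.0,
# transformers >= 4.44.0, accelerate >= 0.33.0.
# `meets_minimum` is a hypothetical helper, not part of VPTQ.
import sys
from importlib.metadata import PackageNotFoundError, version


def meets_minimum(pkg: str, minimum: str) -> bool:
    """Compare dotted release versions numerically ("2.2.0+cu121" -> (2, 2, 0))."""
    try:
        installed = version(pkg)
    except PackageNotFoundError:
        print(f"{pkg}: not installed (need >= {minimum})")
        return False

    def parse(v: str) -> tuple:
        # Drop local-version suffixes and compare the first three numeric parts.
        return tuple(int(p) for p in v.split("+")[0].split(".")[:3] if p.isdigit())

    ok = parse(installed) >= parse(minimum)
    print(f"{pkg}: {installed} ({'ok' if ok else 'need >= ' + minimum})")
    return ok


assert sys.version_info >= (3, 10), "python 3.10+ required"
for pkg, minimum in [("torch", "2.2.0"), ("transformers", "4.44.0"), ("accelerate", "0.33.0")]:
    meets_minimum(pkg, minimum)
```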
@@ -92,45 +68,3 @@ pip install git+https://github.com/microsoft/VPTQ.git --no-build-isolation
 | Qwen 2.5 14B Instruct | [HF 🤗](https://huggingface.co/collections/VPTQ-community/vptq-qwen-25-14b-instruct-without-finetune-66f827f83c7ffa7931b8376c) | [4 bits](https://huggingface.co/VPTQ-community/Qwen2.5-14B-Instruct-v8-k65536-65536-woft) [3 bits](https://huggingface.co/VPTQ-community/Qwen2.5-14B-Instruct-v8-k65536-256-woft) [2 bits (1)](https://huggingface.co/VPTQ-community/Qwen2.5-14B-Instruct-v8-k256-256-woft) [2 bits (2)](https://huggingface.co/VPTQ-community/Qwen2.5-14B-Instruct-v8-k65536-0-woft) [2 bits (3)](https://huggingface.co/VPTQ-community/Qwen2.5-14B-Instruct-v16-k65536-65536-woft) |
 | Qwen 2.5 72B Instruct | [HF 🤗](https://huggingface.co/collections/VPTQ-community/vptq-qwen-25-72b-instruct-without-finetune-66f3bf1b3757dfa1ecb481c0) | [4 bits](https://huggingface.co/VPTQ-community/Qwen2.5-72B-Instruct-v8-k65536-65536-woft) [3 bits](https://huggingface.co/VPTQ-community/Qwen2.5-72B-Instruct-v8-k65536-256-woft) [2.38 bits](https://huggingface.co/VPTQ-community/Qwen2.5-72B-Instruct-v8-k1024-512-woft) [2.25 bits (1)](https://huggingface.co/VPTQ-community/Qwen2.5-72B-Instruct-v8-k512-512-woft) [2.25 bits (2)](https://huggingface.co/VPTQ-community/Qwen2.5-72B-Instruct-v8-k65536-4-woft) [2 bits (1)](https://huggingface.co/VPTQ-community/Qwen2.5-72B-Instruct-v8-k65536-0-woft) [2 bits (2)](https://huggingface.co/VPTQ-community/Qwen2.5-72B-Instruct-v16-k65536-65536-woft) [1.94 bits](https://huggingface.co/VPTQ-community/Qwen2.5-72B-Instruct-v16-k65536-32768-woft) |
 
- 
- ### Language Generation Example
- To generate text using the pre-trained model, use the following command:
- 
- The model [*VPTQ-community/Meta-Llama-3.1-70B-Instruct-v8-k65536-0-woft*](https://huggingface.co/VPTQ-community/Meta-Llama-3.1-70B-Instruct-v8-k65536-0-woft) (~2 bit) is provided by the open source community. The repository cannot guarantee the performance of these models.
- 
- ```bash
- python -m vptq --model=VPTQ-community/Meta-Llama-3.1-70B-Instruct-v8-k65536-0-woft --prompt="Explain: Do Not Go Gentle into That Good Night"
- ```
- ![Llama3 1-70b-prompt](https://github.com/user-attachments/assets/d8729aca-4e1d-4fe1-ac71-c14da4bdd97f)
- 
- ### Terminal Chatbot Example
- Launching a chatbot (note that you must use a chat model for this to work):
- 
- ```bash
- python -m vptq --model=VPTQ-community/Meta-Llama-3.1-70B-Instruct-v8-k65536-0-woft --chat
- ```
- ![Llama3 1-70b-chat](https://github.com/user-attachments/assets/af051234-d1df-4e25-95e7-17a5ce98f3ea)
- 
- ### Python API Example
- Using the Python API:
- 
- ```python
- import vptq
- import transformers
- 
- # Load the tokenizer and the VPTQ-quantized weights from the Hugging Face Hub
- tokenizer = transformers.AutoTokenizer.from_pretrained("VPTQ-community/Meta-Llama-3.1-70B-Instruct-v8-k65536-0-woft")
- m = vptq.AutoModelForCausalLM.from_pretrained("VPTQ-community/Meta-Llama-3.1-70B-Instruct-v8-k65536-0-woft", device_map='auto')
- 
- inputs = tokenizer("Explain: Do Not Go Gentle into That Good Night", return_tensors="pt").to("cuda")
- out = m.generate(**inputs, max_new_tokens=100, pad_token_id=2)
- print(tokenizer.decode(out[0], skip_special_tokens=True))
- ```
- 
- ### Gradio Web App Example
- An environment variable controls whether a public share link is created:
- `export SHARE_LINK=1`
- ```bash
- python -m vptq.app
- ```
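One illustrative note on the table above: the bit widths in the model links appear to follow the VPTQ naming convention `v{vector length}-k{centroids}-{residual centroids}-woft` ("woft" meaning without fine-tune). Under that assumed reading, the equivalent bits per weight come out to (log2(k) + log2(residual)) / v. A minimal sketch, with `equivalent_bits` as a hypothetical helper:

```python
# Sketch: estimate the equivalent bits per weight encoded in the VPTQ model
# names used in the table above. Assumed reading of the naming convention
# v{vector length}-k{centroids}-{residual centroids}-woft: each length-v
# weight vector is indexed into a codebook of k entries plus an optional
# residual codebook, so bits/weight ~= (log2(k) + log2(residual)) / v.
import math
import re


def equivalent_bits(name: str) -> float:
    # Hypothetical helper: parse v, k, and the residual centroid count.
    v, k, r = map(int, re.search(r"v(\d+)-k(\d+)-(\d+)", name).groups())
    bits = math.log2(k)
    if r > 0:  # r == 0 means no residual codebook
        bits += math.log2(r)
    return bits / v


for name in [
    "Qwen2.5-72B-Instruct-v8-k65536-65536-woft",   # labeled 4 bits
    "Qwen2.5-72B-Instruct-v8-k1024-512-woft",      # labeled 2.38 bits
    "Qwen2.5-72B-Instruct-v16-k65536-32768-woft",  # labeled 1.94 bits
]:
    print(f"{name}: {equivalent_bits(name):.2f} bits/weight")
```

This reproduces the 4, 2.38, and 1.94 bit labels in the table, which suggests the assumed reading of the names is consistent.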