grapevine-AI · commit 9f3b057 · Update README.md
[TFMC/imatrix-dataset-for-japanese-llm](https://huggingface.co/datasets/TFMC/imatrix-dataset-for-japanese-llm).<br>
This dataset contains both English and many Japanese sentences.
# How I made it
First I made a Q6_K quant, then tried to convert it to i-quants with the ``--allow-requantize`` option.<br>
Surprisingly, the (re)quantization process completed without problems.<br>
I will show more details below.

### Step 1: Making an f16 GGUF
At first, I converted the safetensors to GGUF.
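This step might look like the following; the script name and file paths are illustrative, and recent llama.cpp builds ship the conversion script as `convert-hf-to-gguf.py`:

```shell
# Convert the safetensors checkpoint to an f16 GGUF
# (the conversion script's name varies across llama.cpp versions).
python convert-hf-to-gguf.py ./Qwen2-57B-A14B-Instruct \
    --outtype f16 \
    --outfile ./Qwen2-57B-A14B-Instruct-f16.gguf
```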

### Step 2: Converting f16 to Q8_0
Second, I converted the f16 GGUF to Q8_0.<br>
The aim is to speed up the next step, because I don't have enough memory to handle high-precision tensors.
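A sketch of this step with llama.cpp's quantize tool (the binary name differs across builds; newer ones call it `llama-quantize`, and the file names are illustrative):

```shell
# Quantize the f16 GGUF down to Q8_0 so the imatrix step fits in memory.
./llama-quantize ./Qwen2-57B-A14B-Instruct-f16.gguf \
    ./Qwen2-57B-A14B-Instruct-Q8_0.gguf Q8_0
```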

### Step 3: Calculating the imatrix
I calculated the imatrix with the Q8_0 model.<br>
It seems some people have succeeded in calculating an imatrix for this model, so I think anyone can make one.<br>
At this time, I used the `-fa` option because I wanted to finish the calculation as quickly as possible.<br>
However, I later learned that some people claim Qwen2 needs the `-fa` option to work correctly.
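The imatrix run could be sketched as follows; the binary name varies across builds (newer ones call it `llama-imatrix`), and the dataset file name and `-ngl` layer count are assumptions:

```shell
# Compute the importance matrix over the calibration text.
# -fa enables flash attention; -ngl offloads layers to the GPU.
./llama-imatrix -m ./Qwen2-57B-A14B-Instruct-Q8_0.gguf \
    -f ./imatrix-dataset-for-japanese-llm.txt \
    -o ./imatrix.dat \
    -ngl 20 -fa
```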

### Step 4: Making a temporary Q6_K
This is the most important step. First, I converted the f16 GGUF to Q6_K.<br>
Never try to make the i-quants directly; it seems no one can make them that way.
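The intermediate quantization is the same kind of invocation as Step 2 (file names illustrative):

```shell
# Make the temporary Q6_K quant from the f16 GGUF;
# quantizing straight from f16 to i-quants did not work.
./llama-quantize ./Qwen2-57B-A14B-Instruct-f16.gguf \
    ./Qwen2-57B-A14B-Instruct-Q6_K.gguf Q6_K
```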

### Step 5: Converting the Q6_K to i-quants with the imatrix
I converted the Q6_K to i-quants with the imatrix.<br>
Strangely, the process finished, and the resulting i-quants seem to work.
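The requantization could look like this; the target type IQ4_XS is only an example, and the file names are illustrative:

```shell
# Requantize Q6_K into an i-quant using the imatrix.
# --allow-requantize permits quantizing an already-quantized model.
./llama-quantize --allow-requantize \
    --imatrix ./imatrix.dat \
    ./Qwen2-57B-A14B-Instruct-Q6_K.gguf \
    ./Qwen2-57B-A14B-Instruct-IQ4_XS.gguf IQ4_XS
```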

### Environment
GeForce RTX 3090 and the llama.cpp Windows binary b3065.
# License
Apache 2.0