OllieStanley committed
Commit: d82939e
Parent(s): 6ffcd7e
Update README.md
README.md CHANGED
````diff
@@ -10,7 +10,7 @@ Thanks to Mick for writing the `xor_codec.py` script which enables this process
 
 ## The Process
 
-Note: This process applies to `oasst-
+Note: This process applies to the `oasst-rlhf-2-llama-30b-7k-steps` model. The same process can be applied to other models in future, but the checksums will be different.
 
 To use OpenAssistant LLaMa-Based Models, you need to have a copy of the original LLaMa model weights and add them to a `llama` subdirectory here.
 
````
````diff
@@ -54,7 +54,7 @@ edd1a5897748864768b1fab645b31491 ./tokenizer_config.json
 Once you have LLaMa weights in the correct format, you can apply the XOR decoding:
 
 ```
-python xor_codec.py oasst-
+python xor_codec.py oasst-rlhf-2-llama-30b-7k-steps/ oasst-rlhf-2-llama-30b-7k-steps-xor/ llama30b_hf/
 ```
 
 You should expect to see one warning message during execution:
````
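For readers who want to see what the decoding step amounts to, the sketch below XORs each byte of a released file against the corresponding byte of the original-weights file. It is only an illustration of the idea, not the actual `xor_codec.py` (which processes the whole directory and may apply additional transformations); the file paths in the `__main__` block are hypothetical examples built from the directory names used in the README.

```python
# Illustrative sketch only: byte-wise XOR of two files.
# The real xor_codec.py may differ; the paths below are hypothetical.
from pathlib import Path

CHUNK = 1 << 20  # process 1 MiB at a time to keep memory use flat


def xor_files(xor_path: Path, key_path: Path, out_path: Path) -> None:
    """Write out_path, where each byte is xor_path XOR key_path."""
    with open(xor_path, "rb") as fx, open(key_path, "rb") as fk, open(out_path, "wb") as fo:
        while True:
            a = fx.read(CHUNK)
            b = fk.read(CHUNK)
            if not a and not b:
                break
            n = min(len(a), len(b))
            # XOR the overlapping bytes, then copy any leftover tail unchanged
            fo.write(bytes(x ^ y for x, y in zip(a[:n], b[:n])))
            fo.write(a[n:] or b[n:])


if __name__ == "__main__":
    # Hypothetical single-shard example
    xor_files(
        Path("oasst-rlhf-2-llama-30b-7k-steps-xor/pytorch_model-00001-of-00007.bin"),
        Path("llama30b_hf/pytorch_model-00001-of-00007.bin"),
        Path("oasst-rlhf-2-llama-30b-7k-steps/pytorch_model-00001-of-00007.bin"),
    )
```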
````diff
@@ -63,24 +63,24 @@ You should expect to see one warning message during execution:
 
 This is normal. If similar messages appear for other files, something has gone wrong.
 
-Now run `find -type f -exec md5sum "{}" + > checklist.chk` in the output directory (here `oasst-
+Now run `find -type f -exec md5sum "{}" + > checklist.chk` in the output directory (here `oasst-rlhf-2-llama-30b-7k-steps`). You should get a file with exactly these contents:
 
 ```
+d08594778f00abe70b93899628e41246 ./pytorch_model-00007-of-00007.bin
+f11acc069334434d68c45a80ee899fe5 ./pytorch_model-00003-of-00007.bin
+9f41bd4d5720d28567b3e7820b4a8023 ./pytorch_model-00001-of-00007.bin
 27b0dc092f99aa2efaf467b2d8026c3f ./added_tokens.json
+148bfd184af630a7633b4de2f41bfc49 ./generation_config.json
+b6e90377103e9270cbe46b13aed288ec ./pytorch_model-00005-of-00007.bin
+4c5941b4ee12dc0d8e6b5ca3f6819f4d ./pytorch_model-00004-of-00007.bin
 eeec4125e9c7560836b4873b6f8e3025 ./tokenizer.model
+2c92d306969c427275f34b4ebf66f087 ./pytorch_model-00006-of-00007.bin
+9a4d2468ecf85bf07420b200faefb4af ./config.json
 deb33dd4ffc3d2baddcce275a00b7c1b ./tokenizer.json
+13a3641423840eb89f9a86507a90b2bf ./pytorch_model.bin.index.json
 ed59bfee4e87b9193fea5897d610ab24 ./tokenizer_config.json
+704373f0c0d62be75e5f7d41d39a7e57 ./special_tokens_map.json
+ed991042b2a449123824f689bb94b29e ./pytorch_model-00002-of-00007.bin
 ```
 
 If so you have successfully decoded the weights and should be able to use the model with HuggingFace Transformers.
````
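As a cross-check, the expected hashes can also be compared programmatically. The sketch below assumes the checksum block above has been saved verbatim to a file (the name `expected.chk` is arbitrary and not part of the official process) and recomputes the MD5 of each listed file inside the decoded output directory.

```python
# Convenience sketch: recompute MD5s in the decoded directory and compare them
# with an expected list in "md5 ./path" format. expected.chk is a hypothetical
# file holding the checksums quoted in the README.
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(out_dir: Path, expected_file: Path) -> bool:
    ok = True
    for line in expected_file.read_text().splitlines():
        if not line.strip():
            continue
        digest, rel_path = line.split(maxsplit=1)
        actual = md5_of(out_dir / rel_path)
        if actual != digest:
            print(f"MISMATCH {rel_path}: expected {digest}, got {actual}")
            ok = False
    return ok


if __name__ == "__main__":
    out_dir = Path("oasst-rlhf-2-llama-30b-7k-steps")
    print("all checksums match" if verify(out_dir, Path("expected.chk")) else "checksum mismatch")
```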