mabaochang committed · Commit 732daf4 · 1 Parent(s): 0ce6066
Update README.md

README.md CHANGED
@@ -1,5 +1,5 @@
 ---
-license:
+license: gpl-3.0
 tags:
 - text2text-generation
 pipeline_tag: text2text-generation
@@ -53,12 +53,39 @@ c066b68b4139328e87a694020fc3a6c3 ./special_tokens_map.json.ca3d163bab0553818272
 39ec1b33fbf9a0934a8ae0f9a24c7163 ./tokenizer.model.9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347.enc
 ```
 
-2. Decrypt the files using https://github.com/LianjiaTech/BELLE/tree/main/models
+2. Decrypt the files using the scripts in https://github.com/LianjiaTech/BELLE/tree/main/models
+
+You can use the following command in Bash.
+Please replace "/path/to_encrypted" with the path where you stored your encrypted files,
+replace "/path/to_original_llama_7B" with the path where you stored your original LLaMA 7B weights,
+and replace "/path/to_finetuned_model" with the path where you want to save your final fine-tuned model.
+
+```bash
+mkdir /path/to_finetuned_model
+for f in "/path/to_encrypted"/*; \
+do if [ -f "$f" ]; then \
+python3 decrypt.py "$f" "/path/to_original_llama_7B/consolidated.00.pth" "/path/to_finetuned_model/"; \
+fi; \
+done
+```
+
+After executing the command above, you will obtain the following files.
+
 ```
-
+./config.json
+./generation_config.json
+./pytorch_model-00001-of-00002.bin
+./pytorch_model-00002-of-00002.bin
+./pytorch_model.bin.index.json
+./special_tokens_map.json
+./tokenizer_config.json
+./tokenizer.model
 ```
 
 3. Check md5sum
+
+You can verify the integrity of these files by checking their MD5 checksums to confirm they were recovered completely.
+Here are the MD5 checksums for the relevant files:
 ```
 md5sum ./*
 a57bf2d0d7ec2590740bc4175262610b ./config.json
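If you want step 3 to fail loudly instead of eyeballing the `md5sum` output, a minimal Python sketch along these lines can compare each file against its expected digest. Only the digest for `config.json` is reproduced from the listing above; the `EXPECTED` table and `verify_md5` helper are illustrative, not part of the BELLE scripts.

```python
import hashlib
from pathlib import Path

# Expected digests from the README's `md5sum ./*` listing. Only config.json
# is reproduced here; copy the remaining entries from the listing above.
EXPECTED = {
    "config.json": "a57bf2d0d7ec2590740bc4175262610b",
}

def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash in 1 MiB chunks so multi-GB model shards are not read into RAM."""
    h = hashlib.md5()
    with path.open("rb") as f:
        while block := f.read(chunk_size):
            h.update(block)
    return h.hexdigest()

def verify_md5(model_dir: str) -> bool:
    """Return True only if every listed file matches its expected digest."""
    ok = True
    for name, expected in EXPECTED.items():
        actual = md5_of(Path(model_dir) / name)
        if actual != expected:
            print(f"MISMATCH {name}: {actual} != {expected}")
            ok = False
    return ok

if __name__ == "__main__":
    assert verify_md5("/path/to_finetuned_model"), "decryption output is corrupt"
```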
@@ -87,7 +114,7 @@ After you decrypt the files, BELLE-LLAMA-7B-2M can be easily loaded with LlamaFo
 from transformers import LlamaForCausalLM, AutoTokenizer
 import torch
 
-ckpt = '
+ckpt = '/path/to_finetuned_model/'
 device = torch.device('cuda')
 model = LlamaForCausalLM.from_pretrained(ckpt, device_map='auto', low_cpu_mem_usage=True)
 tokenizer = AutoTokenizer.from_pretrained(ckpt)
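As a follow-up to the loading snippet, here is a minimal inference sketch. It assumes the variables from the code above (`model`, `tokenizer`, `device`) and a `Human:`/`Assistant:` prompt template commonly used with BELLE models; the template and sampling settings are assumptions, not something this diff specifies.

```python
# Continues from the loading snippet above: `model`, `tokenizer`, and
# `device` are assumed to exist. The prompt template and sampling settings
# below are assumptions, not taken from this README.
prompt = "Human: Write a two-line poem about the sea. \n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,   # cap the reply length
    do_sample=True,       # sample rather than greedy-decode
    top_p=0.9,
    temperature=0.7,
)

# Drop the prompt tokens so only the newly generated reply is decoded.
reply = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print(reply)
```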