Commit c84e52b by Hellisotherpeople
1 parent: 0132716
Update README.md
README.md CHANGED
@@ -13,7 +13,7 @@ tags:
 # Model Card for toxic-dpo-v2-mistral7b-lora
 
 This is an example of how easy it is to efficiently fine-tune a SOTA 7B parameter language model adapter (in this case badsai_jaskier-7b-dpo-v5.6 was used, but the lora works on most mistral 7b models)
-to remove "alignment" and "safety" restrictions on a single 3090 GPU with about 12 hours worth of compute.
+to remove "alignment" and "safety" restrictions on a single 3090 GPU with about 12 hours worth of compute on a dataset of about 600 examples.
 
 The dataset used was https://huggingface.co/datasets/unalignment/toxic-dpo-v0.2
 Trained for 9 epochs