Update README.md
README.md
CHANGED
@@ -8,6 +8,8 @@ tags:
 ---
 Trained on [NobodyExistsOnTheInternet/ToxicQAFinal](https://huggingface.co/datasets/NobodyExistsOnTheInternet/ToxicQAFinal). I converted the set to a preference dataset using refusals generated from LLaMa-3-Instruct-8B. I have not recreated the rejections with LLaMa-3.1-Instruct-8B yet.
 
+This is an early test.
+
 ![train/rewards](https://huggingface.co/PJMixers/LLaMa-3.1-Instruct-ToxicQAFinal-ORPO-8B-QDoRA/resolve/main/images/rewards.png)
 ![train/logits](https://huggingface.co/PJMixers/LLaMa-3.1-Instruct-ToxicQAFinal-ORPO-8B-QDoRA/resolve/main/images/logits.png)
 ![train/logps](https://huggingface.co/PJMixers/LLaMa-3.1-Instruct-ToxicQAFinal-ORPO-8B-QDoRA/resolve/main/images/logps.png)