Update README.md
README.md
CHANGED
@@ -22,21 +22,56 @@ As some people have told us our models are sloppy, Ikari decided to say fuck it
 
 Our dataset has stayed the same since day one: we added data over time, cleaned it, and repeated. After not releasing a model for a while because we were never satisfied, we think it's time to come back!
 
+# Prompt template: Mistral
+
+```
+<s>[INST] {input} [/INST] {output}</s>
+```
 
 ## Credits:
 - Undi
 - IkariDev
 
-## Training data used:
-We will list all the datasets we used here; please be patient while we get them all back, kek.
+## Training data we used to make our dataset:
 
-
+- [Epiculous/Gnosis](https://huggingface.co/Epiculous/Gnosis)
+- [ChaoticNeutrals/Luminous_Opus](https://huggingface.co/datasets/ChaoticNeutrals/Luminous_Opus)
+- [ChaoticNeutrals/Synthetic-Dark-RP](https://huggingface.co/datasets/ChaoticNeutrals/Synthetic-Dark-RP)
+- [ChaoticNeutrals/Synthetic-RP](https://huggingface.co/datasets/ChaoticNeutrals/Synthetic-RP)
+- [Gryphe/Sonnet3.5-SlimOrcaDedupCleaned](https://huggingface.co/datasets/Gryphe/Sonnet3.5-SlimOrcaDedupCleaned)
+- [Gryphe/Opus-WritingPrompts](https://huggingface.co/datasets/Gryphe/Opus-WritingPrompts)
+- [meseca/writing-opus-6k](https://huggingface.co/datasets/meseca/writing-opus-6k)
+- [meseca/opus-instruct-9k](https://huggingface.co/datasets/meseca/opus-instruct-9k)
+- [PJMixers/grimulkan_theory-of-mind-ShareGPT](https://huggingface.co/datasets/PJMixers/grimulkan_theory-of-mind-ShareGPT)
+- [NobodyExistsOnTheInternet/ToxicQAFinal](https://huggingface.co/datasets/NobodyExistsOnTheInternet/ToxicQAFinal)
+- [Undi95/toxic-dpo-v0.1-sharegpt](https://huggingface.co/datasets/Undi95/toxic-dpo-v0.1-sharegpt)
+- [cgato/SlimOrcaDedupCleaned](https://huggingface.co/datasets/cgato/SlimOrcaDedupCleaned)
+- [kalomaze/Opus_Instruct_25k](https://huggingface.co/datasets/kalomaze/Opus_Instruct_25k)
+- [Doctor-Shotgun/no-robots-sharegpt](https://huggingface.co/datasets/Doctor-Shotgun/no-robots-sharegpt)
+- [Norquinal/claude_multiround_chat_30k](https://huggingface.co/datasets/Norquinal/claude_multiround_chat_30k)
+- [nothingiisreal/Claude-3-Opus-Instruct-15K](https://huggingface.co/datasets/nothingiisreal/Claude-3-Opus-Instruct-15K)
+- All the Aesir datasets, cleaned, unslopped
+- All the luminae datasets, cleaned, unslopped
+- A small part of Airoboros, reduced
 
-
+We sadly couldn't find the sources of the following; DM us if you recognize your set!
 
-
-
-
+- Opus_Instruct-v2-6.5K-Filtered-v2-sharegpt
+- claude_sharegpt_trimmed
+- CapybaraPure_Decontaminated-ShareGPT_reduced
+
+## Dataset credits:
+- Epiculous
+- ChaoticNeutrals
+- Gryphe
+- meseca
+- PJMixers
+- NobodyExistsOnTheInternet
+- cgato
+- kalomaze
+- Doctor-Shotgun
+- Norquinal
+- nothingiisreal
 
 ## Others
 
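The Mistral prompt template added in this change can be filled in with plain string formatting. The sketch below is only an illustration: the helper names `format_training_example` and `format_inference_prompt` are hypothetical and not part of the README, and cutting the template off after `[/INST]` for inference is an assumption about how the model would be prompted. Many tokenizers also add `<s>` on their own, so check whether the literal token should stay in the text for your setup.

```python
# Single-turn template, copied verbatim from the README change.
TEMPLATE = "<s>[INST] {input} [/INST] {output}</s>"


def format_training_example(user_input: str, output: str) -> str:
    """Fill both slots of the template (hypothetical helper)."""
    return TEMPLATE.format(input=user_input, output=output)


def format_inference_prompt(user_input: str) -> str:
    """Fill only the input slot and stop after [/INST].

    Assumption: at inference time the model is expected to write the
    output and the closing </s> itself, so both are left off here.
    """
    prefix = TEMPLATE.split("{output}")[0]  # everything before the output slot
    return prefix.format(input=user_input).rstrip()


if __name__ == "__main__":
    print(format_training_example("Hi", "Hello!"))
    print(format_inference_prompt("Hi"))
```

Keeping the template as a single constant means both helpers stay in sync if the README template changes.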