davda54 commited on
Commit
d5c0069
1 Parent(s): f230207

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +3 -3
README.md CHANGED
@@ -34,9 +34,9 @@ The released weights are still a work in progress and they might change in the f
34
  The corpus was compiled by this process:
35
 
36
  1. We gathered all openly available datasets: [Aya](https://huggingface.co/datasets/CohereForAI/aya_dataset), [OASST 1](https://huggingface.co/datasets/OpenAssistant/oasst1), [OASST 2](https://huggingface.co/datasets/OpenAssistant/oasst2), [OIG-small-chip2](https://huggingface.co/datasets/laion/OIG), [No Robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots), [Dolly](https://huggingface.co/datasets/databricks/databricks-dolly-15k) and [Glaive code assistant](Glaive-code-assistant-v2).
37
- 2. These were first manually inspected and filtered, and then automatically filtered with [Mixtral-8x7B](mistralai/Mixtral-8x7B-Instruct-v0.1) to remove incorrect, offensive, non-English and American-centric responses.
38
- 3. The responses were augmented to be more descriptive by [Mixtral-8x7B](mistralai/Mixtral-8x7B-Instruct-v0.1).
39
- 4. Since most of that dataset contains only a single dialogue turn, we generated more turns using [Mixtral-8x7B](mistralai/Mixtral-8x7B-Instruct-v0.1).
40
  5. Finally, we translated the resulting dataset into Bokmål and Nynorsk using [NorMistral-7b-warm](https://huggingface.co/norallm/normistral-7b-warm).
41
 
42
  ## About the base model
 
34
  The corpus was compiled by this process:
35
 
36
  1. We gathered all openly available datasets: [Aya](https://huggingface.co/datasets/CohereForAI/aya_dataset), [OASST 1](https://huggingface.co/datasets/OpenAssistant/oasst1), [OASST 2](https://huggingface.co/datasets/OpenAssistant/oasst2), [OIG-small-chip2](https://huggingface.co/datasets/laion/OIG), [No Robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots), [Dolly](https://huggingface.co/datasets/databricks/databricks-dolly-15k) and [Glaive code assistant](Glaive-code-assistant-v2).
37
+ 2. These were first manually inspected and filtered, and then automatically filtered with [Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) to remove incorrect, offensive, non-English and American-centric responses.
38
+ 3. The responses were augmented to be more descriptive by [Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1).
39
+ 4. Since most of that dataset contains only a single dialogue turn, we generated more turns using [Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1).
40
  5. Finally, we translated the resulting dataset into Bokmål and Nynorsk using [NorMistral-7b-warm](https://huggingface.co/norallm/normistral-7b-warm).
41
 
42
  ## About the base model