laurentiubp committed on
Commit
0968ca2
1 Parent(s): 32e32d6

Update README.md

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -12,13 +12,14 @@ short_description: CataLlama models official page
 
 **CataLlama is a fine-tune of Llama-3 8B on the Catalan language.**
 
-CataLlama-v0.1 was trained on roughly **445 million new tokens** in three separate stages:
+**CataLlama-v0.1** was trained on roughly **445 million new tokens** in three separate stages:
 
 - **Language enhancement** with raw text - we could also call this "continued pre-training" at a very small scale.
 - **Supervised fine-tuning** on instructions consisting of 70% Catalan Language and 30% English Language.
 - **DPO fine-tuning** on preferences consisting of 70% Catalan language and 30% English Language.
 
-CataLlama-v0.2 was trained on roughly **620 million new tokens** in a very similar manner to v0.1, except for the base model which is obtained via a merge.
+
+**CataLlama-v0.2** was trained on roughly **620 million new tokens** in a very similar manner to v0.1, except for the base model which is obtained via a merge.
 
 **Note:** This model is not intended to beat benchmarks, but to demonstrate techniques for augmenting LLMs on new languages and preserve rare languages as part of our world heritage.
 
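
Reviewer note: the README text in this diff describes instruction and preference data made up of 70% Catalan and 30% English, but does not say how that mix was assembled. The snippet below is only a minimal sketch of one way to sample such a mix from two pools of examples. The function name `mix_by_language`, the `lang`/`text` fields, and the toy data are all hypothetical and do not come from the CataLlama training code.

```python
import random


def mix_by_language(catalan, english, total, catalan_share=0.7, seed=0):
    """Sample `total` examples so that about `catalan_share` of them are Catalan.

    Hypothetical helper for illustration; not the authors' pipeline.
    """
    rng = random.Random(seed)
    n_catalan = round(total * catalan_share)
    n_english = total - n_catalan
    if n_catalan > len(catalan) or n_english > len(english):
        raise ValueError("not enough examples to reach the requested mix")
    # Sample without replacement from each pool, then shuffle the result
    # so the two languages are interleaved.
    mixed = rng.sample(catalan, n_catalan) + rng.sample(english, n_english)
    rng.shuffle(mixed)
    return mixed


# Toy data standing in for real instruction examples (assumed structure).
catalan_examples = [{"lang": "ca", "text": f"exemple {i}"} for i in range(100)]
english_examples = [{"lang": "en", "text": f"example {i}"} for i in range(100)]

batch = mix_by_language(catalan_examples, english_examples, total=10)
n_catalan = sum(1 for ex in batch if ex["lang"] == "ca")
print(n_catalan, len(batch) - n_catalan)
```

Fixing the per-language counts up front, rather than drawing each example's language at random, keeps the ratio exact for any batch size where the share rounds cleanly.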