zolicsaki committed
Commit 61e9f47
1 Parent(s): 13d7f89

Update README.md

Files changed (1)
  1. README.md +9 -20
README.md CHANGED
@@ -28,6 +28,7 @@ SambaLingo-Russian-Base is a pretrained bilingual Russian and English model tha
  - **Language(s):** Russian, English
  - **Finetuned from model:** [Llama 2](https://huggingface.co/meta-llama/Llama-2-7b-hf)
  - **Try the chat version of this model**: [SambaLingo-chat-space](https://huggingface.co/spaces/sambanovasystems/SambaLingo-chat-space).
+ - **Paper:** [SambaLingo: Teaching Large Language Models New Languages](https://arxiv.org/abs/2404.05829)
  - **Blog Post**: [sambalingo-open-source-language-experts](https://sambanova.ai/blog/sambalingo-open-source-language-experts)

  ## Getting Started
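The Getting Started section is visible in this hunk only as a heading, so here is a minimal sketch of loading the base model with the Hugging Face transformers library. The repo id `sambanovasystems/SambaLingo-Russian-Base` is taken from the links in this card; the prompt and sampling settings are illustrative assumptions, not the card's official example.

```
# Minimal loading sketch (assumed usage, not the card's official snippet).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sambanovasystems/SambaLingo-Russian-Base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Base (non-chat) checkpoints are plain text continuers, so prompt with raw text.
inputs = tokenizer("Столица России", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```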
@@ -53,19 +54,7 @@ All pre-training is done on the [Cultura-X](https://huggingface.co/datasets/uonl
  We extended the vocabulary of the base llama model from 32,000 tokens to 57,000 tokens by adding up to 25,000 non-overlapping tokens from the new language.

  ## Evaluation Results
- | | SambaLingo-Russian-Base | saiga_mistral_7b_merged | rugpt3large_based_on_gpt2 | bloom-7b1 | xglm-7.5B | mGPT-13B |
- |------------------------------------------|-----------------------------------|--------------------------------------|----------------------|--------------------|---------------------|--------|
- | Holdout Perplexity (Lower is better) | **1.444** | 1.556 | 1.611 | 1.797 | 1.504 | 1.806 |
- | FLORES en->ru (8 shot, CHRF) | **0.472** | 0.425 | 0.319 | 0.204 | 0.263 | 0.211 |
- | FLORES ru->en (8 shot, CHRF) | **0.587** | 0.527 | 0.317 | 0.258 | 0.429 | 0.251 |
- | FLORES en->ru (8 shot, BLEU) | **0.194** | 0.145 | 0.074 | 0.012 | 0.045 | 0.021 |
- | FLORES ru->en (8 shot, BLEU) | **0.301** | 0.249 | 0.062 | 0.032 | 0.152 | 0.039 |
- | Belebele (3 shot) | **39.00%** | 34.44% | 24.33% | 29.00% | 21.89% | 23.67% |
- | SIB-200 (3 shot) | 69.12% | **78.92%** | 32.84% | 46.08% | 63.73% | 42.65% |
- | XNLI (0 shot) | 35.29% | **49.78%** | 45.61% | 42.61% | 46.39% | 45.39% |
- | XStoryCloze (0 shot) | **71.67%** | 68.96% | 60.75% | 52.68% | 63.40% | 59.43% |
- | XWinograd (0 shot) | **69.21%** | 66.67% | 60.63% | 57.14% | 63.17% | 60.00% |
-
+ For evaluation results see our paper: [SambaLingo: Teaching Large Language Models New Languages](https://arxiv.org/abs/2404.05829)

  ## Uses
  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
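The vocabulary-extension step kept in this hunk's context maps onto standard transformers calls. Below is a minimal sketch under stated assumptions: `new_russian_tokens` is a hypothetical placeholder for the up-to-25,000 tokens actually mined from the new language, and the authors' real pipeline may differ.

```
# Sketch of extending the Llama 2 vocabulary, assuming standard transformers APIs.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Hypothetical placeholder: the real list holds up to ~25,000 Russian tokens.
new_russian_tokens = ["пример", "словарь"]

# add_tokens() skips entries already in the vocabulary, so only
# non-overlapping tokens are added, as the card describes.
num_added = tokenizer.add_tokens(new_russian_tokens)

# Grow the embedding matrix to match; the new rows start randomly
# initialized and are learned during continued pretraining.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens; vocabulary size is now {len(tokenizer)}")
```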
@@ -107,12 +96,12 @@ We would like to give a special thanks to the following groups:

  ## Cite SambaLingo
  ```
- @software{sambalingo,
- title = {{SambaLingo: Open Source Language Experts}},
- author = {SambaNova Systems},
- url = {https://huggingface.co/sambanovasystems/SambaLingo-Russian-Base}
- month = {2},
- year = {2024},
- version = {1.0},
+ @misc{csaki2024sambalingo,
+ title={SambaLingo: Teaching Large Language Models New Languages},
+ author={Zoltan Csaki and Bo Li and Jonathan Li and Qiantong Xu and Pian Pawakapan and Leon Zhang and Yun Du and Hengyu Zhao and Changran Hu and Urmish Thakker},
+ year={2024},
+ eprint={2404.05829},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
  }
  ```
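For readers who want to reproduce numbers like the CHRF and BLEU scores in the table this commit removes, here is an illustrative sketch using the sacrebleu package. The sentence pair is made up, and the authors' actual FLORES evaluation harness is not part of this card.

```
# Illustrative metric computation with sacrebleu; not the authors' harness.
import sacrebleu

hyps = ["The cat is sitting on the mat."]  # model translations (e.g. ru->en)
refs = [["The cat sat on the mat."]]       # one list per reference stream

bleu = sacrebleu.corpus_bleu(hyps, refs)
chrf = sacrebleu.corpus_chrf(hyps, refs)

# sacrebleu reports scores on a 0-100 scale; the removed table used 0-1.
print(f"BLEU {bleu.score / 100:.3f}  CHRF {chrf.score / 100:.3f}")
```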
 