orai-nlp/ElhBERTeu-medium


GorkaUrbizu committed on Oct 30, 2023

Commit ca5049c • 1 Parent(s): beb9522

Update README.md

Files changed (1)

README.md +1 -1

@@ -11,7 +11,7 @@ tags:
 This is the medium-size version of [ElhBERTeu](https://huggingface.co/orai-nlp/ElhBERTeu) model, the BERT-base for Basque introduced in [BasqueGLUE: A Natural Language Understanding Benchmark for Basque](https://aclanthology.org/2022.lrec-1.172/).
-To train ElhBERTeu-medium was trained over the same corpus as for ElhBERTeu, for which we employed different corpora sources from several domains: updated (2021) national and local news sources, Basque Wikipedia, as well as novel news sources and texts from other domains, such as science (both academic and divulgative), literature or subtitles. More details about the corpora used and their sizes are shown in the following table. Texts from news sources were oversampled (duplicated) as done during the training of BERTeus. In total 575M tokens were used for pre-training ElhBERTeu.
+ElhBERTeu-medium was trained over the same corpus as for ElhBERTeu, for which we employed different corpora sources from several domains: updated (2021) national and local news sources, Basque Wikipedia, as well as novel news sources and texts from other domains, such as science (both academic and divulgative), literature or subtitles. More details about the corpora used and their sizes are shown in the following table. Texts from news sources were oversampled (duplicated) as done during the training of BERTeus. In total 575M tokens were used for pre-training ElhBERTeu.
 |Domain     | Size     |
 |-----------|----------|