---
license: cc-by-nc-4.0
datasets:
- mozilla-foundation/common_voice_11_0
language:
- fr
- es
- pt
- da
- de
- nl
- fy
- zh
- ja
- ar
- sw
- gn
library_name: fairseq
---
**HUTTER-12: H(uBERT) UTTER model covering 12 languages.**
* Total training hours: 1,622 from Romance (French: 300h, Spanish: 300h, Portuguese: 102.3h), Germanic (Danish: 3.5h, German: 300h, Dutch: 72.1h, Frisian: 41.2h) and other languages (Chinese (zh-CN): 104.6h, Japanese: 37h, Arabic: 61h, Swahili: 300h, Guaraní: 0.4h)
* Number of updates: 400K
* Number of iterations: 3
* Clustering approach: mini-batch K-means (100% of the data)
* Dataset: CommonVoice v13
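
For readers unfamiliar with the clustering step, the following is a minimal, standard-library-only sketch of mini-batch K-means on toy 2-D points. It is not the code used to train this model: the number of clusters, batch size, step count, and the toy data are all assumptions chosen for illustration, and in a HuBERT-style setup the inputs would be acoustic or hidden-layer features rather than random points.

```python
import random


def nearest(centres, x):
    """Index of the centre closest to x (squared Euclidean distance)."""
    return min(
        range(len(centres)),
        key=lambda j: sum((c - xi) ** 2 for c, xi in zip(centres[j], x)),
    )


def mini_batch_kmeans(points, k, batch_size=32, n_steps=200, seed=0):
    """Mini-batch K-means with per-centre learning rates (illustrative only)."""
    rng = random.Random(seed)
    # Initialise centres from k distinct data points.
    centres = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # number of points each centre has absorbed so far

    for _ in range(n_steps):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign the whole batch first, then update the centres.
        assigned = [nearest(centres, x) for x in batch]
        for x, j in zip(batch, assigned):
            counts[j] += 1
            lr = 1.0 / counts[j]  # step size shrinks as the centre sees more data
            centres[j] = [(1 - lr) * c + lr * xi for c, xi in zip(centres[j], x)]
    return centres


# Toy data (assumption): two well-separated 2-D blobs around (0, 0) and (10, 10).
data_rng = random.Random(1)
points = [[data_rng.gauss(0.0, 0.5), data_rng.gauss(0.0, 0.5)] for _ in range(200)]
points += [[data_rng.gauss(10.0, 0.5), data_rng.gauss(10.0, 0.5)] for _ in range(200)]

centres = mini_batch_kmeans(points, k=2)
# Each point's cluster index plays the role of a discrete pseudo-label.
labels = [nearest(centres, p) for p in points]
```

Each step touches only a small random batch, which is what keeps the method practical when clustering the full dataset, as stated in the list above.
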
# Funding
<img src="https://cdn-uploads.huggingface.co/production/uploads/62262e19d36494a6f743a28d/HbzC1C-uHe25ewTy2wyoK.png" width=7% height=7%>
This is an output of the European Project UTTER (Unified Transcription and Translation for Extended Reality) under grant number 101070631. For more information go to https://he-utter.eu/