---
license: apache-2.0
tags:
- generated_from_trainer
base_model: facebook/wav2vec2-xls-r-300m
datasets:
- common_voice_15_0
metrics:
- wer
model-index:
- name: wav2vec2-xls-r-300m-br
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: common_voice_15_0
      type: common_voice_15_0
      config: br
      split: None
      args: br
    metrics:
    - type: wer
      value: 41
      name: WER
    - type: cer
      value: 14.7
      name: CER
language:
- br
pipeline_tag: automatic-speech-recognition
---

# wav2vec2-xls-r-300m-br

This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the Breton subset of Mozilla Common Voice 15 (MCV15-br) and the [Roadennoù](https://github.com/gweltou/roadennou) dataset. It achieves the following results on the MCV15-br test set:
- WER: 41.0
- CER: 14.7
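
A minimal usage sketch with the `transformers` ASR pipeline (the repo id `<namespace>/wav2vec2-xls-r-300m-br` and the audio file name are placeholders, not taken from this card):

```python
from transformers import pipeline

# Hypothetical repo id; replace with the actual checkpoint location.
asr = pipeline(
    "automatic-speech-recognition",
    model="<namespace>/wav2vec2-xls-r-300m-br",
)

# Wav2Vec2 models expect 16 kHz mono audio; the pipeline resamples
# common formats when ffmpeg is available.
result = asr("sample_breton.wav")  # hypothetical audio file
print(result["text"])
```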

## Model description

This model was trained to assess the performance of wav2vec2-xls-r-300m as a base model for fine-tuning a Breton ASR system.

## Intended uses & limitations

This model is intended for research. It is not recommended for production use.

## Training and evaluation data

The training set consists of the MCV15-br train split and 90% of the Roadennoù dataset.
The validation set consists of the MCV15-br validation split and the remaining 10% of the Roadennoù dataset.
The final test set is the MCV15-br test split.
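
As a rough sketch of how these splits could be assembled with the `datasets` library (the variable `roadennou` is assumed to already hold the Roadennoù clips as a `Dataset` with features matching Common Voice; loading it is outside the scope of this card):

```python
from datasets import load_dataset, concatenate_datasets

# Breton subset of Common Voice 15 (requires accepting the dataset terms).
cv = load_dataset("mozilla-foundation/common_voice_15_0", "br")

# 90% of Roadennoù for training, the remaining 10% for validation.
parts = roadennou.train_test_split(test_size=0.1, seed=42)

train_data = concatenate_datasets([cv["train"], parts["train"]])
eval_data = concatenate_datasets([cv["validation"], parts["test"]])
test_data = cv["test"]  # final evaluation uses MCV15-br test only
```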

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 6e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 40
- mixed_precision_training: Native AMP
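
As a hedged sketch, these settings map onto `transformers` `TrainingArguments` roughly as follows (`output_dir` is an assumption; the Adam betas and epsilon listed above are the library defaults):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-xls-r-300m-br",  # assumed output directory
    learning_rate=6e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # effective train batch size: 16
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=40,
    fp16=True,  # mixed precision (native AMP)
)
```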


### Framework versions

- Transformers 4.39.1
- Pytorch 2.0.1+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2