---
license: cc-by-nc-nd-4.0
datasets:
- openslr
- mozilla-foundation/common_voice_13_0
language:
- gl
pipeline_tag: automatic-speech-recognition
tags:
- ITG
- PyTorch
- Transformers
- whisper
- whisper-small
---

# whisper-small-gl

## Description

This is a fine-tuned version of the [openai/whisper-small](https://huggingface.co/openai/whisper-small) pre-trained model for ASR in Galician.

---

## Dataset

We used two datasets combined (see the loading sketch below the list):
1. The [OpenSLR Galician](https://huggingface.co/datasets/openslr/viewer/SLR77) dataset, available in the openslr repository.
2. The [Common Voice 13 Galician](https://huggingface.co/datasets/mozilla-foundation/common_voice_13_0/viewer/gl) dataset, available in the Common Voice repository.
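
The model card does not prescribe a specific data-loading recipe, but a minimal sketch of combining the two corpora with the `datasets` library could look like the following. The `SLR77` config name, the split names, and the `audio`/`sentence` column names are assumptions based on the Hub viewers linked above; depending on your `datasets` version, the OpenSLR script may also require `trust_remote_code=True`.

```python
from datasets import Audio, concatenate_datasets, load_dataset

# Galician portion of OpenSLR (config name assumed from the SLR77 viewer link)
openslr = load_dataset("openslr", "SLR77", split="train")
# Galician subset of Common Voice 13 (requires accepting the dataset terms on the Hub)
common_voice = load_dataset("mozilla-foundation/common_voice_13_0", "gl", split="train+validation")

# Keep only the columns shared by both corpora and resample the audio to the
# 16 kHz rate expected by Whisper
openslr = openslr.select_columns(["audio", "sentence"]).cast_column("audio", Audio(sampling_rate=16_000))
common_voice = common_voice.select_columns(["audio", "sentence"]).cast_column("audio", Audio(sampling_rate=16_000))

combined = concatenate_datasets([openslr, common_voice]).shuffle(seed=42)
print(combined)
```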

---

## Example inference script

### Use this example script to run our model in inference mode

```python
import librosa
import torch
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

filename = "demo.wav"  # change this to the name of your audio file
sample_rate = 16_000
processor = AutoProcessor.from_pretrained('ITG/whisper-small-gl')
model = AutoModelForSpeechSeq2Seq.from_pretrained('ITG/whisper-small-gl')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

with torch.no_grad():
    # Load the audio and resample it to the 16 kHz rate Whisper expects
    speech_array, _ = librosa.load(filename, sr=sample_rate)
    # Convert the waveform into log-Mel input features
    inputs = processor(speech_array, sampling_rate=sample_rate, return_tensors="pt").to(device)
    input_features = inputs.input_features
    # Generate token ids and decode them into the transcription
    generated_ids = model.generate(input_features, max_length=225)
    decode_output = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(f"ASR Galician whisper-small output: {decode_output}")
```
---

## Fine-tuning hyper-parameters

| **Hyper-parameter**                       | **Value**                   |
|:-----------------------------------------:|:---------------------------:|
| Training batch size                       | 16                          |
| Evaluation batch size                     | 8                           |
| Learning rate                             | 1e-5                        |
| Gradient checkpointing                    | true                        |
| Gradient accumulation steps               | 1                           |
| Max training epochs                       | 100                         |
| Max steps                                 | 4000                        |
| Generate max length                       | 225                         |
| Warmup training steps (%)                 | 12.5%                       |
| FP16                                      | true                        |
| Metric for best model                     | wer                         |
| Greater is better                         | false                       |
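
The values above are the reported hyper-parameters; a hedged sketch of how they could be expressed with `transformers.Seq2SeqTrainingArguments` is shown below. The `output_dir` name and the absolute `warmup_steps` value (12.5% of 4000 steps = 500) are our own illustrative choices, not values taken from the original training script.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-gl",   # hypothetical output directory
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    learning_rate=1e-5,
    gradient_checkpointing=True,
    gradient_accumulation_steps=1,
    num_train_epochs=100,
    max_steps=4000,                    # max_steps takes precedence over num_train_epochs
    warmup_steps=500,                  # 12.5% of the 4000 training steps
    generation_max_length=225,
    predict_with_generate=True,
    fp16=True,
    metric_for_best_model="wer",
    greater_is_better=False,
)
```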

## Fine-tuning in a different dataset or style

If you're interested in fine-tuning your own Whisper model, we suggest starting with the [openai/whisper-small model](https://huggingface.co/openai/whisper-small). Additionally, you may find the Transformers step-by-step guide for [fine-tuning Whisper on multilingual ASR datasets](https://huggingface.co/blog/fine-tune-whisper) to be a valuable resource. This guide served as a helpful reference during the training process of this Galician whisper-small model!
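
As a starting point, a minimal sketch of loading the base checkpoint and configuring it for Galician transcription, following the approach in the guide linked above, might look like this; treat it as an assumption-laden outline rather than our exact training setup.

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the multilingual whisper-small checkpoint and set the tokenizer prompts
# for Galician transcription; data preparation and the Seq2SeqTrainer loop are
# covered in detail by the fine-tune-whisper blog post linked above.
processor = WhisperProcessor.from_pretrained("openai/whisper-small", language="galician", task="transcribe")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model.generation_config.language = "galician"
model.generation_config.task = "transcribe"
```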