 ---
 license: cc-by-nc-nd-4.0
+datasets:
+- openslr
+- mozilla-foundation/common_voice_13_0
+language:
+- gl
+pipeline_tag: automatic-speech-recognition
+tags:
+- ITG
+- PyTorch
+- Transformers
+- whisper
+- whisper-small
 ---
+# whisper-small-gl
+## Description
+This is a fine-tuned version of the [openai/whisper-small](https://huggingface.co/openai/whisper-small) pre-trained model for automatic speech recognition (ASR) in Galician.
+
+---
+## Dataset
+We combined two datasets:
+1. The [OpenSLR Galician](https://huggingface.co/datasets/openslr/viewer/SLR77) dataset (SLR77), available in the OpenSLR repository.
+2. The [Common Voice 13 Galician](https://huggingface.co/datasets/mozilla-foundation/common_voice_13_0/viewer/gl) dataset, available in the Common Voice repository.
+---
+## Example inference script
+### Check this example script to run our model in inference mode
+```python
+import librosa
+import torch
+from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
+
+filename = "demo.wav"  # change this to the path of your audio file
+sample_rate = 16_000   # Whisper expects 16 kHz audio
+
+# Load the processor (feature extractor + tokenizer) and the fine-tuned model
+processor = AutoProcessor.from_pretrained('ITG/whisper-small-gl')
+model = AutoModelForSpeechSeq2Seq.from_pretrained('ITG/whisper-small-gl')
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+model.to(device)
+
+with torch.no_grad():
+  # Load the audio as mono and resample it to 16 kHz
+  speech_array, _ = librosa.load(filename, sr=sample_rate)
+  # Convert the waveform to log-mel input features and move them to the device
+  inputs = processor(speech_array, sampling_rate=sample_rate, return_tensors="pt").to(device)
+  input_features = inputs.input_features
+  # Generate token ids and decode them into text
+  generated_ids = model.generate(inputs=input_features, max_length=225)
+  decode_output = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
+
+print(f"ASR Galician whisper-small output: {decode_output}")
+```
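
Whisper's feature extractor pads or truncates each input to a 30-second window, so the script above transcribes at most the first 30 seconds of a longer file. One simple workaround is to split the waveform into consecutive windows and run the same steps on each one. The sketch below covers only the splitting step. `split_into_chunks` and `chunk_seconds` are hypothetical names, not part of this model or of Transformers, and cutting at fixed offsets can split a word across two chunks, so treat it as a starting point rather than a recommended pipeline.

```python
def split_into_chunks(samples, sample_rate=16_000, chunk_seconds=30):
    """Split a 1-D waveform into consecutive chunks of at most chunk_seconds.

    Works on any sliceable sequence (a Python list or the NumPy array
    returned by librosa.load). This helper is illustrative only.
    """
    if sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sample_rate and chunk_seconds must be positive")
    chunk_size = int(sample_rate * chunk_seconds)
    return [
        samples[start:start + chunk_size]
        for start in range(0, len(samples), chunk_size)
    ]


# 70 seconds of silence at 16 kHz, standing in for a real recording
waveform = [0.0] * (70 * 16_000)
chunks = split_into_chunks(waveform)
print([len(chunk) / 16_000 for chunk in chunks])  # [30.0, 30.0, 10.0]
```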
+---
+## Fine-tuning hyper-parameters
+|            **Hyper-parameter**           |          **Value**          |
+|:----------------------------------------:|:---------------------------:|
+|            Training batch size           |             16              |
+|           Evaluation batch size          |             8               |
+|               Learning rate              |             1e-5            |
+|           Gradient checkpointing         |             true            |
+|         Gradient accumulation steps      |             1               |
+|            Max training epochs           |             100             |
+|                Max steps                 |             4000            |
+|            Generate max length           |             225             |
+|         Warmup training steps (%)        |             12.5%           |
+|                  FP16                    |             true            |
+|          Metric for best model           |             wer             |
+|            Greater is better             |             false           |
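
The table maps fairly directly onto keyword arguments of `Seq2SeqTrainingArguments` in Transformers. The sketch below is an assumed mapping, not the authors' actual training configuration, since the original training script is not part of this card. The keyword names are real `Seq2SeqTrainingArguments` parameters, but two things are assumptions: that the batch sizes are per device, and `predict_with_generate=True` (added so that WER can be computed from generated text). The 12.5% warmup is converted into an absolute step count.

```python
max_steps = 4000
warmup_ratio = 0.125  # 12.5% of the training steps, from the table

training_kwargs = {
    "per_device_train_batch_size": 16,  # assumption: batch size is per device
    "per_device_eval_batch_size": 8,    # assumption: batch size is per device
    "learning_rate": 1e-5,
    "gradient_checkpointing": True,
    "gradient_accumulation_steps": 1,
    "num_train_epochs": 100,
    "max_steps": max_steps,  # a positive max_steps overrides num_train_epochs
    "generation_max_length": 225,
    "warmup_steps": int(max_steps * warmup_ratio),
    "fp16": True,
    "metric_for_best_model": "wer",
    "greater_is_better": False,     # lower WER is better
    "predict_with_generate": True,  # assumption: not listed in the table
}

print(training_kwargs["warmup_steps"])  # 500
```

With Transformers installed, these could be passed as `Seq2SeqTrainingArguments(output_dir="./whisper-small-gl", **training_kwargs)`, where `output_dir` is a placeholder path.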
+## Fine-tuning on a different dataset or style
+If you're interested in fine-tuning your own Whisper model, we suggest starting from the [openai/whisper-small model](https://huggingface.co/openai/whisper-small).
+The Transformers step-by-step guide to [fine-tuning Whisper on multilingual ASR datasets](https://huggingface.co/blog/fine-tune-whisper) is also a valuable resource.
+It served as a helpful reference during the training of this Galician whisper-small model.
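
Because the best checkpoint for this model was selected by WER (lower is better), it helps to have the metric at hand when fine-tuning on your own data. The following is a minimal, dependency-free sketch of word error rate, computed as the word-level edit distance divided by the number of reference words. `word_error_rate` is a hypothetical helper written for illustration; the guide linked above uses the `evaluate` library's WER metric instead, and this sketch applies no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One of four reference words is missing from the hypothesis
print(word_error_rate("bos días a todos", "bos días todos"))  # 0.25
```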