---
license: apache-2.0
language:
- ca
datasets:
- projecte-aina/3catparla_asr
tags:
- audio
- automatic-speech-recognition
- catalan
- faster-whisper
- whisper-large-v3
- catalonia
- barcelona-supercomputing-center
- projecte-aina
- 3catparla
---
# faster-whisper-large-v3-ca-3catparla

## Table of Contents
<details>
<summary>Click to expand</summary>

- [Model Description](#model-description)
- [Intended Uses and Limitations](#intended-uses-and-limitations)
- [How to Get Started with the Model](#how-to-get-started-with-the-model)
- [Conversion Details](#conversion-details)
- [Citation](#citation)
- [Additional Information](#additional-information)

</details>

## Summary

"faster-whisper-large-v3-ca-3catparla" is an acoustic model for Automatic Speech Recognition in Catalan. It is a [faster-whisper](https://github.com/guillaumekln/faster-whisper/tree/master) conversion of [projecte-aina/whisper-large-v3-ca-3catparla](https://huggingface.co/projecte-aina/whisper-large-v3-ca-3catparla).

## Model Description

"faster-whisper-large-v3-ca-3catparla" is the result of converting [projecte-aina/whisper-large-v3-ca-3catparla](https://huggingface.co/projecte-aina/whisper-large-v3-ca-3catparla) into a lighter model using a Python module called [faster-whisper](https://github.com/guillaumekln/faster-whisper/tree/master).

The specific dataset used to create the [projecte-aina/whisper-large-v3-ca-3catparla](https://huggingface.co/projecte-aina/whisper-large-v3-ca-3catparla) model is called ["3CatParla"](https://huggingface.co/datasets/projecte-aina/3catparla_asr).

## Intended Uses and Limitations

This model can be used for Automatic Speech Recognition (ASR) in Catalan. It is intended to transcribe Catalan audio files to plain text without punctuation.

## How to Get Started with the Model

For an updated and functional version of this code, please see our [Notebook](https://colab.research.google.com/drive/1v_3m1aR9CwYXgPVBlhwDI9Hz4V5Dlh95?usp=sharing).

### Installation

To use this model, you need to install [faster-whisper](https://github.com/guillaumekln/faster-whisper/tree/master).

Create a virtual environment:
```bash
python -m venv /path/to/venv
```
Activate the environment:
```bash
source /path/to/venv/bin/activate
```
Install the modules:
```bash
pip install faster-whisper
```
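
Before moving on, it can help to confirm that the package landed in the active virtual environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of faster-whisper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "faster-whisper") -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent.

    `installed_version` is an illustrative helper, not a faster-whisper API.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("faster-whisper is not installed in this environment")
    else:
        print(f"faster-whisper {found} is installed")
```

If the script reports that the package is missing, the most likely cause is that the virtual environment was not activated before running `pip install`.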

### For Inference
In order to transcribe audio in Catalan using this model, you can follow this example:

```python
from faster_whisper import WhisperModel

model_size = "projecte-aina/faster-whisper-large-v3-ca-3catparla"

# Run on GPU with FP16
model = WhisperModel(model_size, device="cuda", compute_type="float16")

# or run on GPU with INT8
# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio_in_catalan.mp3", beam_size=5, task="transcribe", language="ca")

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
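
Since the model is intended to produce plain text, you may prefer a single transcript string over per-segment lines. The sketch below assumes only that each segment exposes a `text` attribute, as in the example above; `join_segments` is a hypothetical helper, not part of faster-whisper.

```python
from typing import Iterable, Protocol


class HasText(Protocol):
    """Anything with a `text` attribute, such as a faster-whisper segment."""

    text: str


def join_segments(segments: Iterable[HasText]) -> str:
    """Join segment texts into one string, skipping empty segments.

    Hypothetical helper for illustration; it strips the whitespace around
    each segment and separates segments with a single space.
    """
    pieces = (segment.text.strip() for segment in segments)
    return " ".join(piece for piece in pieces if piece)


# Usage with the example above (not executed here):
# segments, info = model.transcribe("audio_in_catalan.mp3", language="ca")
# transcript = join_segments(segments)
```

Note that in faster-whisper `segments` is a generator, so the transcription actually runs while it is being iterated and it can only be consumed once. If the `for` loop from the example has already run, call `model.transcribe` again or collect the segments into a list first.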

## Conversion Details

### Conversion procedure

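This model is not a direct result of training. It is a conversion of a fine-tuned [Whisper](https://huggingface.co/openai/whisper-large-v3) model to the format used by [faster-whisper](https://github.com/guillaumekln/faster-whisper/tree/master). The `ct2-transformers-converter` command is provided by CTranslate2, the inference engine that faster-whisper runs on. The procedure to create the model is as follows: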

```bash
ct2-transformers-converter --model projecte-aina/whisper-large-v3-ca-3catparla \
   --output_dir faster-whisper-large-v3-ca-3catparla \
   --copy_files preprocessor_config.json \
   --quantization float16
```
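
After running the conversion, a quick sanity check on the output directory can catch an incomplete run. This is a sketch under assumptions: `missing_files` is a hypothetical helper, and the default file list is an assumption (`model.bin` and `config.json` are what CTranslate2 typically writes, and `preprocessor_config.json` is the file requested via `--copy_files`). Adjust the list to what your CTranslate2 version produces.

```python
from pathlib import Path
from typing import Iterable, List, Union

# Assumed file names; verify them against your own conversion output.
DEFAULT_EXPECTED = ("model.bin", "config.json", "preprocessor_config.json")


def missing_files(
    output_dir: Union[str, Path],
    expected: Iterable[str] = DEFAULT_EXPECTED,
) -> List[str]:
    """Return the names from `expected` that are not files in `output_dir`."""
    root = Path(output_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("faster-whisper-large-v3-ca-3catparla")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are present")
```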

## Citation
If this model contributes to your research, please cite the work:
```bibtex
@misc{mena2024fastwhis3catparla,
      title={Acoustic Model in Catalan: faster-whisper-large-v3-ca-3catparla.},
      author={Hernandez Mena, Carlos Daniel and Armentano-Oller, Carme and Solito, Sarah and Külebi, Baybars},
      organization={Barcelona Supercomputing Center},
      url={https://huggingface.co/projecte-aina/faster-whisper-large-v3-ca-3catparla},
      year={2024},
}
```

## Additional Information

### Author

The conversion process was performed in July 2024 in the [Language Technologies Unit](https://huggingface.co/BSC-LT) of the [Barcelona Supercomputing Center](https://www.bsc.es/) by [Carlos Daniel Hernández Mena](https://huggingface.co/carlosdanielhernandezmena).

### Contact
For further information, please send an email to <langtech@bsc.es>.

### Copyright
Copyright (c) 2024 by Language Technologies Unit, Barcelona Supercomputing Center.

### License

[Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0)

### Funding
This work has been promoted and financed by the Generalitat de Catalunya through the [Aina project](https://projecteaina.cat/).

The conversion of the model was made possible by the compute time provided by [Barcelona Supercomputing Center](https://www.bsc.es/) through MareNostrum 5.