---
license: apache-2.0
language:
- ca
datasets:
- projecte-aina/3catparla_asr
tags:
- audio
- automatic-speech-recognition
- catalan
- faster-whisper
- whisper-large-v3
- catalonia
- barcelona-supercomputing-center
- projecte-aina
- 3catparla
---
# faster-whisper-large-v3-ca-3catparla
## Table of Contents
<details>
<summary>Click to expand</summary>
- [Model Description](#model-description)
- [Intended Uses and Limitations](#intended-uses-and-limitations)
- [How to Get Started with the Model](#how-to-get-started-with-the-model)
- [Conversion Details](#conversion-details)
- [Citation](#citation)
- [Additional Information](#additional-information)
</details>
## Summary
The "faster-whisper-large-v3-ca-3catparla" is an acoustic model for Automatic Speech Recognition (ASR) in Catalan. It is a [faster-whisper](https://github.com/guillaumekln/faster-whisper/tree/master) conversion of [projecte-aina/whisper-large-v3-ca-3catparla](https://huggingface.co/projecte-aina/whisper-large-v3-ca-3catparla).
## Model Description
The "faster-whisper-large-v3-ca-3catparla" is the result of converting [projecte-aina/whisper-large-v3-ca-3catparla](https://huggingface.co/projecte-aina/whisper-large-v3-ca-3catparla) into a lighter model using a Python module called [faster-whisper](https://github.com/guillaumekln/faster-whisper/tree/master).
The specific dataset used to create the [projecte-aina/whisper-large-v3-ca-3catparla](https://huggingface.co/projecte-aina/whisper-large-v3-ca-3catparla) model is called ["3CatParla"](https://huggingface.co/datasets/projecte-aina/3catparla_asr).
## Intended Uses and Limitations
This model can be used for Automatic Speech Recognition (ASR) in Catalan. It is intended to transcribe Catalan audio files into plain text without punctuation.
## How to Get Started with the Model
For an updated and functional version of this code, please see our [Notebook](https://colab.research.google.com/drive/1v_3m1aR9CwYXgPVBlhwDI9Hz4V5Dlh95?usp=sharing).
### Installation
To use this model, install [faster-whisper](https://github.com/guillaumekln/faster-whisper/tree/master).
Create a virtual environment:
```bash
python -m venv /path/to/venv
```
Activate the environment:
```bash
source /path/to/venv/bin/activate
```
Install the modules:
```bash
pip install faster-whisper
```
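Before loading the model, it can help to confirm that the package is importable from the active environment. The following is a minimal sketch using only the Python standard library; the helper name `is_installed` is an illustrative assumption, not part of faster-whisper.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# The pip package is named "faster-whisper"; the importable module is "faster_whisper".
if is_installed("faster_whisper"):
    print("faster-whisper is available")
else:
    print("faster-whisper is missing: run 'pip install faster-whisper' in this environment")
```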
### For Inference
In order to transcribe audio in Catalan using this model, you can follow this example:
```python
from faster_whisper import WhisperModel
# Hugging Face repository of the converted model
model_size = "projecte-aina/faster-whisper-large-v3-ca-3catparla"

# Run on GPU with FP16
model = WhisperModel(model_size, device="cuda", compute_type="float16")
# or run on GPU with INT8
# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_size, device="cpu", compute_type="int8")

# Transcribe with the language set explicitly to Catalan
segments, info = model.transcribe("audio_in_catalan.mp3", beam_size=5, task="transcribe", language="ca")
print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

# Print each segment with its start and end timestamps
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
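In faster-whisper, `segments` is a generator, so it can only be iterated once; call `list(segments)` first if you need both the timestamped lines and a single transcript string. The sketch below shows one way to post-process the segments. The helper names `format_segment` and `segments_to_transcript` are illustrative assumptions, and the demo uses stand-in objects with made-up text rather than real model output.

```python
from types import SimpleNamespace
from typing import Iterable, List


def format_segment(segment) -> str:
    """Format one segment as '[start -> end] text', as in the inference example."""
    return "[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text.strip())


def segments_to_transcript(segments: Iterable) -> str:
    """Join the text of all segments into one string, skipping empty pieces."""
    pieces = (segment.text.strip() for segment in segments)
    return " ".join(piece for piece in pieces if piece)


# Stand-in objects exposing the attributes read in the inference example
# (start, end, text). The Catalan text here is invented for illustration.
demo_segments: List[SimpleNamespace] = [
    SimpleNamespace(start=0.0, end=2.5, text=" bon dia a tothom"),
    SimpleNamespace(start=2.5, end=4.0, text=" benvinguts"),
]

print(segments_to_transcript(demo_segments))
for segment in demo_segments:
    print(format_segment(segment))
```

With real output, replace `demo_segments` with `list(segments)` from `model.transcribe(...)`.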
## Conversion Details
### Conversion procedure
This model is not a direct result of training. It is a conversion of a [Whisper](https://huggingface.co/openai/whisper-large-v3) model using [faster-whisper](https://github.com/guillaumekln/faster-whisper/tree/master). The procedure to create the model is as follows:
```bash
ct2-transformers-converter --model projecte-aina/whisper-large-v3-ca-3catparla \
   --output_dir faster-whisper-large-v3-ca-3catparla \
   --copy_files preprocessor_config.json \
   --quantization float16
```
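If the conversion needs to be scripted, the same command can be assembled in Python. This is a minimal sketch that mirrors the flags shown above; the function name `build_converter_command` is an illustrative assumption, and running the command assumes that `ct2-transformers-converter` is on the `PATH` and that the source model can be downloaded.

```python
import shlex
from typing import List, Sequence


def build_converter_command(
    model: str,
    output_dir: str,
    copy_files: Sequence[str] = ("preprocessor_config.json",),
    quantization: str = "float16",
) -> List[str]:
    """Build the argument list for the converter command shown above."""
    command = ["ct2-transformers-converter", "--model", model, "--output_dir", output_dir]
    if copy_files:
        command += ["--copy_files", *copy_files]
    command += ["--quantization", quantization]
    return command


command = build_converter_command(
    "projecte-aina/whisper-large-v3-ca-3catparla",
    "faster-whisper-large-v3-ca-3catparla",
)

# Print the command as a single shell-safe line for inspection.
print(shlex.join(command))

# To actually run the conversion (not executed here):
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a model name or output path contains spaces.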
## Citation
If this model contributes to your research, please cite the work:
```bibtex
@misc{mena2024fastwhis3catparla,
title={Acoustic Model in Catalan: faster-whisper-large-v3-ca-3catparla.},
author={Hernandez Mena, Carlos Daniel and Armentano-Oller, Carme and Solito, Sarah and Külebi, Baybars},
organization={Barcelona Supercomputing Center},
url={https://huggingface.co/projecte-aina/faster-whisper-large-v3-ca-3catparla},
year={2024},
}
```
## Additional Information
### Author
The conversion process was performed in July 2024 in the [Language Technologies Unit](https://huggingface.co/BSC-LT) of the [Barcelona Supercomputing Center](https://www.bsc.es/) by [Carlos Daniel Hernández Mena](https://huggingface.co/carlosdanielhernandezmena).
### Contact
For further information, please send an email to <langtech@bsc.es>.
### Copyright
Copyright (c) 2024 by Language Technologies Unit, Barcelona Supercomputing Center.
### License
[Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0)
### Funding
This work has been promoted and financed by the Generalitat de Catalunya through the [Aina project](https://projecteaina.cat/).
The conversion of the model was possible thanks to the compute time provided by [Barcelona Supercomputing Center](https://www.bsc.es/) through MareNostrum 5.