---
license: apache-2.0
language:
- ca
datasets:
- projecte-aina/3catparla_asr
tags:
- audio
- automatic-speech-recognition
- catalan
- faster-whisper
- whisper-large-v3
- catalonia
- barcelona-supercomputing-center
- projecte-aina
- 3catparla
---

# faster-whisper-large-v3-ca-3catparla
## Summary

The "faster-whisper-large-v3-ca-3catparla" is an acoustic model based on a faster-whisper conversion of projecte-aina/whisper-large-v3-ca-3catparla, suitable for Automatic Speech Recognition in Catalan.
## Model Description

The "faster-whisper-large-v3-ca-3catparla" is the result of converting projecte-aina/whisper-large-v3-ca-3catparla into a lighter model using the Python module faster-whisper.

The dataset used to train the original projecte-aina/whisper-large-v3-ca-3catparla model is called "3CatParla".
## Intended Uses and Limitations

This model can be used for Automatic Speech Recognition (ASR) in Catalan. It is intended to transcribe audio files in Catalan to plain text without punctuation.
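Because the model outputs plain text without punctuation, reference transcripts usually need a matching normalization before computing metrics such as the Word Error Rate. The sketch below is illustrative only (the lowercasing and punctuation rules are assumptions, not the exact pipeline used by the authors):

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase and strip punctuation so a reference transcript
    matches the model's plain-text, punctuation-free output."""
    text = text.lower()
    # Remove ASCII punctuation plus a few marks common in Catalan/Spanish text.
    text = text.translate(str.maketrans("", "", string.punctuation + "¿¡«»"))
    # Collapse any resulting runs of whitespace.
    return re.sub(r"\s+", " ", text).strip()

print(normalize("Bon dia! Com esteu, avui?"))  # -> "bon dia com esteu avui"
```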
## How to Get Started with the Model

To see an updated and functional version of this code, please visit our Notebook.
### Installation

In order to use this model, you first need to install faster-whisper.

Create a virtual environment:

```bash
python -m venv /path/to/venv
```

Activate the environment:

```bash
source /path/to/venv/bin/activate
```

Install the module:

```bash
pip install faster-whisper
```
### For Inference

To transcribe audio in Catalan using this model, you can follow this example:

```python
from faster_whisper import WhisperModel

model_size = "projecte-aina/faster-whisper-large-v3-ca-3catparla"

# Run on GPU with FP16
model = WhisperModel(model_size, device="cuda", compute_type="float16")
# or run on GPU with INT8
# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio_in_catalan.mp3", beam_size=5, task="transcribe", language="ca")

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
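Each segment yielded by `model.transcribe` carries `start`, `end`, and `text` attributes, which makes it straightforward to assemble other output formats. As a minimal sketch, here is how segments could be turned into an SRT subtitle file (the `Segment` dataclass below is a stand-in for the objects faster-whisper actually returns, so the example runs without loading the model):

```python
from dataclasses import dataclass

@dataclass
class Segment:
    # Stand-in for the segment objects yielded by model.transcribe()
    start: float
    end: float
    text: str

def to_srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Build an SRT document from an iterable of segments."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{to_srt_timestamp(seg.start)} --> {to_srt_timestamp(seg.end)}\n{seg.text.strip()}\n"
        )
    return "\n".join(blocks)

# Example with dummy segments:
demo = [Segment(0.0, 2.5, " bon dia a tothom"), Segment(2.5, 5.0, " com esteu")]
print(segments_to_srt(demo))
```

In real use, the `segments` generator returned by `model.transcribe` can be passed directly to `segments_to_srt`.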
## Conversion Details

### Conversion procedure

This model is not a direct result of training. It is a conversion of a Whisper model using faster-whisper, created with the following command:

```bash
ct2-transformers-converter --model projecte-aina/whisper-large-v3-ca-3catparla \
    --output_dir faster-whisper-large-v3-ca-3catparla \
    --copy_files preprocessor_config.json \
    --quantization float16
```
## Citation

If this model contributes to your research, please cite the work:

```bibtex
@misc{mena2024fastwhis3catparla,
      title={Acoustic Model in Catalan: faster-whisper-large-v3-ca-3catparla.},
      author={Hernandez Mena, Carlos Daniel and Armentano-Oller, Carme and Solito, Sarah and Külebi, Baybars},
      organization={Barcelona Supercomputing Center},
      url={https://huggingface.co/projecte-aina/faster-whisper-large-v3-ca-3catparla},
      year={2024},
}
```
## Additional Information

### Author

The conversion process was performed in July 2024 at the Language Technologies Unit of the Barcelona Supercomputing Center by Carlos Daniel Hernández Mena.

### Contact

For further information, please send an email to langtech@bsc.es.

### Copyright

Copyright (c) 2024 by Language Technologies Unit, Barcelona Supercomputing Center.

### License

[Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)

### Funding

This work has been promoted and financed by the Generalitat de Catalunya through the Aina project.

The conversion of the model was possible thanks to the compute time provided by the Barcelona Supercomputing Center through MareNostrum 5.