Finnish Wav2vec2-XLarge ASR

GetmanY1/wav2vec2-xlarge-fi-150k fine-tuned on 4600 hours of Finnish speech, sampled at 16 kHz.

When using the model, make sure that your speech input is also sampled at 16 kHz.
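
If your audio is recorded at a different sampling rate, resample it before running inference. Below is a minimal sketch (not part of the original card) that assumes torchaudio is installed; the file path is only a placeholder:

import torchaudio

# load an arbitrary audio file (placeholder path) and resample to 16 kHz if needed
waveform, sample_rate = torchaudio.load("my_finnish_speech.wav")
if sample_rate != 16000:
    waveform = torchaudio.functional.resample(waveform, orig_freq=sample_rate, new_freq=16000)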

Model description

The Finnish Wav2Vec2 X-Large has the same architecture and uses the same training objective as the multilingual model described in the original paper.

GetmanY1/wav2vec2-xlarge-fi-150k is a large-scale, 1-billion-parameter monolingual model pre-trained on 158k hours of unlabeled Finnish speech, including KAVI radio and television archive materials, Lahjoita puhetta (Donate Speech), the Finnish Parliament, and Finnish VoxPopuli.

You can read more about the pre-trained model in this paper. The training scripts are available on GitHub.

Intended uses

You can use this model for Finnish ASR (speech-to-text).

How to use

To transcribe audio files, the model can be used as a standalone acoustic model as follows:

from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
from datasets import load_dataset, Audio
import torch

# load model and processor
processor = Wav2Vec2Processor.from_pretrained("GetmanY1/wav2vec2-xlarge-fi-150k-finetuned")
model = Wav2Vec2ForCTC.from_pretrained("GetmanY1/wav2vec2-xlarge-fi-150k-finetuned")

# load dummy dataset and read soundfiles, resampling to the 16 kHz the model expects
ds = load_dataset("mozilla-foundation/common_voice_16_1", "fi", split="test")
ds = ds.cast_column("audio", Audio(sampling_rate=16000))

# tokenize
input_values = processor(ds[0]["audio"]["array"], sampling_rate=16000, return_tensors="pt", padding="longest").input_values  # Batch size 1

# retrieve logits (no gradients needed for inference)
with torch.no_grad():
    logits = model(input_values).logits

# take argmax and decode
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
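
As a shorter alternative to the standalone acoustic model above, the generic transformers pipeline API can run CTC models end-to-end; this is a sketch rather than an official example from this card, and the audio path is a placeholder:

from transformers import pipeline

# the automatic-speech-recognition pipeline wraps feature extraction, the model, and CTC decoding
asr = pipeline("automatic-speech-recognition", model="GetmanY1/wav2vec2-xlarge-fi-150k-finetuned")

# "sample.wav" is a placeholder path; the pipeline resamples the input to the model's 16 kHz internally
print(asr("sample.wav")["text"])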

Team Members

Feel free to contact us for more details 🤗
