
Wav2vec2-Bert-Fongbe

This model is a fine-tuned version of facebook/w2v-bert-2.0 for the Fongbe language. It achieves a Word Error Rate (WER) of 24.20% on the Aloresb dataset.

Model description

This model fine-tunes the Wav2Vec2-BERT architecture on the Aloresb dataset for the Fongbe language. Fongbe, a Gbe language, is predominantly spoken in the southern region of Benin. The model has been fine-tuned specifically for Automatic Speech Recognition (ASR) in this language and can be useful for transcription services, research, and linguistic studies involving Fongbe.

Details

  • Model Name: wav2vec2-bert-fongbe
  • Base Model: facebook/w2v-bert-2.0
  • Fine-tuning Dataset: Aloresb (Fongbe)
  • Languages: Fongbe
  • Architecture: Wav2Vec2-BERT (w2v-BERT 2.0)
  • Model Size: 606M parameters (F32)
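
The parameter count above can be verified directly from the checkpoint; a quick sketch:

from transformers import AutoModelForCTC

model = AutoModelForCTC.from_pretrained("OctaSpace/wav2vec2-bert-fongbe")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")  # ~606M according to this card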

How to use

import torch
import soundfile as sf
from transformers import AutoModelForCTC, Wav2Vec2BertProcessor

model_name = "OctaSpace/wav2vec2-bert-fongbe"
device = "cuda" if torch.cuda.is_available() else "cpu"

asr_model = AutoModelForCTC.from_pretrained(model_name).to(device)
processor = Wav2Vec2BertProcessor.from_pretrained(model_name)

# Load a 16 kHz mono audio file (resample first if needed; see below)
audio_input, sampling_rate = sf.read("audio.wav")

inputs = processor([audio_input], sampling_rate=16_000, return_tensors="pt")
features = inputs.input_features.to(device)

with torch.no_grad():
    logits = asr_model(features).logits

predicted_ids = torch.argmax(logits, dim=-1)
predictions = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(predictions[0])
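
The processor expects 16 kHz mono audio. If your file uses a different sampling rate, resample it first. A minimal sketch using torchaudio (an assumption; any resampler works):

import torchaudio

waveform, sr = torchaudio.load("audio.wav")  # shape: (channels, samples)
waveform = waveform.mean(dim=0)              # downmix to mono
if sr != 16_000:
    waveform = torchaudio.functional.resample(waveform, orig_freq=sr, new_freq=16_000)
audio_input = waveform.numpy()               # ready for the processor call above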

Training Procedure

The model was fine-tuned on the Aloresb dataset, which contains audio recordings and transcriptions in Fongbe.

Training Parameters:

  • Optimizer: AdamW
  • Learning Rate: 3e-5
  • Batch Size: 3
  • Epochs: 3

A minimal sketch of a fine-tuning run with these settings is shown below.
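
The following sketch uses the Hugging Face Trainer with the hyperparameters listed above. The dataset objects (train_ds, eval_ds), the CTC padding collator (data_collator), and a processor carrying a Fongbe character tokenizer are assumptions for illustration, not part of this repository:

from transformers import (
    Trainer,
    TrainingArguments,
    Wav2Vec2BertForCTC,
    Wav2Vec2BertProcessor,
)

# Hypothetical: a processor whose tokenizer holds a Fongbe character vocabulary
processor = Wav2Vec2BertProcessor.from_pretrained("OctaSpace/wav2vec2-bert-fongbe")

model = Wav2Vec2BertForCTC.from_pretrained(
    "facebook/w2v-bert-2.0",
    vocab_size=len(processor.tokenizer),
    pad_token_id=processor.tokenizer.pad_token_id,
    ctc_loss_reduction="mean",
)

args = TrainingArguments(
    output_dir="wav2vec2-bert-fongbe",
    learning_rate=3e-5,             # as listed above
    per_device_train_batch_size=3,  # as listed above
    num_train_epochs=3,             # as listed above
    optim="adamw_torch",            # AdamW, as listed above
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,       # hypothetical: preprocessed Aloresb train split
    eval_dataset=eval_ds,         # hypothetical: preprocessed Aloresb test split
    data_collator=data_collator,  # hypothetical: pads input_features and labels
)
trainer.train()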

Evaluation Results

The model was evaluated using the Word Error Rate (WER) metric on a test set:

  • WER: 24.20%
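
To reproduce a WER score on your own transcripts, the evaluate library can be used (an assumption about tooling; jiwer works as well):

import evaluate

wer_metric = evaluate.load("wer")
# placeholder strings; in practice, pass model outputs and gold transcripts
wer = wer_metric.compute(
    predictions=["predicted transcript"],
    references=["reference transcript"],
)
print(f"WER: {wer:.2%}")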

