Wav2Vec2-Conformer-Large-960h with Relative Position Embeddings + 4-gram

This model is identical to Facebook's wav2vec2-conformer-rel-pos-large-960h-ft, but is augmented with an English 4-gram language model. The language model is the 4-gram.arpa.gz file from LibriSpeech's official n-gram models.
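
To make concrete what the 4-gram adds at decode time: n-gram-boosted CTC decoders typically rank candidate transcriptions by combining the acoustic log-probability with a weighted language-model log-probability and a per-word bonus. The sketch below is a toy illustration only. The candidate texts, all scores, the `fused_score` helper, and the `alpha` and `beta` values are invented for the example and are not taken from this model or its decoder.

```python
def fused_score(acoustic_logp, lm_logp, n_words, alpha=0.5, beta=1.5):
    """Combine acoustic and LM evidence (hypothetical weights alpha, beta)."""
    return acoustic_logp + alpha * lm_logp + beta * n_words

# (text, acoustic log-prob, LM log-prob) -- invented numbers for illustration
candidates = [
    ("i scream for ice cream", -4.0, -9.0),
    ("ice cream for ice cream", -3.8, -14.0),
]

# Without an LM, the candidate with the highest acoustic score wins.
best_acoustic = max(candidates, key=lambda c: c[1])[0]

# With the LM term, an acoustically similar but more plausible sentence can win.
best_fused = max(
    candidates,
    key=lambda c: fused_score(c[1], c[2], len(c[0].split())),
)[0]

print("acoustic only:", best_acoustic)
print("with LM:      ", best_fused)
```

In this toy case the language-model term overturns a small acoustic preference for the less plausible sentence, which is the kind of correction an n-gram is meant to provide.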

Evaluation

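The following code snippet shows how to evaluate patrickvonplaten/wav2vec2-conformer-rel-pos-large-960h-ft-4-gram on LibriSpeech's test data. It loads the "other" test set; pass "clean" instead of "other" to load_dataset to evaluate on the "clean" test set.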

from datasets import load_dataset
from transformers import AutoModelForCTC, AutoProcessor
import torch
from jiwer import wer

model_id = "patrickvonplaten/wav2vec2-conformer-rel-pos-large-960h-ft-4-gram"

# Use "clean" instead of "other" to evaluate on the clean test set.
librispeech_eval = load_dataset("librispeech_asr", "other", split="test")

model = AutoModelForCTC.from_pretrained(model_id).to("cuda")
# Loads the processor, including the n-gram decoder used by batch_decode below.
processor = AutoProcessor.from_pretrained(model_id)

def map_to_pred(batch):
    # Convert one raw 16 kHz waveform into model inputs.
    inputs = processor(batch["audio"]["array"], sampling_rate=16_000, return_tensors="pt")

    inputs = {k: v.to("cuda") for k, v in inputs.items()}

    with torch.no_grad():
        logits = model(**inputs).logits

    # Decode the logits with the language model; .text holds the decoded strings.
    transcription = processor.batch_decode(logits.cpu().numpy()).text[0]
    batch["transcription"] = transcription
    return batch

result = librispeech_eval.map(map_to_pred, remove_columns=["audio"])

# jiwer's wer takes the reference texts first, then the predictions.
print(wer(result["text"], result["transcription"]))
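
For readers unfamiliar with the metric printed above, here is a minimal pure-Python sketch of word error rate: word-level edit distance summed over all utterances, divided by the total number of reference words. The `word_error_rate` helper and the example sentences are hypothetical and written for illustration; this is not jiwer's implementation.

```python
def word_error_rate(references, hypotheses):
    """Corpus-level WER: total word edits / total reference words."""
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words, hyp_words = ref.split(), hyp.split()
        # Levenshtein distance over words, keeping one row at a time.
        prev = list(range(len(hyp_words) + 1))
        for i, r in enumerate(ref_words, start=1):
            curr = [i]
            for j, h in enumerate(hyp_words, start=1):
                cost = 0 if r == h else 1
                curr.append(min(
                    prev[j] + 1,         # deletion of a reference word
                    curr[j - 1] + 1,     # insertion of a hypothesis word
                    prev[j - 1] + cost,  # substitution or match
                ))
            prev = curr
        total_edits += prev[-1]
        total_words += len(ref_words)
    return total_edits / total_words

# One substitution ("sat" -> "sit") and one deletion ("the"): 2 edits / 6 words.
score = word_error_rate(["the cat sat on the mat"], ["the cat sit on mat"])
print(f"{score:.4f}")
```

The value is a fraction; multiplying by 100 gives a percentage, which appears to be the scale of the figures reported below.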

Result (WER):

| "clean" | "other" |
|---|---|
| 1.94 | 3.54 |
