Incorrect config file
The configuration attached to this model is the mBART-50 config, which makes the model completely unusable.
Hey @shrey-jasuja, this is a SpeechEncoderDecoderModel, which uses a speech encoder and a text (mbart) decoder. As said in the model card:
The encoder was warm-started from the facebook/wav2vec2-xls-r-1b checkpoint and the decoder from the facebook/mbart-large-50 checkpoint. Consequently, the encoder-decoder model was fine-tuned on 21 {lang} -> en translation pairs of the Covost2 dataset.
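To illustrate why an mbart entry in the config is expected here, below is a minimal sketch of how a speech encoder-decoder config nests one config per component. It assumes the `transformers` library is installed and uses default `Wav2Vec2Config` and `MBartConfig` objects as placeholders; these are not the actual values stored in this checkpoint.

```python
from transformers import MBartConfig, SpeechEncoderDecoderConfig, Wav2Vec2Config

# Placeholder configs with library defaults (assumption: the real checkpoint
# uses the XLS-R and mBART-50 settings, which are not reproduced here).
encoder_config = Wav2Vec2Config()
decoder_config = MBartConfig()

# Combine them the way a SpeechEncoderDecoderModel expects: one top-level
# config that holds an encoder config and a decoder config.
config = SpeechEncoderDecoderConfig.from_encoder_decoder_configs(
    encoder_config, decoder_config
)

print(config.model_type)          # type of the wrapper config
print(config.encoder.model_type)  # speech encoder type
print(config.decoder.model_type)  # text decoder type
```

Seeing an mbart decoder section inside the config is therefore consistent with the architecture described in the model card, rather than a sign that the wrong file was uploaded.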
I understand, but the inference code in its current form doesn't work. The tokenizer needs to be defined explicitly. The following changes worked for me:
from datasets import load_dataset
from transformers import MBart50Tokenizer, Wav2Vec2FeatureExtractor, pipeline

# Define the decoder tokenizer explicitly from the mBART-50 checkpoint.
tokenizer = MBart50Tokenizer.from_pretrained("facebook/mbart-large-50")
# Use from_pretrained: passing the repo name to the constructor would set feature_size to a string.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-xls-r-2b-21-to-en")

asr = pipeline(model="facebook/wav2vec2-xls-r-2b-21-to-en", tokenizer=tokenizer, feature_extractor=feature_extractor, device=0)

# `item` is assumed to be one example of a dataset loaded with load_dataset (loading not shown).
audio = item["file"]
translation = asr(audio)["text"]
Pinging @sanchit-gandhi for advice :)
Indeed - the code examples here are incorrect. Will be fixed by #6!