---
license: apache-2.0
base_model: facebook/wav2vec2-large-xlsr-53
tags:
- generated_from_trainer
datasets:
- common_voice_13_0
metrics:
- wer
model-index:
- name: wav2vec2-large-xlsr-mvc-swahili
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: common_voice_13_0
      type: common_voice_13_0
      config: sw
      split: test
      args: sw
    metrics:
    - name: Wer
      type: wer
      value: 0.2
language:
- sw
---

# wav2vec2-large-xlsr-mvc-swahili

This model is a fine-tuned version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) for Swahili automatic speech recognition, reaching a WER of 0.20 on the Common Voice 13.0 Swahili test split. It draws inspiration from [alamsher/wav2vec2-large-xlsr-53-common-voice-sw](https://huggingface.co/alamsher/wav2vec2-large-xlsr-53-common-voice-sw).

# How to use the model

Note: there was an issue with the vocabulary. Some special characters appear to have been included that were not accounted for during training, so transcriptions may contain stray characters. You can try the following:

```python
import torch
import torchaudio
from transformers import AutoProcessor, AutoModelForCTC

repo_name = "eddiegulay/wav2vec2-large-xlsr-mvc-swahili"

processor = AutoProcessor.from_pretrained(repo_name)
model = AutoModelForCTC.from_pretrained(repo_name)

# Move the model to GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)


def transcribe(audio_path):
    # Load the audio file and resample it to the 16 kHz rate the model expects
    audio_input, sample_rate = torchaudio.load(audio_path)
    target_sample_rate = 16000
    audio_input = torchaudio.transforms.Resample(
        orig_freq=sample_rate, new_freq=target_sample_rate
    )(audio_input)

    # Preprocess the audio data
    input_dict = processor(
        audio_input[0], return_tensors="pt", padding=True,
        sampling_rate=target_sample_rate
    )

    # Perform inference and decode the predicted token ids
    with torch.no_grad():
        logits = model(input_dict.input_values.to(device)).logits
    pred_ids = torch.argmax(logits, dim=-1)[0]
    transcription = processor.decode(pred_ids)

    return transcription


transcript = transcribe("your_audio.mp3")
print(transcript)
```
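
If stray special characters from the vocabulary issue show up in the output, one workaround is a simple post-processing pass over the decoded text. This is a minimal sketch, not part of the original training pipeline; the set of characters kept here is an assumption and may need adjusting for your transcripts:

```python
import re

def clean_transcription(text: str) -> str:
    # Keep lowercase letters, digits, spaces, and apostrophes (used in
    # Swahili orthography, e.g. ng'ombe) and drop everything else.
    # The allowed character set is an assumption -- tune it as needed.
    text = re.sub(r"[^a-z0-9' ]", "", text.lower())
    # Collapse any whitespace runs left behind by removed characters
    return re.sub(r"\s+", " ", text).strip()

print(clean_transcription(transcribe("your_audio.mp3")))
```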