Requirements to run?
Hello,
Amazing! Any information on the training data?
This is the first time I have seen a TTS implemented for Malagasy, and honestly for a first, it's a good one! Intonation is a bit odd, but pronunciation for native words is mostly on point.
Can we have a clear list of requirements in the model documentation?
I managed to run and generate a wav file using these steps:
Script:
import wave

import numpy as np
import torch
from transformers import VitsModel, AutoTokenizer

model = VitsModel.from_pretrained("facebook/mms-tts-mlg")
tokenizer = AutoTokenizer.from_pretrained("facebook/mms-tts-mlg")

# Read the text to synthesize
with open('text.txt', 'r', encoding='utf-8') as f:
    text = f.read()

inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    output = model(**inputs).waveform

# Convert the Torch tensor holding the audio to a NumPy array
output_np = output.detach().cpu().numpy()

# Ensure data is in the correct shape
if len(output_np.shape) == 1:  # If it's a 1D array, reshape it to have two dimensions
    output_np = output_np.reshape(-1, 1)
elif len(output_np.shape) > 2:  # If it has more than 2 dimensions, raise an error
    raise ValueError("Data has more than 2 dimensions")

# Rescale to the range [-1, 1] only if the signal exceeds it
max_val = np.max(np.abs(output_np))
if max_val > 1:
    output_np = output_np / max_val

# Scale the audio data to fit within the range of signed 16-bit integers
scaled_output = np.int16(output_np * 32767)

# Specify the output file path
output_path = "techno.wav"

# Ensure rate is a positive integer
rate = int(model.config.sampling_rate)
if rate <= 0:
    raise ValueError("Sampling rate must be a positive integer")

# Open the WAV file for writing
with wave.open(output_path, 'wb') as wf:
    # Set parameters
    wf.setnchannels(1)  # Mono audio
    wf.setsampwidth(2)  # 16-bit encoding
    wf.setframerate(rate)  # Sampling rate
    # Write audio data
    wf.writeframes(scaled_output.tobytes())
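To sanity-check the result, the file can be read back with the standard-library `wave` module. This is only a minimal sketch: the helper name `describe_wav` is made up for illustration, and it assumes the file is plain PCM as written by the script above.

```python
import wave


def describe_wav(path):
    """Return basic header information for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sampling_rate": rate,
            # Duration follows from frame count divided by frames per second
            "duration_seconds": frames / rate,
        }
```

For the script above, `describe_wav("techno.wav")` should report 1 channel and a sample width of 2 bytes, since those are the values passed to `setnchannels` and `setsampwidth`.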
Direct Requirements:
torch==2.2.0
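Wave==0.0.2 (probably not needed: the script only imports `wave`, which is part of the Python standard library)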
numpy==1.24.4
transformers==4.38.2
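In case it helps anyone reproducing this, here is a rough sketch for comparing the installed packages against the pins listed above. The function name `check_versions` and the `PINNED` dictionary are illustrative, not part of any library. The `wave` module used by the script ships with Python itself, so it is left out of the check. Whether other versions also work has not been tested here.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the requirements list in this post
PINNED = {"torch": "2.2.0", "numpy": "1.24.4", "transformers": "4.38.2"}


def check_versions(pinned):
    """Map each package name to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu121" when comparing
        base = installed.split("+")[0] if installed else None
        report[name] = (installed, base == wanted)
    return report


for name, (installed, ok) in check_versions(PINNED).items():
    print(f"{name}: installed={installed} pinned={PINNED[name]} match={ok}")
```

A mismatch does not necessarily mean the script will fail; it only shows where the environment differs from the one used for this post.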
This is a really cool idea.
This one is seriously impressive, man.
Haha, with that licence, would Facebook come after you if you actually planned to put this in an app with paid features, the way the foreign ones do?
Man, you could fine-tune it to improve the voice, but the big job is that you need clean training data, something like a complete Bible as was used here, if you want to make, say, a female voice.
Even if it is fine-tuned, it still cannot be commercialized. As long as the base model does not allow commercial use, the result cannot be sold either.
@radomd92 The model struggles with "r" and "g" sounds, for example in gagagaga, gogogogo, and moramora. But it is so far the best Malagasy TTS I've ever tried. There is still room for improvement, but I am amazed anyway.