Show examples using "torchaudio.save" instead of "scipy.io.wavfile.write" in README
Since torchaudio is already showcased in the examples for loading audio, it's more consistent to also showcase it for saving audio.
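For reference, here is roughly what the change amounts to for the saving snippet. This is a minimal sketch, not the exact README code: the `scipy.io.wavfile.write` arguments are an assumption (the old lines are truncated in the diff below), while the `torchaudio.save` call mirrors the added lines. The placeholder tensor stands in for the waveform returned by `model.generate(...)[0]`, and `sample_rate` stands in for `model.sampling_rate`.

```py
import scipy.io.wavfile
import torch
import torchaudio

# Placeholder for the (1, num_samples) waveform tensor produced by model.generate(...)[0]
# and the model's output sampling rate (assumed values for illustration only).
audio_tensor_from_text = torch.zeros(1, 16_000)
sample_rate = 16_000

# Before (assumed): scipy expects a NumPy array, so the generated tensor has to be
# moved to CPU, converted, and squeezed before writing.
scipy.io.wavfile.write(
    "out_from_text.wav",
    rate=sample_rate,
    data=audio_tensor_from_text.cpu().numpy().squeeze(),
)

# After: torchaudio.save writes the (channels, time) tensor directly,
# matching the torchaudio.load call already used for reading audio.
torchaudio.save(
    uri="out_from_text.wav",
    src=audio_tensor_from_text,
    sample_rate=sample_rate,
    channels_first=True,
)
```

Keeping the generated output as a tensor also means the notebook playback snippet only converts at display time, which is what the updated `Audio(...)` lines do.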
README.md (changed):

@@ -185,13 +185,13 @@ model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")
 
 # from text
 text_inputs = processor(text = "Hello, my dog is cute", src_lang="eng", return_tensors="pt")
-
+audio_tensor_from_text = model.generate(**text_inputs, tgt_lang="rus")[0]
 
 # from audio
 audio, orig_freq = torchaudio.load("https://www2.cs.uic.edu/~i101/SoundFiles/preamble10.wav")
 audio = torchaudio.functional.resample(audio, orig_freq=orig_freq, new_freq=16_000) # must be a 16 kHz waveform array
 audio_inputs = processor(audios=audio, return_tensors="pt")
-
+audio_tensor_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0]
 ```
 
 3. Listen to the audio samples either in an ipynb notebook:

@@ -200,18 +200,18 @@ audio_array_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0].cpu()
 from IPython.display import Audio
 
 sample_rate = model.sampling_rate
-Audio(
-# Audio(
+Audio(audio_tensor_from_text.cpu().numpy().squeeze(), rate=sample_rate)
+# Audio(audio_tensor_from_audio.cpu().numpy().squeeze(), rate=sample_rate)
 ```
 
-Or save them as a `.wav` file
+Or save them as a `.wav` file:
 
 ```py
-import
+import torchaudio
 
 sample_rate = model.sampling_rate
-
-#
+torchaudio.save(uri="out_from_text.wav", src=audio_tensor_from_text, sample_rate=sample_rate, channels_first=True)
+# torchaudio.save(uri="out_from_audio.wav", src=audio_tensor_from_audio, sample_rate=sample_rate, channels_first=True)
 ```
 
 For more details on using the SeamlessM4T model for inference using the 🤗 Transformers library, refer to the
 **[SeamlessM4T v2 docs](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t_v2)** or to this **hands-on [Google Colab](https://colab.research.google.com/github/ylacombe/scripts_and_notebooks/blob/main/v2_seamless_m4t_hugging_face.ipynb).**