abellion committed
Commit b8d7691
1 Parent(s): 3cda576

Show examples using "torchaudio.save" instead of "scipy.io.wavfile.write" in README


Since torchaudio is already showcased in the examples for loading audio, it's more consistent to also showcase it for saving audio.

Files changed (1)
  1. README.md +8 -8
README.md CHANGED
@@ -185,13 +185,13 @@ model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")
 
 # from text
 text_inputs = processor(text = "Hello, my dog is cute", src_lang="eng", return_tensors="pt")
-audio_array_from_text = model.generate(**text_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
+audio_tensor_from_text = model.generate(**text_inputs, tgt_lang="rus")[0]
 
 # from audio
 audio, orig_freq = torchaudio.load("https://www2.cs.uic.edu/~i101/SoundFiles/preamble10.wav")
 audio = torchaudio.functional.resample(audio, orig_freq=orig_freq, new_freq=16_000) # must be a 16 kHz waveform array
 audio_inputs = processor(audios=audio, return_tensors="pt")
-audio_array_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
+audio_tensor_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0]
 ```
 
 3. Listen to the audio samples either in an ipynb notebook:
@@ -200,18 +200,18 @@ audio_array_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0].cpu()
 from IPython.display import Audio
 
 sample_rate = model.sampling_rate
-Audio(audio_array_from_text, rate=sample_rate)
-# Audio(audio_array_from_audio, rate=sample_rate)
+Audio(audio_tensor_from_text.cpu().numpy().squeeze(), rate=sample_rate)
+# Audio(audio_tensor_from_audio.cpu().numpy().squeeze(), rate=sample_rate)
 ```
 
-Or save them as a `.wav` file using a third-party library, e.g. `scipy`:
+Or save them as a `.wav` file:
 
 ```py
-import scipy
+import torchaudio
 
 sample_rate = model.sampling_rate
-scipy.io.wavfile.write("out_from_text.wav", rate=sample_rate, data=audio_array_from_text)
-# scipy.io.wavfile.write("out_from_audio.wav", rate=sample_rate, data=audio_array_from_audio)
+torchaudio.save(uri="out_from_text.wav", src=audio_tensor_from_text, sample_rate=sample_rate, channels_first=True)
+# torchaudio.save(uri="out_from_audio.wav", src=audio_tensor_from_audio, sample_rate=sample_rate, channels_first=True)
 ```
 For more details on using the SeamlessM4T model for inference using the 🤗 Transformers library, refer to the
 **[SeamlessM4T v2 docs](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t_v2)** or to this **hands-on [Google Colab](https://colab.research.google.com/github/ylacombe/scripts_and_notebooks/blob/main/v2_seamless_m4t_hugging_face.ipynb).**
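
For context, here is a minimal, self-contained sketch of the two saving approaches this commit swaps. It uses a dummy 16 kHz mono waveform instead of the SeamlessM4T output; the file names and tensor are illustrative only, not part of the README:

```py
import scipy.io.wavfile
import torch
import torchaudio

sample_rate = 16_000
# Dummy 2-D tensor with shape (channels, frames), standing in for model.generate(...)[0]
waveform = torch.zeros(1, sample_rate)

# scipy expects a NumPy array on the CPU (1-D for mono audio),
# which is why the old README snippet needed .cpu().numpy().squeeze()
scipy.io.wavfile.write("out_scipy.wav", rate=sample_rate, data=waveform.numpy().squeeze())

# torchaudio writes the torch tensor directly; with channels_first=True it
# takes the (channels, frames) layout, so no NumPy conversion is needed
torchaudio.save("out_torchaudio.wav", waveform, sample_rate, channels_first=True)
```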