Update README with transformers usage + fix typos

#7
by ylacombe HF staff - opened
Files changed (1)
  1. README.md +55 -2
README.md CHANGED
@@ -111,7 +111,7 @@ library_name: seamless_communication
111
 
112
  # SeamlessM4T v2
113
 
114
- SeamlessM4T is our foundational all-in-one **M**assively **M**ultilingual and **M**ultimodal **M**achine **T**ranslation model delivering high-quality translation for speech and text in nearly 100 languages.
115
 
116
  SeamlessM4T models support the tasks of:
117
  - Speech-to-speech translation (S2ST)
@@ -125,12 +125,13 @@ SeamlessM4T models support:
125
  - 💬 96 Languages for text input/output.
126
  - 🔊 35 languages for speech output.
127
 
128
- 🌟 We are releasing SemalessM4T v2, an updated version with our novel *UnitY2* architecture.
129
  This new model improves over SeamlessM4T v1 in quality as well as inference speed in speech generation tasks.
130
 
131
  The v2 version of SeamlessM4T is a multitask adaptation of our novel *UnitY2* architecture.
132
  *Unity2* with its hierarchical character-to-unit upsampling and non-autoregressive text-to-unit decoding considerably improves over SeamlessM4T v1 in quality and inference speed.
133
 
134
 
135
  ![SeamlessM4T architectures](seamlessm4t_arch.svg)
136
 
@@ -153,6 +154,58 @@ To reproduce our results or to evaluate using the same metrics over your own tes
153
  ## Finetuning SeamlessM4T models
154
  Please check out the [Finetuning README here](https://github.com/facebookresearch/seamless_communication/tree/main/src/seamless_communication/cli/m4t/finetune).
155
 
156
  ## Supported Languages:
157
 
158
  Listed below, are the languages supported by SeamlessM4T-large (v1/v2).
 
111
 
112
  # SeamlessM4T v2
113
 
114
+ **SeamlessM4T** is our foundational all-in-one **M**assively **M**ultilingual and **M**ultimodal **M**achine **T**ranslation model delivering high-quality translation for speech and text in nearly 100 languages.
115
 
116
  SeamlessM4T models support the tasks of:
117
  - Speech-to-speech translation (S2ST)
 
125
  - 💬 96 Languages for text input/output.
126
  - 🔊 35 languages for speech output.
127
 
128
+ 🌟 We are releasing SeamlessM4T v2, an updated version with our novel *UnitY2* architecture.
129
  This new model improves over SeamlessM4T v1 in quality as well as inference speed in speech generation tasks.
130
 
131
  The v2 version of SeamlessM4T is a multitask adaptation of our novel *UnitY2* architecture.
132
  *Unity2* with its hierarchical character-to-unit upsampling and non-autoregressive text-to-unit decoding considerably improves over SeamlessM4T v1 in quality and inference speed.
133
 
134
+ **SeamlessM4T v2 is also supported by 🤗 Transformers; see [the dedicated section below](#transformers-usage) for details.**
135
 
136
  ![SeamlessM4T architectures](seamlessm4t_arch.svg)
137
 
 
154
  ## Finetuning SeamlessM4T models
155
  Please check out the [Finetuning README here](https://github.com/facebookresearch/seamless_communication/tree/main/src/seamless_communication/cli/m4t/finetune).
156
 
157
+ ## Transformers usage
158
+
159
+ SeamlessM4T is available in the 🤗 Transformers library, requiring minimal dependencies. Steps to get started:
160
+
161
+ 1. First install the 🤗 [Transformers library](https://github.com/huggingface/transformers) from main and [sentencepiece](https://github.com/google/sentencepiece):
162
+
163
+ ```
164
+ pip install git+https://github.com/huggingface/transformers.git sentencepiece
165
+ ```
166
+
167
+ 2. Run the following Python code to generate speech samples. Here the target language is Russian:
168
+
169
+ ```py
170
+ import torchaudio
171
+ from transformers import AutoProcessor, SeamlessM4Tv2Model
172
+ processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
173
+ model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")
174
+
175
+ # from text
176
+ text_inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")
177
+ audio_array_from_text = model.generate(**text_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
178
+
179
+ # from audio
180
+ audio, orig_freq = torchaudio.load("https://www2.cs.uic.edu/~i101/SoundFiles/preamble10.wav")
181
+ audio = torchaudio.functional.resample(audio, orig_freq=orig_freq, new_freq=16_000) # must be a 16 kHz waveform array
182
+ audio_inputs = processor(audios=audio, return_tensors="pt")
183
+ audio_array_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
184
+ ```
185
+
186
+ 3. Listen to the audio samples, either in a Jupyter notebook:
187
+
188
+ ```py
189
+ from IPython.display import Audio
190
+
191
+ sample_rate = model.sampling_rate
192
+ Audio(audio_array_from_text, rate=sample_rate)
193
+ # Audio(audio_array_from_audio, rate=sample_rate)
194
+ ```
195
+
196
+ Or save them as a `.wav` file using a third-party library, e.g. `scipy`:
197
+
198
+ ```py
199
+ import scipy
200
+
201
+ sample_rate = model.sampling_rate
202
+ scipy.io.wavfile.write("out_from_text.wav", rate=sample_rate, data=audio_array_from_text)
203
+ # scipy.io.wavfile.write("out_from_audio.wav", rate=sample_rate, data=audio_array_from_audio)
204
+ ```
205
+ For more details on running inference with SeamlessM4T in the 🤗 Transformers library, refer to the
206
+ **[SeamlessM4T v2 docs](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t_v2)** or to this **hands-on [Google Colab](https://colab.research.google.com/github/ylacombe/scripts_and_notebooks/blob/main/v2_seamless_m4t_hugging_face.ipynb).**
207
+
208
+
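Step 3 of the added section saves the generated audio with `scipy`. If pulling in a third-party dependency just for that is undesirable, the same result could probably be achieved with Python's standard `wave` module. The sketch below is only an illustration under assumptions: `write_wav_pcm16` is a hypothetical helper (it is not part of Transformers or scipy), it assumes a mono waveform of floats roughly in [-1.0, 1.0], and a synthetic sine tone at an assumed 16 kHz stands in for `audio_array_from_text` so that the snippet runs without downloading the model.

```python
import math
import struct
import wave


def write_wav_pcm16(path, samples, sample_rate):
    """Write mono float samples (expected in [-1.0, 1.0]) as 16-bit PCM WAV.

    Hypothetical helper for illustration; not an API of Transformers or scipy.
    """
    frames = bytearray()
    for sample in samples:
        # Clip out-of-range values so the conversion to int16 cannot overflow.
        clipped = max(-1.0, min(1.0, float(sample)))
        # Scale to the signed 16-bit range and pack as little-endian.
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


# Stand-in for audio_array_from_text: one second of a 440 Hz tone.
# The 16 kHz rate is an assumption made for this demo only.
demo_rate = 16_000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / demo_rate) for n in range(demo_rate)]
write_wav_pcm16("tone_demo.wav", tone, demo_rate)
```

With the model loaded as in step 2, the call would presumably look like `write_wav_pcm16("out_from_text.wav", audio_array_from_text, model.sampling_rate)`. The per-sample loop is slow for long clips, so the `scipy` route shown in the README remains the more practical choice when that dependency is acceptable.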
209
  ## Supported Languages:
210
 
211
  Listed below, are the languages supported by SeamlessM4T-large (v1/v2).