Update README with transformers usage + fix typos
#7 opened by ylacombe (HF staff)
README.md
CHANGED
@@ -111,7 +111,7 @@ library_name: seamless_communication
 
 # SeamlessM4T v2
 
-SeamlessM4T is our foundational all-in-one **M**assively **M**ultilingual and **M**ultimodal **M**achine **T**ranslation model delivering high-quality translation for speech and text in nearly 100 languages.
+**SeamlessM4T** is our foundational all-in-one **M**assively **M**ultilingual and **M**ultimodal **M**achine **T**ranslation model delivering high-quality translation for speech and text in nearly 100 languages.
 
 SeamlessM4T models support the tasks of:
 - Speech-to-speech translation (S2ST)
@@ -125,12 +125,13 @@ SeamlessM4T models support:
 - 💬 96 languages for text input/output.
 - 🔊 35 languages for speech output.
 
-🌟 We are releasing
+🌟 We are releasing SeamlessM4T v2, an updated version with our novel *UnitY2* architecture.
 This new model improves over SeamlessM4T v1 in quality as well as inference speed in speech generation tasks.
 
 The v2 version of SeamlessM4T is a multitask adaptation of our novel *UnitY2* architecture.
 *UnitY2*, with its hierarchical character-to-unit upsampling and non-autoregressive text-to-unit decoding, considerably improves over SeamlessM4T v1 in quality and inference speed.
 
+**SeamlessM4T v2 is also supported by 🤗 Transformers; more on it [in the dedicated section below](#transformers-usage).**
 
 ![SeamlessM4T architectures](seamlessm4t_arch.svg)
 
@@ -153,6 +154,59 @@ To reproduce our results or to evaluate using the same metrics over your own tes
 ## Finetuning SeamlessM4T models
 Please check out the [Finetuning README here](https://github.com/facebookresearch/seamless_communication/tree/main/src/seamless_communication/cli/m4t/finetune).
 
+## Transformers usage
+
+SeamlessM4T is available in the 🤗 Transformers library, requiring minimal dependencies. Steps to get started:
+
+1. First install the 🤗 [Transformers library](https://github.com/huggingface/transformers) from main and [sentencepiece](https://github.com/google/sentencepiece):
+
+```
+pip install git+https://github.com/huggingface/transformers.git sentencepiece
+```
+
+2. Run the following Python code to generate speech samples. Here the target language is Russian:
+
+```py
+import torchaudio
+from transformers import AutoProcessor, SeamlessM4Tv2Model
+
+processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
+model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")
+
+# from text
+text_inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")
+audio_array_from_text = model.generate(**text_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
+
+# from audio
+audio, orig_freq = torchaudio.load("https://www2.cs.uic.edu/~i101/SoundFiles/preamble10.wav")
+audio = torchaudio.functional.resample(audio, orig_freq=orig_freq, new_freq=16_000)  # must be a 16 kHz waveform array
+audio_inputs = processor(audios=audio, return_tensors="pt")
+audio_array_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
+```
+
+3. Listen to the audio samples, either in an ipynb notebook:
+
+```py
+from IPython.display import Audio
+
+sample_rate = model.config.sampling_rate
+Audio(audio_array_from_text, rate=sample_rate)
+# Audio(audio_array_from_audio, rate=sample_rate)
+```
+
+Or save them as a `.wav` file using a third-party library, e.g. `scipy`:
+
+```py
+import scipy
+
+sample_rate = model.config.sampling_rate
+scipy.io.wavfile.write("out_from_text.wav", rate=sample_rate, data=audio_array_from_text)
+# scipy.io.wavfile.write("out_from_audio.wav", rate=sample_rate, data=audio_array_from_audio)
+```
+
+For more details on using the SeamlessM4T model for inference using the 🤗 Transformers library, refer to the
+**[SeamlessM4T v2 docs](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t_v2)** or to this **hands-on [Google Colab](https://colab.research.google.com/github/ylacombe/scripts_and_notebooks/blob/main/v2_seamless_m4t_hugging_face.ipynb).**
+
+
 ## Supported Languages:
 
 Listed below are the languages supported by SeamlessM4T-large (v1/v2).
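One note on the resampling step in the PR's speech-generation snippet: it relies on torchaudio, which the install command (transformers + sentencepiece) does not pull in. If torchaudio is unavailable, the same 16 kHz requirement can be met with scipy. A sketch; `clip` and the 44.1 kHz source rate are invented here for illustration:

```python
import numpy as np
from scipy.signal import resample_poly

# stand-in for a loaded clip: 2 seconds of noise at 44.1 kHz
orig_freq, new_freq = 44_100, 16_000
clip = np.random.default_rng(0).standard_normal(2 * orig_freq).astype(np.float32)

# polyphase resampling by the rational factor new_freq/orig_freq
resampled = resample_poly(clip, new_freq, orig_freq)
print(len(resampled))  # 2 seconds at 16 kHz -> 32000 samples
```

The result can be passed to the processor in place of the torchaudio-resampled tensor.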
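The `.wav` saving step can also be sanity-checked without downloading the multi-gigabyte model by substituting a synthetic waveform for `audio_array_from_text`. A sketch; the tone parameters are made up:

```python
import numpy as np
import scipy.io.wavfile

# stand-in for audio_array_from_text: one second of a 440 Hz tone at 16 kHz
sample_rate = 16_000
t = np.linspace(0, 1, sample_rate, endpoint=False)
fake_audio = (0.1 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

scipy.io.wavfile.write("out_from_text.wav", rate=sample_rate, data=fake_audio)

# read it back: rate, length, and float32 samples survive the round trip
rate, data = scipy.io.wavfile.read("out_from_text.wav")
print(rate, data.shape)  # 16000 (16000,)
```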