---
tags:
- tensorflowtts
- audio
- text-to-speech
- mel-to-wav
language: en
license: apache-2.0
datasets:
- ljspeech
widget:
- text: "Hello, how are you doing?"
---

# Multi-band MelGAN trained on LJSpeech (En)
This repository provides a pretrained [Multi-band MelGAN](https://arxiv.org/abs/2005.05106) trained on the LJSpeech dataset (English). For details of the model, please refer to
[TensorFlowTTS](https://github.com/TensorSpeech/TensorFlowTTS).

## Install TensorFlowTTS
First, install TensorFlowTTS with the following command:
```bash
pip install TensorFlowTTS
```
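
The pip package is named `TensorFlowTTS`, but the usage example imports it as `tensorflow_tts`. A quick way to confirm the installation worked is to check that this module can be found. The helper below is a minimal sketch built on the standard library. `tensorflowtts_available` is a hypothetical name and is not part of TensorFlowTTS.

```python
import importlib.util


def tensorflowtts_available() -> bool:
    """Return True if the `tensorflow_tts` module is importable.

    Hypothetical helper: find_spec looks the module up without importing it,
    so it returns None (not an exception) when the package is missing.
    """
    return importlib.util.find_spec("tensorflow_tts") is not None


print("TensorFlowTTS importable:", tensorflowtts_available())
```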

### Converting your Text to Wav
```python
import soundfile as sf
import numpy as np

import tensorflow as tf

from tensorflow_tts.inference import AutoProcessor
from tensorflow_tts.inference import TFAutoModel

processor = AutoProcessor.from_pretrained("tensorspeech/tts-tacotron2-ljspeech-en")
tacotron2 = TFAutoModel.from_pretrained("tensorspeech/tts-tacotron2-ljspeech-en")
mb_melgan = TFAutoModel.from_pretrained("tensorspeech/tts-mb_melgan-ljspeech-en")

text = "This is a demo to show how to use our model to generate mel spectrogram from raw text."

# convert raw text to a sequence of symbol IDs
input_ids = processor.text_to_sequence(text)

# tacotron2 inference (text-to-mel); inputs are batched with a batch size of 1
decoder_output, mel_outputs, stop_token_prediction, alignment_history = tacotron2.inference(
    input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
    input_lengths=tf.convert_to_tensor([len(input_ids)], tf.int32),
    speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
)

# multi-band melgan inference (mel-to-wav); take the first batch item and single channel
audio = mb_melgan.inference(mel_outputs)[0, :, 0]

# save to file as 16-bit PCM at LJSpeech's 22050 Hz sampling rate
sf.write('./audio.wav', audio, 22050, "PCM_16")
```
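
The tensor shapes in the usage example above can be illustrated without loading any model. The sketch below uses NumPy zero arrays as stand-ins, so none of it is real model output.

- **Input:** the token IDs are batched into shape `[1, T]`.
- **Mel spectrogram:** Tacotron-2's mel output is laid out as `[batch, frames, num_mels]`.
- **Waveform:** the vocoder output is `[batch, samples, 1]`, and `[0, :, 0]` reduces it to a 1-D signal.

The values `NUM_MELS = 80` and `HOP_SIZE = 256` are assumptions based on the usual TensorFlowTTS LJSpeech preprocessing settings; this README does not state them. The helper names are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 22050  # same rate passed to sf.write in the usage example
NUM_MELS = 80        # assumption: number of mel channels in the LJSpeech config
HOP_SIZE = 256       # assumption: waveform samples generated per mel frame


def expected_num_samples(num_frames: int, hop_size: int = HOP_SIZE) -> int:
    """Waveform length a vocoder produces for `num_frames` mel frames (hypothetical helper)."""
    return num_frames * hop_size


def duration_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Playback duration of a mono waveform in seconds (hypothetical helper)."""
    return num_samples / sample_rate


# Hypothetical token IDs standing in for processor.text_to_sequence(text).
input_ids = [10, 42, 7, 3]
batched_ids = np.expand_dims(np.asarray(input_ids, dtype=np.int32), 0)  # shape [1, T]
input_lengths = np.asarray([len(input_ids)], dtype=np.int32)           # shape [1]

# Stand-in for Tacotron-2's mel_outputs: [batch, frames, num_mels].
num_frames = 200
mel_outputs = np.zeros((1, num_frames, NUM_MELS), dtype=np.float32)

# Stand-in for mb_melgan.inference(mel_outputs): [batch, samples, channels].
vocoder_output = np.zeros(
    (1, expected_num_samples(mel_outputs.shape[1]), 1), dtype=np.float32
)

# Same indexing as the usage example: first batch item, all samples, single channel.
audio = vocoder_output[0, :, 0]

print(batched_ids.shape, input_lengths.shape, audio.shape)
print(f"{duration_seconds(audio.shape[0]):.2f} s")
```

Under these assumptions, one second of audio corresponds to roughly 86 mel frames (22050 / 256 ≈ 86.1).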

#### Referencing Multi-band MelGAN
```
@misc{yang2020multiband,
  title={Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech},
  author={Geng Yang and Shan Yang and Kai Liu and Peng Fang and Wei Chen and Lei Xie},
  year={2020},
  eprint={2005.05106},
  archivePrefix={arXiv},
  primaryClass={cs.SD}
}
```

#### Referencing TensorFlowTTS
```
@misc{TFTTS,
  author = {Minh Nguyen and Alejandro Miguel Velasquez and Erogol and Kuan Chen and Dawid Kobus and Takuya Ebata and
    Trinh Le and Yunchao He},
  title = {TensorFlowTTS},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/TensorSpeech/TensorFlowTTS}},
}
```