---
library_name: tf-keras
---
## Model description
This repo contains the model and the notebook for [MelGAN-based spectrogram inversion using feature matching](https://keras.io/examples/audio/melgan_spectrogram_inversion/), which uses MelGAN to invert mel spectrograms back into audio.
Full credits go to [Darshan Deshpande](https://twitter.com/getdarshan).
Reproduced by [Vu Minh Chien](https://www.linkedin.com/in/vumichien/).
Motivation: Autoregressive vocoders have been ubiquitous for the majority of the history of speech processing, but for most of their existence they have lacked parallelism. MelGAN is a non-autoregressive, fully convolutional vocoder architecture used for purposes ranging from spectral inversion and speech enhancement to present-day state-of-the-art speech synthesis when used as a decoder with models like Tacotron2 or FastSpeech that convert text to mel spectrograms.
The LJSpeech dataset was used in this tutorial. It is primarily used for text-to-speech and consists of 13,100 discrete speech samples taken from 7 non-fiction books, with a total length of approximately 24 hours.
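As a rough illustration of the preprocessing this kind of pipeline relies on, the sketch below converts a raw LJSpeech waveform into a log-mel spectrogram with `tf.signal`. The frame, FFT, and mel settings here are assumptions for illustration and may differ from the exact values used in the notebook.

```python
import tensorflow as tf

# Assumed preprocessing parameters (illustrative; the notebook's exact values may differ).
SAMPLE_RATE = 22050
FRAME_LENGTH = 1024
FRAME_STEP = 256
FFT_LENGTH = 1024
NUM_MEL_BINS = 80


def audio_to_mel(audio):
    """Turn a mono float32 waveform into a log-scaled mel spectrogram."""
    stft = tf.signal.stft(
        audio, frame_length=FRAME_LENGTH, frame_step=FRAME_STEP, fft_length=FFT_LENGTH
    )
    magnitude = tf.abs(stft)
    mel_filterbank = tf.signal.linear_to_mel_weight_matrix(
        num_mel_bins=NUM_MEL_BINS,
        num_spectrogram_bins=FFT_LENGTH // 2 + 1,
        sample_rate=SAMPLE_RATE,
    )
    mel = tf.matmul(tf.square(magnitude), mel_filterbank)
    return tf.math.log(mel + 1e-6)  # small epsilon avoids log(0)


# Example: one second of dummy audio -> (frames, NUM_MEL_BINS) mel spectrogram
mel = audio_to_mel(tf.random.normal([SAMPLE_RATE]))
```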
## Intended uses & limitations
The MelGAN implemented in this tutorial is similar to the original implementation; the only difference is the padding method used for the convolutions, where 'same' padding is used instead of reflection padding.
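To make the padding difference concrete, here is a minimal sketch of a dilated residual stack built from `Conv1D` layers with `padding="same"` (the original MelGAN applies reflection padding before 'valid' convolutions instead). The layer sizes and dilation rates are illustrative assumptions, not the exact values from the notebook.

```python
import tensorflow as tf
from tensorflow.keras import layers


def residual_stack(x, filters):
    """Illustrative residual stack built with 'same'-padded convolutions.

    Assumes `x` already has `filters` channels so the residual additions line up.
    """
    for dilation_rate in (1, 3, 9):  # illustrative dilation rates
        shortcut = x
        x = layers.LeakyReLU(alpha=0.2)(x)
        x = layers.Conv1D(
            filters, kernel_size=3, dilation_rate=dilation_rate, padding="same"
        )(x)
        x = layers.LeakyReLU(alpha=0.2)(x)
        x = layers.Conv1D(filters, kernel_size=1, padding="same")(x)
        x = layers.Add()([shortcut, x])
    return x


# Example usage on a dummy feature map of shape (batch, time, channels)
inputs = tf.random.normal((1, 100, 64))
outputs = residual_stack(inputs, filters=64)
```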
### Training hyperparameters
The following hyperparameters were used during training (a minimal optimizer setup is sketched after the list):
- generator_learning_rate: 1e-5
- discriminator_learning_rate: 1e-6
- train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- num_epochs: 20
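A minimal sketch of how the optimizer settings above might be instantiated in tf-keras is shown below; the variable names are illustrative, and the notebook may wire the optimizers into the training step differently.

```python
import tensorflow as tf

# Optimizers matching the hyperparameters listed above (variable names are illustrative).
generator_optimizer = tf.keras.optimizers.Adam(
    learning_rate=1e-5, beta_1=0.9, beta_2=0.999, epsilon=1e-8
)
discriminator_optimizer = tf.keras.optimizers.Adam(
    learning_rate=1e-6, beta_1=0.9, beta_2=0.999, epsilon=1e-8
)
```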
## Model Demo
![Model Demo](./demo.png)

## Model Plot
<details>
<summary>View Model Plot</summary>

![Model Image](./model.png)
</details>