
MelGAN (Mel-spectrogram Generative Adversarial Network)

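MelGAN is a Generative Adversarial Network (GAN) for speech synthesis. More precisely, it is a neural vocoder: a model that converts mel spectrograms into audio waveforms. It was introduced by Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, and colleagues, including Yoshua Bengio and Aaron Courville, in the 2019 paper “MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis,” presented at NeurIPS 2019.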

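Many speech synthesis systems work in two stages: an acoustic model first predicts a mel spectrogram, and a vocoder then turns that spectrogram into audio. MelGAN fills the vocoder role. Its input is a mel spectrogram, a compact time-frequency representation that records how the energy of a speech signal is distributed across mel-scaled frequency bands in each short frame. Its output is the raw waveform itself, sample by sample. Because a mel spectrogram discards phase information, the generator has to infer a plausible waveform whose spectrogram closely matches the input. Unlike autoregressive vocoders such as WaveNet, which produce one sample at a time, MelGAN is non-autoregressive and fully convolutional, so it generates all samples in parallel and runs much faster.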
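To make the input side concrete, here is a minimal pure-Python sketch of how a log-mel spectrogram can be computed from a waveform. It is an illustration, not MelGAN's own preprocessing: it assumes the HTK mel formula, a periodic Hann window, and a naive DFT for readability, and the defaults n_fft=1024, hop=256, and n_mels=80 are assumed typical vocoder settings rather than values stated on this page. Real pipelines (for example librosa or torchaudio) use an FFT and may differ in mel formula, normalization, and log base.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    # HTK-style mel scale (an assumption; other mel formulas exist).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    # Triangular filters whose edges are equally spaced on the mel scale,
    # covering 0 Hz up to the Nyquist frequency (sample_rate / 2).
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        left, center, right = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            if left <= f <= center:
                row.append((f - left) / (center - left))    # rising slope
            elif center < f <= right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def power_spectrum(frame: List[float], n_fft: int) -> List[float]:
    # Periodic Hann window followed by a naive DFT (O(n_fft^2), for clarity only).
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft))
                for n, x in enumerate(frame)]
    power = []
    for k in range(n_fft // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * n / n_fft) for n, x in enumerate(windowed))
        im = -sum(x * math.sin(2.0 * math.pi * k * n / n_fft) for n, x in enumerate(windowed))
        power.append(re * re + im * im)
    return power


def log_mel_spectrogram(signal: List[float], sample_rate: int, n_fft: int = 1024,
                        hop: int = 256, n_mels: int = 80,
                        eps: float = 1e-5) -> List[List[float]]:
    # Slide an n_fft-sample window forward by `hop` samples; each frame becomes
    # n_mels log energies (clamped at eps to avoid log(0)).
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft], n_fft)
        frames.append([math.log(max(eps, sum(w * p for w, p in zip(row, power))))
                       for row in bank])
    return frames


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(800)]  # 0.1 s at 1 kHz
    mel = log_mel_spectrogram(tone, sr, n_fft=64, hop=16, n_mels=8)
    print(len(mel), len(mel[0]))  # prints: 47 8
```

Each mel frame summarizes `hop` waveform samples, so a vocoder has to expand every frame back into that many samples. With a hop of 256, this is why the MelGAN paper's generator upsamples its input by an overall factor of 256 using transposed convolutions.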

MelGAN is a conditional GAN: it is trained to generate waveforms conditioned on an input mel spectrogram. Training is a two-player adversarial game in which a generator network learns to synthesize high-quality waveforms from spectrograms, while a discriminator network learns to distinguish those generated waveforms from real recordings in the training data.
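The adversarial game can be written as a pair of losses. The MelGAN paper uses the hinge-loss form of the GAN objective and adds a feature-matching term for the generator; the sketch below shows these losses on plain Python lists of discriminator scores and feature values. It is a simplified illustration under stated assumptions: it models a single discriminator (MelGAN actually uses several discriminators operating on the audio at different resolutions, with their losses summed), and the function names and the default feature-matching weight of 10 are assumptions chosen for illustration.

```python
from typing import List, Sequence


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def discriminator_hinge_loss(real_scores: Sequence[float],
                             fake_scores: Sequence[float]) -> float:
    # The discriminator is pushed to score real audio at >= +1 and
    # generated audio at <= -1; scores beyond those margins cost nothing.
    real_term = _mean([max(0.0, 1.0 - d) for d in real_scores])
    fake_term = _mean([max(0.0, 1.0 + d) for d in fake_scores])
    return real_term + fake_term


def generator_adversarial_loss(fake_scores: Sequence[float]) -> float:
    # The generator tries to raise the discriminator's score on its outputs.
    return -_mean(fake_scores)


def feature_matching_loss(real_features: List[List[float]],
                          fake_features: List[List[float]]) -> float:
    # L1 distance between the discriminator's intermediate features for real
    # and generated audio, averaged within each layer and summed over layers.
    total = 0.0
    for real_layer, fake_layer in zip(real_features, fake_features):
        total += _mean([abs(r - f) for r, f in zip(real_layer, fake_layer)])
    return total


def generator_total_loss(fake_scores: Sequence[float],
                         real_features: List[List[float]],
                         fake_features: List[List[float]],
                         fm_weight: float = 10.0) -> float:
    # fm_weight = 10.0 is an assumed default for this sketch.
    return (generator_adversarial_loss(fake_scores)
            + fm_weight * feature_matching_loss(real_features, fake_features))


if __name__ == "__main__":
    d_loss = discriminator_hinge_loss([1.5, 0.2], [-2.0, 0.5])
    g_loss = generator_total_loss([-2.0, 0.5], [[1.0, 2.0]], [[0.5, 2.5]])
    print(round(d_loss, 4), g_loss)  # prints: 1.15 5.75
```

In a real training loop these losses would be computed on tensors and minimized in alternation: one update for the discriminator, then one for the generator.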



by Devansh Shukla

"AI Tamil Nadu, formerly known as AI Coimbatore, is a close-knit community initiative by Navaneeth with a goal to offer world-class AI education to anyone in Tamil Nadu for free."