iSincNet (Lightweight SincNet Spectrogram Vocoder)

[Blog] [Original SincNet Paper (M. Ravanelli, Y. Bengio)]

iSincNet is a fast and lightweight SincNet spectrogram vocoder: a neural network trained to reconstruct audio waveforms from their SincNet spectrogram (a real, signed 2D representation). We used the GTZAN dataset, which is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). The files were collected in 2000-2001 from a variety of sources, including personal CDs, radio, and microphone recordings, in order to represent a variety of recording conditions (http://marsyas.info/downloads/datasets.html).

Datasets used during development:

Example Spectrogram

The first 5 seconds of the audio file audio/invertibility/15033000.mp3

[Figure grid: spectrograms from the non-causal encoder and the causal encoder (columns), shown as signed values and as absolute values (rows).]

Effect of applying the SincNet envelope

As discussed in Section 2.1, SincNet can be recast as a standard wavelet transform with an envelope defined by a sinc function that depends explicitly on the bandwidth B: envelope(x, B) = sinc(B x / 2). As a consequence, the original cosine and sine components of the filter are modulated (see the example below, which shows causal filters).

[Figure grid: causal filter kernels at index=10 and index=104 (columns), without and with the sinc envelope (rows).]

At lower frequencies (low indices), the effect of the sinc envelope is negligible. At higher frequencies, it forces the filter to be more localised.
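
The modulation described above can be illustrated with a minimal NumPy sketch. It is not the repository's implementation: the function names, the kernel size, the zero-centred (non-causal) time axis, and the choice of the unnormalised convention sinc(u) = sin(u) / u with B in rad/s are all assumptions made for illustration.

```python
import numpy as np


def sinc_envelope(x, bandwidth):
    """envelope(x, B) = sinc(B * x / 2), assuming sinc(u) = sin(u) / u."""
    u = bandwidth * np.asarray(x, dtype=float) / 2.0
    # np.sinc is the normalised sinc sin(pi t) / (pi t); rescale to get sin(u) / u.
    return np.sinc(u / np.pi)


def complex_kernel(center_freq_hz, bandwidth, kernel_size=257,
                   sample_rate=16_000, use_envelope=True):
    """Hypothetical complex filter: cos + i*sin carrier, optionally sinc-modulated."""
    # Time axis in seconds, centred on zero (non-causal, for simplicity).
    t = (np.arange(kernel_size) - kernel_size // 2) / sample_rate
    kernel = np.exp(2j * np.pi * center_freq_hz * t)
    if use_envelope:
        kernel = kernel * sinc_envelope(t, bandwidth)
    return kernel


# A narrow band barely changes the carrier; a wide band localises it in time.
narrow = complex_kernel(200.0, bandwidth=2 * np.pi * 50.0)
wide = complex_kernel(4000.0, bandwidth=2 * np.pi * 2000.0)
energy_narrow = float(np.sum(np.abs(narrow) ** 2))
energy_wide = float(np.sum(np.abs(wide) ** 2))
```

Under these assumptions, the wide-band kernel keeps most of its energy in a few samples around the centre, which matches the stronger localisation reported for high-index filters.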

🎧 Pretrained Models

The following table summarizes the key characteristics and access points for the available pretrained models. All models are open-source and stored in the pretrained/ folder.

Sample Rate	FPS	#Bins	Weights	Corpus	Causal Encoder	Scale	Sinc Envelope	Open-Source
16000	128	128	📦	GTZAN	✗	Linear	✗	√
16000	128	128	📦	GTZAN	√	Linear	✗	√
16000	128	128	📦	GTZAN	√	Mel	✗	√
16000	128	256	📦	GTZAN	✗	Mel	✗	√
16000	128	512	📦	GTZAN	✗	Mel	✗	√
16000	128	128	📦	GTZAN	✗	Mel	✗	√
16000	128	128	📦	GTZAN	✗	Mel	√	√
44100	350	128	📦	GTZAN	✗	Linear	✗	√
44100	350	128	📦	GTZAN	✗	Mel	✗	√
44100	350	256	📦	GTZAN	✗	Mel	✗	√
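
To relate the Sample Rate, FPS and #Bins columns, the following sketch computes the hop size and the spectrogram size for a clip of a given duration. It assumes that FPS means spectrogram frames per second, so that the hop is sample_rate / fps samples, and that the spectrogram has one row per bin. The helper name `frame_layout` is hypothetical and not part of the repository.

```python
def frame_layout(sample_rate, fps, n_bins, duration_s):
    """Return (hop in samples, (n_bins, n_frames)) for a clip of duration_s seconds."""
    # Assumption: one spectrogram frame every sample_rate / fps samples.
    hop = sample_rate / fps
    n_frames = int(round(duration_s * fps))
    return hop, (n_bins, n_frames)


# Two configurations taken from the table above, for a 5-second clip.
hop_16k, shape_16k = frame_layout(16_000, 128, 128, duration_s=5)
hop_44k, shape_44k = frame_layout(44_100, 350, 256, duration_s=5)
print(hop_16k, shape_16k)
print(hop_44k, shape_44k)
```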

Quick Start

pip install -r requirements.txt

Please refer to the demo notebook (demo.ipynb), which shows how to load and use the model.

import numpy as np
import librosa
import torch
from sincnet.model import SincNet
from datasets.utils.waveform import WaveformLoader 


SAMPLE_RATE = 16_000
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
audio_loader = WaveformLoader(sample_rate=SAMPLE_RATE) 

# load the model
params = {
    "fs": SAMPLE_RATE,
    "fps": 128,
    "scale": "mel",
    "component": "complex",
    "causal": True,
    "q_bits": 8 
}

model : SincNet = (
    SincNet(**params)
    .load_pretrained_weights(weights_folder="pretrained", verbose=False)
    .eval()
    .to(device)
)

# encode and decode an audio waveform
duration = 5  # seconds
offset = 0    # seconds
audio_path = ... 
waveform = audio_loader.load_segment(audio_path, offset=offset, duration=duration, nchannels=1)
# measure the loudness, then normalise the waveform to -23 LUFS
loudness = audio_loader.measure_loudness(waveform)
waveform = audio_loader.normalise_loudness(waveform, loudness, target_lufs=-23)

with torch.no_grad():
    audio_tensor = torch.from_numpy(waveform).to(device).float()
    # add a batch dimension, then encode to a quantized spectrogram
    spectrogram = model.encode(audio_tensor.unsqueeze(0), quantize=True)
    # invert the spectrogram back to a waveform
    reconstructed_audio_tensor = model.decode(spectrogram, dequantize=True)
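
To get a rough sense of how close a reconstruction is to its input, one option is a signal-to-noise ratio between the two waveforms. The sketch below is a generic metric, not the evaluation used by this repository. The helper name `snr_db` is hypothetical, and it assumes both signals are 1D NumPy arrays that are already time-aligned.

```python
import numpy as np


def snr_db(reference, estimate, eps=1e-12):
    """Signal-to-noise ratio in dB between a reference and an estimated waveform."""
    reference = np.asarray(reference, dtype=np.float64).ravel()
    estimate = np.asarray(estimate, dtype=np.float64).ravel()
    # Trim to the shorter length in case the decoder pads or crops the signal.
    n = min(reference.size, estimate.size)
    reference, estimate = reference[:n], estimate[:n]
    signal_power = np.sum(reference ** 2)
    noise_power = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10((signal_power + eps) / (noise_power + eps))


# Synthetic check: a 10% amplitude error on a sine wave.
t = np.arange(16_000) / 16_000
clean = np.sin(2 * np.pi * 440.0 * t)
snr = snr_db(clean, 1.1 * clean)
```

With the Quick Start variables, a call such as `snr_db(waveform, reconstructed_audio_tensor.squeeze().cpu().numpy())` would be the natural usage, assuming the decoder returns a single-channel waveform with a leading batch dimension.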

Reference Papers and Related Topics

[1] Mirco Ravanelli, Yoshua Bengio, “Speaker Recognition from raw waveform with SincNet” Arxiv
[2] MS-SincResNet: Joint Learning of 1D and 2D Kernels Using Multi-scale SincNet and ResNet for Music Genre Classification Arxiv
[3] Curricular SincNet: Towards Robust Deep Speaker Recognition by Emphasizing Hard Samples in Latent Space Arxiv
[4] Interpretable SincNet-based Deep Learning for Emotion Recognition from EEG brain activity Arxiv
[5] Toward end-to-end interpretable convolutional neural networks for waveform signals Arxiv
[6] Filterbank design for end-to-end speech separation Arxiv. This paper decomposes SincNet into a product sin * cos, as implemented in this repo, bridging the gap with Gabor filterbanks.
[7] PF-Net: Personalized Filter for Speaker Recognition from Raw Waveform Arxiv. This paper proposes to extend SincNet for more flexibility by allowing alternative shapes to the rectangle function in the spectral domain.
[8] MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis Arxiv
[9] iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform Arxiv
[10] iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN Arxiv
[11] Deep Griffin-Lim Iteration Arxiv
[12] Mel-Spectrogram Inversion via Alternating Direction Method of Multipliers Arxiv
[13] HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis Arxiv

Related discussion about SincNet vs STFT mravanelli/SincNet#74

Usages and Implementations around SincNet

Roadmap and projects status

Host weights on GitHub and add auto-download
Benchmark of inversion vs Griffin-Lim, iSTFTNet
