Bitwav Encoder

A high-performance, GPU-optimized audio encoder package for Bitwav TTS. It converts raw audio waveforms into discrete tokens (via FSQ) and global style embeddings (derived from SSL features).

🚀 Features

- Multiple Encoder Modes:
  - StandaloneEncoder: Base implementation for straightforward inference.
  - OptimizedStandaloneEncoder: Uses torch.compile and mixed precision (FP16/BF16) for higher throughput.
  - PipelinedEncoder: Overlaps WavLM feature extraction with encoder processing for batch inference.
  - LongAudioEncoder: Handles long audio (over 100 s) by splitting it into overlapping chunks.
- WavLM Integration: Built-in support for WavLM Base Plus as the SSL feature extractor.
- Automatic Weights Management: Uses local weights when present, otherwise downloads them from Hugging Face (humair025/enc).
- Precision Verified: Converting FP32 to FP16 keeps a 98.35% token match and a style similarity of 0.99999.

📦 Installation

Option 1: Direct Pip Install (Recommended)

You can install the package directly from GitHub:

pip install git+https://github.com/humair-m/enc.git

Option 2: Clone and Install

git clone https://github.com/humair-m/enc.git
cd enc
pip install -e .

🛠 Usage

1. Audio Inputs

The encode method is flexible and accepts multiple input formats:

- File Path: A string path to a .wav file.
- Torch Tensor: A tensor of shape (channels, samples) or (samples,).
- Numpy Array: A numpy array containing the audio waveform.

2. Basic Example

from enc import create_standalone_encoder
import torchaudio

# 1. Initialize (Downloads weights automatically if needed)
encoder = create_standalone_encoder(device="cuda")

# 2. Encode from File Path
tokens, style = encoder.encode("path/to/audio.wav")

# 3. Encode from Tensor (e.g., already loaded)
waveform, sr = torchaudio.load("audio.wav")
tokens, style = encoder.encode(waveform, sr=sr)

High-Performance (Optimized)

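from enc import create_optimized_encoder
import torchaudio

encoder = create_optimized_encoder(
    device="cuda",
    compile_mode="max-autotune"
)

# Load the audio to encode; sr is the sample rate read from the file
waveform, sr = torchaudio.load("audio.wav")

# The first call triggers torch.compile warmup, so it is slower than later calls
tokens, style_emb = encoder.encode(waveform, sr=sr)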
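
Because the first call pays the compilation cost, timing it together with later calls gives a misleading picture of throughput. The helper below is a minimal sketch of warmup-aware timing; `time_after_warmup` is a hypothetical name and is not part of the enc package.

```python
import time


def time_after_warmup(fn, warmup=1, runs=5):
    """Call fn() `warmup` times untimed, then return `runs` per-call timings in seconds."""
    for _ in range(warmup):
        fn()  # untimed: absorbs one-off costs such as compilation

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Stand-in workload; replace with a call such as encoder.encode(...)
    print(time_after_warmup(lambda: sum(range(100_000)), warmup=1, runs=3))
```

When timing GPU work, the callable should also wait for the device to finish (for example by calling torch.cuda.synchronize() at the end), since CUDA kernels are launched asynchronously.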

Batch Processing (Pipelined)

For processing multiple files efficiently, use the PipelinedEncoder. It overlaps WavLM feature extraction with encoder processing using CUDA streams.

from enc import create_pipelined_encoder

encoder = create_pipelined_encoder(device="cuda")

# Supports list of file paths or tensors
files = ["test1.wav", "test2.wav", "test3.wav"]
results = encoder.encode_batch_pipelined(files, sr=24000, batch_size=4)

for tokens, style_emb in results:
    print(tokens.shape)
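
The overlap idea can be illustrated without a GPU. The sketch below is a CPU analogy only, not the package's implementation: it uses a thread and a bounded queue in place of CUDA streams, so the first stage works on the next item while the second stage handles the current one. `run_pipelined`, `stage_one`, and `stage_two` are hypothetical names.

```python
import queue
import threading


def run_pipelined(items, stage_one, stage_two, max_in_flight=2):
    """Run two stages concurrently over items and return results in input order."""
    handoff = queue.Queue(maxsize=max_in_flight)  # bounds memory held between stages
    done = object()  # sentinel marking the end of the stream
    errors = []

    def producer():
        try:
            for item in items:
                handoff.put(stage_one(item))
        except Exception as exc:  # surfaced to the caller after the loop below
            errors.append(exc)
        finally:
            handoff.put(done)

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()

    results = []
    while True:
        features = handoff.get()
        if features is done:
            break
        results.append(stage_two(features))

    worker.join()
    if errors:
        raise errors[0]
    return results


if __name__ == "__main__":
    # Stand-ins for "extract features" and "encode"
    print(run_pipelined(["a", "b", "c"], str.upper, lambda s: s * 2))
```

With Python threads, real overlap only happens when a stage releases the GIL, such as during file I/O. The bounded queue keeps the first stage from running arbitrarily far ahead of the second.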

Long Audio Processing

from enc import create_long_audio_encoder

encoder = create_long_audio_encoder(device="cuda")
# Automatically handles chunking and overlap for long files
tokens, style_emb = encoder.encode("extremely_long_audio.wav")
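
How the package chooses chunk boundaries is not described here. As a rough illustration of chunking with overlap in general, the sketch below computes sample spans for fixed-size chunks that share a fixed number of samples with their neighbour. `chunk_spans` and the chunk and overlap lengths are assumptions for illustration, not values taken from the package.

```python
def chunk_spans(num_samples, chunk_size, overlap):
    """Return (start, end) sample spans covering [0, num_samples) with overlapping chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    hop = chunk_size - overlap  # distance between consecutive chunk starts
    spans = []
    start = 0
    while start < num_samples:
        end = min(start + chunk_size, num_samples)
        spans.append((start, end))
        if end == num_samples:
            break  # the last chunk reached the end of the audio
        start += hop
    return spans


if __name__ == "__main__":
    sr = 24000  # sample rate used in the examples above
    # Illustrative values: 120 s of audio, 30 s chunks, 2 s overlap
    for start, end in chunk_spans(120 * sr, 30 * sr, 2 * sr):
        print(f"{start / sr:.0f}s to {end / sr:.0f}s")
```

The final chunk may be shorter than the others. How tokens from the overlapping regions are merged is a separate design choice that this sketch does not cover.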

📊 Precision & Performance

Detailed precision analysis is available in REPORT.md.

Metric            Baseline (FP32)  Optimized (FP16)
Model Size        ~380 MB          ~190 MB
Style Similarity  1.0              0.99999
Token Match       100%             98.35%
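
This README does not state how the two metrics are computed; REPORT.md is the authority. As a sketch under the assumption that Token Match is the fraction of positions where the FP32 and FP16 token sequences agree, and that Style Similarity is the cosine similarity of the two style embeddings, they could be computed as follows. Both function names are hypothetical.

```python
import math


def token_match_rate(reference, candidate):
    """Fraction of positions where two equal-length token sequences agree."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must be non-empty")
    matches = sum(r == c for r, c in zip(reference, candidate))
    return matches / len(reference)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy data standing in for FP32 and FP16 outputs
    print(token_match_rate([5, 9, 2, 7], [5, 9, 3, 7]))
    print(cosine_similarity([0.2, 0.4, 0.1], [0.2, 0.4, 0.1]))
```

With real outputs, the token tensors and style embeddings would first be flattened to plain lists, for example with `.flatten().tolist()`.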

🔗 Related Repositories

Model Weights: humair025/enc (Hugging Face)
Main Project: humair-m/bitwav (GitHub)

License

MIT
