7 changes: 6 additions & 1 deletion .gitignore
Expand Up @@ -24,4 +24,9 @@ __pycache__/

# Generated audio test outputs
test_output*.wav
test_cli_output*.wav

# Generated data and outputs
embeddings/
output/
processed/
349 changes: 349 additions & 0 deletions FINDINGS.md

Large diffs are not rendered by default.

75 changes: 63 additions & 12 deletions README.md
Expand Up @@ -4,37 +4,88 @@ This repository provides a library for defining differentially private speaker a

[Click here for full documentation](https://jnear.w3.uvm.edu/dpvc/)

## Current work — controllable DP voice conversion

Active branch: **`feat/cremad-experiments`**. We've extended the library with a **controllable** VAE that exposes 9 style knobs (anger, confused, disgust, enunciated, fear, happy, neutral, sad, whisper) on top of the DP anonymization pipeline. Primary entry points:

- **[`examples/README.md`](examples/README.md)** — end-to-end reproduction guide (extraction → training → controllable inference → evaluation).
- **[`FINDINGS.md`](FINDINGS.md)** — 9 key findings with methodology and per-row takeaways.
- **[`WORKLOG.md`](WORKLOG.md)** — roadmap and progress tracking.
- **[`results/`](results/)** — raw evaluation CSVs (emotion2vec Recall/emo_sim, WER, predicted MOS) backing the findings.

OpenVoice is the **canonical controllable pipeline**. ControlVC remains in the
repository as a useful DP baseline and wrapper reference, but not as the
recommended path for style control.

## Installation

Clone this repository, then install with the extras you need. The active
OpenVoice path is covered by package extras; the ControlVC baseline has a
separate setup guide because it depends on an external repo plus Python 3.10 /
fairseq compatibility work.

```bash
# Core library only
pip install -e .

# + OpenVoice backend (required for the controllable pipeline)
pip install -e ".[openvoice]"

# + Expresso dataset extraction
pip install -e ".[openvoice,expresso]"

# + Evaluation pipeline (emotion2vec, Whisper WER, predicted MOS)
pip install -e ".[openvoice,expresso,eval]"
```

Tested Pass 1 environment in `.venv`:

- `torch==2.9.1`
- `torchaudio==2.9.1`
- `numpy==2.3.5`
- `librosa==0.9.1`
- `soundfile==0.13.1`
- `datasets==4.8.4`
- `pandas==3.0.2`
- `funasr==1.3.1`
- `openai-whisper==20250625`
- `jiwer==4.0.0`

For ControlVC-specific setup, use [`docs/controlvc_setup.md`](docs/controlvc_setup.md).

## Example: basic DP anonymization (OpenVoice)

```python
import dpvc
vc_wrapper = dpvc.OpenVoiceWrapper()
anonymizer = dpvc.Anonymizer(vc_wrapper)
anonymizer.anonymize(src_path, output_path, noise_level=1.0)
```

`src_path` is an input .wav, `output_path` is the anonymized output, and `noise_level` controls the magnitude of DP noise added to the speaker embedding.
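The DP step itself is internal to `Anonymizer`, but the idea behind `noise_level` can be sketched with a plain Gaussian mechanism on a toy embedding. This is a pure-Python illustration only; the library's actual mechanism, sensitivity analysis, and noise calibration may differ:

```python
import random

def add_gaussian_noise(embedding, noise_level, seed=None):
    """Perturb each embedding dimension with Gaussian noise.

    A larger noise_level gives stronger anonymization at the cost of
    voice quality -- the same trade-off `anonymize` exposes. (Toy
    sketch; not the library's implementation.)
    """
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, noise_level) for x in embedding]

clean = [0.12, -0.40, 0.85, 0.03]  # toy speaker embedding
noisy = add_gaussian_noise(clean, noise_level=1.0, seed=0)
print(noisy)  # perturbed embedding, same dimensionality
```

With `noise_level=0.0` the embedding passes through unchanged, which is a quick way to sanity-check the rest of the pipeline before turning privacy on.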

See also:

- `examples/openvoice_inference.py` — basic anonymization (no style control).
- `examples/openvoice_train_vae.py` — train a custom DP-VAE for the anonymizer.
- `examples/openvoice_infer_controllable.py` — **controllable** style-aware inference (the current headline flow; see [`examples/README.md`](examples/README.md) for the full pipeline).
- `docs/controlvc_setup.md` — ControlVC baseline setup and smoke-test path.

## Evaluation

The evaluation scripts under `examples/` measure the three axes the EmoVoice paper uses:

- `examples/eval_emotion.py` — emotion2vec_plus_large Recall Rate + emo_sim (target alignment)
- `examples/eval_wer.py` — OpenAI Whisper drift-from-baseline Word Error Rate (content preservation)
- `examples/eval_mos.py` — torchaudio SQUIM_SUBJECTIVE predicted MOS (naturalness)
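The `emo_sim` score compares emotion embeddings of the converted audio against the target style. Conceptually this is an embedding similarity; the sketch below assumes cosine similarity (the exact definition lives in `eval_emotion.py`):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (range [-1, 1])."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Identical emotion embeddings score 1.0; orthogonal ones score 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```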

CSV outputs from our runs live in [`results/`](results/). Schemas and reproduction steps are in [`results/README.md`](results/README.md).
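`eval_wer.py` relies on `openai-whisper` plus `jiwer` for the metric itself. As a self-contained illustration of what word error rate measures, here is a minimal word-level edit-distance version (a hypothetical helper, not the repo's implementation):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("cat" -> "bat") plus one deletion ("down") over 4 words.
print(word_error_rate("the cat sat down", "the bat sat"))  # 0.5
```

In the evaluation, the reference transcript comes from Whisper on the baseline audio, so the reported WER measures drift introduced by conversion rather than absolute transcription accuracy.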

## Building Documentation

The documentation is built with [MkDocs](https://www.mkdocs.org/):

```bash
pip install mkdocs "mkdocstrings[python]" mkdocs-material
mkdocs build
```