Merged
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -54,4 +54,4 @@ jobs:
- name: Test with unittest
working-directory: ./tests
run: |
uv run python -m unittest discover -s . -p 'test_*.py'
uv run --extra full python -m unittest discover -s . -p 'test_*.py'
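
With the heavy packages moved into an extra, tests that exercise noise suppression only pass when `torch` and `df` are importable, which is presumably why CI now runs with `--extra full`. For local runs against the base install, one option is to skip such tests rather than let them fail. The sketch below is an assumption, not part of this change: `has_module` and `NoiseSuppressionImportTest` are hypothetical names.

```python
import importlib.util
import unittest


def has_module(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


# torch and df (DeepFilterNet) are the two imports guarded in preprocess.py.
HAS_FULL_EXTRA = has_module("torch") and has_module("df")


class NoiseSuppressionImportTest(unittest.TestCase):  # hypothetical test case
    @unittest.skipUnless(HAS_FULL_EXTRA, "requires: pip install openlrc[full]")
    def test_heavy_imports_resolve(self):
        import torch  # noqa: F401
        from df.enhance import enhance, init_df

        self.assertTrue(callable(enhance))
        self.assertTrue(callable(init_df))
```

Under this sketch the test is reported as skipped on a base install and runs normally when the extra is present, so the same `unittest discover` command works in both environments.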
31 changes: 12 additions & 19 deletions README.md
@@ -77,10 +77,10 @@ into `.lrc` subtitles with LLMs such as

## Installation ⚙️

1. Install CUDA 11.x and [cuDNN 8 for CUDA 11](https://developer.nvidia.com/cudnn) first according
1. Install [CUDA](https://developer.nvidia.com/cuda-toolkit) and [cuDNN](https://developer.nvidia.com/cudnn) according
to https://opennmt.net/CTranslate2/installation.html to enable `faster-whisper`.

`faster-whisper` also needs [cuBLAS for CUDA 11](https://developer.nvidia.com/cublas) installed.
`faster-whisper` also needs [cuBLAS](https://developer.nvidia.com/cublas) installed.
<details>
<summary>For Windows Users (click to expand)</summary>

@@ -103,7 +103,7 @@ into `.lrc` subtitles with LLMs such as
3. Install [ffmpeg](https://ffmpeg.org/download.html) and add `bin` directory
to your `PATH`.

4. This project can be installed from PyPI:
4. Install from PyPI:

```shell
pip install openlrc
@@ -115,20 +115,12 @@ into `.lrc` subtitles with LLMs such as
pip install git+https://github.com/zh-plus/openlrc
```

5. Install the latest [faster-whisper](https://github.com/guillaumekln/faster-whisper) from source:
```shell
pip install "faster-whisper @ https://github.com/SYSTRAN/faster-whisper/archive/8327d8cc647266ed66f6cd878cf97eccface7351.tar.gz"
```
5. **(Optional)** If you need noise suppression (`noise_suppress=True`), install the `full` extra,
which includes torch and DeepFilterNet:

6. Install [PyTorch](https://pytorch.org/get-started/locally/):
```shell
pip install --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
```

7. Fix the `typing-extensions` issue:
```shell
pip install typing-extensions -U
```
```shell
pip install openlrc[full]
```

## Lightweight Imports

@@ -156,8 +148,9 @@ Heavy dependencies are loaded only when the corresponding features are first use
- `lingua` is loaded when language detection helpers are used.

> [!NOTE]
> Lightweight imports improve import-time behavior only. They do not change installation requirements:
> `pip install openlrc` still installs the full dependency set declared by the package.
> The base `pip install openlrc` does **not** include torch or DeepFilterNet.
> These are only installed with `pip install openlrc[full]` and are only needed
> for noise suppression (`noise_suppress=True`).

## Usage 🐍

@@ -213,7 +206,7 @@ if __name__ == '__main__':
lrcer = LRCer(transcription=TranscriptionConfig(vad_options=vad_options))
lrcer.run('./data/test.mp3', target_lang='zh-cn')

# Enhance the audio using noise suppression (consume more time).
# Enhance the audio using noise suppression (requires openlrc[full], consumes more time).
lrcer.run('./data/test.mp3', target_lang='zh-cn', noise_suppress=True)

# Change the translation model
10 changes: 8 additions & 2 deletions openlrc/preprocess.py
@@ -61,8 +61,14 @@ def noise_suppression(self, audio_paths: list[Path], atten_lim_db: int = 15):
if not audio_paths:
return []

import torch
from df.enhance import enhance, init_df, load_audio, save_audio
try:
import torch
from df.enhance import enhance, init_df, load_audio, save_audio
except ImportError:
raise ImportError(
"Noise suppression requires torch and deepfilternet. "
"Install them with: pip install openlrc[full]"
)

if "atten_lim_db" in self.options:
atten_lim_db = self.options["atten_lim_db"]
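
The guard in this hunk turns a bare `ModuleNotFoundError` into a message that names the extra to install. A minimal, generalized sketch of the same pattern follows. The helper name `require_optional` and its parameters are hypothetical, and the `from exc` chaining is an addition that the hunk itself does not include; it keeps the original import failure visible in the traceback.

```python
import importlib


def require_optional(module_name: str, extra: str, package: str = "openlrc"):
    """Import an optional module, or raise ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Chain the original error so the root cause stays in the traceback.
        raise ImportError(
            f"{module_name!r} is required for this feature. "
            f"Install it with: pip install {package}[{extra}]"
        ) from exc
```

A call such as `require_optional("torch", "full")` would return the module on a full install and raise the hinted error on a base install.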
13 changes: 7 additions & 6 deletions openlrc/utils.py
@@ -12,7 +12,6 @@

if TYPE_CHECKING:
from spacy.language import Language as SpacyLanguage
import torch

from openlrc.defaults import supported_languages_lingua
from openlrc.logger import logger
@@ -102,12 +101,14 @@ def get_audio_duration(path: str | Path) -> float:
return audio.duration


def release_memory(model: torch.nn.Module) -> None:
import torch
def release_memory(model: Any) -> None:
try:
import torch
except ImportError:
return

# gc.collect()
torch.cuda.empty_cache()
del model
if isinstance(model, torch.nn.Module):
torch.cuda.empty_cache()


def normalize(text):
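
The reworked `release_memory` degrades to a no-op when torch is absent and only touches the CUDA cache for torch modules. The sketch below restates that behaviour in a form that can be exercised without torch installed. The name `release_gpu_cache`, the `module_name` parameter, and the boolean return value are assumptions added for illustration and are not part of the diff.

```python
import importlib
from typing import Any


def release_gpu_cache(model: Any, module_name: str = "torch") -> bool:
    """Empty the CUDA cache if torch is importable and model is a torch module.

    Returns True only when the cache was actually emptied.
    """
    try:
        torch = importlib.import_module(module_name)
    except ImportError:
        # Base install without the full extra: nothing to release.
        return False

    if isinstance(model, torch.nn.Module):
        torch.cuda.empty_cache()
        return True
    return False
```

Dropping `del model` loses nothing here: inside a function it only unbinds the local name, so the caller's reference to the model was never affected by it.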
12 changes: 9 additions & 3 deletions pyproject.toml
@@ -36,15 +36,21 @@ dependencies = [
"pysbd>=0.3.4,<0.4",
"faster-whisper>=1.1.1,<2",
"ffmpeg-normalize>=1.27.5,<2",
"deepfilternet>=0.5.6,<0.6",
"google-genai==1.11.0",
"json_repair==0.25.2",
"onnxruntime>=1.20.0,<1.24; python_version < '3.11'",
"onnxruntime>=1.20.0,<2; python_version >= '3.11'",
"pip>=25.1",
]

[project.optional-dependencies]
# Noise suppression via DeepFilterNet (requires torch).
# Install with: pip install openlrc[full]
# Only needed when using noise_suppress=True in LRCer.run() or Preprocessor.run().
full = [
"torch>=2.6.0",
"torchvision>=0.21.0",
"torchaudio>=2.0.0",
"pip>=25.1",
"deepfilternet>=0.5.6,<0.6",
]

[project.urls]
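
Requirements declared under `[project.optional-dependencies]` end up in the installed package metadata with an `extra == "..."` marker, so they can be listed at runtime. The sketch below reads them with the standard-library `importlib.metadata`. The helper name `extra_requirements` is hypothetical, and matching the marker by substring is a simplification rather than a full parse of the requirement string.

```python
from importlib import metadata
from typing import List, Optional


def extra_requirements(dist_name: str, extra: str) -> Optional[List[str]]:
    """List requirement strings gated behind the given extra.

    Returns None when the distribution is not installed.
    """
    try:
        requirements = metadata.requires(dist_name)
    except metadata.PackageNotFoundError:
        return None

    # Build backends may quote the extra name with either quote style.
    markers = (f'extra == "{extra}"', f"extra == '{extra}'")
    return [r for r in (requirements or []) if any(m in r for m in markers)]
```

With this change installed, `extra_requirements("openlrc", "full")` would be expected to list the torch and DeepFilterNet requirements, while the base dependency list no longer carries `deepfilternet`.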