
Extract config dataclasses from openlrc.py and defer heavy imports #89

@MaleicAcid

Description

Problem

`TranscriptionConfig` and `TranslationConfig` are plain `@dataclass` classes that only depend on stdlib (`dataclasses`, `pathlib.Path`). But they live in `openlrc/openlrc.py`, so importing them triggers the entire module-level import chain:

```python
# openlrc/openlrc.py — all executed on `from openlrc import TranscriptionConfig`
from faster_whisper.transcribe import Segment       # -> ctranslate2, onnxruntime
from openlrc.preprocess import Preprocessor         # -> torch, deepfilternet
from openlrc.transcribe import Transcriber          # -> faster_whisper, pysbd
from openlrc.translate import LLMTranslator         # -> openai, anthropic, google-genai
from openlrc.utils import Timer, ...                # -> spacy, tiktoken, lingua, torch
```

Downstream projects that embed these dataclasses (e.g. via composition for their own config types) are forced to install and load ~3GB+ of dependencies just to use two stdlib-only classes. This also prevents packaging tools (Nuitka, PyInstaller) from producing lean binaries.

Proposed Changes

1. Move config dataclasses to a lightweight module

Move `TranscriptionConfig` and `TranslationConfig` to `openlrc/config.py` (or the existing `openlrc/models.py`, which already only depends on stdlib). Re-export from `__init__.py`:

```python
# openlrc/__init__.py
from openlrc.config import TranscriptionConfig, TranslationConfig  # stdlib only
from openlrc.openlrc import LRCer                                  # heavy
```

`from openlrc import TranscriptionConfig` becomes instant, and the change is non-breaking: all existing import paths still work.
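If the eager `LRCer` re-export is still too heavy, `__init__.py` could go one step further with a module-level `__getattr__` (PEP 562), resolving `LRCer` only on first access. A minimal self-contained sketch of the mechanism, using a synthetic module and `json` as a stand-in for the heavy import chain (none of this is openlrc's actual code):

```python
import sys
import types

# Synthetic package module; "fake_openlrc" is purely illustrative.
mod = types.ModuleType("fake_openlrc")

def _module_getattr(name):
    # PEP 562: called only when normal attribute lookup fails.
    if name == "LRCer":
        import json  # stands in for the heavy openlrc.openlrc chain
        mod.LRCer = json.JSONEncoder  # cache so __getattr__ fires only once
        return mod.LRCer
    raise AttributeError(f"module 'fake_openlrc' has no attribute {name!r}")

mod.__getattr__ = _module_getattr
sys.modules["fake_openlrc"] = mod

import fake_openlrc

# Importing the package alone did not resolve LRCer; first access does.
heavy = fake_openlrc.LRCer
print(heavy)
```

The same shape applies to the real `openlrc/__init__.py`: keep the stdlib-only config import eager and route `LRCer` through `__getattr__`.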

2. Defer heavy imports in openlrc.py to method level

Even `from openlrc import LRCer` (without calling any method) currently loads torch, spacy, faster-whisper, etc. These can be deferred to the methods that use them:

| Import | Used in | Pulls in |
| --- | --- | --- |
| `openlrc.preprocess.Preprocessor` | `pre_process()` | torch, deepfilternet |
| `openlrc.transcribe.Transcriber` | `transcriber` property (already lazy-init) | faster_whisper, ctranslate2 |
| `openlrc.translate.LLMTranslator` | `_translate()` | openai, anthropic, google-genai |
| `faster_whisper.transcribe.Segment` | `to_json()` type hint | faster_whisper |
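For the `Segment` row, the usual pattern is a `TYPE_CHECKING` guard plus postponed annotations, so faster_whisper never loads at runtime while type checkers still see the real type. A sketch (the `to_json` signature and body here are illustrative, not openlrc's actual method):

```python
from __future__ import annotations  # annotations stay strings at runtime

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers; never executed at runtime
    from faster_whisper.transcribe import Segment

def to_json(segments: list[Segment]) -> dict:
    # Hypothetical body: the annotation above costs nothing at runtime
    return {"segments": [vars(s) for s in segments]}

print(to_json([]))
```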

Lightweight internal imports (`openlrc.context`, `openlrc.defaults`, `openlrc.logger`, `openlrc.models`, `openlrc.opt`, `openlrc.subtitle`) can stay at module level.
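The deferral pattern itself can be shown self-contained; here `decimal` stands in for a heavy dependency such as torch, and the class shape is illustrative, not openlrc's actual code:

```python
import sys

class LRCer:
    """Illustrative shape only; the method name follows the table above."""

    def pre_process(self, value):
        # Deferred import: the heavy module loads on the first call,
        # not when `from openlrc import LRCer` executes.
        from decimal import Decimal
        return Decimal(value)

# Class definition and instantiation never touch the heavy module.
lrcer = LRCer()
result = lrcer.pre_process("1.5")
print(result, "decimal" in sys.modules)
```

Because Python caches modules in `sys.modules`, repeated calls pay only a dictionary lookup after the first import.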

Test compatibility: Tests in `test_preprocess.py` patch module-level names like `@patch("openlrc.preprocess.enhance")`. After deferral these would need to target the source module (e.g. `@patch("df.enhance.enhance")`). I'm aware this was the concern in #87; happy to include the test updates in the same PR.
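The patch-target change can be demonstrated with stdlib only; `json.dumps` stands in for `df.enhance.enhance`, and `run_enhance` is a hypothetical stand-in for the deferred preprocessing code:

```python
from unittest.mock import patch

def run_enhance(payload):
    # After deferral, `enhance` is no longer a module-level name in
    # openlrc.preprocess, so tests patch the source module instead.
    from json import dumps  # deferred; re-resolved on every call
    return dumps(payload)

# Patching at the source works because the deferred `from ... import`
# looks the attribute up again at call time.
with patch("json.dumps", return_value="patched"):
    print(run_enhance({}))  # the mock is seen
print(run_enhance({}))      # real json.dumps is restored
```

This is the standard "where to patch" rule from `unittest.mock`: patch the name where it is looked up, which after deferral is the defining module.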

Benefits

  • `from openlrc import TranscriptionConfig` loads stdlib only
  • `from openlrc import LRCer` no longer triggers torch/spacy/whisper at import time
  • No API changes, no new dependencies
  • Complements #88 (Decouple torch/DeepFilterNet from core dependencies): lazy imports make the optional extras actually effective at runtime
