Timing layer for AI music videos.
lyricli turns audio plus lyrics into structured timing artifacts for words, beats, sections, subtitles, and edit-aware cut points. It is built for Remotion pipelines, agent-driven video tooling, and any workflow that needs a stable JSON contract instead of ad-hoc alignment scripts.
Today lyricli is shipped as a source-first GitHub repo.
```bash
git clone https://github.com/Hybirdss/lyricli.git
cd lyricli
bun install

# 1. Inspect the machine-facing surface
bun run lyricli -- commands --json
bun run lyricli -- doctor --json
bun run lyricli -- schema align --json
bun run lyricli -- schema transcribe --json

# 2. If the lyrics are trusted, align them
bun run lyricli -- align song.wav \
  --lyrics lyrics.txt \
  --output analysis.json \
  --language ja \
  --dry-run \
  --json

# 3. If lyrics are empty or untrusted, transcribe instead
bun run lyricli -- transcribe song.wav \
  --lyrics empty-lyrics.txt \
  --output analysis.json \
  --language ja \
  --dry-run \
  --json

# 4. Turn the same artifact into editing outputs
bun run lyricli -- cut-points analysis.json --json
bun run lyricli -- score analysis.json --shots shots.json --json
bun run lyricli -- export remotion analysis.json --shots shots.json --output remotion.json --json
bun run lyricli -- export srt analysis.json --output lyrics.srt --json

# 5. Use the compatibility helper only when you want auto mode selection
bun run lyricli -- analyze song.wav \
  --lyrics lyrics.txt \
  --output analysis.json \
  --mode auto \
  --language ja \
  --dry-run \
  --json

# 6. Run the higher-level helper workflow
bun run lyricli -- +mv-timing song.wav \
  --lyrics lyrics.txt \
  --outdir artifacts/run-1 \
  --mode auto \
  --language ja \
  --dry-run \
  --json
```

- CLI-first: the CLI is the source of truth, not an afterthought to a GUI or a notebook.
- AI-friendly: every command returns a stable JSON envelope with explicit exit codes.
- Artifact-first: `analysis.json` is the canonical contract; exports and scoring read from that same file.
- Edit-aware: this is for timing control, not music generation or video generation.
- `align`: lyrics-aware forced alignment for trusted non-empty lyrics
- `transcribe`: transcription-first timing for empty or untrusted lyrics
- `analyze`: compatibility command that auto-selects `align` or `transcribe`
- `cut-points`: edit candidates derived from the analysis graph
- `score`: alignment score for storyboard shots against the timing graph
- `export remotion`: Remotion-ready timing payload
- `export srt`: subtitle file from the same analysis artifact
- `bench`: compare multiple engine outputs against the same downstream criteria
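For orientation, SRT files use `HH:MM:SS,mmm` timestamps, so a subtitle consumer can round-trip millisecond timings with a few lines of arithmetic. A minimal sketch of that formatting (ours, not lyricli's implementation):

```python
def srt_timestamp(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp, e.g. 12345 -> 00:00:12,345."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"
```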
```bash
lyricli commands [--category <name>] [--json]
lyricli doctor [--json]
lyricli schema <command> [--json]
lyricli transcribe <audio-file> --lyrics <file> --output <file> [--engine <name>] [--language <code>] [--json]
lyricli align <audio-file> --lyrics <file> --output <file> [--engine <name>] [--language <code>] [--json]
lyricli analyze <audio-file> --lyrics <file> --output <file> [--mode <auto|align|transcribe>] [--engine <name>] [--language <code>] [--json]
lyricli +mv-timing <audio-file> --lyrics <file> --outdir <dir> [--mode <auto|align|transcribe>] [--shots <file>] [--engine <name>] [--language <code>] [--json]
lyricli cut-points <analysis.json> [--json]
lyricli score <analysis.json> --shots <shots.json> [--json]
lyricli bench <analysis-a.json> <analysis-b.json> [...analysis-n.json] [--shots <file>] [--json]
lyricli export remotion <analysis.json> [--shots <file>] [--output <file>] [--json]
lyricli export srt <analysis.json> [--output <file>] [--json]
```

Every command returns a JSON envelope:

```json
{
  "ok": true,
  "command": "align",
  "data": {},
  "meta": {
    "dryRun": false,
    "durationMs": 12,
    "timestamp": "2026-03-27T10:00:00.000Z"
  }
}
```

The main artifact is the analysis JSON produced by `lyricli align`, `lyricli transcribe`, or the compatibility command `lyricli analyze`. Downstream commands should treat that file as the contract boundary.
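A minimal consumer of this envelope might look like the following sketch (the `read_envelope` helper is ours, not part of lyricli):

```python
import json

def read_envelope(raw: str) -> dict:
    """Parse a lyricli JSON envelope and fail loudly when ok is false."""
    envelope = json.loads(raw)
    if not envelope.get("ok"):
        raise RuntimeError(f"lyricli {envelope.get('command')} failed")
    return envelope["data"]

# Sample envelope matching the documented shape.
sample = (
    '{"ok": true, "command": "align", "data": {}, '
    '"meta": {"dryRun": false, "durationMs": 12, '
    '"timestamp": "2026-03-27T10:00:00.000Z"}}'
)
data = read_envelope(sample)
```

Because the envelope is stable across commands, the same helper works for `align`, `transcribe`, exports, and benchmark runs alike.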
Key artifact fields:

- `mode` / `requestedMode`
- `language`
- `engine` / `requestedEngine`
- `engineCapabilities`
- `warnings` / `diagnostics`
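A downstream tool might sanity-check those fields before trusting an artifact. A minimal sketch; the field names come from the list above, but the authoritative schema should be taken from `lyricli schema align --json`:

```python
# Top-level fields a consumer might require, taken from the list above.
REQUIRED_FIELDS = [
    "mode", "requestedMode", "language",
    "engine", "requestedEngine", "engineCapabilities",
    "warnings", "diagnostics",
]

def missing_fields(artifact: dict) -> list[str]:
    """Return the required top-level fields absent from an analysis artifact."""
    return [field for field in REQUIRED_FIELDS if field not in artifact]

# Hypothetical artifact fragment for illustration only.
artifact = {
    "mode": "align", "requestedMode": "auto", "language": "ja",
    "engine": "example-engine", "requestedEngine": "auto",
    "engineCapabilities": {}, "warnings": [], "diagnostics": [],
}
```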
lyricli follows the same direction used by modern AI-friendly CLIs such as gws: keep the CLI authoritative and keep skills thin.
- `skills/lyricli-shared` — shared CLI rules and output contract
- `skills/lyricli-align` — trusted lyrics to forced-aligned timing JSON
- `skills/lyricli-transcribe` — empty/untrusted lyrics to transcription-first timing JSON
- `skills/lyricli-analyze` — compatibility auto-routing skill
- `skills/lyricli-score` — storyboard timing review
- `skills/lyricli-bench` — engine comparison workflow
- `skills/lyricli-export-remotion` — Remotion export workflow
- `skills/lyricli-export-srt` — subtitle export workflow
- `skills/lyricli-mv-timing` — end-to-end timing helper workflow
See docs/skills.md.
Benchmark assets live in /data/lyricli-bench, not inside the repo. The public benchmark path is open-dataset first.
Current benchmark policy:
- primary baseline: `open-dali`
- first multilingual companion: `open-jamendolyrics-multilang`
- research-only multilingual track: `open-mavl`
```bash
bun run bench:init-data
bun run bench:materialize-open
bun run bench:materialize-open-subsets

# runnable multilingual smoke suite from the public Jamendo archive
bun run bench:import-jamendo
bun run bench:materialize-jamendo-subsets
bun run bench:import-jamendo-assets -- --subset smoke-multi
bun run bench:run-open -- --suite open-jamendolyrics-multilang:smoke-multi --dry-run

# optional: import real DALI metadata when available
bun run bench:import-dali -- --info /data/lyricli-bench/incoming/dali/info/DALI_DATA_INFO.gz
bun run bench:materialize-dali-subsets

# run the default English regression subset when DALI metadata is available
bun run bench:run-open -- --suite open-dali:smoke-en --dry-run

bun run bench:status
```

`bench:run-open` writes per-case `benchmark-result.json` files, a subset-level `report.json`, and a colocated `doctor.json` runtime snapshot. `bench:status` surfaces subset report status and doctor readiness when those reports exist.
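That on-disk layout lends itself to a simple status sweep. A minimal sketch, assuming each subset directory holds a `report.json` with a top-level `ok` flag (an assumption; check the real reports emitted by `bench:run-open` for their actual shape):

```python
import json
from pathlib import Path

def summarize_reports(results_dir: Path) -> dict[str, bool]:
    """Map each subset-level report.json path to whether its run is marked ok."""
    summary: dict[str, bool] = {}
    for report_path in sorted(results_dir.rglob("report.json")):
        report = json.loads(report_path.read_text())
        summary[str(report_path)] = bool(report.get("ok"))
    return summary
```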
Portable expectation:
- run `bun run lyricli -- doctor --json` first if you need to know whether full inference can run on the current machine
- `bench:run-open --dry-run` is the control-plane check that should work on a clean machine
- full benchmark execution still depends on Python backend modules such as `demucs` and `qwen_asr`
See docs/benchmarks.md.
v0.1.0 is the first public release.
Stable enough today:
- canonical timing schema
- explicit `align`/`transcribe` analysis split
- backend capability matrix in artifacts
- cut-point generation
- shot alignment scoring
- Remotion export
- SRT export
- JSON introspection via `commands` and `schema`
- benchmark metadata pipeline
- open benchmark runner with subset reports
- Jamendo multilingual smoke-suite asset importer
- installable skill pack
Still moving:
- Python engine cleanup and deduplication
- multilingual singing alignment quality
- Python backend packaging and dependency setup
- optional MCP compatibility layer
Before publishing changes, run:
```bash
bun run test
bun run release:check-public
```