
feat: add local BPM detection and chord recognition via ONNX Runtime Web #983

Open
ChuxiJ wants to merge 3 commits into main from feat/issue-978-bpm-chord-detection

Conversation


ChuxiJ commented Mar 27, 2026

Summary

  • Adds browser-local audio analysis (BPM/beat detection + chord recognition) via ONNX Runtime Web in a Web Worker
  • Beat This! (small, 8MB ONNX) for SOTA beat tracking — models lazy-loaded and cached in IndexedDB
  • Consonance-ACE (ISMIR 2025) for 170-class chord recognition with decomposed root/bass/note heads
  • AudioAnalysisPanel now has Local/Server toggle, progress bar, and chord timeline display
  • Pure TS mel spectrogram utility (FFT + mel filterbank) — no native dependencies
  • Zustand analysis store for job lifecycle tracking
  • Exposed window.__analyzeClipLocally() and window.__analysisStore for agent access
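The pure TS mel filterbank mentioned above rests on the standard Hz↔mel conversion. A minimal sketch, assuming the common HTK-style formula (the PR's actual utility may use the Slaney variant instead):

```typescript
// Hz <-> mel conversion (HTK formula) -- the basis for placing
// triangular mel filters between fMin and fMax.
export function hzToMel(hz: number): number {
  return 2595 * Math.log10(1 + hz / 700);
}

export function melToHz(mel: number): number {
  return 700 * (Math.pow(10, mel / 2595) - 1);
}

// Evenly spaced mel-scale center frequencies for an nMels-band filterbank.
// nMels filters need nMels + 2 edge points; the centers are the interior points.
export function melCenterFrequencies(nMels: number, fMin: number, fMax: number): number[] {
  const melMin = hzToMel(fMin);
  const melMax = hzToMel(fMax);
  const centers: number[] = [];
  for (let i = 1; i <= nMels; i++) {
    centers.push(melToHz(melMin + ((melMax - melMin) * i) / (nMels + 1)));
  }
  return centers;
}
```

Each triangular filter then spans from one center to the next, which is why the centers must be strictly increasing in Hz.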

New Files

  • src/types/analysis.ts — BeatEvent, ChordEvent, worker message types
  • src/utils/melSpectrogram.ts — pure TS FFT + mel filterbank + spectrogram
  • src/services/modelManager.ts — lazy ONNX model download + IndexedDB cache
  • src/services/localAnalysisService.ts — orchestrates worker lifecycle
  • src/store/analysisStore.ts — Zustand store for analysis job tracking
  • src/workers/analysisWorker.ts — Web Worker with ONNX inference pipeline

Test plan

  • npx tsc --noEmit — 0 type errors
  • npm test — 2745 tests pass (289 files), including 31 new tests
  • npm run build — succeeds
  • Dev server loads without console errors
  • window.__analyzeClipLocally and window.__analysisStore exposed correctly
  • End-to-end: load audio clip → open analysis panel → click "Analyze (Local)" → verify BPM + chords (requires ONNX model files in public/models/)

Next steps (separate PRs)

  • Export ONNX model files (Beat This! small + consonance-ACE) and host them
  • INT8 quantization for faster WASM inference
  • WebGPU acceleration for supported browsers
  • Chord visualization on the timeline

Closes #978

🤖 Generated with Claude Code

Implements browser-local audio analysis using ONNX models in a Web Worker:
- Beat This! (small) for SOTA BPM/beat detection
- Consonance-ACE (ISMIR 2025) for 170-class chord recognition

New files:
- src/types/analysis.ts — BeatEvent, ChordEvent, worker message types
- src/utils/melSpectrogram.ts — pure TS FFT + mel filterbank
- src/services/modelManager.ts — lazy ONNX model download + IndexedDB cache
- src/services/localAnalysisService.ts — orchestrates worker lifecycle
- src/store/analysisStore.ts — Zustand store for analysis job tracking
- src/workers/analysisWorker.ts — Web Worker with ONNX inference pipeline

Modified:
- AudioAnalysisPanel.tsx — local/server mode toggle, progress bar, chord display
- InferredMetas — extended with beats, chords, analysisSource fields
- vite.config.ts — onnxruntime-web WASM config
- main.tsx — expose analyzeClipLocally + analysisStore on window

Closes #978

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 27, 2026 05:57

Copilot AI left a comment


Pull request overview

Adds a browser-local audio analysis pipeline (BPM/beat detection + chord recognition) powered by ONNX Runtime Web running inside a Web Worker, plus UI and state management to trigger and display results.

Changes:

  • Introduces a new Web Worker inference pipeline and supporting analysis types/state (Zustand).
  • Adds a pure TypeScript mel-spectrogram/FFT utility with unit tests.
  • Updates AudioAnalysisPanel with Local/Server mode toggle, progress UI, and chord timeline display; adds onnxruntime-web dependency and Vite worker config.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 9 comments.

Summary per file:

  • vite.config.ts — Configures Vite worker output and excludes onnxruntime-web from optimizeDeps.
  • src/workers/analysisWorker.ts — Implements worker-side feature extraction + ONNX inference + result posting.
  • src/utils/melSpectrogram.ts — Adds FFT/STFT/mel spectrogram + downsampling utilities.
  • src/utils/tests/melSpectrogram.test.ts — Unit tests for FFT/window/filterbank/spectrogram/downsampling.
  • src/types/project.ts — Extends inferredMetas to store beats/chords and analysis source.
  • src/types/analysis.ts — Adds analysis domain types and worker message contracts.
  • src/store/analysisStore.ts — Zustand store for local analysis job lifecycle/progress/results.
  • src/store/tests/analysisStore.test.ts — Unit tests for analysis job tracking store.
  • src/services/modelManager.ts — Adds model download + IndexedDB caching utility (currently not wired into worker).
  • src/services/tests/modelManager.test.ts — Unit tests for model download/caching logic.
  • src/services/localAnalysisService.ts — Orchestrates audio decode/downsample + worker messaging + writes inferredMetas.
  • src/main.tsx — Exposes analysis hooks on window for agent access.
  • src/components/generation/AudioAnalysisPanel.tsx — Adds Local/Server toggle, local progress display, and chord timeline UI.
  • package.json — Adds onnxruntime-web dependency.
  • package-lock.json — Locks new dependencies pulled in by onnxruntime-web.



// Cache in IndexedDB
await set(meta.cacheKey, buffer);


Copilot AI Mar 27, 2026


loadModelBytes() reports streaming progress using contentLength, but when Content-Length is missing you fall back to meta.sizeBytes (an estimate). In that case the last progress update may never reach 100%, and there is no final onProgress call after the download completes/caches. Consider sending a final progress event after the loop with bytesTotal: bytesLoaded and percent: 100 (or recompute based on actual bytes).

Suggested change:

  // Ensure a final 100% progress update based on actual bytes downloaded
  onProgress?.({
    modelName: meta.name,
    bytesLoaded,
    bytesTotal: bytesLoaded,
    percent: 100,
  });

Comment on lines +34 to +42
async function loadOnnxSession(modelUrl: string) {
  const ort = await getOrt();
  const response = await fetch(modelUrl);
  if (!response.ok) throw new Error(`Failed to fetch model: ${response.status}`);
  const buffer = await response.arrayBuffer();
  return ort.InferenceSession.create(buffer, {
    executionProviders: ['wasm'],
  });
}

Copilot AI Mar 27, 2026


The PR description/UI text says ONNX models are cached in IndexedDB, but the worker currently always fetch()es modelUrl directly and creates the session from the network response. This bypasses the new modelManager/IndexedDB cache and also prevents reporting download progress. Consider wiring the worker to use cached bytes (e.g., fetch+cache via IndexedDB inside the worker, or prefetch via modelManager in the main thread and postMessage the ArrayBuffer).
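One way to wire the cache in is a cache-first loader. This is a sketch against a minimal async key-value interface standing in for idb-keyval's get/set (the function name `loadModelBytesCached` and the injected `fetchBytes` parameter are hypothetical, chosen here for testability):

```typescript
// Minimal async KV interface standing in for idb-keyval's get/set.
interface AsyncKV {
  get(key: string): Promise<ArrayBuffer | undefined>;
  set(key: string, value: ArrayBuffer): Promise<void>;
}

// Cache-first model loader: return cached bytes if present; otherwise
// fetch the model, write it to the cache, and return the bytes.
export async function loadModelBytesCached(
  cacheKey: string,
  url: string,
  cache: AsyncKV,
  fetchBytes: (url: string) => Promise<ArrayBuffer>,
): Promise<ArrayBuffer> {
  const cached = await cache.get(cacheKey);
  if (cached) return cached;
  const bytes = await fetchBytes(url);
  await cache.set(cacheKey, bytes);
  return bytes;
}
```

The worker could call this directly (idb-keyval works inside workers), or the main thread could prefetch via modelManager and transfer the ArrayBuffer with postMessage.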

Comment on lines +117 to +127
  // 5. Write results to clip.inferredMetas
  useProjectStore.getState().updateClip(clipId, {
    inferredMetas: {
      ...clip.inferredMetas,
      bpm: result.bpm,
      keyScale: result.keyScale ?? undefined,
      timeSignature: result.timeSignature ?? undefined,
      beats: result.beats,
      chords: result.chords,
      analysisSource: 'local',
    },

Copilot AI Mar 27, 2026


updateClip() merges inferredMetas using the clip object captured before async work. If the clip’s inferredMetas are updated elsewhere while analysis is running, this spread can accidentally clobber newer fields. Prefer reading the latest clip state at update time (or using a store updater that merges against current state) before spreading inferredMetas.
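A pure merge helper illustrates the fix: layer only the analysis fields onto whatever the metas are at write time. Field names follow the snippet above; the helper itself and the simplified InferredMetas shape are hypothetical:

```typescript
interface InferredMetas {
  bpm?: number;
  keyScale?: string;
  timeSignature?: string;
  beats?: unknown[];
  chords?: unknown[];
  analysisSource?: string;
  [key: string]: unknown;
}

// Merge analysis results onto the *current* metas so fields written by
// other code while analysis was running are preserved, not clobbered.
export function mergeAnalysisMetas(
  current: InferredMetas | undefined,
  analysis: Pick<InferredMetas, 'bpm' | 'keyScale' | 'timeSignature' | 'beats' | 'chords'>,
): InferredMetas {
  return { ...current, ...analysis, analysisSource: 'local' };
}
```

In the service this would run inside a store updater, so `current` is read at update time rather than captured before the await.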

Comment on lines +376 to +380
  {mode === 'local' && !analyzing && !localResult && hasAudio && (
    <p className="text-[10px] text-zinc-500">
      Local analysis uses Beat This! for BPM detection and Consonance-ACE for chord recognition.
      Models are downloaded on first use (~23MB total) and cached locally.
    </p>

Copilot AI Mar 27, 2026


The helper text claims “Models are downloaded on first use (~23MB total) and cached locally.” but the current worker implementation fetches model files directly (no IndexedDB cache integration), so this statement may be inaccurate. Either implement the cache path or soften the wording to match actual behavior (e.g., “downloaded on first use; browser HTTP cache may apply”).

Comment on lines +80 to +95
const result = await new Promise<LocalAnalysisResult>((resolve, reject) => {
  const handleMessage = (e: MessageEvent<AnalysisWorkerMessage>) => {
    const msg = e.data;
    if (msg.type === 'progress') {
      useAnalysisStore.getState().updateJobProgress(jobId, msg);
    } else if (msg.type === 'result') {
      worker.removeEventListener('message', handleMessage);
      worker.removeEventListener('error', handleError);
      useAnalysisStore.getState().completeJob(jobId, msg.result);
      resolve(msg.result);
    } else if (msg.type === 'error') {
      worker.removeEventListener('message', handleMessage);
      worker.removeEventListener('error', handleError);
      reject(new Error(msg.error));
    }
  };

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

analyzeClipLocally() attaches per-call message listeners to a singleton worker, but worker messages don’t include a job/request identifier. If two analyses run concurrently (possible via window.__analyzeClipLocally or future UI), progress/result messages from one run will be handled by all active listeners, corrupting job state and potentially resolving/rejecting the wrong promise. Add a correlation ID (e.g., jobId) in AnalysisWorkerRequest and include it in every worker response so the service can ignore unrelated messages (or enforce single-flight by queueing/canceling).
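A sketch of the correlation-ID pattern. The message shapes are simplified from the PR's types, and `jobId` on the worker messages is the proposed addition, not existing code:

```typescript
type WorkerMsg =
  | { type: 'progress'; jobId: string; percent: number }
  | { type: 'result'; jobId: string; result: unknown }
  | { type: 'error'; jobId: string; error: string };

// Returns a message handler that reacts only to messages carrying its own
// jobId, so concurrent analyses on the shared worker do not cross wires.
export function makeJobHandler(
  jobId: string,
  onDone: (msg: WorkerMsg) => void,
): (msg: WorkerMsg) => void {
  return (msg) => {
    if (msg.jobId !== jobId) return; // ignore unrelated jobs
    if (msg.type === 'result' || msg.type === 'error') onDone(msg);
  };
}
```

Each call to analyzeClipLocally() would then attach its own handler and safely remove it once its job's terminal message arrives.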

nMels: 128,
fMin: 30,
fMax: 11000,
});

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

computeMelSpectrogram() can return an empty array when the clip is shorter than nFft (or otherwise yields 0 STFT frames). The subsequent inference code assumes at least 1 frame (e.g., uses melFrames[0] in run*Inference and builds tensors with nFrames=0), which is likely to throw inside ORT. Add an early guard after melFrames computation to either pad/extend audio, skip inference with a clear error, or return an empty-but-valid result.

Suggested change:

  });
  // Guard against empty spectrogram (e.g., very short clips producing no STFT frames)
  if (!melFrames || melFrames.length === 0) {
    throw new Error(
      'Unable to analyze audio: clip is too short or produced no spectral frames.',
    );
  }

export const DEFAULT_MEL_OPTIONS: MelSpectrogramOptions = {
  sampleRate: 22050,
  nFft: 2048,
  hopLength: 441, // ~10ms at 22050Hz

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEFAULT_MEL_OPTIONS.hopLength is 441 samples, which is ~20ms at 22050Hz (441/22050), but the inline comment says “~10ms”. This is misleading for anyone tuning hop/window parameters; update the comment or change hopLength to the intended 10ms value (~220/221).

Suggested change:

  - hopLength: 441, // ~10ms at 22050Hz
  + hopLength: 441, // ~20ms at 22050Hz (441 / 22050)

Comment on lines +188 to +196
export function computeMelSpectrogram(
  samples: Float32Array,
  options: Partial<MelSpectrogramOptions> = {},
): Float32Array[] {
  const opts = { ...DEFAULT_MEL_OPTIONS, ...options };
  const { nFft, hopLength, nMels, sampleRate, fMin, fMax } = opts;

  const filters = createMelFilterbank(nFft, nMels, sampleRate, fMin, fMax);
  const specFrames = powerSpectrogram(samples, nFft, hopLength);

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

computeMelSpectrogram() recreates the mel filterbank (and powerSpectrogram() recreates the Hann window) on every call. In the worker this will run per analysis job and can be a noticeable CPU cost; consider memoizing the filterbank/window by (nFft, nMels, sampleRate, fMin, fMax) (and window by nFft) to reuse across calls.
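The suggested memoization can be sketched as a small keyed cache; the `memoizeByKey` helper and the Hann window example below are illustrative stand-ins, not the PR's actual code:

```typescript
// Memoize an expensive builder by a string key derived from its parameters.
function memoizeByKey<A extends unknown[], R>(
  build: (...args: A) => R,
  keyOf: (...args: A) => string,
): (...args: A) => R {
  const cache = new Map<string, R>();
  return (...args: A) => {
    const key = keyOf(...args);
    let value = cache.get(key);
    if (value === undefined) {
      value = build(...args);
      cache.set(key, value);
    }
    return value;
  };
}

// Example: reuse one Hann window per nFft across analysis jobs.
export const hannWindow = memoizeByKey(
  (nFft: number) => {
    const w = new Float32Array(nFft);
    for (let i = 0; i < nFft; i++) {
      w[i] = 0.5 * (1 - Math.cos((2 * Math.PI * i) / (nFft - 1)));
    }
    return w;
  },
  (nFft) => String(nFft),
);
```

The filterbank would be memoized the same way with a key like `${nFft}:${nMels}:${sampleRate}:${fMin}:${fMax}`.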

Comment on lines +86 to +95
for (let i = 0; i < totalOutputFrames; i++) {
  const beatProb = outputData[i * 2];
  const downbeatProb = outputData[i * 2 + 1];
  if (beatProb > beatThreshold) {
    beats.push({
      time: i * frameTimeStep,
      isDownbeat: downbeatProb > beatThreshold,
      confidence: beatProb,
    });
  }

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beat extraction currently adds a BeatEvent for every frame where beatProb > threshold. If the model outputs a smooth activation curve around beats (common for beat trackers), this will emit multiple beat events per actual beat and will distort BPM estimation. Add a peak-picking step (local maxima + minimum inter-beat interval / refractory period) before emitting beats/downbeats.

Suggested change (replacing the per-frame threshold loop shown above):

  const beatProbs: number[] = new Array(totalOutputFrames);
  const downbeatProbs: number[] = new Array(totalOutputFrames);
  // First pass: store probabilities per frame
  for (let i = 0; i < totalOutputFrames; i++) {
    beatProbs[i] = outputData[i * 2];
    downbeatProbs[i] = outputData[i * 2 + 1];
  }
  // Peak picking: local maxima with a minimum inter-beat interval (refractory period).
  // Assume an upper BPM limit (e.g., 240 BPM) to derive a minimum plausible beat spacing.
  const maxBpm = 240;
  const minBeatInterval = 60 / maxBpm; // seconds
  let lastBeatTime = -Infinity;
  for (let i = 0; i < totalOutputFrames; i++) {
    const beatProb = beatProbs[i];
    if (beatProb <= beatThreshold) {
      continue;
    }
    const prevProb = i > 0 ? beatProbs[i - 1] : -Infinity;
    const nextProb = i < totalOutputFrames - 1 ? beatProbs[i + 1] : -Infinity;
    // Require this frame to be a (non-strict) local maximum
    if (beatProb < prevProb || beatProb < nextProb) {
      continue;
    }
    const time = i * frameTimeStep;
    if (time - lastBeatTime < minBeatInterval) {
      continue;
    }
    const downbeatProb = downbeatProbs[i];
    beats.push({
      time,
      isDownbeat: downbeatProb > beatThreshold,
      confidence: beatProb,
    });
    lastBeatTime = time;
  }

ChuxiJ and others added 2 commits March 27, 2026 19:58
Verified ONNX model inference for both BPM and chord detection:

Beat This! (79MB ONNX from beat_this_cpp):
- Input: mel spectrogram [1, T, 128] (n_fft=1024, hop=441, sr=22050)
- Output: beat + downbeat logits → peak-picked to beat positions
- Verified: 100 BPM on funk rock, 187.5 BPM on classical guitar
- Post-processing: max_pool1d(kernel=7) peak picking + logit > 0 threshold
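The max_pool1d peak-picking step described above can be sketched in plain TypeScript. The kernel size (7) and the logit > 0 threshold come from the commit message; the function itself is an illustrative reimplementation, not the shipped worker code:

```typescript
// Pick beat frames: a frame is a beat if its logit is positive and equals
// the local maximum over a centered window (the max_pool1d trick).
export function pickPeaks(logits: Float32Array, kernel = 7): number[] {
  const half = Math.floor(kernel / 2);
  const peaks: number[] = [];
  for (let i = 0; i < logits.length; i++) {
    if (logits[i] <= 0) continue; // threshold: logit > 0
    let localMax = -Infinity;
    const lo = Math.max(0, i - half);
    const hi = Math.min(logits.length - 1, i + half);
    for (let j = lo; j <= hi; j++) {
      if (logits[j] > localMax) localMax = logits[j];
    }
    if (logits[i] === localMax) peaks.push(i);
  }
  return peaks;
}
```

Note that a plateau of exactly equal logits would emit one peak per plateau frame; real activations rarely tie, but a refractory period can be added if needed.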

consonance-ACE (20MB ONNX, exported from PyTorch checkpoint):
- Input: CQT [1, 1, 144, T] (24 bins/oct, 6 octaves from C1, hop=512)
- Output: root(13) + bass(13) + chord(12) logits per frame
- Verified: varied chord predictions with <0.00001 PyTorch/ONNX diff

Scripts:
- scripts/download-models.sh — download Beat This! ONNX (79MB)
- scripts/export-consonance-ace.py — export consonance-ACE to ONNX
- scripts/verify-onnx-models.py — end-to-end verification with real audio

ONNX files added to .gitignore (too large for git, ~100MB total).

Closes #978

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Critical fixes to make the worker produce correct results:

melSpectrogram.ts:
- Add power=1 (magnitude) mode for Beat This! (was always power=2)
- Add log1p(multiplier * mel) scaling (Beat This! uses log1p(1000*mel))
- Export BEAT_THIS_MEL_OPTIONS preset matching official preprocessing
- Add magnitudeSpectrogram() function

cqt.ts (NEW):
- Pure TypeScript CQT computation for consonance-ACE
- Matches: sr=22050, hop=512, 24 bins/oct, 6 octaves from C1
- Returns raw magnitude (not dB) as expected by the model
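The CQT bin layout above follows the standard geometric spacing. A sketch computing each bin's center frequency, assuming C1 ≈ 32.703 Hz as the lowest bin per the commit's "6 octaves from C1" (the function is illustrative, not the cqt.ts implementation):

```typescript
// Center frequency of CQT bin k: fMin * 2^(k / binsPerOctave).
export function cqtBinFrequencies(
  fMin: number,          // e.g. 32.703 Hz (C1)
  binsPerOctave: number, // e.g. 24
  nOctaves: number,      // e.g. 6 -> 144 bins total
): number[] {
  const nBins = binsPerOctave * nOctaves;
  const freqs: number[] = new Array(nBins);
  for (let k = 0; k < nBins; k++) {
    freqs[k] = fMin * Math.pow(2, k / binsPerOctave);
  }
  return freqs;
}
```

With 24 bins per octave, adjacent bins are a quarter tone apart, which is what lets the chord model resolve roots and bass notes.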

analysisWorker.ts — BPM inference fixes:
- Input shape: [1, nFrames, nMels] not [1, nMels, nFrames]
- Parse 2 separate output tensors (beat, downbeat) not interleaved [T,2]
- Add peak-picking: maxPool1d(kernel=7) + logit > 0 threshold
- Use Beat This! mel preset (nFft=1024, power=1, log1p scaling)

analysisWorker.ts — Chord inference fixes:
- Compute CQT instead of passing mel spectrogram
- Parse 3 output tensors: root[1,T,13], bass[1,T,13], chord[1,T,12]
- Proper decoding: argmax root/bass, sigmoid chord, interval analysis
- Normalize audio to [-1,1] before CQT (matching training)
- Filter short chords (<0.3s)
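The short-chord filter can be sketched as a pure pass over decoded chord segments. The ChordSegment shape is simplified from the PR's ChordEvent type, and the 0.3 s default comes from the commit message:

```typescript
interface ChordSegment {
  start: number; // seconds
  end: number;   // seconds
  label: string;
}

// Drop chord segments shorter than minDur: frame-level decoding tends to
// produce brief spurious flips at chord boundaries.
export function filterShortChords(
  segments: ChordSegment[],
  minDur = 0.3,
): ChordSegment[] {
  return segments.filter((s) => s.end - s.start >= minDur);
}
```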

Tests:
- melSpectrogram: add magnitude, log1p, Beat This! preset tests
- cqt: shape, non-negativity, frequency accuracy, silent input
- peakPicking: maxPool1d, isolated peaks, clustered beats, BPM spacing

All 2763 tests pass, tsc clean, build succeeds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ChuxiJ (author) commented Mar 27, 2026

Triage: hold for Phase 3. ONNX Runtime Web is a large dependency (~20MB+); BPM detection and chord recognition are valuable, but the bundle-size impact should be evaluated. Consider a lazy-loading or Web Worker approach before merging.


Generated by Claude Code


Development

Successfully merging this pull request may close these issues.

feat: add BPM detection and chord recognition via ONNX Runtime Web
