feat: add local BPM detection and chord recognition via ONNX Runtime Web#983
Conversation
Implements browser-local audio analysis using ONNX models in a Web Worker:
- Beat This! (small) for SOTA BPM/beat detection
- Consonance-ACE (ISMIR 2025) for 170-class chord recognition

New files:
- src/types/analysis.ts — BeatEvent, ChordEvent, worker message types
- src/utils/melSpectrogram.ts — pure TS FFT + mel filterbank
- src/services/modelManager.ts — lazy ONNX model download + IndexedDB cache
- src/services/localAnalysisService.ts — orchestrates worker lifecycle
- src/store/analysisStore.ts — Zustand store for analysis job tracking
- src/workers/analysisWorker.ts — Web Worker with ONNX inference pipeline

Modified:
- AudioAnalysisPanel.tsx — local/server mode toggle, progress bar, chord display
- InferredMetas — extended with beats, chords, analysisSource fields
- vite.config.ts — onnxruntime-web WASM config
- main.tsx — expose analyzeClipLocally + analysisStore on window

Closes #978

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Adds a browser-local audio analysis pipeline (BPM/beat detection + chord recognition) powered by ONNX Runtime Web running inside a Web Worker, plus UI and state management to trigger and display results.
Changes:
- Introduces a new Web Worker inference pipeline and supporting analysis types/state (Zustand).
- Adds a pure TypeScript mel-spectrogram/FFT utility with unit tests.
- Updates AudioAnalysisPanel with Local/Server mode toggle, progress UI, and chord timeline display; adds onnxruntime-web dependency and Vite worker config.
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| vite.config.ts | Configures Vite worker output and excludes onnxruntime-web from optimizeDeps. |
| src/workers/analysisWorker.ts | Implements worker-side feature extraction + ONNX inference + result posting. |
| src/utils/melSpectrogram.ts | Adds FFT/STFT/mel spectrogram + downsampling utilities. |
| src/utils/tests/melSpectrogram.test.ts | Unit tests for FFT/window/filterbank/spectrogram/downsampling. |
| src/types/project.ts | Extends inferredMetas to store beats/chords and analysis source. |
| src/types/analysis.ts | Adds analysis domain types and worker message contracts. |
| src/store/analysisStore.ts | Zustand store for local analysis job lifecycle/progress/results. |
| src/store/tests/analysisStore.test.ts | Unit tests for analysis job tracking store. |
| src/services/modelManager.ts | Adds model download + IndexedDB caching utility (currently not wired into worker). |
| src/services/tests/modelManager.test.ts | Unit tests for model download/caching logic. |
| src/services/localAnalysisService.ts | Orchestrates audio decode/downsample + worker messaging + writes inferredMetas. |
| src/main.tsx | Exposes analysis hooks on window for agent access. |
| src/components/generation/AudioAnalysisPanel.tsx | Adds Local/Server toggle, local progress display, and chord timeline UI. |
| package.json | Adds onnxruntime-web dependency. |
| package-lock.json | Locks new dependencies pulled in by onnxruntime-web. |
```ts
// Cache in IndexedDB
await set(meta.cacheKey, buffer);
```
loadModelBytes() reports streaming progress using contentLength, but when Content-Length is missing you fall back to meta.sizeBytes (an estimate). In that case the last progress update may never reach 100%, and there is no final onProgress call after the download completes/caches. Consider sending a final progress event after the loop with bytesTotal: bytesLoaded and percent: 100 (or recompute based on actual bytes).
```ts
// Ensure a final 100% progress update based on actual bytes downloaded
onProgress?.({
  modelName: meta.name,
  bytesLoaded,
  bytesTotal: bytesLoaded,
  percent: 100,
});
```
```ts
async function loadOnnxSession(modelUrl: string) {
  const ort = await getOrt();
  const response = await fetch(modelUrl);
  if (!response.ok) throw new Error(`Failed to fetch model: ${response.status}`);
  const buffer = await response.arrayBuffer();
  return ort.InferenceSession.create(buffer, {
    executionProviders: ['wasm'],
  });
}
```
The PR description/UI text says ONNX models are cached in IndexedDB, but the worker currently always fetch()es modelUrl directly and creates the session from the network response. This bypasses the new modelManager/IndexedDB cache and also prevents reporting download progress. Consider wiring the worker to use cached bytes (e.g., fetch+cache via IndexedDB inside the worker, or prefetch via modelManager in the main thread and postMessage the ArrayBuffer).
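One possible shape for the cache-through path, sketched here with the byte store injected so the same logic can sit on top of idb-keyval inside the worker. The function and parameter names are illustrative, not the PR's actual API:

```typescript
// Hypothetical cache-through loader. In the worker, `store` would wrap
// idb-keyval's get/set (as modelManager already does); it is injected here
// so the cache logic is testable outside the browser.
type ByteStore = {
  get(key: string): Promise<ArrayBuffer | undefined>;
  set(key: string, value: ArrayBuffer): Promise<void>;
};

async function loadModelBytesCached(
  cacheKey: string,
  fetchBytes: () => Promise<ArrayBuffer>,
  store: ByteStore,
): Promise<ArrayBuffer> {
  const cached = await store.get(cacheKey);
  if (cached) return cached; // cache hit: no network, no download progress needed
  const buffer = await fetchBytes(); // cache miss: download (with progress reporting)
  await store.set(cacheKey, buffer); // populate cache for subsequent sessions
  return buffer;
}
```

The alternative of prefetching via modelManager on the main thread and transferring the ArrayBuffer to the worker via postMessage avoids duplicating IndexedDB access in the worker, at the cost of one structured-clone/transfer per analysis.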
```ts
// 5. Write results to clip.inferredMetas
useProjectStore.getState().updateClip(clipId, {
  inferredMetas: {
    ...clip.inferredMetas,
    bpm: result.bpm,
    keyScale: result.keyScale ?? undefined,
    timeSignature: result.timeSignature ?? undefined,
    beats: result.beats,
    chords: result.chords,
    analysisSource: 'local',
  },
```
updateClip() merges inferredMetas using the clip object captured before async work. If the clip’s inferredMetas are updated elsewhere while analysis is running, this spread can accidentally clobber newer fields. Prefer reading the latest clip state at update time (or using a store updater that merges against current state) before spreading inferredMetas.
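A race-safe variant can re-read the clip at commit time. Sketched below with the clip lookup injected, since the real store accessors may differ:

```typescript
// Hypothetical updater: re-read the clip when the update is committed, so
// fields written by other code during the async analysis survive the spread.
type Clip = { id: string; inferredMetas?: Record<string, unknown> };

function buildInferredMetasUpdate(
  getClip: (id: string) => Clip | undefined, // e.g. a useProjectStore.getState() lookup
  clipId: string,
  patch: Record<string, unknown>,
): { inferredMetas: Record<string, unknown> } {
  const latest = getClip(clipId); // read now, not before the await
  return { inferredMetas: { ...latest?.inferredMetas, ...patch } };
}
```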
```tsx
{mode === 'local' && !analyzing && !localResult && hasAudio && (
  <p className="text-[10px] text-zinc-500">
    Local analysis uses Beat This! for BPM detection and Consonance-ACE for chord recognition.
    Models are downloaded on first use (~23MB total) and cached locally.
  </p>
```
The helper text claims “Models are downloaded on first use (~23MB total) and cached locally.” but the current worker implementation fetches model files directly (no IndexedDB cache integration), so this statement may be inaccurate. Either implement the cache path or soften the wording to match actual behavior (e.g., “downloaded on first use; browser HTTP cache may apply”).
```ts
const result = await new Promise<LocalAnalysisResult>((resolve, reject) => {
  const handleMessage = (e: MessageEvent<AnalysisWorkerMessage>) => {
    const msg = e.data;
    if (msg.type === 'progress') {
      useAnalysisStore.getState().updateJobProgress(jobId, msg);
    } else if (msg.type === 'result') {
      worker.removeEventListener('message', handleMessage);
      worker.removeEventListener('error', handleError);
      useAnalysisStore.getState().completeJob(jobId, msg.result);
      resolve(msg.result);
    } else if (msg.type === 'error') {
      worker.removeEventListener('message', handleMessage);
      worker.removeEventListener('error', handleError);
      reject(new Error(msg.error));
    }
  };
```
analyzeClipLocally() attaches per-call message listeners to a singleton worker, but worker messages don’t include a job/request identifier. If two analyses run concurrently (possible via window.__analyzeClipLocally or future UI), progress/result messages from one run will be handled by all active listeners, corrupting job state and potentially resolving/rejecting the wrong promise. Add a correlation ID (e.g., jobId) in AnalysisWorkerRequest and include it in every worker response so the service can ignore unrelated messages (or enforce single-flight by queueing/canceling).
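A correlation-ID listener could look like the sketch below. The message shapes are illustrative, modeled on the worker message types described in this PR, not its exact contracts:

```typescript
// Sketch of jobId correlation: every worker message carries the jobId it
// belongs to, and each caller's listener drops messages for other jobs.
type TaggedWorkerMessage =
  | { jobId: string; type: 'progress'; percent: number }
  | { jobId: string; type: 'result' }
  | { jobId: string; type: 'error'; error: string };

function makeJobListener(
  jobId: string,
  handler: (msg: TaggedWorkerMessage) => void,
): (e: { data: TaggedWorkerMessage }) => void {
  return (e) => {
    if (e.data.jobId !== jobId) return; // message belongs to another in-flight job
    handler(e.data);
  };
}
```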
src/workers/analysisWorker.ts
```ts
  nMels: 128,
  fMin: 30,
  fMax: 11000,
});
```
computeMelSpectrogram() can return an empty array when the clip is shorter than nFft (or otherwise yields 0 STFT frames). The subsequent inference code assumes at least 1 frame (e.g., uses melFrames[0] in run*Inference and builds tensors with nFrames=0), which is likely to throw inside ORT. Add an early guard after melFrames computation to either pad/extend audio, skip inference with a clear error, or return an empty-but-valid result.
```ts
});
// Guard against empty spectrogram (e.g., very short clips producing no STFT frames)
if (!melFrames || melFrames.length === 0) {
  throw new Error(
    'Unable to analyze audio: clip is too short or produced no spectral frames.',
  );
}
```
src/utils/melSpectrogram.ts
```ts
export const DEFAULT_MEL_OPTIONS: MelSpectrogramOptions = {
  sampleRate: 22050,
  nFft: 2048,
  hopLength: 441, // ~10ms at 22050Hz
```
DEFAULT_MEL_OPTIONS.hopLength is 441 samples, which is ~20ms at 22050Hz (441/22050), but the inline comment says “~10ms”. This is misleading for anyone tuning hop/window parameters; update the comment or change hopLength to the intended 10ms value (~220/221).
```ts
hopLength: 441, // ~20ms at 22050Hz (441 / 22050)
```
src/utils/melSpectrogram.ts
```ts
export function computeMelSpectrogram(
  samples: Float32Array,
  options: Partial<MelSpectrogramOptions> = {},
): Float32Array[] {
  const opts = { ...DEFAULT_MEL_OPTIONS, ...options };
  const { nFft, hopLength, nMels, sampleRate, fMin, fMax } = opts;

  const filters = createMelFilterbank(nFft, nMels, sampleRate, fMin, fMax);
  const specFrames = powerSpectrogram(samples, nFft, hopLength);
```
computeMelSpectrogram() recreates the mel filterbank (and powerSpectrogram() recreates the Hann window) on every call. In the worker this will run per analysis job and can be a noticeable CPU cost; consider memoizing the filterbank/window by (nFft, nMels, sampleRate, fMin, fMax) (and window by nFft) to reuse across calls.
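The memoization could be as simple as keying a Map on the parameter tuple. In this sketch, `createMelFilterbank` is a stub standing in for the real utility in melSpectrogram.ts:

```typescript
// Stub: the real createMelFilterbank builds triangular mel filters; only its
// shape matters for demonstrating the cache.
function createMelFilterbank(
  nFft: number, nMels: number, sampleRate: number, fMin: number, fMax: number,
): Float32Array[] {
  return Array.from({ length: nMels }, () => new Float32Array(nFft / 2 + 1));
}

// Memoize the filterbank by its full parameter tuple so repeated analysis
// jobs with the same settings reuse the same arrays.
const filterbankCache = new Map<string, Float32Array[]>();

function getMelFilterbank(
  nFft: number, nMels: number, sampleRate: number, fMin: number, fMax: number,
): Float32Array[] {
  const key = `${nFft}|${nMels}|${sampleRate}|${fMin}|${fMax}`;
  let filters = filterbankCache.get(key);
  if (!filters) {
    filters = createMelFilterbank(nFft, nMels, sampleRate, fMin, fMax);
    filterbankCache.set(key, filters);
  }
  return filters;
}
```

The Hann window can be memoized the same way, keyed on nFft alone.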
src/workers/analysisWorker.ts
```ts
for (let i = 0; i < totalOutputFrames; i++) {
  const beatProb = outputData[i * 2];
  const downbeatProb = outputData[i * 2 + 1];
  if (beatProb > beatThreshold) {
    beats.push({
      time: i * frameTimeStep,
      isDownbeat: downbeatProb > beatThreshold,
      confidence: beatProb,
    });
  }
```
Beat extraction currently adds a BeatEvent for every frame where beatProb > threshold. If the model outputs a smooth activation curve around beats (common for beat trackers), this will emit multiple beat events per actual beat and will distort BPM estimation. Add a peak-picking step (local maxima + minimum inter-beat interval / refractory period) before emitting beats/downbeats.
```ts
const beatProbs: number[] = new Array(totalOutputFrames);
const downbeatProbs: number[] = new Array(totalOutputFrames);
// First pass: store probabilities per frame
for (let i = 0; i < totalOutputFrames; i++) {
  beatProbs[i] = outputData[i * 2];
  downbeatProbs[i] = outputData[i * 2 + 1];
}
// Peak picking: local maxima with a minimum inter-beat interval (refractory period)
// Assume an upper BPM limit (e.g., 240 BPM) to derive a minimum plausible beat spacing.
const maxBpm = 240;
const minBeatInterval = 60 / maxBpm; // seconds
let lastBeatTime = -Infinity;
for (let i = 0; i < totalOutputFrames; i++) {
  const beatProb = beatProbs[i];
  if (beatProb <= beatThreshold) {
    continue;
  }
  const prevProb = i > 0 ? beatProbs[i - 1] : -Infinity;
  const nextProb = i < totalOutputFrames - 1 ? beatProbs[i + 1] : -Infinity;
  // Require this frame to be a (non-strict) local maximum
  if (beatProb < prevProb || beatProb < nextProb) {
    continue;
  }
  const time = i * frameTimeStep;
  if (time - lastBeatTime < minBeatInterval) {
    continue;
  }
  const downbeatProb = downbeatProbs[i];
  beats.push({
    time,
    isDownbeat: downbeatProb > beatThreshold,
    confidence: beatProb,
  });
  lastBeatTime = time;
```
Verified ONNX model inference for both BPM and chord detection:

Beat This! (79MB ONNX from beat_this_cpp):
- Input: mel spectrogram [1, T, 128] (n_fft=1024, hop=441, sr=22050)
- Output: beat + downbeat logits → peak-picked to beat positions
- Verified: 100 BPM on funk rock, 187.5 BPM on classical guitar
- Post-processing: max_pool1d(kernel=7) peak picking + logit > 0 threshold

consonance-ACE (20MB ONNX, exported from PyTorch checkpoint):
- Input: CQT [1, 1, 144, T] (24 bins/oct, 6 octaves from C1, hop=512)
- Output: root(13) + bass(13) + chord(12) logits per frame
- Verified: varied chord predictions with <0.00001 PyTorch/ONNX diff

Scripts:
- scripts/download-models.sh — download Beat This! ONNX (79MB)
- scripts/export-consonance-ace.py — export consonance-ACE to ONNX
- scripts/verify-onnx-models.py — end-to-end verification with real audio

ONNX files added to .gitignore (too large for git, ~100MB total).

Closes #978

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
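The max_pool1d-style peak picking described above can be sketched as follows: a frame counts as a beat only if its logit is positive and is the maximum over a 7-frame neighborhood. This is a minimal sketch of the technique, not the worker's exact code, and the tie handling at equal neighboring logits is a simplification:

```typescript
// Peak picking over raw beat logits: keep frame i iff logits[i] > 0 and
// logits[i] is the max of the window [i - 3, i + 3] (kernel = 7).
function pickPeaks(logits: Float32Array, kernel = 7): number[] {
  const half = Math.floor(kernel / 2);
  const peaks: number[] = [];
  for (let i = 0; i < logits.length; i++) {
    if (logits[i] <= 0) continue; // logit > 0 threshold
    let isMax = true;
    for (let j = Math.max(0, i - half); j <= Math.min(logits.length - 1, i + half); j++) {
      if (logits[j] > logits[i]) { isMax = false; break; }
    }
    if (isMax) peaks.push(i);
  }
  return peaks;
}
```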
Critical fixes to make the worker produce correct results:

melSpectrogram.ts:
- Add power=1 (magnitude) mode for Beat This! (was always power=2)
- Add log1p(multiplier * mel) scaling (Beat This! uses log1p(1000*mel))
- Export BEAT_THIS_MEL_OPTIONS preset matching official preprocessing
- Add magnitudeSpectrogram() function

cqt.ts (NEW):
- Pure TypeScript CQT computation for consonance-ACE
- Matches: sr=22050, hop=512, 24 bins/oct, 6 octaves from C1
- Returns raw magnitude (not dB) as expected by the model

analysisWorker.ts — BPM inference fixes:
- Input shape: [1, nFrames, nMels] not [1, nMels, nFrames]
- Parse 2 separate output tensors (beat, downbeat) not interleaved [T,2]
- Add peak-picking: maxPool1d(kernel=7) + logit > 0 threshold
- Use Beat This! mel preset (nFft=1024, power=1, log1p scaling)

analysisWorker.ts — Chord inference fixes:
- Compute CQT instead of passing mel spectrogram
- Parse 3 output tensors: root[1,T,13], bass[1,T,13], chord[1,T,12]
- Proper decoding: argmax root/bass, sigmoid chord, interval analysis
- Normalize audio to [-1,1] before CQT (matching training)
- Filter short chords (<0.3s)

Tests:
- melSpectrogram: add magnitude, log1p, Beat This! preset tests
- cqt: shape, non-negativity, frequency accuracy, silent input
- peakPicking: maxPool1d, isolated peaks, clustered beats, BPM spacing

All 2763 tests pass, tsc clean, build succeeds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
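The log1p scaling described above amounts to a single compression pass over the magnitude mel bins; a minimal sketch, with the multiplier of 1000 taken from the commit message:

```typescript
// Compress magnitude mel bins with log1p(multiplier * x), matching the
// Beat This! preprocessing described above (multiplier = 1000).
function logCompress(mel: Float32Array, multiplier = 1000): Float32Array {
  const out = new Float32Array(mel.length);
  for (let i = 0; i < mel.length; i++) {
    out[i] = Math.log1p(multiplier * mel[i]);
  }
  return out;
}
```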
Triage: Hold for Phase 3 — ONNX Runtime Web is a large dependency (~20MB+). BPM detection and chord recognition are valuable but should be evaluated for bundle size impact. Consider lazy-loading or a web worker approach before merging.

Generated by Claude Code
Summary
- `window.__analyzeClipLocally()` and `window.__analysisStore` for agent access

New Files
- `src/types/analysis.ts` — BeatEvent, ChordEvent, worker message types
- `src/utils/melSpectrogram.ts` — pure TS FFT + mel filterbank + spectrogram
- `src/services/modelManager.ts` — lazy ONNX model download + IndexedDB cache
- `src/services/localAnalysisService.ts` — orchestrates worker lifecycle
- `src/store/analysisStore.ts` — Zustand store for analysis job tracking
- `src/workers/analysisWorker.ts` — Web Worker with ONNX inference pipeline

Test plan
- `npx tsc --noEmit` — 0 type errors
- `npm test` — 2745 tests pass (289 files), including 31 new tests
- `npm run build` — succeeds
- `window.__analyzeClipLocally` and `window.__analysisStore` exposed correctly
- (`public/models/`)

Next steps (separate PRs)
Closes #978
🤖 Generated with Claude Code