fix: context audio preview applies timeStretchRate, audioOffset, and warpMarkers #1185
…and warpMarkers

The context audio extractor was copying raw audio samples without applying the same playback transformations used by AudioEngine on the timeline. This caused stretched/warped loops to sound different (wrong pitch, wrong rhythm) in the Add a Layer preview vs. actual timeline playback.

Now mirrors AudioEngine._scheduleStandardClip and _scheduleWarpedClip:
- Sets source.playbackRate.value for timeStretchRate
- Uses audioOffset as the buffer offset in source.start()
- Calls computeWarpedSegments() for clips with warp markers

Closes #1184

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Updates context audio extraction so the “Add a Layer” panel preview matches timeline playback, including time-stretching, buffer offsets, and warp-marker scheduling.
Changes:
- Apply `timeStretchRate` via `AudioBufferSourceNode.playbackRate.value` and `audioOffset` via `source.start(..., offset, ...)`.
- Add warp-marker support by scheduling segments from `computeWarpedSegments()`.
- Add unit tests validating stretch rate, audio offset, and warped-segment scheduling behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/services/contextAudioExtractor.ts` | Schedules sources in an OfflineAudioContext using stretch rate, audio offset, and warp-marker segments. |
| `tests/unit/contextAudioExtractor.test.ts` | Adds unit tests asserting playbackRate/offset application and multiple-rate warped scheduling. |
```ts
const clipEnd = clip.startTime + clip.duration;
if (clip.startTime >= ctxEnd || clipEnd <= ctxStart) continue;

// Prefer isolatedAudioKey (pre-trimmed to clip region at generation),
// fall back to cumulativeMixKey (full project-length).
let blob: Blob | undefined;
let alreadyTrimmed = false;

if (clip.isolatedAudioKey) {
  blob = await loadAudioBlobByKey(clip.isolatedAudioKey);
  if (blob) alreadyTrimmed = true;
}
if (!blob && clip.cumulativeMixKey) {
  blob = await loadAudioBlobByKey(clip.cumulativeMixKey);
}
if (!blob) continue;

const arrayBuffer = await blob.arrayBuffer();
const buffer = await offlineCtx.decodeAudioData(arrayBuffer);

// Compute the overlap between this clip and the context window.
// isolatedAudioKey buffers start at sample 0 = clip.startTime;
// cumulativeMixKey buffers start at sample 0 = project time 0.
const overlapStart = Math.max(clip.startTime, ctxStart);
const overlapEnd = Math.min(clipEnd, ctxEnd);
if (overlapEnd <= overlapStart) continue;

let srcStartSample: number;
let srcEndSample: number;
if (alreadyTrimmed) {
  srcStartSample = Math.floor((overlapStart - clip.startTime) * sampleRate);
  srcEndSample = Math.min(
    Math.floor((overlapEnd - clip.startTime) * sampleRate),
    buffer.length,
  );

const audioOffset = clip.audioOffset ?? 0;
const rate = clip.timeStretchRate ?? 1;
const hasWarpMarkers = clip.warpMarkers && clip.warpMarkers.length > 0;

if (hasWarpMarkers) {
  // Schedule warped segments — mirrors AudioEngine._scheduleWarpedClip
  const segments = computeWarpedSegments(clip.warpMarkers!, clip.duration);

  for (const seg of segments) {
    const segTimelineStart = clip.startTime + seg.targetStart;
    const segTimelineEnd = clip.startTime + seg.targetEnd;
```
To match timeline playback, scheduling should use the clip’s audible timeline coordinates (same as useTransport/AudioEngine), not always clip.startTime/clip.duration. In particular, when stretchMode is non-repitch (e.g. slice/warp markers), contentOffset shifts audible start and reduces audible duration (getClipAudibleStartTime / getClipAudibleTimelineDuration). Using clip.startTime and passing clip.duration into computeWarpedSegments() can cause warped audio to start early and/or compute segments against the wrong clip length. Recommend computing audibleStart, audibleDuration, and using those for clipEnd, overlap checks, segTimelineStart, and the clipDuration argument to computeWarpedSegments().
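A minimal sketch of what the comment suggests, assuming `contentOffset` is the field that shifts the audible start for non-repitch stretch modes; the real `getClipAudibleStartTime` / `getClipAudibleTimelineDuration` helpers may be shaped differently:

```typescript
// Hypothetical clip shape — field names are assumptions, not the real type.
interface ClipLike {
  startTime: number;      // timeline position, seconds
  duration: number;       // full clip length, timeline seconds
  contentOffset?: number; // assumed: seconds trimmed off the clip's head
}

// Audible start shifts forward by contentOffset for non-repitch modes.
function getClipAudibleStartTime(clip: ClipLike): number {
  return clip.startTime + (clip.contentOffset ?? 0);
}

// Audible duration shrinks by the same amount.
function getClipAudibleTimelineDuration(clip: ClipLike): number {
  return clip.duration - (clip.contentOffset ?? 0);
}

// These values would replace raw clip.startTime / clip.duration in the
// overlap checks, in segTimelineStart, and as the clipDuration argument
// to computeWarpedSegments().
const clip: ClipLike = { startTime: 4, duration: 8, contentOffset: 1.5 };
const audibleStart = getClipAudibleStartTime(clip);           // 5.5
const audibleDuration = getClipAudibleTimelineDuration(clip); // 6.5
```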
```ts
// @ts-expect-error - mock global
globalThis.OfflineAudioContext = MockOfflineAudioContext;
```
This test sets globalThis.OfflineAudioContext at module scope but never restores it, which can leak into other unit tests and create order-dependent failures (especially in files that use vi.stubGlobal('OfflineAudioContext', ...) / vi.unstubAllGlobals()). Prefer vi.stubGlobal('OfflineAudioContext', MockOfflineAudioContext) in beforeAll/beforeEach and vi.unstubAllGlobals() (or restoring the prior value) in afterAll to keep the global environment isolated per test file.
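A sketch of the suggested isolation using Vitest's stub helpers (`MockOfflineAudioContext` stands in for the test file's existing mock class):

```typescript
import { beforeAll, afterAll, vi } from "vitest";

declare const MockOfflineAudioContext: unknown; // the test file's existing mock

beforeAll(() => {
  // Stub instead of assigning to globalThis directly...
  vi.stubGlobal("OfflineAudioContext", MockOfflineAudioContext);
});

afterAll(() => {
  // ...so the prior global is restored and nothing leaks into other test files.
  vi.unstubAllGlobals();
});
```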
```ts
  let srcOffset: number;
  if (alreadyTrimmed) {
    srcOffset = audioOffset;
  } else {
    srcOffset = audioOffset + clip.startTime * rate;
  }
  source.start(clip.startTime, srcOffset, bufferDuration);
} else {
  // Context starts mid-clip — seek into it
  const seekOffset = overlapStart - clip.startTime;
  const bufferSeek = seekOffset * rate;
  const bufferRemaining = (clipEnd - overlapStart) * rate;
  let srcOffset: number;
  if (alreadyTrimmed) {
    srcOffset = audioOffset + bufferSeek;
  } else {
    srcOffset = audioOffset + clip.startTime * rate + bufferSeek;
  }
  source.start(overlapStart, srcOffset, bufferRemaining);
```
For clips rendered from cumulativeMixKey (alreadyTrimmed === false), the buffer’s time axis is still absolute seconds from project time 0. Multiplying clip.startTime by rate when computing srcOffset shifts the read position and will select the wrong part of the buffer whenever timeStretchRate !== 1 and clip.startTime !== 0. To mirror AudioEngine._scheduleStandardClip, keep buffer offsets in buffer-seconds (no * rate on absolute positions): use clip.startTime (or overlapStart) as the base offset into a project-length buffer, and only apply rate when converting timeline durations/seeks into buffer duration (bufferSeek, bufferRemaining).
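A hedged sketch of the arithmetic this comment describes, not the actual patch (names mirror the diff):

```typescript
// For a project-length (cumulativeMix) buffer, the buffer's time axis is
// absolute project seconds, so the base position is used unscaled; only the
// timeline seek into the clip is converted to buffer seconds via the rate.
function cumulativeMixSrcOffset(
  clipStartTime: number, // clip's absolute project time, seconds
  overlapStart: number,  // where the context window enters the clip
  audioOffset: number,   // clip.audioOffset ?? 0
  rate: number,          // clip.timeStretchRate ?? 1
): number {
  const bufferSeek = (overlapStart - clipStartTime) * rate; // timeline -> buffer seconds
  return audioOffset + clipStartTime + bufferSeek;          // no `* rate` on the base
}
```

With `rate = 2` and a clip at 4 s read from 5 s, this yields an offset of 4 + 2 = 6 s into the buffer (plus `audioOffset`), whereas the diff's `clip.startTime * rate` base would compute 8 + 2 = 10 s and read the wrong region.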
```ts
if (hasWarpMarkers) {
  // Schedule warped segments — mirrors AudioEngine._scheduleWarpedClip
  const segments = computeWarpedSegments(clip.warpMarkers!, clip.duration);

  for (const seg of segments) {
    const segTimelineStart = clip.startTime + seg.targetStart;
    const segTimelineEnd = clip.startTime + seg.targetEnd;

    // Skip segments entirely outside context window
    if (segTimelineEnd <= ctxStart || segTimelineStart >= ctxEnd) continue;

    // Clamp to context window
    const overlapStart = Math.max(segTimelineStart, ctxStart);
    const overlapEnd = Math.min(segTimelineEnd, ctxEnd);
    if (overlapEnd <= overlapStart) continue;

    const source = offlineCtx.createBufferSource();
    source.buffer = buffer;
    source.playbackRate.value = seg.playbackRate;
    source.connect(offlineCtx.destination);

    const sourceDur = seg.sourceEnd - seg.sourceStart;
    const targetDur = seg.targetEnd - seg.targetStart;

    if (overlapStart <= segTimelineStart) {
      // Context includes segment start — schedule normally
      source.start(
        segTimelineStart,
        audioOffset + seg.sourceStart,
        sourceDur,
      );
    } else {
      // Context starts mid-segment — seek into it
      const elapsed = overlapStart - segTimelineStart;
      const fraction = elapsed / targetDur;
      const sourceSeek = fraction * sourceDur;
      source.start(
        overlapStart,
        audioOffset + seg.sourceStart + sourceSeek,
        sourceDur - sourceSeek,
      );
```
Warp scheduling currently assumes the decoded buffer is clip-local (like AudioEngine’s clip.buffer), but contextAudioExtractor can fall back to cumulativeMixKey (project-length) when isolatedAudioKey is missing. In that case, source.start(..., audioOffset + seg.sourceStart, ...) will read from the wrong position because it doesn’t incorporate the clip’s project-time base offset into the buffer. Consider branching on alreadyTrimmed here too: for project-length buffers, add the clip’s audible start time (or clip.startTime) to the segment’s source offset so the segment maps to the correct absolute buffer region.
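The suggested branch can be sketched as a small helper (a simplification, not the actual fix):

```typescript
// Segment source offsets are clip-local seconds, so project-length
// (cumulativeMix) buffers need the clip's base time added before reading.
function warpedSegmentSrcOffset(
  audioOffset: number,     // clip.audioOffset ?? 0
  segSourceStart: number,  // seg.sourceStart, seconds within the clip's audio
  clipStartTime: number,   // clip's absolute project time (or audible start)
  alreadyTrimmed: boolean, // true => clip-local isolatedAudioKey buffer
): number {
  return alreadyTrimmed
    ? audioOffset + segSourceStart
    : audioOffset + clipStartTime + segSourceStart;
}
```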
The context audio blob starts at project time 0 (for backend alignment), but the preview player was playing from 0, causing N seconds of silence when the context window starts mid-project (e.g. 9.5s of silence for a context starting at bar 6).

Fixes:
- Seek audio.currentTime to contextWindow.startTime on play
- Auto-stop when reaching contextWindow.endTime
- Display context-relative time (0:00 → context duration)
- Extract waveform peaks only from the context portion of the blob
- Map scrub/waveform clicks to absolute blob time correctly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
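The time mapping behind the last two fixes can be sketched as pure helpers (names hypothetical; the blob's time axis is absolute project time):

```typescript
// Preview display runs on context-relative time; the audio element and the
// blob run on absolute project time.
const toDisplayTime = (absoluteTime: number, ctxStart: number): number =>
  Math.max(0, absoluteTime - ctxStart);

// A scrub/waveform click at fraction f of the visible waveform maps back to
// an absolute seek position inside the blob.
const toAbsoluteTime = (fraction: number, ctxStart: number, ctxEnd: number): number =>
  ctxStart + fraction * (ctxEnd - ctxStart);
```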
- Default chunkMaskMode is now 'explicit' (was 'auto')
- Clicking "Whole song" switches mask to 'auto'
- Clicking "Restore" switches mask back to 'explicit'

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… into [0, ctxEnd]

The previous approach tried to seek past leading silence in the [0, ctxEnd] blob, but the audio scheduled at absolute project time wasn't being found correctly by the AudioContext decoder (sample rate mismatch / alignment).

New approach: add a `trimToContext` option to `extractContextAudio`. When true, the OfflineAudioContext spans [0, ctxDuration] and all source.start() times are shifted by -ctxStart, so the blob starts with actual audio at sample 0.

Preview uses trimToContext=true (no offset hacks needed). The generation path is unchanged (trimToContext defaults to false).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
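The shift described above amounts to the following (a simplified sketch of the option's effect on scheduling):

```typescript
// With trimToContext, the offline render spans [0, ctxDuration] and every
// source.start() time is shifted so the context window begins at sample 0.
function scheduleTime(
  absoluteStart: number, // project-time seconds the source would start at
  ctxStart: number,      // context window start, project time
  trimToContext: boolean,
): number {
  return trimToContext ? absoluteStart - ctxStart : absoluteStart;
}
```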
Previously, generation sent a [0, ctxEnd] blob with absolute repainting times, causing leading silence that mismatches the training distribution. Now both preview and generation use trimToContext=true:

- src_audio blob spans [0, ctxDuration]; audio starts at sample 0
- repainting_start/end offset by ctxStart (relative to the context window)
- audio_duration = ctxDuration (not the full project length)

Applied to both the generateFromAddLayer and generateFromMultiTrack paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
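The payload change amounts to this mapping (field names taken from the commit message; treat the function itself as a sketch):

```typescript
// Both preview and generation now send times relative to the context window.
function toContextRelativePayload(
  repaintingStart: number, // absolute project time, seconds
  repaintingEnd: number,   // absolute project time, seconds
  ctxStart: number,
  ctxEnd: number,
) {
  return {
    repainting_start: repaintingStart - ctxStart,
    repainting_end: repaintingEnd - ctxStart,
    audio_duration: ctxEnd - ctxStart, // ctxDuration, not full project length
  };
}
```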
Summary
- `contextAudioExtractor.ts` now applies `timeStretchRate` via `source.playbackRate.value`, applies `audioOffset` as the buffer offset in `source.start()`, and schedules warped segments via `computeWarpedSegments()` for clips with warp markers
- Mirrors `AudioEngine._scheduleStandardClip` and `_scheduleWarpedClip` behavior

Test plan
Closes #1184
🤖 Generated with Claude Code