Replace MoviePy with ffmpeg for 10-100x performance improvement #15
cyberb wants to merge 60 commits into motattack:master
Conversation
- Parallel downloads (4 concurrent) instead of sequential
- Increase download chunk size from 8KB to 1MB
- Replace MoviePy video processing with direct ffmpeg calls
- Use ffmpeg concat demuxer with -c copy (no re-encoding)
- Normalize segments to common resolution for reliable concat
- Handle gaps with lightweight ffmpeg-generated black segments
- Merge audio-only tracks using ffmpeg filter_complex
- Remove moviepy and numpy dependencies
- Add ffmpeg to Dockerfile
- Bump version to 2.0.0

Fixes motattack#8
Some MTS Link recordings store only audio in the direct mp4 files, while the HLS delivery endpoint has both video and audio streams. Check the HLS playlist for a video track and download via ffmpeg when detected.
The old _merge_audio_tracks passed all audio files (up to 63+) as simultaneous inputs to a single ffmpeg amix command, which required ffmpeg to hold all delayed audio streams in memory for the full recording duration — causing OOM kills on long recordings. Now audio tracks are pre-delayed individually, then mixed via tree reduction in batches of 8. Also routes all subprocess calls through _run_ffmpeg which logs stderr on failure instead of silently swallowing it.
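The tree reduction described above can be sketched as a pure planning step. This is a minimal illustration, not the PR's actual code; `plan_mix_rounds` and the intermediate file names are hypothetical, and each batch would correspond to one bounded-memory ffmpeg `amix` invocation:

```python
from typing import List

BATCH_SIZE = 8  # tracks mixed per ffmpeg call, as described above

def plan_mix_rounds(tracks: List[str]) -> List[List[List[str]]]:
    """Group tracks into batches of BATCH_SIZE, then repeat on the batch
    outputs until a single mixed file remains (tree reduction)."""
    rounds = []
    current = list(tracks)
    level = 0
    while len(current) > 1:
        batches = [current[i:i + BATCH_SIZE]
                   for i in range(0, len(current), BATCH_SIZE)]
        rounds.append(batches)
        # Each batch produces one intermediate output for the next round.
        current = [f"mix_l{level}_b{j}.m4a" for j in range(len(batches))]
        level += 1
    return rounds

# 63 tracks -> round 1: 8 batches -> round 2: 1 batch of 8 -> done
rounds = plan_mix_rounds([f"track_{i}.m4a" for i in range(63)])
```

With 63 tracks this yields two rounds, so ffmpeg never holds more than 8 delayed streams at once.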
Recordings with multiple simultaneous feeds (webcam + screen share) have segments with overlapping timestamps. The old code laid them out sequentially, turning a 3hr recording into 10+ hours of concat video. This also caused the audio merge WAVs to be padded to 10hrs each, requiring ~400GB of disk. Added _deduplicate_overlapping() which keeps only the longest segment per time window (186 -> 7 segments in a real test case). Also pass total_duration from the API to _merge_audio_tracks so WAVs are padded to the correct recording length, not the (potentially inflated) concat file duration.
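A minimal sketch of the "keep the longest segment per overlapping window" idea, under the assumption that segments are (start, duration) pairs; the greedy longest-first selection here is an illustration, not necessarily the exact `_deduplicate_overlapping()` implementation:

```python
def deduplicate_overlapping(segments):
    """segments: list of (start, duration) pairs. Among overlapping
    segments, keep only the longest; return the survivors sorted by start."""
    kept = []
    # Consider longest segments first so they win their time window.
    for start, dur in sorted(segments, key=lambda s: -s[1]):
        overlaps = any(not (start + dur <= k_start or k_start + k_dur <= start)
                       for k_start, k_dur in kept)
        if not overlaps:
            kept.append((start, dur))
    return sorted(kept)

# A webcam (0-100s) and a shorter overlapping feed (10-60s): only one survives.
result = deduplicate_overlapping([(0, 100), (10, 50), (120, 30)])
```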
The previous approach materialized each audio track as a full-duration WAV (~1.8GB each for a 3hr recording). With 63 tracks that's ~113GB, filling the disk and crashing with "No space left on device". Now audio tracks are mixed in batches directly with adelay inside the ffmpeg filter graph, outputting compressed m4a (~15MB each). No intermediate WAVs are created. Batch results are tree-reduced and intermediates are deleted immediately after each round.
- Add _validate_downloaded_file() to check files with ffprobe after download
- Re-download corrupt files (missing moov atom), up to 2 retries
- Validate existing cached files on disk; re-download if corrupt
- Add _is_valid_media() in the processor to skip corrupt files during classification
- Audio batch mixing catches errors and skips failed batches instead of crashing
- If all audio batches fail, output video without audio overlay
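An ffprobe-based validity check along these lines is easy to sketch. The helper names are hypothetical (the PR's own are `_validate_downloaded_file()` / `_is_valid_media()`); the point is that a file missing its moov atom makes ffprobe exit non-zero:

```python
import subprocess
from typing import List

def probe_command(path: str) -> List[str]:
    """Build the ffprobe argv as a list, so filenames are passed literally."""
    return ["ffprobe", "-v", "error",
            "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1", path]

def is_valid_media(path: str) -> bool:
    """True if ffprobe can parse the container; a truncated download
    (e.g. missing moov atom) returns a non-zero exit code here."""
    try:
        result = subprocess.run(probe_command(path),
                                capture_output=True, timeout=30)
        return result.returncode == 0
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
```

The download retry loop would call `is_valid_media()` after each attempt and re-fetch on failure, up to the retry cap.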
Extract presentation.update events from the MTS API to get slide images and their timestamps. Download pre-rendered slide JPGs and composite them with the webcam video in a 1280x720 layout:
- Left 960px: presentation slide
- Top-right 320x180: webcam
- Slides are pre-encoded as 1fps video segments and concatenated into a single track, then overlaid with the webcam in one pass.

Recordings without presentations are unaffected (existing behavior).
Some recordings have tiny thumbnail-sized video segments (192x108) as the first file. The old code used the first segment's resolution for all normalization, resulting in a blurry output. Now scans all segments and picks the largest, with a 640x360 floor.
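The resolution-picking rule above fits in a few lines. A sketch, with a hypothetical helper name, assuming each segment's probed (width, height) is available:

```python
def pick_target_resolution(resolutions):
    """resolutions: list of (width, height) per segment. Pick the largest
    by pixel area, but never go below the 640x360 floor, so a thumbnail
    first segment cannot drag the whole output down."""
    best_w, best_h = max(resolutions, key=lambda wh: wh[0] * wh[1])
    return (max(best_w, 640), max(best_h, 360))

# A 192x108 thumbnail no longer dictates the output size.
target = pick_target_resolution([(192, 108), (1280, 720)])
```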
When multiple webcams overlap at the same timestamp, the old code kept the longest segment (often a random participant). Now tracks conference ID from the API and prefers the user with the most total segments across the recording — typically the presenter/instructor. Falls back to longest segment when conf_id is unavailable.
The -loop 1 -framerate 1 -t approach could produce millions of frames for long-duration slides (e.g., last slide staying up for 3 hours), causing ffmpeg to spin for hours and write gigabytes. Now uses -frames:v to strictly cap frame count to match duration at 1fps.
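A sketch of the capped slide-encoding command. The function name is hypothetical; the key change is that `-frames:v` bounds the frame count exactly (one frame per second at 1fps), so a slide shown for 3 hours emits 10800 frames and stops:

```python
def slide_segment_command(image: str, duration_s: float, out_path: str):
    """Encode a still slide as a 1 fps video segment, capping the frame
    count explicitly so ffmpeg cannot spin past the intended duration."""
    frames = max(1, round(duration_s))  # at 1 fps: one frame per second
    return ["ffmpeg", "-y",
            "-loop", "1", "-framerate", "1", "-i", image,
            "-frames:v", str(frames),
            out_path]

cmd = slide_segment_command("slide_042.jpg", 3 * 3600, "slide_042.mp4")
```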
Detects h264_nvenc at startup and uses it for all encoding steps if available. Falls back to libx264 CPU encoding if no GPU. Massively reduces CPU load and encoding time on systems with NVIDIA GPUs, while keeping the CPU cool.
The overlay step was CPU-bound (97°C). Now uses hwupload_cuda, scale_cuda, and overlay_cuda to do the compositing entirely on GPU. Falls back to CPU filters if CUDA overlay is not available.
Two changes:
1. Swap inputs in slide compositing so the webcam (25fps) drives the output frame clock instead of the slide track (1fps). Fixes choppy webcam playback in presentation videos.
2. For recordings without presentation slides that have multiple concurrent webcams (ПЗ, i.e. practical-session recordings), composite all active webcams into a grid layout using xstack instead of discarding all but one. Grid size adapts to the number of concurrent webcams (2x1, 2x2, 3x3, etc.). Audio from all participants is mixed.
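The adaptive grid sizing can be sketched as a near-square layout chosen from the webcam count. A minimal illustration (hypothetical helper, not necessarily the PR's exact rule), matching the 2x1 / 2x2 / 3x3 progression mentioned above:

```python
import math

def grid_dims(n: int):
    """Smallest near-square grid that fits n webcams:
    2 -> 2x1, 3-4 -> 2x2, 5-6 -> 3x2, 7-9 -> 3x3, ..."""
    cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)
    return cols, rows

# These (cols, rows) pairs would drive the xstack layout string.
dims = [grid_dims(n) for n in (2, 4, 9)]
```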
Webcam inputs may lack audio tracks, causing ffmpeg to fail with 'Stream specifier :a matches no streams'. Since _merge_audio_tracks handles all audio separately, the grid step should output video only.
The old scoring picked the conference with the most segments, which favored participants toggling their cameras (many short segments) over the presenter (few long segments). Also had a window-shrinking bug where replacing a long segment with a short higher-ranked one let subsequent segments leak through. New approach: identify the main conference by total recorded duration, keep its segments, and fill gaps from other conferences.
Extracts ADMIN role from userlist events, maps to conference IDs via conference.add events, and passes is_admin flag through the download pipeline. Dedup now prefers ADMIN conferences (the presenter), falling back to total duration when no admin is found. Also fixes download_chunks_parallel to preserve the is_admin flag.
Dedup gap-fill: clamp "other" conference segments to actual gap boundaries instead of using raw file duration, preventing timeline overflow. Compile: skip segments starting before current_time (safety net for overlaps), and truncate segments via -t so they can't overflow into the next segment. Slide composite: scale the webcam proportionally to 320px wide (was fixed 320x180), so portrait webcams render at a usable size instead of being squished.
Grid fix: cell dimensions from integer division could be odd, causing ffmpeg's scale filter to round up and produce dimensions larger than the pad target ("Padded dimensions cannot be smaller than input"). Now forces even dimensions and uses min() to cap scale output.

Audio fix: amix divides volume by the number of inputs at each stage. After 3 levels of mixing (batch → reduce → overlay), audio was attenuated to near-silence (-91 dB). Added volume=N compensation after each amix to restore original loudness.
normalize=0 already prevents amix from dividing by N, so the volume=N multiplier was over-amplifying (~x112 across 3 pipeline stages), turning noise from silent tracks into interference. Also filter out silent audio-only segments (<-80 dB) before mixing so they don't waste processing time or add noise floor.
-80 dB was filtering out participant microphone audio that sits around -80 to -60 dB. Only -91 dB is true digital silence.
With normalize=0 and no volume=N, mixing silent segments with real audio just gives real audio. The filter was incorrectly dropping participant microphone tracks. Removing it simplifies the pipeline and ensures all audio-only segments are included.
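The resulting mix step boils down to one `amix` with `normalize=0` and no compensation. A sketch of the filtergraph string this implies (the output label `[aout]` is an illustrative choice):

```python
def amix_filter(n: int) -> str:
    """Build an amix filtergraph for n inputs. normalize=0 keeps each
    input at unity gain, so no volume=N stage is needed afterwards and
    silent tracks simply contribute nothing."""
    inputs = "".join(f"[{i}:a]" for i in range(n))
    return f"{inputs}amix=inputs={n}:normalize=0[aout]"

# Passed to ffmpeg via -filter_complex for a 3-track batch.
graph = amix_filter(3)
```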
When slides + multiple webcams are present, analyzes audio levels per participant to detect who is talking. Switches the right-side webcam to show the active speaker, defaulting to presenter when nobody else talks. Uses 2s analysis windows with 4s minimum hold to prevent flickering.
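The window-plus-hold logic can be modeled without touching ffmpeg at all. A sketch under stated assumptions: per-window loudness dicts come from prior audio analysis, the function name is hypothetical, and the 2s/4s constants match the description above:

```python
def plan_speaker_track(levels, window_s=2.0, hold_s=4.0, default="presenter"):
    """levels: one dict per analysis window mapping speaker -> mean dBFS.
    Returns the speaker shown in each window, holding every choice for at
    least hold_s seconds so the PIP does not flicker between speakers."""
    chosen = []
    current, since_switch = default, hold_s  # allow an immediate first switch
    for win in levels:
        loudest = max(win, key=win.get) if win else default
        if loudest != current and since_switch >= hold_s:
            current, since_switch = loudest, 0.0
        chosen.append(current)
        since_switch += window_s
    return chosen
```

A speaker who is loudest for a single 2s window cannot displace the current speaker, because the 4s hold has not elapsed yet.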
NVENC + complex overlay filter on 3+ hour videos consumes ~7GB, triggering OOM killer. libx264 uses ~300MB for the same operation. All other encoding steps still use NVENC.
The 720p cap + fast preset still OOM-kills on 3.5h recordings with many segments (e.g. 1197678196: 125 chunks, 23 participants, 34 segments).
Split compositing into 30-min chunks so ffmpeg never holds the full video in memory. This allows using NVENC again (faster) and restores 720p resolution cap. Each chunk is composited independently then concatenated with stream copy.
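Planning those 30-minute chunks is a simple split of the timeline. A sketch (hypothetical helper; each returned window would become one independent ffmpeg compositing run, then all chunks are concatenated with `-c copy`):

```python
def chunk_windows(total_s: float, chunk_s: float = 1800.0):
    """Split [0, total_s) into (start, duration) windows of at most
    chunk_s seconds, so no single ffmpeg run holds the full video."""
    windows, t = [], 0.0
    while t < total_s:
        windows.append((t, min(chunk_s, total_s - t)))
        t += chunk_s
    return windows

# A 3.5h recording becomes 7 half-hour compositing jobs.
wins = chunk_windows(3.5 * 3600)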
_get_video_encoder_fast() set _NVENC_AVAILABLE directly, bypassing _detect_gpu(). This left _CUDA_OVERLAY_AVAILABLE as None, so compositing always used CPU overlay even with CUDA support available.
Each participant's audio segments are analyzed by independent ffmpeg calls. Running 4 in parallel instead of sequentially speeds up speaker detection ~4x on multi-core systems.
Speaker switching can produce segments whose combined duration exceeds the original recording. Cap the concat at total_duration from the API to ensure the output matches the expected length.
@cyberb Thank you for the excellent work! @motattack, you can accept the pull request after finishing testing the work in Docker.
Grid segments could start late (e.g. 1762s) but concat placed them at 0s, causing video/audio desync. Now inserts leading and internal gaps in the manifest so the video timeline matches total_duration exactly. Validated: planned duration equals API duration with zero timeline gaps.
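The gap-insertion logic can be sketched as a small planner over (start, duration) segments. Names are hypothetical; each "gap" entry would become a black/silent filler in the concat manifest:

```python
def plan_timeline(segments, total_duration):
    """segments: sorted (start, duration) pairs. Return ('gap'|'seg', duration)
    entries whose durations sum exactly to total_duration, inserting leading,
    internal, and trailing gaps so concat keeps video and audio in sync."""
    plan, t = [], 0.0
    for start, dur in segments:
        if start > t:
            plan.append(("gap", start - t))  # leading or internal gap
            t = start
        plan.append(("seg", dur))
        t += dur
    if t < total_duration:
        plan.append(("gap", total_duration - t))  # trailing gap
    return plan

# A grid segment starting at 1762s gets a 1762s leading gap, not placed at 0s.
plan = plan_timeline([(1762, 100)], 2000)
```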
Sorry, it started as a simple improvement, then I tried it on various videos: some videos had no sound, some no video. Then it was fixing, fixing, fixing. It now downloads the videos I needed, but testing was a pain: I had to wait for hours, check the videos, fix, and repeat.
Security note: command injection via filenames. Since we're passing arguments to ffmpeg via subprocess, we should ensure that filenames containing special characters (`;`, `|`, `$()`, backticks) are not interpreted as shell commands. ✅ The current implementation looks safe — all calls use subprocess.run(cmd: List[str]) without shell=True, so arguments remain as literal strings passed to ffmpeg, not parsed by the shell. Just double-checking: no os.system() or shell=True sneaked into any of the 46 commits, right? 😄
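The property being checked can be demonstrated in a few lines. This sketch uses the Python interpreter as a stand-in child process (so it runs anywhere); with list-form argv and no `shell=True`, a metacharacter-laden filename arrives at the child verbatim:

```python
import subprocess
import sys

# A filename full of shell metacharacters. Because the command is a list
# and shell=True is never used, no shell ever parses this string.
tricky = "clip;$(rm -rf /);`id`.mp4"
out = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", tricky],
    capture_output=True, text=True,
).stdout.strip()
assert out == tricky  # delivered literally; nothing was executed
```

The same holds when the first element is `ffmpeg` or `ffprobe`: the filename is just argv data.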
Architecture & I/O review:
- Responsibility split: processor.py is quite heavy (~800 LOC). Consider splitting audio / grid / manifest logic into separate modules in a future refactoring.
Grid composite only keeps audio from input 0, losing all other webcam voices. Now extracts audio from all video files with audio streams for mixing in _merge_audio_tracks, same as dedup path.
Architecture suggestion for future refactoring

First of all, thank you so much for the incredible amount of work you've put into this PR. The performance gains (14+ hours → 70 minutes) are mind-blowing, and the attention to edge cases — from OOM handling to speaker detection, from GPU overlay to audio tree reduction — shows an amazing level of dedication and skill. Seriously impressive stuff.

One observation: 💡 for a future iteration, consider applying the Strategy pattern. This would make the code much easier to test (unit tests per strategy), extend with new scenarios, and maintain without touching the monolith. Not a blocker for this PR at all — just a thought for when you're ready to split this beast apart! Thanks again for this massive contribution.
Could you hold off a bit? I am still finding some sound issues in grid mode. I have 30 links to download, but it is getting better now. A day or two and I will let you know.
Split monolithic processor.py (1915 lines) into 6 focused classes: - FFmpegRunner: ffmpeg execution, GPU detection, encoder selection - MediaProber: file probing, duration, streams, audio levels - GridCompositor: multi-webcam grid layout - SlideCompositor: presentation slide overlay - AudioMerger: batched audio mixing with tree-reduce - SegmentBuilder: normalize, gaps, dedup, admin detection VideoProcessor composes all classes via constructor injection. No static methods, no underscore prefixes on public methods. processor.py is now a thin orchestrator with backward-compatible module-level functions. 33 tests across 5 test files, all passing. Added requirements-dev.txt with pytest.
@cyberb Awesome work on the refactoring and adding 33 tests! The class decomposition with dependency injection is perfect for testing. Two suggestions for future iterations:
If the current tests mock FFmpegRunner and MediaProber (which makes sense for fast unit tests), consider adding a few integration tests with Testcontainers to verify that complex ffmpeg filter graphs work with real binaries. Benefits:
Trade-off: slower — can be marked @pytest.mark.integration and run separately.
Adding pytest-cov would help track which parts of the pipeline are well-tested vs. untested. Example setup: `pytest --cov=mtslinker --cov-report=term --cov-report=html`. This gives visibility into coverage for audio mixing, grid composition, slide overlay, and edge case handling. Both are just ideas for the roadmap — not blockers for this PR. Thanks again for the massive effort on this!
GitHub Actions integration suggestion

I see you added requirements-dev.txt — great first step toward CI. Here's a complete setup you could add in a future PR if you want automated testing on every push. Create `.github/workflows/ci.yml` with:

```yaml
name: CI

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - name: Install ffmpeg
        run: sudo apt-get update && sudo apt-get install -y ffmpeg
      - name: Install dependencies
        run: |
          pip install --upgrade pip
          pip install -r requirements.txt
          pip install -r requirements-dev.txt
      - name: Run tests with coverage
        run: pytest --cov=mtslinker --cov-report=xml --cov-report=term
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage.xml
```

Not a blocker — just a template for when you're ready to automate the test suite.
Grid composite already includes audio from input 0 (sorted so audio-bearing webcam is first). Extracting the same webcam audio again for _merge_audio_tracks caused the voice to play twice. Audio-only tracks already cover other participants.
@cyberb Let's wrap this up — amazing work! I think we should cap this PR at the current functionality and move any remaining edge cases to follow-up PRs or issues. This will let us merge the massive performance improvements now rather than chasing the last 1% of fringe scenarios indefinitely. Here's a summary of what's been accomplished:

Performance
Architecture
Edge cases handled
Testing
Security
What's left (can be follow-up issues/PRs)
This PR is already a massive win. Let's merge it and iterate on the rest in smaller, focused PRs. 🚀
Grid input 0 already has audio — but other webcams' audio was lost. Now extracts audio from all webcam files EXCEPT the one used as input 0 in each grid segment. This captures all voices without echo.

Added test_audio_pipeline.py with integration tests:
- Grid audio no echo (same source not duplicated)
- Grid takes audio from first input
- Audio merge preserves timing with adelay
- Segment duration matches plan
- Black segments have silent audio stream
@cyberb Have you looked at the actual MTS Link player JavaScript code in the browser? I'm wondering if we could simplify (or even eliminate) most of the complex reconstruction logic by understanding how the official client does it. The browser player must have a source of truth for:
If we can find where the player builds its internal playlist, we could replicate that logic instead of heuristically fixing:
Grid input 0 already has audio — other webcams need extraction. Tracks which path is input 0 per grid segment and excludes only those. Added test_audio_pipeline.py: echo, timing, duration, silence tests.
Replace all guesswork (overlap detection, dedup, speaker switching) with StreamTimeline that builds playback windows from API mediasession events — matching exactly what the MTS-Link web player does.
- Add StreamTimeline class with dataclasses (MediaSession, TimeWindow, GridSource, AudioTrack, DownloadChunk, SlideEvent)
- GridCompositor now mixes all audio streams inline via amix
- Remove dedup strategy, overlap heuristics, webcam audio extraction
- Remove dead code: deduplicate, extract_admin_conf_ids, is_valid, is_silent, analyze_audio_levels, legacy compat wrappers
- Fix .gitignore (was too broad, ignored tests/)
- 49 tests passing
yokidjo left a comment:
@cyberb StreamTimeline approach looks great. Code is cleaner, logic matches the actual player, tests pass.
@motattack LGTM. Ready for merge.
Audio-only streams were downloaded as raw binary from the storage URL, which returns valid MP4 containers with silent audio (-91 dB). The real audio lives in the HLS playlist variants. Now tries HLS first for all streams (video and audio-only), falling back to direct download only if HLS is unavailable.
The variable was removed in the mediasession rewrite (9a8a657) but the logging line still referenced it. Strategy is now always 'timeline'.
When multiple streams are active:
- Screenshare → main area, admin → PIP overlay
- Admin (no screenshare) → main area, participant → PIP overlay
- No admin/screenshare → fall back to grid

Also fix grid xstack to always output the exact target resolution, preventing concat corruption from mismatched segment sizes.
Split GridCompositor into smaller classes:
- GridLayout: xstack grid compositing
- PresenterLayout: main + PIP overlay compositing
- GridCompositor: backward-compatible facade delegating to both
- _build_audio_filter: shared audio mixing helper
- _even: shared utility

Add 12 new tests covering presenter layout (main-only, PIP, extra audio, resolution consistency), audio filter builder, grid resolution matching, and facade backward compat. 61 tests passing.
The final amix step assumed the video always has an audio stream and that it matches the mixed audio's 44100 Hz sample rate. HLS sources come in at 48000 Hz, causing "Invalid argument" in the filter graph. Now checks for audio presence and resamples before mixing.
The old heuristic (has_video && !has_audio = screenshare) was wrong — it matched webcams with muted mics. The API provides explicit stream.screensharing data on mediasession.add events. Now uses that to correctly identify screen share streams for presenter layout.
Summary

- Replace MoviePy video processing with direct ffmpeg calls, using `-c copy` for near-instant concatenation
- `httpx` and `tqdm` as Python deps; ffmpeg is the only new system requirement
- Audio-only tracks merged via the `amix` filter with proper delay offsets

What changed

- `downloader.py`: added `download_chunks_parallel()` using ThreadPoolExecutor; chunk size 8KB → 1MB
- `processor.py`, `webinar.py`: MoviePy-based processing replaced with direct ffmpeg calls
- `requirements.txt`: removed `moviepy`
- `setup.py`: removed the `moviepy` dep, bumped version to 2.0.0
- `Dockerfile`: added `ffmpeg` package installation

Performance

Tested on a real 3-hour webinar (139 chunks, 106 video + 33 audio-only segments); concatenation runs with `-c copy` (stream copy). For recordings without audio-only tracks, the improvement is even larger since the audio mixing step (the slowest remaining part) is skipped entirely.

Requirements

- `ffmpeg` and `ffprobe` must be installed (`apt install ffmpeg` / `brew install ffmpeg`)

Fixes #8

Test plan