fix: handle corrupted/partial video files (closes #12)#21

Open
menobass wants to merge 1 commit into main from fix/corrupted-video-handling

Conversation

Collaborator

@menobass menobass commented Mar 26, 2026

Summary

  • Adds pre-encoding file integrity validation using ffmpeg -v error -i file -f null -
  • Detects corruption patterns: Invalid NAL units, partial files, missing moov atoms, truncated headers, frame decoding errors
  • Fatal corruption (no moov atom, unplayable) fails immediately with clear error messages
  • Recoverable corruption triggers error-tolerant ffmpeg flags (-err_detect ignore_err, -fflags +discardcorrupt+genpts) that salvage valid video portions
  • Adds corruption-specific error messages in the encoding worker (exit code 183, decode errors, missing PPS/SPS)
  • Adds post-encoding output validation (no segments = fail, empty playlist = fail)
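
The detection step described above can be sketched as a pure function over ffmpeg's stderr. This is a minimal illustration, not the PR's actual `validateFileIntegrity()`: the `IntegrityResult` shape, the function name, and the choice to treat only a missing moov atom as fatal are assumptions, and the patterns are limited to messages quoted in this PR.

```typescript
// Hypothetical result shape; the PR's real return type is not shown here.
interface IntegrityResult {
  corrupted: boolean;
  fatal: boolean;
  errors: string[];
}

// Patterns taken from the messages quoted in this PR's test results.
// Treating only a missing moov atom as fatal is an assumption.
const FATAL_PATTERNS: RegExp[] = [/moov atom not found/i];
const RECOVERABLE_PATTERNS: RegExp[] = [/Invalid NAL unit/i, /partial file/i];

// Classify each non-empty stderr line as fatal, recoverable, or unrecognized.
function classifyFfmpegStderr(stderr: string): IntegrityResult {
  const errors: string[] = [];
  let fatal = false;
  for (const line of stderr.split(/\r?\n/)) {
    const text = line.trim();
    if (text === "") continue;
    if (FATAL_PATTERNS.some((p) => p.test(text))) {
      fatal = true;
      errors.push(text);
    } else if (RECOVERABLE_PATTERNS.some((p) => p.test(text))) {
      errors.push(text);
    }
  }
  return { corrupted: errors.length > 0, fatal, errors };
}
```

Keeping the classification separate from the process spawn makes the pattern list easy to unit test against captured stderr samples.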

Files changed

| File | Change |
| --- | --- |
| src/services/VideoProcessor.ts | New validateFileIntegrity() method, wired into processVideo(), corruption flags in determineEncodingStrategy() |
| src/workers/VideoEncodingWorker.ts | Corruption error messages + output validation |

Test results

Tested with generated video files:

| Scenario | Detection | Recovery |
| --- | --- | --- |
| Valid file | No errors (clean pass) | N/A |
| 50% truncated (no moov) | moov atom not found (fatal) | Fails fast with clear message |
| 80% truncated (faststart, moov intact) | Invalid NAL unit, partial file | Error-tolerant flags produce 748KB recovered output |
| Last 1KB missing | End of file, error reading header | Detected as corruption |
| Zero-byte file | moov atom not found (fatal) | Fails fast with clear message |
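
The PR does not say how the test files were generated. One possible approach, sketched here purely as an assumption, is to cut bytes from a known-good file; both helper names are hypothetical.

```typescript
// Hypothetical fixture helper: keep only the leading `fraction` of the bytes.
function truncateBytes(data: Uint8Array, fraction: number): Uint8Array {
  if (fraction < 0 || fraction > 1) {
    throw new RangeError("fraction must be between 0 and 1");
  }
  const keep = Math.floor(data.length * fraction);
  return data.slice(0, keep);
}

// Hypothetical fixture helper: drop the last `bytes` bytes (e.g. 1024 for 1KB).
function dropTail(data: Uint8Array, bytes: number): Uint8Array {
  return data.slice(0, Math.max(0, data.length - bytes));
}
```

Whether a truncated MP4 still has its moov atom depends on where the atom sits in the file, which is presumably why the table distinguishes the faststart case.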

Test plan

  • npm run build compiles clean
  • Valid files pass through with no behavior change
  • Truncated files with missing moov atom fail fast with clear error
  • Partially corrupted files (moov intact) encode successfully with error-tolerant flags
  • Worker error messages include corruption-specific details

Closes #12

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced video file integrity validation to detect and handle corrupted or damaged video files during processing.
    • Improved error detection and reporting for video encoding failures, providing clearer diagnostics when issues occur.
    • Added post-processing verification to ensure video output files are valid and complete before marking jobs as successful.

Add pre-encoding file integrity validation that detects corruption
patterns (NAL errors, partial files, missing moov atoms, truncation).
Fatal corruption fails fast with clear messages. Recoverable corruption
triggers error-tolerant ffmpeg flags (-err_detect ignore_err,
-fflags +discardcorrupt+genpts) that salvage valid portions.
Also adds corruption-specific error messages in the worker and
post-encoding output validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai bot commented Mar 26, 2026

📝 Walkthrough


The changes implement comprehensive file corruption detection and error-tolerant encoding for video processing. A pre-validation step after download detects corruption using FFmpeg error detection, logs findings, and applies recovery flags for non-fatal issues. Post-encoding validation verifies output integrity with segment checks.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| File Corruption Detection & Recovery: src/services/VideoProcessor.ts | Added validateFileIntegrity() method that runs FFmpeg error detection, parses stderr for corruption/truncation/bitstream patterns, and returns corruption status. Updated processVideo() to validate immediately after download; fatal errors throw with details, non-fatal errors merge as file_corruption warnings into probe results. Extended determineEncodingStrategy() to detect corruption issues and apply error-tolerant input flags (-err_detect ignore_err, -fflags +discardcorrupt+genpts). |
| Post-Encoding Integrity Validation: src/workers/VideoEncodingWorker.ts | Added post-FFmpeg completion checks to verify at least one .ts segment exists and output playlist is non-empty. Expanded error handler with corruption-focused pattern matching (invalid NAL units, missing moov atoms, decode failures, missing H.264 SPS/PPS) and detection of FFmpeg exit code 183. |
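
The post-encoding checks summarized above (at least one .ts segment, non-empty playlist) could look roughly like the following. This is a sketch under assumptions: the function name, its inputs, and the error strings are hypothetical, and the worker's real code presumably reads the output directory itself.

```typescript
// Returns an error message, or null when the encoded output looks usable.
// `fileNames` is the listing of the output directory; `playlistText` is the
// content of the generated playlist file.
function validateHlsOutput(fileNames: string[], playlistText: string): string | null {
  const hasSegment = fileNames.some((name) => name.toLowerCase().endsWith(".ts"));
  if (!hasSegment) {
    return "Encoding produced no .ts segments";
  }
  if (playlistText.trim().length === 0) {
    return "Output playlist is empty";
  }
  return null;
}
```

A caller would treat any non-null return value as a failed job rather than marking the encode successful.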

Sequence Diagram

sequenceDiagram
    participant Client
    participant VideoProcessor
    participant FFmpeg
    participant FileSystem
    participant VideoEncodingWorker
    
    Client->>VideoProcessor: processVideo(jobData)
    VideoProcessor->>FileSystem: Download source video
    FileSystem-->>VideoProcessor: Source file ready
    
    VideoProcessor->>VideoProcessor: validateFileIntegrity()
    activate VideoProcessor
    VideoProcessor->>FFmpeg: Run error detection pass
    FFmpeg->>FFmpeg: Parse for corruption patterns
    FFmpeg-->>VideoProcessor: Corruption results
    deactivate VideoProcessor
    
    alt Fatal Corruption
        VideoProcessor->>Client: Throw error with details
    else Non-Fatal Corruption
        VideoProcessor->>VideoProcessor: Merge into probe.issues
        VideoProcessor->>VideoProcessor: determineEncodingStrategy()
        VideoProcessor->>VideoProcessor: Add error-tolerant flags
    end
    
    VideoProcessor->>VideoEncodingWorker: Start encoding with strategy
    VideoEncodingWorker->>FFmpeg: Execute encoding
    FFmpeg->>FileSystem: Write segments (.ts files)
    FFmpeg->>FileSystem: Write output playlist
    FFmpeg-->>VideoEncodingWorker: Process complete
    
    activate VideoEncodingWorker
    VideoEncodingWorker->>FileSystem: Verify segment existence
    VideoEncodingWorker->>FileSystem: Verify playlist non-empty
    deactivate VideoEncodingWorker
    
    alt Validation Passes
        VideoEncodingWorker->>Client: Return success
    else Validation Fails
        VideoEncodingWorker->>Client: Throw integrity error
    end
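
The fatal versus non-fatal branch in the diagram can be sketched as two small functions. The type and function names are hypothetical; only the `file_corruption` issue type and the four flag strings come from this PR.

```typescript
// Hypothetical shapes; the PR's real types are not shown in this conversation.
interface IntegrityResult {
  corrupted: boolean;
  fatal: boolean;
  errors: string[];
}
type ProbeIssue = { type: string; message: string };

// Fatal corruption aborts the job; recoverable corruption becomes a probe issue.
function mergeIntegrityResult(result: IntegrityResult, issues: ProbeIssue[]): ProbeIssue[] {
  if (result.fatal) {
    throw new Error(`Source file is unrecoverably corrupted: ${result.errors.join("; ")}`);
  }
  if (!result.corrupted) {
    return issues;
  }
  return [...issues, { type: "file_corruption", message: result.errors.join("; ") }];
}

// The strategy step adds the error-tolerant input flags only when flagged.
function corruptionInputOptions(issues: ProbeIssue[]): string[] {
  return issues.some((issue) => issue.type === "file_corruption")
    ? ["-err_detect", "ignore_err", "-fflags", "+discardcorrupt+genpts"]
    : [];
}
```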

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

The changes introduce new validation logic with pattern matching and error detection across two files. The implementation involves straightforward FFmpeg stderr parsing and flag application, but requires careful verification of pattern matching correctness, error handling flow, and the interaction between pre-validation and post-validation checks.

Poem

🐰 A rabbit hops through corrupted streams,
With ffmpeg tools and recovery dreams,
We validate, detect, and gently mend,
Lost frames return, corruption's end! 🎬✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: handling corrupted/partial video files, which aligns with the primary objective of the pull request. |
| Linked Issues check | ✅ Passed | The PR implements pre-validation integrity checking, fatal vs recoverable handling, error-tolerant encoding flags, and post-encoding output validation; these align with issue #12's core requirements for detecting and handling corrupted files. |
| Out of Scope Changes check | ✅ Passed | All changes in VideoProcessor.ts and VideoEncodingWorker.ts are directly focused on file integrity validation and corruption handling, with no unrelated modifications detected. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/services/VideoProcessor.ts`:
- Around line 435-438: The ffmpeg integrity check spawned with execFile (using
ffmpegPath and filePath) treats an ETIMEDOUT as a numeric exit code string and
uses a confusing message and a hardcoded 120000ms timeout that may be too short
for large/long videos; update the Promise callback in VideoProcessor to (1)
detect timeout explicitly (error && error.code === 'ETIMEDOUT' or error.killed)
and produce a clear message like "ffmpeg integrity check timed out" (include
filePath for context), (2) make the timeout configurable (e.g., use a parameter
or a constant like ffmpegValidateTimeout) or derive it from file size/duration
rather than a fixed 120000, and (3) ensure non-timeout ffmpeg failures still
surface their numeric exit code or stderr in the error message so execFile's
callback returns an informative error for both timeout and other failures.
- Around line 781-787: The current hasCorruption branch pushes FFmpeg flags as
separate array elements which is fragile; update the block that checks
probe.issues (the hasCorruption variable) to push each option as a single
combined string into strategy.inputOptions (e.g., combine flag and value into
one element for the err_detect option and keep -fflags with its combined value)
so that the options are atomic when later iterated; reference the hasCorruption
check and the strategy.inputOptions array when making this change.

In `@src/workers/VideoEncodingWorker.ts`:
- Around line 336-339: The current check uses errorMsg.includes('183') which is
too broad and yields false positives; update the condition in the FFmpeg error
handling (the block referencing errorMsg and errorDetails where you append
"FFmpeg exit code 183...") to match only explicit exit/code phrases using a
stricter pattern (e.g., test for whole-word sequences like "exit code 183" or
"code 183" via a regex such as a word-boundary match) so frame numbers or
timestamps like "1830" won't trigger the corruption message; keep the existing
append logic to errorDetails but only run it when the refined match against
errorMsg succeeds.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 04d918b1-2a24-45f2-b1f0-758429d7b0f5

📥 Commits

Reviewing files that changed from the base of the PR and between 7bfd100 and e5fd6f3.

📒 Files selected for processing (2)
  • src/services/VideoProcessor.ts
  • src/workers/VideoEncodingWorker.ts

Comment on lines +435 to +438
return new Promise((resolve) => {
  const proc = execFile(ffmpegPath, ['-v', 'error', '-i', filePath, '-f', 'null', '-'], {
    timeout: 120000,
  }, (error, _stdout, stderr) => {

🧹 Nitpick | 🔵 Trivial

Timeout handling produces confusing message; 120s may be insufficient for large files.

When timeout occurs, error.code is 'ETIMEDOUT' (string), resulting in the message "ffmpeg integrity check exited with code ETIMEDOUT" — which isn't user-friendly.

Additionally, for 2+ hour videos mentioned in PR objectives, a 120-second validation timeout may be too short when reading every frame.

♻️ Suggested improvements
 return new Promise((resolve) => {
-  const proc = execFile(ffmpegPath, ['-v', 'error', '-i', filePath, '-f', 'null', '-'], {
-    timeout: 120000,
+  execFile(ffmpegPath, ['-v', 'error', '-i', filePath, '-f', 'null', '-'], {
+    timeout: 300000, // 5 minutes for large files
   }, (error, _stdout, stderr) => {
     // ... existing code ...
     
     // Non-zero exit with no recognized patterns — flag as potentially corrupt
-    if (error && !corrupted && !fatal && errors.length === 0) {
-      const exitCode = (error as any).code;
-      if (exitCode) {
-        errors.push(`ffmpeg integrity check exited with code ${exitCode}`);
+    if (error && !corrupted && !fatal && errors.length === 0) {
+      const exitCode = (error as any).code;
+      if (exitCode === 'ETIMEDOUT') {
+        errors.push('Integrity check timed out — file may be too large or unreadable');
+        corrupted = true;
+      } else if (exitCode) {
+        errors.push(`ffmpeg integrity check exited with code ${exitCode}`);
         corrupted = true;
       }
     }

Also applies to: 480-487
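
A sketch of the timeout handling discussed above, written as pure functions so it can be tested without spawning ffmpeg. As far as I know, Node's callback-style `execFile` signals a timeout by killing the child and setting `error.killed`, in which case `error.code` may not be `'ETIMEDOUT'` at all, so the sketch checks both. The interface, function names, minimum timeout, and the 100 ms per media second scaling are all assumptions for illustration.

```typescript
// Subset of the fields Node attaches to the error passed to execFile's callback.
interface ExecFailure {
  code?: number | string | null;
  killed?: boolean;
  signal?: string | null;
}

// Hypothetical floor: never go below the PR's current 120-second timeout.
const MIN_VALIDATE_TIMEOUT_MS = 120000;

// Hypothetical scaling: allow 100 ms of validation time per second of media.
function validateTimeoutMs(durationSeconds: number, msPerMediaSecond = 100): number {
  return Math.max(MIN_VALIDATE_TIMEOUT_MS, Math.ceil(durationSeconds * msPerMediaSecond));
}

// Turn an execFile failure into a message that separates timeouts,
// numeric exit codes, and signal terminations.
function describeIntegrityFailure(error: ExecFailure, filePath: string): string {
  if (error.killed === true || error.code === "ETIMEDOUT") {
    return `ffmpeg integrity check timed out for ${filePath}`;
  }
  if (typeof error.code === "number") {
    return `ffmpeg integrity check exited with code ${error.code} for ${filePath}`;
  }
  if (error.signal) {
    return `ffmpeg integrity check was terminated by ${error.signal} for ${filePath}`;
  }
  return `ffmpeg integrity check failed for ${filePath}`;
}
```

With these defaults a two-hour video (7200 seconds) would get a 12-minute budget, while short clips keep the existing 120 seconds.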


Comment on lines +781 to +787
// 12. 🩹 FILE CORRUPTION: Add error-tolerant decoding flags
const hasCorruption = probe.issues.some(i => i.type === 'file_corruption');
if (hasCorruption) {
  strategy.inputOptions.push('-err_detect', 'ignore_err');
  strategy.inputOptions.push('-fflags', '+discardcorrupt+genpts');
  reasons.push('corruption detected - using error-tolerant decoding');
}

🧹 Nitpick | 🔵 Trivial

Input options pushed as separate elements — works but fragile.

Pushing -err_detect and ignore_err as separate array elements works because the worker iterates and FFmpeg will parse them correctly in sequence. However, this is inconsistent with typical fluent-ffmpeg patterns and could break if iteration order changes.

♻️ More robust approach
 const hasCorruption = probe.issues.some(i => i.type === 'file_corruption');
 if (hasCorruption) {
-  strategy.inputOptions.push('-err_detect', 'ignore_err');
-  strategy.inputOptions.push('-fflags', '+discardcorrupt+genpts');
+  strategy.inputOptions.push('-err_detect ignore_err');
+  strategy.inputOptions.push('-fflags +discardcorrupt+genpts');
   reasons.push('corruption detected - using error-tolerant decoding');
 }
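
How the worker consumes `strategy.inputOptions` is not shown in this conversation, so which form is safer depends on code outside the diff. One way to make the question moot, sketched here with a hypothetical `toArgv` helper, is to normalize both forms into the same argument vector before handing it to FFmpeg.

```typescript
// Hypothetical normalizer: accepts options either as separate elements
// ('-err_detect', 'ignore_err') or combined ('-err_detect ignore_err')
// and returns one flat argv with flag and value as separate entries.
function toArgv(options: string[]): string[] {
  const argv: string[] = [];
  for (const option of options) {
    const trimmed = option.trim();
    if (trimmed === "") continue;
    const space = trimmed.search(/\s/);
    if (trimmed.startsWith("-") && space > 0) {
      // Combined form: split at the first whitespace into flag and value.
      argv.push(trimmed.slice(0, space), trimmed.slice(space).trim());
    } else {
      argv.push(trimmed);
    }
  }
  return argv;
}
```

Only elements that start with `-` are split, so a bare value containing spaces (such as a file path) passes through unchanged.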

Comment on lines +336 to +339
if (errorMsg.includes('183') || errorMsg.includes('exit code 183')) {
  errorDetails = (errorDetails ? errorDetails + ' ' : '') +
    'FFmpeg exit code 183 indicates input file corruption. Try re-downloading the source.';
}

⚠️ Potential issue | 🟡 Minor

Exit code 183 detection pattern is overly broad — risk of false positives.

errorMsg.includes('183') will match any string containing "183", including frame numbers (frame=1830), timestamps (183.5), or other numeric values. This could incorrectly append the corruption message to unrelated errors.

🔧 Suggested fix: use more specific pattern
-if (errorMsg.includes('183') || errorMsg.includes('exit code 183')) {
+if (/\b(exit\s+)?(code\s+)?183\b/i.test(errorMsg) || errorMsg.includes('exited with code 183')) {
   errorDetails = (errorDetails ? errorDetails + ' ' : '') +
     'FFmpeg exit code 183 indicates input file corruption. Try re-downloading the source.';
}
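
One caveat about the suggested pattern: both prefix groups are optional, so a bare 183 bounded by non-word characters (as in `time=183.5`) still matches. A stricter variant, offered as a sketch with a hypothetical helper name, requires the word "code" before the number and rejects trailing digits or decimals.

```typescript
// Requires "code" before 183 and rejects 1830, 183.5, and similar.
const EXIT_CODE_183 = /\bcode\s+183(?!\.?\d)/i;

function mentionsExitCode183(errorMsg: string): boolean {
  return EXIT_CODE_183.test(errorMsg);
}
```

This accepts phrasings such as "exit code 183" and "exited with code 183" while ignoring frame counters and timestamps that merely contain the digits.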

Development

Successfully merging this pull request may close these issues.

3Speak Encoder Issue: Handling Corrupted/Partial Video Files
