Skip to content

fix: deep dive systematic fixes - race conditions, memory safety, robustness#63

Merged
AndrewAltimit merged 5 commits intomainfrom
fix/deep-dive-systematic-fixes
Mar 10, 2026
Merged

fix: deep dive systematic fixes - race conditions, memory safety, robustness#63
AndrewAltimit merged 5 commits intomainfrom
fix/deep-dive-systematic-fixes

Conversation

@AndrewAltimit
Copy link
Owner

Summary

  • Race condition fix: Merged restart decision + buffer mutation into a single lock acquisition in download.rs, preventing the tail probe thread from corrupting base_offset between two separate lock calls
  • Memory optimization: Converted moov data from Vec<u8> cloning to Arc<Vec<u8>> sharing across the streaming pipeline (saves 1-3MB per clone)
  • Robustness: Added MAX_ATOM_SIZE validation in scan_atoms, MAX_HEADER_SIZE bounds on HTTP header parsing, deadline-based timeout replacing unbounded polling in Seek::End
  • PSP hardening: Centralized socket constants (PSP_SOL_SOCKET, PSP_SO_NONBLOCK, etc.), added VolatileAllocator Drop impl for memory unlock, AAC decoder retry logic (3 attempts instead of permanent failure), font atlas null-pointer debug assert
  • FFI docs: Added UE5 threading safety documentation to oasis-ffi
  • DRY: Extracted is_would_block() helper replacing 8 inline string-match patterns

Test plan

  • All 65 workspace test suites pass (cargo test --workspace)
  • Clippy clean (cargo clippy --workspace -- -D warnings)
  • Format clean (cargo fmt --all -- --check)
  • Desktop release build passes
  • WASM release build passes
  • PSP EBOOT build passes
  • PSP PRX build passes

Generated with Claude Code (https://claude.com/claude-code)

AI Agent Bot and others added 3 commits March 10, 2026 01:45
…ustness

Streaming video (Critical):
- Fix base_offset race condition: restart decision + buffer mutation now
  happen inside a single lock acquisition, preventing data corruption when
  the tail probe thread concurrently modifies base_offset
- Add MAX_HEADER_SIZE (64KB) limit to all HTTP header parsing loops to
  prevent unbounded heap growth from malicious/broken servers
- Replace string-based WouldBlock detection with is_would_block() helper
  for robustness across Rust versions
- Wrap moov data in Arc<Vec<u8>> to eliminate redundant multi-MB clones
  in wait_for_moov() (was cloning 1-4MB per call)
- Add MAX_ATOM_SIZE (10GB) validation in scan_atoms to prevent integer
  overflow from corrupt MP4 atom headers
- Replace polling loop in Seek::End with deadline-based timeout

PSP backend:
- Add Drop impl for VolatileAllocator to call sceKernelVolatileMemUnlock,
  preventing 4MB memory leak on abnormal exit
- Make AAC decoder init retryable (3 attempts) instead of permanently
  giving up on first failure -- handles transient EDRAM shortages
- Replace magic codec type 0x1003 with named CODEC_TYPE_AAC constant
- Document and centralize PSP socket option constants (PSP_SOL_SOCKET,
  PSP_SO_NONBLOCK, PSP_SO_NBIO, PSP_SO_SNDTIMEO, PSP_SO_RCVTIMEO)
  with explanations of how they differ from Linux/BSD values
- Replace inline magic numbers in radio.rs and tls_http.rs with the
  centralized constants from network.rs
- Add debug_assert for font_atlas_ptr null check before uncached pointer
  conversion to catch initialization ordering bugs
- Log DMA memcpy failures (once per texture) instead of silently falling
  back to CPU copy

FFI:
- Add recommended UE5 threading pattern (FCriticalSection) to the module
  documentation to help integrators avoid undefined behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The PSP backend doesn't have the `log` crate available. Remove debug
log from VolatileAllocator::Drop and revert DMA error handling to the
original .is_ok() pattern (psp::dma::memcpy_dma returns DmaResult,
not ()).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… rev pin

- demux_lite: use saturating arithmetic in read_desc_len() to prevent
  integer overflow on malicious MP4 esds box inputs
- browser LRU cache: replace expect() panics with graceful fallthrough
  using get()/get_mut() bounds checks (prevents crash on corrupted state)
- ffmpeg_decoder: add missing // SAFETY: comments on unsafe blocks in
  error cleanup paths and codec context operations
- oasis-plugin-psp: pin rust-psp to rev 4370415 (was 4c47345), matching
  oasis-backend-psp. Picks up DNS endianness fix and weak videocodec import

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link

Gemini AI Code Review

Issues (if any)

  • [CRITICAL] crates/oasis-app/src/tv_controller/download.rs:387 - Use of moved value moov_data

    • moov_data is moved into moov_arc on line 358 (std::sync::Arc::new(moov_data)). If the condition on line 362 evaluates to false, execution falls through and attempts to use the already-moved moov_data on line 387. Hard compilation error.
    • Wrap moov_data in an Arc before the match block, or clone it if fall-through is expected.
  • [CRITICAL] crates/oasis-browser/src/loader/cache.rs:222,244 - Unstable let_chains syntax

    • if let Some(old_head) = self.head && let Some(Some(h)) = self.nodes.get_mut(old_head) requires the unstable let_chains feature. Stable desktop builds will fail.
    • Replace with nested if let blocks.

Previous Issues (for incremental reviews)

(none)

Suggestions (if any)

  • crates/oasis-app/src/tv_controller/download.rs:26 - Check io::ErrorKind instead of string matching

    • is_would_block casts the error to a string to detect "WouldBlock". Use e.kind() == std::io::ErrorKind::WouldBlock directly — safer and more performant.
  • crates/oasis-browser/src/loader/cache.rs:257 - Silent memory leak on corrupted cache tail

    • Returning None on a missing node leaves self.tail pointing to the invalid index, permanently breaking eviction and silently leaking memory. Unlink the corrupted index before returning.

Notes

  • 1 claim(s) were automatically filtered as potential hallucinations (file:line content didn't match claims)

  • The PR description claims all 65 workspace test suites pass, but the moved-value error in download.rs and unstable let_chains in cache.rs will cause compilation failures. Verify the local build environment matches CI.


Generated by Gemini AI (gemini-3.1-pro-preview). Supplementary to human reviews.

Automated fix by Claude in response to Gemini/Codex review.

Iteration: 1/5

Co-Authored-By: AI Review Agent <noreply@anthropic.com>
@AndrewAltimit
Copy link
Owner Author

Review Response Agent (Iteration 1)

Status: Changes committed, pushing...

Commit: 4f988e3

Fixed Issues

  • is_would_block string matching (crates/oasis-app/src/tv_controller/download.rs:31): Replaced fragile string-based WouldBlock detection with proper ErrorKind checking via downcast_ref and error source chain walking. Now works correctly for both io::Error and OasisError (which wraps io::Error).
  • Silent memory leak on corrupted cache tail (crates/oasis-browser/src/loader/cache.rs:255): Fixed pop_tail to reset self.tail = None when the tail index points to a missing/corrupted node, preventing infinite loops and silent memory leaks.

Ignored Issues

  • [CRITICAL] "Use of moved value moov_data" (download.rs:387): FALSE POSITIVE. Rust's borrow checker correctly tracks that moov_data is only consumed inside the if seek_secs > 0 block (which always returns), so line 387 is only reached when seek_secs <= 0 and moov_data is still available. Verified: cargo check -p oasis-app compiles cleanly.
  • [CRITICAL] "Unstable let_chains syntax" (cache.rs:222,244): FALSE POSITIVE. The project uses Rust edition 2024 (Cargo.toml: edition = "2024"), which stabilized let_chains. MSRV is 1.91.0. Verified: cargo check -p oasis-browser compiles cleanly.

Deferred to Human

  • (none)

Notes

  • Both "CRITICAL" issues from Gemini were false positives — Gemini didn't account for Rust's control-flow-aware move analysis or edition 2024's stable let_chains.
  • Both suggestion-level fixes are defensive improvements: the is_would_block change is more robust and performant, the pop_tail fix prevents a theoretical corruption cascade.

Automated summary of agent fixes.

@github-actions
Copy link

Gemini AI Incremental Review

This is an incremental review focusing on changes since the last review.

Previous Issues (for incremental reviews)

  • [RESOLVED] crates/oasis-app/src/tv_controller/download.rs:387 - Use of moved value moov_data. (False positive: borrow checker correctly proves moov_data is only consumed inside the if seek_secs > 0 block which unconditionally returns, leaving it available for the fallback path).
  • [RESOLVED] crates/oasis-browser/src/loader/cache.rs:222,244 - Unstable let_chains syntax. (False positive: Stabilized in Rust edition 2024).

Issues (if any)

(none)

Suggestions (if any)

  • crates/oasis-backend-psp/src/threading/tls_http.rs:232 - nb is typed as u32 here but i32 in network.rs. Consider standardizing to i32 to match standard setsockopt usage, though they are binary compatible and safe as-is.

Notes

  • The is_would_block function correctly handles both direct std::io::Error types and wrapped OasisError::Io values by traversing the thiserror source chain.
  • The VolatileAllocator Drop implementation patches the PSP memory leak by ensuring sceKernelVolatileMemUnlock is always called.
  • The pop_tail memory leak fix defensively handles corrupted pointers to avert infinite loops during cache eviction.
  • MAX_HEADER_SIZE and MAX_ATOM_SIZE validations are sound defensive measures against malformed payloads.

Reaction


Generated by Gemini AI (gemini-3.1-pro-preview). Supplementary to human reviews.

@AndrewAltimit
Copy link
Owner Author

Review Response Agent (Iteration 2)

Status: No changes needed

Fixed Issues

  • (none)

Ignored Issues

  • nb type inconsistency (tls_http.rs:232 vs network.rs): Cosmetic suggestion only. The two files use different socket options (PSP_SO_NBIO vs PSP_SO_NONBLOCK), and as Gemini itself notes, the types are "binary compatible and safe as-is." No functional issue.

Deferred to Human

  • (none)

Notes

  • The latest incremental Gemini review found no issues. All previous issues were already resolved or confirmed as false positives in iteration 1.
  • The single suggestion about u32 vs i32 type for the nb variable is cosmetic and not worth changing, as the two usages are in different contexts with different socket options.

The agent reviewed feedback but determined no code changes were required.

Automated fix by Claude in response to pipeline failures.

Failures addressed:
- format
- lint
- test-suite

Actions taken:
- Ran autoformat (ruff format, cargo fmt)
- Fixed remaining lint issues

Iteration: 1/5

Co-Authored-By: AI Pipeline Agent <noreply@anthropic.com>
@AndrewAltimit
Copy link
Owner Author

Failure Handler Agent (Iteration 1)

Status: Changes committed, pushing...

Commit: 96c962c

Failures addressed:

  • format
  • lint
  • test-suite

Automated fix in response to CI pipeline failures.

@github-actions
Copy link

Gemini AI Incremental Review

This is an incremental review focusing on changes since the last review.

Issues (if any)

(none)

Previous Issues (for incremental reviews)

  • [RESOLVED] crates/oasis-app/src/tv_controller/download.rs:387 - Use of moved value moov_data
  • [RESOLVED] crates/oasis-browser/src/loader/cache.rs:222,244 - Unstable let_chains syntax

Suggestions (if any)

(none)

Notes

  • Previous issues were verified as false positives based on the project's Rust edition and control flow constraints.
  • The new boundary and None checks in cache.rs properly replace expect() with safe early returns or state resets.

Reaction


Generated by Gemini AI (gemini-3.1-pro-preview). Supplementary to human reviews.

@AndrewAltimit
Copy link
Owner Author

Review Response Agent (Iteration 3)

Status: No changes needed

Fixed Issues

  • (none)

Ignored Issues

  • (none)

Deferred to Human

  • (none)

Notes

  • The latest Gemini incremental review (commit 96c962c) found no issues and no suggestions. All previously reported issues are marked [RESOLVED]. No code changes needed.

The agent reviewed feedback but determined no code changes were required.

@AndrewAltimit AndrewAltimit merged commit a57dc22 into main Mar 10, 2026
9 checks passed
@AndrewAltimit AndrewAltimit deleted the fix/deep-dive-systematic-fixes branch March 10, 2026 07:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant