Skip to content

Conversation

@bradh
Copy link
Collaborator

@bradh bradh commented Jan 16, 2026

This is from ISO/IEC 23008-12:2025 Section 7.4. Its used in HEIF image sequences.

This fixes the parsing of the several of the old HEIF samples in the MPEG File Format Conformance suite. Without this change, we get OverDecode errors for sgpd.

This is from ISO/IEC 23008-12:2025 Section 7.4. Its used in HEIF
image sequences.

This fixes the parsing of the several of the old HEIF samples in
the MPEG File Format Conformance suite. Without this change, we
get OverDecode errors for `sgpd`.
@bradh
Copy link
Collaborator Author

bradh commented Jan 16, 2026

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Jan 16, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link

coderabbitai bot commented Jan 16, 2026

Walkthrough

The change adds support for the "refs" grouping type in the SGPD box handler. When the grouping_type equals "refs", a DirectReferenceSampleList structure is decoded consisting of a sample ID and a list of direct reference samples (capped at 16 entries). The corresponding encoder handles serialization of this structure. A new enum variant DirectReferenceSampleList(u32, Vec<u32>) was added to AnySampleGroupEntry. Test coverage validates both the new DirectReferenceSampleList decoding/encoding path and a complete SGPD structure example.

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: adding support for refs sample groups in the sgpd atom, which matches the primary purpose of the changeset.
Description check ✅ Passed The description is directly related to the changeset, explaining the ISO standard reference, the HEIF use case, and the specific parsing issue being resolved.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

🧹 Recent nitpick comments
src/moov/trak/mdia/minf/stbl/sgpd.rs (1)

143-150: Enforce (or remove) the implicit 16-entry limit; verify count field width.
Right now decode only caps capacity to 16 but still accepts larger counts, while encode only checks u8 conversion. If the spec caps direct references at 16, please reject >16 on both paths; otherwise consider removing the cap to avoid implying a limit. Also confirm the count field is u8 per ISO/IEC 23008-12:2025 §7.4.

♻️ Proposed fix (if max = 16)
-                let num_direct_reference_samples = u8::decode(buf)? as usize;
-                let mut direct_reference_samples =
-                    Vec::with_capacity(std::cmp::min(num_direct_reference_samples, 16));
+                let num_direct_reference_samples = u8::decode(buf)? as usize;
+                if num_direct_reference_samples > 16 {
+                    return Err(Error::TooLarge(REFS_4CC));
+                }
+                let mut direct_reference_samples =
+                    Vec::with_capacity(num_direct_reference_samples);
-                let num_direct_reference_samples: u8 = direct_reference_samples
-                    .len()
-                    .try_into()
-                    .map_err(|_| Error::TooLarge(REFS_4CC))?;
+                let len = direct_reference_samples.len();
+                if len > 16 {
+                    return Err(Error::TooLarge(REFS_4CC));
+                }
+                let num_direct_reference_samples: u8 = len
+                    .try_into()
+                    .map_err(|_| Error::TooLarge(REFS_4CC))?;

Also applies to: 165-169


📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1676d9a and 6b7d5fd.

📒 Files selected for processing (1)
  • src/moov/trak/mdia/minf/stbl/sgpd.rs
🧰 Additional context used
📓 Path-based instructions (3)
**/*.rs

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.rs: Code must be clippy-clean (cargo clippy --all-targets --all-features -- -D warnings)
Code must be formatted with rustfmt (cargo fmt) and pass formatting checks (cargo fmt -- --check)
Write unit tests inline within modules using #[cfg(test)]

Files:

  • src/moov/trak/mdia/minf/stbl/sgpd.rs
src/**/*.rs

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.rs: Each MP4 atom type must implement the Atom trait with KIND, decode_body(), and encode_body()
Use Header::read_from() to handle large atoms without loading entire contents into memory
All operations should return Result using custom error types defined in error.rs
Use the custom Buf/BufMut traits for contiguous byte slices instead of ad-hoc buffer handling

Files:

  • src/moov/trak/mdia/minf/stbl/sgpd.rs
src/moov/**

📄 CodeRabbit inference engine (CLAUDE.md)

Place movie metadata atom implementations under src/moov/ (and its hierarchy: trak/mdia/minf/stbl/stsd)

Files:

  • src/moov/trak/mdia/minf/stbl/sgpd.rs
🧠 Learnings (1)
📚 Learning: 2025-08-31T04:06:57.364Z
Learnt from: CR
Repo: kixelated/mp4-atom PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-31T04:06:57.364Z
Learning: Applies to src/any.rs : Register new atom types in the Any enum (src/any.rs) to enable flexible decoding

Applied to files:

  • src/moov/trak/mdia/minf/stbl/sgpd.rs
🔇 Additional comments (2)
src/moov/trak/mdia/minf/stbl/sgpd.rs (2)

132-138: Nice addition of REFS_4CC and the new enum variant.
This keeps the grouping type explicit and makes the new entry type easy to pattern-match.


240-278: Great round‑trip coverage for the HEIF C041 sample.
The decode + encode assertion makes the new “refs” path robust.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@bradh
Copy link
Collaborator Author

bradh commented Jan 16, 2026

The change adds support for the "refs" grouping type in the SGPD box handler. When the grouping_type equals "refs", a DirectReferenceSampleList structure is decoded consisting of a sample ID and a list of direct reference samples (capped at 16 entries). The corresponding encoder handles serialization of this structure. A new enum variant DirectReferenceSampleList(u32, Vec<u32>) was added to AnySampleGroupEntry. Test coverage validates both the new DirectReferenceSampleList decoding/encoding path and a complete SGPD structure example.

src/moov/trak/mdia/minf/stbl/sgpd.rs (1)> 143-150: Enforce (or remove) the implicit 16-entry limit; verify count field width.

Right now decode only caps capacity to 16 but still accepts larger counts, while encode only checks u8 conversion. If the spec caps direct references at 16, please reject >16 on both paths; otherwise consider removing the cap to avoid implying a limit. Also confirm the count field is u8 per ISO/IEC 23008-12:2025 §7.4.

Also applies to: 165-169

@coderabbitai can you explain the cap of 16 statement further?

@coderabbitai
Copy link

coderabbitai bot commented Jan 16, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

@bradh: Let me read the relevant code sections to explain this more clearly.


🏁 Script executed:

#!/bin/bash
# Read the sgpd.rs file to see the decode and encode implementations
cat src/moov/trak/mdia/minf/stbl/sgpd.rs

Length of output: 10185


@bradh: Certainly! Let me explain the "cap of 16" observation:

Looking at the decode path (lines 143-150):

let num_direct_reference_samples = u8::decode(buf)? as usize;  // Can be 0-255
let mut direct_reference_samples =
    Vec::with_capacity(std::cmp::min(num_direct_reference_samples, 16));  // Capacity capped at 16
for _ in 0..num_direct_reference_samples {  // But loop runs the full count
    direct_reference_samples.push(u32::decode(buf)?);
}

The code pre-allocates capacity for at most 16 elements (via std::cmp::min(..., 16)), but then the loop actually decodes all num_direct_reference_samples (which can be up to 255 since it's a u8). If there are, say, 100 direct reference samples in the file, the Vec will:

  1. Start with capacity for 16
  2. Decode all 100 items (causing automatic reallocation as needed)

So the "16" is only limiting memory pre-allocation, not the actual number of items decoded.

Looking at the encode path (lines 165-169):

let num_direct_reference_samples: u8 = direct_reference_samples
    .len()
    .try_into()
    .map_err(|_| Error::TooLarge(REFS_4CC))?;

This only checks that the count fits in a u8 (max 255), with no mention of 16.

The inconsistency is that the decode side implies 16 is a special limit (by capping capacity), while the encode side only enforces the u8 limit of 255. My comment was asking whether:

  1. The ISO/IEC 23008-12:2025 spec actually limits direct references to 16 → if so, both decode and encode should reject counts >16
  2. Or if the real limit is 255 (u8 max) → then remove the .min(16) to avoid confusion

Does the spec specify a maximum number of direct reference samples?

@bradh
Copy link
Collaborator Author

bradh commented Jan 16, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain
@bradh: Certainly! Let me explain the "cap of 16" observation:

Looking at the decode path (lines 143-150):

let num_direct_reference_samples = u8::decode(buf)? as usize;  // Can be 0-255
let mut direct_reference_samples =
    Vec::with_capacity(std::cmp::min(num_direct_reference_samples, 16));  // Capacity capped at 16
for _ in 0..num_direct_reference_samples {  // But loop runs the full count
    direct_reference_samples.push(u32::decode(buf)?);
}

The code pre-allocates capacity for at most 16 elements (via std::cmp::min(..., 16)), but then the loop actually decodes all num_direct_reference_samples (which can be up to 255 since it's a u8). If there are, say, 100 direct reference samples in the file, the Vec will:

  1. Start with capacity for 16
  2. Decode all 100 items (causing automatic reallocation as needed)

So the "16" is only limiting memory pre-allocation, not the actual number of items decoded.

Looking at the encode path (lines 165-169):

let num_direct_reference_samples: u8 = direct_reference_samples
    .len()
    .try_into()
    .map_err(|_| Error::TooLarge(REFS_4CC))?;

This only checks that the count fits in a u8 (max 255), with no mention of 16.

The inconsistency is that the decode side implies 16 is a special limit (by capping capacity), while the encode side only enforces the u8 limit of 255. My comment was asking whether:

  1. The ISO/IEC 23008-12:2025 spec actually limits direct references to 16 → if so, both decode and encode should reject counts >16
  2. Or if the real limit is 255 (u8 max) → then remove the .min(16) to avoid confusion

Does the spec specify a maximum number of direct reference samples?

@coderabbitai The limit is 255, the 16 is just to avoid overallocation. Note this pattern in other boxes.

@coderabbitai
Copy link

coderabbitai bot commented Jan 16, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

@bradh: Thank you for the clarification! Let me verify that this pattern is indeed used consistently across the codebase.


🏁 Script executed:

#!/bin/bash
# Search for similar patterns of Vec::with_capacity with a capped allocation
rg -n -A2 -B2 'Vec::with_capacity\(std::cmp::min\(' --type rust

Length of output: 1841


@bradh: Perfect! I've confirmed this pattern is indeed used consistently across the codebase. I found several instances:

  • src/sidx.rs - caps at 128 for references
  • src/mfra/tfra.rs - caps at 128 for entries with an explicit comment: "Don't trust the entry count, just start with a small-ish reservation"
  • src/moov/trak/mdia/minf/stbl/stsd/ftab.rs - caps at 8 for font entries

This is a defensive memory allocation pattern to avoid excessive pre-allocation when parsing untrusted input. The specific cap value (8, 16, or 128) appears to be chosen based on the expected typical size for that particular structure. Thank you for the clarification!


✏️ Learnings added
Learnt from: bradh
URL: https://github.com/kixelated/mp4-atom/pull/106

Timestamp: 2026-01-16T11:02:12.622Z
Learning: In the mp4-atom codebase, when decoding arrays from untrusted input, use `Vec::with_capacity(std::cmp::min(count, N))` to cap initial memory allocation to a reasonable value (e.g., 8, 16, or 128 depending on typical expected size) while still decoding all elements. This prevents excessive memory allocation from malformed files.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@bradh bradh merged commit 96533ac into kixelated:main Jan 16, 2026
1 check passed
@bradh bradh deleted the sgpd_over_decode_2026-01-16 branch January 16, 2026 11:10
@github-actions github-actions bot mentioned this pull request Jan 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants