sgpd: add support for the refs sample groups #106

bradh · 2026-01-16T03:56:07Z

This is from ISO/IEC 23008-12:2025 Section 7.4. Its used in HEIF image sequences.

This fixes the parsing of the several of the old HEIF samples in the MPEG File Format Conformance suite. Without this change, we get OverDecode errors for sgpd.

This is from ISO/IEC 23008-12:2025 Section 7.4. Its used in HEIF image sequences. This fixes the parsing of the several of the old HEIF samples in the MPEG File Format Conformance suite. Without this change, we get OverDecode errors for `sgpd`.

bradh · 2026-01-16T08:06:31Z

@coderabbitai review

coderabbitai · 2026-01-16T08:06:39Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-01-16T08:10:10Z

Walkthrough

The change adds support for the "refs" grouping type in the SGPD box handler. When the grouping_type equals "refs", a DirectReferenceSampleList structure is decoded consisting of a sample ID and a list of direct reference samples (capped at 16 entries). The corresponding encoder handles serialization of this structure. A new enum variant DirectReferenceSampleList(u32, Vec<u32>) was added to AnySampleGroupEntry. Test coverage validates both the new DirectReferenceSampleList decoding/encoding path and a complete SGPD structure example.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: adding support for refs sample groups in the sgpd atom, which matches the primary purpose of the changeset.
Description check	✅ Passed	The description is directly related to the changeset, explaining the ISO standard reference, the HEIF use case, and the specific parsing issue being resolved.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧹 Recent nitpick comments

src/moov/trak/mdia/minf/stbl/sgpd.rs (1)

143-150: Enforce (or remove) the implicit 16-entry limit; verify count field width.
Right now decode only caps capacity to 16 but still accepts larger counts, while encode only checks u8 conversion. If the spec caps direct references at 16, please reject >16 on both paths; otherwise consider removing the cap to avoid implying a limit. Also confirm the count field is u8 per ISO/IEC 23008-12:2025 §7.4.

♻️ Proposed fix (if max = 16)

-                let num_direct_reference_samples = u8::decode(buf)? as usize;
-                let mut direct_reference_samples =
-                    Vec::with_capacity(std::cmp::min(num_direct_reference_samples, 16));
+                let num_direct_reference_samples = u8::decode(buf)? as usize;
+                if num_direct_reference_samples > 16 {
+                    return Err(Error::TooLarge(REFS_4CC));
+                }
+                let mut direct_reference_samples =
+                    Vec::with_capacity(num_direct_reference_samples);

-                let num_direct_reference_samples: u8 = direct_reference_samples
-                    .len()
-                    .try_into()
-                    .map_err(|_| Error::TooLarge(REFS_4CC))?;
+                let len = direct_reference_samples.len();
+                if len > 16 {
+                    return Err(Error::TooLarge(REFS_4CC));
+                }
+                let num_direct_reference_samples: u8 = len
+                    .try_into()
+                    .map_err(|_| Error::TooLarge(REFS_4CC))?;

Also applies to: 165-169

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1676d9a and 6b7d5fd.

📒 Files selected for processing (1)

src/moov/trak/mdia/minf/stbl/sgpd.rs

🧰 Additional context used

📓 Path-based instructions (3)

**/*.rs

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.rs: Code must be clippy-clean (cargo clippy --all-targets --all-features -- -D warnings)
Code must be formatted with rustfmt (cargo fmt) and pass formatting checks (cargo fmt -- --check)
Write unit tests inline within modules using #[cfg(test)]

Files:

src/moov/trak/mdia/minf/stbl/sgpd.rs

src/**/*.rs

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.rs: Each MP4 atom type must implement the Atom trait with KIND, decode_body(), and encode_body()
Use Header::read_from() to handle large atoms without loading entire contents into memory
All operations should return Result using custom error types defined in error.rs
Use the custom Buf/BufMut traits for contiguous byte slices instead of ad-hoc buffer handling

Files:

src/moov/trak/mdia/minf/stbl/sgpd.rs

src/moov/**

📄 CodeRabbit inference engine (CLAUDE.md)

Place movie metadata atom implementations under src/moov/ (and its hierarchy: trak/mdia/minf/stbl/stsd)

Files:

src/moov/trak/mdia/minf/stbl/sgpd.rs

🧠 Learnings (1)

📚 Learning: 2025-08-31T04:06:57.364Z

Learnt from: CR
Repo: kixelated/mp4-atom PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-31T04:06:57.364Z
Learning: Applies to src/any.rs : Register new atom types in the Any enum (src/any.rs) to enable flexible decoding

Applied to files:

src/moov/trak/mdia/minf/stbl/sgpd.rs

🔇 Additional comments (2)

src/moov/trak/mdia/minf/stbl/sgpd.rs (2)

132-138: Nice addition of REFS_4CC and the new enum variant.
This keeps the grouping type explicit and makes the new entry type easy to pattern-match.

240-278: Great round‑trip coverage for the HEIF C041 sample.
The decode + encode assertion makes the new “refs” path robust.

_{✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

bradh · 2026-01-16T10:48:06Z

The change adds support for the "refs" grouping type in the SGPD box handler. When the grouping_type equals "refs", a DirectReferenceSampleList structure is decoded consisting of a sample ID and a list of direct reference samples (capped at 16 entries). The corresponding encoder handles serialization of this structure. A new enum variant DirectReferenceSampleList(u32, Vec<u32>) was added to AnySampleGroupEntry. Test coverage validates both the new DirectReferenceSampleList decoding/encoding path and a complete SGPD structure example.

src/moov/trak/mdia/minf/stbl/sgpd.rs (1)> 143-150: Enforce (or remove) the implicit 16-entry limit; verify count field width.

Right now decode only caps capacity to 16 but still accepts larger counts, while encode only checks u8 conversion. If the spec caps direct references at 16, please reject >16 on both paths; otherwise consider removing the cap to avoid implying a limit. Also confirm the count field is u8 per ISO/IEC 23008-12:2025 §7.4.

Also applies to: 165-169

@coderabbitai can you explain the cap of 16 statement further?

coderabbitai · 2026-01-16T10:48:56Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

@bradh: Let me read the relevant code sections to explain this more clearly.

🏁 Script executed:

#!/bin/bash
# Read the sgpd.rs file to see the decode and encode implementations
cat src/moov/trak/mdia/minf/stbl/sgpd.rs

Length of output: 10185

@bradh: Certainly! Let me explain the "cap of 16" observation:

Looking at the decode path (lines 143-150):

let num_direct_reference_samples = u8::decode(buf)? as usize;  // Can be 0-255
let mut direct_reference_samples =
    Vec::with_capacity(std::cmp::min(num_direct_reference_samples, 16));  // Capacity capped at 16
for _ in 0..num_direct_reference_samples {  // But loop runs the full count
    direct_reference_samples.push(u32::decode(buf)?);
}

The code pre-allocates capacity for at most 16 elements (via std::cmp::min(..., 16)), but then the loop actually decodes all num_direct_reference_samples (which can be up to 255 since it's a u8). If there are, say, 100 direct reference samples in the file, the Vec will:

Start with capacity for 16
Decode all 100 items (causing automatic reallocation as needed)

So the "16" is only limiting memory pre-allocation, not the actual number of items decoded.

Looking at the encode path (lines 165-169):

let num_direct_reference_samples: u8 = direct_reference_samples
    .len()
    .try_into()
    .map_err(|_| Error::TooLarge(REFS_4CC))?;

This only checks that the count fits in a u8 (max 255), with no mention of 16.

The inconsistency is that the decode side implies 16 is a special limit (by capping capacity), while the encode side only enforces the u8 limit of 255. My comment was asking whether:

The ISO/IEC 23008-12:2025 spec actually limits direct references to 16 → if so, both decode and encode should reject counts >16
Or if the real limit is 255 (u8 max) → then remove the .min(16) to avoid confusion

Does the spec specify a maximum number of direct reference samples?

bradh · 2026-01-16T10:54:07Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain
@bradh: Certainly! Let me explain the "cap of 16" observation:

Looking at the decode path (lines 143-150):
let num_direct_reference_samples = u8::decode(buf)? as usize;  // Can be 0-255
let mut direct_reference_samples =
    Vec::with_capacity(std::cmp::min(num_direct_reference_samples, 16));  // Capacity capped at 16
for _ in 0..num_direct_reference_samples {  // But loop runs the full count
    direct_reference_samples.push(u32::decode(buf)?);
}
The code pre-allocates capacity for at most 16 elements (via std::cmp::min(..., 16)), but then the loop actually decodes all num_direct_reference_samples (which can be up to 255 since it's a u8). If there are, say, 100 direct reference samples in the file, the Vec will:

Start with capacity for 16

Decode all 100 items (causing automatic reallocation as needed)

So the "16" is only limiting memory pre-allocation, not the actual number of items decoded.

Looking at the encode path (lines 165-169):
let num_direct_reference_samples: u8 = direct_reference_samples
    .len()
    .try_into()
    .map_err(|_| Error::TooLarge(REFS_4CC))?;
This only checks that the count fits in a u8 (max 255), with no mention of 16.

The inconsistency is that the decode side implies 16 is a special limit (by capping capacity), while the encode side only enforces the u8 limit of 255. My comment was asking whether:

The ISO/IEC 23008-12:2025 spec actually limits direct references to 16 → if so, both decode and encode should reject counts >16

Or if the real limit is 255 (u8 max) → then remove the .min(16) to avoid confusion

Does the spec specify a maximum number of direct reference samples?

@coderabbitai The limit is 255, the 16 is just to avoid overallocation. Note this pattern in other boxes.

coderabbitai · 2026-01-16T11:02:14Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

@bradh: Thank you for the clarification! Let me verify that this pattern is indeed used consistently across the codebase.

🏁 Script executed:

#!/bin/bash
# Search for similar patterns of Vec::with_capacity with a capped allocation
rg -n -A2 -B2 'Vec::with_capacity\(std::cmp::min\(' --type rust

Length of output: 1841

@bradh: Perfect! I've confirmed this pattern is indeed used consistently across the codebase. I found several instances:

src/sidx.rs - caps at 128 for references
src/mfra/tfra.rs - caps at 128 for entries with an explicit comment: "Don't trust the entry count, just start with a small-ish reservation"
src/moov/trak/mdia/minf/stbl/stsd/ftab.rs - caps at 8 for font entries

This is a defensive memory allocation pattern to avoid excessive pre-allocation when parsing untrusted input. The specific cap value (8, 16, or 128) appears to be chosen based on the expected typical size for that particular structure. Thank you for the clarification!

✏️ Learnings added

Learnt from: bradh
URL: https://github.com/kixelated/mp4-atom/pull/106

Timestamp: 2026-01-16T11:02:12.622Z
Learning: In the mp4-atom codebase, when decoding arrays from untrusted input, use `Vec::with_capacity(std::cmp::min(count, N))` to cap initial memory allocation to a reasonable value (e.g., 8, 16, or 128 depending on typical expected size) while still decoding all elements. This prevents excessive memory allocation from malformed files.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

kixelated approved these changes Jan 16, 2026

View reviewed changes

bradh merged commit 96533ac into kixelated:main Jan 16, 2026
1 check passed

bradh deleted the sgpd_over_decode_2026-01-16 branch January 16, 2026 11:10

github-actions bot mentioned this pull request Jan 16, 2026

chore: release v0.10.1 #108

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sgpd: add support for the refs sample groups #106

sgpd: add support for the refs sample groups #106

Uh oh!

bradh commented Jan 16, 2026

Uh oh!

bradh commented Jan 16, 2026

Uh oh!

coderabbitai bot commented Jan 16, 2026

Uh oh!

coderabbitai bot commented Jan 16, 2026

Uh oh!

bradh commented Jan 16, 2026

Uh oh!

coderabbitai bot commented Jan 16, 2026

Uh oh!

bradh commented Jan 16, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Jan 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sgpd: add support for the refs sample groups #106

sgpd: add support for the refs sample groups #106

Uh oh!

Conversation

bradh commented Jan 16, 2026

Uh oh!

bradh commented Jan 16, 2026

Uh oh!

coderabbitai bot commented Jan 16, 2026

Uh oh!

coderabbitai bot commented Jan 16, 2026

Walkthrough

Uh oh!

bradh commented Jan 16, 2026

Uh oh!

coderabbitai bot commented Jan 16, 2026

Uh oh!

bradh commented Jan 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot commented Jan 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

bradh commented Jan 16, 2026 •

edited

Loading