feat(packet): add keyframe detection for VP8, VP9 and AV1 #869

Open
farit2000 wants to merge 1 commit into algesten:main from farit2000:feat/keyframe-detection

@farit2000
Contributor

We're using str0m as an SFU and need to detect keyframes from forwarded RTP packets without fully depacketizing them. This is needed for things like PLI request handling, layer switching decisions, and knowing when a new participant can start decoding.

Added three public functions:

  • detect_vp8_keyframe: parses the VP8 RTP payload descriptor per RFC 7741 (handles all the X/I/L/T/K extension combinations and the 7- or 15-bit PictureID), then checks the P bit in the VP8 payload header
  • detect_vp9_keyframe: checks the P bit in byte 0 of the VP9 RTP descriptor; works in both flexible and non-flexible mode
  • detect_av1_keyframe: checks the N bit (new coded video sequence) in the AV1 RTP aggregation header
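
For readers unfamiliar with the RFC 7741 layout, the VP8 check described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function name `sketch_vp8_is_keyframe` and its signature are hypothetical, and it assumes `payload` is the RTP payload with the RTP header already stripped.

```rust
/// Hypothetical sketch (not the PR's implementation): returns true when
/// `payload` is the first packet of a VP8 key frame.
fn sketch_vp8_is_keyframe(payload: &[u8]) -> bool {
    // Byte 0 of the payload descriptor: X | R | N | S | R | PID(3 bits).
    let Some(&b0) = payload.first() else { return false };
    let has_extension = b0 & 0x80 != 0; // X bit
    let starts_partition = b0 & 0x10 != 0; // S bit
    let partition_index = b0 & 0x07; // PID

    // The VP8 payload header only follows the descriptor in the packet
    // that starts the first partition (S = 1, PID = 0).
    if !starts_partition || partition_index != 0 {
        return false;
    }

    let mut idx = 1;
    if has_extension {
        // Extension byte: I | L | T | K | RSV(4 bits).
        let Some(&ext) = payload.get(idx) else { return false };
        idx += 1;
        if ext & 0x80 != 0 {
            // I: PictureID present; the M bit selects 15 bits (two bytes)
            // instead of 7 bits (one byte).
            let Some(&pic) = payload.get(idx) else { return false };
            idx += if pic & 0x80 != 0 { 2 } else { 1 };
        }
        if ext & 0x40 != 0 {
            idx += 1; // L: TL0PICIDX byte
        }
        if ext & 0x30 != 0 {
            idx += 1; // T and/or K: shared TID/Y/KEYIDX byte
        }
    }

    // First byte of the VP8 payload header; P is the least significant
    // bit and is 0 for key frames.
    match payload.get(idx) {
        Some(&header) => header & 0x01 == 0,
        None => false, // truncated payload
    }
}
```

Truncated payloads fall through to `false` here; whether the PR's functions return `bool`, `Option<bool>` or something else is not stated in this description.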

All three are exported from packet and re-exported from format alongside the existing CodecExtra types.

Tests cover keyframes, interframes, various extension combinations, truncated payloads, and edge cases.

Add public functions to detect keyframes from raw RTP payloads
without fully depacketizing:

- detect_vp8_keyframe: parses VP8 RTP descriptor (RFC 7741),
  checks P bit in payload header
- detect_vp9_keyframe: checks P bit in VP9 RTP descriptor
- detect_av1_keyframe: checks N bit in AV1 aggregation header

Useful for SFU-style forwarding where you need to identify
keyframes for PLI handling or layer switching without decoding.
@algesten
Owner

@farit2000 Is this exposing functions because you don't get enough data in existing output from str0m? Or is it because you want to reuse functions in other contexts?

@xnorpx
Collaborator

xnorpx commented Feb 15, 2026

Maybe these could just be unversioned so users don't have to implement their own versions of this.

Also add H264 and H265 :)

But this is useful indeed.

@farit2000
Contributor Author

> @farit2000 Is this exposing functions because you don't get enough data in existing output from str0m? Or is it because you want to reuse functions in other contexts?

In rtp_mode we forward raw RTP packets without depacketizing, so we don't get CodecExtra. But we still need to know when a keyframe arrives — for PLI handling, to know when a new subscriber can start decoding, and for layer switching decisions. These functions let us check that from the raw payload without running the full depacketizer.
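
To make the "new subscriber" case concrete, here is a rough sketch of how a forwarder might gate packets on such a check. Everything in it is hypothetical: `SubscriberGate` is not a str0m type, and the detector is passed in as a closure so the sketch does not assume the signature of the functions in this PR.

```rust
/// Hypothetical per-subscriber state kept by an SFU; not part of str0m.
struct SubscriberGate {
    waiting_for_keyframe: bool,
}

impl SubscriberGate {
    /// A new subscriber cannot decode until it has seen a keyframe.
    fn new() -> Self {
        Self { waiting_for_keyframe: true }
    }

    /// Returns true if the raw RTP payload should be forwarded to this
    /// subscriber. `is_keyframe` stands in for a codec-specific detector.
    fn on_packet(&mut self, payload: &[u8], is_keyframe: impl Fn(&[u8]) -> bool) -> bool {
        if self.waiting_for_keyframe {
            if !is_keyframe(payload) {
                // Still waiting: drop the packet. A real SFU would
                // typically also request a keyframe (PLI) upstream.
                return false;
            }
            self.waiting_for_keyframe = false;
        }
        true
    }
}
```

Once the first keyframe packet passes, the gate stays open and every later packet is forwarded unchanged.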

@farit2000
Contributor Author

> Maybe these could just be unversioned so users don't have to implement their own versions of this.
>
> Also add H264 and H265 :)
>
> But this is useful indeed.

Good idea, I can add H264 and H265 too. And yeah marking them unversioned makes sense since the detection logic is straightforward and unlikely to change.

@algesten
Owner

@farit2000 It feels like we're doubling up on functionality we already have. We already detect keyframes in the media level API. Do you see any way we could avoid having separate keyframe detector logic for RTP level vs media level?

@farit2000
Contributor Author

> @farit2000 It feels like we're doubling up on functionality we already have. We already detect keyframes in the media level API. Do you see any way we could avoid having separate keyframe detector logic for RTP level vs media level?

The existing keyframe detection in the depacketizers works on accumulated frame data after reassembly. In rtp_mode there's no depacketization - we get individual RTP packets and forward them as-is. These functions work on single RTP packets (checking the P bit in VP8/VP9 descriptor, N bit in AV1 aggregation header), which is fundamentally different from the bitstream-level detection in the depacketizers. We could potentially refactor the depacketizers to call these functions internally, but they serve different layers - RTP header inspection vs bitstream parsing.
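
As an illustration of how small the per-packet checks are for VP9 and AV1, each one reads a single bit of the first payload byte. The function names below are hypothetical and the bodies are a sketch, not the PR's code; the bit positions assume the VP9 descriptor layout I|P|L|F|B|E|V|Z and the AV1 aggregation header layout Z|Y|W(2)|N|reserved(3).

```rust
/// Hypothetical sketch: VP9 byte 0 is I | P | L | F | B | E | V | Z.
/// P = 0 means the picture is not inter-predicted. A production check may
/// add further conditions (for example the B bit or the spatial layer).
fn sketch_vp9_is_keyframe(payload: &[u8]) -> bool {
    match payload.first() {
        Some(&b0) => b0 & 0x40 == 0, // P bit clear
        None => false,               // empty payload
    }
}

/// Hypothetical sketch: the AV1 aggregation header is
/// Z | Y | W(2 bits) | N | reserved(3 bits).
/// N = 1 marks the first packet of a new coded video sequence.
fn sketch_av1_is_keyframe(payload: &[u8]) -> bool {
    match payload.first() {
        Some(&b0) => b0 & 0x08 != 0, // N bit set
        None => false,               // empty payload
    }
}
```

Neither function looks past the first byte, which is what separates this from the bitstream parsing the depacketizers do after reassembly.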

@algesten
Owner

Fair enough. If we can have detectors for all the supported codecs, then let's merge.
