Relate NACK to RTT #754

Open
algesten wants to merge 10 commits into main from fix/rtt-nack

Conversation

@algesten (Owner) commented Nov 21, 2025

Close #744

@lherman-cs (Contributor)

I've been testing this new patch locally with a simulated network close to the LTE network described in #744:

        sudo tc qdisc add dev lo root netem \
                rate 3mbit \
                delay 45ms 35ms distribution pareto \
                loss 0.15%

The test setup:

WHIP Publisher -> SFU -> WHEP Subscriber

Result:

fwiw, the result is inconclusive, maybe due to too many non-deterministic parts in the test. I can't really say one is better than the other. But I noticed that the SFU now reports fewer NACKs to the sender.

The main metrics I looked at were the sender's NACK and PacketLost counts, and the receiver's PacketLost count.

Before:

  • Sender's NACK: ~38 NACKs
  • Sender's PacketLost: ~2
  • Receiver's PacketLost: ~10

After:

  • Sender's NACK: ~20 NACKs
  • Sender's PacketLost: 0
  • Receiver's PacketLost: ~11

Note: I tried to stop the stream at a similar duration, but this could have been done better with a shorter video.

Before

sender:

before-sender

receiver:

before-receiver

After

sender:

after2-sender

receiver:

after2-receiver

@lherman-cs (Contributor)

@algesten just ran another test, with a real network this time and the statistics interval set to 200ms. I noticed that packet loss is higher with this PR, as observed in the remote-inbound-rtp stats. But it looks like it recovers the packets in the next receiver reports. The original NACK reporter seems to be more responsive, as there is less packet loss reported in the receiver reports.

The CPU usage seems higher? I observed CPU spikes on some occasions. Looking at the flamegraph, "do_poll_output" is a bit more expensive, and it seems to create more timer-wheel churn.

My rtc loop looks more or less like this:

let mut current_deadline = tokio::time::Instant::now();
let rtc_timer = tokio::time::sleep_until(current_deadline);
tokio::pin!(rtc_timer);

loop {
    // Drive the Rtc instance; it returns the next timeout deadline, if any.
    let Some(new_deadline) = self.core.poll_rtc() else {
        break;
    };
    if new_deadline != current_deadline {
        rtc_timer.as_mut().reset(new_deadline);
        current_deadline = new_deadline;
    }

    tokio::select! {...}
}
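One way to reduce the timer-wheel churn mentioned above is to skip timer resets when the deadline barely moves. This is a minimal sketch under my own assumptions (the helper name and tolerance are hypothetical, not str0m or tokio API):

```rust
use std::time::{Duration, Instant};

// Hypothetical helper: only reset the sleep timer when the new deadline
// moved by more than `tolerance`. Re-registering a timer on every poll
// for sub-millisecond deadline shifts is what churns the timer wheel.
fn should_reset(current: Instant, new: Instant, tolerance: Duration) -> bool {
    let diff = if new > current { new - current } else { current - new };
    diff > tolerance
}

fn main() {
    let t0 = Instant::now();
    let tol = Duration::from_millis(1);
    // A 300µs shift is within tolerance: keep the existing registration.
    assert!(!should_reset(t0, t0 + Duration::from_micros(300), tol));
    // A 5ms shift is worth a reset.
    assert!(should_reset(t0, t0 + Duration::from_millis(5), tol));
}
```

The trade-off is that the timer may fire up to `tolerance` early or late relative to the exact deadline.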

original nack reporter
nack-original

this PR nack reporter
nack-rtt

@algesten (Owner, Author) commented Dec 3, 2025

@lherman-cs thanks for testing!

There's a bug in this PR that would have caused a constant incorrect NACK reporting. Would you mind testing again?

But, it looks like it recovers the packets in the next receiver reports.

I might be looking at the wrong thing, but to me these graphs look like an improvement. Instead of hovering around a constant 4 packets lost, we are going to 0-1?

Before
image

After
image

The CPU usage seems higher? I observed CPU spikes on some occasions. Looking at the flamegraph, "do_poll_output" is a bit more expensive, and it seems to create more timer-wheel churn.

Yeah, it's doing a lot more now, so CPU would be higher. If we can get the functionality correct, we can look at how to optimize it.

@algesten (Owner, Author) commented Dec 3, 2025

@lherman-cs I also found a bug that meant NACK reports were sent at times they shouldn't be (too soon).

I also fixed the RTT being read from TWCC (like @k0nserv mentioned).

@lherman-cs (Contributor) commented Dec 4, 2025

@lherman-cs thanks for testing!

There's a bug in this PR that would have caused a constant incorrect NACK reporting. Would you mind testing again?

But, it looks like it recovers the packets in the next receiver reports.

I might be looking at the wrong thing, but to me these graphs look like an improvement. Instead of hovering around a constant 4 packets lost, we are going to 0-1?

Before image

After image

The CPU usage seems higher? I observed CPU spikes on some occasions. Looking at the flamegraph, "do_poll_output" is a bit more expensive, and it seems to create more timer-wheel churn.

Yeah, it's doing a lot more now, so CPU would be higher. If we can get the functionality correct, we can look at how to optimize it.

Ah, this is my bad. I forgot to mention that the stats from the original run probably got skewed at ~1:03:30PM, since I started a profiler at that moment.

From my perspective, there are two ideal behaviors that would lead to an improved end-to-end UX (from publisher to subscriber):

  1. Only generate NACK reports when necessary: this reduces network pressure on the publisher, leaving more bandwidth for the actual streams, i.e. higher goodput. It can be observed through a proxy metric, the candidate pair's availableOutgoingBitrate.
  2. No packet loss reported in the remote-inbound-rtp / RTCP receiver reports from the SFU: I treat sustained and temporary packet loss as equal. In either case, packet recovery is probably too slow if the loss makes it into a receiver report, since receiver reports are sent at a ~1 second interval for video, https://github.com/PulseBeamDev/str0m/blob/9b28349eb828410a40b038888efc7b40e00fcd10/src/streams/mod.rs#L35-L36.

I considered the original to perform better because it reports less packet loss overall, which implies a higher success rate in recovering the stream.
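For reference, the RTT-gating idea this PR tests can be sketched as follows (a simplification under my own assumptions, not the actual str0m code): a missing packet is NACKed only after it has been outstanding for at least one RTT, and re-NACKed at most once per RTT, so a retransmission already in flight is not requested again.

```rust
use std::time::{Duration, Instant};

// Hypothetical decision function: is a NACK due for a missing packet?
// `first_missing_at` is when the gap was first observed, `last_nack_at`
// is the most recent NACK for this packet (if any).
fn nack_due(
    first_missing_at: Instant,
    last_nack_at: Option<Instant>,
    rtt: Duration,
    now: Instant,
) -> bool {
    match last_nack_at {
        // Never NACKed: wait one RTT for reordering to resolve itself.
        None => now >= first_missing_at + rtt,
        // Already NACKed: the retransmission needs one RTT to arrive.
        Some(prev) => now >= prev + rtt,
    }
}

fn main() {
    let t0 = Instant::now();
    let rtt = Duration::from_millis(40);
    // Missing since t0, never NACKed, 50ms elapsed: due.
    assert!(nack_due(t0, None, rtt, t0 + Duration::from_millis(50)));
    // NACKed at t0+50ms; at t0+60ms a full RTT has not passed: not due.
    assert!(!nack_due(t0, Some(t0 + Duration::from_millis(50)), rtt, t0 + Duration::from_millis(60)));
}
```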

@lherman-cs (Contributor)

@algesten I still see similar packet loss reports from the latest change here, unfortunately. The RTT graph is very noisy, which makes me think this is related to the RTT calculation. Maybe we need a windowed histogram and take the P50, or even P30?

original

nack3-original

rtt

nack3-rtt

@algesten algesten force-pushed the fix/rtt-nack branch 3 times, most recently from b7a46ae to ee841be on January 23, 2026 at 18:26
The original per-stream nack_at scheduling was causing timing changes
that affected BWE estimation under cellular conditions. This commit
simplifies the approach:

- Keep RTT-aware NACK generation through nack_report() which uses RTT
  to decide when each packet needs a NACK
- Use session-level NACK_MIN_INTERVAL (33ms) for consistent polling
- Add nack_rtt() that caps measured RTT at 50ms to ensure quick
  recovery under high-RTT networks
- Keep per-stream nack_at for efficient poll scheduling (avoids
  iterating active window when no NACKs are due)
- Only update nack_at after generating NACKs, not on every packet
  reception (the latter caused BWE timing issues)

All NACK and BWE tests pass with these changes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
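The nack_rtt() cap described above might look roughly like this (a sketch; only the 50ms constant comes from the commit message, the names are assumptions):

```rust
use std::time::Duration;

// Cap the RTT used for NACK pacing. On high-latency links, pacing NACKs
// at the full measured RTT would delay retransmission requests too much,
// so recovery is scheduled as if the RTT were at most 50ms.
const MAX_NACK_RTT: Duration = Duration::from_millis(50);

fn nack_rtt(measured: Duration) -> Duration {
    measured.min(MAX_NACK_RTT)
}

fn main() {
    // A low measured RTT passes through unchanged.
    assert_eq!(nack_rtt(Duration::from_millis(20)), Duration::from_millis(20));
    // A high measured RTT is clamped to the cap.
    assert_eq!(nack_rtt(Duration::from_millis(120)), Duration::from_millis(50));
}
```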
Vagrant User and others added 2 commits January 23, 2026 19:12
Add EWMA-based RTT smoothing (RFC 6298, alpha=1/8) to the TWCC send
register. This provides a more stable RTT estimate that can be used
by consumers like NACK timing without being affected by RTT spikes.

- Add MovingAverage field to TwccSendRegister
- Update smoothed RTT during apply_report() using max RTT from batch
- Add smoothed_rtt() method to retrieve the smoothed value
- Keep existing rtt() method for raw values

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
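The RFC 6298 smoothing with alpha = 1/8 amounts to srtt = 7/8 * srtt + 1/8 * sample, seeded by the first sample. A minimal sketch of such a MovingAverage, assuming integer Duration arithmetic (not the actual str0m type):

```rust
use std::time::Duration;

// EWMA RTT estimator in the style of RFC 6298 (alpha = 1/8).
struct MovingAverage {
    srtt: Option<Duration>,
}

impl MovingAverage {
    fn update(&mut self, sample: Duration) {
        self.srtt = Some(match self.srtt {
            None => sample,                  // first sample seeds the estimate
            Some(s) => (s * 7 + sample) / 8, // srtt = 7/8 * srtt + 1/8 * sample
        });
    }
}

fn main() {
    let mut avg = MovingAverage { srtt: None };
    avg.update(Duration::from_millis(80));
    assert_eq!(avg.srtt, Some(Duration::from_millis(80)));
    // (80 * 7 + 160) / 8 = 90ms: a 160ms spike only moves the estimate by 10ms.
    avg.update(Duration::from_millis(160));
    assert_eq!(avg.srtt, Some(Duration::from_millis(90)));
}
```

This is what makes the estimate resistant to single RTT spikes: each new sample contributes only 1/8 of its value.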
Instead of maintaining its own RTT smoothing in DelayController, the BWE
now receives the smoothed RTT from the TwccSendRegister. This centralizes
RTT measurement and smoothing in one place.

Changes:
- Add TwccRegisterUpdate wrapper type to hold iterator and smoothed RTT
- Update apply_report to return TwccRegisterUpdate
- Update Bwe::update and SendSideBandwidthEstimator::update to accept
  smoothed_rtt parameter
- Simplify DelayController by removing internal RTT tracking (smoothed_rtt,
  max_rtt_history fields and update_rtt/get_smoothed_rtt methods)
- Store last_smoothed_rtt in DelayController for use in handle_timeout
- Update session to pass smoothed_rtt from TWCC to BWE

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@algesten algesten force-pushed the fix/rtt-nack branch 2 times, most recently from 984e5e5 to d4e146b on January 23, 2026 at 20:02
Use the smoothed RTT from TWCC for NACK timing instead of raw RTT.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The premature eviction check loop was missing an early break condition,
causing it to iterate billions of times when there's a large gap between
active.start and the new start position (e.g., seq jump from 0 to near u64::MAX).

Added the same early break pattern that already existed in the subsequent
reset loop: exit after checking packets.len() entries since the buffer is
circular.

Also fixed the off-by-one in the second loop (>= instead of >).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
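The early-break idea can be illustrated with a stripped-down sketch (not the actual str0m code): bound the scan by the buffer's capacity, since a circular buffer can never hold more live slots than it has entries, no matter how large the sequence-number gap is.

```rust
// Hypothetical helper: how many slots must be inspected when the active
// window moves from `start` to `new_start` in a circular buffer of
// `capacity` slots. Without the `min`, a huge sequence jump (e.g. 0 to
// near u64::MAX) would iterate billions of times.
fn slots_to_check(start: u64, new_start: u64, capacity: u64) -> u64 {
    let gap = new_start.wrapping_sub(start);
    gap.min(capacity) // early break: at most `capacity` slots exist
}

fn main() {
    // Small move: check exactly the gap.
    assert_eq!(slots_to_check(5, 20, 100), 15);
    // Pathological jump: still bounded by the buffer size.
    assert_eq!(slots_to_check(0, u64::MAX - 10, 100), 100);
}
```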
Development

Successfully merging this pull request may close these issues.

NACK reporter is ineffective in wireless network environments
