Relate NACK to RTT #754

Open
algesten wants to merge 10 commits into main from fix/rtt-nack

Conversation

@algesten (Owner) commented Nov 21, 2025

Close #744

@lherman-cs (Contributor)

I've been testing this new patch locally with a simulated network close to the LTE network described in #744:

        sudo tc qdisc add dev lo root netem \
                rate 3mbit \
                delay 45ms 35ms distribution pareto \
                loss 0.15%

The test setup:

WHIP Publisher -> SFU -> WHEP Subscriber

Result:

fwiw, the result is inconclusive, maybe due to too many non-deterministic parts in the test. I can't really say one is better than the other. But I noticed that the SFU now reports fewer NACKs to the sender.

The main metrics I looked at were the sender's NACK and PacketLost counts, and the receiver's PacketLost count.

Before:

  • Sender's NACK: ~38 NACKs
  • Sender's PacketLost: ~2
  • Receiver's PacketLost: ~10

After:

  • Sender's NACK: ~20 NACKs
  • Sender's PacketLost: 0
  • Receiver's PacketLost: ~11

Note: I tried to stop the stream at a similar duration, but this could have been done better with a shorter video.

Before

sender:

before-sender

receiver:

before-receiver

After

sender:

after2-sender

receiver:

after2-receiver

@lherman-cs (Contributor)

@algesten just ran another test, with a real network this time and the statistics interval set to 200ms. I noticed that packet loss is higher with this PR, as observed in the remote-inbound-rtp stats. But it looks like it recovers the packets in the next receiver reports. The original NACK reporter seems to be more responsive, as there is less packet loss reported in the receiver reports.

The CPU usage seems higher? I observed CPU spikes on some occasions. Looking at the flamegraph, "do_poll_output" is a bit more expensive, and it seems to create more timer-wheel churn.

My rtc loop looks more or less like this:

let mut current_deadline = tokio::time::Instant::now();
let rtc_timer = tokio::time::sleep_until(current_deadline);
tokio::pin!(rtc_timer);

loop {
    // Drive the Rtc instance; it returns the next timeout deadline, if any.
    let Some(new_deadline) = self.core.poll_rtc() else {
        break;
    };
    if new_deadline != current_deadline {
        rtc_timer.as_mut().reset(new_deadline);
        current_deadline = new_deadline;
    }

    tokio::select! {...}
}
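One way to reduce the timer-wheel churn mentioned above is to skip timer resets when the deadline barely moves. This is a minimal sketch under my own assumptions (the helper name and tolerance are hypothetical, not str0m or tokio API):

```rust
use std::time::{Duration, Instant};

// Hypothetical helper: only reset the sleep timer when the new deadline
// moved by more than `tolerance`. Re-registering a timer on every poll
// for sub-millisecond deadline shifts is what churns the timer wheel.
fn should_reset(current: Instant, new: Instant, tolerance: Duration) -> bool {
    let diff = if new > current { new - current } else { current - new };
    diff > tolerance
}

fn main() {
    let t0 = Instant::now();
    let tol = Duration::from_millis(1);
    // A 300µs shift is within tolerance: keep the existing registration.
    assert!(!should_reset(t0, t0 + Duration::from_micros(300), tol));
    // A 5ms shift is worth a reset.
    assert!(should_reset(t0, t0 + Duration::from_millis(5), tol));
}
```

The trade-off is that the timer may fire up to `tolerance` early or late relative to the exact deadline.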

original nack reporter
nack-original

this PR nack reporter
nack-rtt

@algesten (Owner, Author) commented Dec 3, 2025

@lherman-cs thanks for testing!

There's a bug in this PR that would have caused a constant incorrect NACK reporting. Would you mind testing again?

But, it looks like it recovers the packets in the next receiver reports.

I might be looking at the wrong thing, but to me these graphs look like an improvement. Instead of hovering around a constant 4 packets lost, we are going to 0-1?

Before
image

After
image

The CPU usage seems higher? I observed CPU spikes on some occasions. Looking at the flamegraph, "do_poll_output" is a bit more expensive, and it seems to create more timer-wheel churn.

Yeah, it's doing a lot more now, so CPU would be higher. If we can get the functionality correct, we can look at how to optimize it.

@algesten (Owner, Author) commented Dec 3, 2025

@lherman-cs I also found a bug that meant NACK reports were sent at times they shouldn't be (too soon).

I also fixed the RTT being read from TWCC (like @k0nserv mentioned).

@lherman-cs (Contributor) commented Dec 4, 2025

@lherman-cs thanks for testing!

There's a bug in this PR that would have caused a constant incorrect NACK reporting. Would you mind testing again?

But, it looks like it recovers the packets in the next receiver reports.

I might be looking at the wrong thing, but to me these graphs look like an improvement. Instead of hovering around a constant 4 packets lost, we are going to 0-1?

Before image

After image

The CPU usage seems higher? I observed CPU spikes on some occasions. Looking at the flamegraph, "do_poll_output" is a bit more expensive, and it seems to create more timer-wheel churn.

Yeah, it's doing a lot more now, so CPU would be higher. If we can get the functionality correct, we can look at how to optimize it.

Ah, this is my bad. I forgot to mention that the stats from the original run probably got skewed at ~1:03:30PM, since I started a profiler at that moment.

From my perspective, there are two ideal behaviors that would lead to an improved end-to-end UX (from publisher to subscriber):

  1. Only generate NACK reports when necessary: this reduces network pressure on the publisher, leaving more bandwidth for the actual streams, i.e. higher goodput. It can be observed through a proxy metric, the candidate pair's availableOutgoingBitrate.
  2. No packet loss reported in the remote-inbound-rtp / RTCP receiver reports from the SFU: I treat sustained and temporary packet loss as equal. In either case, packet recovery is probably too slow if the loss makes it into a receiver report, since receiver reports are sent at a ~1 second interval for video, https://github.com/PulseBeamDev/str0m/blob/9b28349eb828410a40b038888efc7b40e00fcd10/src/streams/mod.rs#L35-L36.

I considered the original to perform better because it reports less packet loss overall, which implies a higher success rate in recovering the stream.
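For reference, the RTT-gating idea this PR tests can be sketched as follows (a simplification under my own assumptions, not the actual str0m code): a missing packet is NACKed only after it has been outstanding for at least one RTT, and re-NACKed at most once per RTT, so a retransmission already in flight is not requested again.

```rust
use std::time::{Duration, Instant};

// Hypothetical decision function: is a NACK due for a missing packet?
// `first_missing_at` is when the gap was first observed, `last_nack_at`
// is the most recent NACK for this packet (if any).
fn nack_due(
    first_missing_at: Instant,
    last_nack_at: Option<Instant>,
    rtt: Duration,
    now: Instant,
) -> bool {
    match last_nack_at {
        // Never NACKed: wait one RTT for reordering to resolve itself.
        None => now >= first_missing_at + rtt,
        // Already NACKed: the retransmission needs one RTT to arrive.
        Some(prev) => now >= prev + rtt,
    }
}

fn main() {
    let t0 = Instant::now();
    let rtt = Duration::from_millis(40);
    // Missing since t0, never NACKed, 50ms elapsed: due.
    assert!(nack_due(t0, None, rtt, t0 + Duration::from_millis(50)));
    // NACKed at t0+50ms; at t0+60ms a full RTT has not passed: not due.
    assert!(!nack_due(t0, Some(t0 + Duration::from_millis(50)), rtt, t0 + Duration::from_millis(60)));
}
```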

@lherman-cs (Contributor)

@algesten I still see similar packet loss reports from the latest change here, unfortunately. The RTT graph is very noisy, which makes me think this is related to the RTT calculation. Maybe we need a windowed histogram and take the P50, or even P30?

original

nack3-original

rtt

nack3-rtt

@algesten algesten force-pushed the fix/rtt-nack branch 3 times, most recently from b7a46ae to ee841be on January 23, 2026 at 18:26
The original per-stream nack_at scheduling was causing timing changes
that affected BWE estimation under cellular conditions. This commit
simplifies the approach:

- Keep RTT-aware NACK generation through nack_report() which uses RTT
  to decide when each packet needs a NACK
- Use session-level NACK_MIN_INTERVAL (33ms) for consistent polling
- Add nack_rtt() that caps measured RTT at 50ms to ensure quick
  recovery under high-RTT networks
- Keep per-stream nack_at for efficient poll scheduling (avoids
  iterating active window when no NACKs are due)
- Only update nack_at after generating NACKs, not on every packet
  reception (the latter caused BWE timing issues)

All NACK and BWE tests pass with these changes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
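The nack_rtt() cap described above might look roughly like this (a sketch; only the 50ms constant comes from the commit message, the names are assumptions):

```rust
use std::time::Duration;

// Cap the RTT used for NACK pacing. On high-latency links, pacing NACKs
// at the full measured RTT would delay retransmission requests too much,
// so recovery is scheduled as if the RTT were at most 50ms.
const MAX_NACK_RTT: Duration = Duration::from_millis(50);

fn nack_rtt(measured: Duration) -> Duration {
    measured.min(MAX_NACK_RTT)
}

fn main() {
    // A low measured RTT passes through unchanged.
    assert_eq!(nack_rtt(Duration::from_millis(20)), Duration::from_millis(20));
    // A high measured RTT is clamped to the cap.
    assert_eq!(nack_rtt(Duration::from_millis(120)), Duration::from_millis(50));
}
```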
Vagrant User and others added 2 commits January 23, 2026 19:12
Add EWMA-based RTT smoothing (RFC 6298, alpha=1/8) to the TWCC send
register. This provides a more stable RTT estimate that can be used
by consumers like NACK timing without being affected by RTT spikes.

- Add MovingAverage field to TwccSendRegister
- Update smoothed RTT during apply_report() using max RTT from batch
- Add smoothed_rtt() method to retrieve the smoothed value
- Keep existing rtt() method for raw values

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
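The RFC 6298 smoothing with alpha = 1/8 amounts to srtt = 7/8 * srtt + 1/8 * sample, seeded by the first sample. A minimal sketch of such a MovingAverage, assuming integer Duration arithmetic (not the actual str0m type):

```rust
use std::time::Duration;

// EWMA RTT estimator in the style of RFC 6298 (alpha = 1/8).
struct MovingAverage {
    srtt: Option<Duration>,
}

impl MovingAverage {
    fn update(&mut self, sample: Duration) {
        self.srtt = Some(match self.srtt {
            None => sample,                  // first sample seeds the estimate
            Some(s) => (s * 7 + sample) / 8, // srtt = 7/8 * srtt + 1/8 * sample
        });
    }
}

fn main() {
    let mut avg = MovingAverage { srtt: None };
    avg.update(Duration::from_millis(80));
    assert_eq!(avg.srtt, Some(Duration::from_millis(80)));
    // (80 * 7 + 160) / 8 = 90ms: a 160ms spike only moves the estimate by 10ms.
    avg.update(Duration::from_millis(160));
    assert_eq!(avg.srtt, Some(Duration::from_millis(90)));
}
```

This is what makes the estimate resistant to single RTT spikes: each new sample contributes only 1/8 of its value.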
Instead of maintaining its own RTT smoothing in DelayController, the BWE
now receives the smoothed RTT from the TwccSendRegister. This centralizes
RTT measurement and smoothing in one place.

Changes:
- Add TwccRegisterUpdate wrapper type to hold iterator and smoothed RTT
- Update apply_report to return TwccRegisterUpdate
- Update Bwe::update and SendSideBandwidthEstimator::update to accept
  smoothed_rtt parameter
- Simplify DelayController by removing internal RTT tracking (smoothed_rtt,
  max_rtt_history fields and update_rtt/get_smoothed_rtt methods)
- Store last_smoothed_rtt in DelayController for use in handle_timeout
- Update session to pass smoothed_rtt from TWCC to BWE

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@algesten algesten force-pushed the fix/rtt-nack branch 2 times, most recently from 984e5e5 to d4e146b on January 23, 2026 at 20:02
Use the smoothed RTT from TWCC for NACK timing instead of raw RTT.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The premature eviction check loop was missing an early break condition,
causing it to iterate billions of times when there's a large gap between
active.start and the new start position (e.g., seq jump from 0 to near u64::MAX).

Added the same early break pattern that already existed in the subsequent
reset loop: exit after checking packets.len() entries since the buffer is
circular.

Also fixed the off-by-one in the second loop (>= instead of >).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
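The early-break idea can be illustrated with a stripped-down sketch (not the actual str0m code): bound the scan by the buffer's capacity, since a circular buffer can never hold more live slots than it has entries, no matter how large the sequence-number gap is.

```rust
// Hypothetical helper: how many slots must be inspected when the active
// window moves from `start` to `new_start` in a circular buffer of
// `capacity` slots. Without the `min`, a huge sequence jump (e.g. 0 to
// near u64::MAX) would iterate billions of times.
fn slots_to_check(start: u64, new_start: u64, capacity: u64) -> u64 {
    let gap = new_start.wrapping_sub(start);
    gap.min(capacity) // early break: at most `capacity` slots exist
}

fn main() {
    // Small move: check exactly the gap.
    assert_eq!(slots_to_check(5, 20, 100), 15);
    // Pathological jump: still bounded by the buffer size.
    assert_eq!(slots_to_check(0, u64::MAX - 10, 100), 100);
}
```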
Development

Successfully merging this pull request may close these issues.

NACK reporter is ineffective in wireless network environments
