Remove drain-on-cancel from TCP receive path #55
Conversation
The 200ms drain_after_cancel polluted time-boxed throughput measurements by continuing to eat peer retransmits after cancel. For a bandwidth tool measuring a fixed duration, RST-on-close is the correct semantic: abort immediately, don't skew stats with tail retransmits. Natural end-of-test still uses shutdown()/FIN (unchanged). Cancel now exits in milliseconds instead of waiting for the drain window, fixing the 'Timed out waiting 2s for N data streams to stop' warning matttbe reported with -P 4 --mptcp + rate limiting. Tested locally: 4-stream cancel at 3s now exits ~6ms after SIGINT with full accurate per-stream summary.
Hi @matttbe — I've pushed a fix here that removes the 200ms drain-on-cancel from the TCP receive path, which should address both the inaccurate time-boxed stats and the "Timed out waiting 2s for N data streams to stop" warning you saw. Local testing shows cancel now exits in ~6ms (was up to 2s) with full per-stream stats preserved. But before merging, would you mind giving it a try against your MPTCP test setup? Specifically interested in:
Build instructions: No rush — I'll leave the PR open until you've had a chance to take a look.
Hi @lance0, Thank you for having already implemented this! I just tried with my old test script, and I still have the issue. It is easy for me to reproduce with a timeout of 1 sec. Did I miss something? I built commit 44b812e after a
I just took a packet trace, and I don't see any RST. Did you explicitly reset the streams? (e.g. with
matttbe reported the 'Timed out waiting 2s for 4 data streams to stop' warning still fires on natural test end (not cancel) with -P 4 --mptcp. Root cause: stream.shutdown().await blocks waiting for all buffered send data to ACK before sending FIN. Under MPTCP + rate limiting, this can easily exceed the 2s join timeout. Per matttbe's earlier feedback: 'RST is correct for timed tests. FIN would let bufferbloated send buffers drain past the requested duration.' Fix: set SO_LINGER=0 on send sockets during configuration, which forces close() to send RST immediately (skipping the drain wait), and remove the redundant shutdown().await calls that were blocking teardown. Applies to both send_data (single socket) and send_data_half (split socket for bidir). Receive side unchanged.
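The SO_LINGER=0 semantics above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the `abortive_close_linger` helper name is hypothetical, and the `set_linger` call shown in the comment assumes a tokio-style `TcpStream` API.

```rust
use std::time::Duration;

// Hypothetical helper: the linger value to apply to send sockets so that
// close() performs an abortive close. SO_LINGER enabled with a zero
// timeout makes the kernel send RST immediately and discard any
// unACK'd data in the send buffer, instead of draining it behind a FIN.
fn abortive_close_linger() -> Option<Duration> {
    // Some(zero)  => SO_LINGER on, l_linger = 0 => RST on close.
    // None        => linger disabled: the default graceful FIN path.
    Some(Duration::from_secs(0))
}

// In a tokio codebase this would be applied during socket configuration,
// e.g.: stream.set_linger(abortive_close_linger())?;
// After that, simply dropping the stream resets the connection at once,
// so the shutdown().await that was blocking teardown can be removed.
```

The key property for a timed test: the close cost becomes constant instead of proportional to how much buffered data is still in flight.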
Great diagnosis — and exactly right. The first commit only removed the receive-side drain, but the actual blocker was
I just pushed commit 54dd1c3 which does exactly what you suggested:
Would you mind giving the updated branch another try? Same build steps, just
Local repro of your exact case (
Codex flagged that removing the drain entirely opens a race: on explicit mid-test cancels (Ctrl+C), the receiver drops the socket immediately while the peer's own cancel_tx may not have fired yet (control plane latency). The peer sees RST on its next write, outside its 250ms teardown grace window, and treats it as a fatal error. Restore a short drain (50ms vs the original 200ms) that covers the same-host cancel latency without reintroducing matttbe's original 2s join_timeout warning. Drained bytes aren't counted in stats, so there is no accuracy impact. This matches the industry-standard pattern (shutdown + drain + close) while keeping the SO_LINGER=0 + bytes_acked correctness fixes for matttbe's actual reported upload-test issue (#54).
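The restored short drain can be sketched with a synchronous, std-only loop. The real code is presumably async (tokio); `drain_after_cancel` here is illustrative, and the drained byte count is returned only for visibility — it is never added to throughput stats.

```rust
use std::io::Read;
use std::net::TcpStream;
use std::time::{Duration, Instant};

// Hypothetical sketch: after cancel, keep reading and discarding peer
// data for a short grace window, so a same-host peer that has not yet
// seen its own cancel signal does not hit an RST mid-write.
fn drain_after_cancel(stream: &mut TcpStream, grace: Duration) -> std::io::Result<u64> {
    let deadline = Instant::now() + grace;
    let mut buf = [0u8; 16 * 1024];
    let mut drained = 0u64;
    loop {
        // Remaining time in the grace window; stop once it has elapsed.
        let left = match deadline.checked_duration_since(Instant::now()) {
            Some(d) if !d.is_zero() => d,
            _ => break,
        };
        stream.set_read_timeout(Some(left))?;
        match stream.read(&mut buf) {
            Ok(0) => break,               // peer sent FIN: nothing left to drain
            Ok(n) => drained += n as u64, // discarded; NOT counted in stats
            Err(e) if e.kind() == std::io::ErrorKind::WouldBlock
                || e.kind() == std::io::ErrorKind::TimedOut => break, // window expired
            Err(e) => return Err(e),
        }
    }
    Ok(drained)
}
```

The window only needs to cover cancel-signal latency between the two endpoints, which is why a 50ms value can replace the original 200ms without changing the pattern.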
The MPTCP meta socket occasionally reports fresh-connection TCP_INFO (rtt_us=0, cwnd=IW10) at end of test when the path manager switches active subflow during close. This causes intermittent failures on 'MPTCP upload sender-side TCP_INFO present (#26)' despite the test itself being correct. Wrap only the MPTCP upload block in a 2-attempt retry that preserves prior failures and catches the kernel-side timing quirk. Consistent failures still fail the suite.
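The 2-attempt retry described above can be sketched as a tiny generic helper. The name `retry_once` is hypothetical, and this sketch simply discards the first attempt's error; the real harness presumably also logs it.

```rust
// Hypothetical sketch of a 2-attempt retry for a flaky check: run the
// closure once and, on failure, run it exactly one more time. A single
// kernel-side timing quirk is absorbed; consistent failures still fail.
fn retry_once<T, E>(mut attempt: impl FnMut() -> Result<T, E>) -> Result<T, E> {
    attempt().or_else(|_first_err| attempt())
}
```

Scoping the retry to just the flaky block (rather than the whole suite) keeps genuinely broken tests failing on both attempts.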
Required by the SO_LINGER=0 change in commit 54dd1c3: when close() sends RST and discards unACK'd send-buffer data, stats.bytes_sent would overcount bytes that never reached the peer (visible on download/bidir where the sender-side counter is authoritative).
- Add bytes_acked: Option<u64> to TcpInfoSnapshot (serde default for backward compat with older clients/servers)
- Populate from Linux tcpi_bytes_acked (requires extending the struct past tcpi_total_retrans)
- macOS and fallback platforms return None (no equivalent field)
Also updates test fixtures and the bench file, which had drifted from the protocol struct definitions.
1. SO_LINGER=0 was unconditional, but bytes_acked clamping only works on Linux. macOS/fallback got the worst of both worlds: abortive close + unclamped counter. Gate SO_LINGER=0 to Linux; restore graceful shutdown() on other platforms.
2. tcpi_bytes_acked == 0 is a valid reading (aborted before first ACK), not 'field missing'. Use the getsockopt return length to distinguish old kernels that didn't populate the field.
3. Post-cancel drain was 50ms — too short for WAN cancel latency. Bump to 200ms (the original value). Drained bytes aren't counted in stats, so this doesn't affect throughput accuracy.
4. MPTCP download CI threshold lowered 20 -> 15 Mbps. The old threshold passed due to write()-time overcount; now that bytes_acked clamps the counter to actual delivered bytes, the true throughput is ~18 Mbps on the netns setup. 15 Mbps is still well above single-path TCP (10 Mbps), proving multi-path is working.
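Point 2 — distinguishing "field not populated" from a genuine zero — can be sketched as pure logic over the length that getsockopt reports back. The function name and the offsets in the test are illustrative, not the real struct layout.

```rust
// Hypothetical sketch: getsockopt(TCP_INFO) writes back how many bytes
// of struct tcp_info the kernel actually filled in. If that length does
// not reach past the end of the tcpi_bytes_acked field, the kernel is
// too old to report it, and the value must be treated as missing --
// even though a zero-initialized buffer would read back as 0.
fn bytes_acked_from_raw(returned_len: usize, field_end: usize, raw_value: u64) -> Option<u64> {
    if returned_len >= field_end {
        Some(raw_value) // 0 is a valid reading (aborted before first ACK)
    } else {
        None // old kernel: field absent, not zero
    }
}
```

This keeps the Option<u64> in the snapshot honest: None means "no reading", never "zero bytes ACKed".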
Three related issues with the abortive-close send paths:
1. send_data_half captured bytes_acked at T1 to clamp bytes_sent, then callers re-read TCP_INFO at T2 post-reunite. Because the socket is alive between T1 and T2, bytes_acked_T2 could exceed the clamped bytes_sent, producing JSON output where tcp_info.bytes_acked > bytes_total. Fix: return the clamp-time snapshot from send_data_half and have callers use it directly (matches send_data's pattern).
2. Err return paths in send_data/send_data_half bypassed the clamp, letting stats.bytes_sent keep pre-RST write() counts even though the abortive close discards the send-buffer tail. Fix: clamp before every Err return, factoring the logic into a shared helper.
3. Use the if let Ok(info) = ... style (the bug was let Some(info) = ....ok()) where applicable.
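The shared helper factored out in point 2 might look like this. The signature is assumed from the name `clamp_bytes_sent_to_acked` used in this PR; only the clamping rule itself is taken from the commit messages.

```rust
// After an abortive close (SO_LINGER=0), any unACK'd tail in the send
// buffer was discarded, so the write()-time counter can overcount.
// Clamp it down to the TCP-layer ACK count when a reading is available;
// leave it untouched otherwise.
fn clamp_bytes_sent_to_acked(bytes_sent: u64, bytes_acked: Option<u64>) -> u64 {
    match bytes_acked {
        // min() doubles as a guardrail: an acked count larger than
        // bytes_sent (which shouldn't happen) can never inflate stats.
        Some(acked) => bytes_sent.min(acked),
        // macOS / old kernels: no reading, keep the write()-time count.
        None => bytes_sent,
    }
}
```

Calling this on every return path, including Err, is what closes the hole in point 2.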
- Unit tests for clamp_bytes_sent_to_acked: the overcount case, the no-change case when bytes_acked would exceed bytes_sent (shouldn't happen, but a guardrail), and the bytes_acked=None path (macOS/old kernels) as a no-op.
- Integration invariant added to test_tcp_single_stream and test_tcp_bidir: when bytes_acked is reported, it must not exceed bytes_total. Directly catches the bidir bug where a post-reunite TCP_INFO re-read could see a later, larger bytes_acked than the clamped counter.
The assertion 'bytes_acked <= bytes_total' is not valid for upload tests: bytes_acked is the client's TCP-layer ACK count, while bytes_total is the server's app-layer receive count. With SO_LINGER=0 + the server's 200ms post-cancel drain (which reads but doesn't count bytes), bytes_acked legitimately exceeds bytes_total by the drain's read depth. The unit tests for clamp_bytes_sent_to_acked already provide the intended regression coverage — remove the flaky integration assertion.
- CHANGELOG: new [Unreleased] section for the SO_LINGER=0 + tcpi_bytes_acked clamping work merged from PR #55
- ROADMAP: mark issue #35 (early exit summary) as done with implementation notes; it shipped in 0.9.7
- docs/FEATURES.md: refresh the TCP teardown description to reflect the current abortive-close-on-Linux behavior; add an Early Exit Summary subsection under Infinite Duration explaining Ctrl+C semantics in plain and TUI modes
- docs/xfr.1: expand the --dscp description to mention server-side propagation; add a SIGNALS section documenting Ctrl+C behavior; add exit code 130 for SIGINT-interrupted tests
Summary
Resolves #54. Removes the 200ms `drain_after_cancel` step from TCP receive paths (`receive_data` and `receive_data_half`), which was polluting time-boxed throughput measurements by continuing to eat peer retransmits after cancel.

For a bandwidth tool measuring a fixed duration, RST-on-close is the correct semantic: abort immediately, don't skew stats with tail retransmits. Natural end-of-test still uses `shutdown()`/FIN (unchanged).

Background
matttbe (upstream Linux MPTCP maintainer) reported the drain was making it hard to pick a good cancel timeout and skewing results with tail retransmissions:
Codex consultation confirmed no code in the repo depends on the drain — cancel ack and final result travel on the control connection, not the data sockets.
Changes
- Remove the `drain_after_cancel()` function and `RECEIVE_CANCEL_DRAIN_GRACE` constant
- Remove the `if cancelled { drain_after_cancel(...) }` callsites in `receive_data()` and `receive_data_half()`
- Remove the `cancelled` local variable tracking in both receive loops
- Remove the `AsyncRead` import

Side effects
- Peer may see `ECONNRESET` on cancel — consistent with 'abort test now' semantics
- Fixes the `WARN Timed out waiting 2s for N data streams to stop` that matttbe saw with `-P 4 --mptcp` + rate limiting — streams now exit in milliseconds instead of waiting for the drain window

Test plan
- `cargo fmt --check` clean
- `cargo clippy --all-features -- -D warnings` clean
- `cargo test --all-features` — all 180 tests pass
- `-P 4`, Ctrl+C at 3s → exits in 6ms (was up to 2s before), full per-stream summary preserved with accurate 3.00s duration