feat: add TCP stream reassembly engine #10
Merged
Forensic-grade TCP reassembly module design covering:
- ISN-relative sequence tracking with wraparound handling
- First-wins overlap policy for evasion resilience
- Incremental stream exposure via StreamHandler callbacks
- Configurable memory limits (10MB/direction, 1GB global)
- Mid-stream flow pickup and LRU eviction
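As a rough illustration of ISN-relative tracking with wraparound handling, the sketch below maps raw 32-bit sequence numbers onto a 64-bit offset from the ISN. `SeqTracker` and its fields are hypothetical names, not the module's actual types. The sketch assumes every segment arrives within 2^31 bytes of the highest offset seen so far.

```rust
/// Hypothetical tracker; names are illustrative, not the PR's actual types.
struct SeqTracker {
    /// Initial sequence number of this direction.
    isn: u32,
    /// Highest ISN-relative offset seen so far, used as the unwrapping reference.
    reference: u64,
}

impl SeqTracker {
    fn new(isn: u32) -> Self {
        Self { isn, reference: 0 }
    }

    /// Maps a raw sequence number to a 64-bit offset from the ISN.
    /// Returns `None` if the segment would start before the ISN.
    fn relative_offset(&mut self, seq: u32) -> Option<u64> {
        // Distance from the ISN, correct modulo 2^32.
        let low = seq.wrapping_sub(self.isn);
        // Signed 32-bit distance from the reference point to this segment.
        let delta = low.wrapping_sub(self.reference as u32) as i32;
        let offset = if delta >= 0 {
            self.reference + delta as u64
        } else {
            self.reference.checked_sub(u64::from(delta.unsigned_abs()))?
        };
        // Only move the reference forward, so late segments do not drag it back.
        self.reference = self.reference.max(offset);
        Some(offset)
    }
}
```

Because the reference only moves forward, a late retransmission from before a 32-bit wrap still resolves to its original offset.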
- Use u64 for ISN-relative offsets (handles >4GB streams)
- Add overlap anomaly detection (>50 overlaps = evasion Finding)
- Add conflicting overlap data detection (HIGH confidence Finding)
- Add small segment flood detection (>2048 tiny segments)
- Add depth truncation mid-segment (partial store, not full drop)
- Add Future Considerations section for deferred items (TFO, PAWS, etc.)
- Confirm ACK tracking not needed for offline analysis
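The depth truncation item can be pictured as follows: a segment that straddles the depth limit keeps its leading bytes instead of being dropped whole. This is a minimal sketch that assumes depth is measured as an ISN-relative stream offset. `truncate_to_depth` is a hypothetical helper, not a function from the PR.

```rust
/// Hypothetical helper: returns the prefix of `data` that lies below `depth`.
/// `offset` is the ISN-relative offset of the first byte of `data`.
fn truncate_to_depth(offset: u64, data: &[u8], depth: u64) -> &[u8] {
    if offset >= depth {
        // The whole segment lies beyond the limit; nothing is stored.
        return &data[..0];
    }
    // Bytes of room left before the limit is reached.
    let room = depth - offset;
    let keep = (data.len() as u64).min(room) as usize;
    &data[..keep]
}
```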
Add seq_number: u32 field to TransportInfo::Tcp variant and populate it from tcp.to_header().sequence_number during packet decoding. Update test fixtures in analyzer_tests.rs and summary_tests.rs to include the new field.
Sorting IPs and ports independently would merge different connections (e.g., A:80->B:443 and A:443->B:80 would produce the same key). Correct approach: compare (ip, port) tuples together.
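The difference between the two approaches can be sketched as below. The field layout of `FlowKey` and the helper names are assumptions for illustration, not the actual definitions in the codebase.

```rust
use std::net::{IpAddr, Ipv4Addr};

type Endpoint = (IpAddr, u16);

/// Direction-independent key: the two endpoints ordered as whole tuples.
/// (Assumed layout; the real FlowKey may differ.)
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct FlowKey {
    lo: Endpoint,
    hi: Endpoint,
}

impl FlowKey {
    fn new(a: Endpoint, b: Endpoint) -> Self {
        // Tuple comparison keeps each port attached to its IP.
        if a <= b {
            Self { lo: a, hi: b }
        } else {
            Self { lo: b, hi: a }
        }
    }
}

/// The buggy variant: IPs and ports are sorted independently,
/// which loses the pairing between address and port.
fn independent_sort_key(a: Endpoint, b: Endpoint) -> (IpAddr, IpAddr, u16, u16) {
    (a.0.min(b.0), a.0.max(b.0), a.1.min(b.1), a.1.max(b.1))
}

/// Test helper: builds 10.0.0.<last_octet>:<port>.
fn endpoint(last_octet: u8, port: u16) -> Endpoint {
    (IpAddr::V4(Ipv4Addr::new(10, 0, 0, last_octet)), port)
}
```

With A = 10.0.0.1 and B = 10.0.0.2, `FlowKey::new` gives the same key for both directions of A:80->B:443 and a different key for A:443->B:80, whereas `independent_sort_key` collapses the two connections into one.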
Implements TCP segment buffering (insert_segment) and ordered delivery (flush_contiguous) with first-wins overlap policy, conflict detection, sequence wraparound support, depth limiting with truncation, and anomaly counters for small segments and overlaps.
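A first-wins insert with conflict detection might look roughly like the sketch below. It is not the PR's `insert_segment`: `SegmentBuffer`, its fields, and `insert_first_wins` are hypothetical, and depth limits, segment caps, and flushing are left out. Stored bytes are never overwritten. New data only fills gaps, and overlapping regions are compared as slices to spot conflicting retransmissions.

```rust
use std::collections::BTreeMap;

/// Hypothetical buffer: non-overlapping segments keyed by ISN-relative offset.
#[derive(Default)]
struct SegmentBuffer {
    segments: BTreeMap<u64, Vec<u8>>,
    overlaps: u32,
    conflicts: u32,
}

impl SegmentBuffer {
    fn insert_first_wins(&mut self, offset: u64, data: &[u8]) {
        if data.is_empty() {
            return;
        }
        let end = offset + data.len() as u64;
        // Index into `data` for an absolute stream offset.
        let rel = |abs: u64| (abs - offset) as usize;
        // Start the scan at the last stored segment beginning at or before `offset`.
        let first = self
            .segments
            .range(..=offset)
            .next_back()
            .map_or(offset, |(&start, _)| start);
        // Stored (start, end) ranges that intersect [offset, end).
        let existing: Vec<(u64, u64)> = self
            .segments
            .range(first..end)
            .map(|(&start, bytes)| (start, start + bytes.len() as u64))
            .filter(|&(_, seg_end)| seg_end > offset)
            .collect();

        let mut cursor = offset;
        let mut conflict = false;
        for &(start, seg_end) in &existing {
            if start > cursor {
                // Gap before this stored segment: new data fills it.
                self.segments.insert(cursor, data[rel(cursor)..rel(start)].to_vec());
            }
            // Overlapped region: keep the stored bytes, only compare.
            let ov_start = start.max(cursor);
            let ov_end = seg_end.min(end);
            let old = &self.segments[&start][(ov_start - start) as usize..(ov_end - start) as usize];
            conflict |= old != &data[rel(ov_start)..rel(ov_end)];
            cursor = ov_end;
        }
        if cursor < end {
            // Tail past the last stored segment.
            self.segments.insert(cursor, data[rel(cursor)..].to_vec());
        }
        if !existing.is_empty() {
            self.overlaps = self.overlaps.saturating_add(1);
        }
        if conflict {
            self.conflicts = self.conflicts.saturating_add(1);
        }
    }
}
```

In this sketch the counters advance at most once per inserted segment. Whether the real engine counts per segment or per overlapped region is not stated in the commit message.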
Add --reassemble, --no-reassemble, --reassembly-depth, and --reassembly-memcap global CLI flags; integrate TcpReassembler with a NullHandler placeholder into the run_analyze pipeline including finalize and findings collection.
C1: RST/FIN closed flows now removed from HashMap immediately.
Previously they accumulated forever (expire_flows skipped Closed
flows). New connections reusing the same 5-tuple would silently
merge with the dead flow, corrupting reassembly.
C2: Overlap conflict detection now uses slice comparison instead of
byte-by-byte loop. Prevents O(N²·S) CPU exhaustion from crafted
overlapping segments.
Also fixes I6: expire_flows now uses checked subtraction instead of
wrapping_sub, preventing premature expiration on non-monotonic pcap
timestamps. Closed flows are also now eligible for expiration as a
safety net.
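The I6 change can be illustrated with a small sketch. It assumes timestamps and timeouts are plain `u64` values in the same unit, and the function names are hypothetical. With `wrapping_sub`, a packet timestamp that is earlier than the flow's last-seen time produces a huge idle value and expires the flow at once. `checked_sub` returns `None` in that case, so the flow is kept.

```rust
/// Expiry check using checked subtraction (sketch; names are assumptions).
fn is_expired(now: u64, last_seen: u64, timeout: u64) -> bool {
    match now.checked_sub(last_seen) {
        Some(idle) => idle > timeout,
        // Timestamp went backwards: treat the flow as still active.
        None => false,
    }
}

/// The previous behaviour, for comparison: underflow wraps to a huge value.
fn is_expired_wrapping(now: u64, last_seen: u64, timeout: u64) -> bool {
    now.wrapping_sub(last_seen) > timeout
}
```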
I1: small_segment_count now cumulative (not reset on normal segments)
I2: Anomaly thresholds use > with fired flags (not exact ==).
Named constants OVERLAP_ALERT_THRESHOLD (50) and
SMALL_SEGMENT_ALERT_THRESHOLD (2048).
I3: Add max_flows (100K) to ReassemblyConfig. Eviction triggers
when flow count exceeds limit.
I4: Add max_segments_per_direction (10K) to prevent BTreeMap
overhead explosion from sparse insertions.
I5: Eviction and finalize now flush contiguous data before
removing flows. Salvageable data delivered to handler.
Eviction loop also fixed from O(n^2) to O(n log n).
I7: insert_segment returns DepthExceeded (not fake Inserted)
when ISN is None. Added debug_assert.
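To show why I2 pairs a `>` comparison with a fired flag: an exact `==` check never triggers if the counter steps past the threshold in one update, and a bare `>` check would fire on every later update. A minimal sketch follows. `OverlapCounter` is a hypothetical type, and only the constant name and value come from the commit message.

```rust
const OVERLAP_ALERT_THRESHOLD: u32 = 50;

/// Hypothetical counter with a one-shot alert flag.
#[derive(Default)]
struct OverlapCounter {
    count: u32,
    fired: bool,
}

impl OverlapCounter {
    /// Records `n` new overlaps. Returns true exactly once,
    /// the first time the count exceeds the threshold.
    fn record(&mut self, n: u32) -> bool {
        self.count = self.count.saturating_add(n);
        if self.count > OVERLAP_ALERT_THRESHOLD && !self.fired {
            self.fired = true;
            return true;
        }
        false
    }
}
```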
- FIN on_flow_close now fires AFTER payload processing (not before), so stream handlers receive all data before the close notification
- expire_flows now flushes contiguous data before removing flows, consistent with evict_flows and finalize
- RST path now flushes buffered contiguous data before removal (consistent with evict/finalize/expire paths)
- FIN-closed removal now flushes both directions before removal (opposite direction's buffered data was silently lost)
- Gap insertion in overlap handling now checks max_segments limit per insert (prevents bypass via single large overlapping segment)
- fin_count uses saturating_add to prevent u8 overflow panic
S4: Cap findings vector at 10K to prevent unbounded growth
S6: Validate ReassemblyConfig (assert non-zero values)
S10: Add conflicts_with for --reassemble/--no-reassemble
TD1: Remove dead FlowState::TimedOut variant
TD3: Collapse initiator_ip/initiator_port into Option<(IpAddr, u16)>
This was referenced Apr 6, 2026
- tls.pcap: Wireshark TLS capture (58 packets, Ethernet, link type 1) - WORKS
- http.pcap: Wireshark HTTP capture (1 packet) - WORKS
- segmented.pcap: Wireshark segmented TCP (Raw IP, link type 101) - FAILS (decoder only handles Ethernet)
- http-ooo.pcap: Wireshark HTTP OOO (link type 113) - FAILS (same reason)

Findings: wirerust only supports Ethernet encapsulation (link type 1). Raw IP (101), Linux cooked (113), and pcapng format are not supported. These are reader/decoder issues, not reassembly issues.
5 tasks
Zious11 added a commit that referenced this pull request Apr 7, 2026
## Summary

Closes #13. Adds 7 integration tests covering gaps identified during PR #10 review:

- **SYN+ACK / bidirectional data**: Full 3-way handshake, data both directions, verify `Direction` assignment and `flows_partial == 0`
- **FIN teardown**: Full lifecycle (handshake → bidirectional data → dual FIN), verify `CloseReason::Fin`, `flows_fin` stat, `total_memory == 0`
- **max_flows eviction**: `max_flows=2`, exceed with 3rd flow, verify LRU eviction order by FlowKey and `CloseReason::MemoryPressure`
- **memcap eviction**: `memcap=10`, exceed with out-of-order buffered data, verify eviction and `CloseReason::MemoryPressure`
- **Overlap anomaly finding**: 51 duplicate segments trigger `OVERLAP_ALERT_THRESHOLD(50)`, verify finding fields (category, confidence, verdict, MITRE technique)
- **Conflicting overlap finding**: Retransmit with different data, verify `Confidence::High` finding
- **max_segments_per_direction**: 5 non-contiguous segments fill limit, 6th rejected, verify existing segments survive (non-destructive rejection)

Also adds `ack: bool` parameter to `make_tcp_packet` test helper (acceptance criteria from issue).

## Test plan

- [x] All 15 engine tests pass (`cargo test --test reassembly_engine_tests`)
- [x] Full suite passes (91 tests)
- [x] `cargo clippy --all-targets` clean
- [x] `cargo fmt` clean
- [x] No production code changes (test-only PR)
Summary
- src/reassembly/)
- `--reassemble`, `--no-reassemble`, `--reassembly-depth`, `--reassembly-memcap`
- Infrastructure for #1 (HTTP analyzer) and #2 (TLS analyzer).
Test plan
- `cargo test`: all 42+ tests pass
- `cargo clippy -- -D warnings`: clean
- `cargo fmt --check`: clean