feat: add TCP stream reassembly engine#10

Merged: Zious11 merged 18 commits into develop from feature/tcp-reassembly on Apr 6, 2026

Zious11 commented Apr 6, 2026

Summary

  • Forensic-grade TCP stream reassembly module (src/reassembly/)
  • FlowKey canonicalization by (ip, port) tuples, ISN-relative u64 offsets
  • BTreeMap segment storage with contiguous flush
  • First-wins overlap policy with anomaly detection:
    • Conflicting data in overlapping segments (HIGH confidence finding)
    • Excessive overlaps >50 (evasion attempt finding)
    • Small segment floods >2048 (evasion attempt finding)
  • Configurable depth limit (default 10MB/direction) with mid-segment truncation
  • Global memcap (default 1GB) with LRU eviction (non-established first)
  • Mid-stream pickup (missing SYN) with partial flow flagging
  • Incremental stream delivery via StreamHandler callbacks
  • CLI flags: --reassemble, --no-reassemble, --reassembly-depth, --reassembly-memcap
  • Sequence number wraparound handling via u64 promotion
  • 23 new tests (flow, segment, engine)
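
To illustrate the "sequence number wraparound handling via u64 promotion" item above, here is a minimal sketch of one way to turn 32-bit TCP sequence numbers into ISN-relative `u64` offsets. The PR text does not show the actual implementation, so `SeqTracker` and `promote` are hypothetical names, and the "pick the candidate closest to the highest offset seen" rule is an assumption.

```rust
/// Hypothetical helper (not taken from the PR): promotes 32-bit sequence
/// numbers of one stream direction to 64-bit offsets relative to the ISN.
pub struct SeqTracker {
    isn: u32,
    /// Highest relative offset seen so far; the reference point for unwrapping.
    highest: u64,
}

impl SeqTracker {
    pub fn new(isn: u32) -> Self {
        Self { isn, highest: 0 }
    }

    /// Returns the ISN-relative offset of `seq`, assuming it lies within
    /// 2 GiB of the highest offset seen so far.
    pub fn promote(&mut self, seq: u32) -> u64 {
        const HALF: u64 = 1 << 31;
        const WINDOW: u64 = 1 << 32;

        // Offset inside the current 4 GiB window; wrapping_sub absorbs
        // the 32-bit wraparound relative to the ISN.
        let low = u64::from(seq.wrapping_sub(self.isn));
        // Start from the same 4 GiB epoch as the highest offset seen.
        let epoch = self.highest & !0xFFFF_FFFFu64;
        let mut candidate = epoch | low;

        if candidate + HALF < self.highest {
            // Far behind the reference: the sequence number has wrapped forward.
            candidate += WINDOW;
        } else if candidate > self.highest + HALF && candidate >= WINDOW {
            // Far ahead of the reference: a late segment from the previous epoch.
            candidate -= WINDOW;
        }

        self.highest = self.highest.max(candidate);
        candidate
    }
}
```

With an ISN of `0xFFFF_FF00`, a segment carrying sequence number `0x0000_0100` maps to offset `0x200`, and offsets keep growing past 4 GiB as long as segments arrive within 2 GiB of each other.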

Infrastructure for #1 (HTTP analyzer) and #2 (TLS analyzer).

Test plan

  • cargo test — all 42+ tests pass
  • cargo clippy -- -D warnings — clean
  • cargo fmt --check — clean
  • Segment tests: ordered, OOO, overlap first-wins, retransmit dedup, wraparound, depth truncation
  • Engine tests: 3-packet stream, OOO delivery, mid-stream, RST, finalize, timeout
  • Flow tests: canonicalization, state transitions, direction detection, mid-stream pickup

Zious11 added 17 commits April 6, 2026 09:49
Forensic-grade TCP reassembly module design covering:
- ISN-relative sequence tracking with wraparound handling
- First-wins overlap policy for evasion resilience
- Incremental stream exposure via StreamHandler callbacks
- Configurable memory limits (10MB/direction, 1GB global)
- Mid-stream flow pickup and LRU eviction
- Use u64 for ISN-relative offsets (handles >4GB streams)
- Add overlap anomaly detection (>50 overlaps = evasion Finding)
- Add conflicting overlap data detection (HIGH confidence Finding)
- Add small segment flood detection (>2048 tiny segments)
- Add depth truncation mid-segment (partial store, not full drop)
- Add Future Considerations section for deferred items (TFO, PAWS, etc.)
- Confirm ACK tracking not needed for offline analysis
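
The LRU eviction described in the design (non-established flows first, per the PR summary) could be ordered roughly as follows. This is a sketch under assumptions: `FlowMeta`, its fields, and `pick_evictions` are illustrative names, and the real engine may track memory and recency differently.

```rust
/// Hypothetical per-flow bookkeeping; field names are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FlowMeta {
    pub id: u32,
    pub established: bool,
    /// Timestamp of the last packet seen on this flow (any monotonic unit).
    pub last_seen: u64,
    /// Bytes currently buffered for this flow.
    pub memory: usize,
}

/// Returns the ids of flows to evict, in eviction order, until the total
/// buffered memory is at or below `memcap`.
pub fn pick_evictions(flows: &[FlowMeta], memcap: usize) -> Vec<u32> {
    let mut total: usize = flows.iter().map(|f| f.memory).sum();

    // Sort once: `false < true`, so non-established flows come first,
    // then the least recently seen flow within each group.
    let mut order: Vec<&FlowMeta> = flows.iter().collect();
    order.sort_by_key(|f| (f.established, f.last_seen));

    let mut evicted = Vec::new();
    for flow in order {
        if total <= memcap {
            break;
        }
        total -= flow.memory;
        evicted.push(flow.id);
    }
    evicted
}
```

Sorting the candidates once and walking the sorted list keeps the selection at O(n log n) instead of rescanning all flows for every eviction.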
Add seq_number: u32 field to TransportInfo::Tcp variant and populate it
from tcp.to_header().sequence_number during packet decoding. Update test
fixtures in analyzer_tests.rs and summary_tests.rs to include the new field.
Sorting IPs and ports independently would merge different connections
(e.g., A:80->B:443 and A:443->B:80 would produce the same key).
Correct approach: compare (ip, port) tuples together.
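
The difference between the two approaches can be shown in a few lines. This is a sketch, not the code from `src/reassembly/`: the `lo`/`hi` field names and the `buggy_key` helper are assumptions made for illustration.

```rust
use std::net::IpAddr;

/// Direction-independent flow key: both endpoints are kept as whole
/// (ip, port) tuples and only their order is normalized.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct FlowKey {
    lo: (IpAddr, u16),
    hi: (IpAddr, u16),
}

impl FlowKey {
    pub fn new(src: (IpAddr, u16), dst: (IpAddr, u16)) -> Self {
        // Tuples compare lexicographically: by address first, then by port.
        if src <= dst {
            Self { lo: src, hi: dst }
        } else {
            Self { lo: dst, hi: src }
        }
    }
}

/// The buggy variant for contrast: sorting IPs and ports independently
/// loses the association between each address and its port.
pub fn buggy_key(src: (IpAddr, u16), dst: (IpAddr, u16)) -> ((IpAddr, IpAddr), (u16, u16)) {
    let ips = if src.0 <= dst.0 { (src.0, dst.0) } else { (dst.0, src.0) };
    let ports = if src.1 <= dst.1 { (src.1, dst.1) } else { (dst.1, src.1) };
    (ips, ports)
}
```

With the tuple comparison, A:80->B:443 and B:443->A:80 produce the same key, while A:80->B:443 and A:443->B:80 stay distinct. `buggy_key` returns the same value for both of those connections.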
Implements TCP segment buffering (insert_segment) and ordered delivery
(flush_contiguous) with first-wins overlap policy, conflict detection,
sequence wraparound support, depth limiting with truncation, and anomaly
counters for small segments and overlaps.
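
A reduced sketch of the buffering scheme described above may help when reading the tests. It keeps only the core ideas (BTreeMap storage, first-wins overlap handling with slice comparison, contiguous flush) and omits depth limiting, segment caps, and small-segment counters. The function names match the commit message, but the signatures, fields, and counting rules here are assumptions.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Default)]
pub struct SegmentBuffer {
    /// Buffered, non-overlapping segments keyed by stream offset.
    segments: BTreeMap<u64, Vec<u8>>,
    /// Next offset to deliver; everything before it was already flushed.
    next: u64,
    pub overlaps: u32,
    pub conflicts: u32,
}

impl SegmentBuffer {
    /// First-wins insert: bytes already buffered or delivered are never replaced.
    pub fn insert_segment(&mut self, offset: u64, data: &[u8]) {
        if data.is_empty() {
            return;
        }
        let end = offset + data.len() as u64;
        if end <= self.next {
            // Pure retransmit of delivered data (cannot be compared any more).
            self.overlaps += 1;
            return;
        }
        let mut cursor = offset.max(self.next);
        let mut overlapped = cursor > offset;
        let mut conflict = false;
        let mut gaps: Vec<(u64, u64)> = Vec::new();

        // Existing segments intersecting [cursor, end). A linear scan from the
        // start of the map keeps the sketch short.
        let existing: Vec<(u64, u64)> = self
            .segments
            .range(..end)
            .map(|(&s, d)| (s, s + d.len() as u64))
            .filter(|&(_, e)| e > cursor)
            .collect();

        for (s, e) in existing {
            if s > cursor {
                gaps.push((cursor, s)); // uncovered bytes before this segment
            }
            let (lo, hi) = (s.max(cursor), e.min(end));
            overlapped = true;
            // Slice comparison of the overlapping region, old data vs. new data.
            let old = &self.segments[&s][(lo - s) as usize..(hi - s) as usize];
            let new = &data[(lo - offset) as usize..(hi - offset) as usize];
            if old != new {
                conflict = true;
            }
            cursor = e;
        }
        if cursor < end {
            gaps.push((cursor, end));
        }
        // Only the gaps are filled from the new segment.
        for (s, e) in gaps {
            let bytes = data[(s - offset) as usize..(e - offset) as usize].to_vec();
            self.segments.insert(s, bytes);
        }
        self.overlaps += u32::from(overlapped);
        self.conflicts += u32::from(conflict);
    }

    /// Removes and returns all bytes that are contiguous from `next`.
    pub fn flush_contiguous(&mut self) -> Vec<u8> {
        let mut out = Vec::new();
        while let Some(data) = self.segments.remove(&self.next) {
            self.next += data.len() as u64;
            out.extend_from_slice(&data);
        }
        out
    }
}
```

For example, buffering `DEF` at offset 3 and then `abcXEfgh` at offset 0 delivers `abcDEFgh`: the earlier bytes win, and the mismatch in the overlapping region is recorded as one conflict.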
Add --reassemble, --no-reassemble, --reassembly-depth, and --reassembly-memcap
global CLI flags; integrate TcpReassembler with a NullHandler placeholder into
the run_analyze pipeline including finalize and findings collection.
C1: RST/FIN closed flows now removed from HashMap immediately.
    Previously they accumulated forever (expire_flows skipped Closed
    flows). New connections reusing the same 5-tuple would silently
    merge with the dead flow, corrupting reassembly.

C2: Overlap conflict detection now uses slice comparison instead of
    byte-by-byte loop. Prevents O(N²·S) CPU exhaustion from crafted
    overlapping segments.

Also fixes I6: expire_flows now uses checked subtraction instead of
wrapping_sub, preventing premature expiration on non-monotonic pcap
timestamps. Closed flows are also now eligible for expiration as a
safety net.
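
The I6 fix is easiest to see side by side. The sketch below assumes timestamps and the timeout are plain `u64` values in the same unit; the function names are illustrative and not the ones used in `expire_flows`.

```rust
/// Checked variant: a packet timestamp older than `last_seen`
/// (non-monotonic capture) is treated as "not idle".
pub fn is_expired(now: u64, last_seen: u64, timeout: u64) -> bool {
    match now.checked_sub(last_seen) {
        Some(idle) => idle > timeout,
        None => false, // time went backwards; do not expire the flow
    }
}

/// Wrapping variant for contrast: when `now < last_seen` the difference
/// wraps to a huge value and the flow expires prematurely.
pub fn is_expired_wrapping(now: u64, last_seen: u64, timeout: u64) -> bool {
    now.wrapping_sub(last_seen) > timeout
}
```

With `now = 90`, `last_seen = 100`, and a timeout of 30, the checked version keeps the flow while the wrapping version expires it.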
I1: small_segment_count now cumulative (not reset on normal segments)
I2: Anomaly thresholds use > with fired flags (not exact ==).
    Named constants OVERLAP_ALERT_THRESHOLD (50) and
    SMALL_SEGMENT_ALERT_THRESHOLD (2048).
I3: Add max_flows (100K) to ReassemblyConfig. Eviction triggers
    when flow count exceeds limit.
I4: Add max_segments_per_direction (10K) to prevent BTreeMap
    overhead explosion from sparse insertions.
I5: Eviction and finalize now flush contiguous data before
    removing flows. Salvageable data delivered to handler.
    Eviction loop also fixed from O(n^2) to O(n log n).
I7: insert_segment returns DepthExceeded (not fake Inserted)
    when ISN is None. Added debug_assert.
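
I1 and I2 together describe a cumulative counter that alerts once when it first exceeds a threshold. A minimal sketch of that pattern follows; the struct, the `small_max` cutoff for what counts as a "small" segment, and the constructor are assumptions, and only the constant name and value come from the commit message.

```rust
/// Value and name taken from the commit message.
pub const SMALL_SEGMENT_ALERT_THRESHOLD: u32 = 2048;

/// Hypothetical detector; the real counters live on the stream state.
pub struct SmallSegmentDetector {
    threshold: u32,
    /// Payloads of at most this many bytes count as "small" (assumed cutoff).
    small_max: usize,
    count: u32,
    fired: bool,
}

impl SmallSegmentDetector {
    pub fn new(threshold: u32, small_max: usize) -> Self {
        Self { threshold, small_max, count: 0, fired: false }
    }

    /// Returns true exactly once: on the segment that pushes the cumulative
    /// count above the threshold.
    pub fn observe(&mut self, payload_len: usize) -> bool {
        if payload_len > 0 && payload_len <= self.small_max {
            // Cumulative: normal-sized segments do not reset the counter.
            self.count = self.count.saturating_add(1);
        }
        // `>` plus a fired flag, rather than `==`, so the alert cannot be
        // missed if the count ever jumps past the threshold.
        if !self.fired && self.count > self.threshold {
            self.fired = true;
            return true;
        }
        false
    }
}
```

A detector for the PR's threshold would be built as `SmallSegmentDetector::new(SMALL_SEGMENT_ALERT_THRESHOLD, 8)`, where the cutoff of 8 bytes is only a placeholder.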
- FIN on_flow_close now fires AFTER payload processing (not before),
  so stream handlers receive all data before the close notification
- expire_flows now flushes contiguous data before removing flows,
  consistent with evict_flows and finalize
- RST path now flushes buffered contiguous data before removal
  (consistent with evict/finalize/expire paths)
- FIN-closed removal now flushes both directions before removal
  (opposite direction's buffered data was silently lost)
- Gap insertion in overlap handling now checks max_segments limit
  per insert (prevents bypass via single large overlapping segment)
- fin_count uses saturating_add to prevent u8 overflow panic
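
For the last item, the behavior of `saturating_add` on a `u8` counter can be checked in isolation. `bump_fin_count` is a hypothetical wrapper used only for this illustration.

```rust
/// Increments a FIN counter without overflowing: the value stays at 255
/// instead of panicking in debug builds or wrapping to 0 in release builds.
pub fn bump_fin_count(fin_count: u8) -> u8 {
    fin_count.saturating_add(1)
}
```

A capture containing more than 255 FIN packets on one flow therefore leaves the counter pinned at `u8::MAX`.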
S4: Cap findings vector at 10K to prevent unbounded growth
S6: Validate ReassemblyConfig (assert non-zero values)
S10: Add conflicts_with for --reassemble/--no-reassemble
TD1: Remove dead FlowState::TimedOut variant
TD3: Collapse initiator_ip/initiator_port into Option<(IpAddr, u16)>
- tls.pcap: Wireshark TLS capture (58 packets, Ethernet, link type 1) — WORKS
- http.pcap: Wireshark HTTP capture (1 packet) — WORKS
- segmented.pcap: Wireshark segmented TCP (Raw IP, link type 101) — FAILS (decoder only handles Ethernet)
- http-ooo.pcap: Wireshark HTTP OOO (link type 113) — FAILS (same reason)

Findings: wirerust only supports Ethernet encapsulation (link type 1).
Raw IP (101), Linux cooked (113), and pcapng format are not supported.
These are reader/decoder issues, not reassembly issues.
Zious11 merged commit 7beaca6 into develop Apr 6, 2026
40 checks passed
Zious11 deleted the feature/tcp-reassembly branch April 6, 2026 17:00
Zious11 added a commit that referenced this pull request Apr 7, 2026
## Summary

Closes #13. Adds 7 integration tests covering gaps identified during PR
#10 review:

- **SYN+ACK / bidirectional data** — Full 3-way handshake, data both
directions, verify `Direction` assignment and `flows_partial == 0`
- **FIN teardown** — Full lifecycle (handshake → bidirectional data →
dual FIN), verify `CloseReason::Fin`, `flows_fin` stat, `total_memory ==
0`
- **max_flows eviction** — `max_flows=2`, exceed with 3rd flow, verify
LRU eviction order by FlowKey and `CloseReason::MemoryPressure`
- **memcap eviction** — `memcap=10`, exceed with out-of-order buffered
data, verify eviction and `CloseReason::MemoryPressure`
- **Overlap anomaly finding** — 51 duplicate segments trigger
`OVERLAP_ALERT_THRESHOLD(50)`, verify finding fields (category,
confidence, verdict, MITRE technique)
- **Conflicting overlap finding** — Retransmit with different data,
verify `Confidence::High` finding
- **max_segments_per_direction** — 5 non-contiguous segments fill limit,
6th rejected, verify existing segments survive (non-destructive
rejection)

Also adds `ack: bool` parameter to `make_tcp_packet` test helper
(acceptance criteria from issue).

## Test plan

- [x] All 15 engine tests pass (`cargo test --test
reassembly_engine_tests`)
- [x] Full suite passes (91 tests)
- [x] `cargo clippy --all-targets` clean
- [x] `cargo fmt` clean
- [x] No production code changes — test-only PR