
Restructure: Trust → Structure → Depth → Polish → CI → Portfolio (v1.4.0)#33

Merged
sohan-shingade merged 56 commits into main from restructure
Apr 25, 2026

Conversation

@sohan-shingade
Owner

Restructure: Trust → Structure → Depth → Polish → CI → Portfolio

Closes the entire 6-phase restructure plan, the bulk of the deferred
backlog, and all 5 D-6.4-replay slices end-to-end. 41 commits this
session, +8000 / -2400 LOC, 1861 → 2070 tests, ruff hard-fail clean,
vite build clean, 30/30 cargo tests, 91/91 across the full replay
surface (event log + replay primitive + snapshots + engine writer
hooks + REST + MCP + auto-compaction + E2E parity + Rust ledger
parity).

Bumps version 1.3.1 → 1.4.0.

Highlights by phase

Phase 1 — Trust & correctness (shipped)

Phase 1 work landed earlier in the session as commits d7e80e5
35cbf24. Recap: parity reports, PIT audit, custom data ingest,
sandbox subprocess isolation, 26 PIT_METADATA blocks, force-close
correctness fix in Rust + Python, cross-market terminals.

Phase 2 — Structural cleanup (shipped, full close)

D-2.1.b: BacktestContext god class → 7 manager classes

Every piece of mutable state now lives in one of seven dedicated
owners under flint/execution/:

Manager          Owns
PositionManager  open + closed-trade dicts
CashManager      cash, allocator, fees / tx / funding counters
FillRecorder     recorded fills + diagnostic log
OrderQueue       pending limit/stop/TP queue + this-bar market queue
FundingLedger    per-market + per-venue funding history
BorrowLedger     Jupiter borrow rates + paid-borrow ledger
MarketDataFeed   cross-market candles + orderbook + OI

Caller sites all migrated: every _apply_fill, apply_funding,
check_liquidations, process_pending_orders, process_market_orders,
close_all_positions, set_candle, account, positions,
pending_orders body now routes through the right manager. Legacy
property aliases retained for tests; new code uses managers directly.
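
A minimal sketch of the manager-plus-alias pattern described above, not the actual flint code. The class bodies, the `Context` name, and the `charge_fee` / `apply_fill` method names are assumptions for illustration; only the `_pm` / `_cm` attribute names and the manager class names come from this PR.

```python
from dataclasses import dataclass, field


@dataclass
class CashManager:
    """Owns cash and the fee counter (simplified, hypothetical body)."""
    cash: float
    fees_paid: float = 0.0

    def charge_fee(self, amount: float) -> None:
        self.cash -= amount
        self.fees_paid += amount


@dataclass
class PositionManager:
    """Owns open positions keyed by market (simplified, hypothetical body)."""
    open_positions: dict = field(default_factory=dict)

    def apply_fill(self, market: str, signed_size: float) -> None:
        new_size = self.open_positions.get(market, 0.0) + signed_size
        if new_size == 0.0:
            self.open_positions.pop(market, None)  # flat: drop the entry
        else:
            self.open_positions[market] = new_size


class Context:
    """Thin coordinator: all mutable state lives in the managers."""

    def __init__(self, initial_cash: float) -> None:
        self._cm = CashManager(cash=initial_cash)
        self._pm = PositionManager()

    # Legacy aliases: old callers keep reading ctx.cash / ctx.positions.
    @property
    def cash(self) -> float:
        return self._cm.cash

    @property
    def positions(self) -> dict:
        return self._pm.open_positions

    def apply_fill(self, market: str, signed_size: float, fee: float) -> None:
        # Each side effect is routed to the manager that owns that state.
        self._pm.apply_fill(market, signed_size)
        self._cm.charge_fee(fee)
```

The aliases are read-only properties, so old tests keep working while writes can only go through a manager.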

D-4.7-full: flint/services/{strategies,backtest,journal,data,paper}.py
pulls work-doing code out of FastAPI routes into a service layer that
MCP, scripts, and notebooks call directly. Strategy-template registry
has one source of truth (was duplicated across 3 places).

D-2.2-internal: every store mutation routes through _sql_*
wrappers that hold the lock; routes never touch store._conn directly.

Phase 3 — Depth on wedge (shipped)

D-3.4-rust + D-3.1-rust: Rust ports of TxCostModel and
OrderbookFiller (PyO3, 2.24× and 3.52× speedups, 1e-9 parity tests).

D-3.3-maker-detection: FillResult.is_maker flag wires through
the Rust fill pipeline; resting-limit fills tag maker; Drift
(-2 bps rebate) and Hyperliquid (1 bp) maker rates verified through
end-to-end PyO3 tests.

D-3.5-orchestrator: flint/risk/portfolio_orchestrator.py:PortfolioMarginEngine
composes MarginEngine + VenueAllocator + PortfolioRiskEngine into one
pre-trade check facade. BacktestContext.market_order consults the
orchestrator; rejection comes back tagged MARGIN/ALLOCATOR/PORTFOLIO
so the warn-line names which engine vetoed.
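
The tagged-rejection idea can be sketched as a small facade. This is an illustration under assumptions: the check order, the `PreTradeFacade` / `PreTradeResult` names, and the single-argument check signature are hypothetical and do not reflect the real PortfolioMarginEngine API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A check returns None to approve, or a human-readable reason to veto.
Check = Callable[[float], Optional[str]]


@dataclass(frozen=True)
class PreTradeResult:
    approved: bool
    vetoed_by: Optional[str] = None  # "MARGIN" | "ALLOCATOR" | "PORTFOLIO"
    reason: Optional[str] = None


class PreTradeFacade:
    """Runs the three engines in a fixed (assumed) order; first veto wins."""

    def __init__(self, margin: Check, allocator: Check, portfolio: Check) -> None:
        self._checks = [
            ("MARGIN", margin),
            ("ALLOCATOR", allocator),
            ("PORTFOLIO", portfolio),
        ]

    def check(self, notional: float) -> PreTradeResult:
        for tag, fn in self._checks:
            reason = fn(notional)
            if reason is not None:
                return PreTradeResult(False, tag, reason)
        return PreTradeResult(True)


facade = PreTradeFacade(
    margin=lambda n: None,
    allocator=lambda n: "venue budget exhausted" if n > 5_000 else None,
    portfolio=lambda n: "gross exposure cap" if n > 1_000 else None,
)
```

With these stand-in checks, an order of 10_000 notional is vetoed by ALLOCATOR (it runs before PORTFOLIO), while 2_000 passes the allocator and is vetoed by PORTFOLIO.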

Phase 4 — Product polish (shipped)

D-4.3-websocket end-to-end:

  • Per-session routes /ws/paper/{id} and /ws/live/{id}
  • ConnectionManager with monotonic per-channel seq + 500-deep replay
    ring buffer (?since=<seq> opt-in) + ping(channel) heartbeat
  • useWebSocket<T> hook with 1→2→5→10→30s reconnect backoff +
    30s heartbeat-stale detection
  • PaperTradingEngine emits {type: tick} per bar and {type: trade}
    per closed trade
  • LiveExecutionContext emits {type: fill} from _handle_fill
    (fire-and-forget via ensure_future)
  • PaperTrading.tsx + LiveMonitor.tsx pages bound: live equity / trades /
    fills overlay polled state, with WS LIVE / CONNECTING / OFFLINE
    indicator dot
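
A rough sketch of the per-channel sequence number and replay ring buffer described above. The `ChannelBuffer` class and its method names are hypothetical; only the monotonic seq, the 500-deep buffer, and the `since` semantics are taken from this PR.

```python
from collections import deque


class ChannelBuffer:
    """Stamps messages with a monotonic seq and keeps the last `depth` of them."""

    def __init__(self, depth: int = 500) -> None:
        self._seq = 0
        self._ring = deque(maxlen=depth)  # oldest entries fall off automatically

    def publish(self, message: dict) -> dict:
        self._seq += 1
        stamped = {**message, "seq": self._seq}
        self._ring.append(stamped)
        return stamped

    def replay_since(self, since: int) -> list:
        # A reconnecting client passes the last seq it saw; anything newer
        # that is still in the ring is replayed, in order.
        return [msg for msg in self._ring if msg["seq"] > since]
```

A client that was offline for longer than the buffer depth cannot be fully caught up from the ring alone, which is presumably why the pages keep polled state underneath the live overlay.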

D-4.2-backoff-full: useBackoffPoll<T> + 3-hook migration shipped.

D-1.4-ui: paper reconciliation upload (multipart CSV) + UI panel.

Phase 5 — CI (shipped)

D-5.1-ruff: ruff configured to F-class only (real bugs); 315
auto-fixes + 26 manual; CI flipped from soft to hard fail.

Phase 6 — Portfolio (foundations shipped)

D-6.1-unified: flint/portfolio/shared_engine.py:SharedCapitalPortfolioEngine
runs N strategies on one shared BacktestContext, so cash, fees,
funding, borrow, and the orchestrator's pre-trade margin gauntlet
all see the actual book. Per-strategy _TaggedContextProxy tags
order_ids with strategy_name:; closed-trade exit_order_id lets PnL
flow back to the actual closer (was even-split-by-market in the
foundation slice).
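
To illustrate the order-id tagging and closer attribution, here is a small sketch. The helper names and the trade dict shape are assumptions; only the `strategy_name:` prefix and the `exit_order_id` field come from the description above.

```python
class TaggedOrderIds:
    """Issues order ids prefixed with the owning strategy (hypothetical helper)."""

    def __init__(self, strategy_name: str) -> None:
        self._prefix = f"{strategy_name}:"
        self._n = 0

    def next_id(self) -> str:
        self._n += 1
        return f"{self._prefix}{self._n}"


def owner_of(order_id: str) -> str:
    # Everything before the first ':' is the strategy name.
    return order_id.split(":", 1)[0]


def pnl_by_strategy(closed_trades: list) -> dict:
    """Attribute each closed trade's PnL to the strategy whose order closed it."""
    totals = {}
    for trade in closed_trades:
        owner = owner_of(trade["exit_order_id"])
        totals[owner] = totals.get(owner, 0.0) + trade["pnl"]
    return totals
```

This sketch assumes strategy names contain no `:` character; a real implementation would need to enforce or escape that.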

D-6.4-replay (closed end-to-end, 5/5 slices):

  • portfolio_events(session_id, seq, ts, kind, payload) table +
    EventLogWriter (thread-safe, monotonic per-session seq)
  • BookState + fold(events, initial_capital, seed=) + replay()
    primitive
  • portfolio_snapshots + SnapshotStore for compaction; replay
    fast-forwards via latest_before(target_ts) → read_after_seq →
    fold(seed=snapshot)
  • BacktestContext._emit(kind, payload) writer hooks: zero overhead
    when event_log_writer + event_session_id not set; otherwise emits
    on every order submit/cancel, fill, funding, liquidation, borrow
  • REST: GET /api/v1/replay/{id}/{events,state,summary}
  • MCP: replay_summary, replay_state, list_replay_events
  • UI: /replay page with session loader, real timeline slider
    (range bounded to first/last event ts), step controls
    (← PREV / NEXT → / ⏮ START / END ⏭), state cards, positions
    table, color-coded event-tail panel (50 most recent folded events)
  • Auto-compaction: BacktestContext's snapshot_every ctor kwarg
    (default 10_000) drives _emit to fold + persist a fresh
    BookState every N events. Disabled unless the caller wires a
    SnapshotStore, so there is no overhead by default.
  • Rust ledger ports: flint_core.FundingLedger + flint_core.BorrowLedger
    with PyO3 bindings (add/latest/recent/by_venue,
    record/record_payment/add_paid/cumulative_at). 7 cargo +
    7 Python↔Rust parity tests pinned to 1e-9.
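
The fold-with-seed primitive and the snapshot fast-forward can be sketched as follows. This is an assumption-laden illustration: the `Event` type, the `cash_delta` payload key, and the `last_seq` field are hypothetical, and the real BookState tracks far more than cash.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Optional


@dataclass(frozen=True)
class Event:
    seq: int
    kind: str      # e.g. "fill", "funding", "borrow"
    payload: dict  # this sketch only reads payload["cash_delta"]


@dataclass(frozen=True)
class BookState:
    cash: float
    last_seq: int = 0


def fold(events: Iterable[Event], initial_capital: float,
         seed: Optional[BookState] = None) -> BookState:
    """Reduce an event stream to a state, optionally starting from a snapshot."""
    state = seed if seed is not None else BookState(cash=initial_capital)
    for ev in events:
        if ev.seq <= state.last_seq:
            continue  # already reflected in the seed snapshot
        delta = ev.payload.get("cash_delta", 0.0)
        state = replace(state, cash=state.cash + delta, last_seq=ev.seq)
    return state
```

The property the parity tests rely on is that folding the whole log and folding only the tail on top of a snapshot give the same state, so a snapshot is purely an optimization.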

Load-bearing parity tests:

  • tests/test_event_log_engine_hooks.py::TestEndToEndReplayParity:
    replay over the live-emitted log reproduces BacktestContext.account.cash
    byte-for-byte.
  • tests/test_replay_e2e_backtest.py — same parity over a real
    MACrossoverStrategy run with auto-compaction enabled.
  • tests/test_auto_compaction.py::TestSnapshotPreservesReplayCorrectness:
    snapshot fast-forward replay never produces a divergent state.

Test sweep

2070 passed · 7 skipped · 0 failed

(Skipped suites need optional deps that are not installed: ccxt,
eth_account, solders. None are code regressions.)

UI: 133 vitest · vite build clean. Rust: 30 cargo tests.

Files reorganized

New modules under flint/:

  • execution/{position_manager,cash_manager,fill_recorder,order_queue,funding_ledger,borrow_ledger,market_data_feed}.py
  • services/{strategies,backtest,journal,data,paper}.py
  • risk/portfolio_orchestrator.py
  • portfolio/{shared_engine,event_log,replay,snapshots}.py
  • api/routes/replay.py
  • 3 new MCP tools

New UI pages + hooks:

  • ui/src/pages/Replay.tsx
  • ui/src/hooks/{useWebSocket,useReplay}.ts

New Rust modules (PyO3-exposed):

  • rust/src/engine/{tx_costs,orderbook_fill,funding_ledger,borrow_ledger}.rs
  • PyO3 classes: flint_core.TxCostModel, flint_core.OrderbookFiller,
    flint_core.FundingLedger, flint_core.BorrowLedger
  • supports_tx_costs, supports_orderbook_walk,
    supports_maker_taker_fees capability flags flipped to true

Migration notes for users

  • pip install -U flint-trading (1.4.0)
  • No breaking API changes: every old method still works through the
    legacy property aliases. New code should read state via
    ctx._pm.values(), ctx._cm.cash, ctx.account etc. directly.
  • New event_log_writer + event_session_id ctor kwargs on
    BacktestContext are opt-in; without them, behavior is identical
    to 1.3.1.
  • New portfolio_risk ctor kwarg routes book-level checks through
    the new PortfolioMarginEngine.

Follow-on work

  • D-2.1.c (live context merge) — needs testnet secrets
  • D-2.1.d (paper context split) — needs deliberate API design pass
  • D-6.5-api (live deploy two-step) — needs testnet secrets
  • D-6.6-proof (funding-arb proof notebook) — needs D-6.5-api
  • D-6.7-jito (real Jito bundle integration) — needs D-6.5-api

WAVE_STATUS.md tracks per-item state; ROADMAP.md tracks phase-level.

sohan-shingade and others added 30 commits April 24, 2026 11:57
Full multi-phase restructure plan. 6 phase specs under docs/specs/ with
exit criteria + dependency graph + task breakdowns, rooted in the
2026-04-23 audit findings.

- IMPLEMENTATION_PLAN.md — master plan, sequencing, quick wins, rules of engagement
- ROADMAP.md — rewritten as short index pointing to plan + phase specs
- TRUST_ARTIFACTS.md — live status board for Phase 1 items
- DEFERRED.md — sibling-PR tracker with owners, prerequisites, effort estimates
- docs/specs/phase-1-trust-correctness.md
- docs/specs/phase-2-structural-cleanup.md
- docs/specs/phase-3-depth-on-wedge.md
- docs/specs/phase-4-product-polish.md
- docs/specs/phase-5-ci-testing.md
- docs/specs/phase-6-portfolio-cross-venue.md

Wedge: "best local backtester + paper-trading lab for Drift + Hyperliquid
perp strategies." Phase ordering is load-bearing: trust → structure →
depth → portfolio → live.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…pts)

Full docs reorganization following https://diataxis.fr/:
- docs/tutorials/ — linear walkthroughs (01-06)
- docs/how-to/ — task-oriented recipes
- docs/reference/ — exhaustive catalogs (REST API, CLI, SDK, metrics, etc)
- docs/concepts/ — explanation-oriented (architecture, fill pipeline, risk model, regimes, margin/capital, backtests-vs-reality)
- docs/README.md — unified doc index

Existing guides trimmed + cross-linked to new structure. MEV content moved
out of product surface into docs/mev-*.md as research artifacts.

- scripts/build_docs.py extended to scan new sections + regenerate UI
  docs content
- ui/src/data/docs-content.ts regenerated from updated markdown sources

Rename: "4-tier fill pipeline" → "3-stage pipeline with 4 impact models"
(latency → impact → partial; impact stage has 4 models). Misnomer flagged
in the 2026-04-23 audit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes every Rust/Python parity bug flagged in the 2026-04-23 audit + ships
scaffolding for proof artifacts firms need before buying features.

Engine + parity (T1.1):
- Force-close equity: append terminal point instead of overwriting last-bar
  mark-to-market (both Python backtest/engine.py AND Rust runner.rs)
- Cross-market cursor: strict `<` instead of `<=` so cross-market history
  never includes same-ts bars (simultaneity assumption now explicit)
- LatencyStage RNG: deterministic default seed (was system-time); seed=-1
  opts into unseeded
- MonteCarlo seed: parameter threaded through run_monte_carlo(seed=)
- Rust VenueFiller RNG: seed threaded from Python (was hardcoded 42),
  per-venue offset so per-venue RNGs are independent but reproducible
- BacktestEngine(seed=): `_resolve_seed` derives deterministic-but-
  strategy-local seed when None
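
A sketch of how a deterministic-but-strategy-local seed and per-venue offsets could work. The function names and the hashing scheme are assumptions, not the actual `_resolve_seed` implementation. Python's built-in `hash()` is salted per process for strings, so the sketch uses `hashlib` instead.

```python
import hashlib
import random
from typing import Optional


def resolve_seed(seed: Optional[int], strategy_name: str) -> int:
    """Explicit seed wins; otherwise derive a stable one from the strategy name."""
    if seed is not None:
        return seed
    digest = hashlib.sha256(strategy_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")  # stable across runs and machines


def venue_rng(base_seed: int, venue_index: int) -> random.Random:
    """Each venue gets its own stream: reproducible, but distinct per venue."""
    return random.Random(base_seed + venue_index)
```

Two runs of the same strategy then see identical random draws, while two venues within one run do not share a stream.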

tx_cost wiring (T1.1.b):
- FillPipeline w/ tx_cost_model gates Rust off → Python path populates
  BacktestResult.total_tx_costs correctly
- Fixes test_tx_cost_deducted regression (ROADMAP §1.6 blocker)

Note: engine.py diff spans multiple phases (T1.1 base + T3.2 rust_required
+ T3.2 fallback_reason tagging + T4.5 cancel_check). Kept in one commit
because non-interactive staging can't slice a file across phases.

PIT audit (T1.3):
- scripts/audit_pit.py scans flint/providers/* for PIT_METADATA
- 3 flagship providers declared (drift_candles, hyperliquid_candles,
  funding_rates); remaining 22 tracked as D-1.3-providers
- artifacts/pit/initial-scan.md: first report

Determinism (T1.1-parity / T1.3.c):
- tests/test_rust_python_parity.py (5 tests)
- tests/test_determinism.py (5 tests)

Custom data (T1.6):
- flint/providers/custom.py: CustomCSVProvider + CustomParquetProvider
- SHA-256 source_hash provenance; strict OHLCV / monotonic / resolution
  validation; custom:* namespace enforcement
- docs/reference/custom-data-schema.md canonical schema
- tests/test_custom_provider.py (12 tests)

Parity report pipeline (T1.2):
- scripts/run_parity_report.py catalogs 6 strategies, emits markdown
  artifact with 5-metric gate (PnL divergence, fill MAE, timing match,
  trade count, equity correlation), exits non-zero on breach

Reconciliation tooling (T1.4):
- scripts/reconcile_fills.py matches engine fills vs venue CSV export
- Nearest-ts within window on (market, venue, side, size)
- Stats: price-bps p50/p95/p99 + ts-delta p50/p95 + orphan rate
- tests/test_reconcile_fills.py (14 tests)
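
The matching rule above (nearest timestamp within a window, keyed on market, venue, side, size) can be sketched like this. The dict-based fill shape, the greedy strategy, and the 2-second default window are assumptions; scripts/reconcile_fills.py may differ in all three.

```python
def match_fills(engine_fills, venue_fills, window_s=2.0):
    """Greedy nearest-timestamp match. Returns (pairs, orphans)."""
    unused = list(venue_fills)
    pairs, orphans = [], []
    for ef in engine_fills:
        key = (ef["market"], ef["venue"], ef["side"], ef["size"])
        candidates = [
            vf for vf in unused
            if (vf["market"], vf["venue"], vf["side"], vf["size"]) == key
            and abs(vf["ts"] - ef["ts"]) <= window_s
        ]
        if not candidates:
            orphans.append(ef)  # engine fill with no venue counterpart
            continue
        best = min(candidates, key=lambda vf: abs(vf["ts"] - ef["ts"]))
        unused.remove(best)     # each venue fill is matched at most once
        pairs.append((ef, best))
    return pairs, orphans


def price_diff_bps(engine_price, venue_price):
    """Signed price difference of the engine fill vs the venue fill, in bps."""
    return (engine_price - venue_price) / venue_price * 10_000
```

The percentile stats (p50/p95/p99) would then be computed over `price_diff_bps` for each matched pair, and the orphan rate as `len(orphans) / len(engine_fills)`.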

Proof notebooks (T1.5):
- notebooks/{funding_arb,basis_trade,momentum_breakout}.py (jupytext)
- Each pins candle sha256, runs backtest + parity, emits artifact,
  CI-gated exit code
- notebooks/README.md

CLI cosmetics (T1.1.f):
- cli.py: "Candles" label → "Equity Points" (accurate after force-close
  changes the curve length relative to candle count)

Test asserts updated for force-close terminal point: test_backtest.py,
test_multi_market.py, test_example_strategies_v2.py,
test_pyth_backtest_integration.py, test_latency.py (default-seed
determinism replaces stale unseeded-varies test).

TRUST_ARTIFACTS.md updated: 5/7 🟢 shipped, 2/7 🟡 partial (CI gate for
parity → Phase 5.3; reconciliation API → Phase 4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…onfig

Closes out CLAUDE.md rule violations flagged in audit + ships user-strategy
isolation + unified config entry point. Invasive ExecutionContext
breakup (D-2.1.b/c/d) deferred to sibling PRs — contract locked in by
tests/test_context_portability.py.

Store encapsulation (T2.2):
- 10 new FlintStore methods — list_live_sessions, list_markets_with_data,
  list_venues_for_market, delete_market_data, count_funding_rates,
  count_orderbook_snapshots, count_open_interest, count_venue_candles,
  mark_running_live_sessions_interrupted (+ existing get_live_equity_history)
- Every method wraps execute in `with self._lock:` (CLAUDE.md rule)
- api/routes/live.py, api/routes/data.py, api/main.py lifespan,
  mcp_server.py migrated — grep 'store\._conn\|store\._lock' in API
  surface now returns zero hits
- tests/test_store_encapsulation.py lint enforces the rule going forward
- tests/test_mcp_server.py mock updated for new store API

User-strategy sandbox (T2.4):
- flint/strategy/sandbox.py — multiprocessing.Process with `spawn` start
  method; RLIMIT_AS (best-effort — Linux enforced, macOS advisory);
  configurable wall-clock timeout with SIGTERM then SIGKILL escalation
- Typed exceptions: StrategyTimeoutError, StrategyMemoryError,
  StrategyExecutionError
- tests/test_sandbox_escape.py — 12 hostile payloads (os.system,
  subprocess, open, eval, exec, __import__, socket, nested import,
  .unlink, infinite loop, memory bomb, no-Strategy-class)
- Route wiring into /api/v1/backtest/run for user-uploaded strategies
  deferred to D-2.4.b (Phase 4 UI work)

Unified config (T2.3):
- flint/backtest/config.py — BacktestConfig dataclass + nested FillConfig
  / MarginConfig / AllocatorConfig / VenueConfig
- Stable to_json_str + sha256 checksum() — load-bearing for proof
  notebook provenance
- from_dict / from_yaml / from_legacy_kwargs (one-release deprecation
  window for existing kwargs)
- BacktestEngine.from_config(cfg, strategy, **overrides) classmethod
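
A minimal sketch of a stable JSON serialization plus SHA-256 checksum for a nested config. The field names (`market`, `initial_capital`, `impact_model`, `latency_ms`) are placeholders, not the real BacktestConfig schema; only the `to_json_str` / `checksum` method names come from the commit message.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class FillConfig:
    impact_model: str = "sqrt"  # placeholder field
    latency_ms: int = 50        # placeholder field


@dataclass(frozen=True)
class BacktestConfig:
    market: str
    initial_capital: float
    fill: FillConfig = field(default_factory=FillConfig)

    def to_json_str(self) -> str:
        # sort_keys + fixed separators make the output independent of
        # field declaration order and of whitespace choices.
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))

    def checksum(self) -> str:
        return hashlib.sha256(self.to_json_str().encode("utf-8")).hexdigest()
```

Because the string form is canonical, two configs with equal values always hash the same, which is what lets a proof notebook pin the exact configuration it ran with.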

ExecutionContext conformance (T2.1.a/e):
- tests/test_context_portability.py walks the subclass tree, enforces
  every concrete ExecutionContext implements every abstract method,
  checks signature alignment on market_order
- God-class breakup (T2.1.b: BacktestContext → PositionManager /
  OrderQueue / FundingTracker / BorrowTracker / MarketDataFeed), live-
  context merge (T2.1.c), PaperContext separation (T2.1.d) = sibling
  PRs D-2.1.b/c/d

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion

Executes docs/specs/execution-upgrade-v0.3.md Python-side. Rust ports of
orderbook fills + partial+latency stage + multi-venue orchestrator are
sibling PRs (D-3.1-rust / D-3.4-rust / D-3.5-orchestrator) — tracked in
DEFERRED.md with prereqs.

Rust capability matrix (T3.2):
- Rust `capabilities()` PyO3 function exposes fill_models, fee_models,
  supports_{partial_fills,latency_stage,tx_costs,orderbook_walk,
  cross_market,multi_venue_margin,borrow_snapshots,maker_taker_fees},
  engine + engine_version
- flint/backtest/rust_capabilities.py — Python discovery helper with
  @lru_cache, safe stub when flint_core missing, python_capabilities()
  superset
- BacktestResult.engine_used + fallback_reason fields (flint/models.py)
- BacktestEngine(rust_required=True) + RustRequiredError — hard errors
  instead of silent fallback
- Named fallback_reasons list — every Rust-gate says exactly why
- tests/test_rust_capabilities.py (8 tests)

Maker/taker fee trait (T3.3):
- Rust FeeModel::{MakerTaker, Drift, Hyperliquid} variants +
  compute_fee_with_role(is_maker) — Drift 10bps taker / -2bps maker,
  HL 3.5bps taker / 1bp maker. 5 cargo tests in rust/src/engine/fees.rs
- Python MakerTakerFeeModel + HyperliquidFeeModel (flint/execution/
  fee_models.py)
- tests/test_fee_model_parity.py (7 tests, 1e-9 tolerance)
- Maker-vs-taker detection at fill time in Rust pipeline = D-3.3-maker-
  detection (needs D-3.4-rust fill pipeline first)

Orderbook-walk fills Python hardening (T3.1):
- OrderbookFillModel: reject_on_insufficient_depth=True (the default)
  rejects orders whose size exceeds aggregate book depth (was a silent
  underfill); per-fill impact_bps attribution = (vwap − mid) / mid × 10_000,
  signed so that positive means the fill was worse than mid for the taker
- _book_mid helper; fallback preserved when no book or market mismatch
- tests/test_orderbook_fill.py (9 tests)
- Rust port = D-3.1-rust
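
The orderbook-walk hardening above can be illustrated for the buy side. This is a sketch under assumptions: the `walk_asks` name, the tuple-based book, and the choice of `ValueError` are hypothetical, and a sell-side walk would need the sign flipped to keep "positive = worse for the taker".

```python
def walk_asks(asks, size, mid, reject_on_insufficient_depth=True):
    """Fill a taker buy against asks [(price, qty), ...] sorted best-first.

    Returns (vwap, filled, impact_bps).
    """
    depth = sum(qty for _, qty in asks)
    if size > depth and reject_on_insufficient_depth:
        raise ValueError(f"size {size} exceeds book depth {depth}")

    remaining, cost = size, 0.0
    for price, qty in asks:
        take = min(remaining, qty)
        cost += take * price
        remaining -= take
        if remaining <= 0:
            break

    filled = size - remaining
    if filled <= 0:
        raise ValueError("nothing to fill")
    vwap = cost / filled
    # Buy side: vwap above mid is a worse price, so impact is positive.
    impact_bps = (vwap - mid) / mid * 10_000
    return vwap, filled, impact_bps
```

For example, buying 2 units against asks at 101 and 102 (1 unit each) with mid 100 gives a VWAP of 101.5, i.e. roughly 150 bps of impact.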

Slippage calibration reports (T3.6):
- scripts/calibrate.py — fits power-law + sqrt impact via 5-fold CV;
  picks best model by CV R²; drift detection vs stored impact_coefficient
  (15% threshold); emits artifacts/calibration/{market}-{venue}-{date}.md
- --write-yaml round-trips coefficient into flint.yaml, gated by --force
  when drift exceeds threshold
- tests/test_calibrate_script.py (11 tests)

Multi-venue margin primitives (T3.5):
- tests/test_multi_venue_margin_integration.py (6 tests) validates
  existing VenueAllocator cross-venue transfer latency, multi-transfer-
  in-flight, PnL attribution, MarginEngine per-venue MMR, fragmentation
  metrics
- Unified PortfolioMarginEngine facade in BacktestContext = D-3.5-
  orchestrator (blocked on D-2.1.b god-class breakup)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 4 — honest positioning, cancel wiring, UI error surface, capability
route. Full WebSocket + MCP in-process = sibling PRs.

- /api/v1/capabilities + /api/v1/system/capabilities alias (T4.6) —
  reports version, api_version, engine (rust_available + rust_capabilities
  dict from Phase 3), features bool flags (UI hides surface when false),
  limits (max_concurrent_backtests, backtest_timeout_s). 5 tests.
- Backtest cancellation engine-side (T4.5) — BacktestCancelled exception;
  BacktestEngine(cancel_check=Callable[[], bool]) polled every 100 bars
  alongside timeout check; Rust gate added to fallback_reasons; worker
  thread in /api/v1/backtest/run passes closure over _entries[run_id]
  .status == "cancelled"; BacktestCancelled except branch frees slot.
  4 tests. UI wiring = D-4.5-ui sibling PR.
- Lazy-load Monaco (T4.4) — React.lazy wraps @monaco-editor/react with
  styled Suspense fallback; Dashboard + non-editor pages don't pay
  ~1MB bundle cost.
- ConnectionBanner (T4.2) — polls /api/v1/health every 10s, degraded
  after 1 failure, offline after 3, Retry button. Mounted in App.tsx.
  Root status-probe silent catch replaced with console.warn.
  Remaining 9 silent-catch sites + per-hook backoff = D-4.2-backoff.
- README auto-counts (T4.1) — scripts/update_readme_counts.py injects
  live counts between <!-- counts:auto --> markers; --check for CI
  drift gate. "4-tier fill pipeline" renamed to "3-stage pipeline with
  4 impact models" across README.
- Full editorial wedge rewrite (split comparison tables, refocus hero,
  notebooks-not-examples in "Try It") = D-4.1-wedge.
- MCP in-process service layer = D-4.7-mcp-inprocess.
- WebSocket paper/live streams = D-4.3-websocket.
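
The engine-side cancellation in the T4.5 item above (a `cancel_check` callable polled every 100 bars) can be sketched as below. `run_bars` is a hypothetical stand-in for the engine loop, and the `BacktestCancelled` class here is a local definition for illustration, not the one shipped in flint.

```python
from typing import Callable, Optional, Sequence


class BacktestCancelled(Exception):
    """Raised when the caller's cancel_check returns True mid-run."""


def run_bars(bars: Sequence[float],
             cancel_check: Optional[Callable[[], bool]] = None,
             poll_every: int = 100) -> int:
    """Process bars, polling cancel_check once every `poll_every` bars."""
    processed = 0
    for i, _bar in enumerate(bars):
        # Polling only every N bars keeps the per-bar cost near zero.
        if cancel_check is not None and i % poll_every == 0 and cancel_check():
            raise BacktestCancelled(f"cancelled at bar {i}")
        processed += 1  # stand-in for the real per-bar work
    return processed


# Usage: the API worker passes a closure over shared run status.
status = {"state": "running"}
is_cancelled = lambda: status["state"] == "cancelled"
```

The trade-off is latency versus overhead: a cancel request takes effect within at most `poll_every` bars rather than immediately.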

Phase 5 — matrix CI, Rust job, parity + smoke workflows.

.github/workflows/ci.yml — full rewrite:
- test matrix: ubuntu × macos × py3.10/3.11/3.12 (6 jobs)
- rust matrix: ubuntu + macos; maturin develop + cargo test --release +
  Rust/Python parity tests (parity, engine, caps, determinism, fees)
- sandbox matrix: ubuntu + macos; tests/test_sandbox_escape.py with 120s
  timeout (memory-bomb test skips on macOS per platform limits)
- counts job: scripts/update_readme_counts.py --check gate
- lint job: ruff check + format (soft-fail pass one, tightens under
  D-5.1-ruff-fixes) + import sanity
- ui-build preserved
- Dropped `|| pip install -e .` silent fallback; pytest-timeout=60 hard cap
- ~13 parallel jobs per push (was 3)

.github/workflows/parity.yml (T5.3) — workflow_dispatch + weekly cron
Mon 06:00 UTC. Runs scripts/run_parity_report.py, uploads markdown artifact,
fails on threshold breach. Accepts strategy/market/lookback_days inputs.

.github/workflows/live-smoke.yml (T5.6) — workflow_dispatch only (never
on push). venue chooser {drift, hyperliquid, both}. Runs tests/
integration/test_live_smoke.py gated on FLINT_LIVE_SMOKE=1 +
DRIFT_DEVNET_KEYPAIR / HL_TESTNET_KEY secrets. Guard rails in test:
max 0.01 SOL notional, refuse wallet balance > $20, auto-cancel in 1s,
testnet/devnet only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…slocation

Python-side Phase 6 scaffolding. Shared-capital orchestration + live deploy
API + Jito integration + replay = sibling PRs (D-6.1-unified, D-6.5-api,
D-6.7-jito, D-6.4-replay) — gated on Phase 2-3 sibling prereqs.

PortfolioRiskEngine (T6.2):
- flint/risk/portfolio.py — RiskLimits dataclass (gross / net exposure,
  per-venue / per-market concentration, correlation-cluster cap,
  drawdown kill-switch, 95%/99% historical-sim VaR)
- OrderLite / PositionLite transport types (tests don't need full
  BacktestContext plumbing)
- check_order → RiskCheckResult(approved, reason) with first-failing
  check named
- check_kill_switch(equity) tracks peak + triggers on drawdown
- Correlation clusters via union-find over user-supplied corr matrix
- tests/test_portfolio_risk.py (13 tests)
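
Correlation clustering via union-find can be sketched as follows. The function name, the pair-keyed correlation dict, the 0.8 threshold, and the use of absolute correlation are all assumptions for illustration; the real PortfolioRiskEngine may define clusters differently.

```python
def correlation_clusters(markets, corr, threshold=0.8):
    """Group markets whose |correlation| meets the threshold, transitively.

    corr maps (market_a, market_b) pairs to a correlation coefficient.
    """
    parent = {m: m for m in markets}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    for (a, b), rho in corr.items():
        if abs(rho) >= threshold:
            parent[find(a)] = find(b)      # union the two sets

    clusters = {}
    for m in markets:
        clusters.setdefault(find(m), set()).add(m)
    # Sorted for deterministic output.
    return sorted(clusters.values(), key=lambda s: sorted(s))
```

A cluster cap would then sum exposure over each returned set and compare it against the limit, so that three highly correlated perps count as one concentrated bet.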

Correlation-aware Optuna objective (T6.3):
- flint/optimization/portfolio_objective.py —
  pairwise_correlations(equity_curves) returns mean pairwise correlation
  of per-strategy per-bar returns (handles zero-variance curves)
- portfolio_objective(trial, strategies_fn, runner, penalty_lambda) —
  maximizes portfolio_sharpe − penalty_lambda × max(0, corr − floor)
- score(...) standalone helper for post-hoc portfolio ranking
- tests/test_portfolio_objective.py (9 tests)
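
The objective above, portfolio Sharpe minus a penalty on mean pairwise correlation above a floor, can be sketched in plain Python. Treating a zero-variance curve as uncorrelated (0.0) and the default `floor` of 0.3 are assumptions; the source only says zero-variance curves are handled. Equal-length, strictly positive equity curves are assumed.

```python
import math
from itertools import combinations


def returns(curve):
    """Per-bar simple returns of an equity curve."""
    return [b / a - 1.0 for a, b in zip(curve, curve[1:])]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # zero-variance series: treated as uncorrelated (assumption)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise_correlation(equity_curves):
    rets = [returns(c) for c in equity_curves]
    pairs = list(combinations(rets, 2))
    if not pairs:
        return 0.0  # a single strategy has nothing to correlate with
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def score(portfolio_sharpe, corr, penalty_lambda=1.0, floor=0.3):
    """Sharpe minus a penalty that applies only above the correlation floor."""
    return portfolio_sharpe - penalty_lambda * max(0.0, corr - floor)
```

Correlation below the floor costs nothing, so the optimizer is only steered away from portfolios whose strategies move together.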

Funding Dislocation Arb reference (T6.6):
- flint/strategy/funding_dislocation_arb.py — evolution of
  FundingArbStrategy with z-score entry filter (spread ≥ N stddevs above
  trailing mean), Kelly-lite position sizing, mean-reversion exit
- parameters() surface ready for Optuna
- Proof notebook + mainnet checklist = D-6.6-proof (blocked on D-1.4-api
  reconciliation UI + D-6.5-api live deploy)
- tests/test_funding_dislocation_arb.py (5 tests)

Multi-strategy portfolio scaffold (T6.1):
- tests/test_portfolio_engine_multi_strategy.py (5 tests) validates
  existing flint.portfolio.engine.PortfolioEngine end-to-end
- Shared-capital pool (one strategy's loss drains another's budget) +
  pre-trade PortfolioRiskEngine gate on every order = D-6.1-unified
  (blocked on D-2.1.b BacktestContext breakup)

Final sweep: 1854 passed / 7 skipped / 0 failures across all phases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Original sprint plan for the Phase 3 "depth on wedge" work. Tracks the
four features transforming the engine from single-venue candle-based
simulation to multi-venue orderbook-aware margin-tracked execution.

Phase 3 commit (c11acad) implements the Python-side of items 1, 3
(primitives), and the calibration support surface. Rust-side items and
unified orchestration are tracked in DEFERRED.md (D-3.1-rust, D-3.4-rust,
D-3.5-orchestrator).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Full end-to-end smoke test of branch restructure surfaced 4 bugs; all
four fixed + captured by regression tests in tests/test_smoke_regressions.py.
See BUG_REPORT_2026-04-24.md for full writeup with symptoms + root
causes + red evidence.

BUG-1 — /api/v1/live/sessions DuckDB Binder Error
  flint/store.py:list_live_sessions selected non-existent column
  'strategy' — actual column is 'strategy_name'. Renamed in SELECT
  list; Python dict key unchanged. Introduced by Phase 2 T2.2 (commit
  46c7a15).

BUG-2 — engine_used + fallback_reason dropped from API response
  flint/api/routes/backtest.py:~687 built the response from
  tearsheet.to_dict() without the telemetry fields added in Phase 3 T3.2.
  Added two explicit lines after data_quality block.

BUG-3 — Four-way version mismatch (pyproject 1.3.1 / API 0.1.0 / UI 0.3.0)
  flint/api/routes/system.py:_get_version queried distribution 'flint'
  (wrong name — pyproject says 'flint-trading') and found stale phantom
  egg-info at 0.1.0/0.2.0. New preference order: pyproject.toml →
  flint-trading → flint → 0.0.0 fallback.
  ui/src/App.tsx footer was hardcoded 'FLINT v0.3.0'; now fetches from
  /api/v1/capabilities on mount and falls back to '?.?.?' on probe fail.

BUG-4 — Monaco editor pulled from cdn.jsdelivr.net
  Violated README + home-page "local-first, nothing leaves your machine"
  promise. @monaco-editor/react default loader fetched 14 files from
  jsdelivr on every /backtest page load.
  ui/src/components/CodeEditor.tsx now imports monaco-editor + calls
  loader.config({ monaco }) so Vite bundles Monaco into the local JS
  output. Verified via Playwright: cdnCount dropped from ~14 to 0.

Regression tests (tests/test_smoke_regressions.py, 4 classes):
- TestLiveSessionsColumnNameFix
- TestEngineUsedTelemetryInAPIResponse
- TestVersionConsistency
- TestMonacoLoadsLocallyNotFromCDN

Each class docstring carries pre-fix symptom + root cause + fix pointer.
Red confirmed by reverting each fix individually (for BUG-1/2/3) and
seeing the documented error surface; BUG-4 verified via live browser
network trace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the fast-lane DEFERRED.md items. Scope matrix + remaining items
in DEFERRED.md (rewritten with closed/open split).

Phase 1 tail — all shipped:
  D-1.1.b  Rust close_all() takes FeeModel, charges exit fill
           (was fee=0.0 — Rust/Python force-close divergence closed)
  D-1.2-CI parity.yml already shipped in Phase 5.3 commit, verified
  D-1.3-providers  22 providers declared PIT_METADATA via batch script;
                   scripts/audit_pit.py reports 26/26 ✓
  D-1.4-api  GET /api/v1/paper/{session_id}/reconciliation returns
             engine-side fill summary (UI panel + POST variant =
             D-1.4-ui sibling)
  D-1.5-data-pins  docs/how-to/pin-notebook-fixtures.md explains
                   lightweight (hash-only) vs heavy (parquet-commit)
                   fixture workflows
  D-1.6-byo-fills  CustomCSVProvider(table="fills") parses user fill
                   logs with shared schema; CustomDataImport.fills
                   returns records for reconcile + calibrate paths

Phase 2 tail — partial:
  D-2.2-internal  flint/journal/storage.py + flint/paper/session_store.py
                  migrated off raw _store._conn / _store._lock via new
                  FlintStore._sql_{exec,read_all,read_one,...}
                  wrappers; grep now finds only doc comments
  D-2.4.b  /api/v1/backtest/run routes user code (req.code / user:*)
           through flint.strategy.sandbox.run_strategy_in_sandbox in
           sandbox-compatible configs; multi-market/margin/orderbook
           configs fall back to in-process with log line

Phase 4 tail — all shipped:
  D-4.1-wedge  README hero rewritten; comparison table split into
               "vs DeFi-native tools" + "vs general crypto bots";
               examples/ → notebooks/ in Try It; CEX live honestly
               "Planned"
  D-4.2-backoff  18 .catch(() => {}) sites across 10 UI files replaced
                 with structured console.warn (full exponential backoff
                 still D-4.2-backoff-full sibling)
  D-4.5-ui  useBacktest exposes cancel() + auto-POSTs /cancel on unmount
            when status === 'running'; BacktestLab shows CANCEL button
            while running
  D-4.7-mcp-inprocess (MVP)  MCP HTTP base URL configurable via
                             FLINT_API_URL env var (was hardcoded
                             127.0.0.1:8000 in 7 sites). Full
                             service-layer extraction = D-4.7-full

Still deferred (16 items, full scoping in DEFERRED.md):
  - D-1.4-ui, D-2.1.{b,c,d}, D-3.{1,3,4,5}-rust, D-4.{2,3}-full,
    D-4.7-full, D-5.1-ruff, D-6.{1,4,5,6,7}
  - All blocked on dedicated multi-day work (god-class breakup, Rust
    ports, live-deploy API, event sourcing, Jito bundle integration)

New files:
  docs/how-to/pin-notebook-fixtures.md

Modified:
  22 providers (PIT_METADATA added)
  flint/store.py (+5 _sql_* wrapper methods)
  flint/journal/storage.py + flint/paper/session_store.py (migrated)
  flint/api/routes/{backtest,paper}.py (+reconciliation route, sandbox gate)
  flint/providers/custom.py (+fills table parser)
  flint/mcp_server.py (FLINT_API_URL env override)
  rust/src/engine/positions.rs + runner.rs (FeeModel on close_all)
  ui/src/hooks/useBacktest.ts + pages/BacktestLab.tsx (cancel UI)
  ui/src/{hooks,pages,components}/* (10 files — silent-catch cleanup)
  README.md (wedge rewrite)
  DEFERRED.md (rewritten with closed/open split)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nals

D-1.1.b made Rust close_all charge exit fees, which combined with Phase 1
T1.1.f's force-close-appends-terminal behavior produces per-strategy
equity curves of length N+1 (N candles + one terminal after force-close).

portfolio/engine.py used to iterate `for i in range(n_candles)`, silently
dropping the terminal. Result: sum(per_strategy[-1]) == sum of terminals
but combined[-1] == sum at bar N (pre-close). Test
test_per_strategy_pnl_sums_align caught the divergence.

Fix: use max curve length across strategies; extend shorter curves with
their final value so the combined time series doesn't collapse tail
entries to zero. Asserts through the 5-test portfolio suite.
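
The fix can be illustrated with a small sketch; `combine_equity_curves` is a hypothetical name and the real portfolio/engine.py code may be structured differently. Each curve is assumed to be a non-empty list.

```python
def combine_equity_curves(per_strategy):
    """Sum per-strategy equity curves, padding short ones with their last value."""
    n = max(len(curve) for curve in per_strategy)
    # Forward-fill: a strategy that ended early keeps its final equity.
    padded = [curve + [curve[-1]] * (n - len(curve)) for curve in per_strategy]
    return [sum(values) for values in zip(*padded)]
```

With this padding, the last combined point equals the sum of each strategy's terminal value, which is the invariant that test_per_strategy_pnl_sums_align checks.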

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
docs/specs/deferred-execution-plan.md — full delivery roadmap with
dependency graph, 5 waves, per-item action checklist, single-engineer
vs 3-engineer parallel sequences, risk register.

Critical path: D-2.1.b (god-class breakup) → D-2.1.c (live-context merge)
→ D-6.5-api (live deploy) → D-6.7-jito (Jito bundles). Calendar: ~17
weeks single engineer, ~12 weeks 3 engineers.

Wave breakdown:
  Wave 1 (weeks 1-3, 6 items): D-2.1.b + D-3.4-rust unlock dependencies;
                               D-1.4-ui + D-4.2-backoff-full + D-4.7-full
                               + D-5.1-ruff in parallel
  Wave 2 (weeks 4-5, 5 items): D-2.1.d, D-3.5-orchestrator, D-2.1.c,
                               D-3.1-rust, D-3.3-maker-detection
  Wave 3 (weeks 6-9, 3 items): D-6.5-api (XL), D-4.3-websocket, D-6.6-proof
  Wave 4 (weeks 10-14, 2 items): D-6.7-jito + D-6.1-unified
  Wave 5 (weeks 14-17, 1 item): D-6.4-replay

Cross-linked from IMPLEMENTATION_PLAN.md and DEFERRED.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
D-5.1-ruff (Wave 1): flip CI from soft `|| echo` to hard-fail on the
curated rule set. pyproject.toml selects only F401/F811/F821/F841
(real bugs: unused imports, redefined names, undefined names, unused
vars) and ignores the style noise (E402 from PIT_METADATA pattern,
E501 long lines, E701/E702 colon/semicolon multi-statements, E741
single-letter names) until an editorial sweep happens.

ruff --fix landed 315 mechanical cleanups across 100+ files.
Remaining 26 manual fixes:
- flint/store.py: TYPE_CHECKING block for lazy-imported model names
- flint/{analytics,providers,strategy}: drop dead unused-var assignments
- scripts/{backtest_funding_arb,populate_db}: drop unused vars
- tests/test_*: drop debug result/missing/has_liq_warning assignments

Targeted regression sweep on 8 modified-file test modules: 82/82 pass.
WAVE_STATUS.md updated, D-5.1-ruff marked 🟢 shipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
D-4.7-full (Wave 1): factor the work-doing code out of FastAPI route
handlers and the MCP server into a callable service layer that doesn't
care whether a request came from HTTP, an MCP tool, or a script.

New modules under flint/services/:
- strategies — single source of truth for the 20-template builder map
  (was duplicated across backtest.py route and paper.py route)
- backtest — run_backtest_sync(req, store) → tearsheet dict; the
  synchronous one-shot path for MCP and notebooks. Out of scope:
  progress callbacks, sandbox routing, multi-market aggregate, MC
  gating (those still go through the HTTP route).
- journal — list/get/delete/compare wrappers over JournalStorage
- data — OHLCV / funding / borrow / market metadata reads
- paper — read-side helpers around PaperTradingEngine + a store-only
  fallback (`list_sessions_from_store`) for MCP queries when no
  daemon is running

MCP refactor (flint/mcp_server.py):
- run_backtest tool now calls run_backtest_sync directly — no HTTP
- list_journal_runs / compare_runs use the journal service
- get_paper_sessions falls back to the store fallback when HTTP fails
- Paper start/stop still legitimately go over HTTP (need the asyncio
  daemon owning the live session loop)

Routes thinned:
- flint/api/routes/journal.py — pure adapter, ~25 lines
- flint/api/routes/paper.py — _BUILDERS dict deleted (now in service);
  _build_strategy delegates to services.strategies.build_strategy
- flint/api/routes/backtest.py — _build_strategy delegates to service;
  drops the 100-line builder map and 20 strategy-class top-level imports

Tests:
- tests/test_mcp_standalone.py — 12 tests covering each service
  module, plus three acceptance checks that mcp_server.py imports the
  service layer for the core tools (the spec's "no HTTP for backtest /
  journal / data" requirement)
- 105/105 regression tests green across journal/paper/data/walk-forward
  paths

WAVE_STATUS.md: D-5.1-ruff and D-4.7-full both 🟢. Next: D-2.1.b step 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step 1 of 7 in breaking up BacktestContext (current: 971 lines, owning
positions, cash, fills, fees, funding, borrow, orderbook, OI, margin,
allocator, capital transfers — the textbook god class).

This commit pulls position-state ownership out into
`flint/execution/position_manager.py:PositionManager` so the next slices
(CashManager, FillRecorder, Risk surface, Funding/Borrow ledgers) have
a stable boundary to compose against.

Approach: BacktestContext keeps `self._pm = PositionManager()` and
exposes `self._positions` / `self._closed_positions` as @property
aliases that return the manager's underlying mutable dict/list. That
lets the 100+-line `_apply_fill` body and the rest of the call sites
(apply_funding, check_liquidations, close_all_positions) keep working
unchanged — they're still doing `self._positions[key] = pos` and
`del self._positions[key]`. Steps 2–7 migrate those call sites to
explicit `self._pm.set/delete/record_close` calls so the aliases can
go away.

PositionManager surface (kept narrow on purpose):
- get / set / delete on (venue, market) keys
- dict-like __contains__ / __iter__ / __len__ / keys / values / items
- update_pnl_for_market(market, price) — bulk PnL refresh
- record_close(record), closed property — closed-trade ledger
- positions property — direct dict access for the margin engine and
  the future Rust-side adapter (commented as "treat as private")
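
The alias approach can be sketched roughly as follows. This is a trimmed-down, hypothetical `PositionManager` plus a stand-in `ContextSketch` class, not the code in flint/execution/position_manager.py; the venue and market names are made up.

```python
from typing import Any, Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (venue, market)


class PositionManager:
    """Hypothetical, trimmed-down owner of open and closed positions."""

    def __init__(self) -> None:
        self._positions: Dict[Key, Any] = {}
        self._closed: List[Any] = []

    def get(self, key: Key) -> Optional[Any]:
        return self._positions.get(key)

    def set(self, key: Key, position: Any) -> None:
        self._positions[key] = position

    def delete(self, key: Key) -> None:
        self._positions.pop(key, None)

    def record_close(self, record: Any) -> None:
        self._closed.append(record)

    def __contains__(self, key: object) -> bool:
        return key in self._positions

    def __len__(self) -> int:
        return len(self._positions)

    @property
    def positions(self) -> Dict[Key, Any]:
        # Direct access to the underlying dict ("treat as private").
        return self._positions

    @property
    def closed(self) -> Tuple[Any, ...]:
        # Assumption: callers get a read-only copy of the closed ledger.
        return tuple(self._closed)


class ContextSketch:
    """Stand-in for BacktestContext, showing only the legacy alias."""

    def __init__(self) -> None:
        self._pm = PositionManager()

    @property
    def _positions(self) -> Dict[Key, Any]:
        # Returns the manager's own dict, so legacy writes land in the manager.
        return self._pm.positions


ctx = ContextSketch()
key = ("drift", "SOL-PERP")
ctx._positions[key] = {"size": 1.0}   # legacy-style write
seen_by_manager = ctx._pm.get(key)    # visible through the manager
del ctx._positions[key]               # legacy-style delete
remaining = len(ctx._pm)
```

Because the property hands back the same dict object the manager holds, item assignment and `del` need no setter; only whole-dict reassignment would.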

Tests:
- tests/test_position_manager.py — 8 unit tests covering single-position
  ops, dict-like surface, PnL update for one-vs-many markets, closed
  ledger immutability, BacktestContext integration (legacy alias
  routes through manager)
- 90/90 regression tests pass on tests touching BacktestContext
  (test_backtest_v2, test_context_portability, test_fill_pipeline,
  test_jupiter_backtest, test_multi_market,
  test_multi_venue_margin_integration, test_venue_fill_dispatch)

WAVE_STATUS.md: D-2.1.b → 🟡 (1/7 shipped). Steps 2–7 stay scoped as
follow-up work in the deferred backlog. Wave 1 next: D-1.4-ui.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
D-1.4-ui (Wave 1): POST /api/v1/paper/{session_id}/reconciliation
accepts a multipart CSV of venue-reported fills, matches against the
engine fills already persisted for that session, and returns the same
counts + p50/p95/p99 bps/ts deltas the reconcile_fills CLI emits.

Backend:
- scripts/reconcile_fills.py — extract _parse_venue_fills_reader so
  both the path-based load_venue_fills_csv (CLI) and the new
  parse_venue_fills_csv_text (HTTP route) share validation. New
  CSVSchemaError lets the API return 400 rather than having a
  CLI-style SystemExit surface as a 500 over HTTP.
- flint/api/routes/paper.py — POST handler caps uploads at 10 MB,
  rejects non-UTF-8 with 400, decodes the CSV, calls reconcile() over
  store.get_live_fills(session_id) + the venue fills.

UI:
- ui/src/pages/PaperTrading.tsx — hidden <input type="file"> +
  "RECONCILE FILLS" button next to the existing PARITY TEST button.
  Result panel mirrors the parity-report layout: counts grid
  (matched / engine-only / venue-only / venue-total) + bps/ts
  percentile grid, color-coded on the p95-bps > 10 threshold.
- Schema/upload errors show a [RECONCILE] inline banner.
- Same-file uploads work twice in a row (input.value cleared after).

Tests:
- tests/test_reconciliation_endpoint.py — 6 cases covering empty
  engine fills, schema-error → 400, oversized upload → 400, non-UTF-8
  → 400, plus end-to-end reconcile() unit checks for ts-window match
  and out-of-window mismatch.
- 20/20 paper-route regression tests still pass; ruff clean; UI vite
  build green.

WAVE_STATUS.md: D-1.4-ui → 🟢. Wave 1 progress: 4/6 first-cuts
shipped (D-5.1-ruff, D-4.7-full, D-2.1.b step 1, D-1.4-ui).
Remaining: D-3.4-rust, D-4.2-backoff-full.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
D-4.2-backoff-full (Wave 1): replace ad-hoc setInterval/setTimeout loops
in the polling hooks with one shared primitive that handles the
boring boilerplate everyone gets wrong:

- 1 s → 2 s → 5 s → 10 s → 30 s backoff schedule on consecutive errors
- AbortController per request so an unmounting component never
  resolves a stale fetch and sets state on a dead hook
- errorCount / lastError / nextRetryIn surfaced in the return so
  ConnectionBanner can show "retry in 5s" instead of pretending
  everything is fine
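
The backoff schedule itself is simple enough to state as a lookup. The sketch below is written in Python purely for illustration (the hook is TypeScript); the function name is hypothetical, and holding at 30 s once the schedule is exhausted is an assumption.

```python
BACKOFF_SCHEDULE_S = (1, 2, 5, 10, 30)


def next_retry_delay(error_count: int) -> int:
    """Seconds to wait before retrying after `error_count` consecutive errors."""
    if error_count < 1:
        raise ValueError("error_count must be >= 1")
    # Assumption: once the schedule is exhausted, stay at the last step.
    index = min(error_count, len(BACKOFF_SCHEDULE_S)) - 1
    return BACKOFF_SCHEDULE_S[index]


delays = [next_retry_delay(n) for n in range(1, 8)]
```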

Migrations in this commit:
- useLiveMonitor: equity + fills polled together via Promise.all,
  friendly "is flint serve running?" string still applied
- usePaperPortfolio: same friendly-error map, errorCount/nextRetryIn
  exposed for the dashboard banner
- useSessionStatus: simplest case — drops directly to the new hook

Skipped on purpose:
- useBacktest, useOptimize — these are job-completion-driven (poll
  until status==complete then stop forever). They have their own poll
  cancellation against an explicit run id; useBackoffPoll's
  steady-state mental model is wrong for them and forcing a migration
  would regress the cancellation UX. Documented in WAVE_STATUS.md.

Tests:
- src/test/hooks/useBackoffPoll.test.ts — 5 cases covering happy
  path, enabled=false skip, errorCount escalation, recovery on later
  success, and abort-on-unmount. Real timers (RTL + fake timers don't
  compose cleanly).
- 127/127 vitest pass; vite build clean.

WAVE_STATUS.md: D-4.2-backoff-full → 🟢. Wave 1 progress: 5/6 first
cuts shipped. Only D-3.4-rust (Rust port, requires cargo + PyO3
expertise) remains as a sibling-PR follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
D-3.4-rust (Wave 1): the last Wave 1 item. Moves the three venue cost
models (Solana/Drift, Hyperliquid, CEX) into Rust so the hot fill loop
doesn't round-trip through Python for every trade's cost breakdown.

Rust side:
- rust/src/engine/tx_costs.rs — CostEstimate struct (mirrors the Python
  dataclass field-for-field) + TxCostModel enum with three variants.
  Factory `for_venue()` replicates `get_tx_cost_model()` including the
  "unknown venue → CEX" fallback and case-insensitive matching.
  Historical p50/p90 lamport fees collapse from the Python dict into
  two Option<u64> fields to avoid a HashMap on the hot path.
- rust/src/lib.rs — PyO3 class `TxCostModel` with four static ctors
  (`for_venue`, `solana`, `hyperliquid`, `cex`) and an
  `estimate(market, size, price, urgency) → dict` method returning the
  exact shape of `CostEstimate.to_dict()`.
- capabilities(): `supports_tx_costs` flipped from false to true.

Tests:
- 6 cargo tests in engine/tx_costs.rs covering default/urgent/fallback
- tests/test_rust_tx_cost_parity.py — 13 Python↔Rust parity tests
  pinned to 1e-9 tolerance across every field. Covers default cases,
  urgent-vs-normal p90 selection, custom fee bps, unknown-venue
  fallback, case-insensitive venue lookup, and two edge cases (zero
  size still charges network fee; $1B notional stays inside tolerance).
- Micro-benchmark: 200k iterations on the tight path → 137 ms Python vs
  61 ms Rust (2.24×). FFI overhead dominates the single-call
  measurement — the real win is that the Rust engine's internal fill
  loop can call Rust-to-Rust without crossing the Python boundary at
  all, which unblocks the D-1.1.b/D-3.1 Rust fill-pipeline work.

47/47 Rust-suite tests green. ruff clean. WAVE_STATUS.md:
D-3.4-rust → 🟢. Wave 1 complete (6/6 first cuts shipped).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
D-3.3-maker-detection (Wave 2): tag FillResult.is_maker so passive
limit fills pay the maker rate (or rebate) rather than the taker
rate. Wired into the existing `FeeModel::Drift`/`Hyperliquid`/
`MakerTaker` variants that were shipped in Phase 3 T3.3 but unused.

Rust side:
- types.rs — FillResult gains `is_maker: bool`. All 12 constructors
  in engine/{fills,orders,venue_fills,positions,fees}.rs default it
  to false.
- engine/orders.rs — `process_pending_orders` sets `is_maker: true`
  on Limit fills (resting orders filled by a later bar's range).
  Both `process_pending_orders` and `process_market_orders` now call
  `compute_fee_with_role(fill, fill.is_maker)` instead of the
  role-blind `compute_fee`. Market orders stay `is_maker=false`.

PyO3:
- `RustEngine.__init__` gains `fee_model` + `maker_bps` + `taker_bps`
  kwargs. `"flat"` (default) preserves the old flat-bps behavior.
  `"drift"` / `"hyperliquid"` pick the venue-specific schedules;
  `"maker_taker"` takes explicit bps. Capability flags
  `supports_maker_taker_fees` and the `fee_models` list are updated
  to reflect the new surface.

Tests:
- tests/test_rust_maker_detection.py — 6 behavioral cases:
    * Drift rebate nets ~$0.079 total (resting long at 95 rebates
      $0.019, market close at 98 pays $0.098 taker). Under a bug
      where maker wasn't detected, the total would double.
    * Hyperliquid (1 bp maker / 3.5 bp taker) lands in [0.03, 0.05].
    * Flat fee path unchanged (maker tag set but not observable).
    * Explicit MakerTaker with maker=-5, taker=20 matches expected.
    * Capability flag assertions.
- 163/163 Rust-suite regression green. ruff clean.
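
The dollar figures in the Drift and Hyperliquid cases can be reproduced with plain arithmetic. This sketch assumes a position size of 1 unit and Drift rates of -2 bps maker / 10 bps taker; those values are inferred from the quoted $0.019 and $0.098, not read from the fee schedule.

```python
def fee(notional: float, bps: float) -> float:
    """Fee in dollars; a negative bps value is a rebate."""
    return notional * bps / 10_000.0


SIZE = 1.0  # assumption: one unit traded

# Drift: resting long at 95 (maker), market close at 98 (taker).
drift_entry = fee(95.0 * SIZE, -2.0)                # rebate, about -0.019
drift_exit = fee(98.0 * SIZE, 10.0)                 # about 0.098
drift_total = drift_entry + drift_exit              # about 0.079

# Same trade if the entry were wrongly charged as taker.
drift_taker_only = fee(95.0 * SIZE, 10.0) + drift_exit  # about 0.193

# Hyperliquid: 1 bp maker / 3.5 bp taker on the same prices.
hl_total = fee(95.0 * SIZE, 1.0) + fee(98.0 * SIZE, 3.5)
```

Under these assumptions the taker-only total is more than twice the correct one, and the Hyperliquid total falls inside the [0.03, 0.05] band the test asserts.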

WAVE_STATUS.md: D-3.3-maker-detection → 🟢 (Wave 2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
D-3.1-rust (Wave 2): the hottest path on the orderbook-aware fill
pipeline now runs in Rust. One `flint_core.OrderbookFiller` call
replaces the per-fill Python loop over book levels.

Rust side:
- rust/src/engine/orderbook_fill.rs — `BookSnapshot` { bids, asks },
  `OrderbookFiller { reject_on_insufficient_depth }`,
  `walk_market(side, size, book) → Option<OrderbookWalk>` where
  `OrderbookWalk { price, size, impact_bps, is_partial }`. Algorithm
  mirrors `OrderbookFillModel._walk_book` line-for-line:
    * pick asks for Long, bids for Short
    * reject when order_size > total depth on the taking side
      (when `reject_on_insufficient_depth` is true — the default)
    * walk levels in order, accumulate size × price, compute VWAP
    * signed impact_bps = (vwap - mid) / mid × 10_000, positive =
      taker-unfavorable; mid falls back to the one-sided price
      when only one side of the book has levels.
- PyO3: `flint_core.OrderbookFiller` class with `walk_market(side,
  size, bids, asks) → dict | None`. Callers pre-sort their book
  snapshots (bids descending, asks ascending) just like the Python
  `OrderbookSnapshot` dataclass enforces today.
- Capability flag `supports_orderbook_walk` flipped to true.
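
A rough Python sketch of the walk described above. The names (`Walk`, `mid_price`, the string sides) are illustrative and not the `flint_core` API. Two points are assumptions: the sign is flipped for shorts so that positive impact always means taker-unfavorable, and a non-positive size returns None.

```python
from typing import List, NamedTuple, Optional, Tuple

Level = Tuple[float, float]  # (price, size)


class Walk(NamedTuple):
    price: float       # VWAP of the fill
    size: float        # filled size
    impact_bps: float  # positive = worse than mid for the taker
    is_partial: bool


def mid_price(bids: List[Level], asks: List[Level]) -> Optional[float]:
    if bids and asks:
        return (bids[0][0] + asks[0][0]) / 2.0
    if bids:
        return bids[0][0]  # one-sided fallback
    if asks:
        return asks[0][0]  # one-sided fallback
    return None


def walk_market(side: str, size: float, bids: List[Level], asks: List[Level],
                reject_on_insufficient_depth: bool = True) -> Optional[Walk]:
    # Bids are expected best-first (descending), asks best-first (ascending).
    levels = asks if side == "long" else bids
    mid = mid_price(bids, asks)
    if size <= 0 or not levels or mid is None:
        return None
    depth = sum(level_size for _, level_size in levels)
    if size > depth and reject_on_insufficient_depth:
        return None
    remaining, notional, filled = size, 0.0, 0.0
    for price, available in levels:
        take = min(remaining, available)
        notional += take * price
        filled += take
        remaining -= take
        if remaining <= 0:
            break
    vwap = notional / filled
    direction = 1.0 if side == "long" else -1.0  # assumption: flip for shorts
    impact_bps = direction * (vwap - mid) / mid * 10_000
    return Walk(vwap, filled, impact_bps, is_partial=filled < size)


bids = [(99.0, 5.0), (98.0, 5.0)]
asks = [(101.0, 5.0), (102.0, 5.0)]
long_fill = walk_market("long", 7.0, bids, asks)
short_fill = walk_market("short", 5.0, bids, asks)
rejected = walk_market("long", 20.0, bids, asks)
partial = walk_market("long", 20.0, bids, asks, reject_on_insufficient_depth=False)
```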

Tests:
- 9 cargo tests in engine/orderbook_fill.rs (long/short VWAP, reject,
  partial, empty, impact_bps sign on both sides, one-sided mid, empty
  mid)
- tests/test_rust_orderbook_parity.py — 13 Python↔Rust parity cases
  across varied order sizes, rejection vs partial behavior, impact
  sign verification, and capability flag.
- Micro-benchmark (20 levels/side, 100k iterations): Python 280 ms
  vs Rust 80 ms ≈ 3.52× speedup. Unlike the TxCost port's FFI-bound
  result, this one gets real gains because each walk allocates and
  iterates per fill.

176/176 Rust-suite regression green. ruff clean. WAVE_STATUS.md:
D-3.1-rust → 🟢. Wave 2 progress: 2/5 items shipped
(D-3.3-maker-detection + D-3.1-rust); remaining 3 are blocked on
later D-2.1.b steps or on testnet secrets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step 2 of 7 in breaking up BacktestContext. After Step 1 pulled
position state into PositionManager, this slice does the same for
cash + running counters.

flint/execution/cash_manager.py:
- Single-ledger `CashManager` owning `cash`, optional `allocator`, and
  the three running counters (`total_fees`, `total_tx_costs`,
  `total_funding`). `__slots__` keeps it lean.
- `debit(amount, venue)` and `credit(amount, venue)` mirror the
  pre-extraction `_debit_cash` / `_credit_cash` helpers exactly,
  including the allocator's `track_pnl` call on credit so per-venue
  PnL ledgers stay in sync.
- `available()` and `balances()` cover the venue-balance helpers the
  BacktestContext exposed for tests.

BacktestContext changes:
- `__init__` constructs `self._cm = CashManager(initial_capital,
  allocator=capital_allocator)` instead of holding `self._cash`,
  `self._allocator`, and the three counters as plain attributes.
- Property aliases `_cash` (read+write), `_allocator` (read-only),
  `_total_fees` / `_total_tx_costs` / `_total_funding` (read+write)
  keep the 20+ existing call sites working: `self._cash -= x` and
  `self._total_funding += p` route through the property setters
  back into the manager. Steps 3–7 migrate those to explicit
  `self._cm.debit/credit/add_*` calls.
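
Unlike the position aliases, the cash aliases need setters, because `self._cash -= x` rebinds a float rather than mutating a container. A minimal sketch, with the allocator left out and `ContextSketch` standing in for BacktestContext:

```python
class CashManager:
    """Hypothetical single-ledger cash owner (no allocator in this sketch)."""

    __slots__ = ("cash", "total_fees", "total_tx_costs", "total_funding")

    def __init__(self, initial_capital: float) -> None:
        self.cash = initial_capital
        self.total_fees = 0.0
        self.total_tx_costs = 0.0
        self.total_funding = 0.0

    def debit(self, amount: float) -> None:
        self.cash -= amount

    def credit(self, amount: float) -> None:
        self.cash += amount


class ContextSketch:
    def __init__(self, initial_capital: float) -> None:
        self._cm = CashManager(initial_capital)

    @property
    def _cash(self) -> float:
        return self._cm.cash

    @_cash.setter
    def _cash(self, value: float) -> None:
        # `self._cash -= x` calls the getter, subtracts, then lands here.
        self._cm.cash = value

    @property
    def _total_funding(self) -> float:
        return self._cm.total_funding

    @_total_funding.setter
    def _total_funding(self, value: float) -> None:
        self._cm.total_funding = value


ctx = ContextSketch(1000.0)
ctx._cash -= 25.0           # legacy compound assignment
ctx._total_funding += 1.5   # legacy counter update
ctx._cm.credit(5.0)         # explicit manager call, the target style
```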

Tests:
- tests/test_cash_manager.py — 16 cases covering init (with and
  without allocator), debit/credit (allocator-aware rejection,
  per-venue update, track_pnl), counter accumulation, compound
  assignments, balance helpers, and BacktestContext integration
  (legacy alias routes through manager, no class-level state
  pollution between instances).
- 134/134 regression tests on BacktestContext-using paths still
  pass (test_backtest_v2, test_fill_pipeline, test_jupiter_backtest,
  test_multi_market, test_multi_venue_margin_integration,
  test_venue_fill_dispatch, test_position_manager, test_tx_costs,
  test_funding_arb, test_funding_dislocation_arb,
  test_midnight_gardener, test_safety_integration).
- ruff clean.

WAVE_STATUS.md: D-2.1.b → 2/7 shipped (still 🟡). Steps 3–7
(FillRecorder, FundingLedger, BorrowLedger, OrderbookCache, Risk
surface) remain as follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step 3 of 7 in breaking up BacktestContext. After PositionManager
(step 1) and CashManager (step 2), this slice pulls the two
append-only ledgers (recorded fills + diagnostic log messages) into
their own owner.

flint/execution/fill_recorder.py:
- `FillRecorder` with `__slots__` for the two lists. Surface:
    * `fills` (mutable list, used by hot-path call sites)
    * `record(fill)` and `all_fills()` for new code
    * `logs` (mutable list)
    * `log(msg)` and `messages()` for new code

BacktestContext changes:
- `__init__` constructs `self._fr = FillRecorder()` instead of
  holding `self._fills` / `self._log_messages` as plain lists.
- Read-only property aliases `_fills` and `_log_messages` return
  the manager's underlying mutable lists, so existing call sites
  (`self._fills.append(...)` in _apply_fill, `self._log_messages
  .append(...)` in market_order's reduce_only / margin paths,
  log() helper, check_liquidations) keep working unchanged.
- Public `all_fills` and `log_messages` properties read through
  the manager.

Tests:
- tests/test_fill_recorder.py — 10 cases covering record/log
  ordering, copy semantics on snapshots, legacy-append-via-property,
  BacktestContext integration (recorder ownership, alias routing,
  public-property reads), and per-instance isolation.
- 127/127 regression tests on BacktestContext-using paths still
  green. ruff clean.

WAVE_STATUS.md: D-2.1.b → 3/7 shipped (still 🟡). Steps 4–7
(OrderQueue, FundingLedger, BorrowLedger, Risk surface) remain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step 4 of 7 in breaking up BacktestContext. Pulls the two order
queues (resting limit/stop/TP orders + this-bar market orders) into
a single owner.

flint/execution/order_queue.py:
- `OrderQueue { pending, market_queue, pending_cap }` with
  `__slots__`. Default cap of 100 mirrors the pre-extraction
  `len >= 100 → drop` behavior in `_check_order_cap`.
- Surface:
    * `add_pending(order) → bool` — False at cap (caller drops)
    * `cancel(order_id) → bool`
    * `cancel_all(market=None) → int` — count removed
    * `pending` / `market_queue` properties read+write
    * `add_market(order)` and `drain_market()` (atomic swap so
      orders submitted during drain go to the fresh queue)
    * `snapshot()` for `BacktestContext.pending_orders` semantics
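
The cap and the atomic-swap drain can be sketched as below. This is a hypothetical reduction of `OrderQueue` that keeps only `add_pending`, `add_market`, and `drain_market`; orders are plain values here rather than `Order` objects.

```python
from typing import Any, List


class OrderQueue:
    __slots__ = ("pending", "market_queue", "pending_cap")

    def __init__(self, pending_cap: int = 100) -> None:
        self.pending: List[Any] = []
        self.market_queue: List[Any] = []
        self.pending_cap = pending_cap

    def add_pending(self, order: Any) -> bool:
        # At the cap the order is refused; the caller decides how to drop it.
        if len(self.pending) >= self.pending_cap:
            return False
        self.pending.append(order)
        return True

    def add_market(self, order: Any) -> None:
        self.market_queue.append(order)

    def drain_market(self) -> List[Any]:
        # Swap in a fresh list first, so orders submitted while the caller
        # iterates the drained batch go to the next bar's queue.
        drained, self.market_queue = self.market_queue, []
        return drained


queue = OrderQueue(pending_cap=2)
accepted = [queue.add_pending(order_id) for order_id in ("a", "b", "c")]

queue.add_market("m1")
batch = queue.drain_market()
for _ in batch:
    queue.add_market("follow-up")  # submitted mid-drain
```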

BacktestContext changes:
- `__init__` constructs `self._oq = OrderQueue()` instead of two
  plain `List[Order]`.
- `_pending_orders` and `_market_orders_queue` become read+write
  property aliases. The setters are load-bearing — `cancel_all` and
  `process_pending_orders` rebuild via
  `self._pending_orders = [filtered list]`, and the setter routes
  that into `self._oq.pending = ...` so the manager keeps a single
  reference.

Tests:
- tests/test_order_queue.py — 16 cases covering append/cap, cancel
  by id, cancel_all (no-arg + by market), market-queue drain
  semantics, list reassignment via setters, BacktestContext
  integration (legacy append, legacy reassignment, public
  `pending_orders` property), and per-instance isolation.
- 134/134 regression tests on BacktestContext-using paths still
  green. ruff clean.

WAVE_STATUS.md: D-2.1.b → 4/7 shipped (still 🟡). Steps 5–7
(FundingLedger, BorrowLedger, Risk surface) remain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step 5 of 7 in breaking up BacktestContext. Pulls the two funding
dictionaries (flat per-market history + per-venue split) into a
dedicated owner with the strategy-facing helpers built in.

flint/execution/funding_ledger.py:
- `FundingLedger` owns `_history: Dict[market, List[FundingRate]]`
  and `_venue_history: Dict[market, Dict[venue, List[FundingRate]]]`
  via `__slots__`.
- Surface mirrors the pre-extraction get_funding_* API:
    * `add(fr)` — single source of truth for venue-tagged appends
    * `latest(market)` → Optional[float]
    * `recent(market, lookback)` → List[(ts, rate)]
    * `by_venue(market, lookback)` → Dict[venue, List[(ts, rate)]]
    * `venue_snapshots(market, lookback)` → full FundingRate objects
      (used by funding_arb / funding_dislocation_arb for mark/oracle
      prices, not just rate)

BacktestContext changes:
- `__init__` constructs `self._fl = FundingLedger()` instead of two
  bare dicts.
- `add_funding_rate` is now a one-line `self._fl.add(fr)`.
- `get_funding_rate`, `get_funding_rates`, `get_funding_by_venue`,
  `get_venue_snapshots` collapse to one-line ledger calls (the
  per-method market resolution stays for backward-compat).
- Read-only property aliases `_funding_history` and `_venue_funding`
  return the ledger's underlying dicts so any existing test that
  peeks into internals still passes.

Tests:
- tests/test_funding_ledger.py — 13 cases covering empty/latest/
  lookback truncation, venue grouping, full-snapshot access, and
  BacktestContext integration (add/get round-trips, legacy alias
  reads, per-instance isolation).
- 125/125 regression tests green across funding-using paths
  (test_funding_arb, test_funding_dislocation_arb, test_paper_funding,
  test_paper_multi_venue, test_backtest_v2 etc.). ruff clean.

WAVE_STATUS.md: D-2.1.b → 5/7 shipped (still 🟡). Steps 6–7
(BorrowLedger, Risk surface) remain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step 6 of 7 in breaking up BacktestContext. The new BorrowLedger owns
the Jupiter Perps borrow-rate history plus the running paid-borrow
counters and the per-trade payment ledger that `_apply_fill` writes
into for tearsheet attribution.

flint/execution/borrow_ledger.py:
- `BorrowLedger { _history, _payments, total_paid }` with `__slots__`.
- Surface:
    * `record(snapshot)` — single source of truth for rate appends
    * `record_payment(payment)` — per-trade attribution dict
    * `add_paid(amount)` — running counter helper
    * `latest(market)` → most recent rate_hourly
    * `recent(market, lookback)` → List[(ts, rate_hourly)]
    * `cumulative_at(market, ts)` — value at-or-before ts (linear
      scan because snapshots are append-ordered by arrival, same
      as the pre-extraction code)
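
A sketch of the at-or-before lookup, assuming each snapshot reduces to a `(ts, cumulative)` pair. The tie-breaking rule (later arrival wins on equal timestamps) is an assumption, not something the ledger is known to guarantee.

```python
from typing import List, Optional, Tuple

Snapshot = Tuple[int, float]  # (ts, cumulative borrow); simplified stand-in


def cumulative_at(history: List[Snapshot], ts: int) -> Optional[float]:
    """Return the cumulative value of the latest snapshot with snap_ts <= ts."""
    best: Optional[Snapshot] = None
    # Linear scan: arrival order is not guaranteed to be timestamp order,
    # so a binary search over the list would not be safe.
    for snap_ts, cumulative in history:
        if snap_ts <= ts and (best is None or snap_ts >= best[0]):
            best = (snap_ts, cumulative)
    return None if best is None else best[1]


# Snapshots appended in arrival order, with one arriving late.
history = [(100, 0.010), (300, 0.030), (200, 0.020)]
at_250 = cumulative_at(history, 250)
at_300 = cumulative_at(history, 300)
before_first = cumulative_at(history, 50)
```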

BacktestContext changes:
- `__init__` constructs `self._bl = BorrowLedger()` instead of two
  separate fields + a list.
- `add_borrow_rate`, `get_borrow_rate`, `get_borrow_rates`, and
  `get_borrow_cumulative_at` collapse to one-line delegations.
- Read-only property aliases `_borrow_history` and `_borrow_payments`
  return the ledger's underlying containers; `_total_borrow_paid`
  is read+write (the load-bearing case is `_apply_fill`'s
  `self._total_borrow_paid += borrow_cost`).
- Public `total_borrow_paid` property still reads from the ledger.

Tests:
- tests/test_borrow_ledger.py — 13 cases covering record/latest/
  lookback/cumulative_at boundary behavior, payment append, total_paid
  accumulation, BacktestContext integration (compound assignment via
  alias, public property reads), per-instance isolation.
- 98/98 regression tests green across borrow-using paths
  (test_jupiter_backtest, test_backtest_v2, the four step-1..5
  ledger test files). ruff clean.

WAVE_STATUS.md: D-2.1.b → 6/7 shipped (still 🟡). Step 7 (Risk
surface) is the last slice; after it the BacktestContext is a
thin orchestrator over six small components.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step 7 of 7 in breaking up BacktestContext. Pulls the three market-
data caches (cross-market candle histories, orderbook snapshots,
open-interest snapshots) into one owner. After this commit, every
piece of mutable state that BacktestContext used to own has moved
into one of seven dedicated managers.

flint/execution/market_data_feed.py:
- `MarketDataFeed` owns:
    * `_market_histories: Dict[str, List[Candle]]` (multi-market
      candle access used by check_liquidations, set_candle's
      cross-market PnL refresh)
    * `_orderbook_history: Dict[str, List]` (per-market book
      snapshots used by the orderbook fill model)
    * `_oi_history: Dict[str, List]` (per-market OI snapshots)
- Surface: `set_histories(d)`, `candles(market, lookback)`,
  `markets()`, `add_orderbook(snap)`, `latest_orderbook(market)`,
  `add_open_interest(oi)`, `latest_oi(market)` →
  `Optional[(long_oi, short_oi)]`, `oi_recent(market, lookback)`.
- `market_histories` exposed as read+write because the engine
  occasionally reassigns the dict directly (e.g. `set_candle`'s
  fallback path on cross-market lookups).

BacktestContext changes:
- `__init__` constructs `self._mdf = MarketDataFeed()` instead of
  three plain dicts.
- `set_market_histories`, `get_candles`, `markets`,
  `add_orderbook_snapshot`, `get_orderbook`, `add_open_interest`,
  `get_open_interest`, `get_open_interest_history` collapse to
  one-line delegations.
- Property aliases `_market_histories` (read+write),
  `_orderbook_history`, `_oi_history` (read-only) preserve every
  legacy access pattern (`for mkt, hist in self._market_histories.items()`
  in check_liquidations etc. keeps working unchanged).

Tests:
- tests/test_market_data_feed.py — 12 cases covering empty/set/
  iterate, lookback truncation, orderbook/OI round-trips,
  BacktestContext integration (legacy alias reassignment, public
  property reads), per-instance isolation.
- 176/176 regression tests green across all BacktestContext-using
  paths. ruff clean.

WAVE_STATUS: D-2.1.b shows 7/7 state-extraction steps shipped (still
🟡 because caller-site migration to explicit manager calls + the
"BacktestContext < 300 LOC" reduction land in a follow-on PR — the
manager surfaces are already in place to support it).

This loop's progress: 5 D-2.1.b slices shipped (steps 3–7).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
D-3.5-orchestrator (Wave 2): single pre-trade check facade composing
the three pre-existing engines (MarginEngine, VenueAllocator,
PortfolioRiskEngine). BacktestContext.market_order now consults one
object instead of inlining venue-margin checks.

flint/risk/portfolio_orchestrator.py:
- `PortfolioMarginEngine(margin=, allocator=, portfolio=)` — any
  sub-engine is optional; omitting it makes the corresponding check
  a no-op pass so existing backtest paths (margin-only, no allocator,
  no portfolio risk) stay behaviorally identical.
- `check_order(order, cash, positions, price, equity) →
  PortfolioCheck(approved, reason, component)` runs the three
  checks in priority order:
    1. Allocator — fastest reject (per-venue available cash)
    2. MarginEngine — venue-level margin/leverage cap
    3. PortfolioRiskEngine — book-level gross/net/concentration/VaR
  First failure short-circuits; `component` carries the source name
  so callers can name which engine vetoed.
- `check_liquidations(positions, prices, ts)` and
  `check_kill_switch(equity)` delegate to the relevant sub-engine
  (or return [] / False when omitted).
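
The priority order and short-circuit can be illustrated with a stripped-down facade. Here each sub-engine is modeled as a hypothetical callable taking only the order notional and returning a rejection reason or None; the real `check_order` takes the order, cash, positions, price, and equity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Check = Callable[[float], Optional[str]]  # returns a reason, or None to pass


@dataclass(frozen=True)
class PortfolioCheck:
    approved: bool
    reason: str = ""
    component: str = ""

    def __bool__(self) -> bool:
        return self.approved


def check_order(notional: float,
                allocator: Optional[Check] = None,
                margin: Optional[Check] = None,
                portfolio: Optional[Check] = None) -> PortfolioCheck:
    ordered = (("allocator", allocator), ("margin", margin), ("portfolio", portfolio))
    for component, check in ordered:
        if check is None:
            continue  # an omitted sub-engine is a no-op pass
        reason = check(notional)
        if reason is not None:
            # First failure wins; later checks never run.
            return PortfolioCheck(False, reason, component)
    return PortfolioCheck(True)


calls: List[str] = []


def allocator_check(notional: float) -> Optional[str]:
    calls.append("allocator")
    return "insufficient venue cash" if notional > 100 else None


def margin_check(notional: float) -> Optional[str]:
    calls.append("margin")
    return "leverage cap exceeded" if notional > 50 else None


rejected = check_order(500.0, allocator=allocator_check, margin=margin_check)
no_engines = check_order(500.0)
```

Stacking two failing checks and recording which ones ran is one way to show the ordering, similar in spirit to the "priority order proven by stacking failures" test case.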

BacktestContext changes:
- New `portfolio_risk=None` ctor kwarg threads a
  PortfolioRiskEngine into the orchestrator.
- `__init__` now always builds `self._pme` (None-tolerant
  internally), so the pre-trade gauntlet has a stable callable.
- `market_order` replaces the inline margin block with a single
  `self._pme.check_order(...)` call. The reduce_only short-circuit
  stays — closing exposure can't fail margin/book checks.
- Reject log line tags the originating component
  (`MARGIN REJECTED` / `ALLOCATOR REJECTED` / `PORTFOLIO REJECTED`)
  instead of the previous flat `MARGIN REJECTED` regardless of cause.

Tests:
- tests/test_portfolio_orchestrator.py — 16 cases:
    * No-op pass when all three engines unset
    * PortfolioCheck.__bool__ / reason / component
    * Margin rejection (oversized notional)
    * Allocator short-circuits before margin engine even runs
    * Portfolio gross-exposure cap rejects
    * Priority order proven by stacking failures
    * BacktestContext integration: orchestrator constructed,
      market_order log line carries component tag, success path
      reaches the queue, portfolio_risk ctor kwarg threads through
    * reduce_only bypasses every check (closing exposure path)
- 207/207 regression tests green across margin/portfolio/safety/
  funding/multi-venue/jupiter paths. ruff clean.

WAVE_STATUS: D-3.5-orchestrator → 🟢. Wave 2 progress so far in
the autonomous loop: D-3.3-maker, D-3.1-rust, all 7 D-2.1.b state
slices, D-3.5-orchestrator. The remaining Wave 2 items
(D-2.1.c live-context merge, D-2.1.d paper-context split) are
blocked on testnet secrets / a deliberate paper API design pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
D-2.1.b caller-site migration (final slice). After all 7 state
extractions landed last loop, this commit walks through every
state-mutating call site in BacktestContext and routes it through
the right manager directly — no more `self._cash -= x`, no more
`self._positions[k] = ...`, no more `self._total_funding += p`.

Migrations:
- apply_funding: iterates `self._pm.items()` instead of the legacy
  dict alias; `self._cm.debit(payment, venue)` and
  `self._cm.add_funding(payment)` replace the `_debit_cash` helper +
  compound assignment.
- check_liquidations: uses `self._pm.get/delete/record_close` for
  position state, `self._cm.credit/add_fee` for cash, the
  `self._oq.pending = filtered` setter for pending-order cleanup,
  and `self._fr.log` for the LIQUIDATED warn line.
- process_pending_orders: reads `self._oq.pending`, writes back via
  the setter (replaces the legacy reassignment idiom).
- process_market_orders: `self._oq.drain_market()` atomic swap (so
  fills queued during the loop go to the next bar), `self._fr.log`,
  and a new `_add_pending_or_warn` helper for GTC resting drains.
- _apply_fill: factored two helpers — `_realize_jupiter_borrow_cost`
  (handles partial + full close + flip cases identically) and
  `_new_position` (Jupiter cum-borrow snapshot at entry). Main body
  is now ~80 lines of pure routing through
  `_fr.record / _cm.debit/credit/add_fee/add_tx_cost / _pm.set/get/
  delete/record_close / _bl.add_paid/record_payment`.
- close_all_positions: `self._pm.keys() / get((v, m))`.
- set_candle: `self._pm.update_pnl_for_market` + `_mdf.history_for`
  (the cross-market PnL mark).
- limit_order/stop_order/take_profit_order/cancel/cancel_all: route
  through `self._oq.add_pending/cancel/cancel_all`.
- account/positions/pending_orders: manager-direct reads.
- total_fees/total_funding/total_tx_costs/total_borrow_paid/
  log_messages: manager-direct reads.
- venue_balance/balances/transfer/process_transfers:
  `self._cm.available/balances/allocator`.
- Internal `_debit_cash` and `_credit_cash` helpers deleted.

API additions:
- Public `borrow_payments` property so `flint/backtest/engine.py`
  no longer reaches into `ctx._borrow_payments` private state.

Legacy property aliases (_cash/_positions/_fills/etc.) stay because
existing tests deliberately exercise them; new code should not use
them.

Tests:
- 262/262 regression tests green across:
    test_backtest_v2, test_multi_venue_margin_integration,
    test_jupiter_backtest, test_safety_integration,
    test_position_manager, test_cash_manager, test_fill_recorder,
    test_order_queue, test_funding_ledger, test_borrow_ledger,
    test_market_data_feed, test_portfolio_orchestrator,
    test_funding_arb, test_funding_dislocation_arb,
    test_multi_market, test_venue_fill_dispatch,
    test_context_portability, test_fill_pipeline, test_tx_costs,
    test_tx_cost_integration, test_paper, test_paper_funding,
    test_paper_multi_venue.
- ruff clean.

WAVE_STATUS: D-2.1.b → 🟢 (state extraction + caller migration).
This closes the deepest D-2.1.b unlock; D-2.1.c (live-context
merge) and D-2.1.d (paper-context split) remain blocked on testnet
secrets / a deliberate paper-API design pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wave 3 first-cut. The existing PortfolioEngine runs each strategy
on its own slice of capital — that's a portfolio of independent
runs, not a shared-capital book. This new engine puts every
strategy on **one** BacktestContext so cash, fees, funding, borrow,
and the orchestrator's pre-trade margin gauntlet all see the
actual book exposure.

Unblocked by D-2.1.b (state extraction + caller migration) and
D-3.5-orchestrator (PortfolioMarginEngine). The seven managers
inside BacktestContext now make composing one for many strategies
straightforward — a single CashManager debits across all strategy
fills; a single PositionManager nets long+short across strategies;
a single PortfolioMarginEngine sees combined exposure for caps.

flint/portfolio/shared_engine.py:
- `_TaggedContextProxy` wraps the shared `BacktestContext` per
  strategy:
    * Forwards all reads (account, positions, funding/borrow/
      orderbook queries, candles, log) — strategies see the
      whole book, by design.
    * Tags every order_id with `strategy_name:` so fills carry
      attribution back to the originating strategy.
    * `cancel_all(market=...)` only cancels orders owned by the
      calling strategy (matched by tag prefix).
- `SharedCapitalPortfolioEngine.run(candles)` walks the bar loop:
  set_candle → process_pending → strategy callbacks (each via its
  proxy) → process_market_orders → record equity. End-of-run
  close_all_positions for clean attribution.
- `SharedPortfolioResult` carries combined equity + per-strategy
  trade counts, fill streams, and PnL splits, plus warnings
  surfaced from the shared ctx.log_messages.
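
A rough sketch of the tagging and per-strategy `cancel_all` scoping described above. `SharedBookSketch` and `TaggedProxySketch` are hypothetical stand-ins; the real `_TaggedContextProxy` wraps a full `BacktestContext`, and the id format after the `strategy_name:` prefix is an assumption.

```python
import itertools


class SharedBookSketch:
    """Hypothetical shared context: only a pending-order map is modeled."""

    def __init__(self):
        self.pending = {}  # order_id -> market
        self._ids = itertools.count()

    def next_id(self):
        return next(self._ids)

    def limit_order(self, market, order_id):
        self.pending[order_id] = market
        return order_id


class TaggedProxySketch:
    def __init__(self, ctx, strategy_name):
        self._ctx = ctx
        self._prefix = f"{strategy_name}:"

    @property
    def pending(self):
        # Reads are forwarded untouched: every strategy sees the whole book.
        return dict(self._ctx.pending)

    def limit_order(self, market):
        # Writes are tagged so fills can be attributed back to the strategy.
        return self._ctx.limit_order(market, f"{self._prefix}{self._ctx.next_id()}")

    def cancel_all(self, market=None):
        # Only orders carrying this strategy's prefix are eligible.
        owned = [
            oid
            for oid, mkt in self._ctx.pending.items()
            if oid.startswith(self._prefix) and (market is None or mkt == market)
        ]
        for oid in owned:
            del self._ctx.pending[oid]
        return len(owned)


book = SharedBookSketch()
trend = TaggedProxySketch(book, "trend")
carry = TaggedProxySketch(book, "carry")
trend.limit_order("BTC")
trend.limit_order("ETH")
carry.limit_order("BTC")

cancelled = trend.cancel_all(market="BTC")
remaining = sorted(book.pending)
```

Prefix matching assumes strategy names do not themselves contain `:`; otherwise one strategy's prefix could match another's orders.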

Tests:
- tests/test_shared_capital_portfolio.py — 8 cases:
    * Empty strategy list rejected
    * No-candles run returns initial capital
    * Two strategies firing once each both produce trades against
      the shared book
    * Fill streams are tagged per-strategy
    * Proxy forwarding (account, market_order id format)
    * `cancel_all` is per-strategy-scoped
    * Warnings list propagates from shared ctx
- 270/270 regression tests green across backtest/multi-venue/
  jupiter/funding/paper/portfolio paths. ruff clean.

Out of scope (follow-on PR):
- Per-strategy capital caps (would need an allocator with
  tagged sub-buckets).
- Closed-trade attribution by trade-id rather than the current
  even-split-by-market heuristic.
- Dollar-neutral rebalancing across strategies.

WAVE_STATUS: D-6.1-unified → 🟡 (foundation; refinements
deferred). Wave 3 progress: 1 of 4 items started.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wave 5 first slice. Lays the storage primitives that future
snapshot + time-travel-replay features will read. This commit ships
only the **append + read** path; engine writer hooks, snapshot
compaction, and the actual replay primitive land in follow-on
slices once the writer is exercised by real engine runs.

flint/portfolio/event_log.py:
- New `portfolio_events(session_id, seq, ts, kind, payload)` table.
  Composite PK on (session_id, seq) keeps seq unique per session; the
  writer assigns it monotonically, avoiding a DuckDB auto-increment
  sequence (which can't reset per-group). Index on
  (session_id, ts) for the future `read_until(target_ts)` replay path.
- `EventKind` constants: `order.submit`, `order.cancel`, `fill`,
  `liquidation`, `funding`, `borrow`. Payloads are JSON for forward
  compatibility — schema can grow new optional fields without a
  table migration.
- `EventLogWriter` is thread-safe: a `threading.Lock` serializes all
  DB-touching ops; per-session `next_seq` cache backed by
  `MAX(seq)+1` query on cold start so process restarts pick up
  cleanly. Exposes `append(...)` + `append_many(...)` (bulk variant
  using DuckDB executemany).
- `EventLogReader` provides `read_all`, `read_until(target_ts)`
  (filter by event-time, ordered by seq), `count`, `latest_seq`.
- `PortfolioEvent` dataclass with `to_row()` / `from_row()` for the
  JSON ↔ Python round trip.
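
To illustrate the per-session seq scheme (lock-serialized appends, a `next_seq` cache, and `MAX(seq)+1` recovery on cold start), here is a minimal sketch. It deliberately swaps DuckDB for the standard-library `sqlite3` module so it runs anywhere; `EventLogWriterSketch` and its column types are assumptions, not the shipped `EventLogWriter`.

```python
import json
import sqlite3
import threading


class EventLogWriterSketch:
    """Illustrative only: sqlite3 stands in for the DuckDB-backed store."""

    def __init__(self, conn):
        self._conn = conn
        self._lock = threading.Lock()
        self._next_seq = {}  # session_id -> next seq to hand out
        with self._lock:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS portfolio_events ("
                "session_id TEXT, seq INTEGER, ts REAL, kind TEXT, payload TEXT, "
                "PRIMARY KEY (session_id, seq))"
            )

    def append(self, session_id, ts, kind, payload):
        with self._lock:
            seq = self._next_seq.get(session_id)
            if seq is None:
                # Cold start: resume from whatever is already persisted.
                seq = self._conn.execute(
                    "SELECT COALESCE(MAX(seq) + 1, 0) FROM portfolio_events "
                    "WHERE session_id = ?",
                    (session_id,),
                ).fetchone()[0]
            self._conn.execute(
                "INSERT INTO portfolio_events VALUES (?, ?, ?, ?, ?)",
                (session_id, seq, ts, kind, json.dumps(payload)),
            )
            self._conn.commit()
            self._next_seq[session_id] = seq + 1
            return seq


conn = sqlite3.connect(":memory:", check_same_thread=False)
writer = EventLogWriterSketch(conn)


def worker(n):
    for i in range(25):
        writer.append("s1", float(i), "fill", {"worker": n, "i": i})


threads = [threading.Thread(target=worker, args=(n,)) for n in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

seqs = [
    row[0]
    for row in conn.execute(
        "SELECT seq FROM portfolio_events WHERE session_id = 's1' ORDER BY seq"
    )
]
# A second writer on the same store simulates a process restart.
restarted_seq = EventLogWriterSketch(conn).append("s1", 99.0, "funding", {})
```

The primary key only guarantees uniqueness; the dense, gap-free sequence comes from assigning seq inside the same lock that performs the insert.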

Tests:
- tests/test_event_log.py — 15 cases covering:
    * Schema idempotency (table created, double-init no-op)
    * Seq starts at 0 per session, monotonic within, independent
      across sessions
    * Payload JSON round-trips with nested dicts/lists
    * append_many assigns consecutive seqs and continues from
      individual appends
    * Cross-writer seq recovery (process-restart simulation)
    * read_all orders by seq (not ts), read_until filters by ts
    * Thread safety: 8 threads × 25 events each → dense 0..199
      sequence with no duplicates or gaps
    * PortfolioEvent dataclass round-trip

ruff clean.

WAVE_STATUS: D-6.4-replay → 🟡 (foundation; snapshot + replay
deferred). Wave 3 progress so far: D-6.1-unified foundation +
D-6.4-replay foundation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sohan-shingade and others added 26 commits April 25, 2026 00:38
Slice 2 of 5 for D-6.4-replay. Builds on the event-log foundation
(slice 1) by adding the fold + replay functions that turn an event
stream into a `BookState` at any target_ts.

flint/portfolio/replay.py:
- `BookState` dataclass owns: cash, initial_capital, total_fees,
  total_tx_costs, total_funding, total_borrow_paid, positions dict
  (per (venue, market)), realized_pnl, and four event counters
  (fill / liquidation / order_submit / order_cancel).
- `BookState.equity_at(prices)` computes equity given a mark-price
  map. Positions without a price contribute 0 unrealized PnL —
  callers feed prices for the markets they care about.
- `fold(events, initial_capital) → BookState` is pure Python over
  any iterable. Used by tests and the future snapshot compactor
  without needing a store.
- `replay(store, session_id, target_ts, initial_capital)` reads
  `EventLogReader.read_until(target_ts)` and folds. Slice 3 will
  accept an optional `from_snapshot=BookState` for compaction.

Fill semantics match `BacktestContext._apply_fill`'s post-migration
code one-for-one:
- Open    : new (venue, market) entry
- DCA     : same-side adds → average entry price
- Partial : opposite-side smaller → realize size×Δprice, shrink
- Full    : opposite-side equal → realize, drop
- Flip    : opposite-side larger → realize, drop, open remainder
            on new side at fill price

Funding / liquidation / borrow folders mirror the engine's cash-debit
semantics. Unknown event kinds are ignored — forward compat for
schema growth (existing logs replay against newer code without a
migration).
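
The five fill cases and the ignore-unknown-kinds rule can be sketched as a small fold. This is a simplified illustration under stated assumptions: `BookSketch` and `apply_fill` are hypothetical names, positions are stored as signed size plus average entry, sizes are strictly positive, and cash, fees, and tx costs are left out.

```python
from dataclasses import dataclass, field


@dataclass
class BookSketch:
    realized_pnl: float = 0.0
    # (venue, market) -> (signed size, average entry price)
    positions: dict = field(default_factory=dict)


def apply_fill(book, venue, market, side, size, price):
    key = (venue, market)
    delta = size if side == "buy" else -size
    held, entry = book.positions.get(key, (0.0, 0.0))

    if held == 0 or (held > 0) == (delta > 0):
        # Open or DCA: size-weighted average entry price.
        total = held + delta
        avg = (abs(held) * entry + abs(delta) * price) / abs(total)
        book.positions[key] = (total, avg)
        return

    # Opposite side: realize PnL on the closed quantity.
    closed = min(abs(held), abs(delta))
    direction = 1.0 if held > 0 else -1.0
    book.realized_pnl += closed * (price - entry) * direction
    remainder = held + delta
    if remainder == 0:
        del book.positions[key]  # full close
    elif (remainder > 0) == (held > 0):
        book.positions[key] = (remainder, entry)  # partial close
    else:
        book.positions[key] = (remainder, price)  # flip at fill price


def fold(events):
    book = BookSketch()
    for ev in events:
        if ev["kind"] == "fill":
            apply_fill(book, **ev["payload"])
        # Any other kind is skipped here, mirroring the forward-compat rule.
    return book


def fill(side, size, price):
    payload = {"venue": "hl", "market": "BTC", "side": side, "size": size, "price": price}
    return {"kind": "fill", "payload": payload}


book = fold([
    fill("buy", 2.0, 100.0),   # open
    fill("buy", 2.0, 110.0),   # DCA, average entry becomes 105
    fill("sell", 1.0, 115.0),  # partial close, realizes +10
    {"kind": "rebate", "payload": {}},  # unknown kind, ignored
    fill("sell", 5.0, 100.0),  # flip: closes 3 at a loss of 15, opens short 2
])
```

Because `fold` is a pure function of the event stream, two replays of the same log always produce the same book, which is the property the determinism test relies on.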

Tests:
- tests/test_replay.py — 17 cases:
    * Empty fold
    * Fill: open / DCA / partial / full / flip / short-PnL-sign
    * Funding debit (positive + negative payments)
    * Liquidation drops position + books loss + penalty fee
    * Borrow cost debits cash + counter
    * Order submit/cancel counter increments
    * Unknown kind silently ignored (forward compat)
    * `equity_at(prices)` includes unrealized PnL; missing markets
      contribute 0
    * Storage-backed `replay()`: filters by ts, isolates by
      session_id, deterministic across two calls
    * Full-lifecycle scenario: open → funding → borrow → close →
      verify final cash matches expected ledger
- 32/32 between event_log + replay tests. ruff clean.

WAVE_STATUS: D-6.4-replay slice 2 noted. Snapshot compaction
(slice 3), engine writer hooks (slice 4), time-travel UI (slice 5)
remain. Wave 5 progress: 2/5 slices.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ice 3)

Slice 3 of 5 for D-6.4-replay. Long-running sessions accumulate
millions of events; replaying from seq=0 every time would be O(N).
This slice caches periodic full-state snapshots so replay can
fast-forward past the early events.

flint/portfolio/snapshots.py:
- New `portfolio_snapshots(session_id, seq, ts, payload)` table.
  Composite PK so the compactor can `INSERT OR REPLACE` idempotently
  on re-runs. Index on (session_id, ts) for `latest_before`.
- `_state_to_json` / `_state_from_json` serialize the full
  BookState including positions. The (venue, market) tuple keys
  flatten to "venue|market" strings for JSON compat.
- `SnapshotStore { write(session_id, seq, ts, state),
  latest(session_id), latest_before(session_id, target_ts),
  count(session_id) }` — thread-safe via a lock shared with all
  DB-touching ops.
- `should_compact(events_since_last, every_n_events=10_000)`
  predicate that engine writer hooks (slice 4) will call.
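
A small sketch of the tuple-key flattening and the compaction predicate. The function names mirror the ones above without underscores, but the bodies are assumptions: the state is a plain dict here rather than a `BookState`, and `>=` as the threshold comparison is a guess.

```python
import json


def state_to_json(state):
    flat = dict(state)
    # JSON object keys must be strings, so (venue, market) becomes "venue|market".
    flat["positions"] = {
        f"{venue}|{market}": pos for (venue, market), pos in state["positions"].items()
    }
    return json.dumps(flat, sort_keys=True)


def state_from_json(text):
    state = json.loads(text)
    state["positions"] = {
        tuple(key.split("|", 1)): pos for key, pos in state["positions"].items()
    }
    return state


def should_compact(events_since_last, every_n_events=10_000):
    # Assumed semantics: compact once the threshold is reached.
    return events_since_last >= every_n_events


state = {
    "cash": 990.5,
    "realized_pnl": -9.5,
    "positions": {
        ("hyperliquid", "BTC-PERP"): {"size": 1.5, "entry": 100.0},
        ("jupiter", "SOL-PERP"): {"size": -20.0, "entry": 150.25},
    },
}
restored = state_from_json(state_to_json(state))
```

Splitting on the first `|` only round-trips cleanly when venue names never contain that character.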

flint/portfolio/event_log.py:
- `EventLogReader.read_after_seq(session_id, after_seq, target_ts)`
  fetches only events with `seq > after_seq AND ts <= target_ts` —
  the tail past a snapshot.
- Reader constructor now bootstraps the events table idempotently
  via `CREATE TABLE IF NOT EXISTS`, so replay over a fresh DB doesn't
  fail with `CatalogException` when the writer hasn't run yet.

flint/portfolio/replay.py:
- `fold(events, initial_capital, seed=None)` — when `seed` is given,
  fold continues from that state (fast-forward path) and the
  `initial_capital` arg is carried by the seed.
- `replay(..., use_snapshot=True)` (default): query
  `SnapshotStore.latest_before(target_ts)`; on hit, fold only the
  tail starting from the snapshot. On miss, falls through to
  read-from-zero. `use_snapshot=False` forces full replay (used by
  the parity test).
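
The snapshot fast-forward and its parity with full replay can be shown on a toy state. In this sketch the book is reduced to a single integer cash balance and events are plain dicts with a `delta`; those shapes, and the in-memory lists standing in for the event and snapshot stores, are assumptions for illustration.

```python
def fold(events, initial_capital, seed=None):
    # With a seed, folding continues from the snapshot instead of from zero.
    cash = initial_capital if seed is None else seed
    for ev in events:
        cash += ev["delta"]
    return cash


def latest_before(snapshots, target_ts):
    eligible = [s for s in snapshots if s["ts"] <= target_ts]
    return max(eligible, key=lambda s: s["seq"]) if eligible else None


def replay(events, snapshots, target_ts, initial_capital, use_snapshot=True):
    snap = latest_before(snapshots, target_ts) if use_snapshot else None
    after_seq = snap["seq"] if snap else -1
    # Only the tail past the snapshot (or everything, on a miss) is folded.
    tail = [e for e in events if e["seq"] > after_seq and e["ts"] <= target_ts]
    return fold(tail, initial_capital, seed=snap["cash"] if snap else None)


deltas = [5, -2, 7, -1, 3, 4]
events = [{"seq": i, "ts": 10 * (i + 1), "delta": d} for i, d in enumerate(deltas)]
snapshots = [{"seq": 2, "ts": 30, "cash": fold(events[:3], 1000)}]

fast = replay(events, snapshots, target_ts=50, initial_capital=1000)
full = replay(events, snapshots, target_ts=50, initial_capital=1000, use_snapshot=False)
early = replay(events, snapshots, target_ts=20, initial_capital=1000)
```

Parity holds here because event timestamps are non-decreasing in seq; if an event with a later ts could carry a lower seq than the snapshot, the snapshot would already include state that the target_ts filter should exclude.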

Tests:
- tests/test_snapshots.py — 15 cases:
    * BookState ↔ JSON round-trip preserves every field, including
      multiple positions across venues.
    * Empty BookState round-trips (no positions).
    * Schema idempotent (CREATE TABLE IF NOT EXISTS).
    * write/latest/count semantics.
    * Per-session isolation.
    * Upsert overwrites same (session_id, seq).
    * `latest_before` returns the most recent snapshot at-or-before
      the target_ts; None when no qualifying snapshot exists.
    * **Load-bearing**: replay(use_snapshot=True) and
      replay(use_snapshot=False) produce byte-identical state on the
      same target_ts.
    * Replay falls back to read-from-zero when target_ts is before
      every available snapshot.
    * Replay over an unknown session returns the initial state
      (table-bootstrap path).
    * fold(seed=...) carries the seed forward; mutates in place.
    * `should_compact` threshold predicate (default 10k + custom).
- 47/47 across event_log + replay + snapshots. ruff clean.

WAVE_STATUS: D-6.4-replay slices 1+2+3 shipped. Engine writer hooks
(slice 4) + time-travel UI (slice 5) remain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Slice 4 of 5 for D-6.4-replay. Wires the BacktestContext to emit
events to the EventLogWriter so replay can reproduce final state
byte-for-byte from a real backtest.

Approach: opt-in. The two new constructor kwargs (`event_log_writer`,
`event_session_id`) default to None; when either is unset, the
`_emit(...)` helper short-circuits and the legacy path pays zero
overhead. When both are set, every state-mutating operation appends
one event to the log.

Hooks emit on:
- order.submit: market_order / limit_order / stop_order /
  take_profit_order (all four order entrypoints, after the
  orchestrator gauntlet passes)
- order.cancel: cancel(oid) on success, cancel_all() when n > 0
- fill: every `_apply_fill` invocation, emitted *before* position
  mutation so replay's fold sees opens/closes in the same order
- funding: every per-position `apply_funding` payment
- liquidation: every margin-engine force-close (after the loss is
  booked)
- borrow: every `_realize_jupiter_borrow_cost` debit

Event payloads carry enough state for replay to reconstruct the
book exactly (market, venue, side, size, price, fee, tx_cost on
fills; payment + rate on funding; loss + penalty on liquidation;
cum_entry + cum_exit on borrow). The `ts` field uses the *event*
time, not the write time — fills carry `fill.ts`, funding carries
`funding_rate.ts`, etc., so `replay(target_ts=...)` filters correctly
across cross-bar events.
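
The opt-in short-circuit could look roughly like this. `ContextSketch`, `ListWriter`, and `apply_fill` are hypothetical stand-ins for the real context and writer; only the two kwarg names and the event-time rule come from the description above.

```python
class ListWriter:
    """Hypothetical writer that records appended events in memory."""

    def __init__(self):
        self.rows = []

    def append(self, session_id, ts, kind, payload):
        self.rows.append((session_id, ts, kind, payload))


class ContextSketch:
    def __init__(self, event_log_writer=None, event_session_id=None):
        self._writer = event_log_writer
        self._session_id = event_session_id
        self.fills = []

    def _emit(self, kind, ts, payload):
        # Both pieces must be configured; otherwise this is a cheap no-op.
        if self._writer is None or self._session_id is None:
            return
        self._writer.append(self._session_id, ts, kind, payload)

    def apply_fill(self, fill):
        # Emit first, stamped with the fill's own time rather than wall-clock time.
        keys = ("market", "venue", "side", "size", "price")
        self._emit("fill", fill["ts"], {k: fill[k] for k in keys})
        self.fills.append(fill)


fill = {"ts": 1700000000, "market": "BTC", "venue": "hl",
        "side": "buy", "size": 1.0, "price": 100.0}

half_configured = ListWriter()
ContextSketch(event_log_writer=half_configured).apply_fill(fill)

configured = ListWriter()
ContextSketch(event_log_writer=configured, event_session_id="s1").apply_fill(fill)
```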

Tests:
- tests/test_event_log_engine_hooks.py — 12 cases:
    * Legacy ctx (no writer) emits zero events
    * Writer without session_id is also a no-op
    * Every order kind emits the right payload (type tag,
      price/trigger_price as appropriate)
    * cancel emits only when an order was actually cancelled
    * fill payload carries market/venue/side/size/price
    * **End-to-end parity (load-bearing)**: replay over the
      live-emitted log produces `replayed.cash ==
      ctx.account.cash` for open-then-close, 5-fill DCA, and
      partial-replay-at-intermediate-ts scenarios
    * Funding payment emits with payment field equal to
      size × oracle_price × rate
- 329/329 regression green across backtest, multi-venue, jupiter,
  funding, paper, portfolio, and the four replay-suite files.
  ruff clean.

WAVE_STATUS: D-6.4-replay slices 1-4 shipped (4/5). Time-travel UI
(slice 5) is the last piece — that's a UI-side concern, deferred to
when the user wants to expose the time-travel surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-on-reconnect

Wave 3 first slice for D-4.3-websocket. Server-side: per-session
endpoints, monotonic seq stamping, replay buffer, heartbeat ping.
Client-side: typed `useWebSocket<T>` hook with the same exponential
backoff as `useBackoffPoll` + heartbeat-stale detection.

flint/api/websocket.py:
- `ConnectionManager` extended with:
    * Per-channel monotonic `seq` counter; every broadcast envelope
      carries `{channel, seq, ...payload}`.
    * 500-deep `deque` ring buffer per channel (`_REPLAY_BUFFER_SIZE`)
      of `(seq, ts, payload_str)` for replay on reconnect.
    * `connect(websocket, channel, since_seq=N)` replays every
      buffered event with `seq > N` before streaming live →
      at-least-once delivery on flap-and-reconnect.
    * `ping(channel)` broadcasts `{type: "ping", ts}` for heartbeat.
    * `disconnect`, `channels()`, `buffered_count(channel)` for
      introspection.
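
A minimal sketch of the per-channel seq stamping and bounded replay buffer, using `collections.deque(maxlen=...)`. `ChannelBufferSketch`, `stamp`, and `replay_since` are illustrative names, and starting seq at 0 is an assumption; the real `ConnectionManager` also handles sockets, which are left out here.

```python
import json
from collections import defaultdict, deque

REPLAY_BUFFER_SIZE = 500


class ChannelBufferSketch:
    def __init__(self, size=REPLAY_BUFFER_SIZE):
        self._seq = defaultdict(int)  # channel -> next seq
        self._buffers = defaultdict(lambda: deque(maxlen=size))

    def stamp(self, channel, payload, ts):
        seq = self._seq[channel]
        self._seq[channel] = seq + 1
        envelope = json.dumps({"channel": channel, "seq": seq, **payload})
        # A deque with maxlen silently evicts the oldest entry when full.
        self._buffers[channel].append((seq, ts, envelope))
        return seq

    def replay_since(self, channel, since_seq):
        if since_seq is None:
            return []  # replay is opt-in
        return [env for seq, _ts, env in self._buffers[channel] if seq > since_seq]


buf = ChannelBufferSketch(size=3)
for i in range(5):
    buf.stamp("paper:abc", {"equity": 100 + i}, ts=i)

missed = [json.loads(e)["seq"] for e in buf.replay_since("paper:abc", 2)]
truncated = [json.loads(e)["seq"] for e in buf.replay_since("paper:abc", 0)]
other_channel_seq = buf.stamp("live:xyz", {"type": "fill"}, ts=9)
```

A client that reconnects after more than the buffer depth of missed events sees a gap in seq (as in `truncated`), which is a signal to fall back to a full state fetch.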

flint/api/main.py:
- New per-session WS routes:
    * `/ws/paper/{session_id}` → channel `paper:{id}`
    * `/ws/live/{session_id}` → channel `live:{id}`
- Shared `_ws_loop` helper pulls `?since=<seq>` from query params
  for replay opt-in. Legacy `/ws/{channel}` kept for back-compat.

ui/src/hooks/useWebSocket.ts:
- Typed `useWebSocket<T>(path, opts)` hook returning
  `{data, status, lastSeq, errorCount, lastError}`. Status enum:
  `connecting | open | closed | error`.
- Reconnect backoff schedule mirrors `useBackoffPoll`:
  1s → 2s → 5s → 10s → 30s on consecutive failures; resets on first
  successful open. Tracks `lastSeqRef` across reconnects so the next
  attempt sends `?since=<lastSeq>` for replay.
- Heartbeat: timer rearms on every incoming message; 30s of silence
  forces `ws.close(4000, 'stale')` which routes through the normal
  reconnect path — bounds the worst-case stale-data window when an
  intermediate proxy silently drops the socket.
- `ping` envelopes are consumed internally (rearm heartbeat) and
  not bubbled up to the consumer's `data`.
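
The hook itself is TypeScript; the reconnect schedule and the `?since=` resume parameter are rendered in Python here purely as a sketch. Holding at 30s after the fifth failure is an assumption (the schedule above does not say what follows 30s), and both function names are hypothetical.

```python
BACKOFF_SECONDS = (1, 2, 5, 10, 30)


def reconnect_delay(consecutive_failures):
    """Delay before the next attempt; 0 means connect immediately."""
    if consecutive_failures <= 0:
        return 0
    # Assumed: stay at the last step once the schedule is exhausted.
    idx = min(consecutive_failures, len(BACKOFF_SECONDS)) - 1
    return BACKOFF_SECONDS[idx]


def reconnect_path(path, last_seq):
    # Only ask for replay once at least one sequenced message has been seen.
    return path if last_seq is None else f"{path}?since={last_seq}"


delays = [reconnect_delay(n) for n in range(1, 8)]
first_connect = reconnect_path("/ws/paper/abc", None)
resumed = reconnect_path("/ws/paper/abc", 41)
```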

Tests:
- tests/test_websocket_replay.py — 10 cases:
    * Monotonic seq stamping (per-channel independence)
    * `all` channel receives every broadcast
    * Dead socket pruned on first failed send
    * `since_seq=N` replays seqs > N (and replays nothing when None)
    * Ring buffer caps at `_REPLAY_BUFFER_SIZE` even past N+1 events
    * `ping(channel)` broadcasts a heartbeat with `type: "ping"`
    * Connection introspection: count, channels(), disconnect

127/127 vitest still green; vite build clean; 10/10 backend WS
tests; ruff clean.

WAVE_STATUS: D-4.3-websocket → 🟡 (foundation). Remaining work
(deferred): engine-side broadcast hooks (paper/live engines emit
equity/fill ticks to their channels) + migration of
useLiveMonitor/usePaperPortfolio/useSessionStatus from polling to WS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
D-4.3-websocket slice 2. The foundation slice landed the endpoints,
manager, and `useWebSocket<T>` hook. This slice wires the paper
engine to actually emit ticks so subscribed clients see live equity
movement instead of polling.

flint/paper/engine.py:
- `PaperTradingEngine` gains a `ws_manager` attribute (defaults to
  None — unit tests + `flint backtest` continue to construct the
  engine without a manager and pay zero overhead).
- `_run_live_session` per-bar loop, after the equity snapshot is
  built, broadcasts `{type: "tick", ts, equity, cash,
  unrealized_pnl, is_replay, total_trades}` to channel
  `paper:{session_id}`. Wrapped in try/except at debug log level so
  a flaky ws never tanks the engine's tick.

flint/api/main.py:
- Lifespan startup wires `paper_engine.ws_manager = ws_manager`
  after both are constructed. Comment notes the dependency direction
  so the next refactor doesn't accidentally invert it.

Tests:
- tests/test_paper_engine_ws_broadcast.py — 5 cases:
    * Engine default `ws_manager` is None
    * Engine accepts a manager assignment post-init
    * Broadcast envelope shape matches what the engine emits
      (channel, seq, type, ts, equity, total_trades)
    * Subscribers only receive their own session's ticks
    * Broken sockets don't propagate exceptions out of the manager
- 47/47 regression green across paper/paper-funding/paper-multi-venue
  + the new ws-broadcast file. ruff clean.

WAVE_STATUS: D-4.3-websocket → still 🟡 (paper engine broadcast +
foundation shipped; live engine broadcast + hook migration +
PaperTrading.tsx panel binding remain).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Continues the D-4.3-websocket buildout. After slice 2 added per-bar
equity ticks, this slice adds the trade + fill streams so subscribers
see *every* state change as it happens, not just the next equity
snapshot.

flint/paper/engine.py:
- After persisting `new_trades` from the broker, the engine now
  broadcasts each one to `paper:{session_id}` as
  `{type: "trade", ...trade_dict}`. Reuses the same try/except
  swallow as the tick broadcast — a flaky ws never tanks the loop.

flint/execution/live_base.py:
- `LiveExecutionContext.__init__` gains a public `ws_manager`
  attribute (defaults to None — every concrete subclass and every
  unit test continues to construct without one).
- `_handle_fill(order_id, fill)` broadcasts to `live:{session_id}`
  with `{type: "fill", order_id, market, venue, side, price, size,
  fee, ts}` when both `ws_manager` and `_session_id` are set. Uses
  `asyncio.ensure_future` so the fill flow stays synchronous from
  the OrderTracker's perspective; broadcast failures are swallowed
  at debug level so persistence/risk-guard paths can't be blocked
  by network hiccups.

Tests:
- tests/test_live_context_ws_broadcast.py — 4 cases on a stub
  LiveExecutionContext subclass:
    * Default `ws_manager` is None
    * `_handle_fill` broadcasts when manager + session_id set;
      envelope shape verified (channel, type, market, side, price)
    * Broken broadcast doesn't propagate — fill flow continues
    * Empty session_id skips the broadcast even if manager is set
- 33/33 pass across the WS-related test files; 27/27 pass on the
  separate live-context regression files (test_multi_venue_live,
  test_live_context_data). ruff clean.

WAVE_STATUS: D-4.3-websocket → still 🟡 (foundation + paper tick +
paper trade + live fill all live; hook migration to drop polling +
UI panel binding remain).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires SessionDetail to the per-session WebSocket so the equity /
unrealized PnL / trade count update on every tick instead of every
2s poll. Augments rather than replaces polling — the polled
SessionStatus still drives the full state (margin, fees, status
phase, equity history endpoint), and the WS just overlays the
fast-moving fields.

ui/src/pages/PaperTrading.tsx:
- New `useWebSocket<PaperWsTick>(/ws/paper/${sessionId})` subscription.
- `wsTick = ws.data when type === 'tick'`. The metrics grid now
  reads `liveEquity = wsTick?.equity ?? status.equity`,
  `unrealizedPnl = wsTick?.unrealized_pnl ?? status.unrealized_pnl`,
  and `total_trades = wsTick?.total_trades ?? status.total_trades`,
  so the cells refresh as fast as the engine ticks.
- Header gets a `WS LIVE` / `WS CONNECTING` / `WS OFFLINE` indicator
  driven by `ws.status` — the user can see at a glance whether the
  socket is healthy without inspecting the network panel.

ui/src/test/hooks/useWebSocket.test.ts (new):
- 6 vitest cases on a `MockSocket` stand-in:
    * Connects on mount when enabled
    * Skips when `enabled=false`
    * Parses incoming JSON, exposes `data`, sets `status` to 'open',
      tracks `lastSeq`
    * Drops `{type: ping}` envelopes (no `data` mutation) but still
      updates lastSeq when a real tick follows
    * Appends `?since=<lastSeq>` to the URL on reconnect after an
      abnormal close
    * Closes the socket on unmount

133/133 vitest, vite build clean, ruff clean.

WAVE_STATUS: D-4.3-websocket → still 🟡 (paper UI bound; hook
migration to drop polling and Live page binding remain).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LiveMonitor.tsx now opens a WebSocket to `/ws/live/{sessionId}` per
selected session. WS-emitted fills merge into the polled fills list
(deduped by order_id + ts) so the fills tape updates as soon as
LiveExecutionContext._handle_fill broadcasts, instead of waiting on
the next 2s poll.

ui/src/pages/LiveMonitor.tsx:
- New `useWebSocket<LiveWsFill>` subscription gated on
  `enabled: !!activeId` so the hook idles when no session is
  selected.
- `wsFills` accumulates fills from incoming `{type: "fill"}`
  envelopes; reset to empty when the user switches sessions.
- Display fills = polled-fills ∪ ws-fills, deduped by composite
  `order_id|ts` key, sorted by ts. On a key collision the polled copy
  wins and WS-only fills are appended; either source may deliver a
  fill first.
- Header gets the same `WS LIVE` / `WS CONNECTING` / `WS OFFLINE`
  indicator that PaperTrading.tsx grew in slice 3, gated on
  `activeId` (no indicator when no session yet).
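
The merge rule is implemented in the TypeScript page; the same logic is sketched in Python below to show the dedupe key and precedence. `merge_fills` and the `src` field are hypothetical and exist only for the example.

```python
def merge_fills(polled, ws):
    merged = {}
    # Polled fills are visited first, so they win when both sources carry a key.
    for fill in [*polled, *ws]:
        key = f"{fill['order_id']}|{fill['ts']}"
        merged.setdefault(key, fill)
    return sorted(merged.values(), key=lambda f: f["ts"])


polled = [
    {"order_id": "a", "ts": 1, "src": "poll"},
    {"order_id": "b", "ts": 3, "src": "poll"},
]
ws = [
    {"order_id": "b", "ts": 3, "src": "ws"},  # duplicate of a polled fill
    {"order_id": "c", "ts": 2, "src": "ws"},  # not yet seen by polling
]
tape = [(f["order_id"], f["src"]) for f in merge_fills(polled, ws)]
```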

Tests:
- No new tests this commit — the existing useWebSocket vitest
  covers the hook surface, and LiveMonitor is a thin consumer.
- 133/133 vitest still green; vite build + ruff clean.

WAVE_STATUS: D-4.3-websocket → still 🟡 (both pages bound; only
remaining work is dropping the polling hooks entirely once the WS
streams are confirmed solid in production — that's a confidence
move, not a code change).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Refines D-6.1-unified attribution. Foundation slice split closed-trade
PnL evenly across strategies that touched the same market — wrong
when one strategy opened and another closed. This commit threads the
closing fill's order_id (which the engine tags with the strategy
name) into the closed-trade dict so PnL goes to the actual
authoring strategy.

flint/execution/backtest_context.py:
- Both close paths in `_apply_fill` (full close + flip, partial
  close) now write `exit_order_id: fill.order_id` into the
  closed-trade record. Liquidations don't get an order_id (no
  triggering fill) so they fall through to the even-split path.

flint/portfolio/shared_engine.py:
- `SharedCapitalPortfolioEngine.run()` per-trade attribution checks
  `trade["exit_order_id"]` for a `strategy_name:` prefix; when
  found, the entire PnL goes to that strategy. Even-split fallback
  preserved for liquidations and untagged trades.
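
A sketch of the attribution rule: the closing order's strategy prefix takes the whole PnL, with an even split as the fallback. `attribute_pnl` and the `strategies_by_market` map are assumptions about how the inputs might be shaped, not the engine's actual signature.

```python
def attribute_pnl(trades, strategies_by_market):
    known = {name for names in strategies_by_market.values() for name in names}
    pnl = {name: 0.0 for name in known}
    for trade in trades:
        owner, sep, _rest = (trade.get("exit_order_id") or "").partition(":")
        if sep and owner in known:
            # Tagged close: the closing strategy takes the whole PnL.
            pnl[owner] += trade["pnl"]
            continue
        # Liquidations and untagged trades: even split across the market.
        touched = strategies_by_market.get(trade["market"], [])
        for name in touched:
            pnl[name] += trade["pnl"] / len(touched)
    return pnl


trades = [
    {"market": "BTC", "pnl": 12.0, "exit_order_id": "carry:7"},
    {"market": "BTC", "pnl": -4.0},  # e.g. a liquidation with no order id
]
shares = attribute_pnl(trades, {"BTC": ["trend", "carry"]})
```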

Tests:
- tests/test_shared_capital_portfolio.py — new
  `TestPnlAttribution::test_pnl_attributed_to_closing_strategy`:
  one strategy opens a long, another closes it by selling; verify the closer's
  PnL share exceeds 50% of total (foundation slice would have split
  equally, ~50% each, so this test exercises the new path).
- 9/9 tests in the file pass; ruff clean.

WAVE_STATUS: D-6.1-unified attribution refined. Remaining D-6.1
work (per-strategy capital caps via tagged sub-buckets, dollar-
neutral rebalancing) still deferred to a follow-on PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two test fixes uncovered by the wide pytest sweep at the end of this
session.

tests/test_mcp_server.py:
- TestRunBacktest.test_returns_valid_json was patching
  `flint.api.routes.backtest._build_strategy` and `BacktestEngine`,
  but the D-4.7-full MCP refactor routes the tool through
  `flint.services.backtest.run_backtest_sync` directly. Updated to
  mock at the service boundary with a fake tearsheet-shaped dict
  (metrics + winning_trades + losing_trades + total_fees +
  engine_used). Also patches `_get_store` so the auto-fetch path
  short-circuits on a "data already local" branch.
- test_unknown_strategy switches to mocking the service to raise
  ValueError, which the MCP tool catches and surfaces as `error`.

tests/test_portfolio.py:
- TestPortfolioEngine.test_two_strategies asserted a `combined_equity`
  length of exactly 60. The D-1.1.b force-close fix can append a
  terminal equity point past the last candle when a strategy had an
  open position at engine exit, so the combined curve length is in
  {N, N+1}. Relaxed the assertion to accept either.

Sweep status: pre-existing failures from missing optional deps
(`eth_account` for hyperliquid_live, `solders` for wallet) remain —
those are env issues, not code regressions. ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the Wave 3 working session. Pytest sweep over the full suite
(skipping the four files with missing optional deps: ccxt,
eth_account, solders) reports:

    2038 passed · 7 skipped · 0 failed in 5m39s

Up from 1861 at the start of the session — net +177 tests landed
across the new managers, replay, snapshots, websocket layer,
shared-capital portfolio engine, and orchestrator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reflects 28 commits on the restructure branch. All six phases now
have concrete shipped state recorded next to them, and the
"Recently done" section is repaired to call out the actual landed
work (Waves 1, 2, 3 portfolio/UX, Wave 5 replay) instead of the
old phase-7 + Hyperliquid-funding bullets.

Adds a pointer to WAVE_STATUS.md for wave-by-wave detail; ROADMAP
keeps the bird's-eye view, WAVE_STATUS owns the per-item state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
flint/api/routes/replay.py — three GETs over the portfolio event log
+ snapshot store + replay primitive:

  GET /api/v1/replay/{session_id}/events?since=<seq>&limit=<n>
      Page through events in seq order. limit defaults to 200,
      capped at 5000. has_more flag for client-side pagination.
  GET /api/v1/replay/{session_id}/state?target_ts=<t>&initial_capital=<c>&use_snapshot=<bool>
      Run `replay()` and return BookState as JSON. Positions
      flatten to "venue|market" keys.
  GET /api/v1/replay/{session_id}/summary
      Cheap polling target: event_count, latest_seq, snapshot_count.

Registered under `/api/v1/replay`. Read-only — writes happen in the
engine via the slice-4 writer hooks.
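
The events endpoint's paging contract might be sketched as a pure function like this. Treating `since` as exclusive (events with `seq > since`) is inferred from the `since=-1` default on the MCP tool; `page_events` and the response keys other than `has_more` are assumptions.

```python
DEFAULT_LIMIT = 200
MAX_LIMIT = 5000


def page_events(events, since=-1, limit=DEFAULT_LIMIT):
    # Clamp the requested page size to bound the response.
    limit = max(1, min(limit, MAX_LIMIT))
    newer = [e for e in events if e["seq"] > since]  # assumes seq-sorted input
    page = newer[:limit]
    return {"events": page, "has_more": len(newer) > len(page)}


log = [{"seq": i, "kind": "fill"} for i in range(10)]
first = page_events(log, since=-1, limit=4)
# The client resumes from the last seq it received.
rest = page_events(log, since=first["events"][-1]["seq"], limit=10)
```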

Tests:
- tests/test_replay_api.py — 10 cases:
    * Empty/populated events list, seq ordering, since pagination,
      limit + has_more, per-session isolation
    * State replay round-trip (open+close), partial intermediate-ts
      replay, unknown-session-returns-initial
    * Summary on empty + populated session (with snapshot)
- Test fixture swaps a fresh `FlintStore` onto `app.state` per test
  to bypass the FastAPI module-level app singleton's leftover state
  from earlier tests.

ruff clean.

WAVE_STATUS: D-6.4-replay slice 5 backend shipped. Frontend
time-travel page binding remains deferred to a later slice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
D-6.4-replay slice 5 MCP layer. After the REST endpoints landed,
this commit exposes the same surface to MCP-compatible clients
(Claude / Cursor / etc.) so AI workflows can drive time-travel
queries against any session.

flint/mcp_server.py:
- `replay_summary(session_id)` — quick metadata: event count,
  latest seq, snapshot count.
- `replay_state(session_id, target_ts, initial_capital=10_000)` —
  fold the log up to target_ts and return cash, realized PnL,
  fees, funding, borrow, fill counts, positions dict.
- `list_replay_events(session_id, since=-1, limit=50)` — page
  through events in seq order. Caps limit at 5000 to bound the
  response size.

All three call into the same `EventLogReader` + `replay()` +
`SnapshotStore` services the REST endpoints use; no duplication.

Tests:
- tests/test_mcp_replay_tools.py — 7 cases:
    * Each tool on an empty session returns the right empty shape
    * `replay_summary` reflects event count + latest seq after writes
    * `replay_state` round-trips an open+close to realized_pnl == 10
    * `list_replay_events` honors since + limit (with has_more flag)
- 47/47 across all three MCP test files (test_mcp_server,
  test_mcp_standalone, test_mcp_replay_tools). ruff clean.

Fixture trick: monkeypatches `flint.mcp_server._store` directly
with a fresh `FlintStore(tmp_path)` so the cached singleton lookup
in `_get_store()` returns the per-test DB instead of trying to
reconstruct via `load_config()`.

WAVE_STATUS: D-6.4-replay slices 1-5 (backend + REST + MCP) all
shipped. Only frontend time-travel page binding remains deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes D-6.4-replay end-to-end. Backend + REST + MCP shipped in
prior commits; this slice adds the UI surface.

ui/src/hooks/useReplay.ts:
- `useReplaySummary(sessionId)` polls `/api/v1/replay/{id}/summary`
  every 5s for event_count / latest_seq / snapshot_count.
- `useReplayState(sessionId, targetTs, initialCapital)` fires once
  per (session, ts) change against `/api/v1/replay/{id}/state`,
  exposes `{data, error, loading}`. Replay isn't a stream — it's a
  deliberate query, so no auto-retry.

ui/src/pages/Replay.tsx:
- Session-id loader (text input + LOAD button) — backtest run_ids
  and paper session_ids both work.
- Summary cards (event count / latest seq / snapshot count).
- Target-ts scrubber with epoch-second input + initial-capital
  input. Defaults to "now" once a session lands so the page shows
  current state on first render.
- State cards: cash (gain/loss accent vs initial), realized PnL,
  fill count, liquidation count, fees, funding.
- Open-positions table (venue/market, side, size, entry).
- Empty-state placeholder when no session loaded.
- Inline error banner on either summary or state fetch failure.

ui/src/App.tsx:
- New `/replay` route + `REPLAY` nav item (key 0).

133/133 vitest still green; vite build clean; ruff clean.

WAVE_STATUS: D-6.4-replay → 🟢 (all 5 slices shipped end-to-end).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ts refresh

Reflects D-6.4-replay end-to-end shipping + D-4.3-websocket per-session
streams.

README.md:
- Paper Trading section: live PnL bullet adds "+ per-session WebSocket
  stream (`/ws/paper/{id}`)" so readers know polling is no longer the
  only path. New bullet on time-travel replay (event log + REST API +
  Replay UI page).
- MCP Server section: tool count 17 → 20; explicitly names the three
  new replay tools (`replay_summary`, `replay_state`,
  `list_replay_events`).
- Auto-counts block refreshed via `scripts/update_readme_counts.py`:
  23 strategies, 26 providers, 80 REST endpoints, 20 MCP tools, 2055
  tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brings the AI-aware context doc up to date with the work this session.
Future agents (and humans skimming) get a one-page map of:

- The seven managers BacktestContext composes (PositionManager,
  CashManager, FillRecorder, OrderQueue, FundingLedger, BorrowLedger,
  MarketDataFeed) and what each owns.
- The PortfolioMarginEngine pre-trade gauntlet.
- The event-sourcing modules (event_log, replay, snapshots) plus
  the REST + MCP + UI surfaces that wrap them.
- The flint/services/* layer and the rule that strategy templates
  live in services/strategies.py only (single source of truth).

Two new rules in the Rules section:
- New BacktestContext mutations route through the seven managers.
- New strategy templates land in services/strategies.py only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The session landed enough new public surface (event log + replay +
WebSocket streams + portfolio orchestrator + manager-decomposed
BacktestContext + service layer + per-session WS routes) that a
patch bump understates the changeset. Goes to 1.4.0.

pyproject.toml: 1.3.1 → 1.4.0. The capabilities endpoint reads
this directly so no other version pins need updating.

docs/concepts/architecture.md:
- New "BacktestContext composes seven managers" subsection mapping
  each manager (PositionManager, CashManager, FillRecorder,
  OrderQueue, FundingLedger, BorrowLedger, MarketDataFeed) to what
  it owns. Notes the PortfolioMarginEngine pre-trade gauntlet and
  the component-tagged rejection log line.
- New "Event sourcing + replay" subsection covering the
  portfolio_events table, the fold/replay primitive, snapshot
  fast-forward, and the three surfaces (REST + MCP + UI page) that
  expose it. Calls out the load-bearing parity test that pins
  byte-for-byte cash equality.

UI docs page regenerated via `scripts/build_docs.py` so the
architecture section in the browser stays in sync.

Verified `_get_version()` returns 1.4.0; ruff clean; UI vite build
clean; replay-related test files (event log + replay + snapshots +
engine hooks + REST + MCP) all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Full pull-request narrative covering all 6 phases, the 7-manager
BacktestContext decomposition, D-4.3-websocket end-to-end,
D-6.4-replay all 5 slices, the Rust ports (TxCostModel +
OrderbookFiller), and the service-layer extraction.

Lists migration notes (no breaking API changes; new ctor kwargs are
opt-in) and the follow-on items still blocked on testnet secrets.

Stays out of git history once the PR merges — this is a
human-readable summary for the merge review, not a runtime artifact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
D-6.4-replay slice 6. Closes the snapshot loop. Before this commit,
SnapshotStore was manual-only: nothing in the engine wrote snapshots,
so callers had to invoke `SnapshotStore.write()` themselves. Without
snapshots, large sessions force a full fold-from-seq=0 on every
replay query (O(N events)).

Adds optional `snapshot_store` + `snapshot_every` (default 10_000)
ctor kwargs to BacktestContext. When `snapshot_store` is wired:
  * `_emit()` increments a `_events_since_snapshot` counter on every
    appended event.
  * When the counter crosses `snapshot_every`, `_compact_snapshot()`
    folds the log into a fresh BookState. When a prior snapshot
    exists it fast-forwards from that snapshot, so each compaction
    is O(events_since_last_snapshot) rather than O(N events).
  * Counter resets to 0 after each compaction.
  * Failures are swallowed at debug level — engine keeps running.

Backward compatible: legacy callers without `snapshot_store` see
zero new behavior. The slice-3 SnapshotStore manual-write API stays
intact for callers that want explicit control.
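
A minimal sketch of the counter behaviour listed above, not the
engine code itself. `CompactingEmitter`, the `compact` callback, and
resetting the counter after a failed compaction are assumptions made
for illustration; only the `snapshot_every` name, the counter name,
and the default of 10_000 come from this commit.

```python
import logging

log = logging.getLogger("compaction_sketch")


class CompactingEmitter:
    """Hypothetical stand-in for the counter logic around `_emit()`."""

    def __init__(self, compact=None, snapshot_every=10_000):
        self.events = []
        self._compact = compact              # assumed callback: compact(latest_seq)
        self._snapshot_every = snapshot_every
        self._events_since_snapshot = 0

    def emit(self, event):
        self.events.append(event)
        if self._compact is None:
            return                           # no store wired: legacy path, no counting
        self._events_since_snapshot += 1
        if self._events_since_snapshot >= self._snapshot_every:
            try:
                self._compact(len(self.events) - 1)
            except Exception:
                # Failures are logged at debug level and never stop the caller.
                log.debug("snapshot compaction failed", exc_info=True)
            finally:
                # Assumption: the counter resets even when compaction fails.
                self._events_since_snapshot = 0
```

With `snapshot_every=3`, seven emitted events would trigger the
callback after the third and sixth event and leave the counter at 1.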

Tests:
- tests/test_auto_compaction.py — 5 cases:
    * No snapshot when store unset (zero-overhead path)
    * Snapshot after exactly N events at the threshold
    * Multiple snapshots at each N-event boundary
    * Counter resets after compaction (sub-threshold burst doesn't
      double-fire)
    * **Load-bearing**: replay with snapshot fast-forward matches
      replay-from-zero on the same target_ts after compaction —
      compaction never produces a divergent snapshot

5/5 pass; ruff clean.
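
The load-bearing case can be illustrated with a toy fold. This is a
sketch under stated assumptions: the `BookState` fields, the event
dict shape (`seq`, `ts`, `kind`, `cash_delta`), and the `replay`
signature are hypothetical and only mirror the idea that resuming
from a snapshot must give the same state as folding from seq 0.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BookState:
    """Toy state; the real BookState carries more fields."""
    seq: int = -1
    cash: float = 0.0
    fill_count: int = 0


def apply(state, event):
    # Only fills move cash in this toy model; other kinds just advance seq.
    if event["kind"] == "fill":
        return replace(
            state,
            seq=event["seq"],
            cash=state.cash + event["cash_delta"],
            fill_count=state.fill_count + 1,
        )
    return replace(state, seq=event["seq"])


def replay(events, target_ts, initial_capital, snapshot=None):
    """Fold events (sorted by seq and ts) up to target_ts.

    The caller is assumed to pass only a snapshot taken at or before target_ts.
    """
    state = snapshot if snapshot is not None else BookState(cash=initial_capital)
    for event in events:
        if event["seq"] <= state.seq:
            continue                  # already folded into the snapshot
        if event["ts"] > target_ts:
            break
        state = apply(state, event)
    return state
```

Because the fast-forward path applies the same events in the same
order, the two results compare equal field by field.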

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
D-6.4-replay E2E. Stronger production-confidence signal than the
synthetic-event tests in test_event_log + test_replay: drives a
real `MACrossoverStrategy` against a sine-wave-with-drift price
series for 300 bars, captures the event log + auto-compacted
snapshots in flight, then asserts replay reproduces the live
context's `cash` + `total_fees` byte-for-byte.

tests/test_replay_e2e_backtest.py — three classes:

- `TestEndToEndReplayMatchesLive::test_replay_reproduces_account_cash`
  Real strategy run → replay-from-zero → cash equality (tolerance 1e-9).

- `TestAutoCompactionInRealBacktest::test_compaction_during_backtest_preserves_correctness`
  Same scenario with `snapshot_every=20` so the auto-compactor
  fires repeatedly during the run. Replay-with-snapshot vs
  replay-without-snapshot must agree on cash, realized PnL, fill
  count, and every position's (side, size, entry_price).

- `TestReplayAtIntermediateTimestamps::test_intermediate_ts_replay_traces_equity_curve`
  Replay at 25% / 50% / 75% / 100% of the candle range; assert
  monotonic fill_count and finite cash.

Helper: `_drive_strategy(ctx, strategy, candles)` mimics the bar
loop BacktestEngine runs internally. A direct loop is necessary
because BacktestEngine.run() routes through Rust when capabilities
allow, which bypasses the supplied event-log-wired BacktestContext.
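
A rough sketch of what such a direct bar loop looks like, assuming a
strategy with an `on_bar(candle)` method. `sine_with_drift`,
`MACrossSketch`, and `drive_strategy` are illustrative names, not the
test helper or `MACrossoverStrategy`; the amplitude, period, drift,
and MA windows are made-up parameters.

```python
import math


def sine_with_drift(n_bars=300, base=100.0, amplitude=5.0, period=40, drift=0.02):
    """Synthetic closes: a sine wave riding on a slow upward drift."""
    return [
        {"ts": i, "close": base + drift * i + amplitude * math.sin(2 * math.pi * i / period)}
        for i in range(n_bars)
    ]


class MACrossSketch:
    """Toy moving-average crossover that records side flips only."""

    def __init__(self, fast=5, slow=20):
        self.fast, self.slow = fast, slow
        self.closes = []
        self.side = None
        self.signals = []            # (ts, "long" | "short")

    def on_bar(self, candle):
        self.closes.append(candle["close"])
        if len(self.closes) < self.slow:
            return                   # not enough history for the slow MA yet
        fast_ma = sum(self.closes[-self.fast:]) / self.fast
        slow_ma = sum(self.closes[-self.slow:]) / self.slow
        want = "long" if fast_ma > slow_ma else "short"
        if want != self.side:
            self.side = want
            self.signals.append((candle["ts"], want))


def drive_strategy(strategy, candles):
    # One call per bar, in order; nothing is routed through a compiled engine.
    for candle in candles:
        strategy.on_bar(candle)
```

In the real test the loop also feeds each candle into the
event-log-wired context, which is the part a Rust-routed run would
skip.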

3/3 pass; ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
D-6.4-replay UI slice 6. Promotes the bare ts-input to a real
time-travel UX.

ui/src/hooks/useReplay.ts:
- New `useReplayEvents(sessionId, since, limit)` wraps
  `/api/v1/replay/{id}/events`. One-shot fetch (re-fires on hook
  arg change) so the page can keep a 1000-event window in memory
  for the slider's range + the tail panel's render list.
- New `ReplayEvent` + `ReplayEventsPage` types.

ui/src/pages/Replay.tsx:
- Numeric ts-input replaced with an `<input type="range">` slider
  bounded to [first event ts, last event ts]. Background is the
  amber accent so the position-along-the-history is obvious at a
  glance.
- Range labels show start ts, end ts, and the "N events in window"
  count.
- Step controls: ← PREV EVENT / NEXT EVENT → walk one event ts at
  a time; ⏮ START / END ⏭ jump to the boundaries. Wired against the
  events array so steps land on actual event timestamps, not
  arbitrary epoch values.
- New "EVENT.TAIL" panel under the positions table: shows the most
  recent 50 events with `ts <= target_ts`, color-coded by kind
  (gain for fills, loss for liquidations, amber for order.*, ghost
  for everything else). Header shows "showing N of M folded" so
  the user knows exactly how much state went into the displayed
  BookState.
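
The step and tail logic can be sketched as follows. The page itself
is TypeScript; this is the same idea written in Python, with
hypothetical helper names and an assumed event shape (`ts` key,
events sorted by timestamp).

```python
from bisect import bisect_left, bisect_right


def prev_event_ts(event_ts, target_ts):
    """Largest event timestamp strictly before target_ts (clamped to the start)."""
    if not event_ts:
        return target_ts
    i = bisect_left(event_ts, target_ts)
    return event_ts[i - 1] if i > 0 else event_ts[0]


def next_event_ts(event_ts, target_ts):
    """Smallest event timestamp strictly after target_ts (clamped to the end)."""
    if not event_ts:
        return target_ts
    i = bisect_right(event_ts, target_ts)
    return event_ts[i] if i < len(event_ts) else event_ts[-1]


def event_tail(events, target_ts, limit=50):
    """Return (last `limit` events with ts <= target_ts, total folded count)."""
    folded = [e for e in events if e["ts"] <= target_ts]
    return folded[-limit:], len(folded)
```

Stepping over the sorted timestamp list means several events that
share one timestamp are crossed in a single step, and the
`(tail, folded)` pair maps onto the "showing N of M folded" header.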

133/133 vitest still green; vite build clean; ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the four-part Wave 5 wrap-up. After ports of TxCostModel and
OrderbookFiller landed in earlier commits, this slice ports the two
remaining hot-path ledgers — funding rates and Jupiter borrow rates —
to Rust with PyO3 bindings.

rust/src/engine/funding_ledger.rs:
- `FundingLedger` owns per-market and per-venue history. `add`,
  `latest`, `recent(market, lookback)`, `by_venue(market, lookback)`
  mirror the Python class one-for-one. 4 cargo tests.
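
For orientation, a pure-Python sketch of the shape being mirrored.
The method names and the `lookback=24` default come from this
commit; the return types (a rate or `None` from `latest`, lists of
`(ts, rate)` pairs elsewhere) and the class name are assumptions.

```python
from collections import defaultdict


class FundingLedgerSketch:
    """Illustrative per-market and per-venue funding history."""

    def __init__(self):
        self._by_market = defaultdict(list)                      # market -> [(ts, rate)]
        self._by_venue = defaultdict(lambda: defaultdict(list))  # market -> venue -> [(ts, rate)]

    def add(self, market, venue, ts, rate):
        self._by_market[market].append((ts, rate))
        self._by_venue[market][venue].append((ts, rate))

    def latest(self, market):
        history = self._by_market.get(market)
        return history[-1][1] if history else None

    def recent(self, market, lookback=24):
        if lookback <= 0:
            return []
        return list(self._by_market.get(market, [])[-lookback:])

    def by_venue(self, market, lookback=24):
        if lookback <= 0:
            return {}
        venues = self._by_venue.get(market, {})
        return {venue: list(hist[-lookback:]) for venue, hist in venues.items()}
```

Reads go through `.get()` so that querying an unknown market returns
an empty result without creating an entry.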

rust/src/engine/borrow_ledger.rs:
- `BorrowLedger` owns `_history` per market + per-trade
  `payments` ledger + `total_paid` counter. `record`, `record_payment`,
  `add_paid`, `latest`, `recent(market, lookback)`,
  `cumulative_at(market, ts)` (linear walk early-exits at first
  point past `ts`, matching Python). 3 cargo tests.
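
The early-exit walk in `cumulative_at` can be sketched like this. It
assumes each recorded point is a per-period rate stored in timestamp
order and that the cumulative value is their running sum; the real
ledger may define the quantity differently, and `BorrowLedgerSketch`
is a hypothetical name.

```python
from collections import defaultdict


class BorrowLedgerSketch:
    """Illustrative borrow-rate history with a linear cumulative lookup."""

    def __init__(self):
        self._history = defaultdict(list)    # market -> [(ts, rate)], ascending ts

    def record(self, market, ts, rate):
        self._history[market].append((ts, rate))

    def cumulative_at(self, market, ts):
        total = 0.0
        for point_ts, rate in self._history.get(market, []):
            if point_ts > ts:
                break                        # first point past ts: stop walking
            total += rate
        return total
```

The four query positions named in the parity test (before-all,
exact-match, mid-range, past-end) each exercise a different exit from
that loop.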

PyO3 bindings:
- `flint_core.FundingLedger` exposes `add(market, venue, ts, rate)` /
  `latest` / `recent(market, lookback=24)` / `by_venue(market, lookback=24)`.
- `flint_core.BorrowLedger` exposes `record` / `record_payment` /
  `add_paid` / `total_paid` (getter) / `latest` / `recent` /
  `cumulative_at` / `payments_count`.

Tests:
- tests/test_rust_ledger_parity.py — 7 cases pinning every method's
  output to 1e-9 against the canonical Python implementations:
    * Funding: latest, recent pairs, by_venue groupings, unknown-market
      empty fallback
    * Borrow: latest + recent, cumulative_at across the whole
      query-ts window (before-all, exact-match, mid-range, past-end),
      total_paid + payments-count round-trip

7/7 parity green; 7/7 cargo green; 91/91 across the whole replay
suite (event log + replay + snapshots + engine hooks + REST + MCP +
auto-compaction + E2E + ledger parity). ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…to-compaction + E2E + UI polish)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ers, UI polish

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wave 5 closeout records the four follow-on items after D-6.4-replay:
auto-compaction, E2E parity over real strategy, replay UI polish,
Rust ledger ports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sohan-shingade sohan-shingade merged commit ad687c2 into main Apr 25, 2026
@sohan-shingade sohan-shingade deleted the restructure branch April 25, 2026 19:14