test(cluster): F10 omega-bootstrap end-to-end integration test#109

Merged: TickTockBent merged 2 commits into main from test/90-omega-bootstrap-integration on Apr 27, 2026

Conversation

@TickTockBent
Owner

Summary

Closes #90. Adds the integration test that exercises the full omega bootstrap path — DNS resolution, signed-list verification, root self-recognition, cluster bootstrap, IsRoot() 403-gate, and peer convergence — in a single test invocation.

This is the test that would have caught F1/F2/F3/F11 the moment the spec landed. Every preflight finding was caught by humans during the burn-in because no test before this exercised the composition of trust + cluster + bootstrap + root-gate. The unit tests in internal/trust/ cover the resolver path in isolation; nothing pulled the whole stack together.

What's covered

| Test | What it locks in |
| --- | --- |
| `TestOmegaBootstrap_RootsConvergeAndNonRootJoins` | Happy path: two roots converge, non-root joins via roots, root status correctly classified per the signed list |
| `TestOmegaBootstrap_NonRootRejectsWith403` | Non-roots return 403 on /v1/bootstrap even when reachable; protects against rogue-node topology harvesting |
| `TestOmegaBootstrap_TamperedListRejected` | List signed by a different keypair fails verification |
| `TestOmegaBootstrap_ExpiredListRejected` | List with valid signature but past expiration is rejected |

Test design

  • Reuses the testNode harness from integration_test.go but installs a different /v1/bootstrap handler (makeOmegaBootstrapHandler) that mirrors the production IsRoot() gate. The shared harness skips the gate because it's used by tests where IsRoot is not a concept.
  • signTestList, buildResolver, and resolveAndApply helpers walk the same code paths cmd/repram/main.go uses, with injectable DNS via the stubTXTResolver pattern from internal/trust/resolver_test.go.
  • Roots are flipped to IsRoot=true on every node before any node calls Start. In production all roots come up before bootstrap traffic flows; doing the resolve+SetRoot phase first matches that ordering rather than racing the first starter against a peer that hasn't yet flipped IsRoot=true.

Drive-by

The Nodes field doc in internal/trust/signedlist.go still said "host:gossip-port" — predates #82's port-semantics fix. Updated to "host:http-port" with a #82 reference.

What this doesn't do

  • Doesn't exercise cmd/repram/main.go's resolveOmegaBootstrap() directly — that function is hardcoded to use the baked-in pubkey + production DNS config, which makes it untestable without refactoring. The integration test exercises the same composition (trust.FetchSigned + cluster.NewClusterNode + bootstrap with 403-gate handler), with injection seams the production binary doesn't need.
  • Doesn't cover the cache-fallback path (DNS unreachable → load from disk). Already covered at the unit level in internal/trust/cache_test.go.
  • Doesn't add a TS-side equivalent. The TS bootstrap flow is structurally similar but lives in repram-mcp/src/index.ts; a TS-side integration test would be a follow-on. No issue is filed for it yet; it is a candidate if a similar finding surfaces in TS.

Suite results

  • Go: all packages green; 4 new tests in internal/cluster

Test plan

  • CI green
  • Spot-check that the new tests would have caught F1/F2 (they would: if Nodes contained gossip_port instead of http_port, the addr() comparison in resolveAndApply would fail to match any seed and no node would self-recognize as a root → cluster fails to bootstrap → test fails)
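The spot-check reasoning can be made concrete with a small sketch of self-recognition by address match. The `isRoot` and `advertise` helpers, the hostnames, and the port numbers are hypothetical; the actual comparison lives in `resolveAndApply` and may differ in detail.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// advertise builds the host:port string a node compares against the list.
func advertise(host string, port int) string {
	return net.JoinHostPort(host, strconv.Itoa(port))
}

// isRoot reports whether the node's advertised address is in the signed list.
func isRoot(nodes []string, selfAdvertised string) bool {
	for _, n := range nodes {
		if n == selfAdvertised {
			return true
		}
	}
	return false
}

func main() {
	const httpPort, gossipPort = 8080, 7946 // illustrative values only

	self := advertise("root-a.example", httpPort)

	// List published with HTTP ports, as the spec intends after #82.
	good := []string{
		advertise("root-a.example", httpPort),
		advertise("root-b.example", httpPort),
	}
	// Same list mistakenly published with gossip ports.
	bad := []string{
		advertise("root-a.example", gossipPort),
		advertise("root-b.example", gossipPort),
	}

	fmt.Println(isRoot(good, self)) // true
	fmt.Println(isRoot(bad, self))  // false: no node self-recognizes
}
```

With the second list no node recognizes itself as a root, so nothing serves /v1/bootstrap and the cluster never forms.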

Closes

Closes #90

// ticktockbent

Closes #90. Adds the integration test that exercises the full omega
bootstrap path — DNS resolution, signed-list verification, root
self-recognition, cluster bootstrap, IsRoot() 403-gate, peer
convergence — in a single test invocation.

This is the test that would have caught F1/F2/F3/F11 the moment the
spec landed. Every preflight finding was caught by humans during
burn-in because no test before this exercised the *composition* of
trust + cluster + bootstrap + root-gate. The unit tests in
internal/trust/ cover the resolver path in isolation; nothing pulled
the whole stack together.

Tests added (internal/cluster/omega_integration_test.go):
  - TestOmegaBootstrap_RootsConvergeAndNonRootJoins — happy path:
    two roots in the signed list converge with each other and accept
    the non-root. All three nodes resolve via the stub DNS, the
    self-recognition logic correctly classifies them, and the
    production-shaped /v1/bootstrap handler enforces the 403 gate.
  - TestOmegaBootstrap_NonRootRejectsWith403 — a node not in the
    signed list returns 403 even when its endpoint is reachable.
    Protects against rogue-node peer-topology harvesting.
  - TestOmegaBootstrap_TamperedListRejected — list signed by a
    different keypair fails FetchSigned at the verify step.
  - TestOmegaBootstrap_ExpiredListRejected — list with valid signature
    but past-expiration is rejected.

Test infrastructure:
  - Reuses the testNode harness from integration_test.go but installs
    a different /v1/bootstrap handler (makeOmegaBootstrapHandler) that
    mirrors the production IsRoot() gate. The shared harness skips
    the gate because it's used by tests where IsRoot is not a concept.
  - signTestList, buildResolver, resolveAndApply helpers walk the same
    code paths cmd/repram/main.go uses, with injectable DNS.
  - Roots are flipped to IsRoot=true on every node before any node
    calls Start. In production all roots come up before bootstrap
    traffic flows; doing the resolve+SetRoot phase first matches that
    ordering rather than racing the first starter against a peer that
    hasn't yet flipped IsRoot=true.

Drive-by: stale comment in internal/trust/signedlist.go said
"host:gossip-port" — updated to "host:http-port" with a #82 reference.

Suite results:
  Go: all packages green; 4 new tests in internal/cluster

Closes #90

// ticktockbent

Addresses suggestions 1 and 2 from PR #109 cold review.

Suggestion 1 — handler divergence:
  Production /v1/bootstrap gates on `network == "public" && !IsRoot()`;
  the test handler gates only on `!IsRoot()`. Functionally equivalent
  here because every node in these tests is in omega-resolution mode,
  but the difference is intentional and worth flagging for the next
  contributor who adds a private-network omega test.
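
To show where the two gates diverge, here is a small sketch that evaluates both predicates over every combination. The function names are hypothetical and `"private"` is an assumed value for a non-public network; only the two boolean expressions are taken from the text above.

```go
package main

import "fmt"

// productionGate mirrors the production condition for returning 403.
func productionGate(network string, isRoot bool) bool {
	return network == "public" && !isRoot
}

// testGate mirrors the test handler, which omits the network guard.
func testGate(isRoot bool) bool {
	return !isRoot
}

func main() {
	for _, network := range []string{"public", "private"} {
		for _, isRoot := range []bool{true, false} {
			p, t := productionGate(network, isRoot), testGate(isRoot)
			fmt.Printf("network=%s isRoot=%t production403=%t test403=%t diverges=%t\n",
				network, isRoot, p, t, p != t)
		}
	}
}
```

The only diverging row is a non-root on a non-public network: production would serve the request while the test handler would return 403.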

Suggestion 2 — F1/F2 claim was overstated:
  The original test docstring said "would have caught F1/F2 instantly."
  The reviewer caught that the testNode harness uses the same ephemeral
  port for both gossipPort and httpPort, so if someone reverted main.go's
  `selfAdvertised` from httpPort to gossipPort the test would still pass
  (both values are numerically equal). Reworded to scope the claim:
  catches spec-level regressions in the signed-list address shape,
  does NOT catch code-level port-variable swaps. Locking down the
  latter would need a test with distinct gossip/http ports.
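
The blind spot can be reproduced in a few lines. This sketch uses hypothetical names (`ports`, `advertiseHTTP`, `advertiseGossip`, `detectsSwap`) and made-up port numbers; it only demonstrates why equal ports hide the swap, not how the harness is built.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// ports holds the two listener ports a node is configured with.
type ports struct{ gossip, http int }

// advertiseHTTP is the intended behavior: advertise the HTTP port.
func advertiseHTTP(host string, p ports) string {
	return net.JoinHostPort(host, strconv.Itoa(p.http))
}

// advertiseGossip models the regression: the gossip port is advertised.
func advertiseGossip(host string, p ports) string {
	return net.JoinHostPort(host, strconv.Itoa(p.gossip))
}

// detectsSwap reports whether a string comparison against the advertised
// address can tell the correct variable from the swapped one.
func detectsSwap(p ports) bool {
	return advertiseHTTP("127.0.0.1", p) != advertiseGossip("127.0.0.1", p)
}

func main() {
	shared := ports{gossip: 9000, http: 9000}   // same port for both roles
	distinct := ports{gossip: 9000, http: 9001} // what a stricter test needs

	fmt.Println(detectsSwap(shared))   // false: the swap is invisible
	fmt.Println(detectsSwap(distinct)) // true: the swap changes the address
}
```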

// ticktockbent
@TickTockBent
Owner Author

Update — review fixes pushed (`c6e6eee`)

  • Suggestion 1 (handler divergence): added a comment to `makeOmegaBootstrapHandler` clarifying the intentional difference from production. The real handler gates on `network == "public" && !IsRoot()`; the test handler omits the network guard because every node in these tests is in omega-resolution mode. A future test that exercises the private-network path should reinstate the network guard.

  • Suggestion 2 (F1/F2 claim was overstated): my original test docstring said the test "would have caught F1/F2 instantly." The reviewer caught that the testNode harness uses the same ephemeral port for both `gossipPort` and `httpPort` — so if someone reverted main.go's `selfAdvertised` from `httpPort` to `gossipPort`, this test would still pass (both values are numerically equal in the harness). Reworded the docstring to scope the claim: catches spec-level regressions in the signed-list address shape; does not catch code-level port-variable swaps. Locking down the latter would need a test with distinct gossip/http ports — out of scope here, captured as future work.

  • Skipped (worth tracking, not in this PR):

    • Nits 1-4: comment about body-read order, helper for the resolveAndApply loop, marshalling `gossip.BootstrapRequest` instead of raw JSON in 403 test, `errors.Is` once sentinels are exported.
    • Coverage gaps: dynamic root flip, explicit F3 contact-all-seeds assertion, F4 self-in-response at integration level. These are real gaps but each merits its own focused test rather than piling onto this PR. Worth filing as a follow-up if the team wants them tracked separately.

@TickTockBent TickTockBent merged commit 877e6aa into main Apr 27, 2026
4 checks passed
@TickTockBent TickTockBent deleted the test/90-omega-bootstrap-integration branch April 27, 2026 17:45
Development

Successfully merging this pull request may close these issues.

F10: add multi-node integration test that exercises omega-bootstrap end-to-end
