feat: E2E tests for ceremony pipeline (dedup, rewrite, validate) #47

@Flare576

Background

Ceremony and extraction are the most complex pipelines in Ei and currently have zero end-to-end coverage that verifies data actually lands in state. What exists today:

  • `ceremony.test.ts` — unit tests for orchestration logic (phase sequencing, `shouldStartCeremony`, `handleCeremonyProgress`). LLM is mocked.
  • `dedup.test.ts` — unit tests for `handleDedupCurate` handler (merge decisions, referential integrity).
  • `rewrite.test.ts` — unit tests for rewrite handlers.
  • `dedupe-ui.spec.ts` — E2E for the manual dedup UI only. Does not touch the automated pipeline.
  • `basic-flow.spec.ts` / `message-flow.spec.ts` — assert that extraction LLM requests fire, but never verify that a topic, person, or fact actually appears in state afterward.

The unit layer tests both ends of each step in isolation. What nobody tests is the chain: message in → scan queued → LLM responds → handler writes → state has a new topic.

Key Insight: Extraction and Ceremony Share a Code Path

Ceremony's Expose phase calls `queueAllScans` — the exact same extraction pipeline triggered by a user message. They are not separate systems. A proper ceremony E2E test necessarily exercises extraction end-to-end first.

The gap is the same in both cases: we verify that requests fire, but we never verify that the output lands.

What's Needed

E2E tests (using the existing mock server framework in `tests/e2e/framework/`) that verify data lands in state, not just that requests fire.

Extraction pipeline

  • Send a message
  • Mock LLM responds to topic scan with a candidate topic
  • Mock LLM responds to topic match with a match result
  • Mock LLM responds to topic update with a synthesized description
  • Assert: the topic actually appears in the My Data modal / state
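When the final assertion fails, it helps to know which link in the chain broke. The helper below is a minimal sketch of that idea and is not part of the existing framework: `StateSnapshot`, `explainMissingTopic`, and the step names passed to it are hypothetical, and it assumes a test can obtain both a snapshot of state and the ordered list of step types the mock server received.

```typescript
// Hypothetical shapes: the real Ei state and step identifiers may differ.
interface TopicRecord {
  name: string;
  description: string;
}

interface StateSnapshot {
  topics: TopicRecord[];
}

// Returns the first expected step that does not occur, in order, in the
// recorded list, or undefined when every expected step was seen.
function firstMissingStep(recorded: string[], expected: string[]): string | undefined {
  let cursor = 0;
  for (const step of expected) {
    const found = recorded.indexOf(step, cursor);
    if (found === -1) return step;
    cursor = found + 1;
  }
  return undefined;
}

// Returns undefined when the topic landed in state; otherwise a message
// that says where the chain appears to have stopped.
function explainMissingTopic(
  state: StateSnapshot,
  topicName: string,
  recordedSteps: string[],
  expectedSteps: string[]
): string | undefined {
  if (state.topics.some((t) => t.name === topicName)) return undefined;
  const missing = firstMissingStep(recordedSteps, expectedSteps);
  return missing === undefined
    ? `all steps fired but "${topicName}" never reached state`
    : `"${topicName}" missing from state; step "${missing}" never fired`;
}
```

The state check comes first on purpose: the recorded requests are only consulted to explain a failure, so a test built this way cannot pass on requests alone.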

Validate step (new as of `feat/topic-validate-phase`)

  • Seed state with one existing topic
  • Trigger extraction that produces a near-duplicate new topic (similarity >= 0.85)
  • Mock validate LLM responds with a merge decision
  • Assert: duplicate is absorbed, not left as a second record
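Seeding a near-duplicate reliably requires fixtures that are known to cross the threshold. This issue does not say how similarity is computed, so the sketch below assumes cosine similarity over embedding vectors; only the 0.85 value comes from the text above, and `cosineSimilarity` and `isNearDuplicate` are hypothetical names.

```typescript
// Threshold quoted in the validate step above.
const VALIDATE_THRESHOLD = 0.85;

// Assumption: similarity is cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("vectors must be non-empty and the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  // A zero vector has no direction, so treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// True when a fixture pair should be routed to the validate step.
function isNearDuplicate(a: number[], b: number[]): boolean {
  return cosineSimilarity(a, b) >= VALIDATE_THRESHOLD;
}
```

Checking fixture pairs with a helper like this before the test runs keeps a threshold change from silently turning the validate test into a no-op.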

Dedup ceremony

  • Seed state with two topics above the similarity threshold
  • Trigger ceremony
  • Mock LLM responds with a merge decision
  • Assert: one topic survives, the other is removed, `replaced_by` is correct, and referential integrity for quotes holds
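The four assertions above can be collected into one checker that reports every violation at once. This is a sketch under stated assumptions: `DedupState`, `topic_ids`, and the placement of `replaced_by` as a map from removed id to surviving id are all hypothetical, since the issue names `replaced_by` but not where it is stored.

```typescript
// Hypothetical shapes: field names other than replaced_by are assumptions.
interface Topic {
  id: string;
  name: string;
}

interface Quote {
  id: string;
  topic_ids: string[];
}

interface DedupState {
  topics: Topic[];
  quotes: Quote[];
  // Assumed record of merges: removed topic id -> surviving topic id.
  replaced_by: Record<string, string>;
}

// Returns a list of problems; an empty list means the merge looks correct.
function checkDedupResult(state: DedupState, survivorId: string, removedId: string): string[] {
  const problems: string[] = [];
  const liveIds = new Set(state.topics.map((t) => t.id));

  if (!liveIds.has(survivorId)) problems.push(`survivor ${survivorId} is not in state`);
  if (liveIds.has(removedId)) problems.push(`removed topic ${removedId} is still in state`);
  if (state.replaced_by[removedId] !== survivorId) {
    problems.push(`replaced_by for ${removedId} does not point at ${survivorId}`);
  }

  // Referential integrity: every quote must point only at topics that exist.
  for (const quote of state.quotes) {
    for (const topicId of quote.topic_ids) {
      if (!liveIds.has(topicId)) {
        problems.push(`quote ${quote.id} references missing topic ${topicId}`);
      }
    }
  }
  return problems;
}
```

Returning a list rather than throwing on the first problem means a single failed run shows whether the merge, the `replaced_by` link, or the quote references went wrong.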

Why This Is Hard

The extraction pipeline is multi-step and sequenced: scan → match → update (→ validate). Each step is a separate LLM call with a different `next_step` type. The mock server currently matches requests by detecting prompt content, not by `next_step`. Supporting ordered response queues per step type (or keying on `next_step` from the request body) would be the main infrastructure lift.
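One possible shape for ordered, per-step response queues is sketched below. Nothing here comes from `tests/e2e/framework/`: `StepResponseQueue` and `MockLLMRequest` are hypothetical, and the sketch assumes `next_step` can be read from the request body as a string. Returning `undefined` when no response is queued leaves room for the existing prompt-content matching to run as a fallback, so the current mock server is extended rather than replaced.

```typescript
// Hypothetical sketch: these names are not from the existing mock framework.
interface MockLLMRequest {
  next_step: string; // assumed to be present in the request body
  prompt: string;
}

class StepResponseQueue {
  private queues = new Map<string, string[]>();

  // Appends canned responses for one step type, preserving order.
  enqueue(step: string, ...responses: string[]): this {
    const queue = this.queues.get(step) ?? [];
    queue.push(...responses);
    this.queues.set(step, queue);
    return this;
  }

  // Returns the next canned response for the request's step, or undefined
  // so the caller can fall back to prompt-content matching.
  next(request: MockLLMRequest): string | undefined {
    return this.queues.get(request.next_step)?.shift();
  }

  // Responses still queued per step; a test can assert this is empty to
  // confirm every expected call was actually made.
  remaining(): Record<string, number> {
    const counts: Record<string, number> = {};
    this.queues.forEach((queue, step) => {
      if (queue.length > 0) counts[step] = queue.length;
    });
    return counts;
  }
}
```

A test would enqueue one response per step of scan → match → update before sending the message, then check `remaining()` alongside the state assertion.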

`multi-message-synthesis.spec.ts` is the closest existing reference for multi-step sequencing.

Acceptance Criteria

  • At least one E2E test sends a message and asserts a topic appears in state (full scan → match → update chain)
  • At least one E2E test exercises the validate step (near-duplicate caught and merged)
  • At least one E2E test exercises dedup ceremony end-to-end with mock LLM
  • Tests run in CI alongside the existing 34-test suite
  • Mock server extended to support `next_step`-keyed responses if needed — don't replace it
