Summary
The chromium-msg 1/1 shard of E2E Tests failed on main commit `00edd23`. Investigation of the merged-blob report (artifact ID 6651581858) shows one hard failure plus three flaky retries, all in the messaging-realtime surface. None of them is caused by the three PRs that just merged (#54 RLS test fix, #55 docs, #56 AdminGate test): the failure surface is `tests/e2e/messaging/`, which those PRs do not touch.
This issue tracks it as a real symptom rather than dismissing it as flake. STATUS.md already calls this surface out as the dominant flake hotspot ("9 rounds of mitigation. Cause: stale closures, unstable hook refs, hydration timing"). The retry helpers in `real-time-delivery.spec.ts` have already been bumped from 3 attempts to 5 (5 × 60s = a 5-minute budget per assertion), and they are still hitting the wall.
Hard failure
Test: `tests/e2e/messaging/real-time-delivery.spec.ts:342` — "Real-time Message Delivery (T098) > should show delivery status (sent → delivered → read)"
Error: `expect(locator).toBeVisible()` failed after 60s.
- Locator: `getByText(testMessage)` on page2
- Expected: visible
- Received: `<element(s) not found>`
Where it fails: line 85, inside the `waitForMessageOnPage2` helper. The helper navigates and reloads up to 5 times, waiting for the message to appear in the second window; after 5 attempts (~5 min) it gives up. Both runs (`attempt 0` followed by another `attempt 0`, i.e. Playwright's retry on failure) failed at the same line.
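To make the retry budget concrete, here is a minimal sketch of a reload-and-retry helper of the shape described above. `PageLike`, `waitForText`, `RetryOptions` and `waitForMessageWithRetries` are hypothetical names; the real helper works on Playwright's `page` and `expect(locator).toBeVisible()` directly, and its exact navigate/reload sequence may differ.

```typescript
// Hypothetical stand-in for the parts of a Playwright Page the helper needs.
interface PageLike {
  reload(): Promise<void>;
  // Resolves true if `text` becomes visible within timeoutMs, false on timeout.
  waitForText(text: string, timeoutMs: number): Promise<boolean>;
}

interface RetryOptions {
  attempts: number;     // 5 after the 3 -> 5 bump
  perAttemptMs: number; // 60_000, the toBeVisible timeout
}

// Worst-case wall-clock cost of one assertion when the message never arrives.
function worstCaseBudgetMs(opts: RetryOptions): number {
  return opts.attempts * opts.perAttemptMs;
}

// Returns the index of the attempt that saw the message, or throws once the
// attempt budget is exhausted.
async function waitForMessageWithRetries(
  page: PageLike,
  text: string,
  opts: RetryOptions,
): Promise<number> {
  for (let attempt = 0; attempt < opts.attempts; attempt++) {
    if (attempt > 0) {
      // Reload so page2 re-fetches history and re-subscribes from scratch.
      await page.reload();
    }
    if (await page.waitForText(text, opts.perAttemptMs)) {
      return attempt;
    }
  }
  throw new Error(`"${text}" not visible after ${opts.attempts} attempts`);
}
```

When page2 never receives the message, every attempt burns the full per-attempt timeout, so raising `attempts` from 3 to 5 only moves the failure from ~3 to ~5 minutes. A reload can rescue the assertion only if the message is persisted and page2's history fetch is allowed to read it.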
The real-time subscription on page2 never received the message that page1 sent. Possible causes:
- Supabase Realtime broadcast didn't deliver to the second client's subscription
- The subscription was unsubscribed between page1's send and page2's render
- The auth session on page2 lost permission to read the conversation mid-test
- Hydration timing on page2 missed the message that arrived before subscribe()
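The last candidate is the classic subscribe/hydrate race: a message committed after the initial history fetch but before the subscription is live is seen by neither path. One common way to close it is to subscribe first and backfill history afterwards, merging by id. This is a minimal sketch in plain TypeScript, assuming a hypothetical `RealtimeChannelLike` wrapper (not the Supabase client API) and a hypothetical `createdAt` ordering key. It shows the ordering to check `useConversationRealtime.ts` against, not that hook's actual code.

```typescript
interface ChatMessage {
  id: string;
  body: string;
  createdAt: number; // hypothetical ordering key, e.g. a server timestamp
}

// Hypothetical wrapper: resolves once the subscription is confirmed; every
// INSERT for the conversation is forwarded to onInsert.
interface RealtimeChannelLike {
  subscribe(onInsert: (m: ChatMessage) => void): Promise<void>;
}

async function loadConversation(
  channel: RealtimeChannelLike,
  fetchHistory: () => Promise<ChatMessage[]>,
  render: (messages: ChatMessage[]) => void,
): Promise<void> {
  const byId = new Map<string, ChatMessage>();
  const publish = () =>
    render(Array.from(byId.values()).sort((a, b) => a.createdAt - b.createdAt));

  // 1. Subscribe first, so nothing committed from here on can be missed.
  await channel.subscribe((m) => {
    byId.set(m.id, m);
    publish();
  });

  // 2. Then backfill history; anything that also arrived live is deduplicated by id.
  for (const m of await fetchHistory()) {
    byId.set(m.id, m);
  }
  publish();
}
```

The cost of this ordering is an extra render and a dedupe map. The benefit is that every message is either in the history response or delivered live, provided Realtime actually delivers events once the subscription is confirmed.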
Flaky (passed on retry)
| File:line | Test | Pattern |
|---|---|---|
| `message-delete-placeholder.spec.ts:341` | "should show [Message deleted] placeholder and preserve adjacent messages" | initial fail → passed on retry |
| `message-editing.spec.ts:337` (fails at line 394) | "T115: should edit message within 15-minute window" | initial fail → passed on retry |
| `real-time-delivery.spec.ts:310` (fails at line 85) | "should deliver message in <500ms between two windows" | initial fail → passed on retry |
The hard failure and the `real-time-delivery.spec.ts:310` flake both fail at line 85 (the same helper), and all four failures sit in the messaging-realtime surface. That points to a likely single root cause: cross-window realtime propagation under CI.
What's been ruled out
The recent merges to main do not touch this surface:
- #54 (RLS test fix)
- #55 (docs)
- #56 (AdminGate test)
The previous main commit (`62f8a40`) had a green E2E run, so the hard failure either appeared recently or is intermittent enough to slip through. Given STATUS.md's flake history, intermittent is more likely than regression-from-merge.
Plan
- Confirm flake vs. regression by rerunning the failed shard once the in-flight run completes. If the rerun is green, this falls within the existing flake budget. If the rerun is also red, escalate to root-cause analysis.
- Pull the actual trace.zip for the specific failure (the merged blob doesn't include per-test traces; need the raw shard artifact). Inspect the page2 timeline: does Realtime ever connect? Does it receive the INSERT event? Does it filter the event out?
- Cross-reference `useConversationRealtime.ts` and `useTypingIndicator.ts`; STATUS.md and the tracking doc list these as the resolved-but-watch-for-regression hotspot. Verify that the `useMemo(() => createClient(), [])` wrapper is still in place and that subscription teardown isn't leaking between the two test windows.
- Decide whether the helper's 5-attempt × 60s retry budget is the right answer or whether it's masking a real subscription bug. The comment at lines 72-76 implies the 3→5 bump was already a mitigation, not a fix; we may have hit the limit of what retries can absorb.
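The teardown check in the Plan above can be phrased as an invariant: within one window, each time the subscription effect re-runs or the component unmounts, the conversation's open-channel count returns to what it was. A plain-TypeScript sketch, using a hypothetical `ChannelRegistry` in place of the Supabase client; in supabase-js v2, `getChannels()` and `removeChannel()` are the real counterparts to `count` and the teardown, but this sketch does not call them.

```typescript
// Hypothetical in-memory stand-in for a realtime client's channel bookkeeping.
// It only tracks how many subscriptions are open per topic.
class ChannelRegistry {
  private active = new Map<string, number>();

  // Opens a subscription on `topic` and returns an idempotent teardown.
  open(topic: string): () => void {
    this.active.set(topic, (this.active.get(topic) ?? 0) + 1);
    let closed = false;
    return () => {
      if (closed) return; // a second call must not double-decrement
      closed = true;
      const remaining = (this.active.get(topic) ?? 1) - 1;
      if (remaining === 0) this.active.delete(topic);
      else this.active.set(topic, remaining);
    };
  }

  count(topic: string): number {
    return this.active.get(topic) ?? 0;
  }
}

// Hypothetical hook body reduced to its lifecycle: mount opens, teardown closes.
function mountConversation(registry: ChannelRegistry, conversationId: string): () => void {
  return registry.open(`conversation:${conversationId}`);
}

// Throws if the number of open channels for the conversation is not `expected`.
function assertNoLeak(registry: ChannelRegistry, conversationId: string, expected: number): void {
  const actual = registry.count(`conversation:${conversationId}`);
  if (actual !== expected) {
    throw new Error(`expected ${expected} open channel(s), found ${actual}`);
  }
}
```

A leaky variant, one that mounts again without calling the previous teardown (for example because an unstable client ref re-runs the effect), would leave the count at 2 for a single window: duplicate handlers, or a live handler bound to a discarded client.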
Acceptance
Either:
- (a) The rerun is green AND the hard failure reproduces <30% of the time AND we add a Watch entry in `docs/STABILITY-TRACKING.md` Family A. Stays open as a watch.
- (b) Root cause identified, fix landed, hard failure reproduces 0/10 reruns. Closes.
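For calibrating the thresholds in (a) and (b): if reruns are independent trials (CI flakes often aren't, e.g. when a shared runner is slow), a 30% flake produces 0/10 green reruns only about 3% of the time, and the one-sided 95% upper bound on the failure rate after 0/10 is about 26%. A small sketch of both numbers, using the exact zero-failure binomial bound `1 - alpha^(1/n)`:

```typescript
// Assumes reruns are independent trials with a fixed per-run failure rate p.

// Probability that all n reruns pass when each fails with probability p.
function probAllPass(p: number, n: number): number {
  return Math.pow(1 - p, n);
}

// One-sided upper confidence bound on p after 0 failures in n runs:
// solves (1 - p)^n = alpha for p. alpha = 0.05 gives a 95% bound.
function zeroFailureUpperBound(n: number, alpha = 0.05): number {
  return 1 - Math.pow(alpha, 1 / n);
}

const reruns = 10;
console.log(probAllPass(0.3, reruns).toFixed(3));      // 0.028
console.log(zeroFailureUpperBound(reruns).toFixed(3)); // 0.259
```

So 0/10 rules out a flake at the (a) threshold fairly well, but cannot by itself tell a fix apart from a flake in the low tens of percent. More reruns tighten the bound roughly as 3/n (the "rule of three").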
Reference