
Show 'Switch to stable version' option on fatal/non-user-interactive errors in canary/rc builds #1455

@marcuscastelo

Description

Feature Request Template

Title:
Show "Switch to stable version" option on fatal/non-user-interactive errors in canary/rc builds

Description:
When a canary/rc (non-stable) build of the app hits a fatal, critical, or otherwise non-user-interactive error (one that prevents normal app usage), present the user with an explicit option to switch to the stable release. The UI should make it obvious that the current session is on a canary/rc URL and that switching will reopen the same flow on the stable production URL (the button is labelled "Trocar de versão", i.e. "Switch version", in the Portuguese UI).

Motivation:

  • Users running canary/rc builds should have a low-friction escape when an unrecoverable error prevents them from continuing.
  • Some errors are transient (e.g., network, CDN, or backend outages affecting the canary deployment), and switching to the stable release will often restore functionality.
  • Other errors indicate a bug in the canary build; surfacing an easy way to switch reduces user frustration and support load while retaining telemetry to identify regressions.

Proposed Solution:

  1. Error classification

    • Classify errors that should show the switch option vs errors that should not, using a combination of heuristics:
      • Error kind: network/timeouts, API 5xx, feature-flag-only failures => candidate for "switch".
      • Runtime exceptions with stack traces referencing canary-only modules or new code paths (e.g., experimental feature files) => prefer reporting (no automatic suggestion) but still offer switch as a secondary option.
      • Recurrence/frequency: if an identical error occurs > N times in a short window (telemetry), escalate the suggestion to the user.
      • Source: client-only unrecoverable errors and server 5xx for canary endpoints are better candidates than user-actionable validation errors.
    • Make the heuristics configurable (thresholds, enabled/disabled via a flag) so they can be tuned.
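The classification heuristics above could be sketched roughly as follows. This is a minimal illustration, not an existing module: every name here (`ClassifiedError`, `HeuristicConfig`, `RecurrenceTracker`, `classify`) is hypothetical, and the defaults (3 repeats within 60 seconds) are placeholder assumptions to be tuned from telemetry. It also assumes that runtime errors outside canary-only code are treated like other "switch" candidates, which the proposal does not spell out.

```typescript
// Hypothetical types and defaults; none of these exist in the repo today.
type ErrorKind =
  | "network" | "timeout" | "server5xx" | "featureFlag"
  | "runtime" | "validation" | "permission";

interface ClassifiedError {
  kind: ErrorKind;
  signature: string;           // stable identifier used to count repeats
  fromCanaryOnlyCode: boolean; // stack trace touches canary-only modules
}

interface HeuristicConfig {
  enabled: boolean;   // flag to turn the whole heuristic on or off
  maxRepeats: number; // the "N" from the proposal
  windowMs: number;   // the "short window" from the proposal
}

type Suggestion = "none" | "switchPrimary" | "switchSecondary";

const defaultConfig: HeuristicConfig = { enabled: true, maxRepeats: 3, windowMs: 60000 };

// Counts how often each error signature occurred inside a sliding window.
class RecurrenceTracker {
  private seen = new Map<string, number[]>();

  record(signature: string, now: number, windowMs: number): number {
    const recent = (this.seen.get(signature) ?? []).filter((t) => now - t < windowMs);
    recent.push(now);
    this.seen.set(signature, recent);
    return recent.length;
  }
}

function classify(
  err: ClassifiedError,
  tracker: RecurrenceTracker,
  now: number,
  config: HeuristicConfig = defaultConfig,
): Suggestion {
  if (!config.enabled) return "none";
  // User-actionable errors never trigger the suggestion.
  if (err.kind === "validation" || err.kind === "permission") return "none";

  const count = tracker.record(err.signature, now, config.windowMs);
  const escalated = count > config.maxRepeats;

  // Likely canary regressions: prefer reporting, keep switching as a
  // secondary option unless the same error keeps recurring.
  if (err.kind === "runtime" && err.fromCanaryOnlyCode) {
    return escalated ? "switchPrimary" : "switchSecondary";
  }
  return "switchPrimary";
}
```

Passing `now` explicitly keeps the function deterministic, which makes the unit tests required by the acceptance criteria straightforward to write.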
  2. UI behavior

    • On detection of a qualifying fatal/non-interactive error, show a full-screen or prominent modal with:
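      • Short explanation in the user's locale (pt-BR fallback): "Ocorreu um erro na versão canary. Deseja abrir a versão estável?" (English: "An error occurred in the canary version. Do you want to open the stable version?")
      • Primary button: "Trocar de versão" ("Switch version"). It opens the same route on the stable URL, swapping the domain or the path/version parameter as appropriate.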
      • Secondary button: "Reportar" or "Recarregar" depending on error kind.
      • Small link: "Manter versão canary" to dismiss and continue (if possible).
    • Default URL composition:
      • Canary/rc sessions use the usual canary URL (existing behavior).
      • Stable switch maps to the canonical stable/release URL (documented mapping in code/config).
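One possible shape for the canary-to-stable mapping is a host lookup that keeps the path, query string, and hash intact. This is only a sketch: the host names below are placeholders, and `toStableUrl` and `STABLE_HOST_BY_PRERELEASE_HOST` are hypothetical names. If the real deployment distinguishes versions by path or query parameter instead of by host, the rewrite step would differ.

```typescript
// Placeholder hosts; the real mapping would live in code/config as documented.
const STABLE_HOST_BY_PRERELEASE_HOST = new Map<string, string>([
  ["canary.example.com", "app.example.com"],
  ["rc.example.com", "app.example.com"],
]);

// Returns the stable URL for the same route, or null when the current
// host is not a known canary/rc host (e.g. the session is already stable).
// Note: `new URL` throws on malformed input, so callers should pass an
// absolute URL such as window.location.href.
function toStableUrl(
  current: string,
  mapping: Map<string, string> = STABLE_HOST_BY_PRERELEASE_HOST,
): string | null {
  const url = new URL(current);
  const stableHost = mapping.get(url.hostname);
  if (stableHost === undefined) return null;
  url.hostname = stableHost; // path, query, and hash are preserved
  return url.toString();
}
```

Returning `null` for unknown hosts lets the caller hide the "Trocar de versão" button rather than redirect to a guessed URL.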
  3. Observability & telemetry

    • Emit an event when the switch suggestion is shown: event contains error signature, heuristic reason, session id, current version, target stable version, and user action (switched/dismissed).
    • If user switches, log the action and the resulting page load success/failure.
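The event payload described above might look like the following. The event names, field names, and the `createSwitchTelemetry` helper are all assumptions for illustration; the `emit` callback stands in for whatever the existing observability module exposes. Reporting the success or failure of the page load after a switch would have to happen on the stable side and is not covered here.

```typescript
// Hypothetical event shape; field names mirror the list in the proposal.
type UserAction = "switched" | "dismissed";

interface SwitchContext {
  errorSignature: string;
  heuristicReason: string;
  sessionId: string;
  currentVersion: string;
  targetStableVersion: string;
}

interface SwitchEvent extends SwitchContext {
  name: "switch_suggestion_shown" | "switch_suggestion_action";
  timestamp: number;
  userAction?: UserAction; // only set on the action event
}

// `emit` is a stand-in for the real telemetry sink.
type Emit = (event: SwitchEvent) => void;

function createSwitchTelemetry(emit: Emit, ctx: SwitchContext) {
  return {
    // Call once when the modal becomes visible.
    shown(now: number): void {
      emit({ ...ctx, name: "switch_suggestion_shown", timestamp: now });
    },
    // Call when the user switches or dismisses.
    acted(action: UserAction, now: number): void {
      emit({ ...ctx, name: "switch_suggestion_action", userAction: action, timestamp: now });
    },
  };
}
```

Keeping the sink injectable means the same helper can be exercised in unit tests with an in-memory array and wired to the real observability layer in production.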
  4. Implementation details

    • Add a small "error-handling" module/hook used by global error boundary and top-level route error handlers.
    • Integrate with existing observability: src/modules/observability/* and src/instrument.server.ts.
    • Add UI component under src/sections/common/ or src/shared/ (no barrel files).
    • Provide a feature flag to control rollout and a runtime config for detection thresholds.
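The entry point of such an error-handling module could be a small gate that the global error boundary calls before rendering the modal. The sketch below assumes the version string carries a pre-release tag after a hyphen (for example `1.4.0-rc.2`), which may not match what `src/app-version.ts` actually exposes; `channelOf`, `SwitchGate`, and `shouldOfferSwitch` are hypothetical names.

```typescript
type Channel = "canary" | "rc" | "stable";

// Assumes semver-like strings where the pre-release tag follows the
// first hyphen, e.g. "1.4.0-canary.7" or "1.4.0-rc.2".
function channelOf(version: string): Channel {
  const preRelease = version.split("-")[1] ?? "";
  if (preRelease.startsWith("canary")) return "canary";
  if (preRelease.startsWith("rc")) return "rc";
  return "stable";
}

interface SwitchGate {
  featureFlagEnabled: boolean; // rollout flag from the proposal
  version: string;             // version of the running build
  errorQualifies: boolean;     // result of the classification heuristics
}

// The modal is offered only when the flag is on, the build is
// canary/rc, and the error was classified as qualifying.
function shouldOfferSwitch(gate: SwitchGate): boolean {
  return (
    gate.featureFlagEnabled &&
    gate.errorQualifies &&
    channelOf(gate.version) !== "stable"
  );
}
```

Keeping this gate free of framework imports lets both the client error boundary and the top-level route error handlers share it.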

Acceptance Criteria:

  • When the app encounters a qualifying fatal/non-user-interactive error in a canary/rc session, the UI shows a clear "Trocar de versão" option.
  • Clicking "Trocar de versão" opens the same route in the stable release URL and the user can continue their flow if the stable server is healthy.
  • The detection heuristics are implemented with configurable thresholds and documented defaults.
  • Telemetry events are emitted when the suggestion is shown and when the user acts on it (switched/dismissed).
  • Unit tests for the heuristic logic and UI tests for the modal/button behavior exist and pass.
  • Documentation is updated to describe the behavior, the config keys, and the intended URL mapping between canary/rc and stable.

Additional Context:

  • Suggested places to integrate:
    • Global error boundaries / entry points: src/entry-client.tsx, src/entry-server.tsx, src/middleware.ts
    • Observability: src/modules/observability/*, src/instrument.server.ts
    • UI component: src/sections/common/ErrorSwitchToStable.tsx (or src/shared/ui/ErrorSwitchToStable.tsx)
    • Config: src/app-version.ts, app.config.ts (or runtime environment)
  • Important considerations:
    • Avoid noisy suggestions for errors that are clearly user-caused (validation, permission).
    • Respect user preference and make it dismissible.
    • Make the detection logic conservative by default to avoid masking real regressions; iterate thresholds based on telemetry.
    • Ensure no barrel index.ts files are added; import components directly per repo rules.

Metadata

Assignees

No one assigned

    Labels

    featureRequest or implement a new feature

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests