
fix(analytics): run AccountStore blocking calls on std::thread (#1476) #1506

Open
bingran-you wants to merge 1 commit into dev from fix/analytics-track-postgres-drop

@bingran-you
Contributor

Summary

Closes the latent abort path flagged in #1476 (follow-up to #1454 / #1451).

The three remaining tokio::task::spawn_blocking(move || store.…) call sites in scheduler_module::service::analytics (/analytics/track account resolution and event insert, plus /analytics/dashboard) now run on a dedicated std::thread via a new AccountStore::run_blocking helper that awaits the result through tokio::sync::oneshot.
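A minimal sketch of the run_blocking pattern described above, under stated assumptions: it is a free function rather than the real `AccountStore::run_blocking` method (whose exact signature is not shown here), and to stay dependency-free it replaces `tokio::sync::oneshot` with a small `Mutex` + `Waker` slot and replaces the tokio runtime with a toy `block_on`. The names `Slot`, `BlockingResult`, `ThreadWaker`, and `block_on` are hypothetical. The point it shows is that the closure runs on a fresh `std::thread`, which carries no tokio runtime context, while the async caller still awaits the result.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

/// Slot shared by the worker thread and the awaiting task.
/// Hypothetical std-only stand-in for tokio::sync::oneshot.
struct Slot<T> {
    value: Option<thread::Result<T>>,
    waker: Option<Waker>,
}

/// Future returned by `run_blocking`; resolves when the worker thread finishes.
pub struct BlockingResult<T> {
    slot: Arc<Mutex<Slot<T>>>,
}

impl<T> Future for BlockingResult<T> {
    type Output = thread::Result<T>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let mut slot = self.slot.lock().unwrap();
        match slot.value.take() {
            Some(result) => Poll::Ready(result),
            None => {
                // Not finished yet: store the waker so the worker can wake us.
                slot.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

/// Runs `f` on a fresh std::thread, which has no tokio runtime context, and
/// returns a future for its result. An ordinary panic in `f` is caught and
/// returned as `Err`.
pub fn run_blocking<F, T>(f: F) -> BlockingResult<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let slot = Arc::new(Mutex::new(Slot { value: None, waker: None }));
    let worker_slot = Arc::clone(&slot);
    thread::spawn(move || {
        // Values `f` captures or creates (and does not return), such as a
        // pooled connection, are dropped on this thread.
        let result = std::panic::catch_unwind(std::panic::AssertUnwindSafe(f));
        let mut slot = worker_slot.lock().unwrap();
        slot.value = Some(result);
        if let Some(waker) = slot.waker.take() {
            waker.wake();
        }
    });
    BlockingResult { slot }
}

/// Toy executor that stands in for the tokio runtime in this sketch.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            // Park until the worker's wake() unparks us; spurious wakeups just re-poll.
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let total = block_on(async {
        // Placeholder for synchronous r2d2/postgres work such as an event insert.
        run_blocking(|| (1..=3).sum::<u32>())
            .await
            .expect("worker thread panicked")
    });
    println!("{total}"); // prints 6
}
```

Spawning a thread per call keeps the sketch simple; a production helper might reuse a small dedicated pool instead, which is a design choice this PR does not specify.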

Without this, any Supabase hiccup that causes r2d2 to recycle a broken connection on these endpoints will:

  1. Drop a sync postgres::Client on a tokio blocking-pool thread (those threads carry a runtime Handle).
  2. The Client's Drop impl calls Runtime::block_on(close_rendezvous()) → panics with "Cannot start a runtime from within a runtime".
  3. The panic fires inside a destructor → non-unwinding → abort() → PM2 restart.
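Steps 1 and 2 hinge on where the destructor runs: Rust drops a value on whichever thread owns it when it goes out of scope. The std-only sketch below illustrates that property with a hypothetical `FakeClient` whose `Drop` reports the current thread's name; because the value is created and discarded inside a closure on a dedicated, named `std::thread`, the destructor runs there and not on the caller's thread. It does not reproduce the tokio panic itself, and the helper `on_dedicated_thread` and the thread name `account-store-blocking` are made up for illustration (the PR's helper awaits a `tokio::sync::oneshot` rather than joining).

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

/// Hypothetical stand-in for a pooled postgres::Client; reports where it is dropped.
struct FakeClient {
    report: Sender<Option<String>>,
}

impl Drop for FakeClient {
    fn drop(&mut self) {
        // Per this PR, the real postgres::Client::drop is where Runtime::block_on is called.
        let name = thread::current().name().map(str::to_owned);
        let _ = self.report.send(name);
    }
}

/// Runs `f` on a dedicated, named thread and joins it (blocking, for brevity).
fn on_dedicated_thread<T, F>(f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .name("account-store-blocking".into())
        .spawn(f)
        .expect("failed to spawn worker thread")
        .join()
        .expect("worker thread panicked")
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let rows = on_dedicated_thread(move || {
        let client = FakeClient { report: tx };
        // ...synchronous queries through `client` would run here...
        let rows = 3;
        drop(client); // e.g. r2d2 discarding a broken connection
        rows
    });
    let dropped_on = rx.recv().unwrap();
    // prints: rows=3, dropped on Some("account-store-blocking")
    println!("rows={rows}, dropped on {dropped_on:?}");
}
```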

Worker log evidence from the post-#1454 production / staging scan today (2026-04-21): /home/azureuser/.pm2/logs/dw-worker-error.log still contains this exact panic, with prod dw_worker showing 118 PM2 restarts and staging dw_worker 151 restarts accumulated since the last hot-swap window.

Changes

The follow-up service/billing.rs (5 sites) and service/auth.rs (40+ sites) listed in #1476 are intentionally deferred to keep this PR focused and reviewable.

Test plan

  • cargo check -p scheduler_module — clean (pre-existing warnings only).
  • cargo test -p scheduler_module analytics (integration tests in CI).
  • After merge + deploy, confirm /home/azureuser/.pm2/logs/dw-worker-error.log stops accumulating new postgres-0.19.12/src/connection.rs:66:22 panics on prod / staging.

Refs: #1451, #1454, #1476.

Replaces the three remaining `tokio::task::spawn_blocking` call sites in
`service::analytics` (the `/analytics/track` and `/analytics/dashboard`
handlers) with a new `AccountStore::run_blocking` helper that offloads
the sync `postgres` / r2d2 work onto a dedicated `std::thread` and
awaits the result via `tokio::sync::oneshot`.

This closes the latent abort path carried over from PR #1454 (see
issue #1476 for the audit and #1451 for the original production
crash): when r2d2 recycles a broken Supabase connection, the sync
`postgres::Client::drop` impl calls `Runtime::block_on`, which panics
with "Cannot start a runtime from within a runtime" on tokio
blocking-pool threads (they still carry a runtime Handle) and aborts
the whole worker process because the panic fires inside a destructor.

- Add `AccountStore::run_blocking<F, T>` as the await-able counterpart
  to the existing `record_analytics_event_detached` helper.
- Switch `track_event` (both `get_account_by_auth_user` and
  `record_analytics_event`) and `get_dashboard` to use it.
- Drop the now-unused `tokio::task` import in `service/analytics.rs`.
- Document the rule at each replaced call site so future edits keep
  the pattern.

The follow-up `service/billing.rs` (5 sites) and `service/auth.rs`
(40+ sites) listed in #1476 are intentionally deferred to keep this
change focused.

@vercel

vercel Bot commented Apr 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| dowhiz | Ready | Preview, Comment | Apr 21, 2026 1:20pm |

Contributor Author

@bingran-you bingran-you left a comment

One concern before merge: the new helper in DoWhiz_service/scheduler_module/src/account_store.rs:1266 documents that synchronous AccountStore r2d2/postgres work should never run on tokio::task::spawn_blocking, because recycling a postgres::Client there can abort the worker. This PR migrates service/analytics.rs, but the same spawn_blocking(move || store.*) pattern is still present in other async HTTP paths such as DoWhiz_service/scheduler_module/src/service/auth.rs:707 and DoWhiz_service/scheduler_module/src/service/billing.rs:140.

If the root cause is general to AccountStore, the crash path is still reachable through those handlers, so the invariant introduced here is only partially enforced. If analytics is the only affected surface, I think the new helper/docs should be scoped down to say that explicitly; otherwise I’d prefer to migrate the remaining request-path call sites or centralize the safe wrapper before calling this fixed.

This reply was drafted by breeze, an autonomous agent running on behalf of the account owner.

@bingran-you bingran-you added breeze:done Breeze finished handling this item and removed breeze:wip Breeze is actively working on this item labels Apr 21, 2026

Labels

breeze:done Breeze finished handling this item

1 participant