Skip to content

fix: enhance database session initialization with configurable pool p…#277

Open
gdm-allo wants to merge 1 commit intoeneo-ai:developfrom
gdm-allo:fix/static-pool-size
Open

fix: enhance database session initialization with configurable pool p…#277
gdm-allo wants to merge 1 commit intoeneo-ai:developfrom
gdm-allo:fix/static-pool-size

Conversation

@gdm-allo
Copy link
Copy Markdown
Contributor

Make database pool settings configurable instead of hardcoded values.

@CCimen
Copy link
Copy Markdown
Contributor

CCimen commented Mar 23, 2026

Thank you for your contribution and for picking this up!

Before we go deeper on the implementation, I'd like to understand the motivation: what problem are you trying to solve by making these configurable? Are you hitting a specific issue, or did you notice the env vars in the template weren't wired and assumed that was a bug?

The pool settings were defined in config and documented in the env template but intentionally never wired to the engine. The reason is that changing pool size has significant operational implications, particularly for the worker. Non-crawl jobs (SharePoint sync, app runs, uploads, audit exports) hold a DB session for their entire duration, and the pool is shared across web (3 Gunicorn workers by default) and the ARQ worker. Setting the pool too low without adjusting WORKER_MAX_JOBS causes pool exhaustion, which cascades: the watchdog can't reconcile zombie Redis counters, orphaned crawl cleanup can't run, and stale jobs block new crawls via the duplicate guard. We've seen this exact cascade in production at Sundsvall (Dec 2025), where it blocked tenants for hours.

Depending on the answer, there are two directions here:

If there's a real need for configurability, this PR needs a few additions: startup validation that warns when worker_max_jobs > pool_size + max_overflow, documentation of the risks, and keeping pool_pre_ping default as False to match current behavior (the current engine doesn't set it, and enabling it adds a SELECT 1 roundtrip on every connection checkout, which is a behavioral change that deserves its own discussion).

If there's no specific use case, the cleaner path might be to remove the unused config from config.py and env_backend.template rather than wiring it up. Less configuration surface means less risk of misconfiguration.

A couple of smaller things regardless: the init() function signature duplicates the defaults from config.py, which creates two sources of truth. And db_pool_debug is defined in config but not wired here. Also, from a semver perspective this would be a feat: (new capability), not a fix:.

Could you also create a GitHub issue and link it to this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants