Add PgBouncer metrics collection to agent #178
Collect connection pool statistics from the PgBouncer admin console (port 6432) using SHOW POOLS and SHOW STATS. Uses a try-connect approach: the collector silently reports Up: false when PgBouncer is not running, so the server can track pooler state without requiring any config flag changes.

- New pgbouncermetrics package: connects via the selfhostadmin credential and aggregates pool/stats counters into a PgBouncerMetrics struct
- domain/metrics: adds the MetricTypePgBouncer constant and the PgBouncerMetrics type
- metrics service: includes the pgbouncer.stats metric set in every push cycle
- metrics_test: adds MockPgBouncerCollector and updates all Push tests for the new always-included metric set

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
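The try-connect behaviour described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the trimmed `PgBouncerMetrics` fields, the `queryFunc` type, and `Collect` are hypothetical stand-ins, and the real collector talks to the admin console instead of calling an injected function.

```go
package main

import (
	"errors"
	"fmt"
)

// PgBouncerMetrics is a trimmed, hypothetical version of the struct named in the PR.
type PgBouncerMetrics struct {
	Up            bool
	ClientsActive int
}

// queryFunc is a hypothetical abstraction for "connect to the admin console
// and run SHOW POOLS"; it returns an error when the pooler is unreachable.
type queryFunc func() (clientsActive int, err error)

// errUnreachable simulates a refused connection on port 6432.
var errUnreachable = errors.New("dial tcp 127.0.0.1:6432: connection refused")

// Collect never fails the push cycle: an unreachable pooler is reported
// as Up: false instead of being returned as an error.
func Collect(q queryFunc) PgBouncerMetrics {
	n, err := q()
	if err != nil {
		return PgBouncerMetrics{Up: false}
	}
	return PgBouncerMetrics{Up: true, ClientsActive: n}
}

func main() {
	down := func() (int, error) { return 0, errUnreachable }
	up := func() (int, error) { return 3, nil }
	fmt.Printf("%+v\n", Collect(down))
	fmt.Printf("%+v\n", Collect(up))
}
```

Because the failure is folded into the metric itself, the caller can include the metric set in every push without checking whether PgBouncer is installed.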
iAziz786
left a comment
Code review findings for PgBouncer metrics collection
…rages

maxwait_us is the sub-second microsecond remainder of the wait, not the full duration. Combine it with maxwait (whole seconds) to get milliseconds: maxwait*1000 + maxwait_us/1000. Previously, maxwait_us alone was used whenever it was nonzero, dropping the whole-seconds component entirely.

Latency averages in collectStats are now weighted by avg_query_count, so high-traffic databases dominate the aggregate instead of every database row contributing equally regardless of query volume.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
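As a worked example of the maxwait formula in this commit message, a minimal sketch might look like the following. The function name `maxWaitMs` and its signature are assumptions for illustration and may not match the helper in the PR.

```go
package main

import "fmt"

// maxWaitMs combines PgBouncer's maxwait (whole seconds) and maxwait_us
// (the sub-second remainder in microseconds) into milliseconds.
// Hypothetical helper; the name is not taken from the PR.
func maxWaitMs(maxwaitSec, maxwaitUs int64) float64 {
	return float64(maxwaitSec)*1000 + float64(maxwaitUs)/1000
}

func main() {
	// 2 s + 500000 µs = 2500 ms. Using maxwait_us alone would report 500 ms.
	fmt.Println(maxWaitMs(2, 500000))
}
```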
iAziz786
left a comment
Second-pass review findings after the follow-up fix
            qps := parseFloat(row["avg_query_count"])
            m.TotalQueriesPerSec += qps
            weightedQueryTime += parseFloat(row["avg_query_time"]) * qps // microseconds · qps
            weightedWaitTime += parseFloat(row["avg_wait_time"]) * qps // microseconds · qps
            totalWeight += qps
        }

        if totalWeight > 0 {
            m.AvgQueryTimeMs = weightedQueryTime / totalWeight / 1000
            m.AvgWaitTimeMs = weightedWaitTime / totalWeight / 1000
Bug: avg_wait_time is being weighted by query rate, but PgBouncer defines it per server assignment
avg_query_time should be weighted by avg_query_count, but avg_wait_time should not. In PgBouncer source (calc_average in src/stats.c), avg_wait_time = delta wait_time / server_assignment_count, not / query_count.
Weighting wait time by avg_query_count overweights databases with many queries per server assignment (session pooling / multi-query transactions). AvgWaitTimeMs should use avg_server_assignment_count as the weight, or be derived from SHOW TOTALS as total_wait_time / total_server_assignment_count.
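One possible shape for the suggested fix, using a separate weight for each average, is sketched below. The `statsRow` type and the `aggregate` function are hypothetical; the sketch assumes the rows have already been parsed from SHOW STATS and that the PgBouncer version in use exposes avg_server_assignment_count.

```go
package main

import "fmt"

// statsRow is a hypothetical, already-parsed SHOW STATS row.
type statsRow struct {
	AvgQueryCount            float64 // avg_query_count
	AvgQueryTimeUs           float64 // avg_query_time, microseconds
	AvgServerAssignmentCount float64 // avg_server_assignment_count
	AvgWaitTimeUs            float64 // avg_wait_time, microseconds
}

// aggregate weights query time by query count and wait time by server
// assignment count, then converts both averages to milliseconds.
// An average stays 0 when its total weight is 0.
func aggregate(rows []statsRow) (queryMs, waitMs float64) {
	var qSum, qWeight, wSum, wWeight float64
	for _, r := range rows {
		qSum += r.AvgQueryTimeUs * r.AvgQueryCount
		qWeight += r.AvgQueryCount
		wSum += r.AvgWaitTimeUs * r.AvgServerAssignmentCount
		wWeight += r.AvgServerAssignmentCount
	}
	if qWeight > 0 {
		queryMs = qSum / qWeight / 1000
	}
	if wWeight > 0 {
		waitMs = wSum / wWeight / 1000
	}
	return queryMs, waitMs
}

func main() {
	rows := []statsRow{
		// Many queries per assignment (e.g. session pooling), short waits.
		{AvgQueryCount: 900, AvgQueryTimeUs: 2000, AvgServerAssignmentCount: 10, AvgWaitTimeUs: 1000},
		// Few queries, many assignments, long waits.
		{AvgQueryCount: 100, AvgQueryTimeUs: 4000, AvgServerAssignmentCount: 90, AvgWaitTimeUs: 9000},
	}
	q, w := aggregate(rows)
	fmt.Printf("query=%.1fms wait=%.1fms\n", q, w)
}
```

With these made-up rows, weighting the wait time by query count instead would let the first database dominate and pull the aggregate wait far below what most server assignments actually experienced.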
PgBouncer emits SHOW STATS/SHOW POOLS NUMERIC columns as []byte via
lib/pq. fmt.Sprintf("%v", []byte) produces "[49 50 46 53]" rather than
"12.5", causing parseFloat/parseInt to return 0. Add a type switch to
call string(v) for []byte values so avg_query_count, avg_query_time,
and avg_wait_time parse correctly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
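The type switch described in this commit message could look roughly like the sketch below. The helper name `cellToString` is an assumption; `parseFloat` mirrors the name used in the diff, but its body here is a guess at a zero-on-error parser.

```go
package main

import (
	"fmt"
	"strconv"
)

// cellToString converts a scanned column value to text.
// []byte is converted with string(v); formatting it with %v would
// print the individual byte values instead.
func cellToString(v interface{}) string {
	switch x := v.(type) {
	case nil:
		return ""
	case []byte:
		return string(x)
	case string:
		return x
	default:
		return fmt.Sprintf("%v", x)
	}
}

// parseFloat returns 0 when the text is not a valid number.
func parseFloat(s string) float64 {
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0
	}
	return f
}

func main() {
	raw := []byte("12.5")
	fmt.Printf("%v\n", raw)                    // prints [49 50 46 53]
	fmt.Println(parseFloat(cellToString(raw))) // prints 12.5
}
```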