Add query performance monitoring by collecting aggregate stats from pg_stat_statements and sending them as a new postgresql.statements metric type to the selfhost control plane. This enables users to identify slow queries, track execution trends, and set alerts on query performance thresholds.
Problem
Currently, hostlink collects system-level PostgreSQL metrics (connections, cache hit ratio, TPS, replication lag) but has no visibility into which queries are slow or resource-intensive. Users cannot answer:
Which queries take the most total time?
Which queries have the highest average execution time?
Which queries are called most frequently and might benefit from optimization?
Approach
Inspired by New Relic's query performance monitoring: use pg_stat_statements aggregate data per query fingerprint, delta-based collection (per-interval, not cumulative), top 20 queries per tick.
Add metric type constant: MetricTypePostgreSQLStatements = "postgresql.statements"
Collector — update interface and implementation
Extend pgmetrics.Collector interface in internal/pgmetrics/collector.go with a new method: CollectStatements(credential.Credential) (metrics.PostgreSQLStatementMetrics, error) (avoids breaking the existing Collect signature)
Add internal state for statement delta tracking to pgmetrics struct: lastStatements map[string]pgStatementStats and lastStatementsTime time.Time
Implement collectStatementMetrics():
```sql
SELECT queryid::text, query, calls,
       total_exec_time, min_exec_time, max_exec_time, mean_exec_time,
       rows, shared_blks_hit, shared_blks_read, temp_blks_written, wal_bytes
FROM pg_stat_statements
WHERE dbid = (SELECT oid FROM pg_database WHERE datname = current_database())
ORDER BY total_exec_time DESC
LIMIT 20;
```
Delta-based calculation: store previous snapshot keyed by queryid, compute per-interval deltas for calls, total_exec_time, recompute mean_exec_time from deltas
First collection returns empty Queries slice — establishes baseline
Gracefully skip if pg_stat_statements extension is not available (check error code, log warning, return empty metrics — do NOT fail the push)
Pusher — wire into metricspusher
Update NewWithDependencies() in app/services/metrics/metrics.go to accept the extended pgmetrics.Collector (already the dependency — just needs to call the new method)
In Push(), call mp.metricscollector.CollectStatements(cred) after existing PG collection
Append MetricSet{Type: MetricTypePostgreSQLStatements, Metrics: stmtMetrics} to metricSets
Include even when Queries is empty (signals extension available but no activity)
Tests
Unit tests for delta calculation: first collection → empty, subsequent → correct deltas, query appearing/disappearing between collections, stats reset
Unit test for graceful handling when extension is missing (mock returns error, verify empty result returned without propagation)
Integration test with PostgreSQL container (requires shared_preload_libraries = 'pg_stat_statements') — verify actual queries appear in results
Selfhost (separate PR)
Ingestion — heartbeats_controller#create
Add branch inside the metric_sets.each loop: when metric_set["type"] == "postgresql.statements", iterate over the metric_set["metrics"]["queries"] array and create AgentQuerySample records instead of AgentHeartbeat
Continue creating AgentHeartbeat for all other metric types (no change to existing flow)
ActionCable broadcast: include postgresql.statements metric set in the broadcast payload (same as other types — no special handling needed)
API — query performance endpoint
New route: GET /organizations/:id/query_performance?agent_pid=...&start_date=...&end_date=...&sort_by=total_exec_time|mean_exec_time|calls&limit=20
New action in organizations_controller (follows pattern of instance_metrics action at line 155)
Alerting
Add when "postgresql.statements" branch in EvaluationService#query_heartbeat (app/services/alerting/evaluation_service.rb:107)
Add query_statement_metrics(instance) method: query latest AgentQuerySample per query_id, extract the alert's metric_name from the sample with highest mean_exec_time_ms (alert on worst query, not average across all)
Health check — optional
In InstanceHealthCheckJob, optionally check if postgresql.statements data is flowing (staleness check like other metric types)
Hostlink (this repo)
Domain — new metric type
MetricTypePostgreSQLStatements = "postgresql.statements" constant in domain/metrics/metrics.go alongside the existing constants (MetricTypeSystem, MetricTypeNetwork, MetricTypePostgreSQLDatabase, MetricTypeStorage)
QuerySample struct in domain/metrics/metrics.go
PostgreSQLStatementMetrics wrapper struct
Selfhost (separate PR)
Database — new table
agent_query_samples table with indexes on (agent_pid, timestamp) and (agent_pid, query_id, timestamp); foreign key agent_pid → agents.pid (follows the same pattern as agent_heartbeats)
Model
AgentQuerySample model in app/models/agent_query_sample.rb; add has_many :agent_query_samples to the Agent model
API — query details
AgentQuerySample.where(agent_pid:).where(timestamp: range).order(sort_by => :desc).limit(limit); optionally filter by query_id for a time-series view of individual query trends
Alerting details
Extend AlertRule::VALID_METRICS in app/models/alert_rule.rb with the new statement metric names
Architecture
Wire changes summary
domain/metrics/metrics.go — MetricTypePostgreSQLStatements constant, QuerySample struct, PostgreSQLStatementMetrics struct
internal/pgmetrics/collector.go — add CollectStatements() to the Collector interface, implement with delta tracking
app/services/metrics/metrics.go — call CollectStatements() in Push(), append the new MetricSet
app/jobs/metricsjob/metricsjob.go — no changes (the job calls mp.Push(cred), which handles everything)
Deferred (v2+)
Per-query I/O timing metrics (pg_stat_statements' block read/write timings require track_io_timing)