Conversation
Signed-off-by: Tadas Sutkaitis <tadas.sutkaitis@vexxhost.com>
Pull request overview
Adds a Heat (OpenStack Orchestration) collector backed by sqlc-generated MySQL queries to export stack status metrics to Prometheus.
Changes:
- Added sqlc configuration plus Heat schema/query definitions and generated Go DB accessors.
- Implemented Heat orchestration and stacks collectors emitting per-stack status, totals, and an up metric.
- Wired the Heat collector into the global registry and CLI/config flags.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| sqlc.yaml | Adds a new sqlc target for Heat (MySQL) to generate Go query code. |
| sql/heat/schema.sql | Defines the Heat stack table schema used by the collector queries. |
| sql/heat/queries.sql | Introduces GetStackMetrics query for collecting stack attributes/status. |
| internal/db/heat/db.go | sqlc-generated DB wrapper for Heat queries. |
| internal/db/heat/models.go | sqlc-generated models for Heat tables. |
| internal/db/heat/queries.sql.go | sqlc-generated implementation for GetStackMetrics. |
| internal/collector/heat/heat.go | Registers Heat collectors with a registry when configured. |
| internal/collector/heat/orchestration.go | Adds orchestration up metric and delegates to stacks collector. |
| internal/collector/heat/stacks.go | Implements stack metrics collection and status mapping. |
| internal/collector/heat/orchestration_test.go | Tests up behavior based on stacks collection success/failure. |
| internal/collector/heat/stacks_test.go | Tests stack metrics output for success/empty/error cases. |
| internal/collector/collector.go | Adds Heat DB URL to config and registers Heat collectors. |
| cmd/openstack-database-exporter/main.go | Adds --heat.database-url flag and plumbs it into config. |
```go
// Initialize status counters
stackStatusCounter := make(map[string]int, len(knownStackStatuses))
for _, status := range knownStackStatuses {
	stackStatusCounter[status] = 0
}
```
Emitting stack_status_counter by iterating over a map produces non-deterministic metric output order, which can make scrapes and unit tests flaky. Prefer emitting counters in a stable order (e.g., iterate over knownStackStatuses and read stackStatusCounter[status]).
internal/collector/heat/stacks.go
Outdated
```go
// Stack status counter metrics
for status, count := range stackStatusCounter {
	ch <- prometheus.MustNewConstMetric(
		stackStatusCounterDesc,
		prometheus.GaugeValue,
		float64(count),
		status,
	)
}
```
Emitting stack_status_counter by iterating over a map produces non-deterministic metric output order, which can make scrapes and unit tests flaky. Prefer emitting counters in a stable order (e.g., iterate over knownStackStatuses and read stackStatusCounter[status]).
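A minimal, self-contained sketch of what the reviewer is suggesting (the `knownStackStatuses` slice is stubbed here with a few illustrative values, not the PR's full list): ranging over the slice instead of the map makes the emission order deterministic.

```go
package main

import "fmt"

// Stubbed stand-in for the PR's knownStackStatuses slice.
var knownStackStatuses = []string{"CREATE_COMPLETE", "CREATE_FAILED", "DELETE_COMPLETE"}

// emitStable renders counters in the fixed slice order rather than
// ranging over the map, so the output order never varies between runs.
func emitStable(counts map[string]int) []string {
	out := make([]string, 0, len(knownStackStatuses))
	for _, status := range knownStackStatuses {
		out = append(out, fmt.Sprintf("%s=%d", status, counts[status]))
	}
	return out
}

func main() {
	counts := map[string]int{"CREATE_FAILED": 2, "CREATE_COMPLETE": 5}
	fmt.Println(emitStable(counts)) // [CREATE_COMPLETE=5 CREATE_FAILED=2 DELETE_COMPLETE=0]
}
```

In the real collector, `emitStable`'s loop body would wrap each value in `prometheus.MustNewConstMetric` as the PR already does; only the iteration source changes.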
internal/collector/heat/stacks.go
Outdated
```go
// Count status occurrences
stackStatusCounter[stack.Status]++
```
Incrementing stackStatusCounter[stack.Status] will create new entries for unexpected/empty statuses, which then get exported as additional time series (and also impacts output ordering). If you only want the known statuses, guard the increment by checking membership in knownStackStatuses (or normalize unknowns into a single label such as UNKNOWN).
Suggested change:
```diff
-// Count status occurrences
-stackStatusCounter[stack.Status]++
+// Count status occurrences only for known statuses
+if _, ok := stackStatusCounter[stack.Status]; ok {
+	stackStatusCounter[stack.Status]++
+}
```
```go
func mapStackStatusValue(status string) int {
	for idx, s := range knownStackStatuses {
		if status == s {
			return idx
		}
	}
	return -1
}
```
mapStackStatusValue performs a linear scan of knownStackStatuses for every stack. For large numbers of stacks this becomes unnecessary overhead; consider building a map[string]int (status -> value) once (e.g., in init() or as a package-level var) and doing O(1) lookups.
```go
func (c *StacksCollector) Collect(ch chan<- prometheus.Metric) error {
	ctx := context.Background()

	stacks, err := c.queries.GetStackMetrics(ctx)
```
Using context.Background() means the DB query cannot be cancelled and has no upper bound on runtime during a scrape. Consider using context.WithTimeout (with a reasonable scrape-aligned timeout) so a slow/bad DB doesn’t hang collection.
Signed-off-by: Tadas Sutkaitis <tadas.sutkaitis@vexxhost.com>
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.
```go
stackStatusDesc = prometheus.NewDesc(
	prometheus.BuildFQName(Namespace, Subsystem, "stack_status"),
	"stack_status",
	[]string{
		"id",
		"name",
		"project_id",
		"status",
		"action",
	},
	nil,
)

stackStatusCounterDesc = prometheus.NewDesc(
	prometheus.BuildFQName(Namespace, Subsystem, "stack_status_counter"),
	"stack_status_counter",
	[]string{
		"status",
	},
	nil,
)
```
The metric help strings are identical to the metric names, which makes them much less useful in Prometheus/UIs. Also, stack_status_counter is emitted as a gauge (counts per scrape) but its name suggests a Prometheus counter; consider renaming to something like stack_status_count (or similar) and updating the help text to describe what the metric represents (e.g., “Current number of stacks in each status”).
Skip - original exporter is using stack_status_counter
internal/collector/heat/stacks.go
Outdated
```go
stackStatusValueMap = func() map[string]int {
	m := make(map[string]int, len(knownStackStatuses))
	for idx, s := range knownStackStatuses {
		m[s] = idx
	}
	return m
}()
```
stack_status encodes the status as an index based on slice order, which makes the metric values fragile: inserting/reordering statuses will silently change the numeric meaning and break dashboards/alerts. Consider using a stable, explicit mapping (map literal with fixed values), or avoid numeric encoding entirely (e.g., emit value 1 and rely on the status label since it’s already present per stack).
Suggested change:
```diff
-stackStatusValueMap = func() map[string]int {
-	m := make(map[string]int, len(knownStackStatuses))
-	for idx, s := range knownStackStatuses {
-		m[s] = idx
-	}
-	return m
-}()
+// NOTE: Values are explicitly assigned to keep metric encodings stable even
+// if knownStackStatuses is reordered or modified.
+stackStatusValueMap = map[string]int{
+	"INIT_IN_PROGRESS":     0,
+	"INIT_FAILED":          1,
+	"INIT_COMPLETE":        2,
+	"CREATE_IN_PROGRESS":   3,
+	"CREATE_FAILED":        4,
+	"CREATE_COMPLETE":      5,
+	"DELETE_IN_PROGRESS":   6,
+	"DELETE_FAILED":        7,
+	"DELETE_COMPLETE":      8,
+	"UPDATE_IN_PROGRESS":   9,
+	"UPDATE_FAILED":        10,
+	"UPDATE_COMPLETE":      11,
+	"ROLLBACK_IN_PROGRESS": 12,
+	"ROLLBACK_FAILED":      13,
+	"ROLLBACK_COMPLETE":    14,
+	"SUSPEND_IN_PROGRESS":  15,
+	"SUSPEND_FAILED":       16,
+	"SUSPEND_COMPLETE":     17,
+	"RESUME_IN_PROGRESS":   18,
+	"RESUME_FAILED":        19,
+	"RESUME_COMPLETE":      20,
+	"ADOPT_IN_PROGRESS":    21,
+	"ADOPT_FAILED":         22,
+	"ADOPT_COMPLETE":       23,
+	"SNAPSHOT_IN_PROGRESS": 24,
+	"SNAPSHOT_FAILED":      25,
+	"SNAPSHOT_COMPLETE":    26,
+	"CHECK_IN_PROGRESS":    27,
+	"CHECK_FAILED":         28,
+	"CHECK_COMPLETE":       29,
+}
```
```go
// Collect metrics from stacks collector and track errors
if err := c.stacksCollector.Collect(ch); err != nil {
	c.logger.Error("stacks collector failed", "error", err)
```
This logs errors that are already logged inside StacksCollector.Collect (which logs "failed to query stacks" before returning the error), resulting in duplicate error logs for the same failure. Prefer logging in one place (either remove logging inside StacksCollector and let the orchestrator own it, or keep it in StacksCollector and avoid logging again here).
```go
c.logger.Error("stacks collector failed", "error", err)
```
```go
return &OrchestrationCollector{
	logger: logger,
```
StacksCollector enriches its logger with namespace/subsystem/collector fields, but OrchestrationCollector uses the raw logger, so its logs may miss consistent context. Consider applying the same logger.With(...) pattern here (e.g., include namespace/subsystem/collector like "collector", "orchestration"), especially since this component emits the service-level up metric.
Suggested change:
```diff
-return &OrchestrationCollector{
-	logger: logger,
+orchestrationLogger := logger.With(
+	"namespace", Namespace,
+	"subsystem", Subsystem,
+	"collector", "orchestration",
+)
+return &OrchestrationCollector{
+	logger: orchestrationLogger,
```
sql/heat/queries.sql
Outdated
```sql
COALESCE(s.tenant, '') as tenant,
s.created_at,
s.updated_at,
s.deleted_at,
COALESCE(s.nested_depth, 0) as nested_depth,
COALESCE(s.disable_rollback, false) as disable_rollback
```
The collector currently only uses id, name, status, action, and tenant, but the query also selects timestamps, deleted_at, nested_depth, and disable_rollback. Selecting and scanning unused columns increases row size and scan overhead; consider trimming the SELECT list to only what the collector consumes (and regenerate sqlc output accordingly).
Suggested change:
```diff
-COALESCE(s.tenant, '') as tenant,
-s.created_at,
-s.updated_at,
-s.deleted_at,
-COALESCE(s.nested_depth, 0) as nested_depth,
-COALESCE(s.disable_rollback, false) as disable_rollback
+COALESCE(s.tenant, '') as tenant
```
```go
func mapStackStatusValue(status string) int {
	if v, ok := stackStatusValueMap[status]; ok {
		return v
	}
	return -1
}
```
The “unknown status” behavior (returning -1) isn’t covered by the new unit tests. Add a test row with an unexpected status value and assert the exported openstack_orchestration_stack_status sample uses -1 (and that it does not affect stack_status_counter counts).
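A sketch of the assertion the reviewer is asking for, reduced to the mapping function itself (a full exporter test would exercise the collector output instead). The trimmed two-entry map is illustrative, not the PR's full status map.

```go
package main

import "fmt"

// Trimmed stand-in for the PR's explicit status-value map.
var stackStatusValueMap = map[string]int{
	"CREATE_COMPLETE": 5,
	"CREATE_FAILED":   4,
}

func mapStackStatusValue(status string) int {
	if v, ok := stackStatusValueMap[status]; ok {
		return v
	}
	return -1
}

func main() {
	// An unexpected status should encode as -1, leaving known values untouched.
	fmt.Println(mapStackStatusValue("TOTALLY_UNEXPECTED")) // -1
	fmt.Println(mapStackStatusValue("CREATE_COMPLETE"))    // 5
}
```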
Signed-off-by: Tadas Sutkaitis <tadas.sutkaitis@vexxhost.com>
No description provided.