Skip to content

Comments

feat: add heat metrics#52

Open
Tadas Sutkaitis (fitbeard) wants to merge 7 commits intomainfrom
feat/add_heat
Open

feat: add heat metrics#52
Tadas Sutkaitis (fitbeard) wants to merge 7 commits intomainfrom
feat/add_heat

Conversation

@fitbeard
Copy link
Contributor

No description provided.

Signed-off-by: Tadas Sutkaitis <tadas.sutkaitis@vexxhost.com>
Signed-off-by: Tadas Sutkaitis <tadas.sutkaitis@vexxhost.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a Heat (OpenStack Orchestration) collector backed by sqlc-generated MySQL queries to export stack status metrics to Prometheus.

Changes:

  • Added sqlc configuration plus Heat schema/query definitions and generated Go DB accessors.
  • Implemented Heat orchestration and stacks collectors emitting per-stack status, totals, and an up metric.
  • Wired Heat collector into the global registry and CLI/config flags.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
sqlc.yaml Adds a new sqlc target for Heat (MySQL) to generate Go query code.
sql/heat/schema.sql Defines the Heat stack table schema used by the collector queries.
sql/heat/queries.sql Introduces GetStackMetrics query for collecting stack attributes/status.
internal/db/heat/db.go sqlc-generated DB wrapper for Heat queries.
internal/db/heat/models.go sqlc-generated models for Heat tables.
internal/db/heat/queries.sql.go sqlc-generated implementation for GetStackMetrics.
internal/collector/heat/heat.go Registers Heat collectors with a registry when configured.
internal/collector/heat/orchestration.go Adds orchestration up metric and delegates to stacks collector.
internal/collector/heat/stacks.go Implements stack metrics collection and status mapping.
internal/collector/heat/orchestration_test.go Tests up behavior based on stacks collection success/failure.
internal/collector/heat/stacks_test.go Tests stack metrics output for success/empty/error cases.
internal/collector/collector.go Adds Heat DB URL to config and registers Heat collectors.
cmd/openstack-database-exporter/main.go Adds --heat.database-url flag and plumbs it into config.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +110 to +114
// Initialize status counters
stackStatusCounter := make(map[string]int, len(knownStackStatuses))
for _, status := range knownStackStatuses {
stackStatusCounter[status] = 0
}
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emitting stack_status_counter by iterating over a map produces non-deterministic metric output order, which can make scrapes and unit tests flaky. Prefer emitting counters in a stable order (e.g., iterate over knownStackStatuses and read stackStatusCounter[status]).

Copilot uses AI. Check for mistakes.
Comment on lines 142 to 150
// Stack status counter metrics
for status, count := range stackStatusCounter {
ch <- prometheus.MustNewConstMetric(
stackStatusCounterDesc,
prometheus.GaugeValue,
float64(count),
status,
)
}
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emitting stack_status_counter by iterating over a map produces non-deterministic metric output order, which can make scrapes and unit tests flaky. Prefer emitting counters in a stable order (e.g., iterate over knownStackStatuses and read stackStatusCounter[status]).

Copilot uses AI. Check for mistakes.
Comment on lines 125 to 126
// Count status occurrences
stackStatusCounter[stack.Status]++
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incrementing stackStatusCounter[stack.Status] will create new entries for unexpected/empty statuses, which then get exported as additional time series (and also impacts output ordering). If you only want the known statuses, guard the increment by checking membership in knownStackStatuses (or normalize unknowns into a single label such as UNKNOWN).

Suggested change
// Count status occurrences
stackStatusCounter[stack.Status]++
// Count status occurrences only for known statuses
if _, ok := stackStatusCounter[stack.Status]; ok {
stackStatusCounter[stack.Status]++
}

Copilot uses AI. Check for mistakes.
Comment on lines 155 to 162
func mapStackStatusValue(status string) int {
for idx, s := range knownStackStatuses {
if status == s {
return idx
}
}
return -1
}
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mapStackStatusValue performs a linear scan of knownStackStatuses for every stack. For large numbers of stacks this becomes unnecessary overhead; consider building a map[string]int (status -> value) once (e.g., in init() or as a package-level var) and doing O(1) lookups.

Copilot uses AI. Check for mistakes.
Comment on lines 101 to 104
func (c *StacksCollector) Collect(ch chan<- prometheus.Metric) error {
ctx := context.Background()

stacks, err := c.queries.GetStackMetrics(ctx)
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using context.Background() means the DB query cannot be cancelled and has no upper bound on runtime during a scrape. Consider using context.WithTimeout (with a reasonable scrape-aligned timeout) so a slow/bad DB doesn’t hang collection.

Copilot uses AI. Check for mistakes.
Signed-off-by: Tadas Sutkaitis <tadas.sutkaitis@vexxhost.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +57 to +77
stackStatusDesc = prometheus.NewDesc(
prometheus.BuildFQName(Namespace, Subsystem, "stack_status"),
"stack_status",
[]string{
"id",
"name",
"project_id",
"status",
"action",
},
nil,
)

stackStatusCounterDesc = prometheus.NewDesc(
prometheus.BuildFQName(Namespace, Subsystem, "stack_status_counter"),
"stack_status_counter",
[]string{
"status",
},
nil,
)
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metric help strings are identical to the metric names, which makes them much less useful in Prometheus/UIs. Also, stack_status_counter is emitted as a gauge (counts per scrape) but its name suggests a Prometheus counter; consider renaming to something like stack_status_count (or similar) and updating the help text to describe what the metric represents (e.g., “Current number of stacks in each status”).

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skip - original exporter is using stack_status_counter

Comment on lines 49 to 55
stackStatusValueMap = func() map[string]int {
m := make(map[string]int, len(knownStackStatuses))
for idx, s := range knownStackStatuses {
m[s] = idx
}
return m
}()
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stack_status encodes the status as an index based on slice order, which makes the metric values fragile: inserting/reordering statuses will silently change the numeric meaning and break dashboards/alerts. Consider using a stable, explicit mapping (map literal with fixed values), or avoid numeric encoding entirely (e.g., emit value 1 and rely on the status label since it’s already present per stack).

Suggested change
stackStatusValueMap = func() map[string]int {
m := make(map[string]int, len(knownStackStatuses))
for idx, s := range knownStackStatuses {
m[s] = idx
}
return m
}()
// NOTE: Values are explicitly assigned to keep metric encodings stable even
// if knownStackStatuses is reordered or modified.
stackStatusValueMap = map[string]int{
"INIT_IN_PROGRESS": 0,
"INIT_FAILED": 1,
"INIT_COMPLETE": 2,
"CREATE_IN_PROGRESS": 3,
"CREATE_FAILED": 4,
"CREATE_COMPLETE": 5,
"DELETE_IN_PROGRESS": 6,
"DELETE_FAILED": 7,
"DELETE_COMPLETE": 8,
"UPDATE_IN_PROGRESS": 9,
"UPDATE_FAILED": 10,
"UPDATE_COMPLETE": 11,
"ROLLBACK_IN_PROGRESS": 12,
"ROLLBACK_FAILED": 13,
"ROLLBACK_COMPLETE": 14,
"SUSPEND_IN_PROGRESS": 15,
"SUSPEND_FAILED": 16,
"SUSPEND_COMPLETE": 17,
"RESUME_IN_PROGRESS": 18,
"RESUME_FAILED": 19,
"RESUME_COMPLETE": 20,
"ADOPT_IN_PROGRESS": 21,
"ADOPT_FAILED": 22,
"ADOPT_COMPLETE": 23,
"SNAPSHOT_IN_PROGRESS": 24,
"SNAPSHOT_FAILED": 25,
"SNAPSHOT_COMPLETE": 26,
"CHECK_IN_PROGRESS": 27,
"CHECK_FAILED": 28,
"CHECK_COMPLETE": 29,
}

Copilot uses AI. Check for mistakes.

// Collect metrics from stacks collector and track errors
if err := c.stacksCollector.Collect(ch); err != nil {
c.logger.Error("stacks collector failed", "error", err)
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logs errors that are already logged inside StacksCollector.Collect (which logs "failed to query stacks" before returning the error), resulting in duplicate error logs for the same failure. Prefer logging in one place (either remove logging inside StacksCollector and let the orchestrator own it, or keep it in StacksCollector and avoid logging again here).

Suggested change
c.logger.Error("stacks collector failed", "error", err)

Copilot uses AI. Check for mistakes.
Comment on lines 25 to 26
return &OrchestrationCollector{
logger: logger,
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StacksCollector enriches its logger with namespace/subsystem/collector fields, but OrchestrationCollector uses the raw logger, so its logs may miss consistent context. Consider applying the same logger.With(...) pattern here (e.g., include namespace/subsystem/collector like "collector", "orchestration"), especially since this component emits the service-level up metric.

Suggested change
return &OrchestrationCollector{
logger: logger,
orchestrationLogger := logger.With(
"namespace", Namespace,
"subsystem", Subsystem,
"collector", "orchestration",
)
return &OrchestrationCollector{
logger: orchestrationLogger,

Copilot uses AI. Check for mistakes.
Comment on lines 7 to 12
COALESCE(s.tenant, '') as tenant,
s.created_at,
s.updated_at,
s.deleted_at,
COALESCE(s.nested_depth, 0) as nested_depth,
COALESCE(s.disable_rollback, false) as disable_rollback
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The collector currently only uses id, name, status, action, and tenant, but the query also selects timestamps, deleted_at, nested_depth, and disable_rollback. Selecting and scanning unused columns increases row size and scan overhead; consider trimming the SELECT list to only what the collector consumes (and regenerate sqlc output accordingly).

Suggested change
COALESCE(s.tenant, '') as tenant,
s.created_at,
s.updated_at,
s.deleted_at,
COALESCE(s.nested_depth, 0) as nested_depth,
COALESCE(s.disable_rollback, false) as disable_rollback
COALESCE(s.tenant, '') as tenant

Copilot uses AI. Check for mistakes.
Comment on lines +166 to +171
func mapStackStatusValue(status string) int {
if v, ok := stackStatusValueMap[status]; ok {
return v
}
return -1
}
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The “unknown status” behavior (returning -1) isn’t covered by the new unit tests. Add a test row with an unexpected status value and assert the exported openstack_orchestration_stack_status sample uses -1 (and that it does not affect stack_status_counter counts).

Copilot uses AI. Check for mistakes.
Signed-off-by: Tadas Sutkaitis <tadas.sutkaitis@vexxhost.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant