Conversation
Signed-off-by: Tadas Sutkaitis <tadas.sutkaitis@vexxhost.com>
There was a problem hiding this comment.
Pull request overview
Adds Prometheus metrics collection for OpenStack Ironic by generating a new sqlc-backed DB access layer and wiring a new Ironic collector into the exporter configuration/CLI.
Changes:
- Added sqlc configuration + schema/queries for an Ironic MySQL database.
- Introduced an Ironic “baremetal” umbrella collector and a nodes sub-collector that exports
openstack_ironic_nodeandopenstack_ironic_up. - Wired Ironic database URL through config and CLI flags; added collector tests with sqlmock.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| sqlc.yaml | Adds sqlc generation target for Ironic MySQL schema/queries. |
| sql/ironic/schema.sql | Defines nodes table schema for sqlc type generation. |
| sql/ironic/queries.sql | Adds GetNodeMetrics query used by collectors. |
| internal/db/ironic/db.go | Generated sqlc DBTX + Queries wrapper for Ironic. |
| internal/db/ironic/models.go | Generated sqlc Node model from schema. |
| internal/db/ironic/queries.sql.go | Generated sqlc query + row struct for GetNodeMetrics. |
| internal/collector/ironic/ironic.go | Registers Ironic collectors when DB URL is configured. |
| internal/collector/ironic/baremetal.go | Umbrella collector exporting openstack_ironic_up and delegating to sub-collectors. |
| internal/collector/ironic/nodes.go | Exports per-node “info-style” metric with labels. |
| internal/collector/ironic/baremetal_test.go | Tests “up” and node metrics with sqlmock. |
| internal/collector/collector.go | Adds Ironic to global registry + config struct. |
| cmd/openstack-database-exporter/main.go | Adds --ironic.database-url flag/env and passes into config. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Test database connectivity by running a simple query | ||
| _, err := c.queries.GetNodeMetrics(ctx) | ||
| if err != nil { | ||
| c.logger.Error("failed to query Ironic database", "error", err) |
There was a problem hiding this comment.
GetNodeMetrics appears to fetch (and sort) the full nodes dataset, so using it as the 'up' probe makes each scrape do an expensive query twice (once here, once in NodesCollector). Consider using a lightweight probe query (e.g., SELECT 1 / LIMIT 1) or reusing the nodes query result to both determine up and emit node metrics.
| // Test database connectivity by running a simple query | |
| _, err := c.queries.GetNodeMetrics(ctx) | |
| if err != nil { | |
| c.logger.Error("failed to query Ironic database", "error", err) | |
| // Test database connectivity using a lightweight ping | |
| if err := c.db.PingContext(ctx); err != nil { | |
| c.logger.Error("failed to ping Ironic database", "error", err) |
internal/collector/ironic/nodes.go
Outdated
| nodeUUID := "" | ||
| if node.Uuid.Valid { | ||
| nodeUUID = node.Uuid.String | ||
| } | ||
|
|
||
| // Emit individual node metric | ||
| metric, err := prometheus.NewConstMetric( | ||
| c.nodeMetric, | ||
| prometheus.GaugeValue, | ||
| 1, | ||
| nodeUUID, name, powerState, provisionState, resourceClass, maintenance, consoleEnabled, retired, retiredReason, | ||
| ) |
There was a problem hiding this comment.
If uuid is NULL/empty for more than one row, this will emit multiple samples with identical labelsets (e.g., id=\"\"), which can cause Prometheus client gathering to fail with a duplicate-metrics error. To prevent scrape failures, ensure the selected identifier label is always unique (e.g., filter out rows where uuid IS NULL, skip rows with invalid UUIDs, or use a guaranteed-unique column like the numeric primary key for the exported label).
| []string{ | ||
| "id", "name", "power_state", "provision_state", | ||
| "resource_class", "maintenance", "console_enabled", "retired", "retired_reason", | ||
| }, |
There was a problem hiding this comment.
retired_reason is free-form text and can be high-cardinality / unbounded-length, which is strongly discouraged for Prometheus labels and can cause memory blowups and slow scrapes. Consider removing retired_reason as a label (or mapping it to a bounded set), and exposing it via logs or an alternate mechanism; keeping only bounded-state labels (or emitting separate state metrics) will be safer.
| // Describe implements prometheus.Collector | ||
| func (c *NodesCollector) Describe(ch chan<- *prometheus.Desc) { | ||
| ch <- c.nodeMetric | ||
| } | ||
|
|
||
| // Collect implements prometheus.Collector | ||
| func (c *NodesCollector) Collect(ch chan<- prometheus.Metric) error { |
There was a problem hiding this comment.
NodesCollector does not implement prometheus.Collector because Collect returns an error (the interface requires Collect(chan<- Metric) with no return value). Either update the comment to avoid claiming interface conformance, or change the method signature/structure so it actually implements prometheus.Collector.
| type NodesCollector struct { | ||
| db *sql.DB | ||
| queries *ironicdb.Queries | ||
| logger *slog.Logger |
There was a problem hiding this comment.
In this file, db *sql.DB is stored on the struct but never used. Dropping the field (and only keeping queries) would reduce surface area and avoid confusion about which DB handle is intended to be used.
| type BaremetalCollector struct { | ||
| db *sql.DB | ||
| queries *ironicdb.Queries | ||
| logger *slog.Logger |
There was a problem hiding this comment.
In this file, db *sql.DB is stored on the struct but not referenced afterwards. Consider removing it and relying on queries / sub-collectors to avoid unused state.
sql/ironic/schema.sql
Outdated
| `protected` TINYINT(1) NOT NULL DEFAULT '0', | ||
| `protected_reason` TEXT, | ||
| `owner` VARCHAR(255), | ||
| `allocation_id` INT, | ||
| `description` TEXT, | ||
| `retired` TINYINT(1) DEFAULT '0', |
There was a problem hiding this comment.
These TINYINT defaults are quoted as strings. MySQL will usually coerce them, but using numeric literals (DEFAULT 0) is clearer and avoids implicit type conversion.
| `protected` TINYINT(1) NOT NULL DEFAULT '0', | |
| `protected_reason` TEXT, | |
| `owner` VARCHAR(255), | |
| `allocation_id` INT, | |
| `description` TEXT, | |
| `retired` TINYINT(1) DEFAULT '0', | |
| `protected` TINYINT(1) NOT NULL DEFAULT 0, | |
| `protected_reason` TEXT, | |
| `owner` VARCHAR(255), | |
| `allocation_id` INT, | |
| `description` TEXT, | |
| `retired` TINYINT(1) DEFAULT 0, |
sql/ironic/schema.sql
Outdated
| `owner` VARCHAR(255), | ||
| `allocation_id` INT, | ||
| `description` TEXT, | ||
| `retired` TINYINT(1) DEFAULT '0', |
There was a problem hiding this comment.
These TINYINT defaults are quoted as strings. MySQL will usually coerce them, but using numeric literals (DEFAULT 0) is clearer and avoids implicit type conversion.
| `retired` TINYINT(1) DEFAULT '0', | |
| `retired` TINYINT(1) DEFAULT 0, |
Signed-off-by: Tadas Sutkaitis <tadas.sutkaitis@vexxhost.com>
Signed-off-by: Tadas Sutkaitis <tadas.sutkaitis@vexxhost.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
sql/ironic/queries.sql
Outdated
| retired, | ||
| COALESCE(retired_reason, '') as retired_reason | ||
| FROM nodes | ||
| WHERE provision_state != 'deleted' |
There was a problem hiding this comment.
provision_state is nullable in the schema, and in MySQL provision_state != 'deleted' will evaluate to NULL (and be filtered out) when provision_state is NULL. If NULL should be treated as “not deleted”, rewrite the predicate (e.g., provision_state IS NULL OR provision_state != 'deleted' or COALESCE(provision_state, '') != 'deleted') so nodes with NULL state aren’t silently dropped.
| WHERE provision_state != 'deleted' | |
| WHERE provision_state IS NULL OR provision_state != 'deleted' |
sql/ironic/queries.sql
Outdated
| WHERE provision_state != 'deleted' | ||
| ORDER BY created_at; |
There was a problem hiding this comment.
The metrics query orders by created_at, but the collector doesn’t appear to rely on deterministic ordering. This can force an unnecessary filesort on every scrape (and there’s no created_at index in the provided schema). Consider dropping the ORDER BY for scrape efficiency, or add an index if ordering is required for a specific reason.
| WHERE provision_state != 'deleted' | |
| ORDER BY created_at; | |
| WHERE provision_state != 'deleted'; |
| prometheus.BuildFQName(Namespace, Subsystem, "node"), | ||
| "Ironic node status", | ||
| []string{ | ||
| "id", "name", "power_state", "provision_state", |
There was a problem hiding this comment.
The label key id is populated with uuid (see emission below). Since this is a new metric, consider renaming the label to uuid to avoid confusion for metric consumers (and update the test expectations accordingly).
| "id", "name", "power_state", "provision_state", | |
| "uuid", "name", "power_state", "provision_state", |
There was a problem hiding this comment.
Skip this — the original exporter uses id, and we need drop-in compatibility
| // Describe implements prometheus.Collector | ||
| func (c *NodesCollector) Describe(ch chan<- *prometheus.Desc) { | ||
| ch <- c.nodeMetric | ||
| } | ||
|
|
||
| // Collect queries the database and emits node metrics. | ||
| func (c *NodesCollector) Collect(ch chan<- prometheus.Metric) error { | ||
| ctx := context.Background() | ||
|
|
||
| nodes, err := c.queries.GetNodeMetrics(ctx) | ||
| if err != nil { | ||
| c.logger.Error("failed to get node metrics", "error", err) | ||
| return err | ||
| } | ||
|
|
||
| return c.CollectFromRows(ch, nodes) | ||
| } |
There was a problem hiding this comment.
NodesCollector is documented as implementing prometheus.Collector, but its Collect method signature returns error, so it does not satisfy the interface (Collect(chan<- Metric) has no return). To reduce confusion, either (a) change it to match the Prometheus interface and handle errors internally, or (b) rename this method (e.g., CollectOnce/CollectMetrics) and adjust the comment so it’s clear it’s a helper used by the umbrella collector.
| retiredReason := truncateLabel(node.RetiredReason) | ||
|
|
||
| name := "" | ||
| if node.Name.Valid { | ||
| name = node.Name.String | ||
| } |
There was a problem hiding this comment.
Only retired_reason is truncated, but name (and potentially other free-form string labels like resource_class) can also be arbitrarily long. To enforce the stated maxLabelLength policy consistently, consider applying truncateLabel to other label values that come directly from the DB.
There was a problem hiding this comment.
Mohammed Naser (@mnaser) I think we can skip this:)
internal/collector/ironic/nodes.go
Outdated
| for _, node := range nodes { | ||
| // Skip nodes with empty UUID to avoid duplicate label sets | ||
| if !node.Uuid.Valid || node.Uuid.String == "" { | ||
| c.logger.Warn("skipping node with empty UUID") |
There was a problem hiding this comment.
This warning can spam logs on every scrape if the DB contains such rows. Consider lowering the level (e.g., Debug) and/or include an identifying field that is safe to log (e.g., truncated name) to aid troubleshooting without repeated noisy warnings.
| c.logger.Warn("skipping node with empty UUID") | |
| nodeName := "" | |
| if node.Name.Valid { | |
| nodeName = truncateLabel(node.Name.String) | |
| } | |
| c.logger.Debug("skipping node with empty UUID", "name", nodeName) |
Signed-off-by: Tadas Sutkaitis <tadas.sutkaitis@vexxhost.com>
No description provided.