Skip to content

Comments

feat: add ironic metrics#53

Open
Tadas Sutkaitis (fitbeard) wants to merge 6 commits intomainfrom
feat/add_ironic
Open

feat: add ironic metrics#53
Tadas Sutkaitis (fitbeard) wants to merge 6 commits intomainfrom
feat/add_ironic

Conversation

@fitbeard
Copy link
Contributor

No description provided.

Signed-off-by: Tadas Sutkaitis <tadas.sutkaitis@vexxhost.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Prometheus metrics collection for OpenStack Ironic by generating a new sqlc-backed DB access layer and wiring a new Ironic collector into the exporter configuration/CLI.

Changes:

  • Added sqlc configuration + schema/queries for an Ironic MySQL database.
  • Introduced an Ironic “baremetal” umbrella collector and a nodes sub-collector that exports openstack_ironic_node and openstack_ironic_up.
  • Wired Ironic database URL through config and CLI flags; added collector tests with sqlmock.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
sqlc.yaml Adds sqlc generation target for Ironic MySQL schema/queries.
sql/ironic/schema.sql Defines nodes table schema for sqlc type generation.
sql/ironic/queries.sql Adds GetNodeMetrics query used by collectors.
internal/db/ironic/db.go Generated sqlc DBTX + Queries wrapper for Ironic.
internal/db/ironic/models.go Generated sqlc Node model from schema.
internal/db/ironic/queries.sql.go Generated sqlc query + row struct for GetNodeMetrics.
internal/collector/ironic/ironic.go Registers Ironic collectors when DB URL is configured.
internal/collector/ironic/baremetal.go Umbrella collector exporting openstack_ironic_up and delegating to sub-collectors.
internal/collector/ironic/nodes.go Exports per-node “info-style” metric with labels.
internal/collector/ironic/baremetal_test.go Tests “up” and node metrics with sqlmock.
internal/collector/collector.go Adds Ironic to global registry + config struct.
cmd/openstack-database-exporter/main.go Adds --ironic.database-url flag/env and passes into config.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 60 to 63
// Test database connectivity by running a simple query
_, err := c.queries.GetNodeMetrics(ctx)
if err != nil {
c.logger.Error("failed to query Ironic database", "error", err)
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetNodeMetrics appears to fetch (and sort) the full nodes dataset, so using it as the 'up' probe makes each scrape do an expensive query twice (once here, once in NodesCollector). Consider using a lightweight probe query (e.g., SELECT 1 / LIMIT 1) or reusing the nodes query result to both determine up and emit node metrics.

Suggested change
// Test database connectivity by running a simple query
_, err := c.queries.GetNodeMetrics(ctx)
if err != nil {
c.logger.Error("failed to query Ironic database", "error", err)
// Test database connectivity using a lightweight ping
if err := c.db.PingContext(ctx); err != nil {
c.logger.Error("failed to ping Ironic database", "error", err)

Copilot uses AI. Check for mistakes.
Comment on lines 99 to 110
nodeUUID := ""
if node.Uuid.Valid {
nodeUUID = node.Uuid.String
}

// Emit individual node metric
metric, err := prometheus.NewConstMetric(
c.nodeMetric,
prometheus.GaugeValue,
1,
nodeUUID, name, powerState, provisionState, resourceClass, maintenance, consoleEnabled, retired, retiredReason,
)
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If uuid is NULL/empty for more than one row, this will emit multiple samples with identical labelsets (e.g., id=\"\"), which can cause Prometheus client gathering to fail with a duplicate-metrics error. To prevent scrape failures, ensure the selected identifier label is always unique (e.g., filter out rows where uuid IS NULL, skip rows with invalid UUIDs, or use a guaranteed-unique column like the numeric primary key for the exported label).

Copilot uses AI. Check for mistakes.
Comment on lines +36 to +39
[]string{
"id", "name", "power_state", "provision_state",
"resource_class", "maintenance", "console_enabled", "retired", "retired_reason",
},
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

retired_reason is free-form text and can be high-cardinality / unbounded-length, which is strongly discouraged for Prometheus labels and can cause memory blowups and slow scrapes. Consider removing retired_reason as a label (or mapping it to a bounded set), and exposing it via logs or an alternate mechanism; keeping only bounded-state labels (or emitting separate state metrics) will be safer.

Copilot uses AI. Check for mistakes.
Comment on lines 45 to 51
// Describe implements prometheus.Collector
func (c *NodesCollector) Describe(ch chan<- *prometheus.Desc) {
ch <- c.nodeMetric
}

// Collect implements prometheus.Collector
func (c *NodesCollector) Collect(ch chan<- prometheus.Metric) error {
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NodesCollector does not implement prometheus.Collector because Collect returns an error (the interface requires Collect(chan<- Metric) with no return value). Either update the comment to avoid claiming interface conformance, or change the method signature/structure so it actually implements prometheus.Collector.

Copilot uses AI. Check for mistakes.
Comment on lines 13 to 16
type NodesCollector struct {
db *sql.DB
queries *ironicdb.Queries
logger *slog.Logger
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this file, db *sql.DB is stored on the struct but never used. Dropping the field (and only keeping queries) would reduce surface area and avoid confusion about which DB handle is intended to be used.

Copilot uses AI. Check for mistakes.
Comment on lines 14 to 17
type BaremetalCollector struct {
db *sql.DB
queries *ironicdb.Queries
logger *slog.Logger
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this file, db *sql.DB is stored on the struct but not referenced afterwards. Consider removing it and relying on queries / sub-collectors to avoid unused state.

Copilot uses AI. Check for mistakes.
Comment on lines 50 to 55
`protected` TINYINT(1) NOT NULL DEFAULT '0',
`protected_reason` TEXT,
`owner` VARCHAR(255),
`allocation_id` INT,
`description` TEXT,
`retired` TINYINT(1) DEFAULT '0',
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These TINYINT defaults are quoted as strings. MySQL will usually coerce them, but using numeric literals (DEFAULT 0) is clearer and avoids implicit type conversion.

Suggested change
`protected` TINYINT(1) NOT NULL DEFAULT '0',
`protected_reason` TEXT,
`owner` VARCHAR(255),
`allocation_id` INT,
`description` TEXT,
`retired` TINYINT(1) DEFAULT '0',
`protected` TINYINT(1) NOT NULL DEFAULT 0,
`protected_reason` TEXT,
`owner` VARCHAR(255),
`allocation_id` INT,
`description` TEXT,
`retired` TINYINT(1) DEFAULT 0,

Copilot uses AI. Check for mistakes.
`owner` VARCHAR(255),
`allocation_id` INT,
`description` TEXT,
`retired` TINYINT(1) DEFAULT '0',
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These TINYINT defaults are quoted as strings. MySQL will usually coerce them, but using numeric literals (DEFAULT 0) is clearer and avoids implicit type conversion.

Suggested change
`retired` TINYINT(1) DEFAULT '0',
`retired` TINYINT(1) DEFAULT 0,

Copilot uses AI. Check for mistakes.
Signed-off-by: Tadas Sutkaitis <tadas.sutkaitis@vexxhost.com>
Signed-off-by: Tadas Sutkaitis <tadas.sutkaitis@vexxhost.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

retired,
COALESCE(retired_reason, '') as retired_reason
FROM nodes
WHERE provision_state != 'deleted'
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

provision_state is nullable in the schema, and in MySQL provision_state != 'deleted' will evaluate to NULL (and be filtered out) when provision_state is NULL. If NULL should be treated as “not deleted”, rewrite the predicate (e.g., provision_state IS NULL OR provision_state != 'deleted' or COALESCE(provision_state, '') != 'deleted') so nodes with NULL state aren’t silently dropped.

Suggested change
WHERE provision_state != 'deleted'
WHERE provision_state IS NULL OR provision_state != 'deleted'

Copilot uses AI. Check for mistakes.
Comment on lines 13 to 14
WHERE provision_state != 'deleted'
ORDER BY created_at;
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metrics query orders by created_at, but the collector doesn’t appear to rely on deterministic ordering. This can force an unnecessary filesort on every scrape (and there’s no created_at index in the provided schema). Consider dropping the ORDER BY for scrape efficiency, or add an index if ordering is required for a specific reason.

Suggested change
WHERE provision_state != 'deleted'
ORDER BY created_at;
WHERE provision_state != 'deleted';

Copilot uses AI. Check for mistakes.
prometheus.BuildFQName(Namespace, Subsystem, "node"),
"Ironic node status",
[]string{
"id", "name", "power_state", "provision_state",
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The label key id is populated with uuid (see emission below). Since this is a new metric, consider renaming the label to uuid to avoid confusion for metric consumers (and update the test expectations accordingly).

Suggested change
"id", "name", "power_state", "provision_state",
"uuid", "name", "power_state", "provision_state",

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skip this — the original exporter uses id, and we need drop-in compatibility

Comment on lines 55 to 71
// Describe implements prometheus.Collector
func (c *NodesCollector) Describe(ch chan<- *prometheus.Desc) {
ch <- c.nodeMetric
}

// Collect queries the database and emits node metrics.
func (c *NodesCollector) Collect(ch chan<- prometheus.Metric) error {
ctx := context.Background()

nodes, err := c.queries.GetNodeMetrics(ctx)
if err != nil {
c.logger.Error("failed to get node metrics", "error", err)
return err
}

return c.CollectFromRows(ch, nodes)
}
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NodesCollector is documented as implementing prometheus.Collector, but its Collect method signature returns error, so it does not satisfy the interface (Collect(chan<- Metric) has no return). To reduce confusion, either (a) change it to match the Prometheus interface and handle errors internally, or (b) rename this method (e.g., CollectOnce/CollectMetrics) and adjust the comment so it’s clear it’s a helper used by the umbrella collector.

Copilot uses AI. Check for mistakes.
Comment on lines +112 to +117
retiredReason := truncateLabel(node.RetiredReason)

name := ""
if node.Name.Valid {
name = node.Name.String
}
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only retired_reason is truncated, but name (and potentially other free-form string labels like resource_class) can also be arbitrarily long. To enforce the stated maxLabelLength policy consistently, consider applying truncateLabel to other label values that come directly from the DB.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mohammed Naser (@mnaser) I think we can skip this:)

for _, node := range nodes {
// Skip nodes with empty UUID to avoid duplicate label sets
if !node.Uuid.Valid || node.Uuid.String == "" {
c.logger.Warn("skipping node with empty UUID")
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This warning can spam logs on every scrape if the DB contains such rows. Consider lowering the level (e.g., Debug) and/or include an identifying field that is safe to log (e.g., truncated name) to aid troubleshooting without repeated noisy warnings.

Suggested change
c.logger.Warn("skipping node with empty UUID")
nodeName := ""
if node.Name.Valid {
nodeName = truncateLabel(node.Name.String)
}
c.logger.Debug("skipping node with empty UUID", "name", nodeName)

Copilot uses AI. Check for mistakes.
Signed-off-by: Tadas Sutkaitis <tadas.sutkaitis@vexxhost.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant