Skip to content

feat: add sparrow_instance_info metric for ownership metadata (#354)#391

Open
ayoub3bidi wants to merge 3 commits intotelekom:mainfrom
ayoub3bidi:feat/354-ownership-metadata
Open

feat: add sparrow_instance_info metric for ownership metadata (#354)#391
ayoub3bidi wants to merge 3 commits intotelekom:mainfrom
ayoub3bidi:feat/354-ownership-metadata

Conversation

@ayoub3bidi
Copy link

Motivation

Operators need to identify which team owns each Sparrow instance and route alerts accordingly. This PR exposes optional ownership and platform metadata as a single Prometheus info metric (sparrow_instance_info) so multi-team deployments can correlate metrics and route by team/platform without changing existing check metrics.

Changes

  • Config: New optional metadata block under startup config: metadata.team.name, metadata.team.email, metadata.platform. All fields optional; omitted values appear as empty labels.
  • CLI: Flags --metadataTeamName, --metadataTeamEmail, --metadataPlatform; env vars SPARROW_METADATA_TEAM_NAME, etc.
  • Metric: Single gauge sparrow_instance_info{team_name, team_email, platform, instance_name} with value 1, registered once at startup in sparrow.New(). Registration failure is logged and non-fatal.
  • Helm: chart/values.yaml documents optional metadata under sparrowConfig; backward compatible when omitted.
  • Docs: README (config table, instance info metric, PromQL examples), docs/sparrow_run.md (flags), design doc in docs/ownership-metadata-design.md.

For additional information look at the commits.

Tests done

  • Unit tests in pkg/sparrow/metrics/instance_info_test.go: full metadata, empty metadata, partial metadata, double registration (AlreadyRegisteredError). Existing pkg/sparrow and pkg/config tests pass. Helm template validated with default and metadata values.

  • Unit tests succeeded

  • E2E tests succeeded

TODO

  • I've assigned this PR to myself
  • I've labeled this PR correctly

@lvlcn-t lvlcn-t added feature Introduces a new feature area/config Issues/PRs related to the startup/sparrow config area area/metrics Issues/PRs related to metrics labels Feb 11, 2026
@ayoub3bidi
Copy link
Author

Hi @lvlcn-t, sorry for the late response. I’ve applied the requested changes: switched metadata to a generic key/value map, removed the CLI flags, and updated the README/Helm/docs accordingly (including the header level fix). Thanks for the review.


Just write out the path to the attribute, delimited by `_`.

#### Instance metadata (optional)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add this header to the table of contents above


Sparrow exposes optional instance metadata via a dedicated Prometheus **info metric** (`sparrow_instance_info`), so operators can identify owners, route alerts correctly, and correlate metrics across multiple Sparrow deployments.

## Why Option 1 (Dedicated Info Metric)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This heading seems out of place as there is no option 2 here, so please change it so the document can be read in isolation :D

Comment on lines +64 to +66
func isValidLabelName(name string) bool {
return labelNamePattern.MatchString(name)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove the regex here. you can use the validation functions from prometheus directly:

import "github.com/prometheus/common/model"

func isValidLabelName(name string) bool {
    return model.UTF8Validation.IsValidLabelName(name)
}

then theres probably no need to have the wrapper function either

@niklastreml
Copy link
Member

Thank you for the contributions, once the comments are addressed I think this can be merged @lvlcn-t

Copy link
Collaborator

@lvlcn-t lvlcn-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ayoub3bidi Thanks for fixing my main concern about the metadata object.

I think after you've addressed @niklastreml's and my minor review improvements comments, this will be ready to be merged.

@niklastreml We should probably open an issue for splitting up the README into smaller documentation files in the docs/ dir since it grew quite large now.

README.md Outdated

You can optionally configure instance metadata so operators can identify owners, route alerts, and correlate metrics across deployments. This metadata is exposed as a single Prometheus info-style metric, `sparrow_instance_info`, emitted once per instance at startup.

`metadata` is a map of arbitrary key-value pairs. Keys must be valid Prometheus label names (e.g. `team_name`, `platform`, `region`, `environment`). The key `instance_name` is reserved and set automatically from the Sparrow DNS name.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider rephrasing it like this:

Suggested change
`metadata` is a map of arbitrary key-value pairs. Keys must be valid Prometheus label names (e.g. `team_name`, `platform`, `region`, `environment`). The key `instance_name` is reserved and set automatically from the Sparrow DNS name.
`metadata` is a map of arbitrary key-value pairs. Keys must be valid Prometheus label names (e.g. `team_name`, `platform`, `region`, `environment`). The key `instance_name` is reserved and automatically set to the sparrow's name.

Comment on lines +31 to +53
var found bool
for _, mf := range metrics {
if mf.GetName() == instanceInfoMetricName {
found = true
if len(mf.GetMetric()) != 1 {
t.Errorf("expected 1 metric, got %d", len(mf.GetMetric()))
}
for _, m := range mf.GetMetric() {
if m.GetGauge().GetValue() != 1 {
t.Errorf("expected value 1, got %v", m.GetGauge().GetValue())
}
labels := make(map[string]string)
for _, lp := range m.GetLabel() {
labels[lp.GetName()] = lp.GetValue()
}
if labels["instance_name"] != "sparrow.example.com" || labels["team_name"] != "platform-team" ||
labels["team_email"] != "platform@example.com" || labels["platform"] != "k8s-prod-eu" {
t.Errorf("unexpected labels: %v", labels)
}
}
break
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simplify this a lot by flattening for loops:

	found := false
	for _, mf := range metrics {
		if mf.GetName() != instanceInfoMetricName {
			continue
		}
		found = true
		
		if len(mf.GetMetric()) != 1 {
			t.Errorf("expected 1 metric, got %d", len(mf.GetMetric()))
		}

		const expectedValue = 1
		for _, m := range mf.GetMetric() {
			if m.GetGauge().GetValue() != expectedValue {
				t.Errorf("%q metric value expected %d, got %f", instanceInfoMetricName, expectedValue, m.GetGauge().GetValue())
			}

			labels := make(map[string]string)
			for _, lp := range m.GetLabel() {
				labels[lp.GetName()] = lp.GetValue()
			}
			if !maps.Equal(expectedLabels, labels) {
				t.Errorf("expected labels %v, got %v", expectedLabels, labels)
			}
		}
	}

Please also take a look at the other loops in these tests, the same can probably be applied to them as well.

@ayoub3bidi
Copy link
Author

Hi @lvlcn-t and @niklastreml, thanks for the reviews. All comments have been addressed:

  • README: added the Instance metadata section to the table of contents
  • Design doc: renamed the heading to "Why a Dedicated Info Metric"
  • instance_info.go: replaced the custom regex with model.UTF8Validation.IsValidLabelName from prometheus/common
  • instance_info_test.go: simplified the loops, introduced expectedLabels and maps.Equal, and improved error messages

Tests are passing. I’ll open an issue to track splitting the README into smaller docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/config Issues/PRs related to the startup/sparrow config area area/metrics Issues/PRs related to metrics feature Introduces a new feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants