feat: add sparrow_instance_info metric for ownership metadata (#354)#391
feat: add sparrow_instance_info metric for ownership metadata (#354)#391ayoub3bidi wants to merge 3 commits intotelekom:mainfrom
Conversation
|
Hi @lvlcn-t, sorry for the late response. I’ve applied the requested changes: switched metadata to a generic key/value map, removed the CLI flags, and updated the README/Helm/docs accordingly (including the header level fix). Thanks for the review. |
|
|
||
| Just write out the path to the attribute, delimited by `_`. | ||
|
|
||
| #### Instance metadata (optional) |
There was a problem hiding this comment.
please add this header to the table of contents above
docs/ownership-metadata-design.md
Outdated
|
|
||
| Sparrow exposes optional instance metadata via a dedicated Prometheus **info metric** (`sparrow_instance_info`), so operators can identify owners, route alerts correctly, and correlate metrics across multiple Sparrow deployments. | ||
|
|
||
| ## Why Option 1 (Dedicated Info Metric) |
There was a problem hiding this comment.
This heading seems out of place as there is no option 2 here, so please change it so the document can be read in isolation :D
pkg/sparrow/metrics/instance_info.go
Outdated
| func isValidLabelName(name string) bool { | ||
| return labelNamePattern.MatchString(name) | ||
| } |
There was a problem hiding this comment.
please remove the regex here. you can use the validation functions from prometheus directly:
import "github.com/prometheus/common/model"
func isValidLabelName(name string) bool {
return model.UTF8Validation.IsValidLabelName(name)
}then theres probably no need to have the wrapper function either
|
Thank you for the contributions, once the comments are addressed I think this can be merged @lvlcn-t |
lvlcn-t
left a comment
There was a problem hiding this comment.
@ayoub3bidi Thanks for fixing my main concern about the metadata object.
I think after you've addressed @niklastreml's and my minor review improvements comments, this will be ready to be merged.
@niklastreml We should probably open an issue for splitting up the README into smaller documentation files in the docs/ dir since it grew quite large now.
README.md
Outdated
|
|
||
| You can optionally configure instance metadata so operators can identify owners, route alerts, and correlate metrics across deployments. This metadata is exposed as a single Prometheus info-style metric, `sparrow_instance_info`, emitted once per instance at startup. | ||
|
|
||
| `metadata` is a map of arbitrary key-value pairs. Keys must be valid Prometheus label names (e.g. `team_name`, `platform`, `region`, `environment`). The key `instance_name` is reserved and set automatically from the Sparrow DNS name. |
There was a problem hiding this comment.
Consider rephrasing it like this:
| `metadata` is a map of arbitrary key-value pairs. Keys must be valid Prometheus label names (e.g. `team_name`, `platform`, `region`, `environment`). The key `instance_name` is reserved and set automatically from the Sparrow DNS name. | |
| `metadata` is a map of arbitrary key-value pairs. Keys must be valid Prometheus label names (e.g. `team_name`, `platform`, `region`, `environment`). The key `instance_name` is reserved and automatically set to the sparrow's name. |
| var found bool | ||
| for _, mf := range metrics { | ||
| if mf.GetName() == instanceInfoMetricName { | ||
| found = true | ||
| if len(mf.GetMetric()) != 1 { | ||
| t.Errorf("expected 1 metric, got %d", len(mf.GetMetric())) | ||
| } | ||
| for _, m := range mf.GetMetric() { | ||
| if m.GetGauge().GetValue() != 1 { | ||
| t.Errorf("expected value 1, got %v", m.GetGauge().GetValue()) | ||
| } | ||
| labels := make(map[string]string) | ||
| for _, lp := range m.GetLabel() { | ||
| labels[lp.GetName()] = lp.GetValue() | ||
| } | ||
| if labels["instance_name"] != "sparrow.example.com" || labels["team_name"] != "platform-team" || | ||
| labels["team_email"] != "platform@example.com" || labels["platform"] != "k8s-prod-eu" { | ||
| t.Errorf("unexpected labels: %v", labels) | ||
| } | ||
| } | ||
| break | ||
| } | ||
| } |
There was a problem hiding this comment.
You can simplify this a lot by flattening for loops:
found := false
for _, mf := range metrics {
if mf.GetName() != instanceInfoMetricName {
continue
}
found = true
if len(mf.GetMetric()) != 1 {
t.Errorf("expected 1 metric, got %d", len(mf.GetMetric()))
}
const expectedValue = 1
for _, m := range mf.GetMetric() {
if m.GetGauge().GetValue() != expectedValue {
t.Errorf("%q metric value expected %d, got %f", instanceInfoMetricName, expectedValue, m.GetGauge().GetValue())
}
labels := make(map[string]string)
for _, lp := range m.GetLabel() {
labels[lp.GetName()] = lp.GetValue()
}
if !maps.Equal(expectedLabels, labels) {
t.Errorf("expected labels %v, got %v", expectedLabels, labels)
}
}
}Please also take a look at the other loops in these tests, the same can probably be applied to them as well.
…ata documentation
|
Hi @lvlcn-t and @niklastreml, thanks for the reviews. All comments have been addressed:
Tests are passing. I’ll open an issue to track splitting the README into smaller docs. |
Motivation
Operators need to identify which team owns each Sparrow instance and route alerts accordingly. This PR exposes optional ownership and platform metadata as a single Prometheus info metric (
sparrow_instance_info) so multi-team deployments can correlate metrics and route by team/platform without changing existing check metrics.Changes
metadatablock under startup config:metadata.team.name,metadata.team.email,metadata.platform. All fields optional; omitted values appear as empty labels.--metadataTeamName,--metadataTeamEmail,--metadataPlatform; env varsSPARROW_METADATA_TEAM_NAME, etc.sparrow_instance_info{team_name, team_email, platform, instance_name}with value 1, registered once at startup insparrow.New(). Registration failure is logged and non-fatal.chart/values.yamldocuments optional metadata undersparrowConfig; backward compatible when omitted.docs/sparrow_run.md(flags), design doc indocs/ownership-metadata-design.md.For additional information look at the commits.
Tests done
Unit tests in
pkg/sparrow/metrics/instance_info_test.go: full metadata, empty metadata, partial metadata, double registration (AlreadyRegisteredError). Existingpkg/sparrowandpkg/configtests pass. Helm template validated with default and metadata values.Unit tests succeeded
E2E tests succeeded
TODO