Open
Labels: area/checks (Issues/PRs related to Checks), bug (Something isn't working), refactoring (Refactoring of existing code)
Description
Problem to investigate & solve
Currently, 3 different latency metrics are available.
- Counter
- Latency time
- Histogram
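If the health check fails internally, the latency is recorded as 0, and so is the status code.
This might be acceptable for the counter and the latency metric, but it is questionable for the histogram: histogram buckets are cumulative, so an observation of 0 seconds is counted in every bucket, and a failed check looks like a very fast response.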
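To illustrate why a 0-second observation fills every bucket, here is a minimal sketch of cumulative histogram counting in plain Python. It does not use the Prometheus client library or Sparrow's code; the function name `cumulative_counts` and the sample latencies are made up for illustration, and the bucket bounds are the ones visible in the exposition output of this issue.

```python
from bisect import bisect_left

# Upper bounds ("le" values) as they appear in the exposition output.
BOUNDS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, float("inf")]


def cumulative_counts(observations, bounds=BOUNDS):
    """Count each observation in every bucket whose bound is >= the value."""
    counts = [0] * len(bounds)
    for value in observations:
        # Index of the first bucket with bound >= value (le semantics).
        first = bisect_left(bounds, value)
        for i in range(first, len(bounds)):
            counts[i] += 1
    return counts


# Hypothetical data: two failed checks recorded as 0.0 s, three real latencies.
observations = [0.0, 0.0, 0.3, 0.4, 0.7]
counts = cumulative_counts(observations)

for bound, count in zip(BOUNDS, counts):
    print(f'le="{bound}" {count}')
```

In this sketch the two failures show up in all buckets from `le="0.005"` upward, although no real request was that fast.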
Example with 2 errors and 308 total requests (the 2 failed checks are the only observations in the buckets up to le="0.25"):
# HELP sparrow_latency_duration Latency of targets in seconds
# TYPE sparrow_latency_duration histogram
sparrow_latency_duration_bucket{target="https://gitlab.devops.telekom.de",le="0.005"} 2
sparrow_latency_duration_bucket{target="https://gitlab.devops.telekom.de",le="0.01"} 2
sparrow_latency_duration_bucket{target="https://gitlab.devops.telekom.de",le="0.025"} 2
sparrow_latency_duration_bucket{target="https://gitlab.devops.telekom.de",le="0.05"} 2
sparrow_latency_duration_bucket{target="https://gitlab.devops.telekom.de",le="0.1"} 2
sparrow_latency_duration_bucket{target="https://gitlab.devops.telekom.de",le="0.25"} 2
sparrow_latency_duration_bucket{target="https://gitlab.devops.telekom.de",le="0.5"} 288
sparrow_latency_duration_bucket{target="https://gitlab.devops.telekom.de",le="1"} 307
sparrow_latency_duration_bucket{target="https://gitlab.devops.telekom.de",le="2.5"} 308
sparrow_latency_duration_bucket{target="https://gitlab.devops.telekom.de",le="5"} 308
sparrow_latency_duration_bucket{target="https://gitlab.devops.telekom.de",le="10"} 308
sparrow_latency_duration_bucket{target="https://gitlab.devops.telekom.de",le="+Inf"} 308
sparrow_latency_duration_sum{target="https://gitlab.devops.telekom.de"} 120.39378972299998
sparrow_latency_duration_count{target="https://gitlab.devops.telekom.de"} 308
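The effect on derived values can be estimated from the numbers in the output above. The following sketch assumes that both failed checks were recorded as exactly 0 seconds, so they add to `_count` but not to `_sum`; the variable names are illustrative only.

```python
# Values copied from the exposition output above.
latency_sum = 120.39378972299998  # sparrow_latency_duration_sum
total_count = 308                 # sparrow_latency_duration_count
failed = 2                        # failed checks, assumed to be recorded as 0 s
fast_bucket = 2                   # bucket le="0.25"

# Mean latency as a dashboard would compute it from sum / count.
mean_reported = latency_sum / total_count

# Mean latency over the checks that actually completed.
mean_successful = latency_sum / (total_count - failed)

# Share of requests apparently answered within 0.25 s.
share_fast_reported = fast_bucket / total_count
share_fast_successful = (fast_bucket - failed) / (total_count - failed)

print(f"mean (reported):   {mean_reported:.4f} s")
print(f"mean (successful): {mean_successful:.4f} s")
print(f"<= 0.25 s (reported):   {share_fast_reported:.2%}")
print(f"<= 0.25 s (successful): {share_fast_successful:.2%}")
```

With only 2 failures the shift is small, but it grows with the failure rate, and quantile estimates move toward 0 in the same way.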
As @puffitos stated in #45, we should probably solve this with labelling or a separate set of metrics, e.g. a label for the check's state.
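One possible shape of the labelling idea, sketched in plain Python rather than with a real metrics library: each observation carries an additional label so that failed checks land in their own series. The label name `state`, its values `success` and `failure`, and the class `LabelledHistogram` are assumptions for illustration, not Sparrow's actual implementation or the proposal from #45.

```python
from collections import defaultdict

BOUNDS = (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, float("inf"))


class LabelledHistogram:
    """Toy cumulative histogram keyed by (target, state)."""

    def __init__(self, bounds=BOUNDS):
        self.bounds = bounds
        self.buckets = defaultdict(lambda: [0] * len(bounds))
        self.sums = defaultdict(float)

    def observe(self, target, state, seconds):
        key = (target, state)
        # Cumulative buckets: count the value in every bucket it fits into.
        for i, bound in enumerate(self.bounds):
            if seconds <= bound:
                self.buckets[key][i] += 1
        self.sums[key] += seconds


hist = LabelledHistogram()
target = "https://gitlab.devops.telekom.de"

# Hypothetical observations: three successful checks and two failed ones.
for seconds in (0.31, 0.42, 0.8):
    hist.observe(target, "success", seconds)
for _ in range(2):
    hist.observe(target, "failure", 0.0)

print(hist.buckets[(target, "success")])
print(hist.buckets[(target, "failure")])
```

Queries on the `success` series are then no longer distorted by 0-second entries, while the `failure` series still records how often the check failed. A separate failure counter with no histogram observation on error would be the alternative mentioned above.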