NETOBSERV-2657: Allow external/contributed recording rules in Network Health#2500
NETOBSERV-2657: Allow external/contributed recording rules in Network Health#2500leandroberetta wants to merge 1 commit intonetobserv:mainfrom
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2500 +/- ##
==========================================
- Coverage 72.39% 72.07% -0.33%
==========================================
Files 105 105
Lines 10851 10889 +38
==========================================
- Hits 7856 7848 -8
- Misses 2518 2557 +39
- Partials 477 484 +7
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
@leandroberetta: This pull request references NETOBSERV-2657 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
memodi
left a comment
There was a problem hiding this comment.
Hi @leandroberetta - few questions/comments:
- could you add documentation for this feature describing the example and the fields that are necessary ?
- It looks like users can add recording for any metric they would like and it would show up on Network Health page - my concerns is - netobserv NS is deleted when flowcollector is deleted and users would lose their rules, I wonder how do we make users aware of this?
- do we provide hint in UI to differentiate user defined rules?
| list := &monitoringv1.PrometheusRuleList{} | ||
| if err := cl.List(ctx, list, client.MatchingLabels{"netobserv": "true"}); err != nil { | ||
| log.FromContext(ctx).Error(err, "Failed to list PrometheusRules for recording annotations") | ||
| return out |
There was a problem hiding this comment.
when there are transient issues (API request failing etc.) retrieving the rules, it would overwrite existing rules with empty recording rules?
There was a problem hiding this comment.
Good catch, I'll fix it.
| raw, hasAnnot := pr.Annotations[recordingAnnotationsAnnotation] | ||
| if hasAnnot && raw != "" { | ||
| var perRule map[string]map[string]string | ||
| if err := json.Unmarshal([]byte(raw), &perRule); err != nil { |
There was a problem hiding this comment.
is simply validating JSON is sufficient? do we want to validate specific fields?
There was a problem hiding this comment.
I think is ok, but we can improve this on a follow up PR is your ok.
|
New changes are detected. LGTM label has been removed. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Hi @memodi, sorry for the delay and thanks for the review.
I added the documentation to the HealthRules.md
PrometheusRules can be on any namespace. I added a mention of this in the docs.
We aren't today. I will create an issue for that if you're ok. |
|
@memodi in the description of the issue you have a simple HealthRule which you can update to play with the thresholds, the recording rules will appear in the Namespaces tab. |
|
Hi @leandroberetta - thanks for addressing feedback. could you resolve merge conflicts and add |
9fa9368 to
de2419d
Compare
|
/ok-to-test |
|
New images: quay.io/netobserv/network-observability-operator:ff365a2
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-ff365a2
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-ff365a2They will expire in two weeks. To deploy this build: # Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:ff365a2 make deploy
# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-ff365a2Or as a Catalog Source: apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
name: netobserv-dev
namespace: openshift-marketplace
spec:
sourceType: grpc
image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-ff365a2
displayName: NetObserv development catalog
publisher: Me
updateStrategy:
registryPoll:
interval: 1m |
|
/ok-to-test |
87c26ac to
70f4b09
Compare
|
/ok-to-test |
docs/HealthRules.md
Outdated
|
|
||
| ### Lifecycle and namespace | ||
|
|
||
| Custom PrometheusRules are **not** owned by the FlowCollector. If you put them in the **NetObserv namespace** (e.g. `netobserv`), that namespace may be removed when the FlowCollector is uninstalled (depending on install mode), and your rules would be deleted. To avoid that, create your PrometheusRules in **another namespace** (e.g. `monitoring` or a dedicated `netobserv-rules`); they will still be picked up cluster-wide. If you do keep rules in the NetObserv namespace, back them up (e.g. in Git) and re-apply after reinstalling. |
There was a problem hiding this comment.
I think we should add that those ns will need:
openshift.io/cluster-monitoring: "true" label
There was a problem hiding this comment.
That annotation is used for letting know Prometheus that that namespace should be scraped. That is not required, but I think the example in the description was outdated and for that reason, you didn't see any rule in the page.
I've updated the docs to improve and add more clarity to the requirements.
A label netobserv true was missing and also the annotaion was renamed some commits ago (now is network-health). Sorry @memodi !
|
@leandroberetta: This pull request references NETOBSERV-2657 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@leandroberetta: This pull request references NETOBSERV-2657 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
64d29d5 to
05b1033
Compare
|
/ok-to-test |
|
New images: quay.io/netobserv/network-observability-operator:305b85f
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-305b85f
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-305b85fThey will expire in two weeks. To deploy this build: # Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:305b85f make deploy
# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-305b85fOr as a Catalog Source: apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
name: netobserv-dev
namespace: openshift-marketplace
spec:
sourceType: grpc
image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-305b85f
displayName: NetObserv development catalog
publisher: Me
updateStrategy:
registryPoll:
interval: 1m |
|
/label qe-approved thanks @leandroberetta |
|
@leandroberetta: This pull request references NETOBSERV-2657 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Description
PrometheusRule CRD won't allow to have annotations in rules of type "record", for this reason, today is not possible to for third party users to bring new recording rules, just alerts. This PR changes that, allowing users to bring their own recording rules using the following convention:
Dependencies
netobserv/netobserv-web-console#1311
Checklist
If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.