feat(exporters): replace opencensus with opentelemetry #1141
base: master
Conversation
* feat(exporters): replace opencensus with opentelemetry
* fix(stackdriver_exporter): use metrics util constants
* chore: use global meter and init
* feat(exporters): add resource info, use global meter
* chore: update deps
* fix: use a gauge directly, narrow scope info
* cleanup: deps
* fix: try initOnce pattern for global problem metrics
* fix(problemmetrics): init is being called too late
* fix(gauge): remove _ratio suffix
* fix: clean up after rebase
* fix: cleanup after rebase
* fix: cleanup after rebase
* fix: cleanup after rebase
* fix: remove nodename
* fix(test): remove instance id check
* fix(test): add attr values
* fix(test): unexported resources
* fix(test): initialize global problem metrics manager
* fix(lint): make the linter happy
Hi @daveoy. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test` on its own line. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: daveoy The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
@wangzhen127 can i get an /ok-to-test please
/ok-to-test
e2e test is complaining about the presence of otel's default `service_name` label
@hakman @wangzhen127 looks like we're ready for review over here be gentle
Can we get a review?
@daveoy Sorry, but I am not very familiar with this area. I can give it a try, but probably @wangzhen127 or someone from Google has better context.
Thanks for uploading this PR! Sorry for the delay. Got busy in the past week. Will try to get to this within this week.
@dashpole can you help review or suggest somebody?
I would think we should start testing OpenTelemetry metrics with OpenTelemetry collector. It should be reasonably easy to integrate to e2e. @dashpole are there good examples where OTel collector is integrated in e2e tests? We may also eliminate the Stackdriver portion since the OTel protocol must be reasonably universal
@@ -0,0 +1,52 @@
/*
Copyright 2019 The Kubernetes Authors All rights reserved.
unrelated: we may need to eliminate the year completely
go.opentelemetry.io/otel/trace v1.36.0 // indirect
go.opentelemetry.io/otel/trace v1.37.0 // indirect
go.yaml.in/yaml/v2 v2.4.2 // indirect
go.yaml.in/yaml/v3 v3.0.4 // indirect
I thought one of these would be eliminated after opencensus is removed. Are there more dependencies on the older one?
maybe I was thinking about the gopkg.in/yaml
Adding @rkolchmeyer as well
Not sure if you've seen https://opentelemetry.io/docs/specs/otel/compatibility/opencensus/#migration-path, but that might be helpful.
You can test OTLP in an integration test similar to this, but for metrics: https://github.com/kubernetes/kubernetes/blob/9624c3dcdc9a9e1286e8ca32d07d231e69ed2f0c/test/integration/apiserver/tracing/tracing_test.go#L354. Not sure if integrating the OTel collector will be easier or not.
For unit tests, consider using go.opentelemetry.io/otel/sdk/metric/metricdata/metricdatatest. The test can set up an SDK with a manual reader.
If you need to write integration tests for google cloud monitoring, you can do something similar to https://github.com/GoogleCloudPlatform/opentelemetry-operations-go/blob/7130d1aded77c51f613a9d949f166d119cd595e9/internal/cloudmock/metrics.go
}

// MeterName is the standard meter name used across the application
const MeterName = "node-problem-detector"
We usually recommend that the meter name is the package name where metrics are defined. But given this is used as an implementation of your metric interface, that might not make sense.
Overall, this looks correct to me. With better testing of the outputs (ideally done as a separate PR prior to this migration PR), including prometheus and GCM, you could do this migration with higher confidence that it isn't breaking anything.
thank you all for your comments, keep them coming! i should mention that this is running in my environment and the problemdaemons and metrics (problem_counter and problem_gauge) are operating as expected. i'll see if i can mock the Google cloud monitoring stuff for higher confidence on the stackdriver side.
It looks like there are issues on GCE:

rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: timeSeries[57] (metric.type="compute.googleapis.com/guest/disk/operation_bytes_count", metric.labels={"direction": "write", "device_name": "sda8", "service_name": "node-problem-detector"}): unrecognized metric labels [service_name];
and that error repeats for a bunch of other metrics. I guess this PR adds a new "service_name" label that Cloud Monitoring doesn't expect. Is that label necessary for all opentelemetry clients, or can node-problem-detector avoid sending that on GCE?
Record(labelValues map[string]string, value int64) error
}

// NewInt64Metric create a Int64Metric metric, returns nil when viewName is empty.
It looks like the "returns nil when viewName is empty" behavior wasn't retained - the program exits in this situation in the new implementation. Would it make sense to keep this behavior? Same for the float64 metric.
You will want to add the
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
should close #1008
to note: