feat(@temporalio/interceptors-opentelemetry): implement all interceptors #1835
      span.setAttribute(RUN_ID_ATTR_KEY, input.workflowExecution.runId);
    }
    if (input.reason) {
      span.setAttribute(TERMINATE_REASON_ATTR_KEY, input.reason);
Unsure if this is useful information to stuff in the span by default
Arguably, tracing has very limited value on terminate, since that call is not relayed to a workflow worker and the server does not emit OTel spans. Still, the trace will show the outbound gRPC call, which can turn out to be helpful in some cases, and the termination reason could then be pertinent. I'm OK with that.
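One way to address the "by default" concern is to make the reason attribute opt-out. The following is only a sketch under stated assumptions: `SpanLike`, `TerminateInput`, `annotateTerminateSpan`, the `includeReason` parameter, and the string value of `TERMINATE_REASON_ATTR_KEY` are all hypothetical and not taken from the PR; only `RUN_ID_ATTR_KEY = 'run_id'` appears in the diff.

```typescript
// Minimal stand-in for the one span method this sketch needs (assumption).
interface SpanLike {
  setAttribute(key: string, value: string): void;
}

// 'run_id' is the value shown in the diff under review.
const RUN_ID_ATTR_KEY = 'run_id';
// Hypothetical value: the diff does not show what this constant is set to.
const TERMINATE_REASON_ATTR_KEY = 'temporal_terminate_reason';

// Hypothetical, simplified shape of the terminate interceptor input.
interface TerminateInput {
  workflowExecution: { workflowId: string; runId?: string };
  reason?: string;
}

// Attach terminate-related attributes; the reason can be switched off by the caller.
function annotateTerminateSpan(span: SpanLike, input: TerminateInput, includeReason = true): void {
  if (input.workflowExecution.runId) {
    span.setAttribute(RUN_ID_ATTR_KEY, input.workflowExecution.runId);
  }
  if (includeReason && input.reason) {
    span.setAttribute(TERMINATE_REASON_ATTR_KEY, input.reason);
  }
}
```

With `includeReason` left at its default, the behaviour matches the diff; passing `false` keeps the span limited to identifiers.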
    span.setAttribute(NEXUS_SERVICE_ATTR_KEY, input.service);
    span.setAttribute(NEXUS_OPERATION_ATTR_KEY, input.operation);
    span.setAttribute(NEXUS_ENDPOINT_ATTR_KEY, input.endpoint);
Not sure if this is too much info to add by default
export const WORKFLOW_ID_ATTR_KEY = 'temporal_workflow_id';
/** As in activity id */
export const ACTIVITY_ID_ATTR_KEY = 'temporal_activity_id';
/** As in update id */
export const UPDATE_ID_ATTR_KEY = 'temporal_update_id';
These differ from the Ruby attributes, which are camel-cased.
/** Default trace header for opentelemetry interceptors */
export const TRACE_HEADER = '_tracer-data';
/** As in workflow run id */
export const RUN_ID_ATTR_KEY = 'run_id';
Ruby OTEL uses temporalRunId
Oh? Have you compared with other SDKs besides Ruby? We'd ideally want cross-SDK compatibility of OTel tracing, as a customer could be operating different languages within a single Temporal application.
Not saying we'll prioritize this, of course, but at the very least we should settle on the names we want to normalize to across the board, so that we eventually converge on something consistent.
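To make the naming gap concrete, here is a small sketch that mechanically converts the snake_case keys from this diff to camel case. `snakeToCamel` is a hypothetical helper written for illustration, not part of any SDK, and the only Ruby key actually quoted in this thread is `temporalRunId`.

```typescript
// Hypothetical helper: convert a snake_case attribute key to camelCase.
function snakeToCamel(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_match: string, ch: string) => ch.toUpperCase());
}

// Keys as they appear in the diff under review.
const TS_ATTR_KEYS = ['temporal_workflow_id', 'temporal_activity_id', 'temporal_update_id', 'run_id'];

for (const key of TS_ATTR_KEYS) {
  console.log(`${key} -> ${snakeToCamel(key)}`);
}
// temporal_workflow_id -> temporalWorkflowId
// temporal_activity_id -> temporalActivityId
// temporal_update_id -> temporalUpdateId
// run_id -> runId
```

Note that a purely mechanical conversion yields `runId`, not the `temporalRunId` mentioned above, so converging across SDKs would need an agreed list of names rather than a case transform alone.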
  }
}

function handleError(err: any, span: otel.Span, acceptableErrors?: (err: unknown) => boolean): void {
nit: can we think of another verb than handle here? Something that better communicates the fact that the function will be "encoding/attaching error details to the span", rather than "handling the error situation itself".
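As an illustration of the rename being suggested, the sketch below uses the hypothetical name `recordErrorOnSpan`. The body is an assumption about what such a helper might do, not the PR's implementation: `SpanLike` is a minimal stand-in for the span methods used, the status codes mirror the numeric values of OpenTelemetry's `SpanStatusCode`, and marking acceptable errors as `OK` is a guess.

```typescript
// Mirrors the numeric values of OpenTelemetry's SpanStatusCode.
const StatusCode = { UNSET: 0, OK: 1, ERROR: 2 } as const;
type StatusCodeValue = (typeof StatusCode)[keyof typeof StatusCode];

// Minimal stand-in for the span methods this sketch needs (assumption).
interface SpanLike {
  setStatus(status: { code: StatusCodeValue; message?: string }): void;
  recordException(err: Error | string): void;
}

// Hypothetical name: it says the function writes error details onto the span
// and does nothing to recover from or rethrow the error.
function recordErrorOnSpan(
  err: unknown,
  span: SpanLike,
  acceptableErrors?: (err: unknown) => boolean
): void {
  if (acceptableErrors !== undefined && acceptableErrors(err)) {
    // Assumption: errors the caller considers acceptable do not mark the span as failed.
    span.setStatus({ code: StatusCode.OK });
    return;
  }
  const message = err instanceof Error ? err.message : String(err);
  span.setStatus({ code: StatusCode.ERROR, message });
  span.recordException(err instanceof Error ? err : message);
}
```

Other candidate names in the same spirit would be `attachErrorToSpan` or `setSpanErrorStatus`; the point is only that the verb describes annotation rather than handling.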
    ensureWorkflowModuleLoaded();
  }

  public async startTimer(
Are we tracing Timers in any other SDK?
   *
   * @since Introduced in 1.13.3
   */
  OpenTelemetryInterceptorsInstrumentsAllMethods: defineFlag(6, true, [isAtLeast({ major: 1, minor: 13, patch: 3 })]),
In this case, there's no need for the SDK-version-based alternate condition, as the numeric flag will be properly emitted anyway. Arguably it's not a big deal to have the alternate condition, but if we always add one, we'll end up in a situation where the "alternative" is actually the "primary"...
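The distinction between the numeric flag and the alternate condition can be modelled roughly as follows. This is a simplified, hypothetical model written for this discussion, not the SDK's actual implementation: `Flag`, `SdkVersion`, `flagEnabledOnReplay`, and this standalone `isAtLeast` are assumptions that only borrow names from the diff.

```typescript
interface SdkVersion {
  major: number;
  minor: number;
  patch: number;
}

type AltCondition = (recorded: SdkVersion) => boolean;

// Standalone stand-in for the diff's isAtLeast: true when the recorded version is >= min.
function isAtLeast(min: SdkVersion): AltCondition {
  return (v: SdkVersion): boolean => {
    if (v.major !== min.major) return v.major > min.major;
    if (v.minor !== min.minor) return v.minor > min.minor;
    return v.patch >= min.patch;
  };
}

// Hypothetical flag shape: a numeric id plus optional version-based fallbacks.
interface Flag {
  id: number;
  alternativeConditions: AltCondition[];
}

// On replay, the flag id recorded in history is the primary signal; the alternate
// conditions only matter when that id is absent from the recorded set.
function flagEnabledOnReplay(flag: Flag, recordedFlagIds: Set<number>, recordedSdkVersion?: SdkVersion): boolean {
  if (recordedFlagIds.has(flag.id)) return true;
  if (recordedSdkVersion === undefined) return false;
  return flag.alternativeConditions.some((cond) => cond(recordedSdkVersion));
}
```

Under this model, a flag whose id is always written to history never reaches the fallback branch, which is the reviewer's point about the alternate condition being unnecessary here.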
What was changed
Add OTEL spans for all interceptors provided by the interface.
Why?
As discovered in #1677, these can cause non-determinism errors (NDEs) when replaying an old history that was recorded without these interceptors. Adding them all at once makes gating their usage far easier.
Checklist
Closes [Feature Request] Add missing hooks on OTel interceptors #1678
How was this tested:
Doc comments should suffice