
@chris-olszewski
Member

What was changed

Add OTEL spans for all interceptor methods provided by the interceptor interfaces.

Why?

As discovered in #1677, these can cause nondeterminism errors (NDEs) when replaying an old history that was recorded without these interceptors. Adding them all at once makes gating their usage far easier.
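A minimal sketch of what gating means here. It assumes a single boolean that says whether the new instrumentation is active for the current workflow; in the SDK that decision would come from a replay-safe SDK flag, but here it is just a parameter. `SpanLike`, `TracerLike`, and `maybeTrace` are hypothetical stand-ins, not the OTel or Temporal API. When the flag is off, the wrapped call runs without creating a span, so an old history replays along the same path it originally took.

```typescript
// Hypothetical minimal stand-ins for an OpenTelemetry span and tracer.
interface SpanLike {
  end(): void;
}

interface TracerLike {
  startSpan(name: string): SpanLike;
}

// Runs `fn`, wrapping it in a span only when the instrumentation flag is enabled.
// `instrumentAll` stands in for a replay-safe SDK flag check (assumption).
async function maybeTrace<T>(
  instrumentAll: boolean,
  tracer: TracerLike,
  spanName: string,
  fn: () => Promise<T>,
): Promise<T> {
  if (!instrumentAll) {
    // Old histories: behave exactly as before the new spans existed.
    return fn();
  }
  const span = tracer.startSpan(spanName);
  try {
    return await fn();
  } finally {
    // End the span whether `fn` resolved or threw.
    span.end();
  }
}

// A recording tracer that remembers which spans were started.
const started: string[] = [];
const recordingTracer: TracerLike = {
  startSpan(name: string): SpanLike {
    started.push(name);
    return { end() {} };
  },
};
```

Keeping the flag check in one wrapper means every newly instrumented interceptor method is gated by the same condition, which is the benefit of adding them all at once.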

Checklist

  1. Closes [Feature Request] Add missing hooks on OTel interceptors #1678

  2. How was this tested:

  • Existing replay tests
  • Added a replay test of smorgasbord history from 1.13.2
  • Updated span test to check for new spans added
  • Added an additional test to verify that update-with-start behavior stays intact when used with OTEL interceptors
  3. Any docs updates needed?
    Doc comments should suffice

@chris-olszewski force-pushed the olszewski/feat_implement_all_otel_interceptors branch from dd68680 to b9e320c on November 21, 2025 19:52
span.setAttribute(RUN_ID_ATTR_KEY, input.workflowExecution.runId);
}
if (input.reason) {
span.setAttribute(TERMINATE_REASON_ATTR_KEY, input.reason);
Member Author

Unsure whether this is useful information to include in the span by default

Contributor

Arguably, tracing has limited value on terminate, as that call is not relayed to a workflow worker and the server does not emit OTel spans. Still, the trace will show the outbound gRPC call, which can certainly be helpful in some cases, and there the termination reason may be pertinent. I'm OK with that.
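For reference, the conditional attribute logic in the diff above can be exercised in isolation with a fake span. This is a sketch: `SpanLike`, `TerminateInputLike`, `setTerminateAttributes`, and `RecordingSpan` are hypothetical; the `run_id` value is copied from the `RUN_ID_ATTR_KEY` constant quoted later in this review, and the value of `TERMINATE_REASON_ATTR_KEY` is a placeholder because the excerpt does not show it.

```typescript
// RUN_ID_ATTR_KEY matches the PR; the reason key's value is a placeholder (assumption).
const RUN_ID_ATTR_KEY = "run_id";
const TERMINATE_REASON_ATTR_KEY = "hypothetical_terminate_reason";

// Hypothetical minimal span and input shapes.
interface SpanLike {
  setAttribute(key: string, value: string): void;
}

interface TerminateInputLike {
  workflowExecution: { workflowId: string; runId?: string };
  reason?: string;
}

// Mirrors the diff: optional fields become attributes only when present.
function setTerminateAttributes(span: SpanLike, input: TerminateInputLike): void {
  if (input.workflowExecution.runId) {
    span.setAttribute(RUN_ID_ATTR_KEY, input.workflowExecution.runId);
  }
  if (input.reason) {
    span.setAttribute(TERMINATE_REASON_ATTR_KEY, input.reason);
  }
}

// A fake span that stores attributes in a Map for inspection.
class RecordingSpan implements SpanLike {
  attributes = new Map<string, string>();
  setAttribute(key: string, value: string): void {
    this.attributes.set(key, value);
  }
}
```

Because the reason is only attached when the caller supplied one, a terminate call without a reason produces a span with no extra attributes.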

Comment on lines +239 to +241
span.setAttribute(NEXUS_SERVICE_ATTR_KEY, input.service);
span.setAttribute(NEXUS_OPERATION_ATTR_KEY, input.operation);
span.setAttribute(NEXUS_ENDPOINT_ATTR_KEY, input.endpoint);
Member Author

Not sure if this is too much info to add by default

@chris-olszewski marked this pull request as ready for review on November 24, 2025 14:15
@chris-olszewski requested a review from a team as a code owner on November 24, 2025 14:15
@chris-olszewski marked this pull request as draft on November 26, 2025 21:10
Comment on lines +18 to +22
export const WORKFLOW_ID_ATTR_KEY = 'temporal_workflow_id';
/** As in activity id */
export const ACTIVITY_ID_ATTR_KEY = 'temporal_activity_id';
/** As in update id */
export const UPDATE_ID_ATTR_KEY = 'temporal_update_id';
Member Author

These differ from the Ruby attributes, which are camelCased

/** Default trace header for opentelemetry interceptors */
export const TRACE_HEADER = '_tracer-data';
/** As in workflow run id */
export const RUN_ID_ATTR_KEY = 'run_id';
Member Author

Ruby OTEL uses temporalRunId

Contributor

Oh? Have you compared with other SDKs besides Ruby? We'd ideally want cross-SDK compatibility of OTel tracing, as a customer could be running different languages within a single Temporal application.

Not saying we'll prioritize this, of course, but at the very least we should settle on the names we want to normalize to across the board, so that we eventually converge on something consistent.
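To make the mismatch concrete: a mechanical snake_case-to-camelCase conversion turns the `temporal_`-prefixed keys defined in this PR into a camelCase form, but `run_id` becomes `runId`, not Ruby's `temporalRunId`, because it lacks the prefix. The `toCamelCase` helper is hypothetical and only illustrates the naming gap; it is not part of any SDK.

```typescript
// Hypothetical helper: converts snake_case attribute keys to camelCase.
function toCamelCase(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_match: string, ch: string) => ch.toUpperCase());
}

// Keys as defined in this PR.
const tsKeys = ["temporal_workflow_id", "temporal_activity_id", "temporal_update_id", "run_id"];

for (const key of tsKeys) {
  console.log(`${key} -> ${toCamelCase(key)}`);
}
// run_id -> runId, which still differs from Ruby's temporalRunId.
```

So normalizing only the casing is not enough; the prefix convention would also need to be agreed on across SDKs.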

}
}

function handleError(err: any, span: otel.Span, acceptableErrors?: (err: unknown) => boolean): void {
Contributor

nit: can we think of a verb other than "handle" here? Something that better communicates that the function is "encoding/attaching error details to the span", rather than "handling the error situation itself".
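One possible shape for the renamed helper, sketched against a minimal span interface rather than the real `@opentelemetry/api` types. The name `recordErrorOnSpan`, the `SpanLike` interface, and the behavior (record the exception and mark the span as errored unless `acceptableErrors` reports the error as expected) are assumptions; the excerpt only shows the signature of `handleError`. The status values mirror OTel's `SpanStatusCode` (`UNSET = 0`, `OK = 1`, `ERROR = 2`).

```typescript
// Mirrors the numeric values of OpenTelemetry's SpanStatusCode.
const StatusCode = { UNSET: 0, OK: 1, ERROR: 2 } as const;
type StatusCodeValue = (typeof StatusCode)[keyof typeof StatusCode];

// Hypothetical minimal span surface.
interface SpanLike {
  recordException(err: Error | string): void;
  setStatus(status: { code: StatusCodeValue; message?: string }): void;
}

// Attaches error details to the span; it does not handle or swallow the error.
function recordErrorOnSpan(
  err: unknown,
  span: SpanLike,
  acceptableErrors?: (err: unknown) => boolean,
): void {
  if (acceptableErrors?.(err)) {
    // Expected errors leave the span status untouched.
    return;
  }
  const message = err instanceof Error ? err.message : String(err);
  span.recordException(err instanceof Error ? err : message);
  span.setStatus({ code: StatusCode.ERROR, message });
}

// A fake span that stores what was recorded.
class RecordingSpan implements SpanLike {
  exceptions: Array<Error | string> = [];
  status: { code: StatusCodeValue; message?: string } = { code: StatusCode.UNSET };
  recordException(err: Error | string): void {
    this.exceptions.push(err);
  }
  setStatus(status: { code: StatusCodeValue; message?: string }): void {
    this.status = status;
  }
}
```

The caller still rethrows or otherwise deals with the error; the helper's only job is bookkeeping on the span, which is what a name like `recordErrorOnSpan` conveys.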

ensureWorkflowModuleLoaded();
}

public async startTimer(
Contributor

Are we tracing Timers in any other SDK?

*
* @since Introduced in 1.13.3
*/
OpenTelemetryInterceptorsInstrumentsAllMethods: defineFlag(6, true, [isAtLeast({ major: 1, minor: 13, patch: 3 })]),
Contributor

In this case, there's no need for the SDK-version-based alternative condition, as the numeric flag will be properly emitted anyway. Arguably it's not a big deal to keep the alternative condition, but if we always add one, we'll end up in a situation where the "alternative" is actually the "primary"...
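The reviewer's point can be illustrated with a simplified model of flag evaluation during replay. It assumes that a flag counts as enabled if its numeric id was recorded in the history being replayed, and that alternative conditions (like the `isAtLeast({ major: 1, minor: 13, patch: 3 })` check) are consulted only when the id is absent. `FlagModel`, `isAtLeastVersion`, and `isFlagSetOnReplay` are hypothetical names, not the SDK's `defineFlag`/`isAtLeast` implementation.

```typescript
// Semantic version of the SDK that produced a history (assumed known during replay).
interface SdkVersion {
  major: number;
  minor: number;
  patch: number;
}

// Hypothetical simplified flag: a numeric id plus optional fallback conditions.
interface FlagModel {
  id: number;
  alternativeConditions?: Array<(historyVersion: SdkVersion | undefined) => boolean>;
}

// Returns a condition that holds when the history's SDK version is >= min.
function isAtLeastVersion(min: SdkVersion): (v: SdkVersion | undefined) => boolean {
  return (v) => {
    if (v === undefined) return false;
    if (v.major !== min.major) return v.major > min.major;
    if (v.minor !== min.minor) return v.minor > min.minor;
    return v.patch >= min.patch;
  };
}

// During replay: a recorded id wins; alternatives are only a fallback.
function isFlagSetOnReplay(
  flag: FlagModel,
  recordedFlagIds: Set<number>,
  historyVersion: SdkVersion | undefined,
): boolean {
  if (recordedFlagIds.has(flag.id)) return true;
  return (flag.alternativeConditions ?? []).some((cond) => cond(historyVersion));
}

const instrumentAll: FlagModel = {
  id: 6,
  alternativeConditions: [isAtLeastVersion({ major: 1, minor: 13, patch: 3 })],
};
```

Under these assumptions, a history that recorded flag 6 enables the new spans without ever reaching the version check, and a 1.13.2 history without the id evaluates to `false`. The version check only changes the outcome for a history where the flag was used but not recorded, which is why it is redundant when the flag is reliably emitted.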

Development

Successfully merging this pull request may close these issues.

[Feature Request] Add missing hooks on OTel interceptors

3 participants