[Bug]: OpenTelemetry TraceId Mismatch When Using SubAgent as Tool #705

@yangxb2010000

Describe the bug

When using a subagent as a tool (registered via toolkit.registration().subAgent()), the subagent's OpenTelemetry spans have a different traceId than the main agent's spans. This breaks the distributed trace continuity, making it impossible to track the complete call chain in trace visualization tools like Langfuse or Jaeger.

To Reproduce

Code to reproduce

import io.agentscope.core.ReActAgent;
import io.agentscope.core.formatter.dashscope.DashScopeChatFormatter;
import io.agentscope.core.message.Msg;
import io.agentscope.core.message.MsgRole;
import io.agentscope.core.message.TextBlock;
import io.agentscope.core.model.DashScopeChatModel;
import io.agentscope.core.tool.Toolkit;
import io.agentscope.core.tracing.TracerRegistry;
import io.agentscope.core.tracing.telemetry.TelemetryTracer;

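// Assumption: the DashScope API key is supplied via an environment variable.
String apiKey = System.getenv("DASHSCOPE_API_KEY");

// 1. Initialize tracing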
TelemetryTracer tracer = TelemetryTracer.builder()
    .endpoint("http://localhost:4318/v1/traces")
    .build();
TracerRegistry.register(tracer);

// 2. Create subagent with tools (WeatherTools is a user-defined tool class, not shown)
ReActAgent weatherSpecialist = ReActAgent.builder()
    .name("WeatherSpecialist")
    .sysPrompt("You are a weather specialist...")
    .model(DashScopeChatModel.builder()
        .apiKey(apiKey)
        .modelName("qwen-plus")
        .formatter(new DashScopeChatFormatter())
        .build())
    .toolkit(new Toolkit().registerTool(new WeatherTools()))
    .build();

// 3. Register subagent as tool
Toolkit mainToolkit = new Toolkit();
mainToolkit.registration()
    .subAgent(() -> weatherSpecialist)
    .apply();

// 4. Create main agent
ReActAgent mainAgent = ReActAgent.builder()
    .name("Assistant")
    .sysPrompt("Use the weather specialist for weather queries")
    .model(DashScopeChatModel.builder()
        .apiKey(apiKey)
        .modelName("qwen-plus")
        .formatter(new DashScopeChatFormatter())
        .build())
    .toolkit(mainToolkit)
    .build();

// 5. Execute - this will create disconnected traces
Msg userMsg = Msg.builder()
    .role(MsgRole.USER)
    .content(TextBlock.builder().text("What's the weather in Shanghai?").build())
    .build();
mainAgent.call(userMsg).block();

Steps to reproduce

  1. Register a subagent as a tool using toolkit.registration().subAgent()
  2. Enable OpenTelemetry tracing using TelemetryTracer
  3. Call the main agent with a query that will invoke the subagent tool
  4. Check the trace output in Langfuse/Jaeger

Observed behavior

  • Main agent's spans have traceId: abc123...
  • Subagent's spans have traceId: def456... (different!)
  • The two traces are not connected in the trace visualization

Expected behavior

  • Main agent and subagent should share the same traceId
  • Subagent's spans should have the main agent's span as parent
  • Complete trace chain should be visible: Main Agent → SubAgent Tool → SubAgent LLM → Tools

Error messages

No errors are thrown. The issue is only visible when inspecting the traces in Langfuse/Jaeger.

Environment

  • AgentScope Version: 1.0.9-SNAPSHOT (Java)
  • Java Version: 17
  • OS: macOS/Linux/Windows (platform-independent)
  • OpenTelemetry Version: Using io.opentelemetry:opentelemetry-api and agentscope-extensions-studio

Root Cause Analysis

The issue is caused by missing OpenTelemetry trace context propagation in two locations:

1. AgentBase.createEventStream()

The Flux.create callback synchronously calls callSupplier.get(), which creates the agent's Mono outside of the Reactor subscription context. This prevents trace context from being propagated to agent.call().

2. SubAgentTool.executeWithStreaming() and SubAgentTool.executeWithoutStreaming()

These methods call agent.stream() or agent.call() without using contextWrite() to propagate the Reactor Context (containing trace information) to the downstream agent invocation.

Solution

The fix involves three changes, one in AgentBase and two in SubAgentTool, to ensure trace context propagation:

1. AgentBase.createEventStream - Use Mono.defer

return Flux.deferContextual(
    ctxView ->
        Mono.defer(() -> callSupplier.get())  // Defer Mono creation to subscription time
            .contextWrite(context -> context.putAll(ctxView))  // Propagate trace context
);

2. SubAgentTool.executeWithStreaming - Add contextWrite

return Mono.deferContextual(
    ctxView ->
        Mono.from(agent.stream(...))
            .contextWrite(context -> context.putAll(ctxView))  // Propagate trace context
);

3. SubAgentTool.executeWithoutStreaming - Add contextWrite

return Mono.deferContextual(
    ctxView ->
        agent.call(...)
            .contextWrite(context -> context.putAll(ctxView))  // Propagate trace context
);

Additional context

Why this happens

  • OpenTelemetry's Reactor instrumentation stores trace context in Reactor's Context using special keys
  • context.putAll(ctxView) copies these keys to downstream operations
  • Without deferContextual + contextWrite, a Publisher that is created and subscribed outside the caller's subscription does not see the caller's trace context
  • Mono.defer() ensures the supplier executes at subscription time when trace context is available
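The mechanism in the bullets above can be illustrated with a toy model. This is a minimal sketch that does not use Reactor or AgentScope: Pub, subAgentCall, detached and propagating are hypothetical names invented for the illustration, and a plain Map stands in for Reactor's Context. It only shows why a publisher subscribed with its own empty context starts a new trace, while one that forwards the subscriber's context stays in the caller's trace.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Toy model of subscription-time context. NOT Reactor and NOT AgentScope code. */
final class ContextPropagationModel {

    /** A "publisher" here is just a function from the subscriber's context to a result. */
    interface Pub<T> {
        T subscribe(Map<String, String> context);
    }

    /** Hypothetical sub-agent call: reports which trace it believes it belongs to. */
    static Pub<String> subAgentCall() {
        return ctx -> {
            String traceId = ctx.get("traceId");
            // With no trace visible in the context, a tracer would start a new trace.
            return traceId != null ? traceId : "new-trace";
        };
    }

    /** Broken variant: the inner publisher is created eagerly and subscribed with an empty context. */
    static Pub<String> detached(Supplier<Pub<String>> supplier) {
        Pub<String> inner = supplier.get();
        return outerCtx -> inner.subscribe(new HashMap<>()); // caller's context is dropped
    }

    /** Fixed variant: create the inner publisher at subscription time and forward the context. */
    static Pub<String> propagating(Supplier<Pub<String>> supplier) {
        return outerCtx -> supplier.get().subscribe(outerCtx);
    }

    public static void main(String[] args) {
        Map<String, String> mainAgentCtx = Map.of("traceId", "abc123");
        // prints "detached:    new-trace"
        System.out.println("detached:    "
                + detached(ContextPropagationModel::subAgentCall).subscribe(mainAgentCtx));
        // prints "propagating: abc123"
        System.out.println("propagating: "
                + propagating(ContextPropagationModel::subAgentCall).subscribe(mainAgentCtx));
    }
}
```

In this model, propagating plays the role that deferContextual plus contextWrite play in the proposed fix: the subscriber's context is read at subscription time and handed to the inner call.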

Impact

This is a critical issue for any application using:

  • Subagents as tools
  • Distributed tracing with OpenTelemetry
  • Trace analysis tools (Langfuse, Jaeger, Zipkin, etc.)

Testing

After applying the fix, verify trace continuity by:

  1. Running the example with tracing enabled
  2. Checking Langfuse/Jaeger for a single connected trace
  3. Confirming all spans share the same traceId with correct parent-child relationships
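Step 3 can also be checked programmatically once spans have been exported. The following is a minimal sketch, assuming the spans have already been reduced to a simple record of traceId, spanId and parentSpanId. SpanRecord and isSingleConnectedTrace are hypothetical names, not OpenTelemetry or AgentScope types, and the span names in main are illustrative.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class TraceContinuityCheck {

    /** Hypothetical, simplified view of an exported span; parentSpanId is null for a root span. */
    record SpanRecord(String traceId, String spanId, String parentSpanId, String name) {}

    /**
     * Returns true when all spans share one traceId, exactly one span is a root,
     * and every other span's parent is present in the list.
     * Limitation: cycles and duplicate span ids are not detected.
     */
    static boolean isSingleConnectedTrace(List<SpanRecord> spans) {
        if (spans.isEmpty()) {
            return false;
        }
        Set<String> traceIds = new HashSet<>();
        Set<String> spanIds = new HashSet<>();
        for (SpanRecord span : spans) {
            traceIds.add(span.traceId());
            spanIds.add(span.spanId());
        }
        if (traceIds.size() != 1) {
            return false; // the symptom described in this issue: more than one traceId
        }
        int roots = 0;
        for (SpanRecord span : spans) {
            if (span.parentSpanId() == null) {
                roots++;
            } else if (!spanIds.contains(span.parentSpanId())) {
                return false; // orphaned span: its parent was not exported
            }
        }
        return roots == 1;
    }

    public static void main(String[] args) {
        List<SpanRecord> beforeFix = List.of(
                new SpanRecord("abc123", "s1", null, "Assistant"),
                new SpanRecord("def456", "s2", null, "WeatherSpecialist"));
        List<SpanRecord> afterFix = List.of(
                new SpanRecord("abc123", "s1", null, "Assistant"),
                new SpanRecord("abc123", "s2", "s1", "SubAgent tool"),
                new SpanRecord("abc123", "s3", "s2", "WeatherSpecialist"));
        System.out.println("before fix connected: " + isSingleConnectedTrace(beforeFix)); // false
        System.out.println("after fix connected:  " + isSingleConnectedTrace(afterFix));  // true
    }
}
```

A check like this could run against spans captured by an in-memory exporter in a test, so the regression is caught without opening Langfuse or Jaeger.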

Priority: High
Type: Bug
Component: Core tracing / SubAgent tool integration
