
fix: restore _current_span in scheduler to fix task failures (issue #…) #380

Open

NexionisJake wants to merge 1 commit into GetBindu:main from NexionisJake:fix/issue-353-current-span

Conversation

@NexionisJake
Contributor

Commit 1cc2a61 changed _TaskOperation to carry trace_id/span_id strings but the in-memory path never restored the
span in the worker, silently dropping trace context and causing every task to fail when the worker tried to access
_current_span.

Changes:

  • base.py: revert _TaskOperation fields from trace_id/span_id back to _current_span: Span; import Span from
    opentelemetry.trace; update docstring to clarify Redis divergence
  • memory_scheduler.py: update _send_operation() to pass _current_span=get_current_span() instead of
    trace_id/span_id; remove unused math import; replace math.inf buffer with 100 to restore backpressure; drop
    get_trace_context import

workers/base.py already handles the None-span case via nullcontext() so no change is needed there. The Redis
scheduler correctly serialises trace_id/span_id strings for cross-process transport (CASE 2).
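The two transport cases can be sketched without the real scheduler classes. The names below (`FakeSpan`, `send_operation_in_memory`, `send_operation_redis`, `run_task`) are illustrative stand-ins, not the actual `_TaskOperation`/`_send_operation` code; only the pattern — live span object in-process, serialised ids cross-process, `nullcontext()` guard in the worker — follows the PR.

```python
from contextlib import nullcontext
from dataclasses import dataclass
from typing import Any, Optional

# Illustrative stand-in for an OpenTelemetry Span; the real fix imports
# Span from opentelemetry.trace.
@dataclass
class FakeSpan:
    trace_id: int
    span_id: int

# CASE 1 (in-memory): producer and worker share one process, so the
# operation dict can carry the live span object directly.
def send_operation_in_memory(current_span: Optional[FakeSpan]) -> dict[str, Any]:
    return {"operation": "run", "_current_span": current_span}

# CASE 2 (Redis): a live span object cannot be JSON-serialised, so only
# hex trace_id/span_id strings cross the process boundary.
def send_operation_redis(current_span: FakeSpan) -> dict[str, Any]:
    return {
        "operation": "run",
        "trace_id": format(current_span.trace_id, "032x"),
        "span_id": format(current_span.span_id, "016x"),
    }

# Worker side: a missing span is tolerated via nullcontext(), mirroring
# the existing guard in workers/base.py.
def run_task(op: dict[str, Any]) -> str:
    span = op.get("_current_span")
    # In the real worker this would be opentelemetry.trace.use_span(span);
    # nullcontext stands in for it here.
    ctx = nullcontext() if span is None else nullcontext(span)
    with ctx:
        return "completed"
```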

Summary

  • Problem: 1cc2a61 refactored _TaskOperation to use serialisable trace_id/span_id strings, but never updated
    InMemoryScheduler to pass a live span — so _current_span was never set in the task dict, the worker received None,
    and every task was silently marked failed
  • Why it matters: 100% of tasks submitted via the A2A protocol failed immediately with no artifacts produced; the
    broad except block in the worker swallowed the error with no visible signal
  • What changed: _TaskOperation carries _current_span: Span again; _send_operation() in InMemoryScheduler now
    calls get_current_span() directly; math.inf buffer replaced with 100 to prevent unbounded memory growth under
    backpressure
  • What did NOT change (scope boundary): RedisScheduler is untouched — it legitimately needs trace_id/span_id
    strings because live Span objects cannot be JSON-serialised across processes; workers/base.py is untouched — it
    already guards against a missing span with nullcontext()
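The buffer change in the summary can be illustrated with `asyncio.Queue`: an effectively infinite buffer (the `math.inf` behaviour) lets memory grow without bound, while `maxsize=100` makes producers block (or `put_nowait()` raise) once the worker falls behind. The queue here is a generic sketch, not the scheduler's actual buffer.

```python
import asyncio

async def demo() -> bool:
    # A bounded queue (maxsize=100) applies backpressure: put_nowait()
    # raises QueueFull once the buffer is full, and an awaited put()
    # would suspend the producer until the worker drains items.
    queue: asyncio.Queue[int] = asyncio.Queue(maxsize=100)
    for i in range(100):
        queue.put_nowait(i)
    try:
        queue.put_nowait(100)  # 101st item exceeds the bound
        return False
    except asyncio.QueueFull:
        return True

# With maxsize=0 (asyncio's analogue of a math.inf buffer) the queue is
# unbounded and a slow worker lets pending operations pile up in memory.
backpressure_applied = asyncio.run(demo())
```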

Change Type (select all that apply)

  • Bug fix
  • Feature
  • Refactor
  • Documentation
  • Security hardening
  • Tests
  • Chore/infra

Scope (select all touched areas)

  • Server / API endpoints
  • Extensions (DID, x402, etc.)
  • Storage backends
  • Scheduler backends
  • Observability / monitoring
  • Authentication / authorization
  • CLI / utilities
  • Tests
  • Documentation
  • CI/CD / infra

Linked Issue/PR

User-Visible / Behavior Changes

Tasks submitted via the A2A message/send endpoint now complete successfully and return artifacts. Previously every task
immediately transitioned to "failed" state with no artifacts.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/credentials handling changed? No
  • New/changed network calls? No
  • Database schema/migration changes? No
  • Authentication/authorization changes? No
  • If any Yes, explain risk + mitigation: N/A

Verification

Environment

  • OS: Linux 6.17.0-19-generic
  • Python version: 3.12.3
  • Storage backend: InMemoryStorage
  • Scheduler backend: InMemoryScheduler

Steps to Test

  1. Start the server: python examples/beginner/echo_simple_agent.py
  2. Submit a task: curl -X POST http://localhost:3773/ -H 'Content-Type: application/json' -d '{"jsonrpc":"2.0","id":"<uuid>","method":"message/send","params":{"configuration":{"acceptedOutputModes":["text"]},"message":{"messageId":"<uuid>","contextId":"<uuid>","taskId":"<uuid>","kind":"message","role":"user","parts":[{"kind":"text","text":"Hello"}]}}}'
  3. Poll the task: curl -X POST http://localhost:3773/ -H 'Content-Type: application/json' -d '{"jsonrpc":"2.0","id":"<uuid>","method":"tasks/get","params":{"taskId":"<task-uuid>"}}'

Expected Behavior

  • Task transitions: submitted → working → completed
  • Response includes a non-empty artifacts array

Actual Behavior (before fix)

  • Task transitions immediately to failed
  • No artifacts field in the response
  • The worker's KeyError on _current_span is swallowed by the broad except block, so nothing surfaces in the logs
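The silent-failure mode can be reproduced in miniature. The worker loop below is a hypothetical sketch, not the real workers/base.py, but it shows how a broad `except Exception` turns a missing `_current_span` key into a quiet "failed" state with no traceback reaching the caller.

```python
import logging

def process(task: dict) -> str:
    """Hypothetical worker step that assumes the scheduler set the span key."""
    span = task["_current_span"]  # raises KeyError if the scheduler never set it
    _ = span  # the real worker would enter the span's context here
    return "completed"

def worker_loop(task: dict) -> str:
    try:
        return process(task)
    except Exception:
        # A broad except like this swallows the KeyError: the task is
        # marked failed and only a debug-level log records why.
        logging.debug("task failed", exc_info=True)
        return "failed"

status_without_span = worker_loop({"operation": "run"})
status_with_span = worker_loop({"operation": "run", "_current_span": object()})
```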

Evidence (attach at least one)

  • Failing test before + passing after
  • Test output / logs
  • Screenshot / recording
  • Performance metrics (if relevant)

Full suite after fix: 666 passed, 18 skipped, 0 failures

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

[Bug]: Document Processing Failures
