Fix/document processing failures 353 by Co-vengers · Pull Request #354 · GetBindu/Bindu

Co-vengers · 2026-03-10T17:01:48Z

Fix: Document Processing Failures (#353)

Branch: fix/document-processing-failures-353
Base: main

Problem

All tasks submitted via the A2A protocol (message/send) failed immediately without executing the agent. The tasks/get response returned tasks in failed state with no artifacts field.

Root Cause

Commit 1cc2a61 (fix(scheduler): resolve anio buffer deadlock, cpu burn loop, and trace serialization) refactored the scheduler to serialize OpenTelemetry trace context as primitive trace_id/span_id strings instead of passing a live Span object. However, the worker (bindu/server/workers/base.py) was not updated in that commit and still accessed task_operation["_current_span"].

This caused a KeyError on every task operation, which the worker's broad exception handler caught and used to mark the task as failed — before any agent logic ran.

message/send → task submitted ✅ → scheduled ✅ → worker KeyError ❌ → task failed (no artifacts)
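
The failure mode above can be reproduced in isolation. The following is a minimal sketch, not the actual Bindu worker: `handle_task_operation`, `fake_agent`, and the task dictionary layout are hypothetical names chosen for illustration. It shows how a broad exception handler turns the missing `_current_span` key into a failed task before the agent is ever called.

```python
from typing import Any, Callable, Dict, List


def handle_task_operation(
    task_operation: Dict[str, Any],
    run_agent: Callable[[Any], Any],
) -> Dict[str, Any]:
    """Hypothetical worker step that mirrors the bug described above."""
    task: Dict[str, Any] = {"state": "working", "artifacts": None, "error": None}
    try:
        # The scheduler no longer sends this key, so the lookup raises KeyError.
        span = task_operation["_current_span"]
        task["artifacts"] = run_agent(task_operation["params"])
        task["state"] = "completed"
    except Exception as exc:
        # The broad handler hides the KeyError and only records a failed task.
        task["state"] = "failed"
        task["error"] = repr(exc)
    return task


agent_calls: List[Any] = []


def fake_agent(params: Any) -> List[str]:
    """Stand-in for agent logic; records that it was invoked."""
    agent_calls.append(params)
    return ["summary"]


# Payload shaped like the post-refactor scheduler output (IDs are example values).
result = handle_task_operation(
    {
        "trace_id": "0af7651916cd43dd8448eb211c80319c",
        "span_id": "b7ad6b7169203331",
        "params": {},
    },
    fake_agent,
)
print(result["state"], result["error"], agent_calls)
```

The task ends up failed with no artifacts, and `agent_calls` stays empty, which matches the symptom reported in #353.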

Issues Resolved

Issue 1: Trace Context Mismatch (CRITICAL)

The scheduler sent {trace_id: "...", span_id: "..."} but the worker expected {_current_span: <Span>}.

Fix: Updated bindu/server/workers/base.py to reconstruct a NonRecordingSpan from the serialized trace_id/span_id strings. Added a _reconstruct_span() helper that:

- Parses hex-encoded trace/span IDs into a SpanContext
- Wraps it in a NonRecordingSpan for trace correlation
- Falls back to an invalid span context if IDs are missing or malformed
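
The three steps above could look roughly like the sketch below. It is not the code from this PR: the function names `parse_trace_ids` and `reconstruct_span`, the choice of `is_remote=True`, and the sampled trace flag are assumptions made for illustration. The OpenTelemetry import is deferred into the function so that the parsing helper can be exercised without the library installed.

```python
from typing import Optional, Tuple


def parse_trace_ids(
    trace_id: Optional[str], span_id: Optional[str]
) -> Optional[Tuple[int, int]]:
    """Parse hex-encoded IDs into integers; return None if they are unusable."""
    if not trace_id or not span_id:
        return None
    try:
        trace_int = int(trace_id, 16)
        span_int = int(span_id, 16)
    except (TypeError, ValueError):
        return None
    # Zero (or negative) IDs are not valid trace or span identifiers.
    if trace_int <= 0 or span_int <= 0:
        return None
    # Trace IDs are 128-bit and span IDs are 64-bit.
    if trace_int >= 1 << 128 or span_int >= 1 << 64:
        return None
    return trace_int, span_int


def reconstruct_span(trace_id: Optional[str], span_id: Optional[str]):
    """Build a NonRecordingSpan for correlation, or fall back to an invalid context."""
    # Imported lazily so parse_trace_ids works without OpenTelemetry installed.
    from opentelemetry.trace import (
        INVALID_SPAN_CONTEXT,
        NonRecordingSpan,
        SpanContext,
        TraceFlags,
    )

    ids = parse_trace_ids(trace_id, span_id)
    if ids is None:
        return NonRecordingSpan(INVALID_SPAN_CONTEXT)
    context = SpanContext(
        trace_id=ids[0],
        span_id=ids[1],
        is_remote=True,  # assumption: the context crossed the scheduler boundary
        trace_flags=TraceFlags(TraceFlags.SAMPLED),  # assumption: treat as sampled
    )
    return NonRecordingSpan(context)
```

Keeping the validation in a separate pure function makes the fallback path easy to unit test without stubbing OpenTelemetry.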

Issue 2: Missing `artifacts` in Response (CONSEQUENCE)

tasks/get returned tasks without the artifacts field because tasks never reached the completed state — they crashed before agent execution.

Fix: Resolved automatically by fixing Issue 1. Once tasks execute successfully, ManifestWorker._handle_terminal_state() generates artifacts via build_artifacts() and persists them with update_task().

Issue 3: Unbounded Scheduler Buffer (MINOR)

The InMemoryScheduler used math.inf as the anyio stream buffer size, which could accumulate tasks without backpressure during failures.

Fix: Replaced math.inf with a bounded buffer of 100 items. This preserves the deadlock fix, because tasks can still be enqueued before the worker loop is ready, while preventing unbounded memory growth.
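
To illustrate the difference a bounded buffer makes, here is a small sketch that uses the standard library's `asyncio.Queue` as a stand-in for the anyio memory object stream used by InMemoryScheduler. The buffer size of 3 and the `demo` function are illustrative assumptions; the PR itself uses 100.

```python
import asyncio
from typing import Tuple

BUFFER_SIZE = 3  # kept small for the demonstration; the PR uses 100


async def demo() -> Tuple[int, int]:
    """Enqueue more items than the buffer holds while no consumer is running."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=BUFFER_SIZE)
    accepted = 0
    rejected = 0
    for task_id in range(BUFFER_SIZE + 2):
        try:
            # Succeeds without a consumer as long as the buffer has room,
            # which is the property the deadlock fix relies on.
            queue.put_nowait(task_id)
            accepted += 1
        except asyncio.QueueFull:
            # A bounded buffer pushes back instead of growing without limit.
            rejected += 1
    return accepted, rejected


accepted, rejected = asyncio.run(demo())
print(f"accepted={accepted} rejected={rejected}")
```

With an unbounded buffer every put would succeed, so a stalled worker would let pending tasks pile up in memory indefinitely.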

Files Changed

| File | Change |
| --- | --- |
| `bindu/server/workers/base.py` | Added `_reconstruct_span()` helper; updated `_handle_task_operation()` to use `trace_id`/`span_id` |
| `bindu/server/scheduler/memory_scheduler.py` | Replaced `math.inf` buffer with bounded buffer (100) |
| `tests/conftest.py` | Added `SpanContext`, `TraceFlags`, `NonRecordingSpan`, `INVALID_SPAN_CONTEXT` stubs; registered `opentelemetry.trace.span` submodule |
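
For readers unfamiliar with the stub registration mentioned for `tests/conftest.py`, the general technique is to place a fake module in `sys.modules` before the code under test imports it. The sketch below is an assumption about how that might be done, not the PR's actual conftest. The helper `install_stub_module` is hypothetical, and the package name `demo_otel` is a stand-in so the example cannot shadow a real OpenTelemetry install.

```python
import sys
import types
from unittest.mock import MagicMock


def install_stub_module(name: str, **attrs: object) -> types.ModuleType:
    """Register a stub module, plus any missing parent packages, in sys.modules."""
    parent = None
    module = None
    qualified = ""
    for part in name.split("."):
        qualified = f"{qualified}.{part}" if qualified else part
        module = sys.modules.get(qualified)
        if module is None:
            module = types.ModuleType(qualified)
            sys.modules[qualified] = module
        if parent is not None:
            # Expose the child as an attribute so dotted access also works.
            setattr(parent, part, module)
        parent = module
    # After the loop, `module` is the leaf; attach the stubbed names to it.
    for key, value in attrs.items():
        setattr(module, key, value)
    return module


install_stub_module(
    "demo_otel.trace.span",
    NonRecordingSpan=MagicMock(name="NonRecordingSpan"),
    INVALID_SPAN_CONTEXT=object(),
)

# The import now resolves from sys.modules without a real package on disk.
from demo_otel.trace.span import NonRecordingSpan  # noqa: E402
```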

Verification

All 666 unit tests pass with 0 failures:

================= 666 passed, 18 skipped, 77 warnings in 5.99s =================

After this fix, the following flow works end-to-end:

# 1. Submit document for analysis
curl -X POST http://localhost:3773/ \
  -H 'Content-Type: application/json' \
  -d '{
    "jsonrpc": "2.0",
    "id": "test-001",
    "method": "message/send",
    "params": {
      "message": {
        "messageId": "msg-001", "contextId": "ctx-001", "taskId": "task-001",
        "kind": "message", "role": "user",
        "parts": [
          {"kind": "text", "text": "Analyze the uploaded document and summarize."},
          {"kind": "file", "text": "paper.pdf", "file": {"name": "paper.pdf", "mimeType": "application/pdf", "bytes": "<base64>"}}
        ]
      }
    }
  }'

# 2. Poll task status — should reach "completed" with artifacts
curl -X POST http://localhost:3773/ \
  -H 'Content-Type: application/json' \
  -d '{
    "jsonrpc": "2.0",
    "id": "test-002",
    "method": "tasks/get",
    "params": {"taskId": "task-001"}
  }'
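
The polling step can also be scripted. The sketch below uses only the Python standard library and mirrors the tasks/get request shown above. The response shape (`result.status.state`), the set of terminal state names, and the helper names `build_request`, `is_terminal`, and `poll_task` are assumptions for illustration and should be checked against the actual server responses.

```python
import json
import time
import urllib.request
from typing import Any, Dict

# Assumed terminal state names; adjust to whatever the server reports.
TERMINAL_STATES = {"completed", "failed", "canceled"}


def build_request(method: str, params: Dict[str, Any], request_id: str) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request envelope."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def is_terminal(response: Dict[str, Any]) -> bool:
    """Return True if the response reports a terminal task state (assumed layout)."""
    result = response.get("result") or {}
    status = result.get("status") or {}
    return status.get("state") in TERMINAL_STATES


def poll_task(
    url: str, task_id: str, interval: float = 1.0, max_attempts: int = 30
) -> Dict[str, Any]:
    """Call tasks/get until the task reaches a terminal state or attempts run out."""
    body = json.dumps(build_request("tasks/get", {"taskId": task_id}, "poll")).encode()
    for _ in range(max_attempts):
        request = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(request) as response:
            payload = json.load(response)
        if is_terminal(payload):
            return payload
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} attempts")
```

A call such as `poll_task("http://localhost:3773/", "task-001")` would then return the final response, in which the artifacts field can be inspected.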

…Bindu#353) Worker accessed task_operation["_current_span"] but scheduler now sends primitive trace_id/span_id strings. Add _reconstruct_span() helper to rebuild a NonRecordingSpan from hex-encoded IDs with graceful fallback.

Co-vengers added 3 commits March 10, 2026 22:07

fix(scheduler): replace unbounded stream buffer with bounded limit

5264616

Replace math.inf buffer size with a constant of 100 to prevent unbounded memory growth while still allowing task enqueue before the worker loop is ready.

test: add opentelemetry.trace.span stubs for NonRecordingSpan imports

7f08f05

Add SpanContext, TraceFlags, NonRecordingSpan, and INVALID_SPAN_CONTEXT mocks. Register opentelemetry.trace.span submodule so worker imports resolve in the test environment.
