[Bug]: Document Processing Failures #353

@Co-vengers

Description

Issue Analysis: Document Processing Failures on main

Overview

The main branch fails to process documents submitted via the A2A protocol (message/send). Tasks immediately transition to failed state without executing the agent, and the tasks/get response lacks the artifacts field. The feature/document-analyzer branch does not have these issues.


Commit Origin

Reference commit: 6aa857f (docs(example): update document-analyzer README with file text field)

All issues arose after commit 6aa857f. That commit (on the feature/document-analyzer branch) only touched a README. The breaking changes were introduced by two subsequent commits merged into main from separate branches that forked from the same parent (700d111):

| Commit | Date | Description | Issues Introduced |
| --- | --- | --- | --- |
| 1cc2a61 | Mar 6, 2026 | fix(scheduler): resolve anio buffer deadlock, cpu burn loop, and trace serialization | Issue 1 (trace context mismatch) and Issue 4 (unbounded buffer) |
| a6f2206 | Mar 7, 2026 | refactor(storage): harden memory layer, fix OOM risks, and optimize database indexes | Storage API changes (additional offset param, interface changes) |
| 16f1353 | Mar 8, 2026 | style: apply consistent formatting and add comprehensive docstrings | Docstring-only follow-up to 1cc2a61 |

The critical breaking commit is 1cc2a61. It changed the scheduler's _TaskOperation TypedDict from _current_span: Span to trace_id: str | None / span_id: str | None, but did not update the worker (bindu/server/workers/base.py) which still expects _current_span. This half-completed refactor causes every task to crash.

            700d111 (common ancestor)
           /       \
  6aa857f          1cc2a61 ← scheduler trace refactor (BROKE worker contract)
  (feature/        a6f2206 ← storage refactor
   document-       16f1353 ← formatting follow-up
   analyzer)          |
                   6d189cb (HEAD of main)

Issues

1. Trace Context Mismatch — Worker Crashes on Every Task (CRITICAL)

Introduced by: 1cc2a61 (fix(scheduler): resolve anio buffer deadlock, cpu burn loop, and trace serialization)

Impact: All tasks fail immediately. No documents are ever processed.

The scheduler and worker have an incompatible interface for passing OpenTelemetry trace context:

| Component | File | Sends/Expects |
| --- | --- | --- |
| Scheduler base type | bindu/server/scheduler/base.py (L67–76) | trace_id: str, span_id: str |
| InMemoryScheduler | bindu/server/scheduler/memory_scheduler.py (L68–72) | Sends trace_id/span_id strings |
| Worker | bindu/server/workers/base.py (L130) | Expects task_operation["_current_span"] (Span object) |

Commit 1cc2a61 changed _TaskOperation and InMemoryScheduler to use primitive trace_id/span_id strings but did not update the worker (bindu/server/workers/base.py was not in the commit's changeset). The worker still calls use_span(task_operation["_current_span"]), which raises a KeyError on every task; the error is swallowed by a broad except clause that marks the task as failed.
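The failure mode can be reproduced in isolation with a minimal stand-in (plain dicts in place of the real _TaskOperation TypedDict; the function and field names are illustrative, not the actual bindu code):

```python
def handle_task_operation(task_operation: dict) -> str:
    """Simplified stand-in for the worker's task-operation handler."""
    try:
        # Worker's old expectation: a live Span object under "_current_span".
        span = task_operation["_current_span"]  # KeyError: key no longer sent
        # ... agent execution would run here under use_span(span) ...
        return "completed"
    except Exception:
        # Broad except clause: the KeyError is swallowed and the task is
        # marked failed before the agent ever runs.
        return "failed"

# What the post-1cc2a61 scheduler actually sends:
operation = {"operation": "run", "trace_id": "abc123", "span_id": "def456"}
print(handle_task_operation(operation))  # prints "failed"
```

Because the KeyError is indistinguishable from a genuine agent failure under the broad except, the crash surfaces only as a "failed" task state.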

Fix required: Either:

  • (A) Update base.py worker to reconstruct a span from trace_id/span_id strings, or
  • (B) Revert the scheduler to pass the live _current_span Span object (matching feature/document-analyzer). This involves:
    • bindu/server/scheduler/base.py: Change _TaskOperation fields from trace_id/span_id back to _current_span: Span
    • bindu/server/scheduler/memory_scheduler.py: Remove _get_trace_context() helper; pass get_current_span() directly
    • Same for bindu/server/scheduler/redis_scheduler.py

2. Response Missing artifacts Field (CONSEQUENCE OF 1)

Impact: tasks/get returns a task with no artifacts.

This is not a separate bug — it is a direct consequence of Issue 1. The processing flow is:

message/send → task created (state: "submitted", no artifacts)
            → scheduled to worker
            → worker crashes on _current_span KeyError
            → task marked "failed" (no artifacts generated)
tasks/get   → returns failed task without artifacts

Artifacts are only generated in ManifestWorker._handle_terminal_state() when state is "completed". Since the worker never reaches agent execution, no artifacts are ever created.

Fix required: Resolving Issue 1 will fix this — once tasks execute successfully, build_artifacts() will produce artifacts and update_task() will persist them.

3. Frontend Does Not Pass File Parts to Agent Messages (MODERATE)

Impact: Frontend file uploads are constructed but never reach the agent due to Issue 1. If Issue 1 is fixed, this path works correctly on main.

On main, frontend/src/lib/utils/agentMessageHandler.ts accepts a files parameter, builds FilePart objects with the A2A-required text field, and includes them in the message payload. The frontend/src/lib/server/endpoints/bindu/types.ts FilePart interface also requires text: string.

On feature/document-analyzer, the frontend file upload code is entirely removed — the files parameter is dropped and messages only contain TextPart. The FilePart type also drops the text field.

No fix required for backend processing — the curl-based API path works correctly for file uploads. The frontend code on main is structurally correct but untestable while Issue 1 exists.

4. InMemoryScheduler Uses Unbounded Buffer (MINOR)

Introduced by: 1cc2a61 (fix(scheduler): resolve anio buffer deadlock, cpu burn loop, and trace serialization)

File: bindu/server/scheduler/memory_scheduler.py (L53–55)

On main, the anyio memory object stream is created with math.inf buffer:

anyio.create_memory_object_stream[TaskOperation](math.inf)

On feature/document-analyzer, it uses the default (unbuffered):

anyio.create_memory_object_stream[TaskOperation]()

The math.inf buffer was added to prevent a deadlock where the API server hangs if no worker is immediately ready to receive. However, an unbounded buffer can silently accumulate tasks during failures without backpressure.

Fix required: Evaluate whether a bounded buffer (e.g., 100) is more appropriate than math.inf, or keep the unbuffered default if worker startup is guaranteed to happen before task submission.


Root Cause Chain

curl message/send (with file parts)
  ↓
Task submitted to storage (state: "submitted")              ✅
  ↓
Task scheduled via InMemoryScheduler.run_task()             ✅
  sends: {operation: "run", params: ..., trace_id: "...", span_id: "..."}
  ↓
Worker._handle_task_operation() receives TaskOperation      ✅
  ↓
Worker accesses task_operation["_current_span"]             ❌ KeyError
  ↓
Exception caught → storage.update_task(state="failed")      ← task never runs
  ↓
tasks/get returns: {state: "failed", NO artifacts}

Files Requiring Changes

| File | Change | Priority |
| --- | --- | --- |
| bindu/server/scheduler/base.py | Fix _TaskOperation type to match worker expectations | Critical |
| bindu/server/scheduler/memory_scheduler.py | Fix trace context passing to match _TaskOperation type | Critical |
| bindu/server/scheduler/redis_scheduler.py | Fix trace context passing to match _TaskOperation type | Critical |
| bindu/server/workers/base.py | Ensure _handle_task_operation matches scheduler's task operation format | Critical |

Verification

After fixing, the following should work:

# 1. Send document
curl -X POST http://localhost:3773/ \
  -H 'Content-Type: application/json' \
  -d '{
    "jsonrpc": "2.0",
    "id": "test-001",
    "method": "message/send",
    "params": {
      "message": {
        "messageId": "msg-001",
        "contextId": "ctx-001",
        "taskId": "task-001",
        "kind": "message",
        "role": "user",
        "parts": [
          {"kind": "text", "text": "Analyze this document"},
          {"kind": "file", "text": "paper.pdf", "file": {"name": "paper.pdf", "mimeType": "application/pdf", "bytes": "<base64>"}}
        ]
      }
    }
  }'
# Expected: task in "submitted" state

# 2. Check task status (after processing)
curl -X POST http://localhost:3773/ \
  -H 'Content-Type: application/json' \
  -d '{
    "jsonrpc": "2.0",
    "id": "test-002",
    "method": "tasks/get",
    "params": {"taskId": "task-001"}
  }'
# Expected: task in "completed" state WITH artifacts array
