Issue Analysis: Document Processing Failures on main
Overview
The main branch fails to process documents submitted via the A2A protocol (`message/send`). Tasks immediately transition to the failed state without executing the agent, and the `tasks/get` response lacks the `artifacts` field. The feature/document-analyzer branch does not have these issues.
Commit Origin
Reference commit: 6aa857f — docs(example): update document-analyzer README with file text field
All issues arose after commit 6aa857f. That commit (on the feature/document-analyzer branch) only touched a README. The breaking changes were introduced by two subsequent commits merged into main from separate branches that forked from the same parent (700d111):
| Commit | Date | Description | Issues Introduced |
|---|---|---|---|
| `1cc2a61` | Mar 6, 2026 | fix(scheduler): resolve anio buffer deadlock, cpu burn loop, and trace serialization | Issue 1 (trace context mismatch) and Issue 4 (unbounded buffer) |
| `a6f2206` | Mar 7, 2026 | refactor(storage): harden memory layer, fix OOM risks, and optimize database indexes | Storage API changes (additional `offset` param, interface changes) |
| `16f1353` | Mar 8, 2026 | style: apply consistent formatting and add comprehensive docstrings | Docstring-only follow-up to `1cc2a61` |
The critical breaking commit is 1cc2a61. It changed the scheduler's _TaskOperation TypedDict from _current_span: Span to trace_id: str | None / span_id: str | None, but did not update the worker (bindu/server/workers/base.py) which still expects _current_span. This half-completed refactor causes every task to crash.
```
            700d111 (common ancestor)
           /       \
   6aa857f          1cc2a61  ← scheduler trace refactor (BROKE worker contract)
  (feature/         a6f2206  ← storage refactor
   document-        16f1353  ← formatting follow-up
   analyzer)           |
                    6d189cb  (HEAD of main)
```
Issues
1. Trace Context Mismatch — Worker Crashes on Every Task (CRITICAL)
Introduced by: 1cc2a61 (fix(scheduler): resolve anio buffer deadlock, cpu burn loop, and trace serialization)
Impact: All tasks fail immediately. No documents are ever processed.
The scheduler and worker have an incompatible interface for passing OpenTelemetry trace context:
| Component | File | Sends/Expects |
|---|---|---|
| Scheduler base type | `bindu/server/scheduler/base.py` (L67–76) | `trace_id: str`, `span_id: str` |
| InMemoryScheduler | `bindu/server/scheduler/memory_scheduler.py` (L68–72) | Sends `trace_id`/`span_id` strings |
| Worker | `bindu/server/workers/base.py` (L130) | Expects `task_operation["_current_span"]` (Span object) |
Commit 1cc2a61 changed _TaskOperation and InMemoryScheduler to use primitive trace_id/span_id strings, but did not update the worker (bindu/server/workers/base.py was not in the commit's changeset). The worker still calls use_span(task_operation["_current_span"]) which raises a KeyError on every task, caught by the broad except clause which marks the task as failed.
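The failure mode can be reproduced in isolation with a minimal stdlib-only sketch (the dict shape and handler below are illustrative, not the actual bindu code):

```python
# Shape sent by the post-1cc2a61 scheduler: primitive trace fields only.
task_operation = {
    "operation": "run",
    "params": {"task_id": "task-001"},
    "trace_id": "0af7651916cd43dd8448eb211c80319c",
    "span_id": "b7ad6b7169203331",
}


def handle_task_operation(op: dict) -> str:
    """Mimics the worker's broad try/except around task handling."""
    try:
        span = op["_current_span"]  # old key: raises KeyError on every task
        # ... agent execution would happen here ...
        return "completed"
    except Exception:
        # The broad except swallows the KeyError and marks the task failed.
        return "failed"


print(handle_task_operation(task_operation))  # prints "failed"
```

Because the KeyError fires before the agent is ever invoked, the observed symptom is identical for every payload: immediate transition to "failed".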
Fix required: Either:
- (A) Update the worker (`bindu/server/workers/base.py`) to reconstruct a span from the `trace_id`/`span_id` strings, or
- (B) Revert the scheduler to pass the live `_current_span` Span object (matching feature/document-analyzer). This involves:
  - `bindu/server/scheduler/base.py`: change the `_TaskOperation` fields from `trace_id`/`span_id` back to `_current_span: Span`
  - `bindu/server/scheduler/memory_scheduler.py`: remove the `_get_trace_context()` helper; pass `get_current_span()` directly
  - The same change in `bindu/server/scheduler/redis_scheduler.py`
2. Response Missing artifacts Field (CONSEQUENCE OF 1)
Impact: tasks/get returns a task with no artifacts.
This is not a separate bug — it is a direct consequence of Issue 1. The processing flow is:
```
message/send → task created (state: "submitted", no artifacts)
             → scheduled to worker
             → worker crashes on _current_span KeyError
             → task marked "failed" (no artifacts generated)
tasks/get    → returns failed task without artifacts
```
Artifacts are only generated in ManifestWorker._handle_terminal_state() when state is "completed". Since the worker never reaches agent execution, no artifacts are ever created.
Fix required: Resolving Issue 1 will fix this — once tasks execute successfully, build_artifacts() will produce artifacts and update_task() will persist them.
3. Frontend Does Not Pass File Parts to Agent Messages (MODERATE)
Impact: Frontend file uploads are constructed but never reach the agent due to Issue 1. If Issue 1 is fixed, this path works correctly on main.
On main, frontend/src/lib/utils/agentMessageHandler.ts accepts a files parameter, builds FilePart objects with the A2A-required text field, and includes them in the message payload. The frontend/src/lib/server/endpoints/bindu/types.ts FilePart interface also requires text: string.
On feature/document-analyzer, the frontend file upload code is entirely removed — the files parameter is dropped and messages only contain TextPart. The FilePart type also drops the text field.
No fix required for backend processing — the curl-based API path works correctly for file uploads. The frontend code on main is structurally correct but untestable while Issue 1 exists.
4. InMemoryScheduler Uses Unbounded Buffer (MINOR)
Introduced by: 1cc2a61 (fix(scheduler): resolve anio buffer deadlock, cpu burn loop, and trace serialization)
File: bindu/server/scheduler/memory_scheduler.py (L53–55)
On main, the anyio memory object stream is created with a `math.inf` buffer:

```python
anyio.create_memory_object_stream[TaskOperation](math.inf)
```

On feature/document-analyzer, it uses the default (unbuffered):

```python
anyio.create_memory_object_stream[TaskOperation]()
```

The `math.inf` buffer was added to prevent a deadlock where the API server hangs if no worker is immediately ready to receive. However, an unbounded buffer can silently accumulate tasks during failures, with no backpressure.
Fix required: Evaluate whether a bounded buffer (e.g., 100) is more appropriate than `math.inf`, or keep the default (unbuffered) if worker startup is guaranteed before task submission.
Root Cause Chain
```
curl message/send (with file parts)
        ↓
Task submitted to storage (state: "submitted") ✅
        ↓
Task scheduled via InMemoryScheduler.run_task() ✅
    sends: {operation: "run", params: ..., trace_id: "...", span_id: "..."}
        ↓
Worker._handle_task_operation() receives TaskOperation ✅
        ↓
Worker accesses task_operation["_current_span"] ❌ KeyError
        ↓
Exception caught → storage.update_task(state="failed") ← task never runs
        ↓
tasks/get returns: {state: "failed", NO artifacts}
```
Files Requiring Changes
| File | Change | Priority |
|---|---|---|
| `bindu/server/scheduler/base.py` | Fix `_TaskOperation` type to match worker expectations | Critical |
| `bindu/server/scheduler/memory_scheduler.py` | Fix trace context passing to match the `_TaskOperation` type | Critical |
| `bindu/server/scheduler/redis_scheduler.py` | Fix trace context passing to match the `_TaskOperation` type | Critical |
| `bindu/server/workers/base.py` | Ensure `_handle_task_operation` matches the scheduler's task operation format | Critical |
Verification
After fixing, the following should work:
```shell
# 1. Send document
curl -X POST http://localhost:3773/ \
  -H 'Content-Type: application/json' \
  -d '{
    "jsonrpc": "2.0",
    "id": "test-001",
    "method": "message/send",
    "params": {
      "message": {
        "messageId": "msg-001",
        "contextId": "ctx-001",
        "taskId": "task-001",
        "kind": "message",
        "role": "user",
        "parts": [
          {"kind": "text", "text": "Analyze this document"},
          {"kind": "file", "text": "paper.pdf", "file": {"name": "paper.pdf", "mimeType": "application/pdf", "bytes": "<base64>"}}
        ]
      }
    }
  }'
# Expected: task in "submitted" state

# 2. Check task status (after processing)
curl -X POST http://localhost:3773/ \
  -H 'Content-Type: application/json' \
  -d '{
    "jsonrpc": "2.0",
    "id": "test-002",
    "method": "tasks/get",
    "params": {"taskId": "task-001"}
  }'
# Expected: task in "completed" state WITH artifacts array
```