Diagnostics: classify provider runner failures #67

Open
gaofeng21cn wants to merge 1 commit into ResearAI:main from gaofeng21cn:codex/upstream-runner-diagnostics

@gaofeng21cn
Contributor

Problem

Runner failure diagnostics already surface some deterministic local errors, but they do not clearly distinguish provider account/quota blockers from transient upstream provider errors. Retry-exhausted events also lack a structured diagnosis payload, which makes follow-up by the daemon and doctor harder.

Solution

  • Add generic Codex provider diagnostics for account/quota/billing/credential blockers and transient upstream provider failures.
  • Treat provider account blockers as non-retryable so the daemon stops blind retries and surfaces actionable guidance.
  • Preserve retry/backoff behavior for transient upstream provider failures, then park the quest with an external-provider continuation reason after retry exhaustion.
  • Attach structured diagnosis payloads to runner.turn_retry_exhausted events and propagate diagnosis code/guidance to turn errors.
  • Extend doctor coverage for provider account failures with problem/why/fix rendering.
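
The classification and retry flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Diagnosis` dataclass, the `classify` and `next_action` functions, the diagnosis codes, the regex patterns, and the backoff formula are all hypothetical. Only the `runner.turn_retry_exhausted` event name and the five-attempt limit (taken from the test name `test_daemon_retry_exhausts_after_five_attempts`) come from the description.

```python
from __future__ import annotations

import re
from dataclasses import asdict, dataclass


# Hypothetical structure; the real diagnosis payload may carry other fields.
@dataclass(frozen=True)
class Diagnosis:
    code: str
    retryable: bool
    guidance: str


# Assumed message patterns; real provider error text may differ.
_ACCOUNT_BLOCKER = re.compile(
    r"quota|billing|insufficient credit|invalid api key|unauthorized", re.I
)
_TRANSIENT_UPSTREAM = re.compile(
    r"\b(502|503|504)\b|overloaded|timed? ?out|connection reset", re.I
)

MAX_ATTEMPTS = 5  # taken from the test name; the real limit may be configurable


def classify(message: str) -> Diagnosis | None:
    """Map a raw runner error message to a diagnosis, or None if unknown."""
    if _ACCOUNT_BLOCKER.search(message):
        return Diagnosis(
            "provider_account_blocker",
            False,
            "Check the provider account, quota, billing, and credentials.",
        )
    if _TRANSIENT_UPSTREAM.search(message):
        return Diagnosis(
            "provider_transient_upstream",
            True,
            "The provider is failing upstream; retry later.",
        )
    return None


def next_action(message: str, attempt: int) -> dict:
    """Decide what the daemon does after failed attempt number `attempt`."""
    diagnosis = classify(message)
    # Account blockers stop immediately instead of retrying blindly.
    if diagnosis is not None and not diagnosis.retryable:
        return {"action": "stop", "diagnosis": asdict(diagnosis)}
    # Everything else keeps the retry/backoff path (assumed exponential, capped).
    if attempt < MAX_ATTEMPTS:
        return {"action": "retry", "delay_seconds": min(2 ** attempt, 60)}
    # Retries exhausted: park and attach the structured diagnosis to the event.
    return {
        "action": "park",
        "event": "runner.turn_retry_exhausted",
        "attempts": attempt,
        "diagnosis": asdict(diagnosis) if diagnosis else None,
    }
```

In this sketch an unclassified failure falls through to the retry path, which keeps existing retry behavior unchanged for errors the classifier does not recognize. Whether the PR makes the same choice is an assumption.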

MDS provenance

Adapted from the generic runner-diagnostics parts of MedDeepScientist commits bdd6f6f and 2f584e7, and from the generic diagnostic/doctor pieces of 838f87e. MedDeepScientist, MAS, medical-fork, manuscript-readiness, and publication-authority semantics were intentionally excluded.

Tests

  • python -m py_compile src/deepscientist/diagnostics/runner_failures.py src/deepscientist/daemon/app.py src/deepscientist/doctor.py
  • git diff --check
  • uv run --with pytest pytest -q tests/test_runner_failure_diagnostics.py tests/test_daemon_api.py::test_daemon_retry_exhausts_after_five_attempts tests/test_daemon_api.py::test_daemon_retry_exhaustion_records_provider_diagnosis_payload tests/test_daemon_api.py::test_daemon_stops_retry_for_provider_account_blocker tests/test_daemon_api.py::test_daemon_skips_retry_for_non_retryable_minimax_protocol_error tests/test_daemon_api.py::test_daemon_skips_retry_for_unknown_binary_attachment_extension_error tests/test_doctor.py::test_doctor_reports_recent_runtime_failure_with_problem_why_fix tests/test_doctor.py::test_doctor_reports_provider_account_runtime_failure_with_problem_why_fix tests/test_doctor.py::test_doctor_surfaces_probe_diagnosis_for_known_tool_argument_error
