switch_context should retry only transient errors, not all anyhow errors #78

@ndenev

Summary

switch_context retries on all errors, including permanent failures, because its retry predicate is unconditional.

Affected code

  • src/kubernetes/client.rs (switch_context uses retry_with_backoff with |_: &anyhow::Error| true)

Problem

The current behavior retries permanent errors (config/auth/validation) the same way as transient network errors. This causes avoidable delay, log noise, and a worse user experience.

Why this matters

  • Context switching in the REPL/CLI cannot fail fast: a permanent error is reported only after the retries have run.
  • Misleading retry logs for non-retryable failures.
  • Unnecessary delay in scripts/automation.

Proposed fix

  1. Add retry classification for the switch_context path:
    • retry only transient failures (timeouts, transport errors, 429/5xx, temporary unavailability)
    • do not retry permanent failures (invalid context, auth/RBAC, deterministic config errors)
  2. Keep exponential backoff for retryable classes.
  3. Improve logging to indicate retryable vs non-retryable decision.
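
A minimal sketch of the classification in step 1, under stated assumptions: `SwitchError`, `RetryDecision`, and `classify` are hypothetical names that do not exist in the repository, and the real code works with `anyhow::Error`, so the actual variants would depend on what `switch_context` can fail with. The example uses only the standard library.

```rust
/// Hypothetical error categories for `switch_context` (assumed, not taken
/// from the repository). Real code would map kube/client errors into these.
#[allow(dead_code)]
#[derive(Debug)]
enum SwitchError {
    Timeout,
    Transport(String),
    /// HTTP status code returned by the API server.
    Http(u16),
    InvalidContext(String),
    Auth(String),
    Config(String),
}

/// Outcome of classifying an error, also usable as a log field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RetryDecision {
    Retry,
    FailFast,
}

/// Retry only transient failures; fail fast on everything else.
fn classify(err: &SwitchError) -> RetryDecision {
    match err {
        // Timeouts and transport errors may succeed on a later attempt.
        SwitchError::Timeout | SwitchError::Transport(_) => RetryDecision::Retry,
        // 429 and 5xx are treated as temporary server-side conditions.
        SwitchError::Http(status) if *status == 429 || (500..=599).contains(status) => {
            RetryDecision::Retry
        }
        // Any other HTTP status (for example 401, 403, 404) is permanent.
        SwitchError::Http(_) => RetryDecision::FailFast,
        // Invalid context, auth/RBAC, and config errors are deterministic.
        SwitchError::InvalidContext(_) | SwitchError::Auth(_) | SwitchError::Config(_) => {
            RetryDecision::FailFast
        }
    }
}
```

If the errors stay wrapped in `anyhow::Error`, the unconditional predicate could be replaced with one that calls `downcast_ref::<SwitchError>()` and retries only when `classify` returns `RetryDecision::Retry`, treating an error it cannot downcast as non-retryable. The decision value can also be logged to cover step 3.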

Tests to add

  1. Permanent error case: verify no retries (or minimal retries as explicitly intended).
  2. Transient error case: verify retries occur and respect max attempts.
  3. Ensure final aggregated error reporting remains clear across multiple contexts.
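
Tests 1 and 2 could be shaped roughly as follows. This is a sketch, not the repository's test code: `Failure`, `is_transient`, `retry_if`, and `attempts_for` are hypothetical stand-ins, and `retry_if` is a simplified substitute for `retry_with_backoff` that omits the backoff delay so the attempt count can be checked without sleeping.

```rust
/// Hypothetical two-way error split used only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Failure {
    Transient,
    Permanent,
}

/// Retry predicate: only transient failures are worth another attempt.
fn is_transient(err: &Failure) -> bool {
    matches!(err, Failure::Transient)
}

/// Simplified stand-in for a retry helper (no backoff delay).
/// Runs `op` at least once and returns the final result together with
/// the number of attempts made.
fn retry_if<T, E>(
    max_attempts: u32,
    should_retry: impl Fn(&E) -> bool,
    mut op: impl FnMut() -> Result<T, E>,
) -> (Result<T, E>, u32) {
    let mut attempts = 0;
    loop {
        attempts += 1;
        match op() {
            Ok(value) => return (Ok(value), attempts),
            // Retry only while attempts remain and the predicate allows it.
            Err(err) if attempts < max_attempts && should_retry(&err) => continue,
            Err(err) => return (Err(err), attempts),
        }
    }
}

/// Counts how many attempts an operation that always fails with `failure` gets.
fn attempts_for(failure: Failure, max_attempts: u32) -> u32 {
    let (result, attempts) =
        retry_if(max_attempts, is_transient, || Err::<(), Failure>(failure));
    debug_assert!(result.is_err());
    attempts
}
```

With these helpers, the permanent case would assert that `attempts_for(Failure::Permanent, 3)` equals 1, and the transient case that `attempts_for(Failure::Transient, 3)` equals 3, which matches the max-attempts limit.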
