Skip to content

Conversation

mldangelo
Copy link
Member

Summary

Remove the retry command and --retry-errors flag functionality that was added in #5647, as it is redundant with existing filtering capabilities.

Problem Analysis

The retry functionality provided two ways to re-run failed tests:

  1. promptfoo retry <evalId> - retries ERROR results from a specific evaluation
  2. promptfoo eval --retry-errors - retries ERROR results from latest evaluation

However, the same practical outcome can be achieved more simply with the existing:

promptfoo eval --filter-errors-only <evalId>

Why the existing approach is better

Feature Retry Commands Existing --filter-errors-only
Database modification ❌ Deletes ERROR results ✅ No database changes
Implementation complexity ❌ High (deletion + metric recalculation) ✅ Low (simple filtering)
Data integrity ❌ Risk of corruption ✅ Preserves historical data
Result tracking ❌ Replaces deleted results ✅ Creates new evaluation
Same practical outcome ✅ Re-runs error tests ✅ Re-runs error tests

Changes Made

  • ✅ Removed src/commands/retry.ts (entire retry command implementation)
  • ✅ Removed --retry-errors flag and logic from src/commands/eval.ts
  • ✅ Removed retry command registration from src/main.ts
  • ✅ Removed retry documentation from CLI docs
  • ✅ Removed retry testing examples from examples/retry-testing/
  • ✅ Reverted schema changes in site/static/config-schema.json
  • ✅ Reverted type changes in src/types/index.ts

Quality Assurance

  • ✅ All linting and formatting checks pass (npm run l && npm run f)
  • ✅ TypeScript compilation successful (npm run tsc)
  • ✅ All existing tests pass (npm test)
  • ✅ No test changes needed (retry functionality had no dedicated tests)

Implications

Users who were relying on the retry commands can achieve the same result with:

Instead of:

promptfoo retry <evalId>
# or
promptfoo eval --retry-errors

Use:

promptfoo eval --filter-errors-only <evalId>

This approach provides the same functionality while being simpler, safer, and more consistent with the existing codebase architecture.

Remove retry command and --retry-errors flag as they are redundant
with existing --filter-errors-only capability.

The retry functionality provided two ways to re-run failed tests:
1. promptfoo retry <evalId>
2. promptfoo eval --retry-errors

However, the same outcome can be achieved more simply with:
promptfoo eval --filter-errors-only <evalId>

The existing filter approach is preferable because:
- Simpler implementation without database manipulation
- No risk of corrupting evaluation data
- Preserves evaluation history
- Same practical result (re-running error tests)

Reverted changes from commit ab6ae2b:
- Removed src/commands/retry.ts
- Removed --retry-errors flag and logic from eval command
- Removed retry command registration from main.ts
- Removed retry documentation
- Removed retry testing examples
- Reverted schema and type changes
Copy link
Contributor

use-tusk bot commented Sep 24, 2025

⏩ No test execution environment matched (5c65cd8) View output ↗


View check history

Commit Status Output Created (UTC)
d46e3dc ⏩ No test execution environment matched Output Sep 24, 2025 1:18AM
5c65cd8 ⏩ No test execution environment matched Output Sep 29, 2025 6:21AM

View output in GitHub ↗

Copy link
Contributor

coderabbitai bot commented Sep 24, 2025

📝 Walkthrough

Walkthrough

This change removes the retry-errors functionality across the codebase. It deletes the retry CLI command (src/commands/retry.ts) and its wiring (src/main.ts), removes the --retry-errors option from eval (src/commands/eval.ts), and updates related schemas/types (site/static/config-schema.json, src/types/index.ts). Documentation is updated to remove the option and “Retry Errors” section (site/docs/usage/command-line.md). The retry-testing example and its files are removed (examples/retry-testing/*). Eval command logic is adjusted to rely on resumeRaw for flow control and option application, replacing prior resumeEval checks.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title "revert: remove redundant retry functionality" is concise, accurately summarizes the main change (removal of the retry command and --retry-errors flag), and clearly communicates the primary intent to reviewers.
Description Check ✅ Passed The PR description provides a clear summary, rationale, affected files, and QA steps that match the raw summary of changes, so it is directly related to and explanatory of the changeset.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/revert-retry-functionality

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 995fde4 and d46e3dc.

📒 Files selected for processing (9)
  • examples/retry-testing/README.md (0 hunks)
  • examples/retry-testing/errorProvider.js (0 hunks)
  • examples/retry-testing/promptfooconfig.yaml (0 hunks)
  • site/docs/usage/command-line.md (0 hunks)
  • site/static/config-schema.json (0 hunks)
  • src/commands/eval.ts (5 hunks)
  • src/commands/retry.ts (0 hunks)
  • src/main.ts (0 hunks)
  • src/types/index.ts (0 hunks)
💤 Files with no reviewable changes (8)
  • site/static/config-schema.json
  • examples/retry-testing/promptfooconfig.yaml
  • site/docs/usage/command-line.md
  • examples/retry-testing/errorProvider.js
  • examples/retry-testing/README.md
  • src/types/index.ts
  • src/commands/retry.ts
  • src/main.ts
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)

Prefer not to introduce new TypeScript types; use existing interfaces whenever possible

**/*.{ts,tsx}: Follow consistent import order (Biome will handle sorting)
Use consistent curly braces for all control statements
Prefer const over let; avoid var
Use object shorthand syntax whenever possible
Use async/await for asynchronous code
Use consistent error handling with proper type checks

Files:

  • src/commands/eval.ts
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/src/pages/**/*.md : Use 'eval' instead of 'evaluation' in all documentation; when referring to command line usage, use 'npx promptfoo eval' rather than 'npx promptfoo evaluation'; maintain consistency with this terminology across all examples, code blocks, and explanations.
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/blog/**/*.md : Use 'eval' instead of 'evaluation' in all documentation; when referring to command line usage, use 'npx promptfoo eval' rather than 'npx promptfoo evaluation'; maintain consistency with this terminology across all examples, code blocks, and explanations.
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/src/pages/**/*.mdx : Use 'eval' instead of 'evaluation' in all documentation; when referring to command line usage, use 'npx promptfoo eval' rather than 'npx promptfoo evaluation'; maintain consistency with this terminology across all examples, code blocks, and explanations.
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/docs/**/*.md : Use 'eval' instead of 'evaluation' in all documentation; when referring to command line usage, use 'npx promptfoo eval' rather than 'npx promptfoo evaluation'; maintain consistency with this terminology across all examples, code blocks, and explanations.
🧬 Code graph analysis (1)
src/commands/eval.ts (1)
src/index.ts (1)
  • loadApiProvider (164-164)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: Redteam (Production API)
  • GitHub Check: Share Test
  • GitHub Check: Build Docs
  • GitHub Check: webui tests
  • GitHub Check: Redteam (Staging API)
  • GitHub Check: Run Integration Tests
  • GitHub Check: Test on Node 22.x and ubuntu-latest
  • GitHub Check: Test on Node 24.x and windows-latest
  • GitHub Check: Test on Node 24.x and ubuntu-latest
  • GitHub Check: Test on Node 22.x and macOS-latest
  • GitHub Check: Test on Node 22.x and windows-latest
  • GitHub Check: Test on Node 20.x and macOS-latest
  • GitHub Check: Test on Node 20.x and windows-latest
  • GitHub Check: Test on Node 20.x and ubuntu-latest
  • GitHub Check: Generate Assets
  • GitHub Check: Build on Node 24.x
  • GitHub Check: Build on Node 20.x
  • GitHub Check: Style Check
  • GitHub Check: Build on Node 22.x
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (6)
src/commands/eval.ts (6)

292-304: Gating test filtering behind non-resume is correct

Skipping filters during resume preserves test indices and result alignment. Looks good.


316-321: Provider filtering correctly skipped during resume

Avoiding provider filtering on resume is consistent with preserving prior run topology. LGTM.


341-348: Grader provider injection conditioned on non-resume is reasonable

Setting defaultTest.options.provider only when not resuming avoids changing persisted behavior. LGTM.


349-355: Merging --var into defaultTest only when not resuming is correct

This prevents altering variable sets for resumed runs. LGTM.


356-358: generateSuggestions gating matches intent

Not enabling suggestions for a resumed run is consistent with reproducibility. LGTM.


699-765: Watch mode disabled during resume is appropriate

Avoids unexpected re-entry into a resumed run and aligns with reproducibility. LGTM.


Comment @coderabbitai help to get the list of available commands and usage tips.

@mldangelo mldangelo requested a review from typpo September 29, 2025 06:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant