feat(leaderboard): ghost performance leaderboard and A/B testing #79
Merged
- GhostLeaderboard: SQLite-backed outcome tracking per ghost profile
- TaskOutcome: records success, latency, token usage, user rating (-1/0/1)
- GhostMetrics: aggregate stats with composite rank_score (60% success, 20% user rating, 20% token efficiency)
- A/B testing: route a configurable fraction of requests to a challenger ghost
- Auto-promotion: recommend promoting the challenger when it outperforms the control by promotion_threshold (default: 10%) over min_samples (default: 50)
- ASCII leaderboard with star ratings and head-to-head comparison
- 'sparks leaderboard [show|compare <a> <b>|reset]' CLI subcommand
- 7 unit tests covering recording, ranking, A/B routing, promotion check
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add TaskOutcome::new() convenience constructor to reduce struct literal verbosity in tests
- Fix compare() output: add column headers (ghost names + separator) so each column is identifiable
- Use successful_tasks in format_row() ("{success}/{total}") to eliminate dead-field warning
- Add four missing tests: ab_route with fraction=1.0, rank_score with zero tokens, reset(), and format_leaderboard with data
- Add [leaderboard] section to config.example.toml with all fields documented
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Description
Adds a performance leaderboard for ghost profiles with SQLite-backed tracking and A/B testing support.
Changes
- src/leaderboard.rs: GhostLeaderboard with outcome recording, rankings, A/B routing, auto-promotion recommendations, and head-to-head comparison
- src/config.rs: LeaderboardConfig with ab_test_ghost, ab_test_fraction, min_samples_for_recommendation, promotion_threshold
- src/main.rs: sparks leaderboard [show|compare <a> <b>|reset]
- config.example.toml: [leaderboard] section
Features
Leaderboard
Ranks all ghost profiles by composite score: 60% success rate + 20% user rating + 20% token efficiency.
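The weighting above can be illustrated with a minimal sketch. This is not the code from src/leaderboard.rs: the field names, the mapping of the -1/0/1 rating onto [0, 1], and the token-efficiency formula (including the TOKEN_SCALE constant) are all assumptions made for illustration. Only the 60/20/20 weights come from this PR.

```rust
/// Hypothetical aggregate stats for one ghost profile.
/// Field names are illustrative and may differ from src/leaderboard.rs.
pub struct GhostMetrics {
    pub total_tasks: u32,
    pub successful_tasks: u32,
    /// Mean of the -1/0/1 user ratings, so it lies in [-1.0, 1.0].
    pub avg_user_rating: f64,
    /// Mean tokens consumed per task.
    pub avg_tokens: f64,
}

/// Assumed scale: at 1,000 tokens per task the efficiency term drops to 0.5.
const TOKEN_SCALE: f64 = 1000.0;

impl GhostMetrics {
    /// Composite score in [0, 1]: 60% success rate, 20% user rating,
    /// 20% token efficiency.
    pub fn rank_score(&self) -> f64 {
        // A ghost with no recorded tasks gets the lowest score.
        if self.total_tasks == 0 {
            return 0.0;
        }
        let success_rate =
            f64::from(self.successful_tasks) / f64::from(self.total_tasks);
        // Assumption: shift the rating from [-1, 1] onto [0, 1].
        let rating = (self.avg_user_rating + 1.0) / 2.0;
        // Assumption: fewer tokens is better; zero tokens yields 1.0
        // without dividing by zero.
        let token_efficiency = 1.0 / (1.0 + self.avg_tokens / TOKEN_SCALE);

        0.6 * success_rate + 0.2 * rating + 0.2 * token_efficiency
    }
}
```

Under these assumptions every term lies in [0, 1], so the score does too, and the zero-token case mentioned in the test list stays well defined.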
A/B Testing
Route ab_test_fraction (default: 10%) of requests to a challenger ghost, then automatically recommend promotion when the challenger outperforms the control by promotion_threshold (default: 10%) over min_samples (default: 50) tasks.
Comparison
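The routing and promotion rule described under A/B Testing might look roughly like the sketch below. The config field names and defaults are taken from this PR; everything else is assumed. In particular, the signatures of ab_route and should_promote are hypothetical, the random roll is passed in by the caller to keep the sketch dependency-free, and "outperforms by promotion_threshold" is read as a relative margin on the rank score, which the PR text does not confirm.

```rust
/// Sketch of the [leaderboard] settings listed in this PR.
/// The field types are assumptions.
#[derive(Debug, Clone)]
pub struct LeaderboardConfig {
    /// Challenger ghost name; None disables A/B testing (assumed).
    pub ab_test_ghost: Option<String>,
    /// Fraction of requests routed to the challenger.
    pub ab_test_fraction: f64,
    /// Samples required before a promotion recommendation is made.
    pub min_samples_for_recommendation: u32,
    /// Margin by which the challenger must beat the control.
    pub promotion_threshold: f64,
}

impl Default for LeaderboardConfig {
    fn default() -> Self {
        Self {
            ab_test_ghost: None,
            ab_test_fraction: 0.10,
            min_samples_for_recommendation: 50,
            promotion_threshold: 0.10,
        }
    }
}

/// Pick the ghost for one request. `roll` is a uniform sample in [0, 1)
/// supplied by the caller (hypothetical signature).
pub fn ab_route<'a>(cfg: &'a LeaderboardConfig, control: &'a str, roll: f64) -> &'a str {
    match cfg.ab_test_ghost.as_deref() {
        // With fraction = 1.0 every roll in [0, 1) selects the challenger.
        Some(challenger) if roll < cfg.ab_test_fraction => challenger,
        _ => control,
    }
}

/// Recommend promotion once the challenger has enough samples and its score
/// exceeds the control's by the relative threshold (assumed interpretation).
pub fn should_promote(
    cfg: &LeaderboardConfig,
    control_score: f64,
    challenger_score: f64,
    challenger_samples: u32,
) -> bool {
    challenger_samples >= cfg.min_samples_for_recommendation
        && challenger_score >= control_score * (1.0 + cfg.promotion_threshold)
}
```

With the defaults, a control scoring 0.70 would need a challenger score of about 0.77 or higher over at least 50 tasks before promotion is recommended. If the threshold is instead meant as an absolute difference, the second condition becomes challenger_score - control_score >= promotion_threshold.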
Type of Change
Pre-PR Checklist
- cargo check -q passes
- cargo test -q passes (398 tests, 0 failed)