feat: active benchmark measurement (Story 20.1) #103
Implement the "Measure" button feature for token efficiency measurement. Sends controlled test requests to the Anthropic Messages API, forces usage polls, and computes tokens-per-percent (TPP) from observed utilization deltas.

Key components:
- TPPMeasurement model with BenchmarkVariant and MeasurementSource enums
- Database migration v6->v7 with tpp_measurements table and indexes
- BenchmarkService: Messages API integration, 3 variants (output/input/cache-heavy), adaptive retry when delta is below detection threshold
- TPPStorageService: SQLite persistence for benchmark results
- BenchmarkSectionView: analytics UI with progress, result cards, weighting discovery
- Settings UI: Token Efficiency section with enable toggle and variant checkboxes
- Forced poll integration via PollingEngine.performForcedPoll()
- Full service wiring through AppDelegate -> AnalyticsWindow -> AnalyticsView
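The TPP computation and the adaptive retry described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function names, the 1.0-point detection threshold, and the "double the request size" retry policy are all assumptions made for the example.

```swift
import Foundation

// Hypothetical sketch; names and constants are not taken from the PR.

/// Smallest utilization change (in percentage points) treated as measurable.
/// The value 1.0 is an assumption for this example.
let detectionThreshold = 1.0

/// Tokens consumed per percentage point of utilization,
/// or nil when the observed delta is below the detection threshold.
func tokensPerPercent(tokens: Int, utilizationBefore: Double, utilizationAfter: Double) -> Double? {
    let delta = utilizationAfter - utilizationBefore
    guard delta >= detectionThreshold else { return nil }
    return Double(tokens) / delta
}

/// Adaptive retry: grow the request until the delta becomes detectable.
/// `measure` stands in for "send a request, force a poll, read utilization".
func measureWithRetry(
    initialTokens: Int,
    maxAttempts: Int,
    measure: (Int) -> (tokens: Int, before: Double, after: Double)
) -> Double? {
    var requested = initialTokens
    for _ in 0..<maxAttempts {
        let sample = measure(requested)
        if let tpp = tokensPerPercent(
            tokens: sample.tokens,
            utilizationBefore: sample.before,
            utilizationAfter: sample.after
        ) {
            return tpp
        }
        requested *= 2  // assumed growth policy for the next attempt
    }
    return nil  // the delta never reached the threshold
}
```

For example, 10,000 tokens that move utilization from 40% to 42% would give a TPP of 5,000 under this definition.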
- validatePreconditions: remove dead code (both else branches identically returned .tokenExpired); require .connected status, not .disconnected
- BenchmarkSectionView: fix ForEach non-unique IDs when multiple variants run for the same model (was id: \.model, now uses enumerated offset)
- SettingsView: call syncBenchmarkVariants() in the reset action so variant preference changes are actually persisted, not just reflected in the UI
- Story status: in-progress -> done; sprint-status synced
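The ForEach ID collision can be illustrated without SwiftUI. The sketch below uses a hypothetical `VariantResult` struct and made-up values; it only shows why keying by model fails once a model has more than one variant, and why an enumerated offset does not.

```swift
import Foundation

// Hypothetical result type and illustrative values; the PR's real struct is not shown.
struct VariantResult {
    let model: String
    let variant: String
    let tokensPerPercent: Double
}

let results = [
    VariantResult(model: "model-a", variant: "output-heavy", tokensPerPercent: 5200),
    VariantResult(model: "model-a", variant: "input-heavy", tokensPerPercent: 9100),
    VariantResult(model: "model-b", variant: "output-heavy", tokensPerPercent: 4800),
]

// Keying by model alone collides when one model runs several variants.
let modelIDs = results.map { $0.model }
let modelIDsAreUnique = Set(modelIDs).count == results.count

// Keying by enumerated offset is unique by construction.
let indexed = Array(results.enumerated())
let offsetIDs = indexed.map { $0.offset }
let offsetIDsAreUnique = Set(offsetIDs).count == results.count

// In SwiftUI the same idea is typically written as:
//   ForEach(Array(results.enumerated()), id: \.offset) { pair in /* row for pair.element */ }
```

Offsets are only stable while the array order is unchanged; if rows can be reordered or removed, a composite key such as model plus variant may be a better identifier.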
Caution: Review failed. The pull request is closed.
Walkthrough
Implemented Story 20.1: a complete active-benchmark measurement system featuring tokens-per-percent (TPP) measurement models, a …
Sequence Diagram
sequenceDiagram
participant UI as BenchmarkSectionView
participant BS as BenchmarkService
participant PE as PollingEngine
participant API as Messages API
participant AppState
participant Storage as TPPStorageService
participant DB as DatabaseManager
UI->>BS: validatePreconditions()
BS->>AppState: check OAuth, utilization, recent activity
AppState-->>BS: validation result
BS-->>UI: BenchmarkValidation (e.g., ready)
UI->>BS: runBenchmark(models, variants, onProgress)
activate BS
loop for each model & variant
BS->>UI: onProgress(.sendingRequest)
BS->>API: POST /messages with variant config
API-->>BS: MessagesAPIResponse (tokens)
BS->>UI: onProgress(.polling)
BS->>PE: performForcedPoll()
PE->>AppState: refresh utilization
AppState-->>PE: updated delta
PE-->>BS: poll complete
BS->>UI: onProgress(.computingResult)
BS->>BS: compute TPP from delta & tokens
BS->>Storage: storeBenchmarkResult(TPPMeasurement)
Storage->>DB: INSERT into tpp_measurements
DB-->>Storage: success
Storage-->>BS: result stored
BS->>UI: onProgress(.completed) with result
end
deactivate BS
BS-->>UI: [BenchmarkVariantResult]
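The loop in the diagram can be approximated by a synchronous sketch like the one below. Everything here is a stand-in: the closure parameters replace the Messages API, PollingEngine, AppState, and TPPStorageService, and the real service is presumably asynchronous and has richer progress and error states.

```swift
import Foundation

// Illustrative stand-ins for the diagram's participants; not the PR's actual types.
enum BenchmarkProgress: Equatable {
    case sendingRequest, polling, computingResult, completed
}

struct BenchmarkVariantResult {
    let model: String
    let variant: String
    let tokensPerPercent: Double?  // nil when no utilization change was observed
}

func runBenchmark(
    models: [String],
    variants: [String],
    readUtilization: () -> Double,          // baseline utilization (AppState in the diagram)
    sendRequest: (String, String) -> Int,   // stands in for POST /messages; returns tokens used
    forcedPoll: () -> Double,               // stands in for performForcedPoll(); returns new utilization
    store: (BenchmarkVariantResult) -> Void,
    onProgress: (BenchmarkProgress) -> Void
) -> [BenchmarkVariantResult] {
    var results: [BenchmarkVariantResult] = []
    for model in models {
        for variant in variants {
            let before = readUtilization()

            onProgress(.sendingRequest)
            let tokens = sendRequest(model, variant)

            onProgress(.polling)
            let after = forcedPoll()

            onProgress(.computingResult)
            let delta = after - before
            let tpp: Double? = delta > 0 ? Double(tokens) / delta : nil

            let result = BenchmarkVariantResult(model: model, variant: variant, tokensPerPercent: tpp)
            store(result)
            onProgress(.completed)
            results.append(result)
        }
    }
    return results
}
```

Passing the collaborators as closures keeps the sketch testable without network access or a database; a production version would more likely inject protocol-typed services.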
Estimated code review effort: 4 (Complex), ~60 minutes
Summary
Story
20.1: Active Benchmark Measurement ("Measure" Button)
Test plan