Add compound tool reliability scoring by ccf · Pull Request #219 · ccf/primer

ccf · 2026-04-23T23:38:12Z

Summary

- Add call volume and average call-chain length to toolchain reliability rows
- Estimate per-call reliability from observed session success, then project a standardized 10-step compound reliability rate
- Surface calls and 10-step compound reliability in the Harness Intelligence toolchain reliability table
- Mark the roadmap item shipped
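
The estimate-then-project step in the summary could be sketched roughly as follows. This is a minimal illustration, not the merged code: it assumes calls fail independently and that a session succeeds only when every call in it succeeds, and the function names and the `STANDARD_STEPS` constant are hypothetical.

```python
from typing import Optional

STANDARD_STEPS = 10  # hypothetical constant for the "10-step" chain in the summary


def estimate_per_call_reliability(success_rate: float, avg_calls: float) -> Optional[float]:
    """Estimate per-call reliability as the avg_calls-th root of session success.

    Assumes independent calls and that one failed call fails the session.
    """
    if avg_calls <= 0:
        return None
    return success_rate ** (1.0 / avg_calls)


def project_compound_reliability(
    success_rate: float, avg_calls: float, steps: int = STANDARD_STEPS
) -> Optional[float]:
    """Project the per-call estimate onto a chain of `steps` calls."""
    per_call = estimate_per_call_reliability(success_rate, avg_calls)
    if per_call is None:
        return None
    return round(per_call**steps, 3)
```

Under these assumptions, a tool with a 0.5 session success rate and 2 calls per session has a per-call estimate of about 0.707, so a 10-call chain would be expected to succeed only about 3% of the time.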

Verification

PYTHONPATH=/Users/ccf/git/primer/src:/Users/ccf/git/primer pytest --import-mode=importlib tests/test_maturity.py -q
ruff check src/primer/common/schemas.py src/primer/server/services/maturity_service.py tests/test_maturity.py
ruff format --check src/primer/common/schemas.py src/primer/server/services/maturity_service.py tests/test_maturity.py
cd frontend && ./node_modules/.bin/eslint src/components/maturity/toolchain-reliability-table.tsx src/components/maturity/__tests__/toolchain-reliability-table.test.tsx src/types/api.ts
cd frontend && ./node_modules/.bin/vitest run --run src/components/maturity/__tests__/toolchain-reliability-table.test.tsx
cd frontend && ./node_modules/.bin/tsc --noEmit
full pre-push hooks passed

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Compound reliability formula uses wrong rate base
- Changed the formula to derive the per-call success rate via nth-root (success_rate^(1/avg_calls)) and return that as the compound reliability rate, instead of double-compounding the already-aggregate session success rate.
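
A short derivation may make the "double-compounding" point concrete. The symbols are introduced here for illustration only: $s$ is the observed session success rate, $n$ the average calls per session, and $p$ the per-call success rate, assuming calls fail independently and a session succeeds only if all of its calls succeed.

$$
s = p^{\,n} \quad\Longrightarrow\quad p = s^{1/n}, \qquad \text{whereas} \qquad s^{\,n} = \left(p^{\,n}\right)^{n} = p^{\,n^{2}}
$$

With $s = 0.5$ and $n = 2$, the nth-root form gives $p = 0.5^{1/2} \approx 0.707$, while the earlier expression gives $s^{n} = 0.25$, which compounds an already session-level rate a second time.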

Or push these changes by commenting:

@cursor push 134ec109df

Preview (134ec109df)

diff --git a/src/primer/server/services/maturity_service.py b/src/primer/server/services/maturity_service.py
--- a/src/primer/server/services/maturity_service.py
+++ b/src/primer/server/services/maturity_service.py
@@ -866,7 +866,8 @@
         avg_calls = bucket["total_call_count"] / len(bucket["sessions"])
         if avg_calls <= 0:
             return None
-        return round(success_rate**avg_calls, 3)
+        per_call_rate = success_rate ** (1.0 / avg_calls)
+        return round(per_call_rate, 3)
 
     customization_breakdown = [
         CustomizationUsage(

diff --git a/tests/test_maturity.py b/tests/test_maturity.py
--- a/tests/test_maturity.py
+++ b/tests/test_maturity.py
@@ -551,13 +551,13 @@
     assert read_tool["failure_session_rate"] == 0.5
     assert read_tool["recovery_rate"] == 0.5
     assert read_tool["success_rate"] == 0.5
-    assert read_tool["compound_reliability_rate"] == 0.0
+    assert read_tool["compound_reliability_rate"] == 0.955
     assert read_tool["abandonment_rate"] == 0.5
 
     bash_tool = reliability_rows[("built_in_tool", "Bash")]
     assert bash_tool["success_rate"] == 0.5
     assert bash_tool["avg_calls_per_session"] == 2.0
-    assert bash_tool["compound_reliability_rate"] == 0.25
+    assert bash_tool["compound_reliability_rate"] == 0.707
 
 
 def test_maturity_builds_delegation_patterns(
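
As a quick sanity check on the changed Bash assertions, the two expressions from the diff can be evaluated directly with the values asserted in the test (`success_rate == 0.5`, `avg_calls_per_session == 2.0`). The helper names below are hypothetical and exist only for this sketch.

```python
def old_rate(success_rate: float, avg_calls: float) -> float:
    """Expression removed by the diff: raises the session rate to avg_calls."""
    return round(success_rate**avg_calls, 3)


def fixed_rate(success_rate: float, avg_calls: float) -> float:
    """Expression added by the diff: avg_calls-th root of the session rate."""
    per_call_rate = success_rate ** (1.0 / avg_calls)
    return round(per_call_rate, 3)


# Values taken from the Bash assertions in tests/test_maturity.py.
print(old_rate(0.5, 2.0))    # 0.25
print(fixed_rate(0.5, 2.0))  # 0.707
```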

Reviewed by Cursor Bugbot for commit 1f22d68.

Add compound tool reliability scoring

1f22d68

cursor Bot reviewed Apr 23, 2026

Comment thread src/primer/server/services/maturity_service.py Outdated

Fix compound reliability rate base

86141c6

ccf merged commit 06ca96b into main Apr 24, 2026
6 checks passed

ccf merged 2 commits into main from feat/compound-tool-reliability
