
Add compound tool reliability scoring #219

Merged
ccf merged 2 commits into main from feat/compound-tool-reliability
Apr 24, 2026


@ccf (Owner) commented Apr 23, 2026

Summary

  • add call volume and average call-chain length to toolchain reliability rows
  • estimate per-call reliability from observed session success, then project a standardized 10-step compound reliability rate
  • surface call counts and 10-step compound reliability in the Harness Intelligence toolchain reliability table
  • mark the roadmap item shipped
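The estimation steps in the summary can be sketched roughly as follows. This is a minimal sketch, not the code merged in this PR: the function name `compound_reliability`, its parameters, and the `STANDARD_STEPS` constant are hypothetical. It assumes each call in a session succeeds independently with the same probability, so the per-call rate is the `avg_calls`-th root of the session success rate (as in the Bugbot fix further down), and it assumes the 10-step figure is that per-call rate raised to the 10th power.

```python
from typing import Optional

# Hypothetical constant: the "standardized 10-step" chain length from the summary.
STANDARD_STEPS = 10


def compound_reliability(
    success_sessions: int,
    session_count: int,
    total_call_count: int,
    steps: int = STANDARD_STEPS,
) -> Optional[float]:
    """Sketch: project a fixed-length compound reliability from session outcomes.

    Assumes calls succeed independently with one shared probability p, so the
    observed session success rate s satisfies s = p ** avg_calls.
    """
    if session_count <= 0:
        return None
    success_rate = success_sessions / session_count
    # Average call-chain length per session, as in the diff's avg_calls.
    avg_calls = total_call_count / session_count
    if avg_calls <= 0:
        return None
    # Invert s = p ** avg_calls to estimate the per-call success rate p.
    per_call_rate = success_rate ** (1.0 / avg_calls)
    # Project that per-call rate onto a standardized chain of `steps` calls.
    return round(per_call_rate**steps, 3)


# 1 of 2 sessions succeeded with 4 calls total (2 per session):
# per-call rate ≈ 0.707, projected over 10 steps.
print(compound_reliability(1, 2, 4))
```

Projecting to a fixed chain length puts tools with short and long call chains on a common scale. When a tool's average chain is exactly 10 calls, the projection simply returns its observed session success rate.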

Verification

  • PYTHONPATH=/Users/ccf/git/primer/src:/Users/ccf/git/primer pytest --import-mode=importlib tests/test_maturity.py -q
  • ruff check src/primer/common/schemas.py src/primer/server/services/maturity_service.py tests/test_maturity.py
  • ruff format --check src/primer/common/schemas.py src/primer/server/services/maturity_service.py tests/test_maturity.py
  • cd frontend && ./node_modules/.bin/eslint src/components/maturity/toolchain-reliability-table.tsx src/components/maturity/__tests__/toolchain-reliability-table.test.tsx src/types/api.ts
  • cd frontend && ./node_modules/.bin/vitest run --run src/components/maturity/__tests__/toolchain-reliability-table.test.tsx
  • cd frontend && ./node_modules/.bin/tsc --noEmit
  • full pre-push hooks passed


@cursor (bot) left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Compound reliability formula uses wrong rate base
    • Changed the formula to derive the per-call success rate via nth-root (success_rate^(1/avg_calls)) and return that as the compound reliability rate, instead of double-compounding the already-aggregate session success rate.
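To make the difference concrete, the two formulas can be compared on the Bash row from the updated test (session success rate 0.5, 2.0 calls per session). The helper names below are hypothetical; only the two expressions come from the diff.

```python
def double_compounded(success_rate: float, avg_calls: float) -> float:
    # Pre-fix expression: treats the session-level rate as if it were per call,
    # then compounds it again over the chain length.
    return round(success_rate**avg_calls, 3)


def per_call_rate(success_rate: float, avg_calls: float) -> float:
    # Bugbot fix: the avg_calls-th root recovers the per-call rate from the
    # session-level rate.
    return round(success_rate ** (1.0 / avg_calls), 3)


# Bash row from the updated test: success_rate 0.5, avg_calls_per_session 2.0
print(double_compounded(0.5, 2.0))  # 0.25  (old expected value)
print(per_call_rate(0.5, 2.0))      # 0.707 (new expected value)
```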


Or push these changes by commenting:

@cursor push 134ec109df
Preview (134ec109df)
diff --git a/src/primer/server/services/maturity_service.py b/src/primer/server/services/maturity_service.py
--- a/src/primer/server/services/maturity_service.py
+++ b/src/primer/server/services/maturity_service.py
@@ -866,7 +866,8 @@
         avg_calls = bucket["total_call_count"] / len(bucket["sessions"])
         if avg_calls <= 0:
             return None
-        return round(success_rate**avg_calls, 3)
+        per_call_rate = success_rate ** (1.0 / avg_calls)
+        return round(per_call_rate, 3)
 
     customization_breakdown = [
         CustomizationUsage(

diff --git a/tests/test_maturity.py b/tests/test_maturity.py
--- a/tests/test_maturity.py
+++ b/tests/test_maturity.py
@@ -551,13 +551,13 @@
     assert read_tool["failure_session_rate"] == 0.5
     assert read_tool["recovery_rate"] == 0.5
     assert read_tool["success_rate"] == 0.5
-    assert read_tool["compound_reliability_rate"] == 0.0
+    assert read_tool["compound_reliability_rate"] == 0.955
     assert read_tool["abandonment_rate"] == 0.5
 
     bash_tool = reliability_rows[("built_in_tool", "Bash")]
     assert bash_tool["success_rate"] == 0.5
     assert bash_tool["avg_calls_per_session"] == 2.0
-    assert bash_tool["compound_reliability_rate"] == 0.25
+    assert bash_tool["compound_reliability_rate"] == 0.707
 
 
 def test_maturity_builds_delegation_patterns(
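The same arithmetic explains why the `read_tool` expectation moves from 0.0 to 0.955. The diff does not show that fixture's average chain length, so this sketch assumes a hypothetical 15.0 calls per session, a value that reproduces both expected values. It also backs out the range of chain lengths consistent with the rounded 0.955.

```python
import math

# Hypothetical: read_tool's avg_calls_per_session is not shown in the diff;
# 15.0 is an assumed value that reproduces both test expectations.
success_rate = 0.5
avg_calls = 15.0

old_value = round(success_rate**avg_calls, 3)  # 0.5 ** 15 ≈ 3.05e-05
new_value = round(success_rate ** (1.0 / avg_calls), 3)

# Chain lengths whose per-call rate rounds to 0.955 (i.e. lies in [0.9545, 0.9555)).
low = math.log(success_rate) / math.log(0.9545)
high = math.log(success_rate) / math.log(0.9555)

print(old_value, new_value)  # 0.0 0.955
```

Under the old expression, a 0.5 session success rate rounds to 0.0 at three decimals once the average chain reaches roughly 11 calls, which is why long-chain tools collapsed to zero.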


Reviewed by Cursor Bugbot for commit 1f22d68.

Comment thread on src/primer/server/services/maturity_service.py (outdated)
@ccf merged commit 06ca96b into main on Apr 24, 2026
6 checks passed
