Open
Labels
documentation (Improvements or additions to documentation), enhancement (New feature or request)
Description
Summary
Create a benchmark script that measures prompt token counts, retrieval accuracy, and cost savings when using SkillMesh routing vs loading all tools into the prompt.
Acceptance Criteria
- Create a benchmark script in benchmarks/ or scripts/
- Measure token counts for full catalog vs top-K routed context (K=3, 5, 10)
- Report accuracy: does the top-K include the correct expert for a set of test queries?
- Include at least 20 test queries across data, ML, DevOps, web, and cloud domains
- Output a summary table suitable for inclusion in the README
- Document how to run the benchmarks
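
As a starting point, the criteria above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the SkillMesh API: the `Catalog` and `Router` types, `count_tokens`, `benchmark`, `to_markdown`, and `keyword_router` are all hypothetical names. The whitespace token count is a crude stand-in for the target model's tokenizer, and the keyword-overlap router is a placeholder for the real SkillMesh routing call.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical shapes, not SkillMesh types:
# Catalog maps an expert id to the text that would be placed in the prompt.
Catalog = Dict[str, str]
# Router takes (query, k) and returns expert ids ranked best-first.
Router = Callable[[str, int], List[str]]


def count_tokens(text: str) -> int:
    """Crude whitespace token count; replace with the model's real tokenizer."""
    return len(text.split())


def keyword_router(catalog: Catalog) -> Router:
    """Placeholder router that ranks experts by word overlap with the query."""
    def route(query: str, k: int) -> List[str]:
        words = set(query.lower().split())
        ranked = sorted(
            catalog,
            key=lambda e: -len(words & set(catalog[e].lower().split())),
        )
        return ranked[:k]
    return route


def benchmark(
    catalog: Catalog,
    router: Router,
    cases: Sequence[Tuple[str, str]],  # (query, expected expert id)
    ks: Sequence[int] = (3, 5, 10),
) -> Tuple[int, List[dict]]:
    """Compare full-catalog prompt size with top-K routed context."""
    full_tokens = sum(count_tokens(text) for text in catalog.values())
    rows = []
    for k in ks:
        hits = 0
        routed_tokens = 0
        for query, expected in cases:
            top = router(query, k)[:k]
            hits += expected in top  # True counts as 1
            routed_tokens += sum(count_tokens(catalog[e]) for e in top)
        avg_tokens = routed_tokens / len(cases)
        rows.append({
            "k": k,
            "avg_tokens": avg_tokens,
            "savings_pct": 100 * (1 - avg_tokens / full_tokens),
            "accuracy_pct": 100 * hits / len(cases),
        })
    return full_tokens, rows


def to_markdown(full_tokens: int, rows: List[dict]) -> str:
    """Render the results as a Markdown table for the README."""
    lines = [
        "| Context | Avg prompt tokens | Token savings | Top-K accuracy |",
        "|---|---|---|---|",
        f"| Full catalog | {full_tokens} | 0.0% | n/a |",
    ]
    for r in rows:
        lines.append(
            f"| Top-{r['k']} | {r['avg_tokens']:.1f} "
            f"| {r['savings_pct']:.1f}% | {r['accuracy_pct']:.1f}% |"
        )
    return "\n".join(lines)
```

To use it, load the real catalog and the 20+ labelled queries, pass the actual routing function in place of `keyword_router(catalog)`, and print `to_markdown(*benchmark(catalog, router, cases))`. If input tokens are priced linearly, the token savings percentage can also serve as a first approximation of cost savings.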