Add benchmarks: prompt size with vs without SkillMesh #3

@varunreddy

Summary

Create a benchmark script that measures prompt token counts, retrieval accuracy, and cost savings when using SkillMesh routing vs loading all tools into the prompt.

Acceptance Criteria

  • Create a benchmark script in benchmarks/ or scripts/
  • Measure token counts for full catalog vs top-K routed context (K=3, 5, 10)
  • Report retrieval accuracy (hit rate): the fraction of test queries whose top-K results include the correct expert
  • Include at least 20 test queries across data, ML, DevOps, web, and cloud domains
  • Output a summary table suitable for inclusion in the README
  • Document how to run the benchmarks
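
One possible shape for such a script is sketched below. It is only an illustration under stated assumptions: the catalog, the test queries, the whitespace-based `count_tokens`, and the `keyword_retrieve` function are all hypothetical stand-ins, not SkillMesh's actual API or data. The real script would plug in SkillMesh's retriever, the target model's tokenizer, and at least 20 queries.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in catalog (expert id -> description); not SkillMesh data.
CATALOG: Dict[str, str] = {
    "data.pandas": "clean join and aggregate tabular data with pandas dataframes",
    "ml.sklearn": "train and evaluate machine learning classification models",
    "devops.docker": "build docker container images and write dockerfiles",
    "web.fastapi": "create rest api endpoints with fastapi web framework",
    "cloud.s3": "upload objects to aws s3 cloud storage buckets",
}

# Hypothetical labelled queries: (query text, expected expert id).
QUERIES: List[Tuple[str, str]] = [
    ("aggregate tabular data by month", "data.pandas"),
    ("train a classification model", "ml.sklearn"),
    ("write a dockerfile for my service", "devops.docker"),
    ("add a rest api endpoint", "web.fastapi"),
    ("upload a file to an s3 bucket", "cloud.s3"),
]


def count_tokens(text: str) -> int:
    """Crude whitespace count (assumption); swap in the model's tokenizer."""
    return len(text.split())


def keyword_retrieve(query: str, k: int) -> List[str]:
    """Toy retriever standing in for SkillMesh: rank by word overlap."""
    words = set(query.lower().split())
    ranked = sorted(
        CATALOG, key=lambda eid: -len(words & set(CATALOG[eid].split()))
    )
    return ranked[:k]


def run_benchmark(
    retrieve: Callable[[str, int], List[str]], ks: Sequence[int] = (3, 5, 10)
) -> List[dict]:
    """Compare full-catalog token count with top-K routed context per K."""
    full_tokens = sum(count_tokens(desc) for desc in CATALOG.values())
    rows = []
    for k in ks:
        routed_tokens = 0
        hits = 0
        for query, expected in QUERIES:
            top = retrieve(query, k)
            routed_tokens += sum(count_tokens(CATALOG[eid]) for eid in top)
            hits += expected in top  # True counts as 1
        rows.append(
            {
                "k": k,
                "full_tokens": full_tokens,
                "avg_routed_tokens": routed_tokens / len(QUERIES),
                "hit_rate": hits / len(QUERIES),
            }
        )
    return rows


def to_markdown(rows: List[dict]) -> str:
    """Render the results as a Markdown table for the README."""
    lines = [
        "| K | Full catalog tokens | Avg routed tokens | Hit rate |",
        "|---|---|---|---|",
    ]
    for r in rows:
        lines.append(
            f"| {r['k']} | {r['full_tokens']} | "
            f"{r['avg_routed_tokens']:.1f} | {r['hit_rate']:.0%} |"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_markdown(run_benchmark(keyword_retrieve)))
```

Because the toy catalog holds only five experts, K=5 and K=10 return the whole catalog here; the comparison becomes meaningful once the real catalog is much larger than K. Passing the retriever in as a callable keeps the measurement loop independent of how routing is implemented.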

Labels: documentation, enhancement
