Add benchmarks: prompt size with vs without SkillMesh #3

## Summary
Create a benchmark script that measures prompt token counts, retrieval accuracy, and cost savings when using SkillMesh routing, compared with loading the full tool catalog into the prompt.
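As a starting point, the token comparison could look something like the sketch below. Everything in it is an assumption for illustration: the `CATALOG` entries are made-up placeholders, `count_tokens` is a crude whitespace count rather than a real model tokenizer, and the "routed" set is picked by hand instead of by SkillMesh.

```python
from typing import Dict, Iterable

# Hypothetical toy catalog: expert id -> card text that would be placed in the prompt.
CATALOG: Dict[str, str] = {
    "data.pandas": "Clean, join, and aggregate tabular data with dataframes.",
    "ml.sklearn": "Train and evaluate classification and regression models.",
    "devops.docker": "Build container images and write Dockerfiles.",
    "web.fastapi": "Define HTTP API endpoints and request handlers.",
    "cloud.s3": "Upload and download objects in storage buckets.",
}


def count_tokens(text: str) -> int:
    """Crude whitespace token count (an assumption).

    Swap in the target model's tokenizer to get real numbers.
    """
    return len(text.split())


def prompt_tokens(expert_ids: Iterable[str], catalog: Dict[str, str]) -> int:
    """Total tokens contributed by the given expert cards."""
    return sum(count_tokens(catalog[e]) for e in expert_ids)


def relative_savings(full_tokens: int, routed_tokens: int) -> float:
    """Fraction of prompt tokens avoided by routing (0.0 means no savings)."""
    return 1.0 - routed_tokens / full_tokens


full = prompt_tokens(CATALOG, CATALOG)            # every card in the prompt
routed = prompt_tokens(list(CATALOG)[:3], CATALOG)  # hand-picked stand-in for a top-3 result
print(f"full={full} routed={routed} savings={relative_savings(full, routed):.0%}")
```

If the model is billed per input token, the relative cost savings on the catalog portion of the prompt would track this same fraction; actual prices are left out here because they are not part of this issue.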

## Acceptance Criteria
- [ ] Create a benchmark script in `benchmarks/` or `scripts/`
- [ ] Measure token counts for full catalog vs top-K routed context (K=3, 5, 10)
- [ ] Report accuracy: does the top-K include the correct expert for a set of test queries?
- [ ] Include at least 20 test queries across data, ML, DevOps, web, and cloud domains
- [ ] Output a summary table suitable for inclusion in the README
- [ ] Document how to run the benchmarks
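The accuracy and summary-table criteria above could be prototyped roughly as follows. This is a sketch under stated assumptions, not SkillMesh's actual API: `keyword_retriever` is a hypothetical stand-in that ranks by shared words, and the catalog and queries are toy placeholders (a real run would need the 20+ queries listed above and the project's own retriever passed in as `retrieve`).

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical catalog: expert id -> keywords describing the expert.
CATALOG: Dict[str, str] = {
    "data.pandas": "clean join aggregate tabular data dataframe csv",
    "ml.sklearn": "train evaluate classification regression model features",
    "devops.docker": "build container image dockerfile deploy",
    "web.fastapi": "http api endpoint route request response",
    "cloud.s3": "bucket object storage upload download",
}

# Placeholder test set: (query, expected expert id).
QUERIES: List[Tuple[str, str]] = [
    ("aggregate a csv by month in a dataframe", "data.pandas"),
    ("train a classification model", "ml.sklearn"),
    ("write a dockerfile to build an image", "devops.docker"),
    ("add an http endpoint that returns json", "web.fastapi"),
    ("upload a file to a storage bucket", "cloud.s3"),
]

Retriever = Callable[[str, int], List[str]]


def keyword_retriever(query: str, k: int) -> List[str]:
    """Stand-in retriever (assumption): rank experts by number of shared words."""
    words = set(query.lower().split())
    ranked = sorted(
        CATALOG,
        key=lambda e: len(words & set(CATALOG[e].split())),
        reverse=True,
    )
    return ranked[:k]


def hit_rate(retrieve: Retriever, queries: Sequence[Tuple[str, str]], k: int) -> float:
    """Fraction of queries whose expected expert appears in the top-k results."""
    hits = sum(1 for query, expected in queries if expected in retrieve(query, k))
    return hits / len(queries)


def summary_table(
    retrieve: Retriever,
    queries: Sequence[Tuple[str, str]],
    ks: Sequence[int] = (3, 5, 10),
) -> str:
    """Render a Markdown table with one row per K."""
    rows = ["| K | Top-K hit rate |", "|---|---|"]
    for k in ks:
        rows.append(f"| {k} | {hit_rate(retrieve, queries, k):.0%} |")
    return "\n".join(rows)


if __name__ == "__main__":
    print(summary_table(keyword_retriever, QUERIES))
```

Keeping the retriever as a plain callable means the same harness can score the stand-in and the real router side by side; a token-count column could be added to the table in the same loop.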
