188 changes: 188 additions & 0 deletions BENCHMARK_RESULTS.md
# MCP Universe Benchmark Results

Comparison of Claude Code agent performance across different MCP transport configurations.

## Test Configuration

- **Test Suite**: Repository Management (10 GitHub tasks, 48 evaluations)
- **Agent**: claude-code-agent
- **Model**: claude-opus-4-5-20251101
- **Max Iterations**: 20

---

## Results Summary

| Run | Transport | Passed | Failed | Score | Total Time | Notes |
|-----|-----------|--------|--------|-------|------------|-------|
| **Run 1** | Direct API (GitHub MCP via Docker) | 15 | 33 | **31.25%** | ~50min | Baseline |
| **Run 2** | ContextBridge (mcp-remote stdio) | 15 | 33 | **31.25%** | ~50min | Same as baseline |
| **Run 3** | ContextBridge (HTTP transport) | 1 | 47 | **2.08%** | ~50min | Significant regression |

---

## Run 1: Direct API (Baseline)

**Date**: 2025-12-20 07:30
**Report**: `log/report_20251220_073022_060a7ee3-59a8-4c92-970b-5df96e9e5c81.md`
**Transport**: GitHub MCP server via Docker (stdio)

| Task | Passed | Failed | Score | Time |
|------|--------|--------|-------|------|
| github_task_0001 | 3 | 4 | 0.43 | 111s |
| github_task_0002 | 2 | 5 | 0.29 | - |
| github_task_0003 | 2 | 8 | 0.20 | - |
| github_task_0004 | 3 | 4 | 0.43 | - |
| github_task_0005 | 3 | 4 | 0.43 | - |
| github_task_0006 | 0 | 2 | 0.00 | - |
| github_task_0007 | 1 | 1 | 0.50 | - |
| github_task_0008 | 0 | 2 | 0.00 | - |
| github_task_0009 | 0 | 2 | 0.00 | - |
| github_task_0010 | 1 | 1 | 0.50 | - |

**Total**: 15/48 passed (31.25%)

---

## Run 2: ContextBridge (mcp-remote stdio)

**Date**: 2025-12-20 07:58
**Report**: `log/report_20251220_075853_f9e8c86a-8599-4c75-b3ce-5ffc73b6db91.md`
**Transport**: ContextBridge via mcp-remote (stdio proxy)

| Task | Passed | Failed | Score | Time |
|------|--------|--------|-------|------|
| github_task_0001 | 2 | 5 | 0.29 | 82s |
| github_task_0002 | 3 | 4 | 0.43 | - |
| github_task_0003 | 2 | 8 | 0.20 | - |
| github_task_0004 | 3 | 4 | 0.43 | - |
| github_task_0005 | 3 | 4 | 0.43 | - |
| github_task_0006 | 0 | 2 | 0.00 | - |
| github_task_0007 | 0 | 2 | 0.00 | - |
| github_task_0008 | 1 | 1 | 0.50 | - |
| github_task_0009 | 0 | 2 | 0.00 | - |
| github_task_0010 | 1 | 1 | 0.50 | - |

**Total**: 15/48 passed (31.25%)

---

## Run 3: ContextBridge (HTTP transport)

**Date**: 2025-12-20 09:58
**Report**: `log/report_20251220_095837_8ec88e24-a7d9-4a30-9e8f-e8d74c4783f3.md`
**Transport**: ContextBridge via Claude Code SDK HTTP transport

| Task | Passed | Failed | Score | Time |
|------|--------|--------|-------|------|
| github_task_0001 | 0 | 7 | 0.00 | 100s |
| github_task_0002 | 0 | 7 | 0.00 | 12s |
| github_task_0003 | 0 | 10 | 0.00 | 225s |
| github_task_0004 | 0 | 7 | 0.00 | 153s |
| github_task_0005 | 0 | 7 | 0.00 | 565s |
| github_task_0006 | 0 | 2 | 0.00 | 405s |
| github_task_0007 | 1 | 1 | 0.50 | 395s |
| github_task_0008 | 0 | 2 | 0.00 | 52s |
| github_task_0009 | 0 | 2 | 0.00 | 305s |
| github_task_0010 | 0 | 2 | 0.00 | 567s |

**Total**: 1/48 passed (2.08%)

### Run 3 Failure Analysis

Primary failure reasons:
- **"the repository doesn't exist"** - the most common failure; indicates the agent couldn't create repositories via ContextBridge
- **"the branches don't exist"** - secondary failure
- **"the file content is not found"** - tertiary failure
- **"the PR doesn't exist"** - downstream failure (depends on repositories/branches existing)

**Root Cause**: The Claude Code SDK HTTP transport to ContextBridge appears to have connectivity or authentication issues. The agent received the prompts but couldn't execute GitHub operations through the gateway.

---

## Analysis

### Performance Comparison

| Metric | Run 1 (Direct) | Run 2 (mcp-remote) | Run 3 (HTTP) |
|--------|----------------|--------------------|--------------|
| Success Rate | 31.25% | 31.25% | 2.08% |
| Total Passed | 15 | 15 | 1 |
| Total Failed | 33 | 33 | 47 |
| Task 1 Latency | 111s | 82s | 100s |
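The success rates in the table are easy to recompute from the pass/fail counts (a quick sanity check, not part of the benchmark harness):

```python
# Each run has 48 evaluations total; success rate = passed / (passed + failed).
runs = {
    "Run 1 (Direct)": (15, 33),
    "Run 2 (mcp-remote)": (15, 33),
    "Run 3 (HTTP)": (1, 47),
}
for name, (passed, failed) in runs.items():
    total = passed + failed
    rate = 100 * passed / total
    print(f"{name}: {passed}/{total} = {rate:.2f}%")
# Run 1 (Direct): 15/48 = 31.25%
# Run 2 (mcp-remote): 15/48 = 31.25%
# Run 3 (HTTP): 1/48 = 2.08%
```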

### Key Findings

1. **Run 1 vs Run 2**: Equivalent performance
- Both achieved 31.25% success rate
- mcp-remote stdio transport works correctly with ContextBridge
- Task-level variance exists but balances out

2. **Run 3: HTTP transport failure**
- Dramatic regression: 2.08% vs 31.25%
- Only github_task_0007 partially succeeded (1/2 evals)
- All other tasks failed to create repositories
- Suggests HTTP transport configuration or ContextBridge authentication issue

### Potential Run 3 Issues

1. **HTTP transport not fully supported** by Claude Code SDK for MCP
2. **Missing authentication headers** in HTTP config
3. **ContextBridge gateway** may require different authentication for HTTP vs SSE
4. **Tool discovery failure** - agent may not have received tool list from gateway

---

## Known Issues

1. **Evaluator Bug**: `IndexError` in `github__get_file_contents` (line 61 in functions.py)
- `output.content[1].resource.text` fails when content list is empty
- Affects all runs equally

2. **LLM Call Tracking**: Reports show 0 LLM calls for claude-code-agent
- Tracking issue only, doesn't affect actual execution

---

## Recommendations

1. **Investigate HTTP transport failure**
- Check ContextBridge logs for Run 3
- Verify HTTP authentication is working
- Consider using mcp-remote as the stable option

2. **Fix evaluator bug**
- Add bounds checking in `github__get_file_contents`
- Would likely improve reported success rates

3. **For production use**
- Use mcp-remote stdio transport until HTTP is debugged
- Both Run 1 and Run 2 show equivalent 31.25% success rate

---

## Quick Mode Comparison (Run 4 vs Run 5)

**Date**: 2025-12-21

| Transport | Task 0001 | Task 0007 | Task 0010 | Total | Score | Time |
|-----------|-----------|-----------|-----------|-------|-------|------|
| **Run 4: Direct GitHub MCP** | 0/7 | 1/2 | 0/2 | **1/11** | **9.09%** | ~2min |
| **Run 5: ContextBridge HTTP** | 0/7 | 0/2 | 0/2 | **0/11** | **0.00%** | ~2min |

### Key Finding

ContextBridge via HTTP transport performed worse than direct GitHub MCP:
- The agent trace shows `search_tools` as first action instead of actual GitHub operations
- Suggests tools aren't properly exposed via HTTP transport
- Authentication works (Bearer token accepted) but tool discovery/execution may be incomplete

### ContextBridge Connection Issues Encountered

1. **mcp-remote SSE errors** - Required Node.js 20.18.1+ (upgraded to 22)
2. **OAuth localhost callback** - ContextBridge only supports hosted callback, not localhost
3. **HTTP transport fallback** - Used direct HTTP with Bearer token from cached auth
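For reference, the HTTP fallback in item 3 amounted to an MCP server entry along these lines (a sketch only — the URL and token are placeholders, not the real ContextBridge endpoint or credentials):

```json
{
  "mcpServers": {
    "contextbridge": {
      "type": "http",
      "url": "https://<contextbridge-host>/mcp",
      "headers": {
        "Authorization": "Bearer <token-from-cached-auth>"
      }
    }
  }
}
```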

---

*Last Updated: 2025-12-21*
133 changes: 133 additions & 0 deletions claude.md
# MCP Universe - Fork for MCP Gateway Testing

## Project Overview

This is a fork of the original MCP Universe repository, specifically created to test and evaluate an MCP gateway implementation.

## Project Goals

### 1. Initial Testing Phase
- **Objective**: Run repository management tests using direct Anthropic API access
- **Approach**: Use personal Anthropic API key to establish baseline performance
- **Test Suite**: Repository management benchmark (34 GitHub-related tasks)
- **Models**: Testing with Claude 4.5 models (Sonnet, Opus, Haiku)

### 2. MCP Gateway Integration Phase
- **Objective**: Test the same benchmarks through an MCP gateway
- **Approach**: Configure the gateway URL and route requests through it
- **Purpose**: Validate gateway functionality and performance

### 3. Comparison & Analysis Phase
- **Objective**: Compare direct API vs. gateway performance
- **Metrics to Compare**:
- Test pass/fail rates
- Response times
- Token usage
- Cost efficiency
- Error rates
- Overall reliability

## Current Status

**Phase**: Initial Setup
**Next Step**: Run repository management tests with direct Anthropic API

## Implementation Plan

### Step 1: Direct Anthropic API Testing (Current)
Detailed implementation plan saved at: `REPO_MANAGEMENT_TEST_PLAN.md`

**Summary**:
1. Configure `.env` with `ANTHROPIC_API_KEY` and GitHub credentials
2. Update `mcpuniverse/benchmark/configs/test/repository_management.yaml`:
- Change `type: openai` → `type: claude`
- Set `model_name` to Claude 4.5 variant
3. Run benchmark: `pytest tests/benchmark/test_benchmark_repository_management.py`
4. Collect baseline metrics and results
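The LLM block in `repository_management.yaml` would change roughly as follows (a sketch — the exact key layout and the pre-existing OpenAI model name in the real config may differ):

```yaml
# Before (default OpenAI provider)
# type: openai
# model_name: <existing OpenAI model>

# After: call the Anthropic API directly
type: claude
model_name: claude-opus-4-5-20251101  # or claude-sonnet-4-5-20250929 / claude-haiku-4-5
```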

### Step 2: MCP Gateway Testing (Planned)
1. Configure MCP gateway URL in environment
2. Update configuration to route through gateway
3. Run the same benchmark suite
4. Collect gateway performance metrics

### Step 3: Comparison Analysis (Planned)
1. Compare direct API vs. gateway results
2. Document performance differences
3. Identify optimization opportunities
4. Generate comprehensive comparison report

## Repository Structure

Key files and directories:
- `REPO_MANAGEMENT_TEST_PLAN.md` - Detailed test execution plan
- `mcpuniverse/benchmark/configs/test/` - Benchmark configurations
- `tests/benchmark/` - Benchmark test suites
- `log/` - Test execution logs and reports
- `.env` - Environment configuration (not committed)

## Reference Documentation

### Previous Work
- **OpenRouter Migration Plan**: `/Users/hev/.claude/plans/soft-swimming-snowflake.md`
- Documents previous effort to consolidate LLM providers
- Not currently active for this fork

### Claude 4.5 Models
| Model | API Name | Use Case |
|-------|----------|----------|
| Sonnet 4.5 | `claude-sonnet-4-5-20250929` | Balanced performance/cost |
| Opus 4.5 | `claude-opus-4-5-20251101` | Maximum capability |
| Haiku 4.5 | `claude-haiku-4-5` | Speed/cost optimization |

## Testing Methodology

### Baseline Testing (Direct API)
- **Provider**: Anthropic (direct API)
- **Authentication**: `ANTHROPIC_API_KEY`
- **Configuration**: `type: claude` in YAML config
- **Benchmark**: Repository management (34 tasks)

### Gateway Testing (Upcoming)
- **Provider**: MCP Gateway
- **Authentication**: Gateway-specific credentials
- **Configuration**: Gateway URL + model routing
- **Benchmark**: Same 34 repository management tasks

### Comparison Metrics
1. **Functional Metrics**
- Task success rate
- Correctness of outputs
- Error handling

2. **Performance Metrics**
- Request latency
- Total execution time
- Throughput

3. **Cost Metrics**
- Token usage
- API costs
- Resource utilization

4. **Reliability Metrics**
- Error rates
- Retry counts
- Failure patterns

## Next Steps

1. ✅ Create implementation plan (REPO_MANAGEMENT_TEST_PLAN.md)
2. ⏳ Set up environment (.env file)
3. ⏳ Run baseline tests with direct Anthropic API
4. ⏳ Document baseline results
5. ⏳ Configure MCP gateway
6. ⏳ Run gateway tests
7. ⏳ Generate comparison analysis
8. ⏳ Document findings and recommendations

---

**Last Updated**: 2025-12-07
**Primary Contact**: [Your contact info]
**Original Repository**: [Link to upstream MCP-Universe]