Add multi-provider support (OpenAI, Gemini, Mistral/OpenRouter) #1
Open
Labels: good first issue (Good for newcomers)
Description
Summary
Currently the benchmark runs against a single LLM provider. Add support for running the same benchmark against OpenAI (GPT-4o), Google Gemini (2.5 Pro), and open-source models via Mistral/OpenRouter to enable cross-provider comparison.
What needs to happen
- Add provider configuration (env vars or config file) for OpenAI, Gemini, and Mistral/OpenRouter
- Abstract the LLM call layer so provider can be swapped without changing benchmark logic
- Add a `--provider` flag or config option to select which provider to run against
- Run the benchmark on each provider and record results
- Document required API keys and setup for each provider in README
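The abstraction described above could be sketched roughly as follows. This is only an illustration of the intended shape, not existing code: the `LLMProvider` protocol, the `OpenAIProvider` class, the `PROVIDERS` registry, and `get_provider` are all hypothetical names, and the actual API call is stubbed out.

```python
import os
from dataclasses import dataclass
from typing import Protocol


class LLMProvider(Protocol):
    """Minimal interface the benchmark calls; implementations are swappable."""
    name: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class OpenAIProvider:
    # Hypothetical provider implementation; real code would use the vendor SDK.
    name: str = "openai"
    model: str = "gpt-4o"

    def complete(self, prompt: str) -> str:
        api_key = os.environ["OPENAI_API_KEY"]  # configured via env var
        # The actual HTTP/SDK call would go here; stubbed in this sketch.
        raise NotImplementedError


# Hypothetical registry keyed by the value passed to --provider.
PROVIDERS: dict[str, type] = {
    "openai": OpenAIProvider,
    # "gemini": GeminiProvider, "mistral": MistralProvider, ...
}


def get_provider(name: str) -> LLMProvider:
    """Resolve the --provider flag to a concrete provider instance."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown provider: {name}") from None
```

With this shape, the benchmark logic only ever calls `provider.complete(...)`, so adding Gemini or Mistral/OpenRouter means adding one class and one registry entry.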
Acceptance criteria
- Benchmark can run against at least 3 providers (Claude, OpenAI, Gemini)
- Results are recorded in a comparable format regardless of provider
- README documents how to configure and run with each provider
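For the "comparable format" criterion, one option is a fixed per-run record serialized as JSON Lines, identical across providers. A minimal sketch, assuming hypothetical field names (`BenchmarkResult`, `to_jsonl` are not existing code):

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class BenchmarkResult:
    provider: str    # e.g. "openai", "gemini", "mistral"
    model: str       # provider-specific model identifier
    task_id: str     # which benchmark task this row describes
    passed: bool     # did the model solve the task
    latency_s: float # wall-clock time for the provider call


def to_jsonl(results: list[BenchmarkResult]) -> str:
    """One JSON object per line; the schema is the same for every provider."""
    return "\n".join(json.dumps(asdict(r)) for r in results)
```

Because every provider emits the same schema, results files can be concatenated and compared directly.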