Our Finance Agent benchmark evaluates LLMs on their ability to use tools to research and answer complex financial questions about companies, financial statements, and SEC filings.
The agent has access to the following tools:
web_search: Search the web for information (via Tavily)
edgar_search: Search the SEC's EDGAR database for filings
parse_html_page: Parse and extract content from web pages
retrieve_information: Access stored information from previous steps
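Conceptually, these map to function-calling tool definitions along the lines of the sketch below; the parameter names shown are illustrative assumptions, not the benchmark's actual schemas:

# Illustrative only: a rough sketch of how the four tools might be declared
# for a function-calling LLM. Parameter names are assumptions.
TOOL_DEFINITIONS = [
    {"name": "web_search", "description": "Search the web for information (via Tavily).", "parameters": {"query": "string"}},
    {"name": "edgar_search", "description": "Search the SEC's EDGAR database for filings.", "parameters": {"query": "string"}},
    {"name": "parse_html_page", "description": "Parse and extract content from a web page.", "parameters": {"url": "string"}},
    {"name": "retrieve_information", "description": "Access stored information from previous steps.", "parameters": {"key": "string"}},
]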
For more details on the benchmark, please refer to our public website.
Install uv for dependency management. Then run:
make install
source .venv/bin/activate
Access to the Vals platform is gated and requires approval. Please reach out to us at vals.ai to request access.
Once approved, make an account on platform.vals.ai with your company email address. Go to the admin page and create a new API key for yourself.
Create a .env file in the root of the project and add the following:
VALS_API_KEY=<api_key>
# LLM API Keys (only set the ones you plan on using)
OPENAI_API_KEY=<openai_api_key>
ANTHROPIC_API_KEY=<anthropic_api_key>
GOOGLE_API_KEY=<google_api_key>
ETC_API_KEY=<etc_api_key>
# Tool API Keys
TAVILY_API_KEY=<tavily_api_key>
SEC_EDGAR_API_KEY=<sec_api_key> # supports semicolon-separated keys for round-robin rotation, e.g. key1;key2;key3
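As an illustration of how semicolon-separated keys can be rotated round-robin (the benchmark's own rotation logic may differ):

import itertools
import os

# Split SEC_EDGAR_API_KEY on ";" and cycle through the keys.
# Illustrative sketch only, not the project's actual implementation.
keys = [k for k in os.environ.get("SEC_EDGAR_API_KEY", "").split(";") if k]
key_cycle = itertools.cycle(keys)

def next_sec_key() -> str:
    # Each call returns the next key in round-robin order.
    return next(key_cycle)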
You can create a Tavily API key here, and an SEC API key here.
Values in the .env file take precedence over environment variables already set in your shell.
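For example, this precedence matches what the python-dotenv library does when loading with override enabled; whether the project loads its configuration this way is an assumption:

from dotenv import load_dotenv

# override=True makes values in .env win over variables already set in the shell,
# matching the precedence described above. Illustrative only.
load_dotenv(override=True)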
Finally, add the "Test Suite IDs" to suites.json. These are generally provided to you via email, but you can also find them on the platform by navigating to the "Test Suites" page, clicking the relevant test suite, and looking in the right sidebar under "Test Suite ID".
For a list of command-line options, run finance-agent --help.
To run, for example, a single question on openai/gpt-5.2-2025-12-11:
finance-agent --questions "What was Apple's revenue in 2023?" --model openai/gpt-5.2-2025-12-11
You can specify multiple questions at once:
finance-agent --questions "What was Apple's revenue in 2023?" "What was NFLX's revenue in 2024?"
You can also specify a list of questions in a text file, one question per line:
finance-agent --question-file data/public.txt
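For example, such a file would simply contain one question per line:

What was Apple's revenue in 2023?
What was NFLX's revenue in 2024?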
The default configuration is the one we used to run the benchmark.
A list of available models can be found in our model library, or by running make browse-models in the model library repository.
To run your own harness or model, just modify the get_custom_model function as needed. To see the full documentation on how the SDK works, visit our docs.
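As a rough sketch only (the actual signature expected by the SDK may differ; see the docs for the real interface):

# Hypothetical sketch: assumes get_custom_model returns a callable that maps a
# prompt string to a completion string. Replace the body with calls to your own
# harness or endpoint.
def get_custom_model():
    def my_model(prompt: str) -> str:
        response = f"(your harness's answer to: {prompt})"  # placeholder
        return response
    return my_model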
The agent writes detailed logs to the logs/ directory. Each run creates a timestamped directory with per-question log files containing tool usage, token counts, and error tracking.
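For example, to locate the most recent run directory and list its per-question log files (the exact file naming within a run directory is an assumption):

from pathlib import Path

# Pick the newest timestamped run directory under logs/ and list its contents.
runs = [p for p in Path("logs").iterdir() if p.is_dir()]
latest = max(runs, key=lambda p: p.stat().st_mtime)
for log_file in sorted(latest.iterdir()):
    print(log_file)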