
# Security Policy

## Reporting Vulnerabilities

If you discover a security vulnerability in EVA-Bench, please report it privately by emailing the maintainers rather than opening a public issue.

## Trust Boundaries

### `eval()` Usage

EVA-Bench uses Python's `eval()` in two locations to evaluate condition expressions:

1. `_eval_logic_condition()` in `src/eva_bench/scorer/sentinels.py` evaluates sentinel trigger conditions.
2. `_eval_trigger_condition()` in `src/eva_bench/simulator/engine.py` evaluates contingency injection triggers.

**Trust model:** All condition strings are authored by benchmark maintainers in committed task JSON files. They are not derived from model output, user input, or any external source. The `eval()` calls:

- Clear `__builtins__` to prevent access to Python built-ins.
- Inject only safe helper functions (comparisons, state lookups).
- Are bounded to simple boolean expressions.
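The restricted-`eval()` pattern above can be sketched as follows. This is an illustrative example, not the actual EVA-Bench code: the names `safe_eval_condition` and `state_get` are hypothetical.

```python
def safe_eval_condition(condition: str, state: dict) -> bool:
    """Evaluate a maintainer-authored boolean condition against sim state.

    Hypothetical sketch of the pattern described above; not the real
    EVA-Bench implementation.
    """
    def state_get(key, default=None):
        # Safe helper injected into the eval namespace: read-only
        # lookup into the simulation state dict.
        return state.get(key, default)

    env = {
        # Clearing __builtins__ blocks access to open(), __import__, etc.
        "__builtins__": {},
        "state_get": state_get,
    }
    return bool(eval(condition, env))


# A trigger condition like those in the task JSON files:
safe_eval_condition("state_get('o2_level', 100) < 19.5", {"o2_level": 18.0})
```

Even with `__builtins__` cleared, this pattern is only appropriate because the condition strings are trusted, as the policy notes.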

If you extend EVA-Bench to accept user-supplied or model-generated condition strings, replace `eval()` with a safe expression evaluator such as `simpleeval` or an AST-based parser.
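As one hedged sketch of the AST-based alternative, the evaluator below walks the parse tree and allows only boolean logic, comparisons, names, and constants, rejecting everything else (calls, attribute access, subscripts). It is illustrative, not part of EVA-Bench.

```python
import ast
import operator

# Comparison operators permitted in condition expressions.
_OPS = {
    ast.Lt: operator.lt, ast.Gt: operator.gt,
    ast.LtE: operator.le, ast.GtE: operator.ge,
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
}

def eval_condition(expr: str, names: dict) -> bool:
    """Safely evaluate a boolean expression over a name -> value mapping."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.BoolOp):  # and / or
            vals = [_eval(v) for v in node.values]
            return all(vals) if isinstance(node.op, ast.And) else any(vals)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not _eval(node.operand)
        if isinstance(node, ast.Compare):  # supports chained comparisons
            left = _eval(node.left)
            for op, comp in zip(node.ops, node.comparators):
                right = _eval(comp)
                if not _OPS[type(op)](left, right):
                    return False
                left = right
            return True
        if isinstance(node, ast.Name):
            return names[node.id]
        if isinstance(node, ast.Constant):
            return node.value
        # Anything else (Call, Attribute, Subscript, ...) is rejected.
        raise ValueError(f"Disallowed syntax: {type(node).__name__}")

    return bool(_eval(ast.parse(expr, mode="eval")))
```

Unlike the `eval()` approach, a malicious string such as `__import__('os').system('rm -rf /')` fails at the `ast.Call` node rather than executing.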

### API Keys

API keys for model providers (OpenAI, Anthropic, Google) are loaded from environment variables via `.env` files. The `.env` file is excluded from version control via `.gitignore`. The included `.env.example` contains only placeholder values.
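The loading flow described above can be sketched with a minimal stdlib-only `.env` parser; real projects commonly use the `python-dotenv` package instead, and the function name here is illustrative.

```python
import os

def load_env_file(path: str = ".env") -> None:
    """Export simple KEY=VALUE pairs from a .env file into os.environ.

    Minimal sketch only: skips blank lines and '#' comments, and never
    overwrites variables already set in the environment.
    """
    if not os.path.exists(path):
        return  # .env is optional; keys may come from the real environment
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

load_env_file()
api_key = os.environ.get("OPENAI_API_KEY")  # None if not configured
```

Because `.env` is gitignored, each user copies `.env.example` to `.env` and fills in real values locally.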

### Benchmark Data

Task JSON files, traces, and scoring results contain no personally identifiable information. The NASA corpus consists of publicly available documents from the NASA Technical Reports Server (NTRS).
