Closed
Description
Objective
Add proper skill-trigger evaluator support for the Codex provider. Currently, the evaluator maps provider kinds to tool-name semantics (e.g., Claude uses Skill/Read, Copilot uses Read File/readFile), but Codex has no mapping.
Context
Codex supports skills via .agents/ or .codex/ folders. However, its tool call format is different from Claude/Copilot:
- command_execution: shell commands (how Codex reads files, including skill files)
- file_change: file modifications
- mcp:server/tool: MCP tool calls
When Codex triggers a skill, it likely reads the skill file through a command_execution call (e.g., cat .codex/skills/my-skill.md). Detecting skill triggering therefore requires matching the skill name inside the command_execution input string rather than checking for a dedicated Skill tool.
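One way to do that matching is sketched below in TypeScript, the language of skill-trigger.ts. It assumes the command_execution input is a plain shell command string and that skill files live somewhere under .codex/ or .agents/; the helper name commandReadsSkill is hypothetical. Rather than requiring cat, it inspects every path-like token, since Codex might equally read a file with sed, head, or a wrapped bash -lc invocation, and it compares whole path segments so that my-skill does not match my-skill-extra.

```typescript
// Hypothetical helper. The skill roots come from the issue (.agents/, .codex/);
// the layout of files inside them is an assumption.
const SKILL_ROOTS = [".codex/", ".agents/"];

/** Returns true if a Codex command_execution string appears to read the named skill. */
function commandReadsSkill(command: string, skillName: string): boolean {
  // Rough tokenization: split on whitespace and shell quote characters.
  // This is a heuristic, not a shell parser.
  const tokens = command.split(/[\s'"]+/).filter((t) => t.length > 0);

  return tokens.some((token) => {
    // Only consider tokens that point into a skill root.
    if (!SKILL_ROOTS.some((root) => token.includes(root))) return false;
    // Compare each path segment, with any file extension stripped, to the
    // skill name, so both .codex/skills/my-skill.md and
    // .codex/skills/my-skill/SKILL.md match "my-skill".
    return token
      .split("/")
      .some((segment) => segment.replace(/\.[^.]*$/, "") === skillName);
  });
}
```

For example, commandReadsSkill("bash -lc 'cat .agents/skills/my-skill/SKILL.md'", "my-skill") returns true, while a command that only lists .codex/skills returns false. Paths containing spaces would need a real shell-word parser.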
Design latitude
- Investigate what Codex's actual tool call looks like when it reads a skill file. Is it always command_execution with cat? Does it use a different mechanism?
- Decide whether to add Codex to PROVIDER_TOOL_SEMANTICS with a custom matcher, or whether the skill_tools/read_tools config override (added in feat: make agentv-bench work across all agent harnesses #641) is sufficient for users to configure per eval.
- Consider whether command_execution input is a string (not JSON with fields); the current ToolMatcher expects named fields like input.skill or input.file_path, which may not fit.
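If Codex gets a static entry, the matcher needs a way to inspect a raw string instead of a named field. The sketch below shows one option: a discriminated union that keeps a field-based style alongside a custom predicate. The real ToolMatcher and PROVIDER_TOOL_SEMANTICS shapes are not shown in this issue, so ToolMatcherSketch, PROVIDER_TOOL_SEMANTICS_SKETCH, matches, and the illustrative Claude entry are all hypothetical stand-ins.

```typescript
// Hypothetical shapes: the real ToolMatcher / PROVIDER_TOOL_SEMANTICS in
// skill-trigger.ts may differ. This only illustrates supporting string input.
interface ToolCall {
  tool: string;
  input: unknown; // Claude/Copilot: object with named fields; Codex: likely a string
}

type ToolMatcherSketch =
  // Field style: tool name plus the input field holding the skill name or path.
  | { kind: "field"; tools: string[]; field: string }
  // Custom style: tool name plus a predicate over the raw input.
  | { kind: "custom"; tools: string[]; match: (input: unknown, skillName: string) => boolean };

function matches(m: ToolMatcherSketch, call: ToolCall, skillName: string): boolean {
  if (!m.tools.includes(call.tool)) return false;
  if (m.kind === "field") {
    // Named-field lookup only works when the input is an object.
    if (typeof call.input !== "object" || call.input === null) return false;
    const value = (call.input as Record<string, unknown>)[m.field];
    // Substring check so a file path field containing the skill name also matches.
    return typeof value === "string" && value.includes(skillName);
  }
  return m.match(call.input, skillName);
}

const PROVIDER_TOOL_SEMANTICS_SKETCH: Record<string, ToolMatcherSketch[]> = {
  // Illustrative field-based entry using the input.skill field named in the issue.
  claude: [{ kind: "field", tools: ["Skill"], field: "skill" }],
  // Hypothetical Codex entry: string input mentioning a skill root and the skill name.
  codex: [
    {
      kind: "custom",
      tools: ["command_execution"],
      match: (input, skillName) =>
        typeof input === "string" &&
        /\.(codex|agents)\//.test(input) &&
        input.includes(skillName),
    },
  ],
};
```

The Codex predicate here is deliberately loose (a substring check), so it would also fire for my-skill-extra; comparing whole path segments would be stricter.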
Acceptance signals
- Codex has an entry in PROVIDER_TOOL_SEMANTICS in skill-trigger.ts
- Unit test showing skill-trigger detection with a realistic Codex tool call
- Or: documented recommendation for using config overrides if a static mapping isn't feasible
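A test for the unit-test acceptance signal might look like the following. It targets Node's built-in node:test runner, which is an assumption (the repo may use a different framework), and codexTriggeredSkill is a hypothetical stand-in for whatever detection lands in skill-trigger.ts. The command strings are illustrative, not captured Codex traces; swapping in a real trace would make the test genuinely realistic.

```typescript
import { test } from "node:test";
import { strict as assert } from "node:assert";

// Hypothetical normalized tool call; agentv's real type may differ.
interface ToolCall {
  tool: string;
  input: unknown;
}

// Hypothetical stand-in for the Codex detection under test: true if any
// command_execution string contains a path under .codex/ or .agents/ with a
// segment (extension stripped) equal to the skill name.
function codexTriggeredSkill(calls: ToolCall[], skillName: string): boolean {
  return calls.some((call) => {
    if (call.tool !== "command_execution" || typeof call.input !== "string") {
      return false;
    }
    return call.input
      .split(/[\s'"]+/)
      .filter((token) => /\.(codex|agents)\//.test(token))
      .some((token) =>
        token.split("/").some((segment) => segment.replace(/\.[^.]*$/, "") === skillName),
      );
  });
}

test("detects a Codex skill read via command_execution", () => {
  const calls: ToolCall[] = [
    { tool: "command_execution", input: "ls .codex/skills" },
    { tool: "command_execution", input: "bash -lc 'cat .codex/skills/my-skill/SKILL.md'" },
  ];
  assert.equal(codexTriggeredSkill(calls, "my-skill"), true);
});

test("ignores listings, other skills, and file edits", () => {
  const calls: ToolCall[] = [
    { tool: "command_execution", input: "ls .codex/skills" },
    { tool: "command_execution", input: "cat .codex/skills/my-skill-extra.md" },
    { tool: "file_change", input: ".codex/skills/my-skill/SKILL.md" },
  ];
  assert.equal(codexTriggeredSkill(calls, "my-skill"), false);
});
```

The file_change negative case guards against counting an edit to the skill file as a trigger.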
Non-goals
- Changes to the Codex provider itself
- Adding .agents/ or .codex/ folder discovery to agentv
Related
- feat: make agentv-bench work across all agent harnesses #641 (added multi-provider support)
- feat(evaluator): make skill-trigger target-agnostic #613