There is currently no way to evaluate whether the AI-optimized output from CLI commands (using the `--ai` flag) is actually effective for AI agents. This is distinct from skill testing - it's about whether the format and content of the CLI output give agents the information they need in the most effective way possible.
The question: "When an AI agent calls `tdn list --ai`, does the output convey the information effectively?" This is essentially testing "information density" - can the AI extract the information it needs, understand it in context, and use it effectively to reason?
Approach
LLM-as-Judge for information extraction. We might write test cases like this:
```yaml
vault: busy-freelancer
command: tdn list --project "Website Redesign" --ai
questions:
  - "How many tasks are in this project?"
  - "Which task is due soonest?"
  - "Are there any blocked tasks?"
ground_truth:
  - 7
  - "Update homepage copy (due 2025-01-15)"
  - "No"
```
We can then:
- Run the command and get the output
- Ask LLM One the questions based on the output
- Have LLM Two compare its answers to our ground truth (see the sketch after this list)
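A minimal sketch of that loop, assuming PyYAML for the test case file; `ask_llm`, the prompts, and the file path are placeholders for whatever client and layout we settle on:

```python
import subprocess
import yaml  # PyYAML


def ask_llm(prompt: str) -> str:
    """Placeholder: wire up whichever model client we choose."""
    raise NotImplementedError


def run_case(case: dict) -> list[str]:
    """Run one test case and return a PASS/FAIL verdict per question."""
    # 1. Run the command and capture its --ai output.
    output = subprocess.run(
        case["command"], shell=True, capture_output=True, text=True
    ).stdout

    verdicts = []
    for question, truth in zip(case["questions"], case["ground_truth"]):
        # 2. Ask LLM One the question, grounded only in the CLI output.
        answer = ask_llm(
            "Using only the CLI output below, answer the question.\n\n"
            f"Output:\n{output}\n\nQuestion: {question}"
        )
        # 3. Ask LLM Two to judge the answer against our ground truth.
        verdict = ask_llm(
            f"Question: {question}\nExpected answer: {truth}\n"
            f"Candidate answer: {answer}\nReply with exactly PASS or FAIL."
        )
        verdicts.append(verdict.strip())
    return verdicts


if __name__ == "__main__":
    with open("cases/busy-freelancer.yaml") as f:  # placeholder path
        print(run_case(yaml.safe_load(f)))
```

Keeping the answerer and the judge as separate calls means the judge never grades its own reasoning, so the verdicts stay closer to a straight comparison against ground truth.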
Rough thoughts on things to try
- Sparse information in the vault (e.g. many empty projects and areas, few tasks) vs an overflowing vault
- Totally un-primed LLM vs minimally-primed vs one with our Skill available
- Effectiveness when piped to `head -n` with decreasing `n` (see the sketch after this list)
- Effectiveness of error responses - does the LLM know what to try next?
- Thinking needed to get to the next tool call - when the question is an instruction to do something, how quickly does the LLM return "make X tool call"?
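For the `head -n` bullet, a rough standalone sketch of the degradation sweep; the command string and line counts are arbitrary, and each truncated output would be fed through the same LLM-as-Judge loop as above:

```python
import subprocess

# Placeholder command; in practice this comes from a test case file.
COMMAND = 'tdn list --project "Website Redesign" --ai'


def truncated_output(command: str, n: int) -> str:
    """Run the command and keep only the first n lines, like `head -n`."""
    full = subprocess.run(command, shell=True, capture_output=True, text=True).stdout
    return "\n".join(full.splitlines()[:n])


# Sweep decreasing n; each truncated output would then be scored with the
# LLM-as-Judge pipeline to find where answer quality starts to drop.
for n in (50, 25, 10, 5):
    out = truncated_output(COMMAND, n)
    print(f"n={n}: {len(out.splitlines())} lines kept")
```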