-
Notifications
You must be signed in to change notification settings - Fork 0
Open
Labels
autoresearchAutoresearch optimization loopAutoresearch optimization loopfindingAutoresearch discoveryAutoresearch discovery
Description
Dialog Cluster Optimization — Dry Run Results
Two templates tested over the dialog cluster (frequency_penalty, presence_penalty) with 30 iterations each.
Template: board_meeting (seed 900)
Pareto frontier (4 configs):
| Run ID | Quality | Cost | CR |
|---|---|---|---|
| dry_d5220354a722 | 0.8555 | $0.1135 | 0.7353 |
| dry_e7bde27876d5 | 0.8647 | $0.1204 | 0.7462 |
| dry_e71b269e4c0f | 0.8661 | $0.1488 | 0.7570 |
| dry_f1b67d775db7 | 0.8838 | $0.1573 | 0.7957 |
- Best quality: q=0.8838 (freq=0.4436, pres=0.3461)
- Best efficiency: eff=7.5362 (freq=0.5579, pres=0.4826)
Template: hound_shadow_directorial (seed 1000)
Pareto frontier (3 configs):
| Run ID | Quality | Cost | CR |
|---|---|---|---|
| dry_1a474fa5b69a | 0.6549 | $0.1146 | 0.3627 |
| dry_8737100dc6c7 | 0.7622 | $0.1244 | 0.5679 |
| dry_ea8099c8993d | 0.9142 | $0.1289 | 0.8424 |
- Best quality: q=0.9142 (freq=0.4679, pres=0.5347)
- Best efficiency: eff=7.0929 (same config — dominates the frontier)
Key Findings
- hound_shadow_directorial achieves higher peak quality (0.9142 vs 0.8838) and also better cost-efficiency on its best config
- Frequency penalty sweet spot: 0.40–0.56 — both templates converge on moderate frequency penalty; values outside this range tend to get discarded
- Presence penalty is more variable: board_meeting favors lower (0.35–0.48), hound_shadow_directorial benefits from higher (0.53) presence penalty for peak quality
- Balanced penalties (~0.45 freq / ~0.50 pres) are the most robust starting point for dialog quality
- Very low presence penalty (<0.1) tanks quality even with good frequency penalty (see dry_1a474fa5b69a: q=0.65)
- Cost stays in a tight $0.11–0.19 band regardless of dialog params — dialog penalties don't significantly affect token usage
Recommended Defaults
For production dialog config:
frequency_penalty: 0.45presence_penalty: 0.50
This balances quality and cost across both narrative structures tested.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
autoresearchAutoresearch optimization loopAutoresearch optimization loopfindingAutoresearch discoveryAutoresearch discovery