Autoresearch Pro-6: Dialog quality findings (M10/M11) #23

@realityinspector

Dialog Cluster Optimization — Dry Run Results

Two templates were tested over the dialog cluster (frequency_penalty, presence_penalty), with 30 iterations each.

Template: board_meeting (seed 900)

Pareto frontier (4 configs):

Run ID            Quality  Cost     CR
dry_d5220354a722  0.8555   $0.1135  0.7353
dry_e7bde27876d5  0.8647   $0.1204  0.7462
dry_e71b269e4c0f  0.8661   $0.1488  0.7570
dry_f1b67d775db7  0.8838   $0.1573  0.7957
  • Best quality: q=0.8838 (freq=0.4436, pres=0.3461)
  • Best efficiency: eff=7.5362 (freq=0.5579, pres=0.4826)
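The issue does not define "eff". The reported values are consistent with quality divided by cost (0.8555 / 0.1135 ≈ 7.54), so the sketch below assumes that definition and recomputes it for the board_meeting frontier. The `efficiency` helper and the tuple layout are illustrative, not part of the autoresearch tooling.

```python
# board_meeting Pareto frontier rows: (run_id, quality, cost_usd)
FRONTIER = [
    ("dry_d5220354a722", 0.8555, 0.1135),
    ("dry_e7bde27876d5", 0.8647, 0.1204),
    ("dry_e71b269e4c0f", 0.8661, 0.1488),
    ("dry_f1b67d775db7", 0.8838, 0.1573),
]


def efficiency(quality: float, cost_usd: float) -> float:
    """Quality per dollar. ASSUMPTION: inferred from the reported eff values."""
    return quality / cost_usd


# Pick the frontier row with the highest quality-per-dollar ratio.
best = max(FRONTIER, key=lambda row: efficiency(row[1], row[2]))

for run_id, quality, cost in FRONTIER:
    print(f"{run_id}  eff={efficiency(quality, cost):.4f}")
print("most efficient:", best[0])
```

Under this assumption the cheapest row is the most efficient one. The small gap to the reported 7.5362 would be explained by the table values being rounded to four decimals.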

Template: hound_shadow_directorial (seed 1000)

Pareto frontier (3 configs):

Run ID            Quality  Cost     CR
dry_1a474fa5b69a  0.6549   $0.1146  0.3627
dry_8737100dc6c7  0.7622   $0.1244  0.5679
dry_ea8099c8993d  0.9142   $0.1289  0.8424
  • Best quality: q=0.9142 (freq=0.4679, pres=0.5347)
  • Best efficiency: eff=7.0929 (same config: the highest-quality point is also the most efficient one on the frontier)
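For readers unfamiliar with how a Pareto frontier is selected, here is a minimal sketch. It assumes the frontier is taken over two objectives only (maximize quality, minimize cost); the issue does not say whether CR is also an objective. The `Run` type, the helper functions, and the extra "hypothetical_discarded" row are made up for illustration and are not results from the dry run.

```python
from typing import NamedTuple


class Run(NamedTuple):
    run_id: str
    quality: float
    cost_usd: float


def dominates(a: Run, b: Run) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.quality >= b.quality and a.cost_usd <= b.cost_usd
    strictly_better = a.quality > b.quality or a.cost_usd < b.cost_usd
    return no_worse and strictly_better


def pareto_frontier(runs: list[Run]) -> list[Run]:
    """Keep every run that no other run dominates."""
    return [r for r in runs if not any(dominates(o, r) for o in runs)]


# hound_shadow_directorial frontier rows from the table above.
HOUND = [
    Run("dry_1a474fa5b69a", 0.6549, 0.1146),
    Run("dry_8737100dc6c7", 0.7622, 0.1244),
    Run("dry_ea8099c8993d", 0.9142, 0.1289),
]

# HYPOTHETICAL row, not a real run: costlier and lower quality than two real rows.
HYPOTHETICAL = Run("hypothetical_discarded", 0.7000, 0.1300)

frontier = pareto_frontier(HOUND + [HYPOTHETICAL])
print([r.run_id for r in frontier])
```

The three reported rows survive because each cheaper row trades quality for cost, so none dominates another. Only the hypothetical row is dropped.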

Key Findings

  1. hound_shadow_directorial achieves higher peak quality (0.9142 vs 0.8838), and its best-quality config is also cheaper ($0.1289 vs $0.1573), so it is more cost-efficient than board_meeting's best-quality config
  2. Frequency penalty sweet spot: 0.40–0.56 — both templates converge on moderate frequency penalty; values outside this range tend to get discarded
  3. Presence penalty is more variable: board_meeting favors lower (0.35–0.48), hound_shadow_directorial benefits from higher (0.53) presence penalty for peak quality
  4. Balanced penalties (~0.45 freq / ~0.50 pres) are the most robust starting point for dialog quality
  5. Very low presence penalty (<0.1) tanks quality even with good frequency penalty (see dry_1a474fa5b69a: q=0.65)
  6. Cost stays in a tight $0.11–0.19 band regardless of dialog params — dialog penalties don't significantly affect token usage

Recommended Defaults

For production dialog config:

  • frequency_penalty: 0.45
  • presence_penalty: 0.50

This balances quality and cost across both narrative structures tested.
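One way to carry these defaults in code is sketched below. The `DialogParams` class is hypothetical, not part of the project. The range check of -2.0 to 2.0 is an assumption borrowed from OpenAI-style chat completion APIs, since the issue does not name the backend.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DialogParams:
    """Hypothetical container for the recommended dialog defaults."""

    frequency_penalty: float = 0.45
    presence_penalty: float = 0.50

    def __post_init__(self) -> None:
        # ASSUMPTION: penalties must lie in [-2.0, 2.0], as in OpenAI-style APIs.
        for name, value in asdict(self).items():
            if not -2.0 <= value <= 2.0:
                raise ValueError(f"{name}={value} is outside [-2.0, 2.0]")


defaults = DialogParams()

# The dict form can be merged into whatever request payload the pipeline builds.
print(asdict(defaults))
```

Keeping the values in one frozen object makes it harder for a per-template override to drift from the shared defaults unnoticed.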
