## Summary
Pro-8 TDF + training data quality sweep: 30 iterations × 11 templates (330 total configs), dry-run mode, seed 1200, 27-dimensional config space across 6 mechanism clusters (fidelity, temporal, knowledge, entity, model, dialog).
## Pareto Frontier (11 globally optimal configs)
| Run ID | quality_composite | cost_usd | causal_resolution |
|---|---|---|---|
| dry_c6b8017a0e9a | 0.8474 | $0.01 | 0.7003 |
| dry_e6decce10b69 | 0.8680 | $0.02 | 0.7439 |
| dry_e0bb43f090ba | 0.8723 | $0.03 | 0.7351 |
| dry_f0ee92879a71 | 0.8794 | $0.03 | 0.7792 |
| dry_e0ab3a5b4da2 | 0.8801 | $0.04 | 0.7696 |
| dry_eaa9ce2a1863 | 0.8898 | $0.05 | 0.7625 |
| dry_f552c3ba6ea3 | 0.9016 | $0.05 | 0.8109 |
| dry_fa024f540806 | 0.9032 | $0.12 | 0.8191 |
| dry_fab762e388f0 | 0.9034 | $0.14 | 0.8081 |
| dry_fa3fd5bea489 | 0.9133 | $0.14 | 0.8327 |
| dry_fe43c441925c | 0.9176 | $0.19 | 0.8478 |
- Best quality: `dry_fe43c441925c` (q=0.9176)
- Best efficiency: `dry_c6b8017a0e9a` (eff=84.74 quality/$)
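The frontier above is the set of runs not dominated on (quality, cost). A minimal sketch of that dominance check, using the table's field names `quality_composite` and `cost_usd` (the sample runs are illustrative, not from the sweep):

```python
def pareto_frontier(runs):
    """Keep runs that no other run dominates.

    A run is dominated if some other run has quality >= its quality AND
    cost <= its cost, with at least one inequality strict.
    """
    frontier = []
    for r in runs:
        dominated = any(
            o["quality_composite"] >= r["quality_composite"]
            and o["cost_usd"] <= r["cost_usd"]
            and (o["quality_composite"] > r["quality_composite"]
                 or o["cost_usd"] < r["cost_usd"])
            for o in runs
        )
        if not dominated:
            frontier.append(r)
    # Sort by cost, as in the table above
    return sorted(frontier, key=lambda r: r["cost_usd"])

runs = [
    {"run_id": "a", "quality_composite": 0.90, "cost_usd": 0.10},
    {"run_id": "b", "quality_composite": 0.80, "cost_usd": 0.05},
    {"run_id": "c", "quality_composite": 0.85, "cost_usd": 0.20},  # dominated by "a"
]
print([r["run_id"] for r in pareto_frontier(runs)])  # → ['b', 'a']
```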
## Highest Quality Config (Best for TDF Export / Training Data)
The config that produces the highest quality_composite (0.9176) across all 11 templates, and therefore the best training data for downstream fine-tuning:
- Model: `qwen/qwen-2.5-72b-instruct` (temperature=0.5348, top_p=0.7288, max_tokens=4183)
- Compression: NMF with 2 components
- Temporal: directorial mode, dramatic_tension=0.4698, low foreshadowing (0.066), coincidence_boost=1.065
- Knowledge: forecast_horizon=50d, max_expectations=8, anxiety_conservatism=0.35
- Entity: animism_level=6, night_penalty=1.07, fatigue_accumulation=0.33
- Dialog: very low frequency_penalty (0.097), low presence_penalty (0.174)
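Collected into a single structure for reference, the grouping and key names below are my own (the sweep's actual config schema is not shown in this report); only the values come from the run above:

```python
# Hypothetical layout; key names are illustrative, values are from the report.
best_config = {
    "run_id": "dry_fe43c441925c",
    "model": {
        "name": "qwen/qwen-2.5-72b-instruct",
        "temperature": 0.5348,
        "top_p": 0.7288,
        "max_tokens": 4183,
    },
    "compression": {"method": "nmf", "n_components": 2},
    "temporal": {
        "mode": "directorial",
        "dramatic_tension": 0.4698,
        "foreshadowing": 0.066,
        "coincidence_boost": 1.065,
    },
    "knowledge": {
        "forecast_horizon_days": 50,
        "max_expectations": 8,
        "anxiety_conservatism": 0.35,
    },
    "entity": {"animism_level": 6, "night_penalty": 1.07, "fatigue_accumulation": 0.33},
    "dialog": {"frequency_penalty": 0.097, "presence_penalty": 0.174},
}
```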
## Key Findings for Training Data Quality
1. Model selection dominates quality
- Top-tier quality (>0.90) consistently uses either `qwen/qwen-2.5-72b-instruct` or `mistralai/mistral-large-latest`
- Budget tier (q=0.84-0.89) is achievable with `meta-llama/llama-3.1-8b-instruct` at ~5-10x lower cost
- `deepseek/deepseek-chat` lands in the middle at moderate cost
2. Temperature sweet spot: 0.5-0.95
- Best quality config uses temperature=0.5348 (moderate, coherent)
- Highest efficiency configs tend toward 0.9-1.05 (more creative but noisier)
- Very low (<0.3) or very high (>1.1) temperatures hurt quality_composite
3. NMF compression outperforms PCA/SVD for quality
- 5 of 7 configs with q>0.90 use NMF
- Fewer components (2-7) work better for quality than many (10)
- SVD appears in one high-quality config (cyclical mode, q=0.8794)
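As a rough illustration of what NMF-2 compression does (the data below is synthetic; the sweep's actual feature matrix is not shown in this report):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.RandomState(1200)  # same seed as the sweep, purely for flavor
X = rng.rand(20, 10)               # nonnegative feature matrix (synthetic)

# NMF factorizes X ≈ W @ H with W, H >= 0; n_components=2 matches the top config
model = NMF(n_components=2, init="nndsvda", random_state=1200, max_iter=500)
W = model.fit_transform(X)         # (20, 2) compressed representation
H = model.components_              # (2, 10) nonnegative basis
X_hat = W @ H                      # rank-2 reconstruction of X
print(W.shape, H.shape)            # (20, 2) (2, 10)
```

Unlike PCA/SVD, both factors are constrained to be nonnegative, which tends to produce parts-based, more interpretable components; that is one plausible reason NMF fares well here, though this report doesn't establish the mechanism.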
4. Directorial and cyclical temporal modes produce highest quality
- The top config uses directorial mode with moderate dramatic tension (~0.47)
- Cyclical mode appears in 3 of the top 7 Pareto configs
- Branching mode is mid-tier; portal/forward not represented on the frontier
5. Dialog penalties should be low for quality
- Best quality config: freq_penalty=0.097, presence_penalty=0.174
- High penalties (>0.5) appear only in budget-tier configs
- This makes sense: low repetition penalties allow richer, more detailed narrative outputs
6. High animism level correlates with quality
- Top config uses animism_level=6 (max)
- Budget configs use level 2-4
- Higher animism = more entity variety = richer training data
## Cost Implications
| Tier | quality_composite | cost_usd | Recommended Use |
|---|---|---|---|
| Premium (qwen-72b/mistral-large) | 0.90-0.92 | $0.12-0.19 | Production TDF export, fine-tuning dataset |
| Mid-range (8b + high tokens) | 0.88-0.90 | $0.04-0.05 | Validation runs, secondary training data |
| Budget (8b, low tokens) | 0.84-0.87 | $0.01-0.03 | Rapid prototyping, parameter sweeps |
For a 1000-run training dataset: Premium = ~$150-190, Mid-range = ~$40-50, Budget = ~$10-30.
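The dataset estimates are back-of-envelope per-run cost × run count; a quick check (note the premium low end computes to ~$120, so the ~$150 figure in the text presumably anchors toward the upper half of the $0.12-0.19 range):

```python
# Rough dataset-cost estimate: per-run cost range x number of runs.
tiers = {
    "premium":  (0.12, 0.19),
    "midrange": (0.04, 0.05),
    "budget":   (0.01, 0.03),
}
runs = 1000
for name, (lo, hi) in tiers.items():
    print(f"{name}: ${lo * runs:.0f}-${hi * runs:.0f}")
```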
## Recommendation
For TDF export quality targeting downstream fine-tuning:
- Use the Premium config (qwen-72b, NMF-2, directorial, low dialog penalties) for the core training set
- Augment with Mid-range configs across different temporal modes for diversity
- The 19x cost difference between budget and premium is justified by the ~0.08-point quality improvement (0.84 vs 0.92), which compounds through fine-tuning loss curves
## Artifacts
- Results JSONL: `autoresearch/results/dry_run_20260316_083717.jsonl` (330 runs)
- Pareto frontier: `autoresearch/results/pareto_20260316_083717.json` (11 optimal configs)
- Branch: `autoresearch/pro/tdf-training`
- Seed: 1200, iterations: 30/template, templates: 11 (all)