
Record: 5-expert Hedge Mixer + TTT (3-seed mean val_bpb=1.0745)#688

Open
RoyiRa wants to merge 3 commits into openai:main from RoyiRa:submission-2026-03-24

RoyiRa commented Mar 25, 2026

Summary

3-seed mean val_bpb: 1.0745 (std 0.021) | <15.5 MB | 8xH100 SXM, 600s

Results

| Seed | Pre-TTT BPB | Post-TTT BPB | Artifact |
|------|-------------|--------------|----------|
| 1337 | 1.1248 | 1.0560 | 15.48 MB |
| 42   | 1.1257 | 1.0970 | 15.41 MB |
| 7    | 1.1251 | 1.0704 | 15.43 MB |
| Mean | 1.1252 | 1.0745 | |

Key Technique: 5-expert Logistic Context Mixer

GPU-vectorized online context mixing using the Hedge algorithm. Five experts blend their predictions in log-probability space during TTT eval:

| Expert | Source |
|--------|--------|
| Neural | Base model log-softmax |
| Unigram | Token frequency from scored tokens |
| Bigram | P(next \| prev) from scored tokens |
| Trigram | Hashed P(next \| prev2, prev1) with 64K buckets |
| Entropy | Neural model entropy as confidence regularizer |

N-gram tables are built incrementally from already-scored tokens only (so the eval remains legal). Expert weights are updated online via the Hedge rule: `log_w -= eta * loss`.
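The incremental hashed-trigram expert described above can be sketched as follows. This is an illustrative minimal version, not the PR's actual code: the bucket hash, add-one smoothing, and the assumed vocabulary size are my own choices; only the 64K bucket count and the "update only after scoring" rule come from the description.

```python
# Hedged sketch of an incremental hashed-trigram expert (illustrative,
# not the PR's implementation). Counts come only from already-scored tokens.
import math
from collections import defaultdict

NUM_BUCKETS = 64 * 1024  # 64K hash buckets, as stated in the PR description
VOCAB = 50257            # assumed vocab size (GPT-2 BPE); illustrative only

class HashedTrigram:
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # bucket -> next token -> count
        self.totals = defaultdict(int)                       # bucket -> total count

    @staticmethod
    def bucket(prev2: int, prev1: int) -> int:
        # Cheap hash of the (prev2, prev1) context into 64K buckets
        return (prev2 * 1000003 + prev1) % NUM_BUCKETS

    def nll(self, prev2: int, prev1: int, nxt: int) -> float:
        # Add-one-smoothed P(next | prev2, prev1) over the hashed context
        b = self.bucket(prev2, prev1)
        p = (self.counts[b][nxt] + 1) / (self.totals[b] + VOCAB)
        return -math.log(p)

    def update(self, prev2: int, prev1: int, nxt: int) -> None:
        # Called only after the token has been scored, keeping the eval legal
        b = self.bucket(prev2, prev1)
        self.counts[b][nxt] += 1
        self.totals[b] += 1
```

The unigram and bigram experts follow the same pattern with shorter contexts (and no hashing needed).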

Each expert produces an NLL for every token. The mixer maintains one learned weight per expert, updated via the Hedge algorithm. At each position, the mixed prediction is:

`mixed_NLL = -log(sum_k w_k * exp(-NLL_k))`
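The mixing formula and Hedge update above can be sketched as a small sequential loop. This is a minimal NumPy illustration under my own assumptions (the `eta` value and shapes are placeholders; the PR's version is GPU-vectorized), but it implements the same two equations: the log-probability-space mix and `log_w -= eta * loss`.

```python
# Hedged sketch of the online Hedge mixer (illustrative, not the PR's
# GPU-vectorized implementation).
import numpy as np

def hedge_mix(nlls: np.ndarray, eta: float = 0.1) -> np.ndarray:
    """Mix per-expert NLLs online with multiplicative weights (Hedge).

    nlls: shape (T, K) -- NLL of each of K experts at each of T positions.
    Returns the mixed NLL per position, shape (T,).
    """
    T, K = nlls.shape
    log_w = np.zeros(K)          # uniform initial weights in log space
    mixed = np.empty(T)
    for t in range(T):
        # Normalize weights: w_k = softmax(log_w)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # mixed_NLL = -log(sum_k w_k * exp(-NLL_k))
        mixed[t] = -np.log(np.sum(w * np.exp(-nlls[t])))
        # Hedge update: penalize each expert in proportion to its loss
        log_w -= eta * nlls[t]
    return mixed
```

Because the mix happens in probability space, the mixed NLL always lies between the best and worst expert at each position, and the weights shift online toward whichever expert has been predicting best so far.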

Training Budget

GPTQ calibration runs within the 600s training budget (18s reserved).

| Phase | Time |
|-------|------|
| Training loop | 582s |
| EMA + GPTQ calibration + quantization | ~18s |
| Total training | ~600s |
| TTT eval with mixer | ~562s |

Reproduction

```shell
pip install -r requirements.txt
SEED=1337 MAX_WALLCLOCK_SECONDS=600 USE_MIXER=1 TTT_LR=0.0001 TTT_CHUNK_TOKENS=131072 \
  torchrun --standalone --nproc_per_node=8 train_gpt.py
```

Credits

