Non-record: Fixed Bank QAT + XSA5 + Label Smoothing (1.1352) by suchitj2702 · Pull Request #667 · openai/parameter-golf

suchitj2702 · 2026-03-25T01:24:33Z

Summary

Non-record experimental submission exploring four changes on top of the LeakyReLU + Legal TTT + Parallel Muon stack:

- Fixed broken Bank QAT: implemented STE int6 fake-quantization directly in `GPT.forward()` for all bank parameters. The SOTA's QAT is dead code because bank params bypass `CastedLinear` and `torch.compile` constant-folds the `_qat_enabled` flag.
- XSA expanded to 5 layers (from 4)
- Label smoothing 0.05 added to cross-entropy loss
- TTT hyperparameter tuning: LR 0.003 (from 0.002), momentum 0.95 (from 0.9)
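
For readers unfamiliar with the Bank QAT item above, here is a minimal, framework-free sketch of int6 fake-quantization with a straight-through estimator (STE). It is not the code in this PR: the symmetric per-row scale and the function names `fake_quant_int6` and `ste_backward` are assumptions made for illustration.

```python
def fake_quant_int6(weights):
    """Round one row of weights to a signed 6-bit grid, then map back to floats."""
    qmin, qmax = -32, 31  # signed int6 range
    max_abs = max(abs(w) for w in weights)
    # Assumption: one symmetric scale per row; the PR does not state its scheme.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [max(qmin, min(qmax, round(w / scale))) * scale for w in weights]


def ste_backward(grad_output):
    """Straight-through estimator: treat the rounding step as identity,
    so the incoming gradient reaches the full-precision weights unchanged."""
    return list(grad_output)


row = [0.31, -0.127, 0.054, -0.2]
quantized = fake_quant_int6(row)          # what the forward pass would use
grad_to_weights = ste_backward([0.5, -1.0, 0.25, 0.0])  # what the weights receive
```

In an autograd framework the same effect is commonly obtained by writing the forward value as `w + (fake_quant(w) - w).detach()`, which uses the quantized value in the forward pass while the gradient flows to `w` as if no rounding had happened.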

Result: 1.1352 BPB (from the 8xH100 run without QAT), which does not beat SOTA (1.1194). Submitted as a non-record with findings.
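
The label-smoothing term listed in the summary can be sketched as follows. This is a plain-Python illustration of the usual formulation (target distribution mixed with a uniform distribution over classes), not the training code from this PR; the function name `smoothed_cross_entropy` is hypothetical.

```python
import math


def smoothed_cross_entropy(logits, target, smoothing=0.05):
    """Cross-entropy against (1 - smoothing) * one_hot + smoothing * uniform."""
    n = len(logits)
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]
    nll = -log_probs[target]            # standard cross-entropy term
    uniform = -sum(log_probs) / n       # mean negative log-prob over all classes
    return (1.0 - smoothing) * nll + smoothing * uniform


confident_logits = [4.0, 0.0, 0.0]
plain_loss = smoothed_cross_entropy(confident_logits, target=0, smoothing=0.0)
smoothed_loss = smoothed_cross_entropy(confident_logits, target=0, smoothing=0.05)
```

For a confident, correct prediction the smoothed loss is higher than the plain loss, because the uniform term penalizes near-zero probabilities on the other classes. That pressure helps against overfitting, which is consistent with the finding that it did not help a compute-limited model.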

Key Findings

| Change | Impact |
| --- | --- |
| QAT fix | Sound idea, but recompilation costs ~50s + 5ms/step → 460 fewer training steps |
| Label smoothing | Counterproductive: model is compute-limited, not overfitting |
| XSA5 | Neutral to slightly negative vs XSA4 |
| TTT LR/momentum tuning | Original values (0.002/0.9) were better |
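
One way to read the TTT row: if the TTT optimizer is plain SGD with heavy-ball momentum (an assumption; the PR only gives the LR and momentum values), then under a constant gradient the per-step update approaches `lr / (1 - momentum)`. The sketch below compares the two settings under that assumption; the helper names are hypothetical.

```python
def sgd_momentum_step(weight, velocity, grad, lr, momentum):
    """One heavy-ball SGD update without dampening (assumed optimizer form)."""
    velocity = momentum * velocity + grad
    weight = weight - lr * velocity
    return weight, velocity


def steady_state_step(lr, momentum):
    """Size of the update per unit gradient once the velocity has saturated."""
    return lr / (1.0 - momentum)


original = steady_state_step(lr=0.002, momentum=0.9)
tuned = steady_state_step(lr=0.003, momentum=0.95)
ratio = tuned / original

# Simulate the tuned setting under a constant gradient of 1.0.
w, v = 0.0, 0.0
for _ in range(500):
    previous = w
    w, v = sgd_momentum_step(w, v, grad=1.0, lr=0.003, momentum=0.95)
last_update = previous - w
```

Under these assumptions the tuned setting takes roughly three times larger steady-state steps than the original, so the two changes compound rather than being small independent tweaks.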

Test plan

- Ran on 1xH100 SXM (smoke test, 907 steps): all code paths work
- Ran on 8xH100 SXM with QAT enabled: 6,719 steps, 1.1376 BPB
- Ran on 8xH100 SXM without QAT: 7,062 steps, 1.1352 BPB
- Artifact size: 15.44 MB (under 16 MB cap)
- Eval time: ~530s (under 600s cap)
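
The gaps between the two 8xH100 runs above, and between the submitted result and SOTA, can be worked out directly from the reported numbers. The snippet uses only figures stated in this PR; the variable names are illustrative.

```python
runs = {
    "qat_enabled": {"steps": 6719, "bpb": 1.1376},
    "qat_disabled": {"steps": 7062, "bpb": 1.1352},
}
sota_bpb = 1.1194

# Training steps lost in the QAT run relative to the run without QAT.
step_gap = runs["qat_disabled"]["steps"] - runs["qat_enabled"]["steps"]
step_gap_pct = 100.0 * step_gap / runs["qat_disabled"]["steps"]

# BPB differences (lower is better).
qat_penalty_bpb = runs["qat_enabled"]["bpb"] - runs["qat_disabled"]["bpb"]
gap_to_sota_bpb = runs["qat_disabled"]["bpb"] - sota_bpb

print(f"steps lost with QAT: {step_gap} ({step_gap_pct:.1f}%)")
print(f"BPB penalty with QAT: {qat_penalty_bpb:.4f}")
print(f"BPB gap to SOTA: {gap_to_sota_bpb:.4f}")
```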

🤖 Generated with Claude Code

Non-record experimental submission exploring:
- STE int6 fake-quantization fix for bank parameters (QAT was dead code)
- XSA expanded to last 5 layers
- Label smoothing 0.05
- TTT LR/momentum tuning

Result: 1.1352 BPB (worse than SOTA 1.1194).

Key findings:
- QAT recompilation too expensive (~50s + 5ms/step overhead)
- Label smoothing counterproductive on compute-limited model
- XSA5 neutral-to-negative vs XSA4
- Original TTT hyperparameters were better

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

suchitj2702 wants to merge 1 commit into openai:main from suchitj2702:submission/fixed-qat-xsa5-label-smoothing
