
Record Submission: 1.1078 BPB — XSA6 + BigramHash4K on Hedge Mixer Stack#720

Open
agalimova wants to merge 1 commit into openai:main from agalimova:submission/xsa6-bigram4k-hedgemixer

Conversation

@agalimova

Summary

Changes from PR #700

| Parameter | Default | Ours |
|---|---|---|
| XSA_LAST_N | 4 | 6 |
| BIGRAM_VOCAB_SIZE | 2048 | 4096 |
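A minimal sketch of applying the two overrides on top of the PR #700 defaults. The dict-merge mechanism here is an assumption for illustration; the repo's actual config plumbing may differ, but the parameter names and values are those from the table above.

```python
# Defaults inherited from PR #700 (per the table above).
DEFAULTS = {"XSA_LAST_N": 4, "BIGRAM_VOCAB_SIZE": 2048}

# This submission's overrides: extend XSA to the last 6 layers,
# double the bigram hash vocabulary.
OVERRIDES = {"XSA_LAST_N": 6, "BIGRAM_VOCAB_SIZE": 4096}

# Later keys win, so OVERRIDES shadows DEFAULTS.
config = {**DEFAULTS, **OVERRIDES}
print(config)  # {'XSA_LAST_N': 6, 'BIGRAM_VOCAB_SIZE': 4096}
```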

Test plan

  • 3 seeds run on 8xH100 SXM (torch 2.9+cu126, FA3)
  • Mean BPB delta vs. merged SOTA (1.1194): -0.0116, i.e. a 0.0116 BPB improvement
  • All runs under 16MB artifact limit (15.3MB)
  • All runs under 600s training wallclock
  • Full training logs available (summaries included, full logs on request)

🤖 Generated with Claude Code

Built on PR openai#700 with hyperparameter improvements found via
autoresearch-multi combinatorial search:
- XSA_LAST_N=6 (extended from 4 to 6 layers)
- BIGRAM_VOCAB_SIZE=4096 (doubled from 2048)

3-seed mean: 1.1078 (std 0.0045)
Seeds: 42=1.1045, 1337=1.1061, 2025=1.1129
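The summary statistics above can be reproduced from the per-seed results. A quick check (sample standard deviation, baseline 1.1194 taken from the test plan):

```python
import statistics

# Per-seed BPB results reported in this PR.
seeds = {42: 1.1045, 1337: 1.1061, 2025: 1.1129}
vals = list(seeds.values())

mean = statistics.mean(vals)     # 3-seed mean
std = statistics.stdev(vals)     # sample std (n-1 denominator)
delta = 1.1194 - mean            # gap to the merged-SOTA baseline

print(f"mean={mean:.4f} std={std:.4f} delta={delta:.4f}")
# → mean=1.1078 std=0.0045 delta=0.0116
```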

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
