Non-record: 30ep Cosine TTT on SwiGLU + U-Net (1xH100, val_bpb=1.1175)#661

Open
andrewbaggio1 wants to merge 1 commit into openai:main from andrewbaggio1:submission/cosine-ttt-30ep-v2
Conversation

@andrewbaggio1
Summary

Non-record 1xH100 submission. Single change from PR #462: TTT_EPOCHS=30 (default 10).

val_bpb = 1.1175 (sliding window, stride=64, seed 1337) | 7.5 MB artifact | 1xH100 SXM

Results (1xH100 SXM, seed 1337)

| Metric | Value |
| --- | --- |
| Training steps | 936 (wallclock capped) |
| Post-quant roundtrip val_bpb | 1.0684 |
| Sliding window val_bpb | 1.1175 |
| Artifact size | 7.5 MB |
| TTT time | 3,376 s (30 epochs, cosine) |
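The sliding-window val_bpb above is scored with stride=64, per the summary. A minimal sketch of that protocol follows; the `model_nll` hook, window size, and function names are illustrative assumptions, not the repo's actual eval code:

```python
import math

def sliding_window_nll(model_nll, tokens, window=1024, stride=64):
    """Score each token with as much left context as the window allows,
    advancing by `stride` tokens and counting only the newly scored ones.
    `model_nll(ctx, targets)` is a hypothetical hook returning summed
    negative log-likelihood (in nats) of `targets` given context `ctx`."""
    total_nll, scored, pos = 0.0, 0, 0
    while pos < len(tokens):
        start = max(0, pos + stride - window)  # keep window - stride tokens of context
        ctx = tokens[start:pos]
        tgt = tokens[pos:pos + stride]
        total_nll += model_nll(ctx, tgt)
        scored += len(tgt)
        pos += stride
    return total_nll, scored

def val_bpb(total_nll_nats, n_bytes):
    """Bits per byte: summed NLL converted from nats to bits, per byte."""
    return total_nll_nats / math.log(2) / n_bytes
```

With stride much smaller than the window, almost every token is conditioned on near-full context, which is why sliding-window bpb differs from the plain post-quant roundtrip number in the table.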

Approach

This submission reuses PR #462's full SwiGLU + U-Net architecture and changes only the test-time training (TTT) schedule, running 30 cosine-annealed epochs instead of the default 10. This is consistent with PR #481's finding that additional cosine TTT epochs improve results.
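The single-line change amounts to extending a standard cosine decay over more TTT epochs. A sketch, assuming a conventional cosine schedule; `TTT_EPOCHS` and the base learning rate are illustrative names, not necessarily the repo's identifiers:

```python
import math

def cosine_lr(epoch: int, total_epochs: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Cosine-decayed learning rate for TTT epoch `epoch` (0-indexed):
    starts at base_lr, ends at min_lr on the final epoch."""
    progress = epoch / max(total_epochs - 1, 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# The single change in this PR: 30 TTT epochs instead of the default 10.
TTT_EPOCHS = 30
schedule = [cosine_lr(e, TTT_EPOCHS, base_lr=1e-4) for e in range(TTT_EPOCHS)]
```

Stretching the same decay over 30 epochs keeps the learning rate higher for longer in absolute terms, which is one plausible reading of why more cosine TTT epochs help here.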

Architecture

PR #462's stack unchanged: 11L SwiGLU (hidden=1792), U-Net gated skips, BigramHash (8192), SmearGate, EMA (0.9985), Late QAT, Partial RoPE, LN Scale, Int6+zstd.
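The SwiGLU MLP at the core of that stack (hidden=1792 per the table above) can be sketched as follows; the model width and weight initialization are illustrative assumptions, and the real implementation lives in `train_gpt.py`:

```python
import numpy as np

def silu(x):
    """SiLU/Swish activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

class SwiGLU:
    """SwiGLU MLP: out = W_down @ (silu(W_gate @ x) * (W_up @ x)).
    hidden=1792 matches this submission; d_model=768 is an assumption."""
    def __init__(self, d_model=768, hidden=1792, seed=0):
        rng = np.random.default_rng(seed)
        s_in, s_out = 1.0 / np.sqrt(d_model), 1.0 / np.sqrt(hidden)
        self.w_gate = rng.normal(0.0, s_in, (hidden, d_model))
        self.w_up = rng.normal(0.0, s_in, (hidden, d_model))
        self.w_down = rng.normal(0.0, s_out, (d_model, hidden))

    def __call__(self, x):
        return self.w_down @ (silu(self.w_gate @ x) * (self.w_up @ x))
```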

Limitation

1xH100 only; 8xH100 verification is still needed. The 30 TTT epochs are estimated at ~7 min on 8xH100, within the eval budget.

Credits

PR #462 (JoeProAI), PR #481 (mrdavtan), PR #442 (sjp611), PR #398 (felipe-parodi)

Test plan

  • train_gpt.py compiles (ast.parse passes)
  • Artifact under 16 MB (7.5 MB)
  • PR only adds files to one new folder
  • submission.json includes all required fields
  • Train log included
  • Pending: 8xH100 verification + additional seeds
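The first test-plan item can be reproduced with a small check; the helper name is illustrative, and it exercises the same `ast.parse` call named above:

```python
import ast
import pathlib

def compiles(path: str) -> bool:
    """True if the file at `path` parses as valid Python (the ast.parse check)."""
    try:
        ast.parse(pathlib.Path(path).read_text())
        return True
    except SyntaxError:
        return False

# Example: compiles("train_gpt.py") should be True for this submission.
```

Note that `ast.parse` only validates syntax; it does not import the module or run it, so it is a cheap pre-submission sanity check.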

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>