Record: 30ep Cosine TTT on LeakyReLU² stack (3-seed mean val_bpb=1.0781) #672
Open
andrewbaggio1 wants to merge 1 commit into openai:main from
Conversation
dhruvjatkar pushed a commit to dhruvjatkar/parameter-golf that referenced this pull request on Mar 25, 2026:
…line reference
Fetched train_gpt.py verbatim from upstream openai/parameter-golf PR openai#672, which achieves 1.0781 BPB (3-seed mean, std=0.0041) using TTT_EPOCHS=30 with a cosine TTT schedule. This replaces 1.1194 as the baseline to beat.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
dhruvjatkar pushed a commit to dhruvjatkar/parameter-golf that referenced this pull request on Mar 25, 2026:
- Target to beat: 1.0781 BPB (PR openai#672, TTT_EPOCHS=30 Cosine TTT)
- Add single-agent protocol section
- Mark crontab auto-submitter as non-functional
- Add operational lessons from March 2026
- Update preferred source script to PR672 baseline
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
dhruvjatkar pushed a commit to dhruvjatkar/parameter-golf that referenced this pull request on Mar 25, 2026:
…al lessons
- New target: 1.0781 BPB (PR openai#672, TTT_EPOCHS=30 Cosine TTT)
- Merged SOTA kept as 1.1194 for context
- Add single-agent protocol (one agent on cluster at a time)
- Add operational lessons from March 2026
- Mark crontab auto-submitter as non-functional
- Update milestones relative to 1.0781
- Update preferred source script to PR672 baseline
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
dhruvjatkar pushed a commit to dhruvjatkar/parameter-golf that referenced this pull request on Mar 25, 2026:
PR openai#672 maxes TTT at 30 epochs (590s/600s eval budget), so all future improvements must be orthogonal to TTT. This update:
- Sets 1.0781 BPB (PR openai#672) as the new target to beat
- Reorders Top 8 directions: XSA-all confirmed at #1, Full GPTQ #2, SwiGLU #3, Muon-VS #4, aggressive quant #5, MASA #6, depth recurrence #7 with int6 risk warning, AdEMAMix #8
- Deprioritizes TTT-related directions already exploited by PR openai#672
- Collapses ~1000 lines of stale Round 0-3.9 session logs into a concise historical summary
- Removes resolved blockers (flash_attn, SSH hangs, local runtime)
- Adds fresh Round 1 section with 5 submitted experiments
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
xexyz added a commit to xexyz/parameter-golf that referenced this pull request on Mar 25, 2026:
30-epoch cosine pre-eval Test-Time Training on PR openai#414 consensus stack. Adapts quantized model on validation data before sliding-window eval.
- Pre-TTT post-quant: 1.1594 BPB
- Post-TTT sliding (stride=64): 1.0988 BPB
- Total artifact: 15,900,191 bytes (under 16MB)
- 5434 training steps + 30ep TTT + sliding eval on 8xH100
Built on PR openai#414 by @signalrush. TTT recipe from PR openai#518/@sofiabod, PR openai#672/@andrewbaggio1.
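The sliding-window stride=64 evaluation referenced above can be sketched as follows. This is a hypothetical illustration, not code from the PR: the window size, function names, and BPB conversion are my own assumptions.

```python
import math

def sliding_windows(n_tokens, window=2048, stride=64):
    """Plan (begin, end, score_from) spans over a token stream.

    Each window attends to context [begin, end), but only tokens in
    [score_from, end) contribute to the loss, so every token is scored
    exactly once with near-maximal left context.
    """
    spans = []
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + window, n_tokens)
        spans.append((begin, end, prev_end))
        prev_end = end
        if end == n_tokens:
            break
    return spans

def bits_per_byte(total_nll_nats, n_bytes):
    """Convert a summed negative log-likelihood in nats to bits per byte."""
    return total_nll_nats / (math.log(2) * n_bytes)
```

Smaller strides give each scored token more left context at proportionally higher eval cost, which is why stride is a knob worth reporting alongside the BPB number.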
Bharath-970 added a commit to Bharath-970/parameter-golf that referenced this pull request on Mar 25, 2026:
…ssion
Swap score-first LoRA TTT for the simpler and more effective cosine TTT approach from PR openai#672 (1.0781 BPB): fine-tune all model weights on val data for 30 epochs with cosine LR decay and per-layer LR groups (3x MLP-out, 0.5x MLP-in), followed by sliding-window stride=64 eval.
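A minimal sketch of that recipe's schedule, under stated assumptions: the base LR and group names below are placeholders of my own (the PR's actual values are not shown in this excerpt), while the 30-epoch cosine decay and the 3x/0.5x MLP-out/MLP-in multipliers come from the commit message above.

```python
import math

TTT_EPOCHS = 30
BASE_LR = 1e-4  # assumed for illustration; not stated in this excerpt
LR_MULT = {"mlp_out": 3.0, "mlp_in": 0.5}  # per-layer LR groups from the recipe

def cosine_lr(epoch, base_lr, total_epochs=TTT_EPOCHS):
    """Cosine decay: base_lr at epoch 0, annealing to 0 by the last epoch."""
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))

def group_lr(group, epoch):
    """LR for a parameter group at a given TTT epoch (1.0x for ungrouped params)."""
    return cosine_lr(epoch, BASE_LR * LR_MULT.get(group, 1.0))
```

In a real run these per-group rates would be handed to the optimizer as parameter groups (e.g. PyTorch `param_groups`), stepping through the validation data once per epoch.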
Summary
3-seed mean val_bpb: 1.0781 (std=0.0041) | 15.62 MB artifact | 8xH100 SXM
The only change from PR #518 is TTT_EPOCHS=30; the architecture is otherwise identical.
Results (8xH100 SXM)
vs. Verified SOTA
Timing
Architecture
PR #518's full stack: 11L LeakyReLU(0.5)², d=512, 4 KV GQA, MLP 3x, BigramHash(2048), SmearGate, XSA4, Partial RoPE, LN Scale, EMA, SWA, Late QAT, OrthoInit, VE128. Int6+zstd-22.
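The "Int6+zstd-22" step in the stack stores weights as 6-bit integers before zstd level-22 compression to hit the 15.62 MB artifact budget. A hypothetical sketch of the 6-bit bit-packing (4 values per 3 bytes); the quantization scale/zero-point handling and the zstd pass (e.g. via the `zstandard` library) are omitted, and none of these helper names come from the PR.

```python
def pack_int6(vals):
    """Pack 6-bit ints (0..63) into bytes: 4 values -> 3 bytes (25% smaller than int8)."""
    assert len(vals) % 4 == 0, "pad to a multiple of 4 values"
    out = bytearray()
    for i in range(0, len(vals), 4):
        a, b, c, d = vals[i:i + 4]
        bits = (a << 18) | (b << 12) | (c << 6) | d  # 24 bits total
        out += bits.to_bytes(3, "big")
    return bytes(out)

def unpack_int6(data):
    """Inverse of pack_int6: recover the original 6-bit values."""
    vals = []
    for i in range(0, len(data), 3):
        bits = int.from_bytes(data[i:i + 3], "big")
        vals += [(bits >> 18) & 63, (bits >> 12) & 63, (bits >> 6) & 63, bits & 63]
    return vals
```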
Run command
Credits
PR #518, PR #481 (mrdavtan), PR #442 (sjp611), PR #398 (felipe-parodi)
Test plan
🤖 Generated with Claude Code