[WIP] Non-record: Local Ablation Pipeline — EMA + Int6 + Partial RoPE (GTX 1650) #682
Open: gthgomez wants to merge 1 commit into openai:main
Force-pushed from d0dcb4c to 6b14fe3.
records/track_non_record_16mb/2026-03-25_LocalAblation_GTX1650_EMA_Int6_PartialRoPE/

Dev-hardware (GTX 1650, SM 7.5, 4 GB VRAM, Windows 11) pipeline porting proven techniques from leaderboard entries #1 and #2 via 200-step local ablation runs.

Features implemented and validated:
- NO_COMPILE + math SDP fallback + MAX_VAL_SEQS (GTX 1650 compat, inert on H100)
- EMA (decay sweep: 0.997 for competition-scale, 0.97 validated locally)
- int6 clip-search quantizer + in-process A/B comparison
- Partial RoPE (ROPE_DIMS=16) + LN Scale 1/sqrt(layer+1)
- Muon decoupled weight decay (MUON_WD) + AdamW for tok/scalar
- MLP_MULT float support (enables MLP_MULT=3.0)

Best local result: val_bpb 2.5273 (int8 roundtrip, combined config, 200 steps)

Not a leaderboard attempt. Pending: full 11L competition run on 8xH100.
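The GTX 1650 compatibility item (NO_COMPILE, math SDP fallback, MAX_VAL_SEQS) and the MLP_MULT float change are described as env-var-gated and inert on H100. Below is a minimal sketch of that gating pattern in plain Python, assuming unset variables keep the original behaviour. The `DevFlags`, `read_flags`, `cap_val_seqs` and `mlp_hidden_dim` names, `MAX_VAL_SEQS=0` meaning "no cap", and the default `MLP_MULT` of 2 are all hypothetical, not taken from the submission's train script. The math SDP fallback is left out because the source does not name its switch.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class DevFlags:
    no_compile: bool   # NO_COMPILE=1 -> skip compilation (assumed semantics)
    max_val_seqs: int  # MAX_VAL_SEQS=N -> evaluate at most N sequences; 0 = no cap (assumed)
    mlp_mult: float    # parsed as float so values like 3.0 are accepted


def read_flags(env: Optional[Mapping[str, str]] = None) -> DevFlags:
    """Read dev-hardware switches; every default leaves behaviour unchanged."""
    env = os.environ if env is None else env
    return DevFlags(
        no_compile=env.get("NO_COMPILE", "0") == "1",
        max_val_seqs=int(env.get("MAX_VAL_SEQS", "0")),
        mlp_mult=float(env.get("MLP_MULT", "2")),  # default of 2 is an assumption
    )


def cap_val_seqs(seqs: Sequence[T], max_val_seqs: int) -> Sequence[T]:
    """Truncate the validation set only when a positive cap is given."""
    return seqs[:max_val_seqs] if max_val_seqs > 0 else seqs


def mlp_hidden_dim(model_dim: int, mlp_mult: float) -> int:
    """Hidden width of the MLP; int() truncates non-integer products."""
    return int(model_dim * mlp_mult)
```

For example, `read_flags({"NO_COMPILE": "1", "MLP_MULT": "3.0"})` yields `no_compile=True, mlp_mult=3.0`, and with a hypothetical 512-wide model `mlp_hidden_dim(512, 3.0)` gives 1536. Parsing with `float` rather than `int` is what lets `3.0` through.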
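The EMA item keeps a shadow copy of the weights that is updated after each optimizer step and used for evaluation. Below is a minimal sketch over flat lists of floats, assuming the shadow copy starts as a copy of the initial weights. The helper names are hypothetical. The two extra helpers suggest one plausible reason 0.97 suits a 200-step run while 0.997 is kept for competition scale. The rule-of-thumb window 1/(1 - decay) is about 333 steps for 0.997 versus about 33 for 0.97. After 200 steps, about 55% of a 0.997 average still reflects the starting weights, compared with about 0.2% for 0.97.

```python
from typing import List


def ema_update(ema: List[float], live: List[float], decay: float) -> None:
    """In-place EMA of weights: ema <- decay * ema + (1 - decay) * live."""
    for i, w in enumerate(live):
        ema[i] = decay * ema[i] + (1.0 - decay) * w


def effective_horizon(decay: float) -> float:
    """Rule-of-thumb averaging window of the EMA, in optimizer steps."""
    return 1.0 / (1.0 - decay)


def init_share(decay: float, steps: int) -> float:
    """Weight the EMA still gives the starting weights after `steps` updates,
    assuming the EMA was initialised as a copy of the initial model."""
    return decay ** steps
```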
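The int6 clip-search quantizer item can be read as follows. For each weight row, try several clipping thresholds at or below max|w|, quantize to 6-bit integers, and keep the threshold with the lowest reconstruction error. Below is a minimal sketch of that idea. It assumes per-row symmetric scaling onto [-31, 31], a small fixed grid of clip fractions, and squared error as the criterion. The grid, the function names, and the error measure are illustrative assumptions, not the submission's exact quantizer.

```python
from typing import List, Sequence, Tuple

QMAX = 31  # symmetric int6 range [-31, 31] (assumption; -32 is left unused)


def quantize_row(row: Sequence[float], clip: float) -> Tuple[List[int], float]:
    """Quantize one row with scale = clip * max|w| / QMAX, clamping to int6."""
    amax = max(abs(v) for v in row)
    if amax == 0.0:
        return [0] * len(row), 1.0
    scale = clip * amax / QMAX
    q = [max(-QMAX, min(QMAX, round(v / scale))) for v in row]
    return q, scale


def recon_error(row: Sequence[float], q: Sequence[int], scale: float) -> float:
    """Squared error between the row and its dequantized version."""
    return sum((qi * scale - v) ** 2 for qi, v in zip(q, row))


def quantize_int6_clip_search(
    row: Sequence[float],
    clips: Sequence[float] = (1.0, 0.95, 0.9, 0.85, 0.8, 0.7, 0.6),
) -> Tuple[List[int], float, float]:
    """Try each clip fraction and keep the one with the lowest error."""
    best = None
    for c in clips:
        q, scale = quantize_row(row, c)
        err = recon_error(row, q, scale)
        if best is None or err < best[0]:
            best = (err, q, scale, c)
    _, q, scale, c = best
    return q, scale, c
```

Because clip 1.0 is always in the grid, the search can never do worse than plain absmax scaling under this error measure. Smaller clips trade error on a few outliers for finer resolution on the bulk of the row.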
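With ROPE_DIMS=16 and 64-dim heads, Partial RoPE rotates only the first 16 dimensions of each query/key head vector and passes the other 48 through unchanged. Below is a sketch for a single head vector. It assumes the common split-half pairing (dimension i paired with i + 8) and base 10000, although the real code may pair dimensions differently. `partial_rope` is a hypothetical name.

```python
import math
from typing import List, Sequence


def partial_rope(x: Sequence[float], pos: int, rope_dims: int = 16,
                 base: float = 10000.0) -> List[float]:
    """Rotate x[:rope_dims] by position-dependent angles; copy x[rope_dims:] as-is."""
    assert rope_dims % 2 == 0 and rope_dims <= len(x)
    half = rope_dims // 2
    out = list(x)
    for i in range(half):
        # Frequency for pair i, as in standard RoPE but over rope_dims only.
        theta = pos * base ** (-2.0 * i / rope_dims)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out
```

The rotated part keeps the usual RoPE property that its q·k contribution depends only on the position offset. The 48 pass-through dimensions contribute a position-independent term.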
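LN Scale multiplies the normalized input to each block's attention and MLP sublayers by 1/sqrt(layer_idx+1). Layer 0 is therefore unscaled, and layer 8 of a 9-layer model gets 1/3. Below is a sketch assuming RMSNorm without a learned gain. Both the norm type and the function names are assumptions.

```python
import math
from typing import List, Sequence


def ln_scale(layer_idx: int) -> float:
    """Depth-dependent factor 1/sqrt(layer_idx + 1)."""
    return 1.0 / math.sqrt(layer_idx + 1)


def scaled_rms_norm(x: Sequence[float], layer_idx: int,
                    eps: float = 1e-6) -> List[float]:
    """RMS-normalize x, then shrink it by the layer's LN scale factor."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    s = ln_scale(layer_idx)
    return [v / rms * s for v in x]
```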
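Decoupled weight decay (MUON_WD=0.04, and likewise AdamW's ADAM_WD=0.04) shrinks the weights directly instead of adding wd * p to the gradient. As a result, the decay is not reshaped by Muon's orthogonalization or Adam's per-parameter normalization. Below is a sketch of the generic update rule p <- p * (1 - lr * wd) - lr * update, where `update` stands for whatever direction the optimizer produced. The function name is hypothetical, and the submission may scale the decay differently.

```python
from typing import List, Sequence


def decoupled_wd_step(params: Sequence[float], update: Sequence[float],
                      lr: float, wd: float) -> List[float]:
    """Shrink weights by (1 - lr * wd), then apply the optimizer's update."""
    return [p * (1.0 - lr * wd) - lr * u for p, u in zip(params, update)]
```

With lr = 0.02 (a made-up value) and wd = 0.04, each step multiplies the weights by 1 - 0.0008 = 0.9992 before the update is applied.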
Force-pushed from 6b14fe3 to 8f16ebf.
Summary
This is a non-record local validation submission intended to document implementation and ablation results on constrained hardware. It is not a leaderboard attempt.
Track: track_non_record_16mb (dev hardware, not competition-scale)
Author: Jonathan Gomez (gthgomez)
Hardware: NVIDIA GTX 1650 (4 GB VRAM, SM 7.5, Turing, Windows 11)
Folder: records/track_non_record_16mb/2026-03-25_LocalAblation_GTX1650_EMA_Int6_PartialRoPE/
What this includes
Features ported from leaderboard entries #1 (1.1233 bpb) and #2 (1.1248 bpb) and validated via 200-step ablation runs on local hardware:
- GTX 1650 compatibility: NO_COMPILE, math SDP fallback, MAX_VAL_SEQS cap. All patches are env-var-gated and inert on H100 hardware.
- EMA (EMA_DECAY env var): 0.997 is the intended competition-scale setting based on top public entries; 0.97 was used locally to validate the implementation (EMA weights scored 0.167 bpb better than the live model in a 200-step test).
- Partial RoPE (ROPE_DIMS=16): rotate the first 16 of 64 head dims and pass the remaining 48 through unchanged.
- LN Scale (LN_SCALE=1): a 1/sqrt(layer_idx+1) factor applied to the attention and MLP norms in each block.
- Muon decoupled weight decay (MUON_WD=0.04) + AdamW (ADAM_WD=0.04) for the tok/scalar optimizers.
- MLP_MULT float support: enables MLP_MULT=3.0 (entry #1 config).
Local ablation results (200 steps, 9L, GTX 1650)
What is NOT in this submission
This draft will be updated with competition-scale results once compute is available, or superseded by a ranked submission.