
[WIP] Non-record: Local Ablation Pipeline — EMA + Int6 + Partial RoPE (GTX 1650)#682

Open
gthgomez wants to merge 1 commit into openai:main from gthgomez:submission/local-ablation-gtx1650

Conversation


@gthgomez gthgomez commented Mar 25, 2026

Summary

This is a non-record local validation submission intended to document implementation and ablation results on constrained hardware. It is not a leaderboard attempt.

Track: track_non_record_16mb — dev hardware, not competition-scale
Author: Jonathan Gomez (gthgomez)
Hardware: NVIDIA GTX 1650 (4 GB VRAM, SM 7.5, Turing, Windows 11)

Folder: records/track_non_record_16mb/2026-03-25_LocalAblation_GTX1650_EMA_Int6_PartialRoPE/

What this includes

Features ported from leaderboard entries #1 (1.1233 bpb) and #2 (1.1248 bpb) and validated via 200-step ablation runs on local hardware:

  • GTX 1650 compatibility patches — NO_COMPILE, math SDP fallback, MAX_VAL_SEQS cap. All patches are env-var-gated and inert on H100 hardware.
  • EMA (EMA_DECAY env var) — 0.997 is the intended competition-scale setting based on top public entries; 0.97 was used locally to confirm the implementation is correct (0.167 bpb improvement over the live model in a 200-step test).
  • Int6 clip-search quantizer — 5-percentile per-row search, values in [-31, 31], inline A/B comparison at export. Local result: 6.7 MB vs 11.0 MB int8 at +0.005 bpb cost. The reduction appears to come from lower dynamic range and increased weight regularity, which improves entropy coding efficiency under zlib. All measurements use the same export and compression path.
  • Partial RoPE (ROPE_DIMS=16) — rotate first 16/64 head dims, passthrough the rest.
  • LN Scale (LN_SCALE=1) — 1/sqrt(layer_idx+1) applied to attn+mlp norms per block.
  • Muon decoupled weight decay (MUON_WD=0.04) + AdamW (ADAM_WD=0.04) for tok/scalar optimizers.
  • MLP_MULT float support — enables MLP_MULT=3.0 (entry #1 config).
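
The EMA bullet above amounts to keeping a decayed shadow copy of the weights and evaluating with it. A minimal sketch; the flat-dict interface and parameter names are illustrative, not this submission's actual API:

```python
def ema_update(ema_params, live_params, decay=0.97):
    """In-place EMA tracking: shadow <- decay * shadow + (1 - decay) * live.

    ema_params / live_params: dicts mapping parameter name -> value
    (scalars here for illustration; tensors in a real training loop).
    decay=0.97 mirrors the local setting; 0.997 is the competition-scale value.
    """
    for name, w in live_params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * w
    return ema_params
```

At eval time the shadow weights are loaded in place of the live ones, which is where the EMA-bpb column in the results table comes from.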
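
The int6 clip-search quantizer can be sketched as a per-row search over candidate clip thresholds, keeping the one with the lowest round-trip error. The five-point percentile grid and the MSE criterion below are assumptions for illustration, not the submission's exact search:

```python
import numpy as np

def quantize_int6_row(row, percentiles=(98.0, 99.0, 99.5, 99.9, 100.0)):
    """Per-row clip search: try 5 candidate clip points (illustrative grid),
    quantize to the 6-bit signed range [-31, 31], keep the lowest-MSE one.

    Returns (codes, scale) so the row reconstructs as codes * scale.
    Assumes the row is not all zeros.
    """
    best = None
    for p in percentiles:
        clip = np.percentile(np.abs(row), p)
        if clip == 0.0:
            continue
        scale = clip / 31.0
        codes = np.clip(np.round(row / scale), -31, 31)
        err = np.mean((codes * scale - row) ** 2)
        if best is None or err < best[0]:
            best = (err, codes.astype(np.int8), scale)
    return best[1], best[2]
```

Clipping outliers narrows the dynamic range and makes the code distribution more regular, which is consistent with the observation above that the int6 export compresses better under zlib.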
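
Partial RoPE as in the ROPE_DIMS=16 bullet rotates only the first 16 of the 64 head dims and passes the rest through unchanged. A sketch, assuming the standard RoPE pairing and base frequency (the submission's exact layout may differ):

```python
import numpy as np

def partial_rope(x, rope_dims=16, base=10000.0):
    """x: (seq, head_dim) array. Rotate the first rope_dims dims by
    position-dependent angles; the remaining dims pass through untouched."""
    seq, _ = x.shape
    half = rope_dims // 2
    # Geometric frequency ladder over the rotated dims (standard RoPE form).
    inv_freq = base ** (-np.arange(half) / half)
    angles = np.arange(seq)[:, None] * inv_freq[None, :]   # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:rope_dims]
    rotated = np.concatenate([x1 * cos - x2 * sin,
                              x1 * sin + x2 * cos], axis=1)
    return np.concatenate([rotated, x[:, rope_dims:]], axis=1)
```

Since rotation is norm-preserving and position 0 has zero angle, the first row and the last 48 dims of every row come back unchanged, which makes the passthrough easy to unit-test.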
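
Decoupled weight decay, as in the MUON_WD/ADAM_WD bullet, shrinks the weights directly rather than folding the decay into the gradient (AdamW-style). A minimal sketch; the flat-dict interface is illustrative:

```python
def decoupled_weight_decay(params, lr, wd=0.04):
    """Apply w <- w - lr * wd * w per parameter, separate from the
    optimizer's gradient step. wd=0.04 mirrors the MUON_WD/ADAM_WD
    settings above; params maps name -> value (scalars for illustration)."""
    for name in params:
        params[name] -= lr * wd * params[name]
    return params
```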

Local ablation results (200 steps, 9L, GTX 1650)

| Run | Live bpb | EMA bpb | int8 size | int6 size |
|---|---|---|---|---|
| Baseline | 2.6964 | — | ~11.1 MB | ~7.0 MB |
| EMA_DECAY=0.97 | 2.6333 | 2.4661 | 11.1 MB | — |
| Partial RoPE + LN Scale | 2.6845 | — | 11.2 MB | 7.0 MB |
| All features combined | 2.6845 | 2.5273 | 11.0 MB | 6.7 MB |

What is NOT in this submission

  • Full competition-scale training (11L, MLP_MULT=3.0, seq_len=2048, 7000 steps) — pending 8×H100 access
  • XSA (Exclusive Self-Attention)
  • VE (Value Embedding) — not yet implemented in this script
  • Sliding-window eval (stride-64)

This draft will be updated with competition-scale results once compute is available, or superseded by a ranked submission.

gthgomez force-pushed the submission/local-ablation-gtx1650 branch from d0dcb4c to 6b14fe3 on March 25, 2026 at 05:41
records/track_non_record_16mb/2026-03-25_LocalAblation_GTX1650_EMA_Int6_PartialRoPE/

Dev-hardware (GTX 1650, SM 7.5, 4 GB VRAM, Windows 11) pipeline porting
proven techniques from leaderboard entries openai#1 and openai#2 via 200-step local
ablation runs. Features implemented and validated:

- NO_COMPILE + math SDP fallback + MAX_VAL_SEQS (GTX 1650 compat, inert on H100)
- EMA (decay sweep: 0.997 for competition-scale, 0.97 validated locally)
- int6 clip-search quantizer + in-process A/B comparison
- Partial RoPE (ROPE_DIMS=16) + LN Scale 1/sqrt(layer+1)
- Muon decoupled weight decay (MUON_WD) + AdamW for tok/scalar
- MLP_MULT float support (enables MLP_MULT=3.0)

Best local result: val_bpb 2.5273 (int8 roundtrip, combined config, 200 steps)
Not a leaderboard attempt. Pending: full 11L competition run on 8xH100.
gthgomez force-pushed the submission/local-ablation-gtx1650 branch from 6b14fe3 to 8f16ebf on March 25, 2026 at 05:44
gthgomez marked this pull request as ready for review on March 25, 2026 at 05:48