Add TRN hybrid non-record submission (1.4942 bpb, 1x RTX 5090) #669

Open

amabito wants to merge 1 commit into openai:main from amabito:trn-hybrid-submission

Conversation

@amabito amabito commented Mar 25, 2026

Non-record submission: oscillatory recurrence + attention hybrid under the 16 MB constraint.

What this is

A 10-layer hybrid model (7 TRN layers + 3 attention layers) with int5 QAT and
zstd-22 compression. The TRN layers use a Kogge-Stone parallel prefix scan over
complex-valued oscillators -- no Triton, no custom CUDA, pure PyTorch.
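The recurrence in these layers has the first-order linear form h_t = lam_t * h_{t-1} + x_t over complex states, which a Kogge-Stone scan evaluates in O(log T) parallel rounds by composing affine maps. A minimal pure-PyTorch sketch (function name and shapes are illustrative, not the PR's actual code):

```python
import torch

def kogge_stone_scan(lam: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Evaluate h_t = lam_t * h_{t-1} + x_t (with h_{-1} = 0) for all t
    in O(log T) rounds. lam, x: complex tensors of shape (T, D)."""
    T = x.shape[0]
    a, b = lam.clone(), x.clone()  # element t holds the map h -> a*h + b
    shift = 1
    while shift < T:
        # compose each element with the partial result `shift` steps back:
        # (a2, b2) o (a1, b1) = (a2*a1, a2*b1 + b2)
        a_prev = torch.ones_like(a)
        b_prev = torch.zeros_like(b)
        a_prev[shift:] = a[:-shift]
        b_prev[shift:] = b[:-shift]
        b = a * b_prev + b
        a = a * a_prev
        shift *= 2
    return b  # b[t] is h_t
```

Each round doubles every element's reach, so length-T sequences need ceil(log2 T) rounds; Kogge-Stone spends O(T log T) work for minimal depth, which maps cleanly onto batched tensor ops without Triton or custom CUDA.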

Score: 1.4942 bpb (int5 roundtrip, 636 steps / 600s wallclock, 1x RTX 5090).
Artifact: 15.28 MB.
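For context, int5 QAT typically means fake-quantizing weights in the forward pass while gradients flow through unchanged, so training adapts the weights to survive the 5-bit roundtrip. A generic per-tensor sketch with a straight-through estimator (the PR's actual scheme, scale granularity, and rounding rule may differ):

```python
import torch

def fake_quant_int5(w: torch.Tensor) -> torch.Tensor:
    """Symmetric per-tensor fake quantization to 5-bit integers with a
    straight-through estimator (STE). Illustrative sketch only."""
    qmax = 15  # signed int5 spans [-16, 15]; symmetric range uses +/-15
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    # forward pass sees the quantized value; backward sees the identity
    return w + (q - w).detach()
```

The "int5 roundtrip" score above corresponds to evaluating with the quantized values, i.e. what remains after the artifact is decompressed and dequantized.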

What went wrong

The model reaches 1.26 bpb in fp32 at 20K steps, but int5 quantization degrades
it to 1.93 bpb. The oscillator projection weights (d_model -> 6K, encoding
frequency and phase) accumulate O(t) phase drift from quantization errors.
At 1000 steps the int5 penalty is small (+0.041 bpb); at 20K steps it collapses (+0.669 bpb).

A parameter-matched 13L Transformer shows only +0.016 bpb int5 degradation at the
same step count. The failure is specific to the oscillatory recurrence parameters.

What is included

  • records/track_non_record_16mb/2026-03-25_TRN_Hybrid_Int5_1x5090/
    • README.md (architecture, ablations, quantization analysis, 13L comparison)
    • submission.json
    • train.log
    • train_gpt_trn.py (self-contained, zero external dependencies)
  • README.md root table updated (Non-Record Runs)

What is not included

  • 3-seed runs (single seed only)
  • 8xH100 results (tested on 1x RTX 5090 only)

Oscillatory recurrence + attention hybrid under 16 MB constraint.
10 layers (7 TRN + 3 Attn), int5 QAT, Kogge-Stone parallel scan.
Int5 collapses at 20K steps due to oscillator projection phase drift.