Add LLMAdvisor submission: 1.14638 BPB (track_10min_16mb) #665

Open
harborglowvintage-oss wants to merge 1 commit into openai:main from harborglowvintage-oss:submission/llmadvisor-1.14638

Conversation

@harborglowvintage-oss

10L Int5/Int6 + BigramHash(10240) + SmearGate + SWA Boost

  • Mixed quantization: int5 MLP weights, int6 attention weights, FP16 embeddings, compressed with zstd level 22
  • Reduced batch size (622,592 tokens) to fit ~7,370 steps in the 600 s budget
  • SWA boost: checkpoint averaged every 30 steps from start_frac=0.50, 49 checkpoints averaged in total
  • Artifact size: 15,736,555 bytes (under the 16 MB limit)
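As a rough illustration of the mixed int5/int6 scheme above, here is a minimal sketch of symmetric k-bit linear quantization with a per-tensor scale. Function names are illustrative, the submission may well use per-channel scales, and the bit-packing plus zstd-22 step is omitted:

```python
# Hedged sketch of symmetric int-k weight quantization (int5 for MLP
# weights, int6 for attention, per the bullets above). Codes are stored
# in int8 here for simplicity; real packing to 5/6 bits is omitted.
import numpy as np

def quantize_intk(w, bits):
    """Quantize a float array to signed k-bit integer codes plus a scale."""
    qmax = 2 ** (bits - 1) - 1                      # 15 for int5, 31 for int6
    max_abs = float(np.abs(w).max())
    scale = max_abs / qmax if max_abs > 0 else 1.0  # per-tensor scale
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate float weights from codes and scale."""
    return q.astype(np.float32) * scale

w = np.array([-0.5, 0.0, 0.1, 0.5], dtype=np.float32)
q, s = quantize_intk(w, bits=5)
print(q)                 # int5 codes, range [-16, 15]
print(dequantize(q, s))  # approximate reconstruction of w
```

The reconstruction error shrinks as the bit width grows, which is presumably why the attention weights (int6) get one more bit than the MLP weights (int5).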

3-seed results:

  • seed=1337 SWA boost: val_bpb = 1.14638 (best)
  • seed=1337 standard: val_bpb = 1.14644
  • seed=2024: val_bpb = 1.14709

Note: this does not beat the current SOTA (1.1428) by the required margin of 0.005, so it is submitted as a non-record leaderboard entry.
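The SWA boost reported above amounts to a uniform average of late-training checkpoints. This is a minimal sketch of that averaging step; the `{name: array}` checkpoint format is a hypothetical stand-in for whatever the training code actually serializes:

```python
# Minimal sketch of the SWA boost: uniformly average model checkpoints
# collected late in training (49 of them in the submission above).
import numpy as np

def average_checkpoints(checkpoints):
    """Uniform average of a list of {param_name: ndarray} state dicts."""
    n = len(checkpoints)
    avg = {k: np.zeros_like(v, dtype=np.float64)
           for k, v in checkpoints[0].items()}
    for ckpt in checkpoints:
        for k, v in ckpt.items():
            avg[k] += v / n
    # Cast back to the original parameter dtype.
    return {k: v.astype(checkpoints[0][k].dtype) for k, v in avg.items()}

# Two toy "checkpoints" standing in for the 49 averaged ones.
c1 = {"w": np.array([1.0, 2.0])}
c2 = {"w": np.array([3.0, 4.0])}
print(average_checkpoints([c1, c2])["w"])  # → [2. 3.]
```

The small gap between the SWA and standard runs (1.14638 vs. 1.14644) is consistent with SWA acting as a cheap variance-reduction step at the end of training.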
