
Fix gptoss_from_pretrained to correctly load HuggingFace weights #4

Open
dfalbel wants to merge 3 commits into main from fix-gptoss-from-pretrained

Conversation

dfalbel (Member) commented Feb 3, 2026

Summary

  • Update gptoss_normalize_config to map HuggingFace config keys to their internal names: num_local_experts → num_experts, num_experts_per_tok → experts_per_token, plus the nested rope_scaling fields (a sketch follows this list)
  • Rewrite gptoss_hf_weights_remap to use an underscore suffix (_blocks/_scales) for MXFP4 weight detection, remap HF parameter names to model parameter names, and concatenate the separate q/k/v projections into combined qkv tensors (illustrated after the commit message below)
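
A minimal sketch of the config normalization described above, assuming the package is written in R. The num_experts and experts_per_token targets come from the summary; the flattened rope_scaling names are assumptions for illustration only, since the PR's diff isn't shown here:

```r
# Minimal sketch, not the PR's actual code: normalize a HuggingFace
# config list into the internal names the model constructor expects.
gptoss_normalize_config_sketch <- function(config) {
  if (!is.null(config$num_local_experts)) {
    config$num_experts <- config$num_local_experts
    config$num_local_experts <- NULL
  }
  if (!is.null(config$num_experts_per_tok)) {
    config$experts_per_token <- config$num_experts_per_tok
    config$num_experts_per_tok <- NULL
  }
  # HF nests RoPE settings under `rope_scaling`; hoist them to
  # top-level keys (the flattened names here are assumed).
  if (!is.null(config$rope_scaling)) {
    config$rope_scaling_factor <- config$rope_scaling$factor
    config$rope_scaling <- NULL
  }
  config
}

# Usage on a toy config:
cfg <- gptoss_normalize_config_sketch(list(
  num_local_experts = 32L,
  num_experts_per_tok = 4L,
  rope_scaling = list(factor = 32)
))
str(cfg)
```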

🤖 Generated with Claude Code

dfalbel and others added 3 commits February 3, 2026 14:07
- Update gptoss_normalize_config to map HF config keys (num_local_experts,
  num_experts_per_tok, nested rope_scaling) to internal names
- Rewrite gptoss_hf_weights_remap to:
  - Use underscore suffix (_blocks/_scales) for MXFP4 weight detection
  - Remap HF parameter names to model parameter names
  - Concatenate separate q/k/v projections into combined qkv tensors

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
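
As a rough illustration of the remap logic described in the commit above, here is a sketch using torch for R. The source tensor names assume the usual Llama-style HF layout and the `attn.qkv.weight` target name is hypothetical; this is not the PR's diff:

```r
library(torch)

# Detect MXFP4-quantized tensors by their underscore suffix, per the PR.
is_mxfp4 <- function(name) {
  endsWith(name, "_blocks") || endsWith(name, "_scales")
}

# Sketch: remap a named list of HF tensors to model parameter names,
# fusing each layer's separate q/k/v projections into one qkv tensor.
gptoss_hf_weights_remap_sketch <- function(weights) {
  q_names <- grep("self_attn\\.q_proj\\.weight$", names(weights), value = TRUE)
  for (q in q_names) {
    k <- sub("q_proj", "k_proj", q)
    v <- sub("q_proj", "v_proj", q)
    # Stack along the output dimension: the fused tensor has shape
    # (q_out + k_out + v_out, in).
    qkv <- torch_cat(list(weights[[q]], weights[[k]], weights[[v]]), dim = 1)
    target <- sub("self_attn\\.q_proj\\.weight$", "attn.qkv.weight", q)
    weights[[target]] <- qkv
    weights[[q]] <- NULL
    weights[[k]] <- NULL
    weights[[v]] <- NULL
  }
  weights
}
```

Concatenating along the first dimension works because each projection is stored as an (out_features, in_features) matrix, so a single matmul against the fused matrix reproduces the three separate projections.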
