feat: add MiniMax M2 (MiniMaxM2ForCausalLM) support #19
Open

scottgl9 wants to merge 1 commit into CerebrasResearch:main
Conversation
Add MODEL_ATTRS entry and observer config for MiniMaxM2ForCausalLM, enabling REAP expert pruning on MiniMax M2.x models (M2, M2.1, M2.5, M2.7).

MiniMaxM2MLP uses non-standard weight names (w1/w2/w3) instead of gate_proj/up_proj/down_proj. The MoE block is accessed via block_sparse_moe on MiniMaxM2DecoderLayer, and the router is gate.

Weight name mapping confirmed against:
- HF modeling_minimax_m2.py (MiniMaxM2MLP class)
- SGLang minimax_m2.py (ckpt_gate_proj_name=w1, ckpt_down_proj_name=w2, ckpt_up_proj_name=w3)
- yujiepan/minimax-m2.5-tiny-random model structure printout

MiniMaxM2Experts exposes .num_experts and .top_k directly, matching the MoETransformerObserverConfig base class defaults, so no overrides are needed.

Note: Cerebras has published MiniMax-M2 and M2.5 REAP checkpoints on HuggingFace, but the corresponding model support was not upstreamed. This patch enables the community to reproduce those results.
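For reference, the w1/w2/w3 names line up with the conventional gate/up/down projections of a SwiGLU MLP. This is an illustrative scalar-weight sketch, not the actual MiniMaxM2MLP code; only the mapping (gate=w1, down=w2, up=w3) is taken from the commit message:

```python
import math

def silu(x: float) -> float:
    # SiLU activation: x * sigmoid(x)
    return x * (1.0 / (1.0 + math.exp(-x)))

def minimax_m2_mlp(x: float, w1: float, w2: float, w3: float) -> float:
    # Scalar stand-ins for the real Linear layers, to show the wiring only.
    gate = w1 * x                    # w1 plays the role of gate_proj
    up = w3 * x                      # w3 plays the role of up_proj
    return w2 * (silu(gate) * up)    # w2 plays the role of down_proj
```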
Summary
Adds a MODEL_ATTRS entry and observer config for MiniMaxM2ForCausalLM, enabling REAP expert pruning on MiniMax M2.x models (M2, M2.1, M2.5, M2.7).

Cerebras has already published REAP-pruned MiniMax M2 and M2.5 checkpoints on HuggingFace (MiniMax-M2.5-REAP-139B-A10B, MiniMax-M2.5-REAP-172B-A10B), but the corresponding model support was not upstreamed. This patch enables the community to reproduce those results and apply REAP to MiniMax M2.7.
Changes
src/reap/model_util.py

Adds MiniMaxM2ForCausalLM to MODEL_ATTRS:
- moe_block: "block_sparse_moe", the MoE attribute on MiniMaxM2DecoderLayer
- gate_proj: "w1", up_proj: "w3", down_proj: "w2" (MiniMaxM2MLP uses non-standard weight names)
- router: "gate", num_experts: "num_local_experts"

src/reap/observer.py

Adds MiniMaxM2MoEObserverHookConfig, hooking onto MiniMaxM2SparseMoeBlock. MiniMaxM2Experts exposes .num_experts and .top_k directly, matching the base class defaults, so no attribute overrides are needed.

Verification
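Assuming MODEL_ATTRS maps architecture names to dictionaries of attribute names (the actual schema lives in src/reap/model_util.py and may differ), the new entry would look roughly like:

```python
# Hypothetical sketch of the MODEL_ATTRS entry described in this PR.
# The key/value names below are taken from the PR text; the surrounding
# dict schema is an assumption, not the repo's actual definition.
MODEL_ATTRS = {
    "MiniMaxM2ForCausalLM": {
        "moe_block": "block_sparse_moe",     # MoE attribute on MiniMaxM2DecoderLayer
        "router": "gate",                    # gating module inside the MoE block
        "num_experts": "num_local_experts",  # config field holding the expert count
        "gate_proj": "w1",                   # MiniMaxM2MLP gate projection
        "up_proj": "w3",                     # MiniMaxM2MLP up projection
        "down_proj": "w2",                   # MiniMaxM2MLP down projection
    },
}
```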
Weight name mapping confirmed against:
- HF modeling_minimax_m2.py (the MiniMaxM2MLP class uses w1/w2/w3)
- SGLang minimax_m2.py (ckpt_gate_proj_name="w1", ckpt_down_proj_name="w2", ckpt_up_proj_name="w3")
- yujiepan/minimax-m2.5-tiny-random model structure printout confirming the block_sparse_moe attribute name

Applies to all MiniMax M2 variants sharing the MiniMaxM2ForCausalLM architecture: M2, M2.1, M2.5, M2.7.
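The attribute-name check can be reproduced without downloading weights by resolving each mapped name with getattr. The mock below mirrors only the structure reported in this PR (block_sparse_moe, gate, experts with w1/w2/w3); for a real check, load yujiepan/minimax-m2.5-tiny-random with transformers and pass an actual decoder layer instead:

```python
from types import SimpleNamespace

# Mock of the reported MiniMaxM2DecoderLayer structure; string values are
# labels standing in for real submodules.
expert = SimpleNamespace(w1="gate_proj", w2="down_proj", w3="up_proj")
layer = SimpleNamespace(
    block_sparse_moe=SimpleNamespace(gate="router", experts=[expert])
)

# Attribute names from the MODEL_ATTRS entry in this PR.
ATTRS = {
    "moe_block": "block_sparse_moe",
    "router": "gate",
    "gate_proj": "w1",
    "up_proj": "w3",
    "down_proj": "w2",
}

moe = getattr(layer, ATTRS["moe_block"])   # resolves the MoE block
router = getattr(moe, ATTRS["router"])     # resolves the gate module
for name in ("gate_proj", "up_proj", "down_proj"):
    assert hasattr(moe.experts[0], ATTRS[name]), name
print("all attribute names resolve")
```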