feat: add MiniMax M2 (MiniMaxM2ForCausalLM) support #19
Open

scottgl9 wants to merge 1 commit into CerebrasResearch:main
Conversation
Add MODEL_ATTRS entry and observer config for MiniMaxM2ForCausalLM, enabling REAP expert pruning on MiniMax M2.x models (M2, M2.1, M2.5, M2.7).

MiniMaxM2MLP uses non-standard weight names (w1/w2/w3) instead of gate_proj/up_proj/down_proj. The MoE block is accessed via block_sparse_moe on MiniMaxM2DecoderLayer, and the router is gate.

Weight name mapping confirmed against:
- HF modeling_minimax_m2.py (MiniMaxM2MLP class)
- SGLang minimax_m2.py (ckpt_gate_proj_name=w1, ckpt_down_proj_name=w2, ckpt_up_proj_name=w3)
- yujiepan/minimax-m2.5-tiny-random model structure printout

MiniMaxM2Experts exposes .num_experts and .top_k directly, matching the MoETransformerObserverConfig base class defaults, so no overrides are needed.

Note: Cerebras has published MiniMax-M2 and M2.5 REAP checkpoints on HuggingFace, but the corresponding model support was not upstreamed. This patch enables the community to reproduce those results.
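For reference, the w1/w2/w3 names line up with the conventional gate/up/down projections of a SwiGLU MLP. This is an illustrative scalar-weight sketch, not the actual MiniMaxM2MLP code; only the mapping (gate=w1, down=w2, up=w3) is taken from the commit message:

```python
import math

def silu(x: float) -> float:
    # SiLU activation: x * sigmoid(x)
    return x * (1.0 / (1.0 + math.exp(-x)))

def minimax_m2_mlp(x: float, w1: float, w2: float, w3: float) -> float:
    # Scalar stand-ins for the real Linear layers, to show the wiring only.
    gate = w1 * x                    # w1 plays the role of gate_proj
    up = w3 * x                      # w3 plays the role of up_proj
    return w2 * (silu(gate) * up)    # w2 plays the role of down_proj
```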
Summary
Adds a MODEL_ATTRS entry and observer config for MiniMaxM2ForCausalLM, enabling REAP expert pruning on MiniMax M2.x models (M2, M2.1, M2.5, M2.7).

Cerebras has already published REAP-pruned MiniMax M2 and M2.5 checkpoints on HuggingFace (MiniMax-M2.5-REAP-139B-A10B, MiniMax-M2.5-REAP-172B-A10B), but the corresponding model support was not upstreamed. This patch enables the community to reproduce those results and apply REAP to MiniMax M2.7.
Changes
src/reap/model_util.py

Adds MiniMaxM2ForCausalLM to MODEL_ATTRS:
- moe_block: "block_sparse_moe", the MoE attribute on MiniMaxM2DecoderLayer
- gate_proj: "w1", up_proj: "w3", down_proj: "w2" (MiniMaxM2MLP uses non-standard weight names)
- router: "gate", num_experts: "num_local_experts"

src/reap/observer.py

Adds MiniMaxM2MoEObserverHookConfig, hooking onto MiniMaxM2SparseMoeBlock. MiniMaxM2Experts exposes .num_experts and .top_k directly, matching the base class defaults, so no attribute overrides are needed.

Verification
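Assuming MODEL_ATTRS maps architecture names to dictionaries of attribute names (the actual schema lives in src/reap/model_util.py and may differ), the new entry would look roughly like:

```python
# Hypothetical sketch of the MODEL_ATTRS entry described in this PR.
# The key/value names below are taken from the PR text; the surrounding
# dict schema is an assumption, not the repo's actual definition.
MODEL_ATTRS = {
    "MiniMaxM2ForCausalLM": {
        "moe_block": "block_sparse_moe",     # MoE attribute on MiniMaxM2DecoderLayer
        "router": "gate",                    # gating module inside the MoE block
        "num_experts": "num_local_experts",  # config field holding the expert count
        "gate_proj": "w1",                   # MiniMaxM2MLP gate projection
        "up_proj": "w3",                     # MiniMaxM2MLP up projection
        "down_proj": "w2",                   # MiniMaxM2MLP down projection
    },
}
```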
Weight name mapping confirmed against:
- HF modeling_minimax_m2.py (the MiniMaxM2MLP class uses w1/w2/w3)
- SGLang minimax_m2.py (ckpt_gate_proj_name="w1", ckpt_down_proj_name="w2", ckpt_up_proj_name="w3")
- yujiepan/minimax-m2.5-tiny-random model structure printout confirming the block_sparse_moe attribute name

Applies to all MiniMax M2 variants sharing the MiniMaxM2ForCausalLM architecture: M2, M2.1, M2.5, M2.7.
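The attribute-name check can be reproduced without downloading weights by resolving each mapped name with getattr. The mock below mirrors only the structure reported in this PR (block_sparse_moe, gate, experts with w1/w2/w3); for a real check, load yujiepan/minimax-m2.5-tiny-random with transformers and pass an actual decoder layer instead:

```python
from types import SimpleNamespace

# Mock of the reported MiniMaxM2DecoderLayer structure; string values are
# labels standing in for real submodules.
expert = SimpleNamespace(w1="gate_proj", w2="down_proj", w3="up_proj")
layer = SimpleNamespace(
    block_sparse_moe=SimpleNamespace(gate="router", experts=[expert])
)

# Attribute names from the MODEL_ATTRS entry in this PR.
ATTRS = {
    "moe_block": "block_sparse_moe",
    "router": "gate",
    "gate_proj": "w1",
    "up_proj": "w3",
    "down_proj": "w2",
}

moe = getattr(layer, ATTRS["moe_block"])   # resolves the MoE block
router = getattr(moe, ATTRS["router"])     # resolves the gate module
for name in ("gate_proj", "up_proj", "down_proj"):
    assert hasattr(moe.experts[0], ATTRS[name]), name
print("all attribute names resolve")
```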