
Conversation

@shawntan (Contributor) commented Nov 3, 2025

Modifying kernel to match transformers v5 based on: huggingface/transformers#40822 (comment)

@shawntan shawntan requested a review from MekkCyber as a code owner November 3, 2025 17:52
@MekkCyber (Collaborator) left a comment


Thanks!

Comment on lines +51 to 52
return layer_output

@MekkCyber (Collaborator):

So we don't need the router_logits in the kernel?

@shawntan (Contributor, Author):

From what I've understood, the new transformers v5 tracks the router_logits via annotations on the main class: _can_record_outputs and OutputRecorder (huggingface/transformers#40822 (comment)). @ArthurZucker has confirmed those annotations will be added to GraniteMoe, making the output signature the same as all the other MoE models, where the router logits are likewise tracked via those annotations.

I'm not sure whether the underlying mechanism that tracks the logits for v5 will work with the kernel, but it's hard to test since, for GraniteMoeHybrid, we need mamba_ssm to be compatible with v5.
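For illustration, here is a framework-free toy sketch of the recording pattern being described (the class and attribute names are stand-ins mimicking transformers' _can_record_outputs / OutputRecorder idea, not the actual transformers API): a recorder captures an intermediate value such as router logits as a side effect, so the layer's forward can return only the main output, matching the kernel change in this PR.

```python
class OutputRecorderSketch:
    """Toy stand-in for an output recorder (hypothetical API).

    Collects named intermediate values as a side channel, so they
    never need to appear in a module's return signature.
    """

    def __init__(self):
        self.recorded = {}

    def record(self, name, value):
        self.recorded.setdefault(name, []).append(value)


class ToyMoELayer:
    # Analogue of a _can_record_outputs annotation: declares which
    # intermediates this layer can expose to a recorder.
    _can_record_outputs = ("router_logits",)

    def __init__(self, recorder=None):
        self.recorder = recorder

    def forward(self, hidden_states):
        # Stand-in "router" computation producing the intermediate value.
        router_logits = [h * 0.5 for h in hidden_states]
        if self.recorder is not None:
            self.recorder.record("router_logits", router_logits)
        # Only the main output is returned, as in the updated kernel.
        layer_output = [h + 1 for h in hidden_states]
        return layer_output


recorder = OutputRecorderSketch()
layer = ToyMoELayer(recorder)
out = layer.forward([1.0, 2.0])
print(out)                                  # [2.0, 3.0]
print(recorder.recorded["router_logits"])   # [[0.5, 1.0]]
```

The design point is that the kernel and the output-tracking machinery stay decoupled: the kernel returns layer_output alone, and whoever needs router_logits attaches a recorder externally.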

@MekkCyber (Collaborator):

Thanks for the clarifications @shawntan; I don't get why it's hard to test, though?

@shawntan (Contributor, Author) commented Nov 12, 2025

Ok, Mamba's been updated, I've tested it. Sorry about that.

Everything seems good to go!
