Conversation

@vasqu (Contributor) commented Nov 28, 2025

Previous PR was merged without syncing with main --> a new model was added in the meantime --> now out of sync

@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: nanochat

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ydshieh (Collaborator) left a comment

ok, but see comment

self.o_proj = nn.Linear(
    config.num_attention_heads * self.head_dim, config.hidden_size, bias=config.attention_bias
)
self.rotary_fn = apply_rotary_pos_emb
@ydshieh (Collaborator) Nov 28, 2025

If you are sure this should be here, fine (I mean, I understand the sync), but does this self.rotary_fn = apply_rotary_pos_emb make sense for this model? If so, good. If not, maybe it should not be synced and the modular file should be adjusted instead.

@vasqu (Contributor, Author)

Afaik, it's due to kernels needing this, but let's double check @MekkCyber

(Collaborator)

I think we wanted it outside, not using self.xxxx
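
For context, a minimal sketch of the two styles being discussed, assuming a simplified attention module with the rotary math elided (illustrative only, not the actual model code):

import torch.nn as nn

def apply_rotary_pos_emb(q, k, cos, sin):
    # eager rotary-embedding path (rotation math elided for brevity)
    return q, k

class AttentionWithBinding(nn.Module):
    # current snippet: the function is stored on the module instance
    def __init__(self):
        super().__init__()
        self.rotary_fn = apply_rotary_pos_emb

    def forward(self, q, k, cos, sin):
        return self.rotary_fn(q, k, cos, sin)

class AttentionWithoutBinding(nn.Module):
    # suggested alternative: call the module-level function directly
    def forward(self, q, k, cos, sin):
        return apply_rotary_pos_emb(q, k, cos, sin)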

@vasqu (Contributor, Author)

#41147 (comment)

Looks like it's not possible? Otherwise, we would need to update all models. But it's getting late; let's resolve this next week with a fresh mind 😄

@ArthurZucker (Collaborator) left a comment

ty tho, let's move it out!

@ArthurZucker (Collaborator)

If possible

@kaixuanliu (Contributor)

Hi @ArthurZucker, as per this comment: the current design of the use_kernel_func_from_hub decorator needs to bind the function to a Module. And I think this design is exactly what we want: we need to control the kernelize replacement of the function via the use_kernels param in from_pretrained as well. Hence I want to keep this self.xxx if possible.
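
A minimal sketch of the argument, assuming a simplified kernelize step; fast_rotary_kernel and the kernelize helper below are illustrative stand-ins, not the actual kernels-library API:

import torch.nn as nn

def apply_rotary_pos_emb(q, k, cos, sin):
    # eager reference path (rotation math elided)
    return q, k

def fast_rotary_kernel(q, k, cos, sin):
    # hypothetical optimized kernel fetched from the hub
    return q, k

class Attention(nn.Module):
    def __init__(self):
        super().__init__()
        # binding the function to the instance gives the kernelize step
        # an attribute it can swap per module
        self.rotary_fn = apply_rotary_pos_emb

    def forward(self, q, k, cos, sin):
        # whatever the kernelize step installed is what runs here
        return self.rotary_fn(q, k, cos, sin)

def kernelize(model: nn.Module, use_kernels: bool):
    # illustrative stand-in for the replacement that
    # from_pretrained(..., use_kernels=True) is meant to control
    if use_kernels:
        for module in model.modules():
            if getattr(module, "rotary_fn", None) is apply_rotary_pos_emb:
                module.rotary_fn = fast_rotary_kernel

With a plain module-level call in forward there would be no per-instance attribute to replace, so the use_kernels opt-in could not be applied at this granularity.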

@ArthurZucker (Collaborator)

Right my bad!

@ArthurZucker merged commit 01823d7 into huggingface:main on Dec 1, 2025
17 checks passed