Skip to content

Conversation

@Cyrilvallez
Copy link
Member

@Cyrilvallez Cyrilvallez commented Nov 25, 2025

What does this PR do?

As per the title! We can now revert back all our custom Operation performed during weight loading, and it's the default for BC!

@Cyrilvallez Cyrilvallez changed the title Reverse all loading operations when saving [loading/saving] Reverse all loading operations when saving Nov 25, 2025
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice., missing some fast test in test_core_model_loading! With all the ops that are define in the file to make sure they are all inversible (ex: permute for rope!)

@ArthurZucker ArthurZucker added for_v5? Core: Modeling Internals of the library; Models. labels Nov 27, 2025
Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice but missing tests 👁️ 👁️ !

# The inverse operation class, will be used when saving the checkpoint
reverse_op: type[ConversionOps]
def __repr__(self):
return f"{self.__class__.__name__}(dim={self.dim})"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return f"{self.__class__.__name__}(dim={self.dim})"
return f"{self.__class__.__name__}(dim={self.dim}, {self.reverse_op}"

Comment on lines +416 to +388
# Perform renaming op (for a simple WeightRenaming, `self.source_patterns` and `self.target_patterns` can
# only be of length 1, and are actually the full key names - we also have only 1 single related tensor)
target_key = self.target_patterns[0]
collected_tensors = {target_key: self.collected_tensors[self.source_patterns[0]]}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not certain this should be done here either API-wise

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much much much better but missing tests still

@github-actions
Copy link
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: granite_speech, timm_wrapper

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ty

@Cyrilvallez Cyrilvallez merged commit d08b98b into main Nov 27, 2025
22 of 24 checks passed
@Cyrilvallez Cyrilvallez deleted the saving branch November 27, 2025 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Core: Modeling Internals of the library; Models. for_v5?

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants