[loading/saving] Reverse all loading operations when saving #42396
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
ArthurZucker
left a comment
Very nice, but missing some fast tests in test_core_model_loading! They should cover all the ops that are defined in the file, to make sure they are all invertible (e.g. permute for RoPE!)
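A fast round-trip test of the kind requested above could look roughly like the sketch below. It is an illustration only: `permute_for_rope` and `unpermute_for_rope` are hypothetical stand-ins written on plain Python lists, not the ops defined in this PR, and the interleaved to half-split row layout is an assumption about what the RoPE permutation does.

```python
# Hypothetical stand-ins for a RoPE permutation and its inverse.
# Rows are plain Python lists here; the real ops work on tensors.

def permute_for_rope(rows, n_heads):
    """Within each head, move even-indexed rows first, then odd-indexed rows."""
    head_dim = len(rows) // n_heads
    out = []
    for h in range(n_heads):
        head = rows[h * head_dim:(h + 1) * head_dim]
        out.extend(head[0::2] + head[1::2])
    return out


def unpermute_for_rope(rows, n_heads):
    """Inverse of permute_for_rope: re-interleave the two halves of each head."""
    head_dim = len(rows) // n_heads
    half = head_dim // 2
    out = []
    for h in range(n_heads):
        head = rows[h * head_dim:(h + 1) * head_dim]
        for first, second in zip(head[:half], head[half:]):
            out.extend([first, second])
    return out


def test_permute_round_trip():
    rows = list(range(8))  # 2 heads, 4 rows per head
    permuted = permute_for_rope(rows, n_heads=2)
    assert permuted != rows  # the op actually changes the layout
    assert unpermute_for_rope(permuted, n_heads=2) == rows  # and is invertible


test_permute_round_trip()
```

The same pattern (apply the op, apply its reverse, compare with the input) would extend to every op in the file.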
ArthurZucker
left a comment
Nice but missing tests 👁️ 👁️ !
| # The inverse operation class, will be used when saving the checkpoint | ||
| reverse_op: type[ConversionOps] | ||
| def __repr__(self): | ||
| return f"{self.__class__.__name__}(dim={self.dim})" |
| return f"{self.__class__.__name__}(dim={self.dim})" | |
| return f"{self.__class__.__name__}(dim={self.dim}, {self.reverse_op})" |
| # Perform renaming op (for a simple WeightRenaming, `self.source_patterns` and `self.target_patterns` can | ||
| # only be of length 1, and are actually the full key names - we also have only 1 single related tensor) | ||
| target_key = self.target_patterns[0] | ||
| collected_tensors = {target_key: self.collected_tensors[self.source_patterns[0]]} |
Not certain this should be done here either, API-wise.
ArthurZucker
left a comment
Much much much better but missing tests still
[For maintainers] Suggested jobs to run (before merge): run-slow: granite_speech, timm_wrapper
ArthurZucker
left a comment
Ty
What does this PR do?
As per the title! We can now reverse all the custom operations performed during weight loading when saving a checkpoint, and this is the default, for backward compatibility (BC)!
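To make the idea concrete, here is a minimal sketch of how pairing each loading op with a `reverse_op` class lets saving undo a chain of conversions. Only the `reverse_op` attribute name comes from the diff above; the `ConversionOp` base class, the `Chunk` and `Concatenate` ops, the `n` parameter, and the `apply_ops` / `revert_ops` helpers are hypothetical and operate on plain lists rather than tensors.

```python
class ConversionOp:
    """Hypothetical base class: each op names the class that undoes it."""
    reverse_op: type["ConversionOp"]

    def __init__(self, n):
        self.n = n  # number of chunks the op splits into / joins from

    def convert(self, value):
        raise NotImplementedError

    def __repr__(self):
        return f"{self.__class__.__name__}(n={self.n}, reverse_op={self.reverse_op.__name__})"


class Chunk(ConversionOp):
    def convert(self, value):
        # Split a flat list into n equal parts.
        size = len(value) // self.n
        return [value[i * size:(i + 1) * size] for i in range(self.n)]


class Concatenate(ConversionOp):
    def convert(self, value):
        # Join a list of lists into one flat list.
        return [item for part in value for item in part]


# Each op points at its inverse (assigned after both classes exist).
Chunk.reverse_op = Concatenate
Concatenate.reverse_op = Chunk


def apply_ops(ops, value):
    """Loading direction: apply the ops in order."""
    for op in ops:
        value = op.convert(value)
    return value


def revert_ops(ops, value):
    """Saving direction: apply each op's reverse, in reverse order."""
    for op in reversed(ops):
        value = op.reverse_op(op.n).convert(value)
    return value


ops = [Concatenate(2), Chunk(4)]
checkpoint_value = [[1, 2], [3, 4]]
loaded = apply_ops(ops, checkpoint_value)
restored = revert_ops(ops, loaded)
```

Walking the chain backwards matters: the last op applied at load time has to be the first one undone at save time, otherwise the intermediate shapes do not line up.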