-
Notifications
You must be signed in to change notification settings - Fork 19.6k
Fix Muon optimizer to use variable.name instead of deprecated variable.path #21854
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Fix Muon optimizer to use variable.name instead of deprecated variable.path #21854
Conversation
Summary of ChangesHello @MalyalaKarthik66, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical compatibility issue in the Muon optimizer by updating its internal variable referencing mechanism from a deprecated attribute to a current one. This change ensures the optimizer functions correctly with modern TensorFlow versions and includes a new test to validate the fix and prevent future regressions related to variable naming. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This PR correctly fixes an AttributeError by replacing the deprecated variable.path with variable.name, ensuring compatibility with newer TensorFlow versions. The changes are applied consistently, and the addition of a regression test is a great way to prevent this issue from recurring. I've added a couple of suggestions to refactor some duplicated code into a helper method to improve maintainability. Overall, this is a solid fix.
| if variable.name not in self.adam_momentums: | ||
| self.adam_momentums[variable.name] = ( | ||
| self.add_variable_from_reference( | ||
| reference_variable=variable, name="momentum" | ||
| ) | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To improve maintainability and reduce code duplication, you could extract this logic for lazily initializing the momentum variable into a helper method. This same logic is repeated in _adamw_update_step.
You could define a new private method like this:
def _maybe_init_momentum(self, variable):
if variable.name not in self.adam_momentums:
self.adam_momentums[variable.name] = (
self.add_variable_from_reference(
reference_variable=variable, name="momentum"
)
)Then you can replace this block with a single call: self._maybe_init_momentum(variable).
self._maybe_init_momentum(variable)| if variable.name not in self.adam_momentums: | ||
| self.adam_momentums[variable.name] = ( | ||
| self.add_variable_from_reference( | ||
| reference_variable=variable, name="momentum" | ||
| ) | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #21854 +/- ##
==========================================
+ Coverage 82.47% 82.49% +0.01%
==========================================
Files 577 577
Lines 59508 59514 +6
Branches 9332 9335 +3
==========================================
+ Hits 49080 49096 +16
+ Misses 8015 8009 -6
+ Partials 2413 2409 -4
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
keras/src/optimizers/muon.py
Outdated
| for var in var_list: | ||
| if not self._overwrite_variable_with_gradient(var): | ||
| self.adam_momentums[var.path] = ( | ||
| self.adam_momentums[var.name] = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These code paths are intended for Keras variables, for which we mean to use the path attribute, which is different from the name attribute.
|
@fchollet |
|
There are issues with this fix:
You can easily see the difference between the import keras
model = keras.Sequential([
keras.layers.Input(shape=(10,)),
keras.layers.Dense(5),
keras.layers.Dense(10),
keras.layers.Dense(1, name="last")
])
for w in model.weights:
print(w.path,w.name) |
|
I've provided an improved bug-fix version that doesn't affect the current design of the Muon optimizer. |
Fix: #21793
This PR fixes an AttributeError in the Muon optimizer caused by the deprecated
variable.pathattribute. All references are now updated to usevariable.name, ensuring compatibility with TensorFlow 2.16+._should_use_adamw,_muon_update_step, and_adamw_update_stepto referencevariable.name.test_exclude_layers_with_variable_namein muon_test.py to ensureexclude_layersworks correctly with current TF versions.