Conversation
@MalyalaKarthik66 MalyalaKarthik66 commented Nov 16, 2025

Fix: #21793

This PR fixes an AttributeError in the Muon optimizer caused by the deprecated variable.path attribute. All references are now updated to use variable.name, ensuring compatibility with TensorFlow 2.16+.

  • Updated _should_use_adamw, _muon_update_step, and _adamw_update_step to reference variable.name.
  • Added regression test test_exclude_layers_with_variable_name in muon_test.py to ensure exclude_layers works correctly with current TF versions.
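For context, the failure mode and the fix can be sketched with a hypothetical stand-in for tf.Variable (the class and helper below are illustrative, not the Keras source):

```python
class FakeTFVariable:
    """Stand-in for tf.Variable in TF 2.16+: exposes .name but no .path."""

    def __init__(self, name):
        self.name = name


def should_use_adamw(variable, exclude_layers):
    # The old code read variable.path here, which raises AttributeError
    # on a plain tf.Variable; keying on variable.name avoids the crash.
    return any(keyword in variable.name for keyword in exclude_layers)


v = FakeTFVariable("dense/kernel:0")
print(should_use_adamw(v, ["dense"]))      # True
print(should_use_adamw(v, ["embedding"]))  # False
```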

@gemini-code-assist

Summary of Changes

Hello @MalyalaKarthik66, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical compatibility issue in the Muon optimizer by updating its internal variable referencing mechanism from a deprecated attribute to a current one. This change ensures the optimizer functions correctly with modern TensorFlow versions and includes a new test to validate the fix and prevent future regressions related to variable naming.

Highlights

  • Deprecated Variable Attribute Fix: Replaced all instances of the deprecated variable.path attribute with variable.name within the Muon optimizer to resolve an AttributeError and ensure compatibility with TensorFlow 2.16+.
  • Affected Functions: The update to variable.name was applied to _should_use_adamw, build, _muon_update_step, and _adamw_update_step methods of the Muon optimizer.
  • New Regression Test: A new regression test, test_exclude_layers_with_variable_name, was added to muon_test.py to verify that the exclude_layers functionality works correctly with the updated variable.name convention, specifically for the TensorFlow backend.

@gemini-code-assist bot left a comment

Code Review

This PR correctly fixes an AttributeError by replacing the deprecated variable.path with variable.name, ensuring compatibility with newer TensorFlow versions. The changes are applied consistently, and the addition of a regression test is a great way to prevent this issue from recurring. I've added a couple of suggestions to refactor some duplicated code into a helper method to improve maintainability. Overall, this is a solid fix.

Comment on lines +186 to +191
if variable.name not in self.adam_momentums:
    self.adam_momentums[variable.name] = (
        self.add_variable_from_reference(
            reference_variable=variable, name="momentum"
        )
    )

medium

To improve maintainability and reduce code duplication, you could extract this logic for lazily initializing the momentum variable into a helper method. This same logic is repeated in _adamw_update_step.

You could define a new private method like this:

def _maybe_init_momentum(self, variable):
    if variable.name not in self.adam_momentums:
        self.adam_momentums[variable.name] = (
            self.add_variable_from_reference(
                reference_variable=variable, name="momentum"
            )
        )

Then you can replace this block with a single call: self._maybe_init_momentum(variable).

        self._maybe_init_momentum(variable)

Comment on lines +210 to +215
if variable.name not in self.adam_momentums:
    self.adam_momentums[variable.name] = (
        self.add_variable_from_reference(
            reference_variable=variable, name="momentum"
        )
    )

medium

As mentioned in the comment for _muon_update_step, you can use the suggested _maybe_init_momentum helper method here as well to avoid duplicating the momentum initialization logic. This would make the code more concise and easier to maintain.

        self._maybe_init_momentum(variable)

codecov-commenter commented Nov 16, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.49%. Comparing base (edbf8f5) to head (a687b48).

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #21854      +/-   ##
==========================================
+ Coverage   82.47%   82.49%   +0.01%     
==========================================
  Files         577      577              
  Lines       59508    59514       +6     
  Branches     9332     9335       +3     
==========================================
+ Hits        49080    49096      +16     
+ Misses       8015     8009       -6     
+ Partials     2413     2409       -4     
Flag Coverage Δ
keras 82.31% <100.00%> (+0.01%) ⬆️
keras-jax 62.89% <36.36%> (-0.01%) ⬇️
keras-numpy 57.55% <36.36%> (-0.01%) ⬇️
keras-openvino 34.34% <0.00%> (-0.01%) ⬇️
keras-tensorflow 64.14% <100.00%> (+0.02%) ⬆️
keras-torch 63.60% <36.36%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

  for var in var_list:
      if not self._overwrite_variable_with_gradient(var):
-         self.adam_momentums[var.path] = (
+         self.adam_momentums[var.name] = (
Collaborator

These code paths are intended for Keras variables, for which we mean to use the path attribute, which is different from the name attribute.

@MalyalaKarthik66
Contributor Author

@fchollet
Updated the code as suggested, restoring var.path for Keras variables. Thanks for the review!


pass-lin commented Nov 18, 2025

There are issues with this fix:

  1. variable.name values are not unique. For two Dense layers, the weights are both named 'kernel' and 'bias'. If we only want to use Adam for one of the Dense layers, how can we distinguish between them?

  2. Similarly, due to the naming conflict issue:

    if variable.name not in self.adam_momentums:
        self.adam_momentums[variable.name] = (
            self.add_variable_from_reference(
                reference_variable=variable, name="momentum"
            )
        )
    if variable.name not in self.adam_velocities:
        self.adam_velocities[variable.name] = (
            self.add_variable_from_reference(
                reference_variable=variable, name="velocity"
            )
        )

    Multiple variables might end up using the same cache, which would cause errors.
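
The collision can be sketched without Keras (Var is a hypothetical stand-in, not the optimizer's actual code):

```python
class Var:
    """Hypothetical stand-in: .path is unique per weight, .name is not."""

    def __init__(self, path, name):
        self.path, self.name = path, name


kernel_a = Var("dense/kernel", "kernel")
kernel_b = Var("dense_1/kernel", "kernel")

momentums = {}
for v in (kernel_a, kernel_b):
    if v.name not in momentums:  # keyed on .name: the second lookup hits the cache
        momentums[v.name] = f"momentum slot for {v.path}"

print(len(momentums))  # 1 -- both kernels silently share one slot

# Keyed on .path instead, each weight gets its own slot:
momentums_by_path = {v.path: f"momentum slot for {v.path}" for v in (kernel_a, kernel_b)}
print(len(momentums_by_path))  # 2
```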

You can easily see the difference between the name and path properties in the following code.

import keras
model = keras.Sequential([
    keras.layers.Input(shape=(10,)),
    keras.layers.Dense(5), 
    keras.layers.Dense(10), 
    keras.layers.Dense(1, name="last")
])
for w in model.weights:
    print(w.path, w.name)

@pass-lin

I've provided an improved bug-fix version in #21859 that doesn't affect the current design of the Muon optimizer.



Development

Successfully merging this pull request may close these issues.

keras.optimizers.Muon Fails with AttributeError on variable.path in Keras 3 / TF 2.16-2.20

5 participants