fix: ensure CQL learner transitions from behavior cloning correctly #355
Open
natinew77-creator wants to merge 2 commits into google-deepmind:master from
This PR addresses issue google-deepmind#297 by updating the README to:
1. Add a prominent warning that the PyPI package may be outdated
2. Recommend installing from source as the primary method for running examples
3. Reorganize installation steps to emphasize source installation
4. Add a troubleshooting note for import errors after pip install
The PyPI package (dm-acme) was last updated in February 2022, while the GitHub repository has continued to evolve with new agents and features. This mismatch causes import errors when users try to run the examples after installing via pip.
Fixes google-deepmind#297
The CQLLearner.step() method hardcoded a check for the 'learner_steps' key, but the counter stores step counts under the key returned by get_steps_key(), which varies with the counter's prefix configuration. As a result, the behavior cloning phase never ended when the counter was initialized without the 'learner' prefix (e.g., in the run_cql_jax.py example). The fix uses self._counter.get_steps_key() to retrieve the correct key dynamically, ensuring a proper transition from BC to CQL training.
Problem
Fixes #305
The `CQLLearner.step()` method hardcodes a check for the `'learner_steps'` key when determining whether to continue in the behavior cloning phase.
However, `self._counter.increment(steps=1)` stores the count under the key determined by `get_steps_key()`, which is `'steps'` by default (or `'{prefix}_steps'` when a prefix is set). This mismatch causes the behavior cloning phase to never end.
Solution
Use `self._counter.get_steps_key()` to dynamically retrieve the correct key. This ensures proper transition from behavior cloning to CQL training regardless of how the counter is configured.
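The mismatch can be reproduced in isolation with a minimal sketch. `SketchCounter` is a hypothetical stand-in that only mimics the key naming described above (`'steps'` by default, `'{prefix}_steps'` with a prefix); it is not Acme's counter, and the two helper functions and `run` are illustrative names rather than code from `CQLLearner`.

```python
from typing import Dict, List, Tuple


class SketchCounter:
    """Hypothetical stand-in that mimics only the key naming described above."""

    def __init__(self, prefix: str = ''):
        self._prefix = prefix
        self._counts: Dict[str, int] = {}

    def get_steps_key(self) -> str:
        # 'steps' by default, '{prefix}_steps' when a prefix is set.
        return f'{self._prefix}_steps' if self._prefix else 'steps'

    def increment(self, **counts: int) -> Dict[str, int]:
        # Store each count under its (possibly prefixed) key and return a copy.
        for name, value in counts.items():
            key = f'{self._prefix}_{name}' if self._prefix else name
            self._counts[key] = self._counts.get(key, 0) + value
        return dict(self._counts)


def in_bc_phase_hardcoded(counts: Dict[str, int], num_bc_iters: int) -> bool:
    # Assumed shape of the buggy check: the key is fixed to 'learner_steps'.
    return counts.get('learner_steps', 0) < num_bc_iters


def in_bc_phase_dynamic(counter: SketchCounter, counts: Dict[str, int],
                        num_bc_iters: int) -> bool:
    # Assumed shape of the fix: ask the counter which key it actually uses.
    return counts.get(counter.get_steps_key(), 0) < num_bc_iters


def run(prefix: str, num_steps: int = 5,
        num_bc_iters: int = 3) -> Tuple[List[bool], List[bool]]:
    """Returns, per step, whether each check still reports the BC phase."""
    counter = SketchCounter(prefix)
    hardcoded, dynamic = [], []
    for _ in range(num_steps):
        counts = counter.increment(steps=1)
        hardcoded.append(in_bc_phase_hardcoded(counts, num_bc_iters))
        dynamic.append(in_bc_phase_dynamic(counter, counts, num_bc_iters))
    return hardcoded, dynamic


if __name__ == '__main__':
    print('no prefix:       ', run(''))
    print("prefix 'learner':", run('learner'))
```

With no prefix, the hardcoded check never sees the incremented count and reports the behavior cloning phase on every step, while the dynamic check leaves it once `num_bc_iters` is reached. With the `'learner'` prefix the two checks agree, which is why the bug only shows up under some counter configurations.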