Add Implicit Q-Learning (IQL) implementation #348
Open
hwilner wants to merge 2 commits into google-deepmind:master from
Implements the IQL offline RL algorithm with:
- Expectile regression for the value function
- TD learning for the Q-function
- Advantage-weighted regression for the policy
- Complete learner, builder, and networks
- Comprehensive documentation
- Example script for running IQL on D4RL datasets
- Unit tests for IQL components
- Follows the CQL example pattern

Addresses issue google-deepmind#329
This PR implements Implicit Q-Learning (IQL), an offline reinforcement learning algorithm, addressing issue #329.
Overview
IQL is designed for learning from fixed datasets without online interaction. Unlike many other offline RL methods, IQL never queries the Q-function at out-of-sample actions, which helps prevent value overestimation and mitigates distributional shift.
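The mechanism behind this property can be sketched in a few lines. The helper below is a hypothetical illustration (not the PR's actual code): IQL's TD target bootstraps from a learned state-value function V(s') rather than a max over Q(s', a'), so the Q-function is only ever evaluated at actions that appear in the dataset.

```python
import jax.numpy as jnp

# Hypothetical helper sketching the idea: the TD target uses the learned
# state value V(s') instead of max_a' Q(s', a'), so the Q-function is
# never evaluated at out-of-sample actions.
def td_target(reward, discount, next_v):
    return reward + discount * next_v

print(td_target(jnp.array(1.0), 0.99, jnp.array(2.0)))  # 1 + 0.99 * 2 = 2.98
```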
Implementation
This implementation includes:
- `run_iql_jax.py` for training on D4RL datasets
- `agent_test.py` for component verification

Algorithm Details
IQL uses three key components:
1. A value function trained with expectile regression, estimating an upper expectile of the Q-value distribution over dataset actions
2. A Q-function trained with TD learning, bootstrapping from the value function
3. A policy extracted via advantage-weighted regression
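To make the two distinctive losses concrete, here is a minimal sketch of the expectile objective and the advantage-weighted regression coefficient. Function names and default hyperparameters (`expectile=0.7`, `temperature=3.0`, clipping at 100) are illustrative assumptions, not necessarily what this PR uses:

```python
import jax.numpy as jnp

def expectile_loss(diff, expectile=0.7):
    # Asymmetric least squares: errors where Q exceeds V (diff > 0) are
    # weighted by `expectile`, the rest by (1 - expectile), so minimizing
    # this pushes V toward an upper expectile of the Q distribution.
    weight = jnp.where(diff > 0, expectile, 1.0 - expectile)
    return weight * diff ** 2

def awr_weight(q, v, temperature=3.0, max_weight=100.0):
    # Advantage-weighted regression coefficient exp((Q - V) / beta),
    # clipped for numerical stability; higher-advantage actions receive
    # larger weight in the policy's log-likelihood loss.
    return jnp.minimum(jnp.exp((q - v) / temperature), max_weight)

print(expectile_loss(jnp.array([1.0, -1.0])))  # asymmetric: 0.7 vs. 0.3
print(awr_weight(jnp.array(3.0), jnp.array(0.0)))  # exp(1) ~ 2.718
```

With `expectile=0.5` the first loss reduces to ordinary mean squared error; values above 0.5 bias V toward the best actions supported by the data.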
Code Quality
Testing
Unit tests verify the IQL learner, builder, and network components.
References
Kostrikov, I., Nair, A., & Levine, S. (2021). Offline Reinforcement Learning with Implicit Q-Learning. arXiv preprint arXiv:2110.06169. https://arxiv.org/abs/2110.06169
Fixes #329