Harl integration #3890
Conversation
Greptile Overview
Greptile Summary
This PR integrates the HARL (Heterogeneous-Agent Reinforcement Learning) library for multi-agent coordination tasks in IsaacLab. The implementation adds:
- HARL library dependency via a customized fork from DIRECTLab
- Training/inference scripts for HAPPO, HATRPO, HAA2C, MAPPO algorithms with adversarial training modes
- Three new environments: Single-agent HAPPO Anymal-C, two-agent bar carrying (2x Anymal-C), and heterogeneous push task (H1 + Anymal-C)
- Multi-agent coordination capabilities using the `DirectMARLEnv` base class with per-agent observation/action spaces (a minimal per-agent space sketch follows this list)
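For readers unfamiliar with the base class, here is a minimal sketch of the kind of per-agent space declarations a `DirectMARLEnvCfg`-style config carries; the agent names and dimensions are illustrative, not the PR's actual values:

```python
# Illustrative per-agent space declaration in the style of IsaacLab's multi-agent config classes.
# Names and sizes are made up for this sketch.
possible_agents = ["robot_0", "robot_1"]                 # two Anymal-C agents carrying the bar
action_spaces = {"robot_0": 12, "robot_1": 12}           # 12 joint targets per quadruped
observation_spaces = {"robot_0": 48, "robot_1": 48}      # per-agent proprioception + task terms
state_space = sum(observation_spaces.values())           # centralized state for the shared critic
```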
Key Issues Found:
- Duplicate copyright headers across all new Python files (non-critical style issue)
- In-place tensor mutation in `h1_anymal_push_env.py:514` that may cause gradient computation issues
- Incorrect docstrings copied from a template ("Ant locomotion" instead of actual environment names)
Confidence Score: 4/5
- This PR is mostly safe to merge with one logical issue requiring attention
- Score reflects well-structured multi-agent implementation following IsaacLab patterns, but docked one point for the tensor mutation bug that could cause training instability. The duplicate headers and docstring errors are style issues that don't affect functionality.
- Pay close attention to `h1_anymal_push_env.py:514` - the in-place tensor mutation needs to be fixed before merge to avoid potential gradient issues during training (see the sketch below)
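For context, a minimal sketch of the in-place pattern being flagged and the copy-based fix; the helper name and tensors are hypothetical, not the actual code at line 514:

```python
import torch

def box_to_goal_distance(box_pos: torch.Tensor, goal_pos: torch.Tensor) -> torch.Tensor:
    """Planar box-to-goal distance (hypothetical helper for illustration only)."""
    # Risky: writing into a tensor that other reward/observation terms (or autograd) still
    # reference silently changes their values as well.
    # box_pos[:, 2] = 0.0
    # Safer: mutate a copy so the shared buffer stays untouched.
    planar_box_pos = box_pos.clone()
    planar_box_pos[:, 2] = 0.0
    return torch.norm(planar_box_pos - goal_pos, dim=-1)
```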
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| scripts/reinforcement_learning/harl/train.py | 4/5 | New HARL training script with adversarial training support, has duplicate copyright headers |
| source/isaaclab_tasks/isaaclab_tasks/direct/anymal_c/anymal_c_happo_env.py | 4/5 | HAPPO-compatible single-agent Anymal-C environment, has duplicate copyright headers and incorrect docstring |
| source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/anymal_c_multi_agent_bar_env.py | 4/5 | Multi-agent bar carrying environment with 2 Anymal-C robots, has duplicate copyright headers |
| source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/h1_anymal_push_env.py | 3/5 | Heterogeneous multi-agent push environment with H1 and Anymal-C, has in-place tensor mutation issue and duplicate copyright headers |
Sequence Diagram
```mermaid
sequenceDiagram
participant User
participant TrainScript as train.py/play.py
participant HARL as HARL Library
participant Env as Multi-Agent Env
participant Robots as Robot Agents
participant Object as Rigid Objects
User->>TrainScript: Launch with task config
TrainScript->>TrainScript: Parse args & load config
TrainScript->>HARL: Initialize runner with algo
HARL->>Env: Create environment
Env->>Robots: Setup Anymal-C/H1 robots
Env->>Object: Setup bar/box objects
loop Training Episodes
HARL->>Env: Reset environment
Env->>Robots: Reset robot states
Env->>Object: Reset object states
loop Episode Steps
HARL->>Env: Get observations
Env->>Robots: Read joint/velocity data
Env-->>HARL: Return per-agent obs dict
HARL->>HARL: Compute actions per agent
HARL->>Env: Step(actions dict)
Env->>Robots: Apply joint targets/torques
Env->>Env: Physics simulation
Env->>Env: Compute rewards
Env-->>HARL: Return obs, rewards, dones
end
HARL->>HARL: Update policies (HAPPO/HATRPO/etc)
end
HARL-->>User: Trained models
```
Additional Comments (1)
- `source/isaaclab_tasks/isaaclab_tasks/direct/anymal_c/__init__.py`, lines 6-8 (syntax): Docstring says "Ant locomotion environment" but this is for Anymal-C environments
11 files reviewed, 10 comments
Review threads (now resolved):
- source/isaaclab_tasks/isaaclab_tasks/direct/anymal_c/anymal_c_happo_env.py (outdated)
- ...saaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/anymal_c_multi_agent_bar_env.py (outdated)
- source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/h1_anymal_push_env.py (outdated)
- source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/__init__.py (outdated)
- source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/agents/__init__.py (outdated)
- source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/h1_anymal_push_env.py (outdated)
- source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/__init__.py
…fixed incorrect in-place tensor modification
…Contribution into harl_integration
Greptile Overview
Greptile Summary
Integrates HARL (Heterogeneous-Agent Reinforcement Learning) library for multi-agent coordination in IsaacLab, adding support for algorithms like HAPPO, HATRPO, and MAPPO.
Key Changes:
- Added HARL dependency from DIRECTLab's customized fork to `setup.py`
- Implemented training and inference scripts (`train.py`, `play.py`) with Hydra configuration support
- Created three new environments:
  - Single-agent Anymal-C with HAPPO (for policy pre-training)
  - Two-agent Anymal-C bar carrying (homogeneous collaboration)
  - Heterogeneous H1 humanoid + Anymal-C box pushing
- Properly handles multi-agent observations, actions, and rewards through `DirectMARLEnv` (illustrated below)
- Includes HAPPO algorithm configuration with standard PPO hyperparameters
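As a rough illustration of the per-agent dictionaries exchanged through `DirectMARLEnv` (agent names, batch sizes, and dimensions below are placeholders, not the PR's actual values):

```python
import torch

# Illustrative shapes only -- in the PR these dictionaries are produced by the environment
# and consumed by the HARL actors; `num_envs` is the number of parallel simulated instances.
num_envs = 4096
agents = ["robot_0", "robot_1"]

observations = {name: torch.zeros(num_envs, 48) for name in agents}  # per-agent observation batch
actions = {name: torch.zeros(num_envs, 12) for name in agents}       # per-agent action batch
rewards = {name: torch.zeros(num_envs) for name in agents}           # per-agent reward batch
dones = {name: torch.zeros(num_envs, dtype=torch.bool) for name in agents}
# env.step(actions) returns dictionaries shaped like observations/rewards/dones above.
```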
Previous Issues Addressed:
- In-place tensor mutation fixed with `.clone()` in `h1_anymal_push_env.py:506`
Confidence Score: 5/5
- Safe to merge - well-structured multi-agent RL integration with proper environment design
- All previously identified issues have been resolved (duplicate headers, incorrect docstrings, in-place tensor mutations). Code follows IsaacLab patterns consistently, environments are properly structured with appropriate termination conditions and reward shaping, and integration with HARL library is clean through Hydra configs.
- No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| source/isaaclab_rl/setup.py | 5/5 | Added HARL dependency from DIRECTLab GitHub repository |
| scripts/reinforcement_learning/harl/train.py | 5/5 | Training script for HARL algorithms with video recording, adversarial training modes, and Hydra config integration |
| scripts/reinforcement_learning/harl/play.py | 5/5 | Inference script for trained HARL agents with RNN state handling and action batching |
| source/isaaclab_tasks/isaaclab_tasks/direct/anymal_c/anymal_c_happo_env.py | 5/5 | Single-agent Anymal-C environment adapted for HAPPO algorithm with velocity tracking rewards |
| source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/anymal_c_multi_agent_bar_env.py | 5/5 | Two-agent Anymal-C bar carrying environment with shared rewards and bar stability terminations |
| source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/h1_anymal_push_env.py | 5/5 | Heterogeneous multi-agent environment with H1 humanoid and Anymal-C pushing a box; includes the fix for the earlier in-place tensor mutation |
Sequence Diagram
```mermaid
sequenceDiagram
participant User
participant TrainScript as train.py
participant AppLauncher
participant HydraConfig as Hydra Config
participant Runner as HARL Runner
participant Env as DirectMARLEnv
participant Agents as HARL Agents
User->>TrainScript: Execute with task and algorithm args
TrainScript->>AppLauncher: Initialize simulation app
AppLauncher->>TrainScript: App ready
TrainScript->>HydraConfig: Load env_cfg and agent_cfg
HydraConfig->>TrainScript: Return configs
TrainScript->>TrainScript: Merge CLI args with agent_cfg
TrainScript->>Runner: Create RUNNER_REGISTRY[algorithm](args, algo_args, env_args)
Runner->>Env: Initialize DirectMARLEnv
Env->>Env: Setup scene with robots, sensors, objects
Env-->>Runner: Environment ready
Runner->>Agents: Initialize actor networks for each agent
Agents-->>Runner: Actors ready
loop Training Loop
Runner->>Env: Collect rollout data
Env->>Agents: Get actions from actors
Agents-->>Env: Return actions
Env->>Env: Step simulation
Env-->>Runner: Return obs, rewards, dones
Runner->>Runner: Store in buffer
Runner->>Agents: Update actor/critic networks
Agents-->>Runner: Updated
alt Video Interval
Runner->>Env: Record video
end
alt Save Interval
Runner->>Runner: Save model checkpoint
end
end
Runner->>Runner: Close and cleanup
TrainScript->>AppLauncher: Close simulation
```
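A condensed sketch of the runner-creation step shown in the diagram; the dictionary contents are heavily abbreviated placeholders, and the exact keys HARL expects may differ:

```python
# Mirrors the RUNNER_REGISTRY[algorithm](args, algo_args, env_args) call from the diagram.
from harl.runners import RUNNER_REGISTRY  # registry exposed by the HARL library

args = {"algo": "happo", "env": "isaaclab", "exp_name": "anymal_bar"}            # placeholder values
algo_args = {"train": {"n_rollout_threads": 1, "num_env_steps": 10_000_000},
             "model": {"hidden_sizes": [256, 256]}}                              # abbreviated config
env_args = {"task": "Isaac-Multi-Agent-Anymal-Bar-Direct-v0"}

runner = RUNNER_REGISTRY[args["algo"]](args, algo_args, env_args)
runner.run()    # rollout collection + HAPPO/HATRPO updates
runner.close()
```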
11 files reviewed, no comments
Greptile Overview
Greptile Summary
Integrates the HARL (Heterogeneous-Agent Reinforcement Learning) library for multi-agent reinforcement learning into IsaacLab. Adds a dependency on DIRECTLab's customized HARL library and implements two multi-agent coordination environments: Anymal-C bar carrying and heterogeneous H1-Anymal push tasks. Also includes a single-agent HAPPO variant of Anymal-C for easier policy transfer.
Key Changes:
- Added HARL dependency to `setup.py` via a git reference to DIRECTLab/HARL (a sketch of such a dependency entry follows this list)
- New training and evaluation scripts (`train.py`, `play.py`) with adversarial training modes
- Three new environments: Anymal-C HAPPO (single agent), Anymal-C bar carrying (2 agents), H1-Anymal push (heterogeneous agents)
- HAPPO configuration file with PPO hyperparameters and network architecture
- Proper use of the `DirectMARLEnv` base class with multi-agent observation/action spaces
- Visualization markers for debugging collaborative tasks
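To make the dependency mechanism concrete, a sketch of what a git-referenced extra can look like in `setup.py`; the extra's name and surrounding structure are assumptions, only the repository URL comes from this PR:

```python
from setuptools import setup

# Hypothetical extras_require entry pointing at the customized HARL fork.
EXTRAS_REQUIRE = {
    "harl": ["harl @ git+https://github.com/DIRECTLab/HARL.git"],
}

setup(
    name="isaaclab_rl",          # package name assumed from the file's location
    extras_require=EXTRAS_REQUIRE,
    # ... remaining metadata unchanged ...
)
```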
Issues Found:
- `CONTRIBUTORS.md` has duplicate content appended (lines 170-190) that must be removed
- Previous style comments about duplicate copyright headers were already addressed in an earlier review
Confidence Score: 4/5
- Safe to merge after fixing the duplicate content in CONTRIBUTORS.md
- Implementation is solid with proper multi-agent RL architecture, clean integration with the existing codebase, and appropriate reward shaping. The only blocking issue is the duplicate content in CONTRIBUTORS.md, which is trivial to fix. Code follows IsaacLab conventions, and the tensor modification issue mentioned in previous comments was already fixed using `.clone()`
- CONTRIBUTORS.md requires removal of duplicate content (lines 170-190)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| CONTRIBUTORS.md | 2/5 | Adds 3 contributors alphabetically but contains duplicate content (lines 170-190) that must be removed |
| source/isaaclab_rl/setup.py | 5/5 | Adds HARL library dependency from DIRECTLab GitHub repo, cleanly integrated with existing RL frameworks |
| scripts/reinforcement_learning/harl/train.py | 5/5 | New training script for HARL algorithms with comprehensive CLI arguments and adversarial training support |
| source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/anymal_c_multi_agent_bar_env.py | 5/5 | Two Anymal-C robots collaboratively carrying a bar with reward shaping for coordination and velocity tracking |
| source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/h1_anymal_push_env.py | 5/5 | Heterogeneous multi-agent environment with H1 humanoid and Anymal-C pushing objects collaboratively |
Sequence Diagram
```mermaid
sequenceDiagram
participant User
participant TrainScript as train.py
participant AppLauncher
participant HARLRunner as HARL Runner
participant DirectMARLEnv as Multi-Agent Env
participant Robot1 as Agent 1
participant Robot2 as Agent 2
participant Object as Shared Object
User->>TrainScript: Execute with task & algorithm
TrainScript->>AppLauncher: Launch Isaac Sim
AppLauncher-->>TrainScript: Simulation ready
TrainScript->>DirectMARLEnv: Create environment with config
DirectMARLEnv->>Robot1: Spawn robot_0 at position
DirectMARLEnv->>Robot2: Spawn robot_1 at position
DirectMARLEnv->>Object: Spawn shared bar/object
TrainScript->>HARLRunner: Initialize with algo config
loop Training Episodes
HARLRunner->>DirectMARLEnv: reset()
DirectMARLEnv-->>HARLRunner: Initial observations
loop Episode Steps
HARLRunner->>DirectMARLEnv: step(actions)
DirectMARLEnv->>Robot1: Apply actions[agent_0]
DirectMARLEnv->>Robot2: Apply actions[agent_1]
DirectMARLEnv->>DirectMARLEnv: Simulate physics
DirectMARLEnv->>DirectMARLEnv: Compute rewards (coordination)
DirectMARLEnv-->>HARLRunner: obs, rewards, dones, info
HARLRunner->>HARLRunner: Update policy (HAPPO/MAPPO)
end
HARLRunner->>HARLRunner: Log episode metrics
HARLRunner->>TrainScript: Save checkpoint (if interval)
end
TrainScript->>User: Training complete
```
Additional Comments (1)
- `CONTRIBUTORS.md`, lines 170-190 (logic): Duplicate content from lines 1-38 was accidentally appended to the end of the file
1 file reviewed, 1 comment
Greptile Overview
Greptile Summary
This PR integrates the HARL (Heterogeneous-Agent Reinforcement Learning) library into IsaacLab, enabling multi-agent reinforcement learning algorithms like HAPPO, HATRPO, and MAPPO. The integration adds two new multi-agent coordination environments (Anymal-C bar carrying and H1-Anymal heterogeneous push) along with a single-agent HAPPO environment for Anymal-C locomotion.
Key Changes:
- Adds HARL dependency from DIRECTLab's customized fork in `setup.py`
- Implements `DirectMARLEnv` subclasses for multi-agent coordination tasks
- Provides training and evaluation scripts (`train.py`, `play.py`) with adversarial training modes
- Includes comprehensive HAPPO algorithm configuration in YAML format
- Registers new Gym environments for multi-agent scenarios (a registration sketch follows this list)
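For reference, a sketch of the Gymnasium registration pattern IsaacLab's direct tasks typically use; the task ID is one registered by this PR (it also appears in a diagram further below), while the module path and class names are hypothetical:

```python
import gymnasium as gym

gym.register(
    id="Isaac-Multi-Agent-Anymal-Bar-Direct-v0",
    # Entry point and config entry point below are illustrative class names, not the PR's code.
    entry_point=f"{__name__}.anymal_c_multi_agent_bar_env:AnymalCMultiAgentBarEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": f"{__name__}.anymal_c_multi_agent_bar_env:AnymalCMultiAgentBarEnvCfg",
    },
)
```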
Previous Issues Addressed:
- Duplicate copyright headers in all new files (style issue)
- In-place tensor mutation fixed in `h1_anymal_push_env.py:514` with `.clone()`
- Incorrect docstring in `__init__.py` corrected from "Ant locomotion" to appropriate descriptions
Remaining Minor Issues:
- `setup.py:49` has a comma after an inline comment, which is valid Python but inconsistent with the style on line 48
Confidence Score: 5/5
- This PR is safe to merge with minimal risk
- The implementation is well-structured and follows IsaacLab's patterns for RL environments. All critical issues from previous reviews have been addressed, including the in-place tensor mutation and copyright headers. The code properly uses `.clone()` for tensor operations, correctly registers environments with Gymnasium, and integrates cleanly with the existing codebase. The only remaining issue is a minor style inconsistency in `setup.py` with comma placement after comments, which doesn't affect functionality.
- No files require special attention - all previously identified issues have been resolved
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| source/isaaclab_rl/setup.py | 4/5 | Adds HARL dependency from DIRECTLab's fork, minor syntax issue with comma placement after comment |
| source/isaaclab_tasks/isaaclab_tasks/direct/anymal_c/anymal_c_happo_env.py | 5/5 | Implements single-agent HAPPO environment for Anymal-C locomotion with velocity tracking |
| scripts/reinforcement_learning/harl/train.py | 5/5 | Training script for HARL algorithms with adversarial training modes and experiment management |
| scripts/reinforcement_learning/harl/play.py | 5/5 | Inference/evaluation script for trained HARL models with video recording support |
| source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/anymal_c_multi_agent_bar_env.py | 5/5 | Multi-agent environment where two Anymal-C robots cooperatively carry a bar to a target location |
| source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/h1_anymal_push_env.py | 4/5 | Heterogeneous multi-agent environment with H1 humanoid and Anymal-C pushing objects, has one in-place tensor mutation issue |
Sequence Diagram
```mermaid
sequenceDiagram
participant User
participant TrainScript as train.py
participant AppLauncher
participant Gym as gymnasium
participant Env as DirectMARLEnv
participant HARL as HARL Runner
participant Config as YAML Config
User->>TrainScript: Execute with task & algorithm params
TrainScript->>AppLauncher: Initialize Isaac Sim environment
AppLauncher-->>TrainScript: Simulation ready
TrainScript->>Gym: gym.make(task_name)
Gym->>Env: Create environment instance
Note over Env: Anymal-C HAPPO / Multi-Agent Bar / H1-Anymal Push
Env->>Env: Initialize robots, sensors, scene
Env-->>Gym: Environment ready
Gym-->>TrainScript: Environment handle
TrainScript->>Config: Load harl_happo_cfg.yaml
Config-->>TrainScript: Training hyperparameters
TrainScript->>HARL: Initialize runner with env & config
loop Training Episodes
HARL->>Env: reset()
Env-->>HARL: Initial observations per agent
loop Episode Steps
HARL->>HARL: Compute actions (HAPPO/HATRPO/etc)
HARL->>Env: step(actions_dict)
Env->>Env: Apply actions to agents
Env->>Env: Simulate physics
Env->>Env: Compute rewards & observations
Env-->>HARL: obs, rewards, dones, infos (per agent)
HARL->>HARL: Store transitions in buffer
end
HARL->>HARL: Update policy (PPO/TRPO epochs)
HARL->>TrainScript: Log metrics
end
TrainScript->>TrainScript: Save trained models
TrainScript-->>User: Training complete
```
1 file reviewed, no comments
hey!! @Isaacwilliam4 this is really cool! I will try to run it and review before next week! Thanks a lot for the PR
Greptile Overview
Greptile Summary
This PR integrates HARL (Heterogeneous-Agent Reinforcement Learning) multi-agent coordination capabilities into IsaacLab. It adds two new multi-agent coordination environments (Anymal-C bar carrying and H1-Anymal push) and a HAPPO-compatible version of the Anymal-C walking environment.
Key Changes:
- Added HARL library dependency from DIRECTLab GitHub repository
- Implemented three new environments for multi-agent collaboration
- Created training and inference scripts for HARL algorithms
- Added three new contributors to CONTRIBUTORS.md
Issues Already Flagged:
- Duplicate copyright headers across multiple new files (style)
- In-place tensor mutation in `h1_anymal_push_env.py:514` that may affect gradient computation (logic)
- Incorrect docstring in the multi-agent coordination `__init__.py` (syntax)
The integration follows IsaacLab patterns and properly registers new environments. The main concern is the in-place tensor mutation issue which has been flagged as a logic error.
Confidence Score: 4/5
- This PR is safe to merge with minor issues that should be addressed
- The integration is well-structured and follows IsaacLab conventions. Most issues are non-critical style concerns (duplicate copyright headers). The one logic issue with in-place tensor mutation has been flagged and should be fixed. The dependency addition is properly configured as an optional install.
- Pay attention to `h1_anymal_push_env.py` for the in-place tensor mutation fix. Other files mainly need copyright header cleanup.
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| CONTRIBUTORS.md | 5/5 | Added three new contributors in correct alphabetical order, removed trailing newline |
| source/isaaclab_rl/setup.py | 5/5 | Added HARL dependency from DIRECTLab GitHub repository as optional install |
| source/isaaclab_tasks/isaaclab_tasks/direct/anymal_c/anymal_c_happo_env.py | 4/5 | New HAPPO environment implementation for Anymal-C, includes duplicate copyright headers |
| source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/anymal_c_multi_agent_bar_env.py | 4/5 | Multi-agent bar carrying environment implementation, has duplicate copyright headers |
| source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/h1_anymal_push_env.py | 3/5 | H1-Anymal push environment with duplicate headers and in-place tensor mutation issue |
Sequence Diagram
```mermaid
sequenceDiagram
participant User
participant TrainScript as train.py
participant GymRegistry as Gym Registry
participant HappoEnv as HAPPO/MultiAgent Env
participant HARLLib as HARL Library
participant IsaacSim as Isaac Sim
User->>TrainScript: Execute training command
TrainScript->>GymRegistry: Register environments
GymRegistry->>GymRegistry: Isaac-Velocity-Flat-Anymal-C-Happo-Direct-v0
GymRegistry->>GymRegistry: Isaac-Multi-Agent-Anymal-Bar-Direct-v0
GymRegistry->>GymRegistry: Isaac-Multi-Agent-H1-Anymal-Push-Direct-v0
TrainScript->>GymRegistry: gym.make(env_id)
GymRegistry->>HappoEnv: Create environment instance
HappoEnv->>IsaacSim: Initialize simulation
IsaacSim-->>HappoEnv: Simulation ready
HappoEnv-->>TrainScript: Environment ready
TrainScript->>HARLLib: Initialize HAPPO agent
HARLLib-->>TrainScript: Agent ready
loop Training Episodes
TrainScript->>HappoEnv: reset()
HappoEnv->>IsaacSim: Reset scene
IsaacSim-->>HappoEnv: Initial observations
HappoEnv-->>TrainScript: Multi-agent observations
loop Episode Steps
TrainScript->>HARLLib: Get actions for all agents
HARLLib-->>TrainScript: Multi-agent actions
TrainScript->>HappoEnv: step(actions)
HappoEnv->>IsaacSim: Apply actions & simulate
IsaacSim-->>HappoEnv: State updates
HappoEnv->>HappoEnv: Calculate rewards & observations
HappoEnv-->>TrainScript: obs, rewards, dones, info
TrainScript->>HARLLib: Store experience & update
end
end
TrainScript->>User: Training complete
```
1 file reviewed, no comments
The work looks good to me from a high level, I'll get back to you once I've heard a resolution!
Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context.
List any dependencies that are required for this change.
The one dependency added is our customized HARL library for IsaacLab, found here -> https://github.com/DIRECTLab/HARL. This pull request updates setup.py to include it.
This is the integration of the HARL algorithms for multi-agent collaboration as outlined in the discussion post here -> #2418 (comment). It includes the two multi-agent coordination environments we show in the paper, along with a HAPPO version of the Anymal-C walking env for easier integration of the Anymal-C walking policy with the multi-agent Anymal-C bar-carrying env.
Type of change
Screenshots
Please attach before and after screenshots of the change if applicable.

Checklist
- I have run the `pre-commit` checks with `./isaaclab.sh --format`
- I have updated the extension's `config/extension.toml` file
- I have added my name to `CONTRIBUTORS.md` or my name already exists there