Harl integration #3890
Conversation
Greptile Overview
Greptile Summary
This PR integrates the HARL (Heterogeneous-Agent Reinforcement Learning) library for multi-agent coordination tasks in IsaacLab. The implementation adds:
- HARL library dependency via a customized fork from DIRECTLab
- Training/inference scripts for HAPPO, HATRPO, HAA2C, MAPPO algorithms with adversarial training modes
- Three new environments: Single-agent HAPPO Anymal-C, two-agent bar carrying (2x Anymal-C), and heterogeneous push task (H1 + Anymal-C)
- Multi-agent coordination capabilities using the `DirectMARLEnv` base class with per-agent observation/action spaces (a minimal per-agent space sketch follows this list)
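For readers unfamiliar with the base class, here is a minimal sketch of the kind of per-agent space declarations a `DirectMARLEnvCfg`-style config carries; the agent names and dimensions are illustrative, not the PR's actual values:

```python
# Illustrative per-agent space declaration in the style of IsaacLab's multi-agent config classes.
# Names and sizes are made up for this sketch.
possible_agents = ["robot_0", "robot_1"]                 # two Anymal-C agents carrying the bar
action_spaces = {"robot_0": 12, "robot_1": 12}           # 12 joint targets per quadruped
observation_spaces = {"robot_0": 48, "robot_1": 48}      # per-agent proprioception + task terms
state_space = sum(observation_spaces.values())           # centralized state for the shared critic
```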
Key Issues Found:
- Duplicate copyright headers across all new Python files (non-critical style issue)
- In-place tensor mutation in `h1_anymal_push_env.py:514` that may cause gradient computation issues
- Incorrect docstrings copied from a template ("Ant locomotion" instead of actual environment names)
Confidence Score: 4/5
- This PR is mostly safe to merge with one logical issue requiring attention
- Score reflects well-structured multi-agent implementation following IsaacLab patterns, but docked one point for the tensor mutation bug that could cause training instability. The duplicate headers and docstring errors are style issues that don't affect functionality.
- Pay close attention to `h1_anymal_push_env.py:514` - the in-place tensor mutation needs to be fixed before merge to avoid potential gradient issues during training (see the sketch below)
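For context, a minimal sketch of the in-place pattern being flagged and the copy-based fix; the helper name and tensors are hypothetical, not the actual code at line 514:

```python
import torch

def box_to_goal_distance(box_pos: torch.Tensor, goal_pos: torch.Tensor) -> torch.Tensor:
    """Planar box-to-goal distance (hypothetical helper for illustration only)."""
    # Risky: writing into a tensor that other reward/observation terms (or autograd) still
    # reference silently changes their values as well.
    # box_pos[:, 2] = 0.0
    # Safer: mutate a copy so the shared buffer stays untouched.
    planar_box_pos = box_pos.clone()
    planar_box_pos[:, 2] = 0.0
    return torch.norm(planar_box_pos - goal_pos, dim=-1)
```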
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| scripts/reinforcement_learning/harl/train.py | 4/5 | New HARL training script with adversarial training support, has duplicate copyright headers |
| source/isaaclab_tasks/isaaclab_tasks/direct/anymal_c/anymal_c_happo_env.py | 4/5 | HAPPO-compatible single-agent Anymal-C environment, has duplicate copyright headers and incorrect docstring |
| source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/anymal_c_multi_agent_bar_env.py | 4/5 | Multi-agent bar carrying environment with 2 Anymal-C robots, has duplicate copyright headers |
| source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/h1_anymal_push_env.py | 3/5 | Heterogeneous multi-agent push environment with H1 and Anymal-C, has in-place tensor mutation issue and duplicate copyright headers |
Sequence Diagram
```mermaid
sequenceDiagram
participant User
participant TrainScript as train.py/play.py
participant HARL as HARL Library
participant Env as Multi-Agent Env
participant Robots as Robot Agents
participant Object as Rigid Objects
User->>TrainScript: Launch with task config
TrainScript->>TrainScript: Parse args & load config
TrainScript->>HARL: Initialize runner with algo
HARL->>Env: Create environment
Env->>Robots: Setup Anymal-C/H1 robots
Env->>Object: Setup bar/box objects
loop Training Episodes
HARL->>Env: Reset environment
Env->>Robots: Reset robot states
Env->>Object: Reset object states
loop Episode Steps
HARL->>Env: Get observations
Env->>Robots: Read joint/velocity data
Env-->>HARL: Return per-agent obs dict
HARL->>HARL: Compute actions per agent
HARL->>Env: Step(actions dict)
Env->>Robots: Apply joint targets/torques
Env->>Env: Physics simulation
Env->>Env: Compute rewards
Env-->>HARL: Return obs, rewards, dones
end
HARL->>HARL: Update policies (HAPPO/HATRPO/etc)
end
HARL-->>User: Trained models
```
Additional Comments (1)
- `source/isaaclab_tasks/isaaclab_tasks/direct/anymal_c/__init__.py`, lines 6-8 (syntax): Docstring says "Ant locomotion environment" but this is for Anymal-C environments
11 files reviewed, 10 comments
Review threads (now resolved):
- source/isaaclab_tasks/isaaclab_tasks/direct/anymal_c/anymal_c_happo_env.py (outdated)
- ...saaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/anymal_c_multi_agent_bar_env.py (outdated)
- source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/h1_anymal_push_env.py (outdated)
- source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/__init__.py (outdated)
- source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/agents/__init__.py (outdated)
- source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/h1_anymal_push_env.py (outdated)
- source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/__init__.py
…fixed incorrect in-place tensor modification
…Contribution into harl_integration
Greptile Overview
Greptile Summary
Integrates HARL (Heterogeneous-Agent Reinforcement Learning) library for multi-agent coordination in IsaacLab, adding support for algorithms like HAPPO, HATRPO, and MAPPO.
Key Changes:
- Added HARL dependency from DIRECTLab's customized fork to `setup.py`
- Implemented training and inference scripts (`train.py`, `play.py`) with Hydra configuration support
- Created three new environments:
  - Single-agent Anymal-C with HAPPO (for policy pre-training)
  - Two-agent Anymal-C bar carrying (homogeneous collaboration)
  - Heterogeneous H1 humanoid + Anymal-C box pushing
- Properly handles multi-agent observations, actions, and rewards through `DirectMARLEnv` (illustrated below)
- Includes HAPPO algorithm configuration with standard PPO hyperparameters
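As a rough illustration of the per-agent dictionaries exchanged through `DirectMARLEnv` (agent names, batch sizes, and dimensions below are placeholders, not the PR's actual values):

```python
import torch

# Illustrative shapes only -- in the PR these dictionaries are produced by the environment
# and consumed by the HARL actors; `num_envs` is the number of parallel simulated instances.
num_envs = 4096
agents = ["robot_0", "robot_1"]

observations = {name: torch.zeros(num_envs, 48) for name in agents}  # per-agent observation batch
actions = {name: torch.zeros(num_envs, 12) for name in agents}       # per-agent action batch
rewards = {name: torch.zeros(num_envs) for name in agents}           # per-agent reward batch
dones = {name: torch.zeros(num_envs, dtype=torch.bool) for name in agents}
# env.step(actions) returns dictionaries shaped like observations/rewards/dones above.
```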
Previous Issues Addressed:
- In-place tensor mutation fixed with `.clone()` in `h1_anymal_push_env.py:506`
Confidence Score: 5/5
- Safe to merge - well-structured multi-agent RL integration with proper environment design
- All previously identified issues have been resolved (duplicate headers, incorrect docstrings, in-place tensor mutations). Code follows IsaacLab patterns consistently, environments are properly structured with appropriate termination conditions and reward shaping, and integration with HARL library is clean through Hydra configs.
- No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| source/isaaclab_rl/setup.py | 5/5 | Added HARL dependency from DIRECTLab GitHub repository |
| scripts/reinforcement_learning/harl/train.py | 5/5 | Training script for HARL algorithms with video recording, adversarial training modes, and Hydra config integration |
| scripts/reinforcement_learning/harl/play.py | 5/5 | Inference script for trained HARL agents with RNN state handling and action batching |
| source/isaaclab_tasks/isaaclab_tasks/direct/anymal_c/anymal_c_happo_env.py | 5/5 | Single-agent Anymal-C environment adapted for HAPPO algorithm with velocity tracking rewards |
| source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/anymal_c_multi_agent_bar_env.py | 5/5 | Two-agent Anymal-C bar carrying environment with shared rewards and bar stability terminations |
| source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/h1_anymal_push_env.py | 5/5 | Heterogeneous multi-agent environment with H1 humanoid and Anymal-C pushing a box; includes the fix for the earlier in-place tensor mutation |
Sequence Diagram
```mermaid
sequenceDiagram
participant User
participant TrainScript as train.py
participant AppLauncher
participant HydraConfig as Hydra Config
participant Runner as HARL Runner
participant Env as DirectMARLEnv
participant Agents as HARL Agents
User->>TrainScript: Execute with task and algorithm args
TrainScript->>AppLauncher: Initialize simulation app
AppLauncher->>TrainScript: App ready
TrainScript->>HydraConfig: Load env_cfg and agent_cfg
HydraConfig->>TrainScript: Return configs
TrainScript->>TrainScript: Merge CLI args with agent_cfg
TrainScript->>Runner: Create RUNNER_REGISTRY[algorithm](args, algo_args, env_args)
Runner->>Env: Initialize DirectMARLEnv
Env->>Env: Setup scene with robots, sensors, objects
Env-->>Runner: Environment ready
Runner->>Agents: Initialize actor networks for each agent
Agents-->>Runner: Actors ready
loop Training Loop
Runner->>Env: Collect rollout data
Env->>Agents: Get actions from actors
Agents-->>Env: Return actions
Env->>Env: Step simulation
Env-->>Runner: Return obs, rewards, dones
Runner->>Runner: Store in buffer
Runner->>Agents: Update actor/critic networks
Agents-->>Runner: Updated
alt Video Interval
Runner->>Env: Record video
end
alt Save Interval
Runner->>Runner: Save model checkpoint
end
end
Runner->>Runner: Close and cleanup
TrainScript->>AppLauncher: Close simulation
```
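A condensed sketch of the runner-creation step shown in the diagram; the dictionary contents are heavily abbreviated placeholders, and the exact keys HARL expects may differ:

```python
# Mirrors the RUNNER_REGISTRY[algorithm](args, algo_args, env_args) call from the diagram.
from harl.runners import RUNNER_REGISTRY  # registry exposed by the HARL library

args = {"algo": "happo", "env": "isaaclab", "exp_name": "anymal_bar"}            # placeholder values
algo_args = {"train": {"n_rollout_threads": 1, "num_env_steps": 10_000_000},
             "model": {"hidden_sizes": [256, 256]}}                              # abbreviated config
env_args = {"task": "Isaac-Multi-Agent-Anymal-Bar-Direct-v0"}

runner = RUNNER_REGISTRY[args["algo"]](args, algo_args, env_args)
runner.run()    # rollout collection + HAPPO/HATRPO updates
runner.close()
```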
11 files reviewed, no comments
Greptile Overview
Greptile Summary
Integrates the HARL (Heterogeneous-Agent Reinforcement Learning) library for multi-agent reinforcement learning into IsaacLab. Adds a dependency on DIRECTLab's customized HARL library and implements two multi-agent coordination environments: Anymal-C bar carrying and heterogeneous H1-Anymal push tasks. Also includes a single-agent HAPPO variant of Anymal-C for easier policy transfer.
Key Changes:
- Added HARL dependency to `setup.py` via a git reference to DIRECTLab/HARL (a sketch of such a dependency entry follows this list)
- New training and evaluation scripts (`train.py`, `play.py`) with adversarial training modes
- Three new environments: Anymal-C HAPPO (single agent), Anymal-C bar carrying (2 agents), H1-Anymal push (heterogeneous agents)
- HAPPO configuration file with PPO hyperparameters and network architecture
- Proper use of the `DirectMARLEnv` base class with multi-agent observation/action spaces
- Visualization markers for debugging collaborative tasks
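To make the dependency mechanism concrete, a sketch of what a git-referenced extra can look like in `setup.py`; the extra's name and surrounding structure are assumptions, only the repository URL comes from this PR:

```python
from setuptools import setup

# Hypothetical extras_require entry pointing at the customized HARL fork.
EXTRAS_REQUIRE = {
    "harl": ["harl @ git+https://github.com/DIRECTLab/HARL.git"],
}

setup(
    name="isaaclab_rl",          # package name assumed from the file's location
    extras_require=EXTRAS_REQUIRE,
    # ... remaining metadata unchanged ...
)
```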
Issues Found:
- `CONTRIBUTORS.md` has duplicate content appended (lines 170-190) that must be removed
- Previous style comments about duplicate copyright headers were already addressed in an earlier review
Confidence Score: 4/5
- Safe to merge after fixing the duplicate content in CONTRIBUTORS.md
- Implementation is solid with proper multi-agent RL architecture, clean integration with the existing codebase, and appropriate reward shaping. The only blocking issue is the duplicate content in CONTRIBUTORS.md, which is trivial to fix. Code follows IsaacLab conventions, and the tensor modification issue mentioned in previous comments was already fixed using `.clone()`
- CONTRIBUTORS.md requires removal of duplicate content (lines 170-190)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| CONTRIBUTORS.md | 2/5 | Adds 3 contributors alphabetically but contains duplicate content (lines 170-190) that must be removed |
| source/isaaclab_rl/setup.py | 5/5 | Adds HARL library dependency from DIRECTLab GitHub repo, cleanly integrated with existing RL frameworks |
| scripts/reinforcement_learning/harl/train.py | 5/5 | New training script for HARL algorithms with comprehensive CLI arguments and adversarial training support |
| source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/anymal_c_multi_agent_bar_env.py | 5/5 | Two Anymal-C robots collaboratively carrying a bar with reward shaping for coordination and velocity tracking |
| source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/h1_anymal_push_env.py | 5/5 | Heterogeneous multi-agent environment with H1 humanoid and Anymal-C pushing objects collaboratively |
Sequence Diagram
```mermaid
sequenceDiagram
participant User
participant TrainScript as train.py
participant AppLauncher
participant HARLRunner as HARL Runner
participant DirectMARLEnv as Multi-Agent Env
participant Robot1 as Agent 1
participant Robot2 as Agent 2
participant Object as Shared Object
User->>TrainScript: Execute with task & algorithm
TrainScript->>AppLauncher: Launch Isaac Sim
AppLauncher-->>TrainScript: Simulation ready
TrainScript->>DirectMARLEnv: Create environment with config
DirectMARLEnv->>Robot1: Spawn robot_0 at position
DirectMARLEnv->>Robot2: Spawn robot_1 at position
DirectMARLEnv->>Object: Spawn shared bar/object
TrainScript->>HARLRunner: Initialize with algo config
loop Training Episodes
HARLRunner->>DirectMARLEnv: reset()
DirectMARLEnv-->>HARLRunner: Initial observations
loop Episode Steps
HARLRunner->>DirectMARLEnv: step(actions)
DirectMARLEnv->>Robot1: Apply actions[agent_0]
DirectMARLEnv->>Robot2: Apply actions[agent_1]
DirectMARLEnv->>DirectMARLEnv: Simulate physics
DirectMARLEnv->>DirectMARLEnv: Compute rewards (coordination)
DirectMARLEnv-->>HARLRunner: obs, rewards, dones, info
HARLRunner->>HARLRunner: Update policy (HAPPO/MAPPO)
end
HARLRunner->>HARLRunner: Log episode metrics
HARLRunner->>TrainScript: Save checkpoint (if interval)
end
TrainScript->>User: Training complete
```
Additional Comments (1)
- `CONTRIBUTORS.md`, lines 170-190 (logic): Duplicate content from lines 1-38 was accidentally appended to the end of the file
1 file reviewed, 1 comment
Greptile Overview
Greptile Summary
This PR integrates the HARL (Heterogeneous-Agent Reinforcement Learning) library into IsaacLab, enabling multi-agent reinforcement learning algorithms like HAPPO, HATRPO, and MAPPO. The integration adds two new multi-agent coordination environments (Anymal-C bar carrying and H1-Anymal heterogeneous push) along with a single-agent HAPPO environment for Anymal-C locomotion.
Key Changes:
- Adds HARL dependency from DIRECTLab's customized fork in `setup.py`
- Implements `DirectMARLEnv` subclasses for multi-agent coordination tasks
- Provides training and evaluation scripts (`train.py`, `play.py`) with adversarial training modes
- Includes comprehensive HAPPO algorithm configuration in YAML format
- Registers new Gym environments for multi-agent scenarios (a registration sketch follows this list)
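For reference, a sketch of the Gymnasium registration pattern IsaacLab's direct tasks typically use; the task ID is one registered by this PR (it also appears in a diagram further below), while the module path and class names are hypothetical:

```python
import gymnasium as gym

gym.register(
    id="Isaac-Multi-Agent-Anymal-Bar-Direct-v0",
    # Entry point and config entry point below are illustrative class names, not the PR's code.
    entry_point=f"{__name__}.anymal_c_multi_agent_bar_env:AnymalCMultiAgentBarEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": f"{__name__}.anymal_c_multi_agent_bar_env:AnymalCMultiAgentBarEnvCfg",
    },
)
```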
Previous Issues Addressed:
- Duplicate copyright headers in all new files (style issue)
- In-place tensor mutation fixed in `h1_anymal_push_env.py:514` with `.clone()`
- Incorrect docstring in `__init__.py` corrected from "Ant locomotion" to appropriate descriptions
Remaining Minor Issues:
- `setup.py:49` has a comma after an inline comment, which is valid Python but inconsistent with the style on line 48
Confidence Score: 5/5
- This PR is safe to merge with minimal risk
- The implementation is well-structured and follows IsaacLab's patterns for RL environments. All critical issues from previous reviews have been addressed, including the in-place tensor mutation and copyright headers. The code properly uses `.clone()` for tensor operations, correctly registers environments with Gymnasium, and integrates cleanly with the existing codebase. The only remaining issue is a minor style inconsistency in `setup.py` with comma placement after comments, which doesn't affect functionality.
- No files require special attention - all previously identified issues have been resolved
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| source/isaaclab_rl/setup.py | 4/5 | Adds HARL dependency from DIRECTLab's fork, minor syntax issue with comma placement after comment |
| source/isaaclab_tasks/isaaclab_tasks/direct/anymal_c/anymal_c_happo_env.py | 5/5 | Implements single-agent HAPPO environment for Anymal-C locomotion with velocity tracking |
| scripts/reinforcement_learning/harl/train.py | 5/5 | Training script for HARL algorithms with adversarial training modes and experiment management |
| scripts/reinforcement_learning/harl/play.py | 5/5 | Inference/evaluation script for trained HARL models with video recording support |
| source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/anymal_c_multi_agent_bar_env.py | 5/5 | Multi-agent environment where two Anymal-C robots cooperatively carry a bar to a target location |
| source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/h1_anymal_push_env.py | 4/5 | Heterogeneous multi-agent environment with H1 humanoid and Anymal-C pushing objects, has one in-place tensor mutation issue |
Sequence Diagram
```mermaid
sequenceDiagram
participant User
participant TrainScript as train.py
participant AppLauncher
participant Gym as gymnasium
participant Env as DirectMARLEnv
participant HARL as HARL Runner
participant Config as YAML Config
User->>TrainScript: Execute with task & algorithm params
TrainScript->>AppLauncher: Initialize Isaac Sim environment
AppLauncher-->>TrainScript: Simulation ready
TrainScript->>Gym: gym.make(task_name)
Gym->>Env: Create environment instance
Note over Env: Anymal-C HAPPO / Multi-Agent Bar / H1-Anymal Push
Env->>Env: Initialize robots, sensors, scene
Env-->>Gym: Environment ready
Gym-->>TrainScript: Environment handle
TrainScript->>Config: Load harl_happo_cfg.yaml
Config-->>TrainScript: Training hyperparameters
TrainScript->>HARL: Initialize runner with env & config
loop Training Episodes
HARL->>Env: reset()
Env-->>HARL: Initial observations per agent
loop Episode Steps
HARL->>HARL: Compute actions (HAPPO/HATRPO/etc)
HARL->>Env: step(actions_dict)
Env->>Env: Apply actions to agents
Env->>Env: Simulate physics
Env->>Env: Compute rewards & observations
Env-->>HARL: obs, rewards, dones, infos (per agent)
HARL->>HARL: Store transitions in buffer
end
HARL->>HARL: Update policy (PPO/TRPO epochs)
HARL->>TrainScript: Log metrics
end
TrainScript->>TrainScript: Save trained models
TrainScript-->>User: Training complete
```
1 file reviewed, no comments
hey!! @Isaacwilliam4 this is really cool! I will try to run it and review before next week! Thanks a lot for the PR
Greptile Overview
Greptile Summary
This PR integrates HARL (Heterogeneous-Agent Reinforcement Learning) multi-agent coordination capabilities into IsaacLab. It adds two new multi-agent coordination environments (Anymal-C bar carrying and H1-Anymal push) and a HAPPO-compatible version of the Anymal-C walking environment.
Key Changes:
- Added HARL library dependency from DIRECTLab GitHub repository
- Implemented three new environments for multi-agent collaboration
- Created training and inference scripts for HARL algorithms
- Added three new contributors to CONTRIBUTORS.md
Issues Already Flagged:
- Duplicate copyright headers across multiple new files (style)
- In-place tensor mutation in `h1_anymal_push_env.py:514` that may affect gradient computation (logic)
- Incorrect docstring in the multi-agent coordination `__init__.py` (syntax)
The integration follows IsaacLab patterns and properly registers new environments. The main concern is the in-place tensor mutation issue which has been flagged as a logic error.
Confidence Score: 4/5
- This PR is safe to merge with minor issues that should be addressed
- The integration is well-structured and follows IsaacLab conventions. Most issues are non-critical style concerns (duplicate copyright headers). The one logic issue with in-place tensor mutation has been flagged and should be fixed. The dependency addition is properly configured as an optional install.
- Pay attention to `h1_anymal_push_env.py` for the in-place tensor mutation fix. Other files mainly need copyright header cleanup.
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| CONTRIBUTORS.md | 5/5 | Added three new contributors in correct alphabetical order, removed trailing newline |
| source/isaaclab_rl/setup.py | 5/5 | Added HARL dependency from DIRECTLab GitHub repository as optional install |
| source/isaaclab_tasks/isaaclab_tasks/direct/anymal_c/anymal_c_happo_env.py | 4/5 | New HAPPO environment implementation for Anymal-C, includes duplicate copyright headers |
| source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/anymal_c_multi_agent_bar_env.py | 4/5 | Multi-agent bar carrying environment implementation, has duplicate copyright headers |
| source/isaaclab_tasks/isaaclab_tasks/direct/multi_agent_coordination/h1_anymal_push_env.py | 3/5 | H1-Anymal push environment with duplicate headers and in-place tensor mutation issue |
Sequence Diagram
```mermaid
sequenceDiagram
participant User
participant TrainScript as train.py
participant GymRegistry as Gym Registry
participant HappoEnv as HAPPO/MultiAgent Env
participant HARLLib as HARL Library
participant IsaacSim as Isaac Sim
User->>TrainScript: Execute training command
TrainScript->>GymRegistry: Register environments
GymRegistry->>GymRegistry: Isaac-Velocity-Flat-Anymal-C-Happo-Direct-v0
GymRegistry->>GymRegistry: Isaac-Multi-Agent-Anymal-Bar-Direct-v0
GymRegistry->>GymRegistry: Isaac-Multi-Agent-H1-Anymal-Push-Direct-v0
TrainScript->>GymRegistry: gym.make(env_id)
GymRegistry->>HappoEnv: Create environment instance
HappoEnv->>IsaacSim: Initialize simulation
IsaacSim-->>HappoEnv: Simulation ready
HappoEnv-->>TrainScript: Environment ready
TrainScript->>HARLLib: Initialize HAPPO agent
HARLLib-->>TrainScript: Agent ready
loop Training Episodes
TrainScript->>HappoEnv: reset()
HappoEnv->>IsaacSim: Reset scene
IsaacSim-->>HappoEnv: Initial observations
HappoEnv-->>TrainScript: Multi-agent observations
loop Episode Steps
TrainScript->>HARLLib: Get actions for all agents
HARLLib-->>TrainScript: Multi-agent actions
TrainScript->>HappoEnv: step(actions)
HappoEnv->>IsaacSim: Apply actions & simulate
IsaacSim-->>HappoEnv: State updates
HappoEnv->>HappoEnv: Calculate rewards & observations
HappoEnv-->>TrainScript: obs, rewards, dones, info
TrainScript->>HARLLib: Store experience & update
end
end
TrainScript->>User: Training complete
```
1 file reviewed, no comments
The work looks good to me from a high level, I'll get back to you once I've heard a resolution!
Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context.
List any dependencies that are required for this change.
The one dependency added is our customized HARL library for IsaacLab, found here -> https://github.com/DIRECTLab/HARL. This pull request updates setup.py to include it.
This is the integration of the HARL algorithms for multi-agent collaboration as outlined in the discussion post here -> #2418 (comment). It includes the two multi-agent coordination environments we show in the paper, along with a HAPPO version of the Anymal-C walking env for easier integration of the Anymal-C walking policy with the multi-agent Anymal-C bar-carrying env.
Type of change
Screenshots
Please attach before and after screenshots of the change if applicable.

Checklist
- I have run the `pre-commit` checks with `./isaaclab.sh --format`
- I have updated the extension's `config/extension.toml` file
- I have added my name to `CONTRIBUTORS.md` or my name already exists there