# 🤖 REAL: Robust Extreme Agility Learning

**Robust Extreme Agility via Spatio-Temporal Policy Learning and Physics-Guided Filtering**


REAL enables a quadrupedal robot to chain highly dynamic parkour maneuvers across complex terrains
with nominal vision (green box), and maintain stable locomotion even under severe visual degradation (red box).


## 📰 News

| Date | Update |
| --- | --- |
| 🔥 2026/03 | Paper submitted. Under review. |
| 🎉 2026/03 | Repository created. Code will be released upon acceptance. |

## ✨ Highlights

### 🧠 Spatio-Temporal Policy Learning

A privileged teacher learns structured proprioception-terrain associations via cross-modal attention. The distilled student uses a FiLM-modulated Mamba backbone to suppress visual noise and build short-term terrain memory.
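At its core, FiLM modulation is a per-channel affine transform of one modality's features conditioned on another. A minimal pure-Python sketch (shapes, values, and function names here are illustrative, not the paper's implementation):

```python
# Hypothetical FiLM (feature-wise linear modulation) sketch.
# A conditioning signal (e.g. proprioception) would predict per-channel
# scale (gamma) and shift (beta) applied to visual features; a gamma
# near zero can suppress a noisy visual channel entirely.

def film(visual_feats, gammas, betas):
    """Feature-wise affine modulation: y_c = gamma_c * x_c + beta_c."""
    return [g * x + b for x, g, b in zip(visual_feats, gammas, betas)]

# Example: channel 0 amplified, channel 1 suppressed, channel 2 passed through.
modulated = film([1.0, 2.0, 3.0], gammas=[2.0, 0.0, 1.0], betas=[0.5, 1.0, 0.0])
```

In practice the gammas and betas would come from small learned networks over the proprioceptive state, so the policy can dial visual features up or down on the fly.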

โš›๏ธ Physics-Guided Filtering

An uncertainty-aware neural velocity estimator is fused with rigid-body dynamics through an Extended Kalman Filter (EKF), ensuring physically consistent state estimation during impacts and slippage.
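The fusion idea can be sketched with a one-dimensional Kalman update, where the network's predicted uncertainty sets the measurement noise. This is a toy sketch under stated assumptions (scalar state, placeholder acceleration integration instead of full rigid-body dynamics, made-up noise values), not the paper's filter:

```python
# Illustrative 1-D Kalman velocity update fusing a dynamics prediction
# with a neural velocity measurement whose uncertainty (r_meas) is
# predicted by the network itself. All names and values are hypothetical.

def ekf_velocity_step(v_est, p_var, accel, dt, q, v_meas, r_meas):
    # Predict: integrate acceleration (stand-in for rigid-body dynamics).
    v_pred = v_est + accel * dt
    p_pred = p_var + q                  # process noise inflates variance
    # Update: fuse the network's velocity estimate.
    k = p_pred / (p_pred + r_meas)      # Kalman gain in [0, 1]
    v_new = v_pred + k * (v_meas - v_pred)
    p_new = (1.0 - k) * p_pred
    return v_new, p_new
```

When the estimator reports high uncertainty (e.g. during an impact), `r_meas` grows, the gain shrinks, and the physically consistent dynamics prediction dominates; a confident measurement pulls the state toward it.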

### 🎯 Consistency-Aware Loss Gating

Adaptive gating between behavioral cloning and RL stabilizes policy distillation and improves sim-to-real transfer, preventing policy collapse under aggressive domain randomization.
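The exact gating rule is not public; as one plausible sketch, the blend weight can depend on how far the student has drifted from the teacher, leaning on imitation when they disagree and on RL when they are consistent (the sigmoid form and threshold `tau` below are assumptions for illustration):

```python
import math

# Hypothetical consistency-aware gate between behavioral-cloning (BC)
# and RL objectives. High BC error (student far from teacher) shifts
# weight toward imitation to prevent collapse; low BC error lets RL lead.

def gated_loss(bc_loss, rl_loss, tau=0.5):
    w = 1.0 / (1.0 + math.exp(-(bc_loss - tau)))  # weight in (0, 1)
    return w * bc_loss + (1.0 - w) * rl_loss
```

A fixed-weight baseline would use a constant `w` instead, which is the comparison point the paper's convergence plots refer to.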

### ⚡ Real-Time Onboard Deployment

Bounded O(1) inference at ~13.1 ms/step on a Unitree Go2, with zero-shot sim-to-real transfer: no fine-tuning required on the real robot.


๐Ÿ—๏ธ Architecture


🎓 **Stage 1 – Privileged Teacher Policy Learning:** The teacher policy learns precise proprioception-terrain associations through cross-modal attention. Proprioceptive states serve as Queries to selectively retrieve relevant terrain features encoded as Keys and Values from terrain scan dots.
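The Query/Key/Value retrieval in Stage 1 follows the standard attention pattern. A tiny scalar-feature sketch (learned projections and multi-head structure omitted; purely illustrative, not the paper's code):

```python
import math

# Minimal cross-modal attention: a proprioceptive query scores each
# terrain scan dot (key) and returns a weighted sum of the dot
# features (values). Scalar features keep the sketch readable.

def cross_modal_attention(query, keys, values):
    scores = [query * k for k in keys]        # dot-product relevance
    m = max(scores)                           # numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return sum(w * v for w, v in zip(weights, values))
```

With a strong query-key match, the output concentrates on the matching terrain feature; with uninformative keys it degenerates to a uniform average, which is the failure mode the privileged training is meant to avoid.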

🎒 **Stage 2 – Distilling the Student Policy with Spatio-Temporal Reasoning:** The deployable student integrates FiLM-based visual-proprioceptive fusion with a Mamba temporal backbone. A physics-guided Bayesian estimator and consistency-aware loss gating further stabilize training and deployment.


## 📊 Results

๐Ÿ”๏ธ Extreme Terrain Traversability



REAL achieves 2x the overall success rate of the best prior baseline across hurdles, steps, and gaps:

| Method | Hurdles SR ↑ | Steps SR ↑ | Gaps SR ↑ | Overall SR ↑ | Overall MXD ↑ | MEV ↓ |
| --- | --- | --- | --- | --- | --- | --- |
| Extreme Parkour | 0.18 | 0.14 | 0.10 | 0.16 | 0.21 | 34.24 |
| RPL | 0.05 | 0.04 | 0.03 | 0.04 | 0.10 | 1.56 |
| SoloParkour | 0.42 | 0.49 | 0.36 | 0.39 | 0.34 | 96.93 |
| REAL (Ours) 🏆 | 0.82 | 0.94 | 0.28 | 0.78 | 0.45 | 18.41 |

📌 SR: Success Rate, i.e. how often the robot reaches all target goals. MXD: Mean X-Displacement ∈ [0, 1], showing normalized forward progress. MEV: Mean Edge Violations, the average number of unsafe foot-edge contacts per episode.


๐Ÿ›ก๏ธ Robustness Under Perceptual Degradation

We evaluate policy robustness under three simulated sensor degradation conditions: frame drops, Gaussian noise, and spatial FoV occlusion.

| Method | Nominal SR | Frame Drop SR | Gaussian Noise SR | FoV Occlusion SR |
| --- | --- | --- | --- | --- |
| Extreme Parkour | 0.16 | 0.16 (↓0.00) | 0.11 (↓0.05) | 0.13 (↓0.03) |
| RPL | 0.04 | 0.01 (↓0.04) | 0.01 (↓0.03) | 0.01 (↓0.03) |
| SoloParkour | 0.39 | 0.20 (↓0.19) | 0.37 (↓0.03) | 0.41 (↑0.02) |
| REAL (Ours) 🏆 | 0.78 | 0.61 (↓0.17) | 0.51 (↓0.27) | 0.72 (↓0.06) |

💡 Under severe FoV occlusion, REAL retains 92% of its nominal performance (0.72 vs. 0.78), while vision-reliant baselines suffer catastrophic failures.


### 🙈 Blind-Zone Maneuvers

Vision is completely masked 1 meter before each obstacle, forcing the policy to rely on spatio-temporal memory:



| Method | SR ↑ | MXD ↑ | MEV ↓ |
| --- | --- | --- | --- |
| Extreme Parkour | 0.11 | 0.20 | 44.03 |
| SoloParkour | 0.36 | 0.34 | 103.50 |
| REAL (Ours) 🏆 | 0.55 | 0.39 | 24.84 |

#### 👀 Real-World Extreme Blind Test

| ❌ Baseline | ✅ REAL (Ours) |
| --- | --- |
| Fails immediately upon losing visual input | Maintains robust blind traversal across obstacles |

๐ŸŒ Real-World Deployment


Zero-shot sim-to-real transfer on a physical Unitree Go2 quadruped using only onboard perception and computing:

| Scenario | Description |
| --- | --- |
| 🦘 (a) High Platform Leap | The robot dynamically jumps onto an elevated surface |
| 📦 (b) Scattered Box Navigation | Traversing irregularly placed obstacles |
| 🪜 (c) Steep Staircase Climb | Ascending a steep staircase with precise foot placement |

โฑ๏ธ Inference Latency


| Backbone | Avg. Latency | Meets 20 ms Budget? |
| --- | --- | --- |
| Transformer | 23.07 ms | ❌ No |
| Mamba (Ours) | 13.14 ms | ✅ Yes |

⚡ Mamba's bounded O(1) per-step complexity eliminates the sequence-scaling bottleneck of Transformers, enabling the high-frequency reactivity required for aggressive parkour.
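The reason the per-step cost is bounded can be seen in any state-space-style recurrence: each control step folds the new observation into a fixed-size hidden state, so latency does not grow with history length, unlike attention over an ever-longer token sequence. A scalar toy version (not Mamba itself, whose state update is input-dependent and learned):

```python
# Toy linear state-space recurrence illustrating O(1) per-step cost.
# Coefficients a, b, c are arbitrary illustrative constants.

def ssm_step(h, x, a=0.9, b=0.1, c=1.0):
    h = a * h + b * x      # fixed-size state update: O(1) work per step
    return h, c * h        # readout from the state

def run(inputs, h=0.0):
    outs = []
    for x in inputs:       # O(T) total, constant work per control tick
        h, y = ssm_step(h, x)
        outs.append(y)
    return outs
```

A Transformer instead attends over all past tokens at every step, so per-step work grows with the context window, which is what breaks a hard real-time budget.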


## 🔬 Ablation Study

### 🧩 Component-Level Ablation

| Variant | SR ↑ | MXD ↑ | MEV ↓ | Time ↓ | Coll. ↓ |
| --- | --- | --- | --- | --- | --- |
| REAL (Full) 🏆 | 0.78 | 0.45 | 18.41 | 0.02 | 0.06 |
| w/ MLP Estimator | 0.73 | 0.43 | 19.34 | 0.02 | 0.06 |
| w/o FiLM | 0.44 | 0.51 | 93.43 | 0.28 | 0.06 |
| w/o Mamba | 0.51 | 0.47 | 89.96 | 0.26 | 0.05 |

📌 Removing Mamba increases MEV nearly 5x (18 → 90), and disabling FiLM drops SR by 44% (0.78 → 0.44). Both components are critical for robust spatio-temporal reasoning.

๐Ÿ“ Velocity Estimation

| Estimator | RMSE ↓ |
| --- | --- |
| MLP (Baseline) | 0.52 |
| MLP + EKF | 0.40 |
| 1D ResNet (single frame) | 0.33 |
| 1D ResNet (10 frames) | 0.28 |
| 1D ResNet + EKF (10 frames, Ours) 🏆 | 0.23 |

### 📈 Training Convergence

💡 Our consistency-aware loss gating accelerates early-stage convergence and achieves a lower final training loss than a fixed-weight baseline.


โš™๏ธ Training Details

| Item | Detail |
| --- | --- |
| 🖥️ Simulator | Isaac Gym |
| 🐕 Robot Platform | Unitree Go2 |
| 🔄 Control Frequency | 50 Hz (policy) / 1 kHz (PD controller) |
| 🎮 Training Hardware | Single NVIDIA RTX 4080 GPU |
| ⏳ Training Time | ~30 hours (from scratch) |
| 📷 Depth Camera | Intel RealSense D435i |
| 💻 Onboard Compute | NVIDIA Jetson |
| 🚀 Deployment | Custom C++ + ONNX Runtime |
| 🎯 Reward Formulation | Same as Extreme Parkour |

๐Ÿ“ TODO

We plan to release the full codebase upon paper acceptance. The following items are on our roadmap:

๐Ÿ‹๏ธ Training

  • Privileged teacher policy training code (Stage 1)
  • Student distillation training code (Stage 2)
  • Consistency-aware loss gating implementation
  • Isaac Gym terrain environment and curriculum configs
  • Domain randomization parameters and reward formulation

### 🧮 Models & Estimation

- Physics-guided filtering (EKF) module
- Uncertainty-aware velocity estimator (1D ResNet)
- Pre-trained model checkpoints (teacher & student)

## 🔖 Citation

If you find this work useful, please consider citing:

```bibtex
@article{real2026,
  title   = {REAL: Robust Extreme Agility via Spatio-Temporal
             Policy Learning and Physics-Guided Filtering},
  author  = {Jialong Liu and Dehan Shen and Yanbo Wen and
             Zeyu Jiang and Changhao Chen},
  year    = {2026}
}
```

๐Ÿ™ Acknowledgements

This work builds upon the simulation infrastructure of Isaac Gym and the terrain setup from Extreme Parkour. We thank the authors for their open-source contributions.

## 📄 License

This project will be released under the MIT License.
