
Robot Parkour: Jumping and Climbing with RL

Analyzing Extreme Parkour and SoloParkour — teaching robots to jump, climb stairs and overcome complex terrain with RL.

Nguyen Anh Tuan · February 21, 2026 · 9 min read

Parkour — The Hardest Test for Robot Locomotion

In the world of robot locomotion, parkour is the most demanding comprehensive test. Unlike walking or running on flat ground, parkour requires robots to jump over gaps, climb high obstacles, crawl under barriers, and maintain balance on narrow surfaces — all in real-time with perception and control synchronized.

Why is parkour so difficult? Three main reasons:

  1. Diverse Skills in One Policy: The robot must know when to jump, when to climb, and when to crawl, and must transition smoothly between these skills
  2. Vision-Based Decision Making: Proprioception alone is not enough; the robot must "see" the terrain ahead in order to plan
  3. Precise Timing: A jump that is off by 5 cm or delayed by 50 ms can cause the robot to fall

In this post, I'll analyze 3 important works on robot parkour with reinforcement learning: Extreme Parkour, Robot Parkour Learning, and SoloParkour — from training architecture to real-world results.

Robot overcoming obstacles with reinforcement learning

Extreme Parkour (Cheng et al., ICRA 2024)

The paper Extreme Parkour with Legged Robots, by Xuxin Cheng, Kexin Shi, Ananye Agarwal, and Deepak Pathak at Carnegie Mellon University, is a major breakthrough. The results are impressive: the parkour policy trains in under 20 hours and deploys zero-shot to a Unitree A1 with a single front-facing depth camera.

Teacher-Student Framework

Core architecture uses teacher-student distillation:

Phase 1 — Teacher Policy (privileged information): a teacher policy is trained with RL using privileged simulator state (such as sampled terrain heights around the robot) that no real sensor provides.

Phase 2 — Student Policy (vision-only): the student policy, which observes only depth images and proprioception, is distilled from the teacher via supervised learning and deployed zero-shot on hardware.

Terrain Curriculum

The key ingredient is an automatic terrain curriculum:

Level 1: Flat terrain → basic walking
Level 2: Small steps (10cm) → stepping
Level 3: Medium gaps (30cm) → jumping
Level 4: High boxes (40cm) → climbing
Level 5: Mixed obstacles → full parkour

Difficulty increases when robot achieves >80% success at current level. This progressive learning prevents overwhelming the policy with obstacles that are too hard from the start.
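The promotion rule above can be sketched in a few lines. The level names and the 80% threshold come from the text; the function and its signature are illustrative, not the paper's code:

```python
# Hypothetical sketch of the automatic terrain curriculum: advance to the
# next terrain level once the success rate at the current level exceeds 80%.
LEVELS = ["flat", "small_steps", "medium_gaps", "high_boxes", "mixed"]
PROMOTE_THRESHOLD = 0.8

def update_level(level: int, successes: int, episodes: int) -> int:
    """Return the (possibly promoted) curriculum level for the next batch."""
    if episodes == 0:
        return level  # no data yet, stay put
    success_rate = successes / episodes
    if success_rate > PROMOTE_THRESHOLD and level < len(LEVELS) - 1:
        return level + 1
    return level
```

In practice this runs per environment instance, so thousands of simulated robots sit at different levels simultaneously.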

Results on Unitree A1

The platform is the Unitree A1, a low-cost (~15 kg) quadruped carrying a single Intel RealSense D435 depth camera.

Notably, the depth camera is low-frequency, jittery, and artifact-prone, yet the single neural-network policy still produces highly precise control. This shows that large-scale RL in simulation can compensate for imprecise sensing.

Robot Parkour Learning (Zhuang et al., CoRL 2023)

The paper Robot Parkour Learning by Ziwen Zhuang, Zipeng Fu, et al. (Stanford, CMU) addresses a different problem: learning diverse parkour skills in a single end-to-end policy, without reference motion data.

Explicit vs Implicit Depth Encoding

The paper's main contribution is a comparison of two approaches to processing depth information:

Explicit depth encoder: the depth image is first compressed into an explicit intermediate representation of the terrain (a reconstructed heightmap), which the policy then consumes.

Implicit depth encoder: the depth image is encoded directly into a learned latent vector, trained end-to-end with the policy and with no intermediate reconstruction target.

The paper concludes that implicit encoding works better for complex parkour tasks, because the network can learn features relevant to each specific skill rather than being constrained to reconstruct a heightmap.

Diverse Skills from Simple Reward

Instead of designing a separate reward for each skill (jump reward, climb reward, crawl reward), the paper uses a single simple reward function:

reward = forward_velocity + alive_bonus - energy_penalty - contact_penalty
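The quoted formula can be sketched as a function. The four terms come from the line above; the coefficients and the per-step inputs (joint torques, count of undesired contacts) are hypothetical placeholders, not the paper's values:

```python
# Minimal sketch of the single parkour reward: go forward, stay alive,
# penalize actuation energy and undesired body contacts.
def parkour_reward(forward_velocity, torques, bad_contacts,
                   alive_bonus=0.5, energy_coef=1e-4, contact_coef=0.1):
    """Per-step reward; coefficients are illustrative placeholders."""
    energy_penalty = energy_coef * sum(t * t for t in torques)
    contact_penalty = contact_coef * bad_contacts
    return forward_velocity + alive_bonus - energy_penalty - contact_penalty
```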

Diverse skills (climbing, leaping, crawling, squeezing through gaps) emerge naturally from the terrain curriculum.

This is a beautiful example of emergent behavior — complex skills arising from simple objectives combined with diverse environments.

SoloParkour (Chane-Sane et al., CoRL 2024)

SoloParkour from LAAS-CNRS (France) introduces a new approach: constrained reinforcement learning for visual parkour, demonstrated on the Solo-12 robot.

Constrained RL Formulation

Instead of complex reward shaping, SoloParkour formulates parkour as a constrained optimization problem: maximize task reward subject to safety constraints that keep quantities such as joint torques within the robot's physical limits.

This approach has major advantage: robot is encouraged to try aggressive maneuvers (high jumps, fast running) but constrained to not exceed physical limits — reducing hardware damage risk when deploying.
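A standard way to optimize such a constrained objective is a Lagrangian relaxation: the constraint receives a multiplier that grows while the constraint is violated and shrinks otherwise. The sketch below is this generic textbook scheme, not necessarily SoloParkour's exact algorithm:

```python
# Generic Lagrangian treatment of constrained RL: the policy maximizes
# reward - lambda * cost, while lambda is adapted by dual ascent.
def lagrange_multiplier_step(lmbda, episode_cost, cost_limit, lr=0.01):
    """Raise lambda when cost exceeds the limit, lower it otherwise;
    lambda is clipped to stay non-negative."""
    return max(0.0, lmbda + lr * (episode_cost - cost_limit))

def constrained_objective(reward, cost, lmbda):
    """Penalized objective the policy gradient maximizes."""
    return reward - lmbda * cost
```

The appeal is exactly the advantage described above: the reward term can stay aggressive, while the multiplier automatically prices in constraint violations.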

Privileged Experience Warm-Start

Training pipeline has 2 phases:

Phase 1 — Privileged policy (no vision needed): a policy is first trained with constrained RL on full simulator state, learning agile behaviors quickly without the cost of rendering depth images.

Phase 2 — Visual policy (from depth images): the visual policy is then trained from depth pixels, warm-started with experience collected by the privileged policy so it does not have to explore from scratch.
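The warm-start idea can be sketched as pre-filling the visual policy's replay buffer with transitions collected by the privileged policy; the data layout here is purely illustrative:

```python
from collections import deque

# Illustrative warm-start: seed the visual policy's replay buffer with
# privileged-policy transitions (state, action, reward, next_state tuples).
def warm_start_buffer(privileged_transitions, capacity=100_000):
    """Return a bounded replay buffer pre-filled with privileged experience."""
    buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
    buffer.extend(privileged_transitions)
    return buffer
```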

Single Policy, Multiple Terrains

SoloParkour trains a single policy on a curriculum spanning multiple terrain types.

Result: single policy can walk, climb, leap, and crawl — all from depth pixels, no switching between specialized policies.

Training pipeline for robot parkour with terrain curriculum

Comparison of Three Methods

| Criterion | Extreme Parkour | Robot Parkour Learning | SoloParkour |
|---|---|---|---|
| Robot | Unitree A1 | Unitree A1 | Solo-12 |
| Vision | Depth camera | Depth camera | Depth camera |
| Framework | Teacher-student | End-to-end | Constrained RL |
| Depth encoding | Implicit | Explicit + Implicit | Implicit |
| Training | PPO + distillation | PPO + curriculum | Constrained RL + warm-start |
| Skills | Jump, climb, stairs | Climb, leap, crawl, squeeze | Walk, climb, leap, crawl |
| Reference motion | No | No | No |
| Conference | ICRA 2024 | CoRL 2023 | CoRL 2024 |
| Max obstacle height | 0.6 m | 0.4 m | 0.3 m |

ANYmal Parkour — Industrial Scale

Beyond research on small robots, ETH Zurich has also demonstrated parkour capabilities on ANYmal, an industrial-grade 50 kg quadruped. The paper Learning Agile Locomotion on Risky Terrains shows the ANYmal-D reaching a peak velocity of 2.5 m/s on stepping stones and crossing narrow balance beams.

ETH's approach differs by formulating parkour as a navigation task rather than velocity tracking: the robot autonomously decides its speed based on terrain difficulty. Easy terrain — run fast; difficult obstacle — slow down and be more careful.

This is philosophically closer to how an experienced human parkour athlete thinks: not "run at 3 m/s" but "adapt speed to terrain complexity."

General Training Pipeline for Robot Parkour

Based on the three papers above, a common pipeline for robot parkour emerges:

Step 1: Terrain Generation

Create diverse terrain in simulation (Isaac Gym or MuJoCo):
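A toy version of procedural terrain generation, producing a 1D heightfield whose obstacle types mirror the curriculum levels from earlier; all dimensions and positions are made up for the sketch (one cell = 10 cm):

```python
# Illustrative heightfield generator for a simulated parkour course.
def make_terrain(level, length=100):
    """Return a 1D heightfield in meters; harder levels add obstacles."""
    heights = [0.0] * length
    if level >= 1:  # 10 cm steps every 20 cells (a staircase)
        for i in range(20, length, 20):
            for j in range(i, length):
                heights[j] += 0.10
    if level >= 2:  # a gap (pit) three cells wide at a fixed spot
        for j in range(50, 53):
            heights[j] = -1.0
    if level >= 3:  # a 40 cm box to climb
        for j in range(70, 75):
            heights[j] += 0.40
    return heights
```

A real pipeline would emit 2D heightfields or meshes and randomize obstacle sizes and positions per environment.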

Step 2: Privileged Teacher

Train teacher policy with complete state information:
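What "privileged" means in practice is that the teacher's observation concatenates proprioception with simulator-only signals such as ground-truth terrain heights, friction, and contact forces. A hypothetical layout:

```python
# Sketch of a privileged teacher observation: real robots cannot measure
# terrain scans, friction, or clean contact forces, but the simulator can.
def teacher_observation(proprioception, terrain_scan, friction, contact_forces):
    """Concatenate proprioception with privileged simulator-only signals
    into one flat observation vector."""
    return (list(proprioception) + list(terrain_scan)
            + [friction] + list(contact_forces))
```

The teacher is then trained with standard RL (PPO in the papers above) on this fully observed problem.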

Step 3: Visual Student (Distillation)

Transfer knowledge to vision-based student:
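Distillation typically reduces to regressing the student's actions onto the teacher's actions on states the student itself visits (DAgger-style), with a plain MSE loss. A pure-Python sketch, assuming action vectors of equal length; real training would use a deep-learning framework:

```python
# Behavior-cloning loss for teacher-student distillation: mean squared
# error between the student's and teacher's action vectors.
def distillation_loss(student_actions, teacher_actions):
    """MSE between two equal-length action vectors."""
    n = len(student_actions)
    return sum((s - t) ** 2
               for s, t in zip(student_actions, teacher_actions)) / n
```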

Step 4: Domain Randomization

Randomize for real-world robustness:
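A typical randomization scheme draws physics and sensor parameters fresh each episode so the policy cannot overfit one simulator configuration. The parameter names and ranges below are illustrative placeholders, not values from the papers:

```python
import random

# Illustrative per-episode domain-randomization sampler.
def sample_randomization(rng=random):
    """Draw one set of randomized physics/sensor parameters."""
    return {
        "friction": rng.uniform(0.4, 1.2),        # ground friction coefficient
        "payload_kg": rng.uniform(-1.0, 2.0),     # mass added to/removed from base
        "motor_strength": rng.uniform(0.8, 1.2),  # actuator gain multiplier
        "camera_latency_s": rng.uniform(0.0, 0.1),  # depth-image delay
    }
```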

Step 5: Real-World Deployment

Deploy and iterate:

Remaining Challenges

Despite impressive results, robot parkour still has many challenges:

  1. Long-horizon planning: Current policies are largely reactive. Real parkour requires planning many steps ahead (looking 5-10 m forward)
  2. Recovery from failure: When the robot falls mid-course, it needs a recovery policy to stand up and continue
  3. Generalization: Policies are trained in simulation on specific terrains, but the real world has infinite variety
  4. Bipedal parkour: Most research is on quadrupeds; bipedal parkour is much harder (see Humanoid Parkour Learning)

Conclusion

Robot parkour is a rapidly evolving field in 2023-2024. From Extreme Parkour (teacher-student distillation) to Robot Parkour Learning (emergent skills from simple rewards) to SoloParkour (constrained RL), we see that RL + vision + terrain curriculum is the winning formula for agile locomotion.

The next frontier: bipedal parkour for humanoid robots. When a robot like Optimus or Digit can navigate the same obstacle courses as in these papers, that marks a truly significant milestone for robotics.

Read the other parts in this series for the full picture.

