Parkour — The Hardest Test for Robot Locomotion
In the world of robot locomotion, parkour is the most demanding comprehensive test. Unlike walking or running on flat ground, parkour requires robots to jump over gaps, climb tall obstacles, crawl under barriers, and balance on narrow surfaces, all in real time with perception and control tightly synchronized.
Why is parkour so difficult? Three main reasons:
- Diverse Skills in One Policy: The robot must know when to jump, when to climb, and when to crawl, and transition smoothly between these skills
- Vision-Based Decision Making: Proprioception alone isn't enough; the robot must "see" the terrain ahead to plan
- Precise Timing: A jump that is off by 5cm or late by 50ms can make the robot fall
In this post, I'll analyze 3 important works on robot parkour with reinforcement learning: Extreme Parkour, Robot Parkour Learning, and SoloParkour — from training architecture to real-world results.
Extreme Parkour (Cheng et al., ICRA 2024)
The paper Extreme Parkour with Legged Robots by Xuxin Cheng, Kexin Shi, Ananye Agarwal, and Deepak Pathak at Carnegie Mellon University is a major breakthrough. The headline results: the parkour policy trains in under 20 hours and deploys zero-shot to a Unitree A1 with a single front-facing depth camera.
Teacher-Student Framework
Core architecture uses teacher-student distillation:
Phase 1 — Teacher Policy (privileged information):
- The teacher has access to the ground-truth terrain heightmap around the robot
- It is trained with PPO in Isaac Gym across thousands of parallel environments
- The teacher learns all parkour skills: jumping onto 0.6m boxes, leaping over gaps, climbing stairs
- The reward combines forward velocity and a survival bonus, minus an energy penalty
Phase 2 — Student Policy (vision-only):
- The student receives only a depth image from the camera (no privileged information)
- Knowledge is distilled from the teacher: the student learns to reproduce the teacher's behavior from visual input alone
- A depth encoder (CNN) processes the depth image into a latent representation
- The student policy outputs joint position targets, just like the teacher
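The distillation objective can be sketched as a simple behavior-cloning loss. The functions below are illustrative stand-ins, not the paper's actual networks:

```python
def teacher_policy(privileged_obs):
    # Stand-in for the teacher: in the paper, an MLP that sees the
    # ground-truth heightmap plus proprioception.
    return [0.1 * x for x in privileged_obs]

def student_policy(depth_latent):
    # Stand-in for the student: in the paper, a CNN depth encoder
    # feeding a policy head that sees only depth + proprioception.
    return [0.2 * z for z in depth_latent]

def distillation_loss(student_actions, teacher_actions):
    # Behavior cloning: mean squared error between the action vectors.
    n = len(teacher_actions)
    return sum((s - t) ** 2 for s, t in zip(student_actions, teacher_actions)) / n
```

Minimizing this loss over states visited by the student pushes its vision-only behavior toward the teacher's privileged behavior.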
Terrain Curriculum
The key ingredient is an automatic terrain curriculum:
Level 1: Flat terrain → basic walking
Level 2: Small steps (10cm) → stepping
Level 3: Medium gaps (30cm) → jumping
Level 4: High boxes (40cm) → climbing
Level 5: Mixed obstacles → full parkour
Difficulty increases once the robot achieves >80% success at the current level. This progressive schedule keeps the policy from being overwhelmed by obstacles that are too hard at the start.
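The promotion rule above can be sketched in a few lines (the 80% threshold and five levels mirror the description here; exact values in the paper may differ):

```python
def update_level(level, success_rate, max_level=5, threshold=0.8):
    """Advance the terrain curriculum once the policy clears the current level."""
    if success_rate > threshold and level < max_level:
        return level + 1
    return level
```

In practice each parallel environment tracks its own level, so easy and hard terrains coexist in every training batch.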
Results on Unitree A1
The Unitree A1 is a low-cost (~15kg) quadruped fitted with a single Intel RealSense D435 camera. With one policy it can:
- Climb onto 0.6m boxes (2.3x robot height)
- Jump over 0.8m gaps
- Climb stairs continuously
- All in real-time, no fine-tuning required on real robot
Notably, the robot relies on a depth camera that is low-frequency, jittery, and artifact-prone, yet the single neural network policy still outputs highly precise control. This shows that large-scale RL in simulation can compensate for imprecise sensing.
Robot Parkour Learning (Zhuang et al., CoRL 2023)
The paper Robot Parkour Learning by Ziwen Zhuang, Zipeng Fu et al. (Stanford, CMU) addresses a different problem: learning diverse parkour skills in a single end-to-end policy, without reference motion data.
Explicit vs Implicit Depth Encoding
The main contribution compares two approaches to processing depth information:
Explicit depth encoder:
- Depth image → CNN → explicit heightmap prediction
- The robot knows the exact geometry of the obstacle ahead
- Pros: interpretable, easy to debug
- Cons: reconstruction error accumulates
Implicit depth encoder:
- Depth image → CNN → latent embedding (no heightmap reconstruction)
- The network decides which features to extract from the depth image
- Pros: can capture information that explicit method misses
- Cons: black-box, hard to debug
The paper concludes that implicit encoding works better for complex parkour tasks: the network can learn features relevant to each specific skill rather than being constrained by heightmap reconstruction.
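The structural difference between the two encoders can be sketched as two pipelines (function names here are illustrative placeholders, not the paper's code):

```python
def explicit_pipeline(depth_image, encoder, height_decoder, policy):
    # Explicit: reconstruct a heightmap first, then act on the geometry.
    latent = encoder(depth_image)
    heightmap = height_decoder(latent)  # reconstruction errors enter here
    return policy(heightmap)

def implicit_pipeline(depth_image, encoder, policy):
    # Implicit: act directly on the learned latent; no reconstruction step,
    # so the network is free to keep whatever features help the task.
    latent = encoder(depth_image)
    return policy(latent)
```

The explicit path forces everything through the heightmap bottleneck, which is exactly where its reconstruction error accumulates.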
Diverse Skills from Simple Reward
Instead of designing a separate reward for each skill (jump reward, climb reward, crawl reward), the paper uses a single, simple reward function:
reward = forward_velocity + alive_bonus - energy_penalty - contact_penalty
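This reward can be written directly as a function. The weights below are placeholder values, not the paper's tuned coefficients:

```python
def parkour_reward(forward_velocity, alive, energy, bad_contacts,
                   w_vel=1.0, w_alive=0.5, w_energy=0.001, w_contact=0.1):
    # Weighted sum of the four terms above; only forward progress and
    # survival are rewarded, effort and bad contacts are penalized.
    return (w_vel * forward_velocity
            + w_alive * alive
            - w_energy * energy
            - w_contact * bad_contacts)
```

Nothing in this function mentions jumping or crawling; the skill diversity comes entirely from the terrains the robot must cross to keep earning forward velocity.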
Diverse skills emerge naturally from the terrain curriculum:
- Climbing: 0.40m obstacles (1.53x robot height)
- Leaping: 0.60m gaps (1.5x robot length)
- Crawling: 0.20m barriers (0.76x robot height)
- Squeezing: 0.28m slits (narrower than robot width — robot must tilt sideways)
This is a beautiful example of emergent behavior — complex skills arising from simple objectives combined with diverse environments.
SoloParkour (Chane-Sane et al., CoRL 2024)
SoloParkour from LAAS-CNRS (France) introduces a new approach: constrained reinforcement learning for visual parkour, demonstrated on the Solo-12 robot.
Constrained RL Formulation
Instead of complex reward shaping, SoloParkour formulates parkour as a constrained optimization problem:
- Objective: Maximize agile locomotion skills
- Constraints: Stay within robot's physical limits (torque limits, joint limits, stability)
This approach has a major advantage: the robot is encouraged to attempt aggressive maneuvers (high jumps, fast running) while being constrained to stay within its physical limits, which reduces the risk of hardware damage during deployment.
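One standard way to solve such a constrained problem is a Lagrangian relaxation: penalize constraint violations with multipliers that are adapted by dual ascent. A minimal sketch of that general technique (not SoloParkour's exact algorithm):

```python
def lagrangian_objective(reward, violations, lambdas):
    # Maximize reward minus multiplier-weighted constraint violations
    # (a violation <= 0 means the constraint is satisfied).
    penalty = sum(l * max(0.0, v) for l, v in zip(lambdas, violations))
    return reward - penalty

def update_multipliers(lambdas, violations, lr=0.01):
    # Dual ascent: grow a multiplier while its constraint is violated,
    # shrink it toward zero once the constraint is satisfied.
    return [max(0.0, l + lr * v) for l, v in zip(lambdas, violations)]
```

As a multiplier grows, the policy is pushed harder to respect that limit; once it complies, the multiplier decays and stops suppressing aggressive behavior.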
Privileged Experience Warm-Start
The training pipeline has two phases:
Phase 1 — Privileged policy (no vision needed):
- Train a policy with privileged information (terrain heightmap, exact robot state)
- This policy achieves high performance because it has complete state information
Phase 2 — Visual policy (from depth images):
- Experience from the privileged policy is used to warm-start an off-policy RL algorithm
- Instead of training from scratch (expensive), the visual policy starts from good initial behaviors
- This is much more sample-efficient than on-policy methods like PPO
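Mechanically, the warm-start amounts to seeding the off-policy replay buffer with transitions collected by the privileged policy before visual training begins. A minimal sketch under that assumption:

```python
import random
from collections import deque

class ReplayBuffer:
    """Tiny FIFO replay buffer for an off-policy learner."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def warm_start(buffer, privileged_rollouts):
    # Seed the buffer with transitions from the privileged policy so the
    # visual policy's first gradient steps see competent behavior.
    for transition in privileged_rollouts:
        buffer.add(transition)
    return buffer
```

After warm-starting, the visual policy keeps adding its own transitions and gradually overwrites the privileged ones.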
Single Policy, Multiple Terrains
SoloParkour trains a single policy on a curriculum of multiple terrain types:
- Crawl parkour: floating objects, robot must crawl underneath
- Step/hurdle parkour: obstacles to climb and jump over
- Leap parkour: gaps to jump across
- Difficulty increases progressively during training
Result: single policy can walk, climb, leap, and crawl — all from depth pixels, no switching between specialized policies.
Comparison of Three Methods
| Criterion | Extreme Parkour | Robot Parkour Learning | SoloParkour |
|---|---|---|---|
| Robot | Unitree A1 | Unitree A1 | Solo-12 |
| Vision | Depth camera | Depth camera | Depth camera |
| Framework | Teacher-student | End-to-end | Constrained RL |
| Depth encoding | Implicit | Explicit + Implicit | Implicit |
| Training | PPO + distillation | PPO + curriculum | Constrained RL + warm-start |
| Skills | Jump, climb, stairs | Climb, leap, crawl, squeeze | Walk, climb, leap, crawl |
| Reference motion | No | No | No |
| Conference | ICRA 2024 | CoRL 2023 | CoRL 2024 |
| Max obstacle height | 0.6m | 0.4m | 0.3m |
ANYmal Parkour — Industrial Scale
Beyond research on small robots, ETH Zurich has also demonstrated parkour capabilities on ANYmal, an industrial-grade 50kg quadruped. The paper Learning Agile Locomotion on Risky Terrains shows ANYmal-D reaching a peak velocity of 2.5 m/s on stepping stones and crossing narrow balance beams.
ETH's approach differs by formulating parkour as a navigation task rather than velocity tracking: the robot decides its own speed based on terrain difficulty. On easy terrain it runs fast; near a difficult obstacle it slows down and moves more carefully.
This is philosophically closer to how an experienced human parkour athlete thinks: not "run at 3 m/s" but "adapt speed to terrain complexity."
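One simple way to capture this "adapt speed to terrain" idea is a commanded speed that shrinks with a terrain-difficulty estimate. This is an illustrative sketch of the philosophy, not ETH's actual method (which lets the policy choose speed implicitly); v_max matches the 2.5 m/s peak above, while v_min and the linear schedule are assumptions:

```python
def target_speed(terrain_difficulty, v_max=2.5, v_min=0.3):
    """Command a speed that decreases as terrain difficulty (0..1) grows."""
    d = min(max(terrain_difficulty, 0.0), 1.0)  # clamp to [0, 1]
    return v_max - (v_max - v_min) * d
```

The point is the interface: the planner emits "how hard is the next stretch" and the controller converts that into a safe pace, rather than tracking a fixed velocity command.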
General Training Pipeline for Robot Parkour
Based on the three papers above, a common pipeline for robot parkour looks like this:
Step 1: Terrain Generation
Create diverse terrain in simulation (Isaac Gym or MuJoCo):
- Flat ground, slopes, stairs
- Random boxes, gaps, barriers
- Combination terrains (parkour courses)
- Parametric difficulty control
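A terrain generator with parametric difficulty can be sketched as follows; the terrain types mirror the list above, but the numeric ranges are illustrative assumptions, not values from the papers:

```python
import random

def sample_terrain(difficulty, rng=random):
    """Sample terrain parameters scaled by a difficulty knob in [0, 1]."""
    kind = rng.choice(["flat", "stairs", "gap", "box"])
    params = {"type": kind}
    if kind == "stairs":
        params["step_height"] = 0.05 + 0.15 * difficulty  # 5cm .. 20cm
    elif kind == "gap":
        params["gap_width"] = 0.10 + 0.70 * difficulty    # 10cm .. 80cm
    elif kind == "box":
        params["box_height"] = 0.10 + 0.50 * difficulty   # 10cm .. 60cm
    return params
```

The curriculum then only has to turn one knob (difficulty) per environment instead of managing each obstacle dimension separately.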
Step 2: Privileged Teacher
Train teacher policy with complete state information:
- Ground truth heightmap or terrain parameters
- Perfect robot state (no sensor noise)
- PPO with large batch size (4096+ environments)
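As a concrete reference point, the teacher stage can be summarized in a config like the one below; every value here is an illustrative placeholder, and the actual hyperparameters differ across the three papers:

```python
# Illustrative hyperparameters for the privileged teacher stage.
teacher_cfg = {
    "algo": "PPO",
    "num_envs": 4096,          # massively parallel simulation
    "obs": ["proprioception", "heightmap", "privileged_state"],
    "action": "joint_position_targets",
    "rollout_steps": 24,       # short on-policy rollouts per update
    "learning_rate": 3e-4,
}
```

The defining choices are the privileged observations (heightmap, exact state) and the environment count; the rest is standard PPO bookkeeping.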
Step 3: Visual Student (Distillation)
Transfer knowledge to vision-based student:
- Depth image input → CNN encoder → latent → policy
- Behavior cloning or DAgger from teacher
- Or constrained RL warm-start (SoloParkour approach)
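The DAgger variant of this step can be sketched as a data-collection loop: the student's own rollouts decide which states are visited, but the teacher supplies the action labels. A minimal sketch with placeholder state dictionaries:

```python
def dagger_collect(teacher, visited_states, dataset):
    """One DAgger aggregation pass.

    visited_states come from rolling out the current student, so the
    dataset covers the student's own mistakes; each state is labeled
    with the teacher's action computed from privileged observations.
    """
    for state in visited_states:
        depth_obs = state["depth"]            # what the student sees
        privileged_obs = state["privileged"]  # what only the teacher sees
        dataset.append((depth_obs, teacher(privileged_obs)))
    return dataset
```

Plain behavior cloning only covers states the teacher visits; this aggregation is what keeps the student from drifting into states it was never trained on.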
Step 4: Domain Randomization
Randomize for real-world robustness:
- Camera noise, latency, field of view
- Terrain friction, compliance
- Robot mass, COM position
- Actuator strength, delay
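A randomization pass over these quantities might look like the sketch below; the perturbation ranges are illustrative assumptions, not values taken from the papers:

```python
import random

def randomize_sim(base, rng=random):
    """Perturb nominal simulation parameters for one episode."""
    return {
        "friction":       base["friction"] * rng.uniform(0.5, 1.5),
        "mass":           base["mass"] * rng.uniform(0.9, 1.1),
        "motor_strength": base["motor_strength"] * rng.uniform(0.8, 1.2),
        "camera_latency": rng.uniform(0.0, 0.04),  # seconds of sensor delay
    }
```

Resampling these at every episode forces the policy to succeed across the whole distribution, so the real robot's particular friction, mass, and latency land inside what it has already seen.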
Step 5: Real-World Deployment
Deploy and iterate:
- Zero-shot transfer (preferred)
- Fine-tune if necessary (minimal)
- Monitor failure cases, add to training curriculum
Remaining Challenges
Despite impressive results, robot parkour still faces several open challenges:
- Long-horizon planning: Current policies are reactive over short horizons; real parkour requires planning many steps ahead (looking 5-10m forward)
- Recovery from failure: When the robot falls mid-course, it needs a recovery policy to stand up and continue
- Generalization: Policies are trained in simulation on specific terrains, but the real world has infinite variety
- Bipedal parkour: Most research is quadruped. Bipedal parkour is much harder (see Humanoid Parkour Learning)
Conclusion
Robot parkour is a rapidly evolving field in 2023-2024. From Extreme Parkour (teacher-student distillation) to Robot Parkour Learning (emergent skills from simple rewards) to SoloParkour (constrained RL), we see that RL + vision + terrain curriculum is the winning formula for agile locomotion.
The next frontier: bipedal parkour for humanoid robots. When a robot like Optimus or Digit can navigate the same obstacle courses as in these papers, that marks a truly significant milestone for robotics.
Read the other parts in this series for the full picture:
- Part 1: Locomotion Basics — ZMP, CPG, IK theory
- Part 2: RL for Locomotion — reward design, curriculum
- Part 3: Quadruped Hands-On — legged_gym, Unitree Go2
- Part 4: Walk These Ways — multi-gait learning
- Part 6: Bipedal Walking — 2-legged robot challenges, from Cassie to humanoid
Related Posts
- Bipedal Walking: Controlling 2-Legged Robots with RL — Bipedal locomotion challenges from Cassie to Digit
- Sim-to-Real for Locomotion: Reality and Experience — Domain randomization and actuator networks
- Simulation for Robotics: MuJoCo vs Isaac Sim vs Gazebo — Simulator comparison for training
- RL Basics for Robotics: From Markov to PPO — RL foundation
- Sim-to-Real Transfer: Train Simulation, Run Reality — Best practices for sim-to-real