Parkour — The Hardest Test for Robot Locomotion
In the world of robot locomotion, parkour is the most demanding comprehensive test. Unlike walking or running on flat ground, parkour requires robots to jump over gaps, climb tall obstacles, crawl under barriers, and balance on narrow surfaces, all in real time with perception and control tightly synchronized.
Why is parkour so difficult? Three main reasons:
- Diverse Skills in One Policy: The robot must know when to jump, when to climb, and when to crawl, and transition smoothly between these skills
- Vision-Based Decision Making: Proprioception alone isn't enough; the robot must "see" the terrain ahead to plan
- Precise Timing: A jump that is off by 5cm or late by 50ms can make the robot fall
In this post, I'll analyze 3 important works on robot parkour with reinforcement learning: Extreme Parkour, Robot Parkour Learning, and SoloParkour — from training architecture to real-world results.
Extreme Parkour (Cheng et al., ICRA 2024)
The paper Extreme Parkour with Legged Robots by Xuxin Cheng, Kexin Shi, Ananye Agarwal, and Deepak Pathak at Carnegie Mellon University is a major breakthrough. The headline results: the parkour policy trains in under 20 hours and deploys zero-shot to a Unitree A1 with a single front-facing depth camera.
Teacher-Student Framework
Core architecture uses teacher-student distillation:
Phase 1 — Teacher Policy (privileged information):
- The teacher has access to the ground-truth terrain heightmap around the robot
- It is trained with PPO in Isaac Gym across thousands of parallel environments
- The teacher learns all parkour skills: jumping onto 0.6m boxes, leaping over gaps, climbing stairs
- The reward combines forward velocity and a survival bonus, minus an energy penalty
Phase 2 — Student Policy (vision-only):
- The student receives only a depth image from the camera (no privileged information)
- Knowledge is distilled from the teacher: the student learns to reproduce the teacher's behavior from visual input alone
- A depth encoder (CNN) processes the depth image into a latent representation
- The student policy outputs joint position targets, just like the teacher
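The distillation objective can be sketched as a simple behavior-cloning loss. The functions below are illustrative stand-ins, not the paper's actual networks:

```python
def teacher_policy(privileged_obs):
    # Stand-in for the teacher: in the paper, an MLP that sees the
    # ground-truth heightmap plus proprioception.
    return [0.1 * x for x in privileged_obs]

def student_policy(depth_latent):
    # Stand-in for the student: in the paper, a CNN depth encoder
    # feeding a policy head that sees only depth + proprioception.
    return [0.2 * z for z in depth_latent]

def distillation_loss(student_actions, teacher_actions):
    # Behavior cloning: mean squared error between the action vectors.
    n = len(teacher_actions)
    return sum((s - t) ** 2 for s, t in zip(student_actions, teacher_actions)) / n
```

Minimizing this loss over states visited by the student pushes its vision-only behavior toward the teacher's privileged behavior.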
Terrain Curriculum
The key ingredient is an automatic terrain curriculum:
Level 1: Flat terrain → basic walking
Level 2: Small steps (10cm) → stepping
Level 3: Medium gaps (30cm) → jumping
Level 4: High boxes (40cm) → climbing
Level 5: Mixed obstacles → full parkour
Difficulty increases once the robot achieves >80% success at the current level. This progressive schedule keeps the policy from being overwhelmed by obstacles that are too hard at the start.
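The promotion rule above can be sketched in a few lines (the 80% threshold and five levels mirror the description here; exact values in the paper may differ):

```python
def update_level(level, success_rate, max_level=5, threshold=0.8):
    """Advance the terrain curriculum once the policy clears the current level."""
    if success_rate > threshold and level < max_level:
        return level + 1
    return level
```

In practice each parallel environment tracks its own level, so easy and hard terrains coexist in every training batch.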
Results on Unitree A1
The Unitree A1 is a low-cost (~15kg) quadruped fitted with a single Intel RealSense D435 camera. With one policy it can:
- Climb onto 0.6m boxes (2.3x robot height)
- Jump over 0.8m gaps
- Climb stairs continuously
- All in real-time, no fine-tuning required on real robot
Notably, the robot relies on a depth camera that is low-frequency, jittery, and artifact-prone, yet the single neural network policy still outputs highly precise control. This shows that large-scale RL in simulation can compensate for imprecise sensing.
Robot Parkour Learning (Zhuang et al., CoRL 2023)
The paper Robot Parkour Learning by Ziwen Zhuang, Zipeng Fu et al. (Stanford, CMU) addresses a different problem: learning diverse parkour skills in a single end-to-end policy, without reference motion data.
Explicit vs Implicit Depth Encoding
The main contribution compares two approaches to processing depth information:
Explicit depth encoder:
- Depth image → CNN → explicit heightmap prediction
- The robot knows the exact geometry of the obstacle ahead
- Pros: interpretable, easy to debug
- Cons: reconstruction error accumulates
Implicit depth encoder:
- Depth image → CNN → latent embedding (no heightmap reconstruction)
- The network decides which features to extract from the depth image
- Pros: can capture information that explicit method misses
- Cons: black-box, hard to debug
The paper concludes that implicit encoding works better for complex parkour tasks: the network can learn features relevant to each specific skill rather than being constrained by heightmap reconstruction.
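The structural difference between the two encoders can be sketched as two pipelines (function names here are illustrative placeholders, not the paper's code):

```python
def explicit_pipeline(depth_image, encoder, height_decoder, policy):
    # Explicit: reconstruct a heightmap first, then act on the geometry.
    latent = encoder(depth_image)
    heightmap = height_decoder(latent)  # reconstruction errors enter here
    return policy(heightmap)

def implicit_pipeline(depth_image, encoder, policy):
    # Implicit: act directly on the learned latent; no reconstruction step,
    # so the network is free to keep whatever features help the task.
    latent = encoder(depth_image)
    return policy(latent)
```

The explicit path forces everything through the heightmap bottleneck, which is exactly where its reconstruction error accumulates.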
Diverse Skills from Simple Reward
Instead of designing a separate reward for each skill (jump reward, climb reward, crawl reward), the paper uses a single, simple reward function:
reward = forward_velocity + alive_bonus - energy_penalty - contact_penalty
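This reward can be written directly as a function. The weights below are placeholder values, not the paper's tuned coefficients:

```python
def parkour_reward(forward_velocity, alive, energy, bad_contacts,
                   w_vel=1.0, w_alive=0.5, w_energy=0.001, w_contact=0.1):
    # Weighted sum of the four terms above; only forward progress and
    # survival are rewarded, effort and bad contacts are penalized.
    return (w_vel * forward_velocity
            + w_alive * alive
            - w_energy * energy
            - w_contact * bad_contacts)
```

Nothing in this function mentions jumping or crawling; the skill diversity comes entirely from the terrains the robot must cross to keep earning forward velocity.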
Diverse skills emerge naturally from the terrain curriculum:
- Climbing: 0.40m obstacles (1.53x robot height)
- Leaping: 0.60m gaps (1.5x robot length)
- Crawling: 0.20m barriers (0.76x robot height)
- Squeezing: 0.28m slits (narrower than robot width — robot must tilt sideways)
This is a beautiful example of emergent behavior — complex skills arising from simple objectives combined with diverse environments.
SoloParkour (Chane-Sane et al., CoRL 2024)
SoloParkour from LAAS-CNRS (France) introduces a new approach: constrained reinforcement learning for visual parkour, demonstrated on the Solo-12 robot.
Constrained RL Formulation
Instead of complex reward shaping, SoloParkour formulates parkour as a constrained optimization problem:
- Objective: Maximize agile locomotion skills
- Constraints: Stay within robot's physical limits (torque limits, joint limits, stability)
This approach has a major advantage: the robot is encouraged to attempt aggressive maneuvers (high jumps, fast running) while being constrained to stay within its physical limits, which reduces the risk of hardware damage during deployment.
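One standard way to solve such a constrained problem is a Lagrangian relaxation: penalize constraint violations with multipliers that are adapted by dual ascent. A minimal sketch of that general technique (not SoloParkour's exact algorithm):

```python
def lagrangian_objective(reward, violations, lambdas):
    # Maximize reward minus multiplier-weighted constraint violations
    # (a violation <= 0 means the constraint is satisfied).
    penalty = sum(l * max(0.0, v) for l, v in zip(lambdas, violations))
    return reward - penalty

def update_multipliers(lambdas, violations, lr=0.01):
    # Dual ascent: grow a multiplier while its constraint is violated,
    # shrink it toward zero once the constraint is satisfied.
    return [max(0.0, l + lr * v) for l, v in zip(lambdas, violations)]
```

As a multiplier grows, the policy is pushed harder to respect that limit; once it complies, the multiplier decays and stops suppressing aggressive behavior.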
Privileged Experience Warm-Start
The training pipeline has two phases:
Phase 1 — Privileged policy (no vision needed):
- Train a policy with privileged information (terrain heightmap, exact robot state)
- This policy achieves high performance because it has complete state information
Phase 2 — Visual policy (from depth images):
- Experience from the privileged policy is used to warm-start an off-policy RL algorithm
- Instead of training from scratch (expensive), the visual policy starts from good initial behaviors
- This is much more sample-efficient than on-policy methods like PPO
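Mechanically, the warm-start amounts to seeding the off-policy replay buffer with transitions collected by the privileged policy before visual training begins. A minimal sketch under that assumption:

```python
import random
from collections import deque

class ReplayBuffer:
    """Tiny FIFO replay buffer for an off-policy learner."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def warm_start(buffer, privileged_rollouts):
    # Seed the buffer with transitions from the privileged policy so the
    # visual policy's first gradient steps see competent behavior.
    for transition in privileged_rollouts:
        buffer.add(transition)
    return buffer
```

After warm-starting, the visual policy keeps adding its own transitions and gradually overwrites the privileged ones.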
Single Policy, Multiple Terrains
SoloParkour trains a single policy on a curriculum of multiple terrain types:
- Crawl parkour: floating objects, robot must crawl underneath
- Step/hurdle parkour: obstacles to climb and jump over
- Leap parkour: gaps to jump across
- Difficulty increases progressively during training
Result: single policy can walk, climb, leap, and crawl — all from depth pixels, no switching between specialized policies.
Comparison of Three Methods
| Criterion | Extreme Parkour | Robot Parkour Learning | SoloParkour |
|---|---|---|---|
| Robot | Unitree A1 | Unitree A1 | Solo-12 |
| Vision | Depth camera | Depth camera | Depth camera |
| Framework | Teacher-student | End-to-end | Constrained RL |
| Depth encoding | Implicit | Explicit + Implicit | Implicit |
| Training | PPO + distillation | PPO + curriculum | Constrained RL + warm-start |
| Skills | Jump, climb, stairs | Climb, leap, crawl, squeeze | Walk, climb, leap, crawl |
| Reference motion | No | No | No |
| Conference | ICRA 2024 | CoRL 2023 | CoRL 2024 |
| Max obstacle height | 0.6m | 0.4m | 0.3m |
ANYmal Parkour — Industrial Scale
Beyond research on small robots, ETH Zurich has also demonstrated parkour capabilities on ANYmal, an industrial-grade 50kg quadruped. The paper Learning Agile Locomotion on Risky Terrains shows ANYmal-D reaching a peak velocity of 2.5 m/s on stepping stones and crossing narrow balance beams.
ETH's approach differs by formulating parkour as a navigation task rather than velocity tracking: the robot decides its own speed based on terrain difficulty. On easy terrain it runs fast; near a difficult obstacle it slows down and moves more carefully.
This is philosophically closer to how an experienced human parkour athlete thinks: not "run at 3 m/s" but "adapt speed to terrain complexity."
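One simple way to capture this "adapt speed to terrain" idea is a commanded speed that shrinks with a terrain-difficulty estimate. This is an illustrative sketch of the philosophy, not ETH's actual method (which lets the policy choose speed implicitly); v_max matches the 2.5 m/s peak above, while v_min and the linear schedule are assumptions:

```python
def target_speed(terrain_difficulty, v_max=2.5, v_min=0.3):
    """Command a speed that decreases as terrain difficulty (0..1) grows."""
    d = min(max(terrain_difficulty, 0.0), 1.0)  # clamp to [0, 1]
    return v_max - (v_max - v_min) * d
```

The point is the interface: the planner emits "how hard is the next stretch" and the controller converts that into a safe pace, rather than tracking a fixed velocity command.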
General Training Pipeline for Robot Parkour
Based on the three papers above, a common pipeline for robot parkour looks like this:
Step 1: Terrain Generation
Create diverse terrain in simulation (Isaac Gym or MuJoCo):
- Flat ground, slopes, stairs
- Random boxes, gaps, barriers
- Combination terrains (parkour courses)
- Parametric difficulty control
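A terrain generator with parametric difficulty can be sketched as follows; the terrain types mirror the list above, but the numeric ranges are illustrative assumptions, not values from the papers:

```python
import random

def sample_terrain(difficulty, rng=random):
    """Sample terrain parameters scaled by a difficulty knob in [0, 1]."""
    kind = rng.choice(["flat", "stairs", "gap", "box"])
    params = {"type": kind}
    if kind == "stairs":
        params["step_height"] = 0.05 + 0.15 * difficulty  # 5cm .. 20cm
    elif kind == "gap":
        params["gap_width"] = 0.10 + 0.70 * difficulty    # 10cm .. 80cm
    elif kind == "box":
        params["box_height"] = 0.10 + 0.50 * difficulty   # 10cm .. 60cm
    return params
```

The curriculum then only has to turn one knob (difficulty) per environment instead of managing each obstacle dimension separately.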
Step 2: Privileged Teacher
Train teacher policy with complete state information:
- Ground truth heightmap or terrain parameters
- Perfect robot state (no sensor noise)
- PPO with large batch size (4096+ environments)
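As a concrete reference point, the teacher stage can be summarized in a config like the one below; every value here is an illustrative placeholder, and the actual hyperparameters differ across the three papers:

```python
# Illustrative hyperparameters for the privileged teacher stage.
teacher_cfg = {
    "algo": "PPO",
    "num_envs": 4096,          # massively parallel simulation
    "obs": ["proprioception", "heightmap", "privileged_state"],
    "action": "joint_position_targets",
    "rollout_steps": 24,       # short on-policy rollouts per update
    "learning_rate": 3e-4,
}
```

The defining choices are the privileged observations (heightmap, exact state) and the environment count; the rest is standard PPO bookkeeping.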
Step 3: Visual Student (Distillation)
Transfer knowledge to vision-based student:
- Depth image input → CNN encoder → latent → policy
- Behavior cloning or DAgger from teacher
- Or constrained RL warm-start (SoloParkour approach)
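The DAgger variant of this step can be sketched as a data-collection loop: the student's own rollouts decide which states are visited, but the teacher supplies the action labels. A minimal sketch with placeholder state dictionaries:

```python
def dagger_collect(teacher, visited_states, dataset):
    """One DAgger aggregation pass.

    visited_states come from rolling out the current student, so the
    dataset covers the student's own mistakes; each state is labeled
    with the teacher's action computed from privileged observations.
    """
    for state in visited_states:
        depth_obs = state["depth"]            # what the student sees
        privileged_obs = state["privileged"]  # what only the teacher sees
        dataset.append((depth_obs, teacher(privileged_obs)))
    return dataset
```

Plain behavior cloning only covers states the teacher visits; this aggregation is what keeps the student from drifting into states it was never trained on.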
Step 4: Domain Randomization
Randomize for real-world robustness:
- Camera noise, latency, field of view
- Terrain friction, compliance
- Robot mass, COM position
- Actuator strength, delay
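A randomization pass over these quantities might look like the sketch below; the perturbation ranges are illustrative assumptions, not values taken from the papers:

```python
import random

def randomize_sim(base, rng=random):
    """Perturb nominal simulation parameters for one episode."""
    return {
        "friction":       base["friction"] * rng.uniform(0.5, 1.5),
        "mass":           base["mass"] * rng.uniform(0.9, 1.1),
        "motor_strength": base["motor_strength"] * rng.uniform(0.8, 1.2),
        "camera_latency": rng.uniform(0.0, 0.04),  # seconds of sensor delay
    }
```

Resampling these at every episode forces the policy to succeed across the whole distribution, so the real robot's particular friction, mass, and latency land inside what it has already seen.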
Step 5: Real-World Deployment
Deploy and iterate:
- Zero-shot transfer (preferred)
- Fine-tune if necessary (minimal)
- Monitor failure cases, add to training curriculum
Remaining Challenges
Despite impressive results, robot parkour still faces several open challenges:
- Long-horizon planning: Current policies are reactive over short horizons; real parkour requires planning many steps ahead (looking 5-10m forward)
- Recovery from failure: When the robot falls mid-course, it needs a recovery policy to stand up and continue
- Generalization: Policies are trained in simulation on specific terrains, but the real world has infinite variety
- Bipedal parkour: Most research is quadruped. Bipedal parkour is much harder (see Humanoid Parkour Learning)
Conclusion
Robot parkour is a rapidly evolving field in 2023-2024. From Extreme Parkour (teacher-student distillation) to Robot Parkour Learning (emergent skills from simple rewards) to SoloParkour (constrained RL), we see that RL + vision + terrain curriculum is the winning formula for agile locomotion.
The next frontier: bipedal parkour for humanoid robots. When a robot like Optimus or Digit can navigate the same obstacle courses as in these papers, that marks a truly significant milestone for robotics.
Read the other parts in this series for the full picture:
- Part 1: Locomotion Basics — ZMP, CPG, IK theory
- Part 2: RL for Locomotion — reward design, curriculum
- Part 3: Quadruped Hands-On — legged_gym, Unitree Go2
- Part 4: Walk These Ways — multi-gait learning
- Part 6: Bipedal Walking — 2-legged robot challenges, from Cassie to humanoid
Related Posts
- Bipedal Walking: Controlling 2-Legged Robots with RL — Bipedal locomotion challenges from Cassie to Digit
- Sim-to-Real for Locomotion: Reality and Experience — Domain randomization and actuator networks
- Simulation for Robotics: MuJoCo vs Isaac Sim vs Gazebo — Simulator comparison for training
- RL Basics for Robotics: From Markov to PPO — RL foundation
- Sim-to-Real Transfer: Train Simulation, Run Reality — Best practices for sim-to-real