From Walking to Parkour: The Quantum Leap
In previous posts, we trained humanoids to walk on flat ground, traverse rough terrain, run fast, and carry objects. All of these are forms of continuous locomotion: periodic gaits in which foot contacts repeat in a predictable pattern.
Parkour breaks that pattern. The robot must jump over gaps, climb tall steps, and vault over obstacles, all motions that require a flight phase (no foot touching the ground) and impact management (absorbing large landing forces). This is the hardest frontier of humanoid locomotion.
Parkour Skills Taxonomy
| Skill | Description | Difficulty | Key Challenge |
|---|---|---|---|
| Gap crossing | Jump over gaps/crevasses | ★★★ | Timing, flight trajectory |
| Box jumping | Jump onto/off boxes | ★★★ | Impact absorption, height estimation |
| Stair climbing | Climb stairs up/down | ★★ | Step detection, balance |
| Vaulting | Arm-assisted obstacle clearing | ★★★★ | Contact switching, arm-leg coordination |
| Wall climbing | Grip walls and climb | ★★★★★ | Multi-contact, friction, strength |
| Running jump | Sprint + long jump | ★★★ | Velocity → takeoff transition |
| Landing | Safe impact absorption | ★★★ | Impact distribution, recovery |
Terrain Perception: Height Maps
Parkour requires seeing the terrain ahead to plan motions. There are two main approaches:
Approach 1: Privileged Learning (Teacher-Student)
class ParkourTeacherEnv:
    """Teacher environment with privileged terrain information."""

    def __init__(self, terrain, terrain_resolution=0.05, scan_range=2.0):
        self.terrain = terrain  # terrain object exposing get_height(x, y)
        self.resolution = terrain_resolution
        self.scan_range = scan_range
        self.n_points = int(scan_range / terrain_resolution)

    def get_privileged_obs(self, robot_pos, robot_heading):
        """Height scan ahead of the robot, relative to the robot's base height."""
        scan_points = []
        for i in range(self.n_points):
            dist = (i + 1) * self.resolution
            x = robot_pos[0] + dist * np.cos(robot_heading)
            y = robot_pos[1] + dist * np.sin(robot_heading)
            height = self.terrain.get_height(x, y) - robot_pos[2]
            scan_points.append(height)
        return np.array(scan_points)  # shape: (n_points,)
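A quick way to sanity-check the scan geometry is to run it against known terrain. The sketch below is a standalone function version of the same height scan (the flat-ground and step terrain functions are made up for the test): on flat ground every relative height is zero, and a 0.5 m ledge starting 1 m ahead shows up only in the second half of the scan.

```python
import numpy as np

def height_scan(terrain_height_fn, robot_pos, robot_heading,
                resolution=0.05, scan_range=2.0):
    """Sample terrain height at evenly spaced points ahead of the robot,
    expressed relative to the robot's own base height."""
    n_points = int(scan_range / resolution)
    scan = []
    for i in range(n_points):
        dist = (i + 1) * resolution
        x = robot_pos[0] + dist * np.cos(robot_heading)
        y = robot_pos[1] + dist * np.sin(robot_heading)
        scan.append(terrain_height_fn(x, y) - robot_pos[2])
    return np.array(scan)

# Flat ground: every relative height is zero.
flat = height_scan(lambda x, y: 0.0, np.array([0.0, 0.0, 0.0]), 0.0)

# A 0.5 m ledge starting 1 m ahead appears in the far half of the scan.
step = height_scan(lambda x, y: 0.5 if x > 1.0 else 0.0,
                   np.array([0.0, 0.0, 0.0]), 0.0)
```

With the default settings the scan covers 2 m at 5 cm spacing, i.e. 40 points.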
class ParkourStudentPolicy:
    """Student policy that acts from proprioception only, estimating the
    terrain from a history of observations."""

    def __init__(self, teacher_policy, student_policy,
                 obs_dim, n_terrain_points, obs_history_len=50):
        self.teacher = teacher_policy
        self.student_policy = student_policy
        self.history_len = obs_history_len
        self.obs_history = []
        # Student network: observation history → estimated terrain
        self.terrain_estimator = nn.Sequential(
            nn.Linear(obs_history_len * obs_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, n_terrain_points),
        )
        self.optimizer = torch.optim.Adam(
            list(self.terrain_estimator.parameters())
            + list(self.student_policy.parameters())
        )

    def distill(self, env, n_episodes=10000):
        """Distill teacher knowledge into the student (DAgger-style)."""
        for ep in range(n_episodes):
            obs = env.reset()
            # Zero-pad the history so the estimator input size is fixed
            self.obs_history = [np.zeros_like(obs)] * self.history_len
            done = False
            while not done:
                # Query the teacher with privileged observations
                teacher_obs = env.get_teacher_obs(obs)
                teacher_action = self.teacher.predict(teacher_obs)
                # Maintain a fixed-length observation history
                self.obs_history.append(obs)
                self.obs_history.pop(0)
                history_tensor = torch.tensor(
                    np.stack(self.obs_history), dtype=torch.float32
                ).flatten()
                # Terrain reconstruction loss
                estimated_terrain = self.terrain_estimator(history_tensor)
                true_terrain = env.get_privileged_obs(
                    env.robot_pos, env.robot_heading
                )
                terrain_loss = F.mse_loss(
                    estimated_terrain,
                    torch.tensor(true_terrain, dtype=torch.float32),
                )
                # Action imitation loss
                student_obs = torch.cat([
                    torch.tensor(obs, dtype=torch.float32), estimated_terrain
                ])
                student_action = self.student_policy(student_obs)
                action_loss = F.mse_loss(
                    student_action,
                    torch.tensor(teacher_action, dtype=torch.float32),
                )
                total_loss = terrain_loss + action_loss
                self.optimizer.zero_grad()
                total_loss.backward()
                self.optimizer.step()
                # Step the environment with the student's own action (DAgger)
                obs, done = env.step(student_action.detach().numpy())
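The bookkeeping for the observation history is easy to get wrong. A minimal sketch of just that piece, using `collections.deque(maxlen=...)` so the oldest entry is dropped automatically, and with assumed sizes (`OBS_DIM=48`, `HISTORY_LEN=50`, `N_TERRAIN_POINTS=40` are illustrative, not from the post):

```python
import torch
import torch.nn as nn
from collections import deque

OBS_DIM, HISTORY_LEN, N_TERRAIN_POINTS = 48, 50, 40  # assumed sizes

# deque(maxlen=...) evicts the oldest element automatically,
# replacing manual list + pop(0) bookkeeping.
history = deque(maxlen=HISTORY_LEN)

estimator = nn.Sequential(
    nn.Linear(HISTORY_LEN * OBS_DIM, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, N_TERRAIN_POINTS),
)

# Early in an episode the buffer is not yet full; pad with zeros
# so the flattened input always has the same length.
for _ in range(7):
    history.append(torch.randn(OBS_DIM))
padded = list(history) + [torch.zeros(OBS_DIM)] * (HISTORY_LEN - len(history))
terrain_estimate = estimator(torch.cat(padded))
```

Zero-padding at episode start is one of several reasonable choices; repeating the first observation works too.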
Approach 2: Depth Camera
class DepthBasedParkour(nn.Module):
    """Parkour policy front-end using a depth camera observation."""

    def __init__(self):
        super().__init__()
        self.depth_encoder = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2),   # (64, 64) → (30, 30)
            nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2),  # → (14, 14)
            nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2),  # → (6, 6)
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 6 * 6, 128),
            nn.ReLU(),
        )

    def process_depth(self, depth_image):
        """Normalize and encode a depth image into a 128-d latent vector."""
        depth_norm = (depth_image - depth_image.mean()) / (depth_image.std() + 1e-6)
        depth_tensor = torch.tensor(depth_norm).unsqueeze(0).unsqueeze(0).float()
        return self.depth_encoder(depth_tensor)
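To confirm the shape annotations on the conv stack, here is the same layer stack built standalone and pushed through a fake 64×64 depth image (the random input is a stand-in for real sensor data):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(  # same layer stack as above
    nn.Conv2d(1, 32, 5, stride=2),   # (64, 64) → (30, 30)
    nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2),  # → (14, 14)
    nn.ReLU(),
    nn.Conv2d(64, 64, 3, stride=2),  # → (6, 6)
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(64 * 6 * 6, 128),
    nn.ReLU(),
)

depth = torch.rand(64, 64)  # fake depth image standing in for the sensor
depth = (depth - depth.mean()) / (depth.std() + 1e-6)
latent = encoder(depth.unsqueeze(0).unsqueeze(0))  # add batch + channel dims
```

Each conv layer follows floor((H - k) / stride) + 1, which is where the (30, 30) → (14, 14) → (6, 6) annotations come from; the latent is (1, 128).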
Reward Design for Parkour
Multi-Phase Reward
class ParkourReward:
    """Multi-phase reward function for parkour tasks."""

    def __init__(self, config):
        self.config = config
        self.weights = {
            # Phase 1: Approach
            'approach_velocity': 2.0,
            'heading_alignment': 0.5,
            # Phase 2: Takeoff
            'takeoff_velocity': 3.0,
            'takeoff_angle': 1.0,
            'leg_extension': 1.0,
            # Phase 3: Flight
            'height_clearance': 5.0,
            'body_orientation': 2.0,
            # Phase 4: Landing
            'landing_stability': 5.0,
            'impact_reduction': 2.0,
            'recovery_speed': 1.0,
            # General
            'alive_bonus': 0.5,
            'energy_penalty': -0.01,
            'torque_penalty': -0.001,
        }

    def compute(self, state, prev_state):
        reward = 0.0
        if not state['fallen']:
            reward += self.weights['alive_bonus']
        # Forward velocity tracking
        vel_error = abs(state['forward_vel'] - self.config.target_vel)
        reward += self.weights['approach_velocity'] * np.exp(-vel_error)
        # Height clearance during obstacle traversal
        if state['over_obstacle']:
            clearance = state['foot_height'] - state['obstacle_height']
            if clearance > 0:
                reward += self.weights['height_clearance'] * min(clearance, 0.3)
        # Landing stability
        if state['just_landed']:
            impact = state['ground_reaction_force']
            max_safe_force = 3.0 * state['robot_mass'] * 9.81  # 3x bodyweight
            if impact < max_safe_force:
                reward += self.weights['landing_stability']
            else:
                reward -= self.weights['impact_reduction'] * (impact / max_safe_force - 1)
            if state['steps_to_stable'] < 10:
                reward += self.weights['recovery_speed']
        # Regularization (these weights are negative, so the terms subtract)
        reward += self.weights['energy_penalty'] * state['total_power']
        reward += self.weights['torque_penalty'] * np.sum(state['torques'] ** 2)
        return reward
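The landing terms are the most delicate part of this reward, so it is worth seeing how they trade off numerically. A minimal sketch of just that branch, with the weights above and an assumed 50 kg robot:

```python
W = {'landing_stability': 5.0, 'impact_reduction': 2.0}  # weights from above

def landing_reward(impact_force, robot_mass):
    """Landing term in isolation: flat bonus below 3x bodyweight,
    penalty that scales with how far the impact exceeds the threshold."""
    max_safe = 3.0 * robot_mass * 9.81
    if impact_force < max_safe:
        return W['landing_stability']
    return -W['impact_reduction'] * (impact_force / max_safe - 1)

soft = landing_reward(1000.0, 50.0)   # ~2x bodyweight: within the safe band
hard = landing_reward(2943.0, 50.0)   # 6x bodyweight: twice the safe limit
```

A soft landing collects the full +5.0 bonus, while an impact at twice the safe threshold costs -2.0; the asymmetry (flat bonus, proportional penalty) pushes the policy to bend its knees rather than land stiff-legged.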
Training Pipeline for Parkour
Obstacle Course Generation
class ParkourTerrainGenerator:
    """Generate random obstacle courses for training."""

    def __init__(self, num_envs=4096, terrain_size=20.0):
        self.num_envs = num_envs
        self.terrain_size = terrain_size  # 200x200 heightfield at 0.1 m/cell

    def generate_gap_terrain(self, difficulty):
        """Terrain with gaps to jump over; difficulty in [0, 1]."""
        heightfield = np.zeros((200, 200))
        gap_width = 0.3 + difficulty * 0.7  # 0.3 m → 1.0 m
        gap_depth = 0.5 + difficulty * 1.5  # 0.5 m → 2.0 m
        n_gaps = int(3 + difficulty * 5)
        for i in range(n_gaps):
            x_start = np.random.randint(30, 170)
            x_width = int(gap_width / 0.1)
            heightfield[x_start:x_start + x_width, :] = -gap_depth
        return heightfield

    def generate_stepping_stones(self, difficulty):
        """Stepping stones over a pit, with gaps that grow with difficulty."""
        heightfield = np.ones((200, 200)) * -2.0  # deep pit everywhere
        stone_size = max(0.4, 0.8 - difficulty * 0.4)
        gap = 0.3 + difficulty * 0.5
        x = 0.0
        while x < self.terrain_size:
            x_idx = int(x / 0.1)
            w_idx = int(stone_size / 0.1)
            y_offset = int(np.random.uniform(-0.3, 0.3) / 0.1)
            y_center = 100 + y_offset
            half = w_idx // 2  # half-extent so stones are square
            heightfield[x_idx:x_idx + w_idx,
                        y_center - half:y_center + half] = 0.0
            x += stone_size + gap
        return heightfield

    def generate_boxes(self, difficulty):
        """Boxes to jump on and off."""
        heightfield = np.zeros((200, 200))
        n_boxes = int(3 + difficulty * 5)
        for i in range(n_boxes):
            x = np.random.randint(20, 180)
            y = np.random.randint(60, 140)
            w = np.random.randint(5, 15)
            h = 0.2 + difficulty * 0.6  # 0.2 m → 0.8 m tall
            heightfield[x:x + w, y:y + w] = h
        return heightfield
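Curriculum training repeats this generation across thousands of parallel environments, so reproducibility matters. Below is a standalone sketch of the gap-terrain generator rewritten around a seeded `numpy.random.Generator` (the explicit `rng` argument is my addition, not part of the class above), showing how difficulty scales the gap depth:

```python
import numpy as np

def gap_terrain(difficulty, rng):
    """Gap terrain with an explicit seeded RNG so curricula are
    reproducible across workers."""
    heightfield = np.zeros((200, 200))      # 20 m x 20 m at 0.1 m/cell
    gap_width = 0.3 + difficulty * 0.7      # 0.3 m → 1.0 m
    gap_depth = 0.5 + difficulty * 1.5      # 0.5 m → 2.0 m
    for _ in range(int(3 + difficulty * 5)):
        x_start = rng.integers(30, 170)
        heightfield[x_start:x_start + int(gap_width / 0.1), :] = -gap_depth
    return heightfield

easy = gap_terrain(0.0, np.random.default_rng(0))  # 3 gaps, 0.5 m deep
hard = gap_terrain(1.0, np.random.default_rng(0))  # 8 gaps, 2.0 m deep
```

Passing the generator in, rather than using the global `np.random` state, lets each worker carry its own stream while a shared seed reproduces an exact course.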
Multi-Skill Composition
Unified vs Separate Policies
| Approach | Pros | Cons |
|---|---|---|
| Unified policy | Smooth transitions, end-to-end | Hard to train, slower convergence |
| Separate policies | Easier to train, modular | Transition jitter, skill selection needed |
| Mixture of Experts | Best of both — shared backbone, expert heads | Complex architecture |
Mixture of Experts Implementation
class ParkourMoE(nn.Module):
    """Mixture of Experts policy for parkour skills."""

    def __init__(self, obs_dim, action_dim, n_experts=4):
        super().__init__()
        self.n_experts = n_experts
        # Shared backbone
        self.backbone = nn.Sequential(
            nn.Linear(obs_dim, 512),
            nn.ELU(),
            nn.Linear(512, 256),
            nn.ELU(),
        )
        # Expert heads (e.g. walk, jump, climb, land)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(256, 128),
                nn.ELU(),
                nn.Linear(128, action_dim),
            ) for _ in range(n_experts)
        ])
        # Gating network: soft weights over the experts
        self.gate = nn.Sequential(
            nn.Linear(256, 64),
            nn.ELU(),
            nn.Linear(64, n_experts),
            nn.Softmax(dim=-1),
        )

    def forward(self, obs):
        features = self.backbone(obs)
        expert_outputs = torch.stack([
            expert(features) for expert in self.experts
        ], dim=1)                      # (B, n_experts, action_dim)
        weights = self.gate(features)  # (B, n_experts)
        action = torch.sum(
            expert_outputs * weights.unsqueeze(-1), dim=1
        )                              # (B, action_dim)
        return action, weights
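The shape bookkeeping in the forward pass is the error-prone part, so here is the same computation unrolled on a random batch with assumed dimensions (`obs_dim=48`, `action_dim=19`, batch size 8 are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
obs_dim, action_dim, n_experts, B = 48, 19, 4, 8  # assumed dims

backbone = nn.Sequential(nn.Linear(obs_dim, 512), nn.ELU(),
                         nn.Linear(512, 256), nn.ELU())
experts = nn.ModuleList([
    nn.Sequential(nn.Linear(256, 128), nn.ELU(), nn.Linear(128, action_dim))
    for _ in range(n_experts)
])
gate = nn.Sequential(nn.Linear(256, 64), nn.ELU(),
                     nn.Linear(64, n_experts), nn.Softmax(dim=-1))

feats = backbone(torch.randn(B, obs_dim))
expert_out = torch.stack([e(feats) for e in experts], dim=1)  # (B, E, A)
weights = gate(feats)                                         # (B, E)
# Broadcast the per-expert weight over the action dim, then sum experts out.
action = (expert_out * weights.unsqueeze(-1)).sum(dim=1)      # (B, A)
```

The softmax guarantees each row of `weights` sums to 1, so the blended action is a convex combination of the expert outputs; a hard skill switch would instead take the argmax expert.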
Current State-of-the-Art and Limitations
Achieved (2026)
- Gap crossing: jumping over 1 m+ gaps in simulation (Robot Parkour Learning, Zhuang et al., 2023)
- Stair climbing: smooth ascent and descent on real robots (Extreme Parkour, Cheng et al., 2024)
- Box jumping: jumping onto 0.5 m boxes in simulation (Humanoid-Gym, Gu et al., 2024)
- Running: humanoids running at over 3 m/s on flat terrain (Radosavovic et al., 2024)
Unsolved
- Vaulting: arm-assisted obstacle clearing; the contact switching and arm-leg coordination it requires remain beyond current RL pipelines
- Wall climbing: Requires grippers + full-body contact planning
- Fall recovery: Getting back up after falling — active research area
- Outdoor parkour: Unstructured environments, variable surfaces
References
- Robot Parkour Learning — Zhuang et al., CoRL 2023
- Extreme Parkour with Legged Robots — Cheng et al., ICRA 2024
- Agile and Versatile Robot Locomotion via Agility Training — Radosavovic et al., 2024
- Humanoid-Gym: Reinforcement Learning for Humanoid Robot — Gu et al., 2024