From Walking to Parkour: The Quantum Leap
In previous posts, we trained humanoids to walk on flat ground, traverse rough terrain, run fast, and carry objects. All of these are forms of continuous locomotion: periodic gaits in which foot contacts repeat in a predictable pattern.
Parkour breaks that pattern. The robot must jump over gaps, climb tall steps, and vault over obstacles, all motions that require a flight phase (no foot touching the ground) and impact management (absorbing large landing forces). This is the hardest frontier of humanoid locomotion.
Parkour Skills Taxonomy
| Skill | Description | Difficulty | Key Challenge |
|---|---|---|---|
| Gap crossing | Jump over gaps/crevasses | ★★★ | Timing, flight trajectory |
| Box jumping | Jump onto/off boxes | ★★★ | Impact absorption, height estimation |
| Stair climbing | Climb stairs up/down | ★★ | Step detection, balance |
| Vaulting | Arm-assisted obstacle clearing | ★★★★ | Contact switching, arm-leg coordination |
| Wall climbing | Grip walls and climb | ★★★★★ | Multi-contact, friction, strength |
| Running jump | Sprint + long jump | ★★★ | Velocity → takeoff transition |
| Landing | Safe impact absorption | ★★★ | Impact distribution, recovery |
Terrain Perception: Height Maps
Parkour requires seeing the terrain ahead to plan motions. There are two main approaches:
Approach 1: Privileged Learning (Teacher-Student)
class ParkourTeacherEnv:
    """Teacher environment with privileged terrain information."""

    def __init__(self, terrain, terrain_resolution=0.05, scan_range=2.0):
        self.terrain = terrain  # terrain object exposing get_height(x, y)
        self.resolution = terrain_resolution
        self.scan_range = scan_range
        self.n_points = int(scan_range / terrain_resolution)

    def get_privileged_obs(self, robot_pos, robot_heading):
        """Height scan ahead of the robot, relative to the robot's base height."""
        scan_points = []
        for i in range(self.n_points):
            dist = (i + 1) * self.resolution
            x = robot_pos[0] + dist * np.cos(robot_heading)
            y = robot_pos[1] + dist * np.sin(robot_heading)
            height = self.terrain.get_height(x, y) - robot_pos[2]
            scan_points.append(height)
        return np.array(scan_points)  # shape: (n_points,)
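A quick way to sanity-check the scan geometry is to run it against known terrain. The sketch below is a standalone function version of the same height scan (the flat-ground and step terrain functions are made up for the test): on flat ground every relative height is zero, and a 0.5 m ledge starting 1 m ahead shows up only in the second half of the scan.

```python
import numpy as np

def height_scan(terrain_height_fn, robot_pos, robot_heading,
                resolution=0.05, scan_range=2.0):
    """Sample terrain height at evenly spaced points ahead of the robot,
    expressed relative to the robot's own base height."""
    n_points = int(scan_range / resolution)
    scan = []
    for i in range(n_points):
        dist = (i + 1) * resolution
        x = robot_pos[0] + dist * np.cos(robot_heading)
        y = robot_pos[1] + dist * np.sin(robot_heading)
        scan.append(terrain_height_fn(x, y) - robot_pos[2])
    return np.array(scan)

# Flat ground: every relative height is zero.
flat = height_scan(lambda x, y: 0.0, np.array([0.0, 0.0, 0.0]), 0.0)

# A 0.5 m ledge starting 1 m ahead appears in the far half of the scan.
step = height_scan(lambda x, y: 0.5 if x > 1.0 else 0.0,
                   np.array([0.0, 0.0, 0.0]), 0.0)
```

With the default settings the scan covers 2 m at 5 cm spacing, i.e. 40 points.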
class ParkourStudentPolicy:
    """Student policy that acts from proprioception only, estimating the
    terrain from a history of observations."""

    def __init__(self, teacher_policy, student_policy,
                 obs_dim, n_terrain_points, obs_history_len=50):
        self.teacher = teacher_policy
        self.student_policy = student_policy
        self.history_len = obs_history_len
        self.obs_history = []
        # Student network: observation history → estimated terrain
        self.terrain_estimator = nn.Sequential(
            nn.Linear(obs_history_len * obs_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, n_terrain_points),
        )
        self.optimizer = torch.optim.Adam(
            list(self.terrain_estimator.parameters())
            + list(self.student_policy.parameters())
        )

    def distill(self, env, n_episodes=10000):
        """Distill teacher knowledge into the student (DAgger-style)."""
        for ep in range(n_episodes):
            obs = env.reset()
            # Zero-pad the history so the estimator input size is fixed
            self.obs_history = [np.zeros_like(obs)] * self.history_len
            done = False
            while not done:
                # Query the teacher with privileged observations
                teacher_obs = env.get_teacher_obs(obs)
                teacher_action = self.teacher.predict(teacher_obs)
                # Maintain a fixed-length observation history
                self.obs_history.append(obs)
                self.obs_history.pop(0)
                history_tensor = torch.tensor(
                    np.stack(self.obs_history), dtype=torch.float32
                ).flatten()
                # Terrain reconstruction loss
                estimated_terrain = self.terrain_estimator(history_tensor)
                true_terrain = env.get_privileged_obs(
                    env.robot_pos, env.robot_heading
                )
                terrain_loss = F.mse_loss(
                    estimated_terrain,
                    torch.tensor(true_terrain, dtype=torch.float32),
                )
                # Action imitation loss
                student_obs = torch.cat([
                    torch.tensor(obs, dtype=torch.float32), estimated_terrain
                ])
                student_action = self.student_policy(student_obs)
                action_loss = F.mse_loss(
                    student_action,
                    torch.tensor(teacher_action, dtype=torch.float32),
                )
                total_loss = terrain_loss + action_loss
                self.optimizer.zero_grad()
                total_loss.backward()
                self.optimizer.step()
                # Step the environment with the student's own action (DAgger)
                obs, done = env.step(student_action.detach().numpy())
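The bookkeeping for the observation history is easy to get wrong. A minimal sketch of just that piece, using `collections.deque(maxlen=...)` so the oldest entry is dropped automatically, and with assumed sizes (`OBS_DIM=48`, `HISTORY_LEN=50`, `N_TERRAIN_POINTS=40` are illustrative, not from the post):

```python
import torch
import torch.nn as nn
from collections import deque

OBS_DIM, HISTORY_LEN, N_TERRAIN_POINTS = 48, 50, 40  # assumed sizes

# deque(maxlen=...) evicts the oldest element automatically,
# replacing manual list + pop(0) bookkeeping.
history = deque(maxlen=HISTORY_LEN)

estimator = nn.Sequential(
    nn.Linear(HISTORY_LEN * OBS_DIM, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, N_TERRAIN_POINTS),
)

# Early in an episode the buffer is not yet full; pad with zeros
# so the flattened input always has the same length.
for _ in range(7):
    history.append(torch.randn(OBS_DIM))
padded = list(history) + [torch.zeros(OBS_DIM)] * (HISTORY_LEN - len(history))
terrain_estimate = estimator(torch.cat(padded))
```

Zero-padding at episode start is one of several reasonable choices; repeating the first observation works too.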
Approach 2: Depth Camera
class DepthBasedParkour(nn.Module):
    """Parkour policy front-end using a depth camera observation."""

    def __init__(self):
        super().__init__()
        self.depth_encoder = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2),   # (64, 64) → (30, 30)
            nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2),  # → (14, 14)
            nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2),  # → (6, 6)
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 6 * 6, 128),
            nn.ReLU(),
        )

    def process_depth(self, depth_image):
        """Normalize and encode a depth image into a 128-d latent vector."""
        depth_norm = (depth_image - depth_image.mean()) / (depth_image.std() + 1e-6)
        depth_tensor = torch.tensor(depth_norm).unsqueeze(0).unsqueeze(0).float()
        return self.depth_encoder(depth_tensor)
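To confirm the shape annotations on the conv stack, here is the same layer stack built standalone and pushed through a fake 64×64 depth image (the random input is a stand-in for real sensor data):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(  # same layer stack as above
    nn.Conv2d(1, 32, 5, stride=2),   # (64, 64) → (30, 30)
    nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2),  # → (14, 14)
    nn.ReLU(),
    nn.Conv2d(64, 64, 3, stride=2),  # → (6, 6)
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(64 * 6 * 6, 128),
    nn.ReLU(),
)

depth = torch.rand(64, 64)  # fake depth image standing in for the sensor
depth = (depth - depth.mean()) / (depth.std() + 1e-6)
latent = encoder(depth.unsqueeze(0).unsqueeze(0))  # add batch + channel dims
```

Each conv layer follows floor((H - k) / stride) + 1, which is where the (30, 30) → (14, 14) → (6, 6) annotations come from; the latent is (1, 128).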
Reward Design for Parkour
Multi-Phase Reward
class ParkourReward:
    """Multi-phase reward function for parkour tasks."""

    def __init__(self, config):
        self.config = config
        self.weights = {
            # Phase 1: Approach
            'approach_velocity': 2.0,
            'heading_alignment': 0.5,
            # Phase 2: Takeoff
            'takeoff_velocity': 3.0,
            'takeoff_angle': 1.0,
            'leg_extension': 1.0,
            # Phase 3: Flight
            'height_clearance': 5.0,
            'body_orientation': 2.0,
            # Phase 4: Landing
            'landing_stability': 5.0,
            'impact_reduction': 2.0,
            'recovery_speed': 1.0,
            # General
            'alive_bonus': 0.5,
            'energy_penalty': -0.01,
            'torque_penalty': -0.001,
        }

    def compute(self, state, prev_state):
        reward = 0.0
        if not state['fallen']:
            reward += self.weights['alive_bonus']
        # Forward velocity tracking
        vel_error = abs(state['forward_vel'] - self.config.target_vel)
        reward += self.weights['approach_velocity'] * np.exp(-vel_error)
        # Height clearance during obstacle traversal
        if state['over_obstacle']:
            clearance = state['foot_height'] - state['obstacle_height']
            if clearance > 0:
                reward += self.weights['height_clearance'] * min(clearance, 0.3)
        # Landing stability
        if state['just_landed']:
            impact = state['ground_reaction_force']
            max_safe_force = 3.0 * state['robot_mass'] * 9.81  # 3x bodyweight
            if impact < max_safe_force:
                reward += self.weights['landing_stability']
            else:
                reward -= self.weights['impact_reduction'] * (impact / max_safe_force - 1)
            if state['steps_to_stable'] < 10:
                reward += self.weights['recovery_speed']
        # Regularization (these weights are negative, so the terms subtract)
        reward += self.weights['energy_penalty'] * state['total_power']
        reward += self.weights['torque_penalty'] * np.sum(state['torques'] ** 2)
        return reward
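The landing terms are the most delicate part of this reward, so it is worth seeing how they trade off numerically. A minimal sketch of just that branch, with the weights above and an assumed 50 kg robot:

```python
W = {'landing_stability': 5.0, 'impact_reduction': 2.0}  # weights from above

def landing_reward(impact_force, robot_mass):
    """Landing term in isolation: flat bonus below 3x bodyweight,
    penalty that scales with how far the impact exceeds the threshold."""
    max_safe = 3.0 * robot_mass * 9.81
    if impact_force < max_safe:
        return W['landing_stability']
    return -W['impact_reduction'] * (impact_force / max_safe - 1)

soft = landing_reward(1000.0, 50.0)   # ~2x bodyweight: within the safe band
hard = landing_reward(2943.0, 50.0)   # 6x bodyweight: twice the safe limit
```

A soft landing collects the full +5.0 bonus, while an impact at twice the safe threshold costs -2.0; the asymmetry (flat bonus, proportional penalty) pushes the policy to bend its knees rather than land stiff-legged.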
Training Pipeline for Parkour
Obstacle Course Generation
class ParkourTerrainGenerator:
    """Generate random obstacle courses for training."""

    def __init__(self, num_envs=4096, terrain_size=20.0):
        self.num_envs = num_envs
        self.terrain_size = terrain_size  # 200x200 heightfield at 0.1 m/cell

    def generate_gap_terrain(self, difficulty):
        """Terrain with gaps to jump over; difficulty in [0, 1]."""
        heightfield = np.zeros((200, 200))
        gap_width = 0.3 + difficulty * 0.7  # 0.3 m → 1.0 m
        gap_depth = 0.5 + difficulty * 1.5  # 0.5 m → 2.0 m
        n_gaps = int(3 + difficulty * 5)
        for i in range(n_gaps):
            x_start = np.random.randint(30, 170)
            x_width = int(gap_width / 0.1)
            heightfield[x_start:x_start + x_width, :] = -gap_depth
        return heightfield

    def generate_stepping_stones(self, difficulty):
        """Stepping stones over a pit, with gaps that grow with difficulty."""
        heightfield = np.ones((200, 200)) * -2.0  # deep pit everywhere
        stone_size = max(0.4, 0.8 - difficulty * 0.4)
        gap = 0.3 + difficulty * 0.5
        x = 0.0
        while x < self.terrain_size:
            x_idx = int(x / 0.1)
            w_idx = int(stone_size / 0.1)
            y_offset = int(np.random.uniform(-0.3, 0.3) / 0.1)
            y_center = 100 + y_offset
            half = w_idx // 2  # half-extent so stones are square
            heightfield[x_idx:x_idx + w_idx,
                        y_center - half:y_center + half] = 0.0
            x += stone_size + gap
        return heightfield

    def generate_boxes(self, difficulty):
        """Boxes to jump on and off."""
        heightfield = np.zeros((200, 200))
        n_boxes = int(3 + difficulty * 5)
        for i in range(n_boxes):
            x = np.random.randint(20, 180)
            y = np.random.randint(60, 140)
            w = np.random.randint(5, 15)
            h = 0.2 + difficulty * 0.6  # 0.2 m → 0.8 m tall
            heightfield[x:x + w, y:y + w] = h
        return heightfield
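Curriculum training repeats this generation across thousands of parallel environments, so reproducibility matters. Below is a standalone sketch of the gap-terrain generator rewritten around a seeded `numpy.random.Generator` (the explicit `rng` argument is my addition, not part of the class above), showing how difficulty scales the gap depth:

```python
import numpy as np

def gap_terrain(difficulty, rng):
    """Gap terrain with an explicit seeded RNG so curricula are
    reproducible across workers."""
    heightfield = np.zeros((200, 200))      # 20 m x 20 m at 0.1 m/cell
    gap_width = 0.3 + difficulty * 0.7      # 0.3 m → 1.0 m
    gap_depth = 0.5 + difficulty * 1.5      # 0.5 m → 2.0 m
    for _ in range(int(3 + difficulty * 5)):
        x_start = rng.integers(30, 170)
        heightfield[x_start:x_start + int(gap_width / 0.1), :] = -gap_depth
    return heightfield

easy = gap_terrain(0.0, np.random.default_rng(0))  # 3 gaps, 0.5 m deep
hard = gap_terrain(1.0, np.random.default_rng(0))  # 8 gaps, 2.0 m deep
```

Passing the generator in, rather than using the global `np.random` state, lets each worker carry its own stream while a shared seed reproduces an exact course.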
Multi-Skill Composition
Unified vs Separate Policies
| Approach | Pros | Cons |
|---|---|---|
| Unified policy | Smooth transitions, end-to-end | Hard to train, slower convergence |
| Separate policies | Easier to train, modular | Transition jitter, skill selection needed |
| Mixture of Experts | Best of both — shared backbone, expert heads | Complex architecture |
Mixture of Experts Implementation
class ParkourMoE(nn.Module):
    """Mixture of Experts policy for parkour skills."""

    def __init__(self, obs_dim, action_dim, n_experts=4):
        super().__init__()
        self.n_experts = n_experts
        # Shared backbone
        self.backbone = nn.Sequential(
            nn.Linear(obs_dim, 512),
            nn.ELU(),
            nn.Linear(512, 256),
            nn.ELU(),
        )
        # Expert heads (e.g. walk, jump, climb, land)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(256, 128),
                nn.ELU(),
                nn.Linear(128, action_dim),
            ) for _ in range(n_experts)
        ])
        # Gating network: soft weights over the experts
        self.gate = nn.Sequential(
            nn.Linear(256, 64),
            nn.ELU(),
            nn.Linear(64, n_experts),
            nn.Softmax(dim=-1),
        )

    def forward(self, obs):
        features = self.backbone(obs)
        expert_outputs = torch.stack([
            expert(features) for expert in self.experts
        ], dim=1)                      # (B, n_experts, action_dim)
        weights = self.gate(features)  # (B, n_experts)
        action = torch.sum(
            expert_outputs * weights.unsqueeze(-1), dim=1
        )                              # (B, action_dim)
        return action, weights
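The shape bookkeeping in the forward pass is the error-prone part, so here is the same computation unrolled on a random batch with assumed dimensions (`obs_dim=48`, `action_dim=19`, batch size 8 are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
obs_dim, action_dim, n_experts, B = 48, 19, 4, 8  # assumed dims

backbone = nn.Sequential(nn.Linear(obs_dim, 512), nn.ELU(),
                         nn.Linear(512, 256), nn.ELU())
experts = nn.ModuleList([
    nn.Sequential(nn.Linear(256, 128), nn.ELU(), nn.Linear(128, action_dim))
    for _ in range(n_experts)
])
gate = nn.Sequential(nn.Linear(256, 64), nn.ELU(),
                     nn.Linear(64, n_experts), nn.Softmax(dim=-1))

feats = backbone(torch.randn(B, obs_dim))
expert_out = torch.stack([e(feats) for e in experts], dim=1)  # (B, E, A)
weights = gate(feats)                                         # (B, E)
# Broadcast the per-expert weight over the action dim, then sum experts out.
action = (expert_out * weights.unsqueeze(-1)).sum(dim=1)      # (B, A)
```

The softmax guarantees each row of `weights` sums to 1, so the blended action is a convex combination of the expert outputs; a hard skill switch would instead take the argmax expert.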
Current State-of-the-Art and Limitations
Achieved (2026)
- Gap crossing: jumping over 1 m+ gaps in simulation (Robot Parkour Learning, Zhuang et al., 2023)
- Stair climbing: smooth ascent and descent on real robots (Extreme Parkour, Cheng et al., 2024)
- Box jumping: jumping onto 0.5 m boxes in simulation (Humanoid-Gym, Gu et al., 2024)
- Running: humanoids running at over 3 m/s on flat terrain (Radosavovic et al., 2024)
Unsolved
- Vaulting: arm-assisted obstacle clearing; the contact switching and arm-leg coordination it requires remain beyond current RL pipelines
- Wall climbing: Requires grippers + full-body contact planning
- Fall recovery: Getting back up after falling — active research area
- Outdoor parkour: Unstructured environments, variable surfaces
References
- Robot Parkour Learning — Zhuang et al., CoRL 2023
- Extreme Parkour with Legged Robots — Cheng et al., ICRA 2024
- Agile and Versatile Robot Locomotion via Agility Training — Radosavovic et al., 2024
- Humanoid-Gym: Reinforcement Learning for Humanoid Robot — Gu et al., 2024