
Humanoid Parkour: Jumping, Climbing & Obstacle Courses

Parkour for humanoid robots — jumping obstacles, climbing stairs, navigating rough terrain with RL and terrain perception.

Nguyễn Anh Tuấn · April 6, 2026 · 7 min read

From Walking to Parkour: The Quantum Leap

In previous posts, we trained humanoids to walk on flat ground, traverse rough terrain, run fast, and carry objects. But all of these involved continuous locomotion — feet always contacting the ground in consistent patterns.

Parkour is a quantum leap: the robot must jump over gaps, climb tall steps, and vault over obstacles — motions requiring a flight phase (no feet touching the ground) and impact management (absorbing landing forces). This is the hardest frontier of humanoid locomotion.

Humanoid robot navigating obstacles

Parkour Skills Taxonomy

| Skill | Description | Difficulty | Key Challenge |
|---|---|---|---|
| Gap crossing | Jump over gaps/crevasses | ★★★ | Timing, flight trajectory |
| Box jumping | Jump onto/off boxes | ★★★ | Impact absorption, height estimation |
| Stair climbing | Climb stairs up/down | ★★ | Step detection, balance |
| Vaulting | Arm-assisted obstacle clearing | ★★★★ | Contact switching, arm-leg coordination |
| Wall climbing | Grip walls and climb | ★★★★★ | Multi-contact, friction, strength |
| Running jump | Sprint + long jump | ★★★ | Velocity → takeoff transition |
| Landing | Safe impact absorption | ★★★ | Impact distribution, recovery |
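Gap crossing is ultimately a ballistics problem: given a takeoff angle, the flight range fixes the minimum takeoff speed. A small sketch (idealized point-mass flight, flat takeoff and landing heights; `min_takeoff_speed` is a hypothetical helper, not from any library):

```python
import math

def min_takeoff_speed(gap_width, takeoff_angle_deg, g=9.81):
    """Minimum takeoff speed (m/s) to clear a gap of the given width,
    assuming ballistic flight with equal takeoff/landing heights:
    range = v^2 * sin(2*theta) / g."""
    theta = math.radians(takeoff_angle_deg)
    return math.sqrt(gap_width * g / math.sin(2 * theta))

# A 0.8 m gap at a 45-degree takeoff needs roughly 2.8 m/s.
v = min_takeoff_speed(0.8, 45.0)
```

This also explains the "Velocity → takeoff transition" challenge in the table: wider gaps require the approach phase to deliver more horizontal speed before the takeoff phase converts it into a flight trajectory.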

Terrain Perception: Height Maps

Parkour requires seeing the terrain ahead to plan motions. Two main approaches:

Approach 1: Privileged Learning (Teacher-Student)

import numpy as np

class ParkourTeacherEnv:
    """Teacher environment with privileged terrain information."""
    
    def __init__(self, terrain, terrain_resolution=0.05, scan_range=2.0):
        self.terrain = terrain  # terrain model exposing get_height(x, y)
        self.resolution = terrain_resolution
        self.scan_range = scan_range
        self.n_points = int(scan_range / terrain_resolution)
        
    def get_privileged_obs(self, robot_pos, robot_heading):
        """Height scan ahead of the robot, relative to base height."""
        scan_points = []
        
        for i in range(self.n_points):
            dist = (i + 1) * self.resolution
            x = robot_pos[0] + dist * np.cos(robot_heading)
            y = robot_pos[1] + dist * np.sin(robot_heading)
            
            height = self.terrain.get_height(x, y) - robot_pos[2]
            scan_points.append(height)
        
        return np.array(scan_points)  # shape: (n_points,)
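To make the height scan concrete, here is a standalone numpy sketch of the same sampling logic, with a toy terrain function standing in for the simulator's terrain model (both `height_scan` and `toy_terrain` are illustrative names, not part of any framework):

```python
import numpy as np

def height_scan(robot_pos, robot_heading, get_height,
                resolution=0.05, scan_range=2.0):
    """Sample terrain height relative to the robot base along the
    heading direction, mirroring get_privileged_obs above."""
    n_points = int(scan_range / resolution)
    dists = (np.arange(n_points) + 1) * resolution
    xs = robot_pos[0] + dists * np.cos(robot_heading)
    ys = robot_pos[1] + dists * np.sin(robot_heading)
    return np.array([get_height(x, y) for x, y in zip(xs, ys)]) - robot_pos[2]

# Toy terrain: flat ground with a 0.5 m-deep gap between x = 1.0 and x = 1.4.
def toy_terrain(x, y):
    return -0.5 if 1.0 <= x <= 1.4 else 0.0

scan = height_scan(np.array([0.0, 0.0, 0.0]), 0.0, toy_terrain)
# 40 samples; the entries over the gap read -0.5, the rest 0.0.
```

The resulting 40-dimensional vector is exactly the privileged observation the teacher sees; the student must learn to reconstruct it from proprioception alone.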


import torch
import torch.nn as nn
import torch.nn.functional as F

class ParkourStudentPolicy:
    """Student policy learns to imitate the teacher from proprioception only."""
    
    def __init__(self, teacher_policy, student_policy, obs_dim,
                 n_terrain_points, obs_history_len=50, lr=3e-4):
        self.teacher = teacher_policy
        self.student_policy = student_policy
        self.history_len = obs_history_len
        self.obs_history = []
        
        # Student network: observation history → estimated terrain scan
        self.terrain_estimator = nn.Sequential(
            nn.Linear(obs_history_len * obs_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, n_terrain_points),
        )
        self.optimizer = torch.optim.Adam(
            list(self.terrain_estimator.parameters())
            + list(self.student_policy.parameters()), lr=lr,
        )
    
    def distill(self, env, n_episodes=10000):
        """Distill teacher knowledge into the student (DAgger-style)."""
        for ep in range(n_episodes):
            obs = env.reset()
            self.obs_history = [obs] * self.history_len  # pad with initial obs
            done = False
            
            while not done:
                teacher_obs = env.get_teacher_obs(obs)
                teacher_action = self.teacher.predict(teacher_obs)
                
                history_tensor = torch.tensor(
                    np.stack(self.obs_history), dtype=torch.float32
                ).flatten()
                estimated_terrain = self.terrain_estimator(history_tensor)
                
                # Supervise the terrain estimate with the privileged scan
                true_terrain = env.get_privileged_obs(
                    env.robot_pos, env.robot_heading
                )
                terrain_loss = F.mse_loss(
                    estimated_terrain,
                    torch.tensor(true_terrain, dtype=torch.float32),
                )
                
                # Supervise the action with the teacher's action
                student_obs = torch.cat([
                    torch.tensor(obs, dtype=torch.float32), estimated_terrain
                ])
                student_action = self.student_policy(student_obs)
                action_loss = F.mse_loss(
                    student_action,
                    torch.tensor(teacher_action, dtype=torch.float32),
                )
                
                self.optimizer.zero_grad()
                (terrain_loss + action_loss).backward()
                self.optimizer.step()
                
                # Step the environment on the student's action, update history
                obs, _, done, _ = env.step(student_action.detach().numpy())
                self.obs_history.pop(0)
                self.obs_history.append(obs)

Approach 2: Depth Camera

import torch
import torch.nn as nn

class DepthBasedParkour(nn.Module):
    """Parkour policy front-end with a depth-camera observation."""
    
    def __init__(self):
        super().__init__()
        self.depth_encoder = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2),   # (64, 64) → (30, 30)
            nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2),  # → (14, 14)
            nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2),  # → (6, 6)
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 6 * 6, 128),
            nn.ReLU(),
        )
        
    def process_depth(self, depth_image):
        """Normalize and encode a depth image into a 128-d latent vector."""
        depth_norm = (depth_image - depth_image.mean()) / (depth_image.std() + 1e-6)
        depth_tensor = torch.tensor(depth_norm).unsqueeze(0).unsqueeze(0).float()
        return self.depth_encoder(depth_tensor)  # shape: (1, 128)
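The shape comments in the encoder can be verified with the standard valid-convolution size formula, out = floor((in + 2·padding − kernel) / stride) + 1 (PyTorch's `nn.Conv2d` convention):

```python
def conv_out(size, kernel, stride, padding=0):
    """Output spatial size of a Conv2d layer (PyTorch convention)."""
    return (size + 2 * padding - kernel) // stride + 1

# Trace the 64x64 depth image through the three conv layers above.
s = 64
for kernel, stride in [(5, 2), (3, 2), (3, 2)]:
    s = conv_out(s, kernel, stride)
# s is now 6, so the flattened feature size is 64 * 6 * 6 = 2304,
# matching the nn.Linear(64 * 6 * 6, 128) input dimension.
```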

Reward Design for Parkour

Multi-Phase Reward

import numpy as np

class ParkourReward:
    """Reward function for parkour tasks (compute() shows a subset of the weighted terms)."""
    
    def __init__(self, config):
        self.config = config
        self.weights = {
            # Phase 1: Approach
            'approach_velocity': 2.0,
            'heading_alignment': 0.5,
            
            # Phase 2: Takeoff
            'takeoff_velocity': 3.0,
            'takeoff_angle': 1.0,
            'leg_extension': 1.0,
            
            # Phase 3: Flight
            'height_clearance': 5.0,
            'body_orientation': 2.0,
            
            # Phase 4: Landing
            'landing_stability': 5.0,
            'impact_reduction': 2.0,
            'recovery_speed': 1.0,
            
            # General
            'alive_bonus': 0.5,
            'energy_penalty': -0.01,
            'torque_penalty': -0.001,
        }
    
    def compute(self, state, prev_state):
        reward = 0.0
        
        if not state['fallen']:
            reward += self.weights['alive_bonus']
        
        # Forward velocity tracking
        vel_error = abs(state['forward_vel'] - self.config.target_vel)
        reward += self.weights['approach_velocity'] * np.exp(-vel_error)
        
        # Height clearance (obstacle traversal)
        if state['over_obstacle']:
            clearance = state['foot_height'] - state['obstacle_height']
            if clearance > 0:
                reward += self.weights['height_clearance'] * min(clearance, 0.3)
        
        # Landing stability
        if state['just_landed']:
            impact = state['ground_reaction_force']
            max_safe_force = 3.0 * state['robot_mass'] * 9.81
            if impact < max_safe_force:
                reward += self.weights['landing_stability']
            else:
                reward -= self.weights['impact_reduction'] * (impact / max_safe_force - 1)
            
            if state['steps_to_stable'] < 10:
                reward += self.weights['recovery_speed']
        
        reward += self.weights['energy_penalty'] * state['total_power']
        reward += self.weights['torque_penalty'] * np.sum(state['torques'] ** 2)
        
        return reward
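The reward above assumes the controller knows which phase it is in. One simple way to detect phases is from foot-contact flags and base vertical velocity; a hypothetical heuristic sketch (`parkour_phase` and its threshold are illustrative, and real systems typically filter contacts over several timesteps):

```python
def parkour_phase(left_contact, right_contact, vertical_vel,
                  was_airborne, takeoff_vel_threshold=0.5):
    """Classify the current parkour phase from foot contacts and
    base vertical velocity (illustrative thresholds, tune per robot)."""
    airborne = not (left_contact or right_contact)
    if airborne:
        return 'flight'
    if was_airborne:
        return 'landing'   # first contact after a flight phase
    if vertical_vel > takeoff_vel_threshold:
        return 'takeoff'   # pushing off, base accelerating upward
    return 'approach'

# Both feet down at low vertical velocity → 'approach';
# no contact → 'flight'; first touch after flight → 'landing'.
```

The detected phase then selects which weight group in `ParkourReward` is active at each timestep.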

Training Pipeline for Parkour

Obstacle Course Generation

import numpy as np

class ParkourTerrainGenerator:
    """Generate random obstacle courses for training.

    The 200x200 heightfield covers terrain_size = 20 m,
    so each cell is 0.1 m (the 0.1 factors below).
    """
    
    def __init__(self, num_envs=4096, terrain_size=20.0):
        self.num_envs = num_envs
        self.terrain_size = terrain_size
        
    def generate_gap_terrain(self, difficulty):
        """Generate terrain with gaps to jump over."""
        heightfield = np.zeros((200, 200))
        gap_width = 0.3 + difficulty * 0.7    # 0.3m → 1.0m
        gap_depth = 0.5 + difficulty * 1.5    # 0.5m → 2.0m
        
        n_gaps = int(3 + difficulty * 5)
        for i in range(n_gaps):
            x_start = np.random.randint(30, 170)
            x_width = int(gap_width / 0.1)
            heightfield[x_start:x_start+x_width, :] = -gap_depth
        
        return heightfield
    
    def generate_stepping_stones(self, difficulty):
        """Stepping stones with increasing gaps."""
        heightfield = np.ones((200, 200)) * -2.0  # Deep pit
        
        stone_size = max(0.4, 0.8 - difficulty * 0.4)
        gap = 0.3 + difficulty * 0.5
        
        x = 0
        while x < 20:
            x_idx = int(x / 0.1)
            w_idx = int(stone_size / 0.1)
            y_offset = int(np.random.uniform(-0.3, 0.3) / 0.1)
            y_center = 100 + y_offset
            
            heightfield[x_idx:x_idx+w_idx, 
                        y_center-w_idx:y_center+w_idx] = 0.0
            x += stone_size + gap
        
        return heightfield
    
    def generate_boxes(self, difficulty):
        """Boxes to jump on/off."""
        heightfield = np.zeros((200, 200))
        
        n_boxes = int(3 + difficulty * 5)
        for i in range(n_boxes):
            x = np.random.randint(20, 180)
            y = np.random.randint(60, 140)
            w = np.random.randint(5, 15)
            h = 0.2 + difficulty * 0.6
            heightfield[x:x+w, y:y+w] = h
        
        return heightfield
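Each generator takes a `difficulty` scalar in [0, 1]. A common curriculum scheme adapts this per environment from the rolling success rate, promoting on success and demoting on failure; a minimal sketch (the function name and thresholds are assumptions, not from the generator above):

```python
def update_difficulty(difficulty, success_rate,
                      step=0.05, promote=0.8, demote=0.4):
    """Adjust terrain difficulty from the rolling success rate,
    clipped to the generator's [0, 1] range (illustrative thresholds)."""
    if success_rate > promote:
        difficulty += step
    elif success_rate < demote:
        difficulty -= step
    return min(1.0, max(0.0, difficulty))
```

Run per environment after each episode batch, this keeps gap widths and box heights near the edge of the policy's current competence instead of overwhelming it from the start.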

Training visualization for parkour

Multi-Skill Composition

Unified vs Separate Policies

| Approach | Pros | Cons |
|---|---|---|
| Unified policy | Smooth transitions, end-to-end | Hard to train, slower convergence |
| Separate policies | Easier to train, modular | Transition jitter, skill selection needed |
| Mixture of Experts | Best of both: shared backbone, expert heads | Complex architecture |

Mixture of Experts Implementation

import torch
import torch.nn as nn

class ParkourMoE(nn.Module):
    """Mixture of Experts for parkour skills."""
    
    def __init__(self, obs_dim, action_dim, n_experts=4):
        super().__init__()
        self.n_experts = n_experts
        
        # Shared backbone
        self.backbone = nn.Sequential(
            nn.Linear(obs_dim, 512),
            nn.ELU(),
            nn.Linear(512, 256),
            nn.ELU(),
        )
        
        # Expert heads (walk, jump, climb, land)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(256, 128),
                nn.ELU(),
                nn.Linear(128, action_dim),
            ) for _ in range(n_experts)
        ])
        
        # Gating network
        self.gate = nn.Sequential(
            nn.Linear(256, 64),
            nn.ELU(),
            nn.Linear(64, n_experts),
            nn.Softmax(dim=-1),
        )
    
    def forward(self, obs):
        features = self.backbone(obs)
        
        expert_outputs = torch.stack([
            expert(features) for expert in self.experts
        ], dim=1)  # (B, n_experts, action_dim)
        
        weights = self.gate(features)  # (B, n_experts)
        
        action = torch.sum(
            expert_outputs * weights.unsqueeze(-1), dim=1
        )  # (B, action_dim)
        
        return action, weights
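The gated blend at the end of `forward()` is just a softmax-weighted sum over expert outputs. A numpy sketch of the same arithmetic (`moe_blend` is an illustrative helper, not part of the class above):

```python
import numpy as np

def moe_blend(expert_outputs, gate_logits):
    """Blend expert actions with softmax gate weights.
    expert_outputs: (B, n_experts, action_dim); gate_logits: (B, n_experts)."""
    z = gate_logits - gate_logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return (expert_outputs * weights[..., None]).sum(axis=1), weights

# Two experts, one env: the gate strongly prefers expert 0.
outs = np.array([[[1.0, 0.0], [0.0, 1.0]]])        # (1, 2, 2)
action, w = moe_blend(outs, np.array([[5.0, 0.0]]))
# action stays close to expert 0's output [1, 0].
```

Because the softmax weights are continuous, transitions between skills (say, walk → jump) blend smoothly instead of switching discretely, which is the main advantage over separate per-skill policies.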

Current State-of-the-Art and Limitations

Achieved (2026)

Unsolved

References

  1. Robot Parkour Learning — Zhuang et al., CoRL 2023
  2. Extreme Parkour with Legged Robots — Cheng et al., ICRA 2024
  3. Agile and Versatile Robot Locomotion via Agility Training — Radosavovic et al., 2024
  4. Humanoid-Gym: Reinforcement Learning for Humanoid Robot — Gu et al., 2024

Related Posts

- Sim-to-Real for Humanoids: Deployment Best Practices (Part 10): the complete pipeline for deploying an RL locomotion policy on a real humanoid robot, covering domain randomization, system identification, safety, and the Unitree SDK.
- Multi-Step Manipulation: Curriculum Learning for Long-Horizon Tasks (Part 8): tackling long-horizon manipulation with RL via curriculum learning, hierarchical RL, skill chaining, and the IKEA furniture assembly benchmark.
- Humanoid Manipulation: Whole-body Control with LeRobot (Part 9): humanoid robots as the ultimate manipulation platform, with high-DOF control, egocentric vision, and humanoid foundation models.