humanoidparkourhumanoidrljumpingclimbing

Humanoid Parkour: Jumping, Climbing & Obstacle Courses

Parkour for humanoid robots — jumping obstacles, climbing stairs, navigating rough terrain with RL and terrain perception.

Nguyễn Anh Tuấn6 tháng 4, 20267 phút đọc
Humanoid Parkour: Jumping, Climbing & Obstacle Courses

From Walking to Parkour: The Quantum Leap

In previous posts, we trained humanoids to walk on flat ground, traverse rough terrain, run fast, and carry objects. But all of these involved continuous locomotion — feet always contacting the ground in consistent patterns.

Parkour is a quantum leap: the robot must jump over gaps, climb tall steps, and vault over obstacles — motions requiring a flight phase (no feet touching the ground) and impact management (absorbing landing forces). This is the hardest frontier of humanoid locomotion.

Humanoid robot navigating obstacles

Parkour Skills Taxonomy

Skill Description Difficulty Key Challenge
Gap crossing Jump over gaps/crevasses ★★★ Timing, flight trajectory
Box jumping Jump onto/off boxes ★★★ Impact absorption, height estimation
Stair climbing Climb stairs up/down ★★ Step detection, balance
Vaulting Arm-assisted obstacle clearing ★★★★ Contact switching, arm-leg coordination
Wall climbing Grip walls and climb ★★★★★ Multi-contact, friction, strength
Running jump Sprint + long jump ★★★ Velocity → takeoff transition
Landing Safe impact absorption ★★★ Impact distribution, recovery

Terrain Perception: Height Maps

Parkour requires seeing the terrain ahead to plan motions. Two main approaches:

Approach 1: Privileged Learning (Teacher-Student)

class ParkourTeacherEnv:
    """Teacher policy with privileged terrain information."""
    
    def __init__(self, terrain_resolution=0.05, scan_range=2.0):
        self.resolution = terrain_resolution
        self.scan_range = scan_range
        self.n_points = int(scan_range / terrain_resolution)
        
    def get_privileged_obs(self, robot_pos, robot_heading):
        """Height scan ahead of the robot."""
        scan_points = []
        
        for i in range(self.n_points):
            dist = (i + 1) * self.resolution
            x = robot_pos[0] + dist * np.cos(robot_heading)
            y = robot_pos[1] + dist * np.sin(robot_heading)
            
            height = self.terrain.get_height(x, y) - robot_pos[2]
            scan_points.append(height)
        
        return np.array(scan_points)  # (n_points,)


class ParkourStudentPolicy:
    """Student policy learns from proprioception only."""
    
    def __init__(self, teacher_policy, obs_history_len=50):
        self.teacher = teacher_policy
        self.history_len = obs_history_len
        
        # Student network: obs_history → estimated terrain
        self.terrain_estimator = nn.Sequential(
            nn.Linear(obs_history_len * obs_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, n_terrain_points),
        )
    
    def distill(self, env, n_episodes=10000):
        """Distill teacher knowledge into student."""
        for ep in range(n_episodes):
            obs = env.reset()
            self.obs_history.clear()
            
            while not done:
                teacher_obs = env.get_teacher_obs(obs)
                teacher_action = self.teacher.predict(teacher_obs)
                
                self.obs_history.append(obs)
                if len(self.obs_history) > self.history_len:
                    self.obs_history.pop(0)
                
                history_tensor = torch.tensor(
                    np.stack(self.obs_history)
                ).flatten()
                estimated_terrain = self.terrain_estimator(history_tensor)
                
                true_terrain = env.get_privileged_obs(
                    env.robot_pos, env.robot_heading
                )
                terrain_loss = F.mse_loss(
                    estimated_terrain, torch.tensor(true_terrain)
                )
                
                student_obs = torch.cat([
                    torch.tensor(obs), estimated_terrain
                ])
                student_action = self.student_policy(student_obs)
                action_loss = F.mse_loss(student_action, teacher_action)
                
                total_loss = terrain_loss + action_loss
                total_loss.backward()
                self.optimizer.step()

Approach 2: Depth Camera

class DepthBasedParkour:
    """Parkour with depth camera observation."""
    
    def __init__(self):
        self.depth_encoder = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2),   # (64,64) → (30,30)
            nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2),  # → (14,14)
            nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2),  # → (6,6)
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 6 * 6, 128),
            nn.ReLU(),
        )
        
    def process_depth(self, depth_image):
        """Encode depth image into latent vector."""
        depth_norm = (depth_image - depth_image.mean()) / (depth_image.std() + 1e-6)
        depth_tensor = torch.tensor(depth_norm).unsqueeze(0).unsqueeze(0).float()
        return self.depth_encoder(depth_tensor)

Reward Design for Parkour

Multi-Phase Reward

class ParkourReward:
    """Reward function for parkour tasks."""
    
    def __init__(self, config):
        self.config = config
        self.weights = {
            # Phase 1: Approach
            'approach_velocity': 2.0,
            'heading_alignment': 0.5,
            
            # Phase 2: Takeoff
            'takeoff_velocity': 3.0,
            'takeoff_angle': 1.0,
            'leg_extension': 1.0,
            
            # Phase 3: Flight
            'height_clearance': 5.0,
            'body_orientation': 2.0,
            
            # Phase 4: Landing
            'landing_stability': 5.0,
            'impact_reduction': 2.0,
            'recovery_speed': 1.0,
            
            # General
            'alive_bonus': 0.5,
            'energy_penalty': -0.01,
            'torque_penalty': -0.001,
        }
    
    def compute(self, state, prev_state):
        reward = 0.0
        
        if not state['fallen']:
            reward += self.weights['alive_bonus']
        
        # Forward velocity tracking
        vel_error = abs(state['forward_vel'] - self.config.target_vel)
        reward += self.weights['approach_velocity'] * np.exp(-vel_error)
        
        # Height clearance (obstacle traversal)
        if state['over_obstacle']:
            clearance = state['foot_height'] - state['obstacle_height']
            if clearance > 0:
                reward += self.weights['height_clearance'] * min(clearance, 0.3)
        
        # Landing stability
        if state['just_landed']:
            impact = state['ground_reaction_force']
            max_safe_force = 3.0 * state['robot_mass'] * 9.81
            if impact < max_safe_force:
                reward += self.weights['landing_stability']
            else:
                reward -= self.weights['impact_reduction'] * (impact / max_safe_force - 1)
            
            if state['steps_to_stable'] < 10:
                reward += self.weights['recovery_speed']
        
        reward += self.weights['energy_penalty'] * state['total_power']
        reward += self.weights['torque_penalty'] * np.sum(state['torques'] ** 2)
        
        return reward

Training Pipeline for Parkour

Obstacle Course Generation

class ParkourTerrainGenerator:
    """Generate random obstacle courses for training."""
    
    def __init__(self, num_envs=4096, terrain_size=20.0):
        self.num_envs = num_envs
        self.terrain_size = terrain_size
        
    def generate_gap_terrain(self, difficulty):
        """Generate terrain with gaps to jump over."""
        heightfield = np.zeros((200, 200))
        gap_width = 0.3 + difficulty * 0.7    # 0.3m → 1.0m
        gap_depth = 0.5 + difficulty * 1.5    # 0.5m → 2.0m
        
        n_gaps = int(3 + difficulty * 5)
        for i in range(n_gaps):
            x_start = np.random.randint(30, 170)
            x_width = int(gap_width / 0.1)
            heightfield[x_start:x_start+x_width, :] = -gap_depth
        
        return heightfield
    
    def generate_stepping_stones(self, difficulty):
        """Stepping stones with increasing gaps."""
        heightfield = np.ones((200, 200)) * -2.0  # Deep pit
        
        stone_size = max(0.4, 0.8 - difficulty * 0.4)
        gap = 0.3 + difficulty * 0.5
        
        x = 0
        while x < 20:
            x_idx = int(x / 0.1)
            w_idx = int(stone_size / 0.1)
            y_offset = int(np.random.uniform(-0.3, 0.3) / 0.1)
            y_center = 100 + y_offset
            
            heightfield[x_idx:x_idx+w_idx, 
                        y_center-w_idx:y_center+w_idx] = 0.0
            x += stone_size + gap
        
        return heightfield
    
    def generate_boxes(self, difficulty):
        """Boxes to jump on/off."""
        heightfield = np.zeros((200, 200))
        
        n_boxes = int(3 + difficulty * 5)
        for i in range(n_boxes):
            x = np.random.randint(20, 180)
            y = np.random.randint(60, 140)
            w = np.random.randint(5, 15)
            h = 0.2 + difficulty * 0.6
            heightfield[x:x+w, y:y+w] = h
        
        return heightfield

Training visualization for parkour

Multi-Skill Composition

Unified vs Separate Policies

Approach Pros Cons
Unified policy Smooth transitions, end-to-end Hard to train, slower convergence
Separate policies Easier to train, modular Transition jitter, skill selection needed
Mixture of Experts Best of both — shared backbone, expert heads Complex architecture

Mixture of Experts Implementation

class ParkourMoE(nn.Module):
    """Mixture of Experts for parkour skills."""
    
    def __init__(self, obs_dim, action_dim, n_experts=4):
        super().__init__()
        self.n_experts = n_experts
        
        # Shared backbone
        self.backbone = nn.Sequential(
            nn.Linear(obs_dim, 512),
            nn.ELU(),
            nn.Linear(512, 256),
            nn.ELU(),
        )
        
        # Expert heads (walk, jump, climb, land)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(256, 128),
                nn.ELU(),
                nn.Linear(128, action_dim),
            ) for _ in range(n_experts)
        ])
        
        # Gating network
        self.gate = nn.Sequential(
            nn.Linear(256, 64),
            nn.ELU(),
            nn.Linear(64, n_experts),
            nn.Softmax(dim=-1),
        )
    
    def forward(self, obs):
        features = self.backbone(obs)
        
        expert_outputs = torch.stack([
            expert(features) for expert in self.experts
        ], dim=1)  # (B, n_experts, action_dim)
        
        weights = self.gate(features)  # (B, n_experts)
        
        action = torch.sum(
            expert_outputs * weights.unsqueeze(-1), dim=1
        )  # (B, action_dim)
        
        return action, weights

Current State-of-the-Art and Limitations

Achieved (2026)

Unsolved

  • Vaulting: Arm-assisted obstacle clearing — too complex for current RL
  • Wall climbing: Requires grippers + full-body contact planning
  • Fall recovery: Getting back up after falling — active research area
  • Outdoor parkour: Unstructured environments, variable surfaces

References

  1. Robot Parkour Learning — Zhuang et al., CoRL 2023
  2. Extreme Parkour with Legged Robots — Cheng et al., ICRA 2024
  3. Agile and Versatile Robot Locomotion via Agility Training — Radosavovic et al., 2024
  4. Humanoid-Gym: Reinforcement Learning for Humanoid Robot — Gu et al., 2024
NT

Nguyễn Anh Tuấn

Robotics & AI Engineer. Building VnRobo — sharing knowledge about robot learning, VLA models, and automation.

Khám phá VnRobo

Bài viết liên quan

Tutorial
WholeBodyVLA Tutorial: Teleop → Train → Deploy Humanoid
wholebodyvlavlahumanoidloco-manipulationiclr-2026agibot-x2teleoprl

WholeBodyVLA Tutorial: Teleop → Train → Deploy Humanoid

ICLR 2026 — pipeline thực chiến từ thu thập teleop, train unified latent VLA đến deploy whole-body loco-manipulation trên AgiBot X2.

11/5/202611 phút đọc
Tutorial
Booster Gym ICRA 2026: Train Humanoid T1 Sim-to-Real với Isaac Gym
humanoidisaac-gymreinforcement-learningsim2realbooster-t1icra-2026

Booster Gym ICRA 2026: Train Humanoid T1 Sim-to-Real với Isaac Gym

Hướng dẫn chi tiết Booster Gym — RL framework end-to-end open-source train humanoid Booster T1 walking từ teleop đến deploy thực tế.

6/5/202611 phút đọc
Tutorial
LIFT humanoid ICLR 2026: pretrain SAC + world model finetune
humanoidreinforcement-learningsacworld-modelmujoco-playgroundbraxsim2simiclr-2026booster-t1unitree-g1

LIFT humanoid ICLR 2026: pretrain SAC + world model finetune

Reproduce LIFT ICLR 2026: pretrain SAC trong MuJoCo Playground rồi sim2sim finetune world model trên Brax cho Booster T1, Unitree G1.

24/4/202612 phút đọc