Tags: humanoid, parkour, humanoid-rl, jumping, climbing

Humanoid Parkour: Jumping, Climbing & Obstacle Courses

Parkour for humanoid robots — jumping obstacles, climbing stairs, navigating rough terrain with RL and terrain perception.

Nguyễn Anh Tuấn · April 6, 2026 · 7 min read

From Walking to Parkour: The Quantum Leap

In previous posts, we trained humanoids to walk on flat ground, traverse rough terrain, run fast, and carry objects. But all of these involved continuous locomotion — feet always contacting the ground in consistent patterns.

Parkour is a quantum leap: the robot must jump over gaps, climb tall steps, and vault over obstacles — motions requiring a flight phase (no feet touching the ground) and impact management (absorbing landing forces). This is the hardest frontier of humanoid locomotion.
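
To see why flight is hard, consider the ballistics alone. A minimal sketch (illustrative numbers; a simple projectile model that ignores CoM height change and leg retraction) of the takeoff speed a gap demands:

import numpy as np

def min_takeoff_speed(gap_width, takeoff_angle_deg=45.0, g=9.81):
    """Minimum takeoff speed to clear a flat gap.

    From the projectile range formula R = v^2 * sin(2*theta) / g,
    so v = sqrt(R * g / sin(2*theta)). Real jumps need extra margin
    for landing stability and leg swing.
    """
    theta = np.radians(takeoff_angle_deg)
    return np.sqrt(gap_width * g / np.sin(2 * theta))

print(min_takeoff_speed(0.8))  # ~2.8 m/s for a 0.8 m gap

And the robot must produce that velocity in a single stance phase, then absorb comparable momentum on landing.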

[Image: Humanoid robot navigating obstacles]

Parkour Skills Taxonomy

| Skill | Description | Difficulty | Key Challenge |
|---|---|---|---|
| Gap crossing | Jump over gaps/crevasses | ★★★ | Timing, flight trajectory |
| Box jumping | Jump onto/off boxes | ★★★ | Impact absorption, height estimation |
| Stair climbing | Climb stairs up/down | ★★ | Step detection, balance |
| Vaulting | Arm-assisted obstacle clearing | ★★★★ | Contact switching, arm-leg coordination |
| Wall climbing | Grip walls and climb | ★★★★★ | Multi-contact, friction, strength |
| Running jump | Sprint + long jump | ★★★ | Velocity → takeoff transition |
| Landing | Safe impact absorption | ★★★ | Impact distribution, recovery |

Terrain Perception: Height Maps

Parkour requires seeing the terrain ahead to plan motions. Two main approaches:

Approach 1: Privileged Learning (Teacher-Student)

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class ParkourTeacherEnv:
    """Teacher environment with privileged terrain information."""
    
    def __init__(self, terrain, terrain_resolution=0.05, scan_range=2.0):
        self.terrain = terrain  # simulator terrain exposing get_height(x, y)
        self.resolution = terrain_resolution
        self.scan_range = scan_range
        self.n_points = int(scan_range / terrain_resolution)
        
    def get_privileged_obs(self, robot_pos, robot_heading):
        """Height scan ahead of the robot, relative to its base height."""
        scan_points = []
        
        for i in range(self.n_points):
            dist = (i + 1) * self.resolution
            x = robot_pos[0] + dist * np.cos(robot_heading)
            y = robot_pos[1] + dist * np.sin(robot_heading)
            
            # Relative height makes the scan invariant to absolute base height
            height = self.terrain.get_height(x, y) - robot_pos[2]
            scan_points.append(height)
        
        return np.array(scan_points)  # (n_points,)


class ParkourStudentPolicy:
    """Student policy that learns from proprioception history only."""
    
    def __init__(self, teacher_policy, obs_dim, action_dim,
                 n_terrain_points, obs_history_len=50, lr=3e-4):
        self.teacher = teacher_policy
        self.history_len = obs_history_len
        self.obs_history = []
        
        # Terrain estimator: obs history → estimated height scan
        self.terrain_estimator = nn.Sequential(
            nn.Linear(obs_history_len * obs_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, n_terrain_points),
        )
        # Student policy: current obs + estimated terrain → action
        self.student_policy = nn.Sequential(
            nn.Linear(obs_dim + n_terrain_points, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim),
        )
        self.optimizer = torch.optim.Adam(
            list(self.terrain_estimator.parameters())
            + list(self.student_policy.parameters()),
            lr=lr,
        )
    
    def distill(self, env, n_episodes=10000):
        """Distill teacher knowledge into the student (DAgger-style)."""
        for ep in range(n_episodes):
            obs = env.reset()
            self.obs_history = [obs] * self.history_len  # pad with initial obs
            done = False
            
            while not done:
                # Teacher acts from privileged observations
                teacher_obs = env.get_teacher_obs(obs)
                teacher_action = self.teacher.predict(teacher_obs)
                
                # Maintain a fixed-length proprioceptive history
                self.obs_history.append(obs)
                self.obs_history.pop(0)
                
                history_tensor = torch.as_tensor(
                    np.stack(self.obs_history), dtype=torch.float32
                ).flatten()
                estimated_terrain = self.terrain_estimator(history_tensor)
                
                # Supervised terrain-reconstruction loss
                true_terrain = env.get_privileged_obs(
                    env.robot_pos, env.robot_heading
                )
                terrain_loss = F.mse_loss(
                    estimated_terrain,
                    torch.as_tensor(true_terrain, dtype=torch.float32),
                )
                
                # Behavior-cloning loss against the teacher's action
                student_obs = torch.cat([
                    torch.as_tensor(obs, dtype=torch.float32),
                    estimated_terrain,
                ])
                student_action = self.student_policy(student_obs)
                action_loss = F.mse_loss(
                    student_action,
                    torch.as_tensor(teacher_action, dtype=torch.float32),
                )
                
                self.optimizer.zero_grad()
                (terrain_loss + action_loss).backward()
                self.optimizer.step()
                
                # Step with the teacher's action so the rollout stays on track
                obs, reward, done, info = env.step(teacher_action)
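
After distillation, only the student runs on the robot. A minimal inference sketch under the same assumed gym-style interface, where student is a trained ParkourStudentPolicy:

# Inference sketch: the deployed student sees proprioception only.
obs = env.reset()
history = [obs] * student.history_len
done = False
while not done:
    history.append(obs)
    history.pop(0)
    h = torch.as_tensor(np.stack(history), dtype=torch.float32).flatten()
    with torch.no_grad():
        terrain_hat = student.terrain_estimator(h)
        action = student.student_policy(
            torch.cat([torch.as_tensor(obs, dtype=torch.float32), terrain_hat])
        )
    obs, reward, done, info = env.step(action.numpy())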

Approach 2: Depth Camera

class DepthBasedParkour(nn.Module):
    """Parkour policy components driven by a depth camera."""
    
    def __init__(self):
        super().__init__()
        self.depth_encoder = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2),   # (64,64) → (30,30)
            nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2),  # → (14,14)
            nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2),  # → (6,6)
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 6 * 6, 128),
            nn.ReLU(),
        )
        
    def process_depth(self, depth_image):
        """Encode depth image into latent vector."""
        depth_norm = (depth_image - depth_image.mean()) / (depth_image.std() + 1e-6)
        depth_tensor = torch.tensor(depth_norm).unsqueeze(0).unsqueeze(0).float()
        return self.depth_encoder(depth_tensor)
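
A quick usage sketch for the encoder above, with a placeholder depth frame and an assumed 48-dim proprioceptive vector; the concatenated vector would feed the policy MLP:

# Encode a synthetic 64×64 depth frame and fuse with proprioception.
parkour = DepthBasedParkour()
depth = np.random.uniform(0.2, 4.0, size=(64, 64))        # meters, placeholder
depth_latent = parkour.process_depth(depth)                # (1, 128)
proprio = torch.randn(1, 48)                               # assumed proprio dim
policy_input = torch.cat([proprio, depth_latent], dim=-1)  # (1, 176)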

Reward Design for Parkour

Multi-Phase Reward

class ParkourReward:
    """Reward function for parkour tasks."""
    
    def __init__(self, config):
        self.config = config
        self.weights = {
            # Phase 1: Approach
            'approach_velocity': 2.0,
            'heading_alignment': 0.5,
            
            # Phase 2: Takeoff
            'takeoff_velocity': 3.0,
            'takeoff_angle': 1.0,
            'leg_extension': 1.0,
            
            # Phase 3: Flight
            'height_clearance': 5.0,
            'body_orientation': 2.0,
            
            # Phase 4: Landing
            'landing_stability': 5.0,
            'impact_reduction': 2.0,
            'recovery_speed': 1.0,
            
            # General
            'alive_bonus': 0.5,
            'energy_penalty': -0.01,
            'torque_penalty': -0.001,
        }
    
    def compute(self, state, prev_state):
        reward = 0.0
        
        if not state['fallen']:
            reward += self.weights['alive_bonus']
        
        # Forward velocity tracking
        vel_error = abs(state['forward_vel'] - self.config.target_vel)
        reward += self.weights['approach_velocity'] * np.exp(-vel_error)
        
        # Height clearance (obstacle traversal)
        if state['over_obstacle']:
            clearance = state['foot_height'] - state['obstacle_height']
            if clearance > 0:
                reward += self.weights['height_clearance'] * min(clearance, 0.3)
        
        # Landing stability
        if state['just_landed']:
            impact = state['ground_reaction_force']
            max_safe_force = 3.0 * state['robot_mass'] * 9.81
            if impact < max_safe_force:
                reward += self.weights['landing_stability']
            else:
                reward -= self.weights['impact_reduction'] * (impact / max_safe_force - 1)
            
            if state['steps_to_stable'] < 10:
                reward += self.weights['recovery_speed']
        
        reward += self.weights['energy_penalty'] * state['total_power']
        reward += self.weights['torque_penalty'] * np.sum(state['torques'] ** 2)
        
        return reward
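
Several of the terms above key off phase flags such as over_obstacle and just_landed. One simple way to derive such flags is from foot-contact booleans reported by the simulator; a minimal sketch, where the foot_contacts array is an assumption about your sim's API:

import numpy as np

def detect_phase(foot_contacts, prev_contacts):
    """Coarse phase flags from per-foot contact booleans (illustrative)."""
    any_now = bool(np.any(foot_contacts))
    any_prev = bool(np.any(prev_contacts))
    return {
        'flight': not any_now,                      # no foot on the ground
        'just_took_off': any_prev and not any_now,
        'just_landed': any_now and not any_prev,
    }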

Training Pipeline for Parkour

Obstacle Course Generation

class ParkourTerrainGenerator:
    """Generate random obstacle courses for training."""
    
    def __init__(self, num_envs=4096, terrain_size=20.0, grid_size=200):
        self.num_envs = num_envs
        self.terrain_size = terrain_size
        self.grid_size = grid_size
        self.resolution = terrain_size / grid_size  # 0.1 m per cell
        
    def generate_gap_terrain(self, difficulty):
        """Terrain with gaps to jump over. difficulty ∈ [0, 1]."""
        heightfield = np.zeros((self.grid_size, self.grid_size))
        gap_width = 0.3 + difficulty * 0.7    # 0.3 m → 1.0 m
        gap_depth = 0.5 + difficulty * 1.5    # 0.5 m → 2.0 m
        
        n_gaps = int(3 + difficulty * 5)
        for _ in range(n_gaps):
            x_start = np.random.randint(30, 170)
            x_width = int(gap_width / self.resolution)
            heightfield[x_start:x_start + x_width, :] = -gap_depth
        
        return heightfield
    
    def generate_stepping_stones(self, difficulty):
        """Stepping stones over a deep pit; gaps grow with difficulty."""
        heightfield = np.ones((self.grid_size, self.grid_size)) * -2.0  # deep pit
        
        stone_size = max(0.4, 0.8 - difficulty * 0.4)
        gap = 0.3 + difficulty * 0.5
        
        x = 0.0
        while x < self.terrain_size:
            x_idx = int(x / self.resolution)
            w_idx = int(stone_size / self.resolution)
            y_offset = int(np.random.uniform(-0.3, 0.3) / self.resolution)
            y_center = self.grid_size // 2 + y_offset
            
            # Square stone of side stone_size, laterally centered with jitter
            heightfield[x_idx:x_idx + w_idx,
                        y_center - w_idx // 2:y_center - w_idx // 2 + w_idx] = 0.0
            x += stone_size + gap
        
        return heightfield
    
    def generate_boxes(self, difficulty):
        """Boxes to jump onto and off."""
        heightfield = np.zeros((self.grid_size, self.grid_size))
        
        n_boxes = int(3 + difficulty * 5)
        for _ in range(n_boxes):
            x = np.random.randint(20, 180)
            y = np.random.randint(60, 140)
            w = np.random.randint(5, 15)  # box side, in cells
            h = 0.2 + difficulty * 0.6    # box height: 0.2 m → 0.8 m
            heightfield[x:x + w, y:y + w] = h
        
        return heightfield
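
These generators pair naturally with a difficulty curriculum: start easy and promote only when the policy succeeds often enough. A minimal sketch, where evaluate_success_rate is a hypothetical hook into your evaluation loop:

gen = ParkourTerrainGenerator(num_envs=4096)
difficulty = 0.0

for iteration in range(1000):
    terrain = gen.generate_gap_terrain(difficulty)
    success = evaluate_success_rate(terrain)  # assumed: fraction of envs that cross
    if success > 0.8:
        difficulty = min(1.0, difficulty + 0.05)  # promote
    elif success < 0.4:
        difficulty = max(0.0, difficulty - 0.05)  # demote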

[Image: Training visualization for parkour]

Multi-Skill Composition

Unified vs Separate Policies

| Approach | Pros | Cons |
|---|---|---|
| Unified policy | Smooth transitions, end-to-end | Hard to train, slower convergence |
| Separate policies | Easier to train, modular | Transition jitter, skill selection needed |
| Mixture of Experts | Best of both: shared backbone, expert heads | Complex architecture |

Mixture of Experts Implementation

class ParkourMoE(nn.Module):
    """Mixture of Experts for parkour skills."""
    
    def __init__(self, obs_dim, action_dim, n_experts=4):
        super().__init__()
        self.n_experts = n_experts
        
        # Shared backbone
        self.backbone = nn.Sequential(
            nn.Linear(obs_dim, 512),
            nn.ELU(),
            nn.Linear(512, 256),
            nn.ELU(),
        )
        
        # Expert heads (walk, jump, climb, land)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(256, 128),
                nn.ELU(),
                nn.Linear(128, action_dim),
            ) for _ in range(n_experts)
        ])
        
        # Gating network
        self.gate = nn.Sequential(
            nn.Linear(256, 64),
            nn.ELU(),
            nn.Linear(64, n_experts),
            nn.Softmax(dim=-1),
        )
    
    def forward(self, obs):
        features = self.backbone(obs)
        
        expert_outputs = torch.stack([
            expert(features) for expert in self.experts
        ], dim=1)  # (B, n_experts, action_dim)
        
        weights = self.gate(features)  # (B, n_experts)
        
        action = torch.sum(
            expert_outputs * weights.unsqueeze(-1), dim=1
        )  # (B, action_dim)
        
        return action, weights
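
A quick shape check for the module above, with illustrative dimensions; the returned gate weights are also useful for logging which expert dominates in each phase:

moe = ParkourMoE(obs_dim=48, action_dim=12, n_experts=4)
obs = torch.randn(8, 48)        # batch of 8 observations
action, weights = moe(obs)
print(action.shape)             # torch.Size([8, 12])
print(weights.shape)            # torch.Size([8, 4]); each row sums to 1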

Current State-of-the-Art and Limitations

Achieved (2026)

Unsolved

  • Vaulting: Arm-assisted obstacle clearing — too complex for current RL
  • Wall climbing: Requires grippers + full-body contact planning
  • Fall recovery: Getting back up after falling — active research area
  • Outdoor parkour: Unstructured environments, variable surfaces
