
Humanoid Parkour: Jumping, Climbing, and Obstacle Courses

Parkour for humanoid robots: jumping over obstacles, climbing stairs, and traversing difficult terrain with RL and terrain perception.

Nguyễn Anh Tuấn · April 6, 2026 · 9 min read

From Walking to Parkour: A Qualitative Leap

In previous posts, we trained humanoids to walk on flat ground, traverse rough terrain, run fast, and carry objects. But all of those are continuous locomotion: the feet stay in contact with the ground in a consistent pattern.

Parkour is a qualitative leap: the robot must jump over gaps, climb onto high ledges, and clear obstacles. These motions demand a flight phase (no foot touching the ground) and impact management (absorbing the shock on touchdown). This is the hardest frontier of humanoid locomotion.
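The distinction between continuous locomotion and parkour can be made concrete from foot-contact flags alone; a minimal sketch (hypothetical helper, with contact flags assumed to come from the simulator's contact sensors):

```python
def locomotion_phase(left_contact: bool, right_contact: bool) -> str:
    """Classify the gait phase of a biped from its foot-contact flags."""
    if left_contact and right_contact:
        return "double_support"   # both feet on the ground
    if left_contact or right_contact:
        return "single_support"   # ordinary walking/running stance
    return "flight"               # no foot touching: jump or fall
```

Walking alternates between single and double support; parkour is precisely the regime where the `"flight"` branch fires.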

Humanoid robot navigating obstacles

Parkour Skills Taxonomy

| Skill | Description | Difficulty | Key Challenge |
| --- | --- | --- | --- |
| Gap crossing | Jump across gaps/pits | ★★★ | Timing, flight trajectory |
| Box jumping | Jump onto/off boxes | ★★★ | Impact absorption, height estimation |
| Stair climbing | Climb stairs up/down | ★★ | Step detection, balance |
| Vaulting | Push off with the hands to clear obstacles | ★★★★ | Contact switching, arm-leg coordination |
| Wall climbing | Grip and scale walls | ★★★★★ | Multi-contact, friction, strength |
| Running jump | Run-up + long jump | ★★★ | Velocity → takeoff transition |
| Landing | Safe touchdown | ★★★ | Impact distribution, recovery |

Terrain Perception: Height Map

Parkour requires seeing the terrain ahead in order to plan motions. There are two main approaches:

Approach 1: Privileged Learning (Teacher-Student)

import numpy as np


class ParkourTeacherEnv:
    """Teacher environment with privileged terrain information."""

    def __init__(self, terrain_resolution=0.05, scan_range=2.0):
        self.resolution = terrain_resolution
        self.scan_range = scan_range
        self.n_points = int(scan_range / terrain_resolution)
        # self.terrain (heightfield) and self.robot_pos/self.robot_heading
        # are assumed to be populated by the simulator.

    def get_privileged_obs(self, robot_pos, robot_heading):
        """Height scan of the terrain in front of the robot."""
        scan_points = []

        for i in range(self.n_points):
            # Scan forward from the robot's feet along the heading
            dist = (i + 1) * self.resolution
            x = robot_pos[0] + dist * np.cos(robot_heading)
            y = robot_pos[1] + dist * np.sin(robot_heading)

            # Query the terrain heightfield, relative to base height
            height = self.terrain.get_height(x, y) - robot_pos[2]
            scan_points.append(height)

        return np.array(scan_points)  # (n_points,)

    def get_teacher_obs(self, base_obs):
        """Full observation for the teacher policy."""
        privileged = self.get_privileged_obs(
            self.robot_pos, self.robot_heading
        )
        return np.concatenate([base_obs, privileged])


import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class ParkourStudentPolicy:
    """Student policy that learns from proprioception only."""

    def __init__(self, teacher_policy, student_policy,
                 obs_dim, n_terrain_points, obs_history_len=50):
        self.teacher = teacher_policy
        self.student_policy = student_policy
        self.history_len = obs_history_len
        self.obs_history = []

        # Student network: observation history → estimated terrain
        self.terrain_estimator = nn.Sequential(
            nn.Linear(obs_history_len * obs_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, n_terrain_points),
        )
        self.optimizer = torch.optim.Adam(
            list(self.terrain_estimator.parameters())
            + list(self.student_policy.parameters()),
            lr=1e-4,
        )

    def distill(self, env, n_episodes=10000):
        """Distill teacher knowledge into the student (DAgger-style)."""
        for ep in range(n_episodes):
            obs = env.reset()
            self.obs_history.clear()
            done = False

            while not done:
                # Teacher action from the privileged observation
                teacher_obs = env.get_teacher_obs(obs)
                teacher_action = self.teacher.predict(teacher_obs)

                # Student: estimate terrain from proprioceptive history
                self.obs_history.append(obs)
                if len(self.obs_history) > self.history_len:
                    self.obs_history.pop(0)
                # Left-pad with the first observation until the buffer fills
                padded = [self.obs_history[0]] * (
                    self.history_len - len(self.obs_history)
                ) + self.obs_history

                history_tensor = torch.tensor(
                    np.stack(padded), dtype=torch.float32
                ).flatten()
                estimated_terrain = self.terrain_estimator(history_tensor)

                # Loss 1: match the teacher's privileged terrain knowledge
                true_terrain = env.get_privileged_obs(
                    env.robot_pos, env.robot_heading
                )
                terrain_loss = F.mse_loss(
                    estimated_terrain,
                    torch.tensor(true_terrain, dtype=torch.float32),
                )

                # Loss 2: action distillation (behavior cloning)
                student_obs = torch.cat([
                    torch.tensor(obs, dtype=torch.float32),
                    estimated_terrain,
                ])
                student_action = self.student_policy(student_obs)
                action_loss = F.mse_loss(
                    student_action,
                    torch.tensor(teacher_action, dtype=torch.float32),
                )

                self.optimizer.zero_grad()
                (terrain_loss + action_loss).backward()
                self.optimizer.step()

                # Collect data by rolling out the teacher's action
                obs, _, done, _ = env.step(teacher_action)

Approach 2: Depth Camera

import torch
import torch.nn as nn


class DepthBasedParkour:
    """Parkour with depth-camera observations."""

    def __init__(self):
        self.depth_encoder = nn.Sequential(
            # Input: depth image (64, 64)
            nn.Conv2d(1, 32, 5, stride=2),   # → (30, 30)
            nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2),  # → (14, 14)
            nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2),  # → (6, 6)
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 6 * 6, 128),
            nn.ReLU(),
        )

    def process_depth(self, depth_image):
        """Encode a depth image into a latent vector."""
        # Normalize to zero mean / unit variance (epsilon guards flat images)
        depth_norm = (depth_image - depth_image.mean()) / (depth_image.std() + 1e-6)
        # Add batch and channel dims: (64, 64) → (1, 1, 64, 64)
        depth_tensor = torch.tensor(depth_norm).unsqueeze(0).unsqueeze(0).float()

        return self.depth_encoder(depth_tensor)
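The output sizes annotated next to each Conv2d layer follow the standard no-padding convolution formula, floor((size + 2p − k) / s) + 1; a quick sanity check:

```python
def conv2d_out(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial output size of a conv layer: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1
```

Chaining it reproduces 64 → 30 → 14 → 6, so the flattened feature size feeding the Linear layer is 64 * 6 * 6 = 2304.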

Reward Design for Parkour

Multi-Phase Reward

import numpy as np


class ParkourReward:
    """Reward function for parkour tasks."""

    def __init__(self, config):
        self.config = config
        self.weights = {
            # Phase 1: Approach
            'approach_velocity': 2.0,      # run toward the obstacle
            'heading_alignment': 0.5,      # face the right direction

            # Phase 2: Takeoff
            'takeoff_velocity': 3.0,       # jump speed
            'takeoff_angle': 1.0,          # jump angle
            'leg_extension': 1.0,          # full leg extension

            # Phase 3: Flight
            'height_clearance': 5.0,       # clear the obstacle
            'body_orientation': 2.0,       # keep the body upright

            # Phase 4: Landing
            'landing_stability': 5.0,      # stable touchdown
            'impact_reduction': 2.0,       # soften the impact
            'recovery_speed': 1.0,         # settle quickly

            # General
            'alive_bonus': 0.5,
            'energy_penalty': -0.01,
            'torque_penalty': -0.001,
        }
    
    def compute(self, state, prev_state):
        reward = 0.0
        
        # Alive bonus
        if not state['fallen']:
            reward += self.weights['alive_bonus']
        
        # Forward velocity tracking
        vel_error = abs(state['forward_vel'] - self.config.target_vel)
        reward += self.weights['approach_velocity'] * np.exp(-vel_error)
        
        # Height clearance (obstacle traversal)
        if state['over_obstacle']:
            clearance = state['foot_height'] - state['obstacle_height']
            if clearance > 0:
                reward += self.weights['height_clearance'] * min(clearance, 0.3)
        
        # Landing stability
        if state['just_landed']:
            # Penalize high impact force
            impact = state['ground_reaction_force']
            max_safe_force = 3.0 * state['robot_mass'] * 9.81  # 3x body weight
            if impact < max_safe_force:
                reward += self.weights['landing_stability']
            else:
                reward -= self.weights['impact_reduction'] * (impact / max_safe_force - 1)
            
            # Reward fast recovery to stable standing
            if state['steps_to_stable'] < 10:
                reward += self.weights['recovery_speed']
        
        # Energy penalty
        reward += self.weights['energy_penalty'] * state['total_power']
        reward += self.weights['torque_penalty'] * np.sum(state['torques'] ** 2)
        
        return reward
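The landing branch of the reward above can be isolated as a small testable function, keeping the same 3x-body-weight safety threshold (function and parameter names here are illustrative, not part of the class):

```python
def landing_reward(impact_force, robot_mass,
                   w_stability=5.0, w_impact=2.0, g=9.81):
    """Landing term: flat bonus below 3x body weight,
    linearly growing penalty above it."""
    max_safe_force = 3.0 * robot_mass * g
    if impact_force < max_safe_force:
        return w_stability
    return -w_impact * (impact_force / max_safe_force - 1.0)
```

For a 50 kg robot the safe limit is about 1471 N; an impact of twice that limit yields a penalty of exactly `-w_impact`.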

Training Pipeline for Parkour

Obstacle Course Generation

import numpy as np
from isaacgym import gymutil, gymapi  # Isaac Gym handles for building sim terrain

class ParkourTerrainGenerator:
    """Generate random obstacle courses for training."""
    
    def __init__(self, num_envs=4096, terrain_size=20.0):
        self.num_envs = num_envs
        self.terrain_size = terrain_size
        
    def generate_gap_terrain(self, difficulty):
        """Generate terrain with gaps to jump over."""
        heightfield = np.zeros((200, 200))  # 0.1 m/cell → 20 m x 20 m
        gap_width = 0.3 + difficulty * 0.7    # 0.3m → 1.0m
        gap_depth = 0.5 + difficulty * 1.5    # 0.5m → 2.0m
        
        # Place random gaps
        n_gaps = int(3 + difficulty * 5)
        for i in range(n_gaps):
            x_start = np.random.randint(30, 170)
            x_width = int(gap_width / 0.1)
            heightfield[x_start:x_start+x_width, :] = -gap_depth
        
        return heightfield
    
    def generate_stepping_stones(self, difficulty):
        """Stepping stones with increasing gaps."""
        heightfield = np.ones((200, 200)) * -2.0  # Deep pit
        
        stone_size = max(0.4, 0.8 - difficulty * 0.4)   # Smaller stones
        gap = 0.3 + difficulty * 0.5                      # Larger gaps
        
        x = 0
        while x < 20:
            x_idx = int(x / 0.1)
            w_idx = int(stone_size / 0.1)
            
            # Random lateral offset
            y_offset = int(np.random.uniform(-0.3, 0.3) / 0.1)
            y_center = 100 + y_offset
            
            heightfield[x_idx:x_idx+w_idx, 
                        y_center-w_idx:y_center+w_idx] = 0.0
            
            x += stone_size + gap
        
        return heightfield
    
    def generate_boxes(self, difficulty):
        """Boxes to jump on/off."""
        heightfield = np.zeros((200, 200))
        
        n_boxes = int(3 + difficulty * 5)
        for i in range(n_boxes):
            x = np.random.randint(20, 180)
            y = np.random.randint(60, 140)
            w = np.random.randint(5, 15)
            h = 0.2 + difficulty * 0.6  # 0.2m → 0.8m height
            
            heightfield[x:x+w, y:y+w] = h
        
        return heightfield
    
    def curriculum_terrain(self, difficulty):
        """Mix terrain types based on difficulty."""
        terrain_type = np.random.choice(
            ['gaps', 'stones', 'boxes', 'stairs', 'mixed'],
            p=[0.25, 0.2, 0.25, 0.15, 0.15]
        )
        
        generators = {
            'gaps': self.generate_gap_terrain,
            'stones': self.generate_stepping_stones,
            'boxes': self.generate_boxes,
        }
        
        if terrain_type in generators:
            return generators[terrain_type](difficulty)
        
        # 'stairs' and 'mixed' both fall through to the combined course; a
        # dedicated stair generator would slot into the dict above.
        # Raised obstacles merge via maximum, but gaps are negative heights,
        # so they must merge via minimum or np.maximum would erase them.
        hf = np.zeros((200, 200))
        for gen in generators.values():
            g = gen(difficulty * 0.7)
            hf = np.where(g < 0, np.minimum(hf, g), np.maximum(hf, g))
        return hf
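When several obstacle heightfields are merged into one course, raised obstacles and pits call for different merge rules: a plain element-wise maximum keeps boxes but silently fills in gaps, because gaps are encoded as negative heights. A standalone sketch of a merge that preserves both (function name is illustrative):

```python
import numpy as np

def combine_heightfields(a, b):
    """Merge two heightfields: raised obstacles win via maximum,
    while pits/gaps (negative heights) are preserved via minimum."""
    return np.where(b < 0, np.minimum(a, b), np.maximum(a, b))
```

Merging a flat field with one containing both a 0.4 m box and a 1 m gap keeps the box at +0.4 and the gap at −1.0, where `np.maximum` alone would flatten the gap back to 0.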

Training Config

# Isaac Lab training config for humanoid parkour
parkour_config = {
    # Environment
    "num_envs": 4096,
    "episode_length": 500,
    "terrain_curriculum": True,
    "initial_difficulty": 0.0,
    "max_difficulty": 1.0,
    "difficulty_increment": 0.05,
    "success_threshold": 0.7,   # 70% success → increase difficulty
    
    # Observation
    "obs_components": [
        "base_lin_vel",         # (3,) base linear velocity
        "base_ang_vel",         # (3,) base angular velocity
        "projected_gravity",    # (3,) gravity in body frame
        "commands",             # (3,) velocity commands
        "dof_pos",              # (N,) joint positions
        "dof_vel",              # (N,) joint velocities
        "actions",              # (N,) previous actions
        "height_scan",          # (M,) terrain heights ahead
    ],
    
    # PPO
    "policy_class": "ActorCriticRecurrent",  # LSTM for temporal reasoning
    "hidden_sizes": [512, 256, 128],
    "learning_rate": 1e-4,
    "num_learning_epochs": 5,
    "num_mini_batches": 4,
    "clip_param": 0.2,
    "entropy_coef": 0.01,
    "value_loss_coef": 1.0,
    "max_grad_norm": 1.0,
    "gamma": 0.99,
    "lam": 0.95,
    
    # Training
    "max_iterations": 5000,
    "save_interval": 500,
    "log_interval": 10,
}
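The curriculum fields above (`success_threshold`, `difficulty_increment`, `max_difficulty`) imply an update rule evaluated between training iterations; a minimal sketch (the symmetric back-off threshold is an assumption, not part of the config):

```python
def update_difficulty(difficulty, success_rate, cfg, fail_threshold=0.3):
    """Step the terrain curriculum based on the recent success rate."""
    if success_rate >= cfg["success_threshold"]:
        # Promote: harder terrain, capped at max_difficulty
        difficulty = min(difficulty + cfg["difficulty_increment"],
                         cfg["max_difficulty"])
    elif success_rate < fail_threshold:
        # Demote: back off when the policy is failing badly
        difficulty = max(difficulty - cfg["difficulty_increment"], 0.0)
    return difficulty
```

With the values in `parkour_config`, a 70%+ success rate bumps difficulty by 0.05 per evaluation until it saturates at 1.0.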

Training visualization for parkour

Multi-Skill Composition

Unified vs Separate Policies

| Approach | Pros | Cons |
| --- | --- | --- |
| Unified policy | Smooth transitions, end-to-end | Hard to train, slower convergence |
| Separate policies | Easier to train, modular | Transition jitter, needs skill selection |
| Mixture of Experts | Best of both: shared backbone, expert heads | Complex architecture |

Mixture of Experts Implementation

import torch
import torch.nn as nn


class ParkourMoE(nn.Module):
    """Mixture of Experts for parkour skills."""
    
    def __init__(self, obs_dim, action_dim, n_experts=4):
        super().__init__()
        self.n_experts = n_experts
        
        # Shared backbone
        self.backbone = nn.Sequential(
            nn.Linear(obs_dim, 512),
            nn.ELU(),
            nn.Linear(512, 256),
            nn.ELU(),
        )
        
        # Expert heads (walk, jump, climb, land)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(256, 128),
                nn.ELU(),
                nn.Linear(128, action_dim),
            ) for _ in range(n_experts)
        ])
        
        # Gating network
        self.gate = nn.Sequential(
            nn.Linear(256, 64),
            nn.ELU(),
            nn.Linear(64, n_experts),
            nn.Softmax(dim=-1),
        )
    
    def forward(self, obs):
        features = self.backbone(obs)
        
        # Expert outputs
        expert_outputs = torch.stack([
            expert(features) for expert in self.experts
        ], dim=1)  # (B, n_experts, action_dim)
        
        # Gating weights
        weights = self.gate(features)  # (B, n_experts)
        
        # Weighted combination
        action = torch.sum(
            expert_outputs * weights.unsqueeze(-1), dim=1
        )  # (B, action_dim)
        
        return action, weights
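The gated combination in `forward` can be reproduced in a few lines of NumPy to see what the softmax weighting does; with one strongly dominant gate logit, the mixture collapses to that single expert's action (function name is illustrative):

```python
import numpy as np

def moe_combine(expert_outputs, gate_logits):
    """Softmax-weighted mixture of per-expert actions.

    expert_outputs: (B, n_experts, action_dim)
    gate_logits:    (B, n_experts)
    """
    # Numerically stable softmax over the expert axis
    z = gate_logits - gate_logits.max(axis=-1, keepdims=True)
    w = np.exp(z)
    w /= w.sum(axis=-1, keepdims=True)
    # Broadcast weights over action_dim, sum out the expert axis
    action = (expert_outputs * w[..., None]).sum(axis=-2)
    return action, w
```

In practice a gate that always saturates like this signals expert collapse; an entropy bonus on the gate weights is a common countermeasure.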

Current State-of-the-Art and Limitations

Achieved so far (2026)

Still unsolved

