Unitree H1: Running, Turning, and Dynamic Motions

Walking is the foundation, but a truly useful humanoid needs more than that. In this post, we push the Unitree H1 beyond walking to achieve running (2+ m/s), sharp turning, lateral walking, backward walking, and a command-conditioned policy that switches smoothly between gaits.

Following on from the H1 basic training post, this one focuses on dynamic motions, where the physics and control get genuinely interesting.

From Walking to Running: What Changes?

The Physics of Running

The fundamental difference between walking and running is the flight phase:

Characteristic	Walking	Running
Flight phase	None (at least 1 foot on the ground)	Yes (both feet off the ground)
Ground contact	~60% of the time per foot	~30-40% of the time per foot
Peak force	~1.2x body weight	~2.5-3x body weight
Speed	0-1.5 m/s	1.5-3.3+ m/s
Froude number	< 1	> 1
Energy mode	Inverted pendulum	Spring-mass
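
The Froude number in the table relates forward speed to leg length, Fr = v² / (g·L). The snippet below is a minimal sketch of that formula; the 0.8 m leg length is an assumed placeholder, not a measured H1 value, and `froude_number` is a hypothetical helper name.

```python
G = 9.81  # gravitational acceleration, m/s^2


def froude_number(speed_mps: float, leg_length_m: float) -> float:
    """Fr = v^2 / (g * L), with L the hip-to-ground leg length."""
    return speed_mps ** 2 / (G * leg_length_m)


LEG_LENGTH_M = 0.8  # assumed placeholder, not an official H1 spec

for v in (1.0, 1.5, 2.0, 3.3):
    fr = froude_number(v, LEG_LENGTH_M)
    print(f"v = {v:.1f} m/s -> Fr = {fr:.2f}")
```

With this assumed leg length, Fr reaches 1 only at roughly 2.8 m/s. Fr = 1 is the upper bound for inverted-pendulum walking, while humans and animals typically switch to running near Fr ≈ 0.5, so the speed and Froude rows of the table are best read as rough guides rather than matching thresholds.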

import torch
import numpy as np

class RunningRewardExtension:
    """
    Extends the reward function for running gaits.
    """

    def compute_running_rewards(self, state, command):
        rewards = {}

        # 1. Velocity tracking over an extended range
        # Walking: max 1.5 m/s → Running: max 3.5 m/s
        vel_error = torch.sum(
            torch.square(command[:, :2] - state["base_lin_vel"][:, :2]),
            dim=1
        )
        # Larger sigma for high-speed tracking; full_like keeps the
        # constants on the same device and dtype as vel_error
        sigma = torch.where(
            torch.abs(command[:, 0]) > 1.5,
            torch.full_like(vel_error, 0.5),   # Running: looser tolerance
            torch.full_like(vel_error, 0.25),  # Walking: tight tolerance
        )
        rewards["vel_tracking"] = torch.exp(-vel_error / sigma)

        # 2. Flight phase reward
        # When the command exceeds 2 m/s, encourage a flight phase
        both_feet_air = (
            (state["foot_contact"][:, 0] < 0.5) &
            (state["foot_contact"][:, 1] < 0.5)
        ).float()
        running_command = (torch.abs(command[:, 0]) > 2.0).float()
        rewards["flight_phase"] = 0.3 * running_command * both_feet_air

        # 3. Ground reaction force symmetry
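        # Running needs high, symmetric GRF peaks.
        # NOTE: .max() below reduces over the whole batch, so every env
        # receives the same scalar; per-env peaks would have to be
        # tracked over a gait cycle.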
        left_grf = state["contact_forces"][:, 0, 2]   # z-component
        right_grf = state["contact_forces"][:, 1, 2]
        grf_symmetry = 1.0 - torch.abs(
            left_grf.max() - right_grf.max()
        ) / (left_grf.max() + right_grf.max() + 1e-6)
        rewards["grf_symmetry"] = 0.1 * grf_symmetry

        # 4. Knee bend during stance (energy storage)
        # Running relies on elastic energy → the knees must bend more
        stance_mask = state["foot_contact"] > 0.5
        knee_bend = torch.abs(state["joint_pos"][:, [3, 8]])  # knee joints
        running_knee_reward = stance_mask * torch.clamp(
            knee_bend - 0.2, min=0.0
        )
        rewards["knee_bend"] = 0.1 * torch.sum(running_knee_reward, dim=1)

        return rewards
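
The GRF symmetry term above compares peaks, which only make sense per environment and over a window of steps. The following is a minimal sketch of one way to track them, assuming the same `contact_forces` layout of `(num_envs, 2, 3)`; `PeakGRFTracker` is a hypothetical helper, not part of the original training code.

```python
import torch


class PeakGRFTracker:
    """Tracks the per-env, per-foot peak vertical GRF over a window."""

    def __init__(self, num_envs: int, device: str = "cpu"):
        # Column 0 = left foot, column 1 = right foot
        self.peaks = torch.zeros(num_envs, 2, device=device)

    def update(self, contact_forces: torch.Tensor) -> None:
        # contact_forces: (num_envs, 2, 3); keep the running max of z
        self.peaks = torch.maximum(self.peaks, contact_forces[:, :, 2])

    def symmetry(self) -> torch.Tensor:
        # 1.0 = identical left/right peaks, ~0.0 = only one foot loaded
        left, right = self.peaks[:, 0], self.peaks[:, 1]
        return 1.0 - torch.abs(left - right) / (left + right + 1e-6)

    def reset(self, env_ids=None) -> None:
        # Call at the start of each window (e.g. once per gait cycle)
        if env_ids is None:
            self.peaks.zero_()
        else:
            self.peaks[env_ids] = 0.0


tracker = PeakGRFTracker(num_envs=2)
forces = torch.zeros(2, 2, 3)
forces[0, 0, 2] = 1000.0  # env 0: both feet peak at 1000 N
forces[0, 1, 2] = 1000.0
forces[1, 0, 2] = 1000.0  # env 1: only the left foot is ever loaded
tracker.update(forces)
sym = tracker.symmetry()
print(sym)
```

The result has one value per environment, so it can be multiplied by the 0.1 weight and added to the reward dictionary without being shared across the batch.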

Command-Conditioned Multi-Gait Policy

Instead of training several separate policies, we train a single policy that can walk, run, turn, and sidestep, depending on the velocity command.

Extended Command Space

class ExtendedVelocityCommand:
    """
    Extended velocity command cho multi-gait policy.
    """

    def __init__(self):
        # Command ranges
        self.ranges = {
            "lin_vel_x": (-0.5, 3.5),    # Backward → Running
            "lin_vel_y": (-0.8, 0.8),     # Lateral walking
            "ang_vel_z": (-1.5, 1.5),     # Sharp turning
        }

        # Curriculum: gradually widen the ranges
        self.curriculum_factor = 0.0       # 0→1 over training

    def sample_command(self, num_envs, device="cuda"):
        """Sample random velocity commands."""
        # Apply the curriculum
        factor = self.curriculum_factor

        commands = torch.zeros(num_envs, 3, device=device)

        # Forward/backward velocity
        commands[:, 0] = torch.empty(num_envs, device=device).uniform_(
            self.ranges["lin_vel_x"][0] * factor,
            self.ranges["lin_vel_x"][1] * factor + (1 - factor) * 1.0,
        )

        # Lateral velocity
        commands[:, 1] = torch.empty(num_envs, device=device).uniform_(
            self.ranges["lin_vel_y"][0] * factor,
            self.ranges["lin_vel_y"][1] * factor,
        )

        # Yaw rate
        commands[:, 2] = torch.empty(num_envs, device=device).uniform_(
            self.ranges["ang_vel_z"][0] * factor,
            self.ranges["ang_vel_z"][1] * factor,
        )

        # 20% chance of zero command (standing)
        zero_mask = torch.rand(num_envs, device=device) < 0.2
        commands[zero_mask] = 0.0

        return commands

    def update_curriculum(self, iteration, total_iterations=10000):
        """Gradually widen the command ranges."""
        self.curriculum_factor = min(iteration / (total_iterations * 0.5), 1.0)
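
To see what the curriculum does to the sampling bounds, the sketch below recomputes them in plain Python for a few iterations. It mirrors the arithmetic in `sample_command` and `update_curriculum`; the function names `curriculum_factor` and `effective_ranges` are introduced here for illustration only.

```python
RANGES = {
    "lin_vel_x": (-0.5, 3.5),
    "lin_vel_y": (-0.8, 0.8),
    "ang_vel_z": (-1.5, 1.5),
}


def curriculum_factor(iteration: int, total_iterations: int = 10000) -> float:
    """Linear ramp that reaches 1.0 halfway through training."""
    return min(iteration / (total_iterations * 0.5), 1.0)


def effective_ranges(factor: float) -> dict:
    """Sampling bounds used by sample_command for a given factor."""
    lo_x, hi_x = RANGES["lin_vel_x"]
    # Forward velocity keeps a 1.0 m/s upper bound at factor 0
    out = {"lin_vel_x": (lo_x * factor, hi_x * factor + (1 - factor) * 1.0)}
    for key in ("lin_vel_y", "ang_vel_z"):
        lo, hi = RANGES[key]
        out[key] = (lo * factor, hi * factor)
    return out


for it in (0, 2500, 5000, 10000):
    f = curriculum_factor(it)
    print(it, f, effective_ranges(f))
```

At iteration 0 the lateral and yaw ranges collapse to zero, so the policy initially sees only forward commands up to 1 m/s; the full ranges are reached at the halfway point and held from there.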

Multi-Gait Reward Function

class MultiGaitReward:
    """
    Reward function that adapts to the commanded speed.
    Low speed → walking rewards. High speed → running rewards.
    """

    def compute(self, state, action, prev_action, command):
        rewards = {}
        cmd_speed = torch.abs(command[:, 0])

        # === Velocity tracking (universal) ===
        vel_error = torch.sum(
            torch.square(command[:, :2] - state["base_lin_vel"][:, :2]),
            dim=1
        )
        rewards["vel_tracking"] = 1.5 * torch.exp(-vel_error / 0.25)

        # === Yaw tracking ===
        yaw_error = torch.square(command[:, 2] - state["base_ang_vel"][:, 2])
        rewards["yaw_tracking"] = 0.8 * torch.exp(-yaw_error / 0.25)

        # === Speed-adaptive rewards ===

        # Contact pattern: walk = alternating, run = flight phases
        # Cast to float so `1 - contact` also works for boolean contact flags
        left_c = state["foot_contact"][:, 0].float()
        right_c = state["foot_contact"][:, 1].float()
        both_air = (1 - left_c) * (1 - right_c)

        # Walking (< 1.5 m/s): penalize both feet in air
        walk_mask = (cmd_speed < 1.5).float()
        rewards["walk_contact"] = -0.5 * walk_mask * both_air

        # Running (> 2.0 m/s): reward flight phase
        run_mask = (cmd_speed > 2.0).float()
        rewards["run_flight"] = 0.3 * run_mask * both_air

        # === Foot clearance (adaptive) ===
        # full_like keeps the constants on the same device as cmd_speed
        target_clearance = torch.where(
            cmd_speed > 2.0,
            torch.full_like(cmd_speed, 0.12),   # Running: higher steps
            torch.full_like(cmd_speed, 0.08),   # Walking: normal
        )
        swing = state["foot_contact"] < 0.5
        clearance = torch.where(
            swing,
            torch.clamp(state["foot_height"] - target_clearance.unsqueeze(1), min=0.0),
            torch.zeros_like(state["foot_height"]),
        )
        rewards["clearance"] = 0.3 * torch.sum(clearance, dim=1)

        # === Feet air time (adaptive) ===
        target_air = torch.where(
            cmd_speed > 2.0,
            torch.full_like(cmd_speed, 0.25),   # Running: shorter
            torch.full_like(cmd_speed, 0.35),   # Walking: longer
        )
        air_error = torch.abs(state["feet_air_time"] - target_air.unsqueeze(1))
        rewards["air_time"] = 0.2 * torch.sum(
            torch.exp(-air_error / 0.1), dim=1
        )

        # === Turning rewards ===
        cmd_yaw = torch.abs(command[:, 2])
        turning_mask = (cmd_yaw > 0.5).float()

        # Allow lower forward speed during sharp turns
        vel_x_error_turning = torch.abs(
            state["base_lin_vel"][:, 0] - command[:, 0] * 0.7
        )
        rewards["turning_vel"] = 0.3 * turning_mask * torch.exp(
            -vel_x_error_turning
        )

        # === Regularization ===
        rewards["action_rate"] = -0.01 * torch.sum(
            torch.square(action - prev_action), dim=1
        )
        rewards["torque"] = -3e-5 * torch.sum(
            torch.square(state["torques"]), dim=1
        )
        rewards["upright"] = -1.5 * torch.sum(
            torch.square(state["projected_gravity"][:, :2]), dim=1
        )
        rewards["termination"] = -200.0 * state["terminated"].float()

        total = sum(rewards.values())
        return total, rewards
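
The two contact terms above leave a gap between 1.5 and 2.0 m/s where neither applies. The scalar sketch below reproduces just those two terms for a single environment to make that band visible; `contact_reward` is a hypothetical helper, and contacts are assumed to be 0.0 (air) or 1.0 (ground).

```python
def contact_reward(cmd_vx: float, left_contact: float, right_contact: float) -> float:
    """Sum of the walk_contact and run_flight terms for one environment."""
    speed = abs(cmd_vx)
    # 1.0 only when both feet are off the ground
    both_air = (1.0 - left_contact) * (1.0 - right_contact)
    walk_term = -0.5 * both_air if speed < 1.5 else 0.0  # penalize flight
    run_term = 0.3 * both_air if speed > 2.0 else 0.0    # reward flight
    return walk_term + run_term


for vx in (1.0, 1.8, 2.5):
    print(f"cmd {vx:.1f} m/s, both feet in air -> {contact_reward(vx, 0.0, 0.0):+.1f}")
```

In the 1.5 to 2.0 m/s band a flight phase is neither penalized nor rewarded, which leaves the policy free to pick whichever contact pattern tracks the velocity best.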

Sharp Turning

Turning is a hard skill for a humanoid because it requires coordinated hip yaw and a lateral weight shift at the same time:

class TurningAnalysis:
    """Analyze turning performance."""

    def evaluate_turning(self, env, policy, yaw_rates=(0.5, 1.0, 1.5)):
        """Evaluate turning at several yaw rates."""

        results = {}
        for target_yaw in yaw_rates:
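            # Command: forward 0.5 m/s + turn
            # NOTE: this tensor still has to be handed to the environment's
            # command interface; how depends on the env API, so that step
            # is not shown here.
            command = torch.tensor([[0.5, 0.0, target_yaw]])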

            actual_yaws = []
            turning_radii = []
            obs = env.reset()

            for step in range(500):  # 10s
                action = policy(obs)
                obs, _, done, info = env.step(action)

                actual_yaw = info["base_ang_vel"][:, 2].mean().item()
                actual_yaws.append(actual_yaw)

                # Compute the turning radius (r = |v / ω|)
                linear_vel = info["base_lin_vel"][:, 0].mean().item()
                if abs(actual_yaw) > 0.01:
                    radius = abs(linear_vel / actual_yaw)
                    turning_radii.append(radius)

            results[target_yaw] = {
                "tracking_error": abs(
                    np.mean(actual_yaws[-100:]) - target_yaw
                ),
                "avg_turning_radius": np.mean(turning_radii[-100:])
                    if turning_radii else float('inf'),
                "stability": np.std(actual_yaws[-100:]),
            }

        # Print results
        print(f"{'Target (rad/s)':<16} {'Error':>8} {'Radius (m)':>11} {'Stability':>10}")
        for yaw, r in results.items():
            print(f"{yaw:<16.1f} {r['tracking_error']:>8.3f} "
                  f"{r['avg_turning_radius']:>10.2f}m {r['stability']:>10.4f}")

        return results
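
For reference, the ideal turning radii for the commands used above follow directly from r = |v / ω|. The sketch below computes them so measured radii can be compared against the target; `turning_radius` is a hypothetical helper that applies the same 0.01 rad/s guard as the evaluation loop.

```python
def turning_radius(forward_speed: float, yaw_rate: float,
                   min_yaw: float = 0.01) -> float:
    """Radius of the circle traced at a given speed and yaw rate."""
    if abs(yaw_rate) <= min_yaw:
        return float("inf")  # effectively walking straight
    return abs(forward_speed / yaw_rate)


FORWARD_SPEED = 0.5  # m/s, same as the evaluation command

for yaw in (0.5, 1.0, 1.5):
    r = turning_radius(FORWARD_SPEED, yaw)
    print(f"yaw {yaw:.1f} rad/s -> ideal radius {r:.2f} m")
```

A measured average radius well above the ideal value means the policy is under-tracking the yaw command, even if it is holding forward speed.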

Adversarial Motion Prior (AMP)

To get a natural-looking running gait, we can use AMP, which uses motion capture data as a prior:

class AMPReward:
    """
    Adversarial Motion Prior cho natural-looking gaits.
    The discriminator distinguishes policy motion from reference motion.
    """

    def __init__(self, reference_motions, obs_dim):
        import torch.nn as nn

        # Discriminator network
        self.discriminator = nn.Sequential(
            nn.Linear(obs_dim * 2, 1024),  # current + next state
            nn.ReLU(),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, 1),
        )

        # Load reference motion clips
        # Format: (num_clips, num_frames, obs_dim)
        self.reference_motions = reference_motions
        self.optimizer = torch.optim.Adam(
            self.discriminator.parameters(), lr=1e-5
        )

    def compute_style_reward(self, current_obs, next_obs):
        """
        Compute the AMP style reward.
        High reward = policy motion looks like reference.
        """
        # Concatenate transition
        transition = torch.cat([current_obs, next_obs], dim=1)

        # Discriminator output
        with torch.no_grad():
            d_output = self.discriminator(transition)

        # Style reward = -log(1 - sigmoid(D)), the log-form GAN reward.
        # (The AMP paper instead pairs its least-squares discriminator
        # with a clipped quadratic reward.)
        # squeeze(-1) keeps the batch dimension even when batch size is 1
        style_reward = -torch.log(
            1 - torch.sigmoid(d_output) + 1e-6
        ).squeeze(-1)

        return style_reward

    def update_discriminator(self, policy_transitions, reference_transitions):
        """
        Train the discriminator to distinguish policy from reference.
        Policy = "fake", Reference = "real".
        """
        # Real transitions from reference motions
        real_output = self.discriminator(reference_transitions)
        # Fake transitions from policy
        fake_output = self.discriminator(policy_transitions.detach())

        # GAN loss (least-squares GAN)
        real_loss = torch.mean(torch.square(real_output - 1))
        fake_loss = torch.mean(torch.square(fake_output))
        loss = 0.5 * (real_loss + fake_loss)

        # Gradient penalty
        alpha = torch.rand(reference_transitions.shape[0], 1,
                          device=reference_transitions.device)
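        # Detach so the penalty does not backpropagate into the policy
        # rollout; both batches must have the same size
        interp = (
            alpha * reference_transitions
            + (1 - alpha) * policy_transitions.detach()
        )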
        interp.requires_grad_(True)
        interp_output = self.discriminator(interp)
        grad = torch.autograd.grad(
            interp_output, interp,
            grad_outputs=torch.ones_like(interp_output),
            create_graph=True
        )[0]
        gp = torch.mean(torch.square(grad.norm(dim=1) - 1))
        loss += 10.0 * gp

        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

        return loss.item()
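
`update_discriminator` expects `reference_transitions` as concatenated (current, next) pairs, while the clips are stored as `(num_clips, num_frames, obs_dim)`. The sketch below shows one way to sample such pairs; `sample_reference_transitions` is a hypothetical helper and uniform sampling over clips and frames is an assumption, not something the training code above prescribes.

```python
import torch


def sample_reference_transitions(reference_motions: torch.Tensor,
                                 batch_size: int) -> torch.Tensor:
    """Sample (s_t, s_t+1) pairs from clips shaped (clips, frames, obs_dim).

    Returns a tensor of shape (batch_size, 2 * obs_dim).
    """
    num_clips, num_frames, _ = reference_motions.shape
    device = reference_motions.device
    clip_idx = torch.randint(0, num_clips, (batch_size,), device=device)
    # Stop one frame early so frame_idx + 1 is always valid
    frame_idx = torch.randint(0, num_frames - 1, (batch_size,), device=device)
    current = reference_motions[clip_idx, frame_idx]
    following = reference_motions[clip_idx, frame_idx + 1]
    return torch.cat([current, following], dim=1)


clips = torch.randn(3, 20, 8)  # 3 dummy clips, 20 frames, obs_dim = 8
batch = sample_reference_transitions(clips, batch_size=16)
print(batch.shape)
```

Sampling the same `batch_size` as the policy batch keeps the two tensors compatible with the interpolation step in the gradient penalty.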

Emergent Behaviors

With a multi-gait policy, the robot often develops emergent behaviors that were not designed directly but arise naturally:

Behavior	Appears when	Explanation
Arm swing	Running > 2 m/s	Policy uses arm momentum to balance
Head bob	Walking 0.5-1.0 m/s	Natural vertical oscillation
Foot rotation	Sharp turning	Pivot foot rotates to reduce friction
Stance widening	Lateral walking	Wider stance for stability
Knee bend deepening	Running acceleration	Energy storage during the stance phase

Full Training Config

# Multi-gait training: ~3 h on an RTX 4090
python source/standalone/workflows/rsl_rl/train.py \
    --task=Isaac-Velocity-Rough-H1-MultiGait-v0 \
    --num_envs=4096 \
    --max_iterations=15000 \
    --headless \
    --logger wandb \
    --wandb_project h1-multigait

# Evaluate multi-gait
python source/standalone/workflows/rsl_rl/play.py \
    --task=Isaac-Velocity-Rough-H1-MultiGait-v0 \
    --num_envs=4 \
    --checkpoint=logs/h1_multigait/model_15000.pt

To learn more about control methods for humanoids, see Humanoid Control Methods. For parkour with legged robots, see Parkour Learning.

Summary

In this post, we extended the H1 policy to dynamic motions:

Running gaits with a flight phase, GRF management, and elastic knee energy
A command-conditioned policy covering walk/run/turn/lateral in a single policy
Sharp turning with coordinated hip yaw and lateral weight shift
AMP for natural-looking gaits using motion capture reference
Emergent behaviors such as arm swing and stance widening

The next post, Unitree H1-2: Enhanced Locomotion, explores the H1-2 with its new hardware and loco-manipulation basics.

References

AMP: Adversarial Motion Priors for Stylized Physics-Based Character Animation. Peng et al., SIGGRAPH 2021
Expressive Whole-Body Control for Humanoid Robots. Cheng et al., RSS 2024
Walk These Ways: Tuning Robot Control for Generalization. Margolis & Agrawal, CoRL 2022
Learning Humanoid Locomotion with Transformers. Radosavovic et al., 2024