Unitree H1: Full-size Humanoid Locomotion Training

The Unitree H1 is a big step up from the G1: height goes from 1.27 m to 1.8 m and mass from 35 kg to 47 kg. This is not a simple scale-up. The larger body changes the dynamics enough that it needs a retuned reward function, new PD gains, and a different training strategy. In this post we train a walking policy for H1 and compare it with G1 in detail.

Unitree H1 vs G1: Detailed Comparison

Parameter	G1	H1	Impact
Height	1.27 m	1.80 m	Higher CoM → harder to balance
Mass	35 kg	47 kg	Larger inertia → slower response
Leg DOF	12 (6 per leg)	10 (5 per leg)	H1 has fewer DOF but stronger actuators
Ankle DOF	2 (roll + pitch)	1 (pitch only)	H1 has a harder time keeping lateral balance
Hip range	±30°	±25°	H1 is more restricted
Max torque (hip)	88 Nm	120 Nm	H1 is stronger but also heavier
Max torque (knee)	139 Nm	200 Nm	Needed to support the weight
Max speed	~2 m/s	~3.3 m/s	H1 is faster when well trained

Challenges Specific to H1

1. Higher center of mass: At 1.8 m tall, the moment arm from the CoM to the feet is much longer. The same small push therefore produces a larger torque about the stance foot, so the controller has to react sooner to keep its balance.

2. Limited ankle design: H1 has only 1 DOF at the ankle (pitch) and no ankle roll. The robot cannot push sideways with its ankles and has to use the hips to correct lateral balance.

3. Longer swing leg: A longer leg has more inertia during swing, so it takes more torque and more precise timing to achieve good foot clearance.
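
To put a number on the first point, here is a minimal sketch that compares the tipping torque a horizontal push creates about the stance foot. It assumes the push acts at CoM height and uses the base-height targets from later in this post (0.72 m for G1, 0.98 m for H1) as a stand-in for CoM height. The 50 N push is a hypothetical disturbance, not a measured value.

```python
def tipping_torque(push_force_n: float, com_height_m: float) -> float:
    """Torque about the stance foot from a horizontal push applied at the CoM."""
    return push_force_n * com_height_m

# Assumption: base-height targets used as a proxy for CoM height.
COM_HEIGHT_M = {"G1": 0.72, "H1": 0.98}
PUSH_FORCE_N = 50.0  # hypothetical disturbance

torques = {name: tipping_torque(PUSH_FORCE_N, h) for name, h in COM_HEIGHT_M.items()}
for name, tau in torques.items():
    print(f"{name}: {tau:.1f} N·m")

ratio = torques["H1"] / torques["G1"]
print(f"H1/G1 torque ratio: {ratio:.2f}")
```

Under these assumptions the same push produces roughly a third more tipping torque on H1, and with no ankle roll that torque has to be countered at the hips.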

H1 Configuration in Isaac Lab

"""
Unitree H1 environment configuration.
File: h1_env_cfg.py
"""
import omni.isaac.lab.sim as sim_utils
from omni.isaac.lab.actuators import DCMotorCfg  # actuator configs live here, not in sim_utils
from omni.isaac.lab.assets import ArticulationCfg
from omni.isaac.lab.utils import configclass

@configclass
class H1RobotCfg:
    """Unitree H1 robot configuration."""

    robot = ArticulationCfg(
        prim_path="/World/envs/env_.*/Robot",
        spawn=sim_utils.UsdFileCfg(
            usd_path="datasets/robots/unitree/h1/h1.usd",
            activate_contact_sensors=True,
        ),
        init_state=ArticulationCfg.InitialStateCfg(
            pos=(0.0, 0.0, 1.05),       # Higher than G1 (0.75)
            joint_pos={
                # === Left leg (5 DOF) ===
                "left_hip_yaw": 0.0,
                "left_hip_roll": 0.0,
                "left_hip_pitch": -0.2,   # Hip slightly flexed
                "left_knee": 0.4,          # Knee bent more than on G1
                "left_ankle": -0.2,        # Pitch only
                # === Right leg (5 DOF) ===
                "right_hip_yaw": 0.0,
                "right_hip_roll": 0.0,
                "right_hip_pitch": -0.2,
                "right_knee": 0.4,
                "right_ankle": -0.2,
                # === Torso ===
                "torso": 0.0,
                # === Arms (4 DOF per side) ===
                "left_shoulder_pitch": 0.3,
                "left_shoulder_roll": 0.0,
                "left_shoulder_yaw": 0.0,
                "left_elbow": 0.3,
                "right_shoulder_pitch": 0.3,
                "right_shoulder_roll": 0.0,
                "right_shoulder_yaw": 0.0,
                "right_elbow": 0.3,
            },
        ),
        # Note: DCMotorCfg also expects saturation_effort (the motor's peak torque).
        # It is not set here; take the value from the motor datasheet.
        actuators={
            "legs": DCMotorCfg(
                joint_names_expr=[
                    ".*hip.*", ".*knee.*", ".*ankle.*"
                ],
                stiffness={
                    ".*hip_yaw.*": 150.0,
                    ".*hip_roll.*": 150.0,
                    ".*hip_pitch.*": 200.0,    # Higher because the robot is heavier
                    ".*knee.*": 250.0,
                    ".*ankle.*": 40.0,
                },
                damping={
                    ".*hip_yaw.*": 5.0,
                    ".*hip_roll.*": 5.0,
                    ".*hip_pitch.*": 8.0,
                    ".*knee.*": 10.0,
                    ".*ankle.*": 2.0,
                },
                effort_limit={
                    ".*hip.*": 120.0,          # Nm, stronger than G1
                    ".*knee.*": 200.0,
                    ".*ankle.*": 40.0,
                },
            ),
            "torso": DCMotorCfg(
                joint_names_expr=["torso"],
                stiffness=300.0,
                damping=10.0,
                effort_limit=200.0,
            ),
            "arms": DCMotorCfg(
                joint_names_expr=[".*shoulder.*", ".*elbow.*"],
                stiffness=100.0,
                damping=5.0,
                effort_limit=50.0,
            ),
        },
    )
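
Regex-based actuator groups are easy to get subtly wrong, so it can help to check them before launching the simulator. The sketch below uses only Python's `re` module and the joint names and patterns from the configuration above. It assumes a pattern must match the whole joint name (hence `re.fullmatch`); `groups_for` is a hypothetical helper, not part of Isaac Lab.

```python
import re

JOINT_NAMES = [
    "left_hip_yaw", "left_hip_roll", "left_hip_pitch", "left_knee", "left_ankle",
    "right_hip_yaw", "right_hip_roll", "right_hip_pitch", "right_knee", "right_ankle",
    "torso",
    "left_shoulder_pitch", "left_shoulder_roll", "left_shoulder_yaw", "left_elbow",
    "right_shoulder_pitch", "right_shoulder_roll", "right_shoulder_yaw", "right_elbow",
]

ACTUATOR_PATTERNS = {
    "legs": [".*hip.*", ".*knee.*", ".*ankle.*"],
    "torso": ["torso"],
    "arms": [".*shoulder.*", ".*elbow.*"],
}

def groups_for(joint: str) -> list[str]:
    """Return the actuator groups whose patterns fully match the joint name."""
    return [
        group
        for group, patterns in ACTUATOR_PATTERNS.items()
        if any(re.fullmatch(p, joint) for p in patterns)
    ]

coverage = {joint: groups_for(joint) for joint in JOINT_NAMES}
# A joint is a problem if no group or more than one group claims it.
problems = {j: g for j, g in coverage.items() if len(g) != 1}
print(f"{len(JOINT_NAMES)} joints, {len(problems)} with missing or duplicate coverage")
```

Each of the 19 joints should land in exactly one group. A joint with no group would be left without an actuator model, and a joint in two groups would have ambiguous gains.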

Adjusting the Reward Function for H1

Changes Relative to G1

"""
H1-specific reward adjustments.
File: h1_rewards.py
"""

class H1RewardsCfg:
    """Rewards for H1, adjusted from the G1 settings."""

    # === Different target height ===
    base_height = {
        "target_height": 0.98,    # G1: 0.72, H1: 0.98
        "weight": -0.8,            # Larger weight: the higher CoM makes height more important
    }

    # === Higher foot clearance ===
    foot_clearance = {
        "min_height": 0.08,        # G1: 0.06, H1: 0.08 (longer legs)
        "weight": 0.4,
    }

    # === Lateral balance penalty (because there is no ankle roll) ===
    lateral_velocity_penalty = {
        "weight": -0.3,            # Penalize large lateral velocity
        # G1 has ankle roll, so it does not need a strong penalty
    }

    # === Hip roll compensation reward ===
    hip_roll_balance = {
        "weight": 0.2,
        # Reward hip roll for compensating the missing ankle roll
    }

    # === Stride length reward ===
    stride_length = {
        "target": 0.6,             # G1: 0.4, H1: 0.6 (longer legs)
        "weight": 0.15,
    }

    # === Longer feet air time ===
    feet_air_time = {
        "target": 0.35,            # G1: 0.3, H1: 0.35
        "weight": 0.2,
    }
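
The `feet_air_time` entry has a target and a weight, but the post does not show how the term is computed. One common formulation, sketched here as an assumption rather than the author's method, accumulates each foot's time in the air and pays out `weight * (air_time - target)` at touchdown. `FeetAirTimeTracker` is a hypothetical single-environment helper in plain Python; a batched version would do the same bookkeeping with tensors.

```python
AIR_TIME_TARGET_S = 0.35  # H1RewardsCfg.feet_air_time["target"]
AIR_TIME_WEIGHT = 0.2     # H1RewardsCfg.feet_air_time["weight"]

class FeetAirTimeTracker:
    """Accumulates per-foot air time and rewards it at touchdown."""

    def __init__(self, num_feet: int = 2):
        self.air_time = [0.0] * num_feet

    def step(self, in_contact: list[bool], dt: float) -> float:
        reward = 0.0
        for i, contact in enumerate(in_contact):
            if contact:
                if self.air_time[i] > 0.0:  # foot has just touched down
                    reward += AIR_TIME_WEIGHT * (self.air_time[i] - AIR_TIME_TARGET_S)
                self.air_time[i] = 0.0
            else:
                self.air_time[i] += dt
        return reward

tracker = FeetAirTimeTracker()
swing_total = 0.0
for _ in range(20):  # left foot swings for 20 * 0.02 s = 0.4 s, right foot stays down
    swing_total += tracker.step([False, True], dt=0.02)
touchdown_reward = tracker.step([True, True], dt=0.02)
print(f"reward during swing: {swing_total:.3f}, at touchdown: {touchdown_reward:.3f}")
```

With this formulation a swing longer than the target earns a small positive reward at touchdown, and a swing shorter than the target is penalized, which discourages shuffling steps.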

Full Implementation

import torch

def compute_h1_rewards(state, action, prev_action, command):
    """
    Reward function tuned for H1.
    The main differences from G1 are marked in the comments.
    Note: the feet_air_time term from H1RewardsCfg is not computed here.
    """
    rewards = {}

    # 1. Velocity tracking (same as G1)
    vel_error = torch.sum(
        torch.square(command[:, :2] - state["base_lin_vel"][:, :2]),
        dim=1
    )
    rewards["vel_tracking"] = 1.5 * torch.exp(-vel_error / 0.25)

    # 2. Angular velocity tracking
    yaw_error = torch.square(command[:, 2] - state["base_ang_vel"][:, 2])
    rewards["yaw_tracking"] = 0.8 * torch.exp(-yaw_error / 0.25)

    # 3. Base height (differs from G1: higher target, larger weight)
    height_error = torch.square(state["base_height"] - 0.98)
    rewards["height"] = -0.8 * height_error

    # 4. Upright (same as G1 but weighted more heavily)
    orientation_error = torch.sum(
        torch.square(state["projected_gravity"][:, :2]), dim=1
    )
    rewards["upright"] = -1.5 * orientation_error  # G1: -1.0

    # 5. Lateral velocity penalty (new for H1)
    # H1 has no ankle roll, so its lateral stability is worse
    lat_vel = torch.abs(state["base_lin_vel"][:, 1])
    rewards["lateral_penalty"] = -0.3 * lat_vel

    # 6. Hip roll balance (new for H1)
    # Reward active hip roll that compensates for the missing ankle roll
    # Indices 1 and 6 assume the joint order listed in h1_env_cfg.py;
    # verify them against the joint order reported by the simulator.
    hip_roll_activity = torch.abs(state["joint_pos"][:, [1, 6]])  # hip_roll joints
    gravity_lateral = torch.abs(state["projected_gravity"][:, 1])
    # Reward hip roll responding to lateral tilt
    rewards["hip_roll_balance"] = 0.2 * torch.sum(
        hip_roll_activity * gravity_lateral.unsqueeze(1), dim=1
    )

    # 7. Foot clearance (differs from G1: higher threshold)
    swing_mask = state["foot_contact"] < 0.5
    foot_height = state["foot_height"]
    clearance = torch.where(
        swing_mask,
        torch.clamp(foot_height - 0.08, min=0.0),  # G1: 0.06
        torch.zeros_like(foot_height)
    )
    rewards["clearance"] = 0.4 * torch.sum(clearance, dim=1)

    # 8. Stride length reward (new)
    stride = torch.abs(
        state["foot_positions"][:, 0, 0] - state["foot_positions"][:, 1, 0]
    )
    stride_error = torch.square(stride - 0.6)
    rewards["stride"] = 0.15 * torch.exp(-stride_error / 0.1)

    # 9-13. Regularization (similar to G1, scaled for H1)
    rewards["action_rate"] = -0.01 * torch.sum(
        torch.square(action - prev_action), dim=1
    )
    rewards["torque"] = -3e-5 * torch.sum(
        torch.square(state["torques"]), dim=1
    )
    rewards["joint_accel"] = -1e-4 * torch.sum(
        torch.square(state["joint_accel"]), dim=1
    )
    rewards["termination"] = -200.0 * state["terminated"].float()
    rewards["lin_vel_z"] = -0.5 * torch.square(state["base_lin_vel"][:, 2])

    total = sum(rewards.values())
    return total, rewards
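
The tracking terms use an exponential kernel, `weight * exp(-squared_error / sigma)`, which is bounded and falls off quickly as the error grows. Here is a minimal scalar sketch of the velocity-tracking term, using the same weight (1.5) and scale (0.25) as the function above. `tracking_reward` is a hypothetical helper written in plain Python so it runs without PyTorch.

```python
import math

def tracking_reward(cmd_xy, vel_xy, weight=1.5, sigma=0.25):
    """Exponential tracking kernel: weight * exp(-squared_error / sigma)."""
    sq_err = sum((c - v) ** 2 for c, v in zip(cmd_xy, vel_xy))
    return weight * math.exp(-sq_err / sigma)

command = (1.0, 0.0)  # commanded 1 m/s forward, no lateral motion

perfect = tracking_reward(command, (1.0, 0.0))   # zero error
half_off = tracking_reward(command, (0.5, 0.0))  # 0.5 m/s too slow
stopped = tracking_reward(command, (0.0, 0.0))   # standing still

print(f"perfect={perfect:.3f} half_off={half_off:.3f} stopped={stopped:.3f}")
```

A policy that stands still earns almost nothing from this term, while being 0.5 m/s off still earns about a third of the maximum. That gradient is what pulls the policy toward the commanded velocity early in training.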

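PD Gain Tuning for H1

PD gains for H1 need careful tuning because its moments of inertia differ from G1's. The class below lists recommended values. Its hip_roll and ankle gains (Kp 180, Kd 6 and Kp 50, Kd 3) are higher than those in h1_env_cfg.py above (Kp 150, Kd 5 and Kp 40, Kd 2), so update the config if you adopt them: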

class H1PDGainsTuner:
    """
    Guide for tuning H1 PD gains.
    Rule of thumb: Kp is roughly proportional to mass * gravity * lever_arm
    """

    def compute_recommended_gains(self):
        """Derive PD gains from basic physics."""

        # H1 weighs 47 kg, G1 weighs 35 kg → ratio = 1.34
        mass_ratio = 47.0 / 35.0

        # Leg length ratio: H1 ~0.9 m, G1 ~0.6 m → ratio = 1.5
        leg_ratio = 0.9 / 0.6

        gains = {
            "hip_pitch": {
                # Hip pitch carries a moment of m*g*L_leg
                # H1 moment = 47 * 9.81 * 0.9 = 415 Nm
                # G1 moment = 35 * 9.81 * 0.6 = 206 Nm
                # G1 Kp 150 * mass_ratio * leg_ratio ≈ 300, reduced to 200 because of the torque limit
                "kp": 200.0,
                "kd": 8.0,
            },
            "hip_roll": {
                # More important on H1 because there is no ankle roll
                "kp": 180.0,   # G1: 150, raised for lateral stability
                "kd": 6.0,
            },
            "hip_yaw": {
                "kp": 150.0,
                "kd": 5.0,
            },
            "knee": {
                "kp": 250.0,   # G1: 200, raised because the robot is heavier
                "kd": 10.0,
            },
            "ankle": {
                # The H1 ankle has pitch only → needs to be stiffer
                "kp": 50.0,    # G1: 40
                "kd": 3.0,
            },
        }

        # Print summary
        print("=== H1 PD Gains (recommended) ===")
        for joint, g in gains.items():
            kd_ratio = g["kd"] / g["kp"] * 100
            print(f"  {joint}: Kp={g['kp']:.0f}, Kd={g['kd']:.1f} "
                  f"(Kd/Kp={kd_ratio:.1f}%)")

        return gains

H1PDGainsTuner().compute_recommended_gains()

# Output:
# === H1 PD Gains (recommended) ===
#   hip_pitch: Kp=200, Kd=8.0 (Kd/Kp=4.0%)
#   hip_roll: Kp=180, Kd=6.0 (Kd/Kp=3.3%)
#   hip_yaw: Kp=150, Kd=5.0 (Kd/Kp=3.3%)
#   knee: Kp=250, Kd=10.0 (Kd/Kp=4.0%)
#   ankle: Kp=50, Kd=3.0 (Kd/Kp=6.0%)
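
The scaling rule in the comments can be worked through directly. This sketch recomputes the gravity moments and the scaled hip-pitch Kp from the numbers quoted above. It treats the whole robot as a point mass at leg length from the hip, which is the simplification the comments already make. `scaled_kp` and its `kp_max` cap are hypothetical names used only for illustration.

```python
G = 9.81  # m/s^2

def gravity_moment(mass_kg: float, leg_length_m: float) -> float:
    """Static moment m * g * L about the hip (point-mass assumption)."""
    return mass_kg * G * leg_length_m

def scaled_kp(kp_ref: float, mass_ratio: float, leg_ratio: float, kp_max=None) -> float:
    """Scale a reference Kp by the moment ratio, optionally capping the result."""
    kp = kp_ref * mass_ratio * leg_ratio
    return kp if kp_max is None else min(kp, kp_max)

h1_moment = gravity_moment(47.0, 0.9)
g1_moment = gravity_moment(35.0, 0.6)

mass_ratio = 47.0 / 35.0
leg_ratio = 0.9 / 0.6
kp_uncapped = scaled_kp(150.0, mass_ratio, leg_ratio)
kp_capped = scaled_kp(150.0, mass_ratio, leg_ratio, kp_max=200.0)

print(f"H1 moment: {h1_moment:.0f} N·m, G1 moment: {g1_moment:.0f} N·m")
print(f"hip_pitch Kp: uncapped {kp_uncapped:.0f}, capped {kp_capped:.0f}")
```

The moment roughly doubles from G1 to H1, so pure scaling suggests a Kp near 300. The recommended 200 is a deliberate compromise so the PD output stays within the actuator's torque limit.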

Training Pipeline for H1

# Training H1: takes longer than G1 (~1.5 h on an RTX 4090)
python source/standalone/workflows/rsl_rl/train.py \
    --task=Isaac-Velocity-Flat-H1-v0 \
    --num_envs=4096 \
    --max_iterations=8000 \
    --headless \
    --logger wandb \
    --log_project_name h1-locomotion
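
The numbers in the command imply a rough training budget. The sketch below works out the time per iteration and the total number of environment steps. The rollout length of 24 steps per environment per iteration is an assumption (it is a typical value in PPO runner configs but is not stated in this post), so check it against your own agent config.

```python
NUM_ENVS = 4096
MAX_ITERATIONS = 8000
WALL_CLOCK_HOURS = 1.5        # estimate quoted in the training command comment
STEPS_PER_ENV_PER_ITER = 24   # assumption: rollout length, not given in the post

seconds_per_iter = WALL_CLOCK_HOURS * 3600 / MAX_ITERATIONS
samples_per_iter = NUM_ENVS * STEPS_PER_ENV_PER_ITER
total_samples = samples_per_iter * MAX_ITERATIONS
samples_per_second = samples_per_iter / seconds_per_iter

print(f"{seconds_per_iter:.3f} s per iteration")
print(f"{samples_per_iter:,} samples per iteration")
print(f"{total_samples / 1e6:.0f}M samples in total")
print(f"{samples_per_second:,.0f} samples per second")
```

If your measured time per iteration is far above this estimate, the bottleneck is more likely simulation or logging overhead than the policy update.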

Comparing Learning Curves: H1 vs G1

import matplotlib.pyplot as plt
import numpy as np

def compare_learning_curves():
    """Compare the training progress of H1 and G1."""

    # Synthetic curves shaped like typical results (not measured training logs)
    iterations = np.arange(0, 8000, 100)

    # G1 converges faster
    g1_reward = 20 * (1 - np.exp(-iterations / 1500)) + \
                np.random.normal(0, 0.5, len(iterations))
    g1_reward = np.clip(g1_reward, -50, 25)

    # H1 slower initial learning, higher final performance
    h1_reward = 22 * (1 - np.exp(-iterations / 2500)) + \
                np.random.normal(0, 0.8, len(iterations))
    h1_reward = np.clip(h1_reward, -50, 25)

    fig, axes = plt.subplots(1, 3, figsize=(18, 5))

    # Plot 1: Reward
    axes[0].plot(iterations, g1_reward, label="G1", color="blue", alpha=0.7)
    axes[0].plot(iterations, h1_reward, label="H1", color="red", alpha=0.7)
    axes[0].set_xlabel("Iteration")
    axes[0].set_ylabel("Total Reward")
    axes[0].set_title("Learning Curve Comparison")
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)

    # Plot 2: Episode length
    g1_ep = 20 * (1 - np.exp(-iterations / 800))
    h1_ep = 20 * (1 - np.exp(-iterations / 1200))
    axes[1].plot(iterations, g1_ep, label="G1", color="blue")
    axes[1].plot(iterations, h1_ep, label="H1", color="red")
    axes[1].set_xlabel("Iteration")
    axes[1].set_ylabel("Episode Length (s)")
    axes[1].set_title("Survival Time")
    axes[1].legend()

    # Plot 3: Max velocity achieved
    g1_vel = 1.5 * (1 - np.exp(-iterations / 2000))
    h1_vel = 2.0 * (1 - np.exp(-iterations / 3000))
    axes[2].plot(iterations, g1_vel, label="G1 (max 2.0)", color="blue")
    axes[2].plot(iterations, h1_vel, label="H1 (max 3.3)", color="red")
    axes[2].set_xlabel("Iteration")
    axes[2].set_ylabel("Max Velocity (m/s)")
    axes[2].set_title("Velocity Achievement")
    axes[2].legend()

    plt.tight_layout()
    plt.savefig("h1_vs_g1_learning.png", dpi=150)

compare_learning_curves()

Key observations

Metric	G1	H1	Explanation
First stable walk	~500 iter	~800 iter	H1 is harder to balance
Convergence	~3000 iter	~5000 iter	H1 needs more exploration
Max velocity	~2.0 m/s	~3.3 m/s	H1 has a longer stride
CoT (Cost of Transport)	0.65	0.72	H1 spends more energy (heavier)
Fall recovery	Moderate	Hard	High CoM → harder to recover
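
Cost of Transport is dimensionless: the average power divided by weight times speed, CoT = P / (m g v). The sketch below defines it and inverts it to show what average power the CoT values in the table would imply. The walking speed of 1.0 m/s is a hypothetical choice, since the post does not say at what speed CoT was measured.

```python
G = 9.81  # m/s^2

def cost_of_transport(power_w: float, mass_kg: float, speed_mps: float) -> float:
    """Dimensionless CoT = P / (m * g * v)."""
    return power_w / (mass_kg * G * speed_mps)

def implied_power(cot: float, mass_kg: float, speed_mps: float) -> float:
    """Average power that a given CoT implies at a given speed."""
    return cot * mass_kg * G * speed_mps

SPEED_MPS = 1.0  # hypothetical walking speed

g1_power = implied_power(0.65, 35.0, SPEED_MPS)
h1_power = implied_power(0.72, 47.0, SPEED_MPS)

print(f"G1: {g1_power:.0f} W, H1: {h1_power:.0f} W at {SPEED_MPS} m/s")
```

Because CoT is already normalized by weight, H1's higher value means it spends more energy per unit of weight and distance, on top of the extra power its larger mass requires.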

For more on the humanoid robot landscape, see the post Humanoid Robotics Landscape. For RL locomotion on quadrupeds (a simpler problem), see Quadruped RL Locomotion.

Summary

Training H1 differs from G1 in several important ways:

Height of 1.8 m → high CoM → needs a stronger upright reward and larger PD gains
No ankle roll → must compensate with hip roll → new reward terms
Longer legs → 0.6 m stride (G1: 0.4 m), 8 cm foot clearance (G1: 6 cm)
Slower training → 8000 iterations (~1.5 h) versus 5000 iterations (1 h) for G1
Higher max velocity → 3.3 m/s (G1: 2.0 m/s) when well trained

The next post, Unitree H1: Running, Turning and Dynamic Motions, pushes H1 to its limits: running, turning, and other dynamic motions.
