Tags: humanoid · unitree-h1 · rl · locomotion · isaac-lab

Unitree H1: Full-size Humanoid Locomotion Training

Train locomotion for the full-size Unitree H1, compare with G1, and adjust rewards and PD gains for a 1.8m humanoid.

Nguyễn Anh Tuấn · March 25, 2026 · 8 min read

The Unitree H1 is a significant leap from the G1 — from 1.27m to 1.8m, from 35kg to 47kg. This size change is not simply a "scale up" — it fundamentally alters the dynamics, requiring a new reward function, new PD gains, and a different training strategy. In this post, we train a walking policy for the H1 and compare it in detail with the G1.

Unitree H1 vs G1: Detailed Comparison

| Parameter | G1 | H1 | Impact |
|---|---|---|---|
| Height | 1.27 m | 1.80 m | Higher CoM, harder to balance |
| Weight | 35 kg | 47 kg | Larger inertia, slower reactions |
| Leg DOF | 12 (6 per side) | 10 (5 per side) | H1 has fewer DOF but stronger actuators |
| Ankle DOF | 2 (roll + pitch) | 1 (pitch only) | H1 has harder lateral balance |
| Hip range | ±30° | ±25° | H1 is more limited |
| Max torque (hip) | 88 Nm | 120 Nm | H1 is stronger but heavier |
| Max torque (knee) | 139 Nm | 200 Nm | Needed to support the extra weight |
| Max speed | ~2 m/s | ~3.3 m/s | H1 is faster when well-trained |

H1-Specific Challenges

1. Higher center of mass: At 1.8m, the moment arm from CoM to feet is much longer. The same small push creates larger torques, requiring faster reactions to maintain balance.

2. Limited ankle design: H1 only has 1 DOF at the ankle (pitch) — no ankle roll. This means the robot cannot push laterally with the ankle and must use hip motion to compensate for lateral balance.

3. Longer swing leg: Longer legs have greater inertia during swing. More torque and precise timing are needed for adequate foot clearance.
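To put numbers on challenge 1, a static inverted-pendulum estimate is enough: holding a lean angle θ requires a restoring torque τ = m·g·l·sin(θ) at the stance foot. The sketch below uses the masses from the table and the target base heights as rough CoM heights (an assumption, not measured values):

```python
import math

def restoring_torque(mass_kg, com_height_m, lean_deg):
    """Torque needed to hold a static lean, modeling the whole body
    as an inverted pendulum: tau = m * g * l * sin(theta)."""
    return mass_kg * 9.81 * com_height_m * math.sin(math.radians(lean_deg))

# Same 5-degree lean disturbance, G1 vs H1
tau_g1 = restoring_torque(35.0, 0.72, 5.0)
tau_h1 = restoring_torque(47.0, 0.98, 5.0)
print(f"G1: {tau_g1:.1f} Nm, H1: {tau_h1:.1f} Nm")  # H1 needs ~1.8x the torque
```

The same small disturbance costs H1 nearly twice the corrective torque, which is why the upright reward and PD gains below are scaled up relative to G1.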

Humanoid robot H1

Configuring H1 in Isaac Lab

"""
Unitree H1 environment configuration.
File: h1_env_cfg.py
"""
import omni.isaac.lab.sim as sim_utils
from omni.isaac.lab.actuators import ImplicitActuatorCfg
from omni.isaac.lab.assets import ArticulationCfg
from omni.isaac.lab.utils import configclass

@configclass
class H1RobotCfg:
    """Unitree H1 robot configuration."""

    robot = ArticulationCfg(
        prim_path="/World/envs/env_.*/Robot",
        spawn=sim_utils.UsdFileCfg(
            usd_path="datasets/robots/unitree/h1/h1.usd",
            activate_contact_sensors=True,
        ),
        init_state=ArticulationCfg.InitialStateCfg(
            pos=(0.0, 0.0, 1.05),       # Higher than G1 (0.75)
            joint_pos={
                # === Left leg (5 DOF) ===
                "left_hip_yaw": 0.0,
                "left_hip_roll": 0.0,
                "left_hip_pitch": -0.2,
                "left_knee": 0.4,
                "left_ankle": -0.2,
                # === Right leg (5 DOF) ===
                "right_hip_yaw": 0.0,
                "right_hip_roll": 0.0,
                "right_hip_pitch": -0.2,
                "right_knee": 0.4,
                "right_ankle": -0.2,
                # === Torso ===
                "torso": 0.0,
                # === Arms ===
                "left_shoulder_pitch": 0.3,
                "left_shoulder_roll": 0.0,
                "left_shoulder_yaw": 0.0,
                "left_elbow": 0.3,
                "right_shoulder_pitch": 0.3,
                "right_shoulder_roll": 0.0,
                "right_shoulder_yaw": 0.0,
                "right_elbow": 0.3,
            },
        ),
        actuators={
            # Implicit PD actuators (the actuator configs live in
            # omni.isaac.lab.actuators, not in sim_utils)
            "legs": ImplicitActuatorCfg(
                joint_names_expr=[
                    ".*hip.*", ".*knee.*", ".*ankle.*"
                ],
                stiffness={
                    ".*hip_yaw.*": 150.0,
                    ".*hip_roll.*": 150.0,
                    ".*hip_pitch.*": 200.0,
                    ".*knee.*": 250.0,
                    ".*ankle.*": 40.0,
                },
                damping={
                    ".*hip_yaw.*": 5.0,
                    ".*hip_roll.*": 5.0,
                    ".*hip_pitch.*": 8.0,
                    ".*knee.*": 10.0,
                    ".*ankle.*": 2.0,
                },
                effort_limit={
                    ".*hip.*": 120.0,
                    ".*knee.*": 200.0,
                    ".*ankle.*": 40.0,
                },
            ),
            "torso": ImplicitActuatorCfg(
                joint_names_expr=["torso"],
                stiffness=300.0,
                damping=10.0,
                effort_limit=200.0,
            ),
            "arms": ImplicitActuatorCfg(
                joint_names_expr=[".*shoulder.*", ".*elbow.*"],
                stiffness=100.0,
                damping=5.0,
                effort_limit=50.0,
            ),
        },
    )
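One quick sanity check on the gains above: with PD control, the position term contributes roughly Kp·error to the commanded torque, so the effort limit caps how much tracking error each joint can fight before saturating. The standalone snippet below reuses the numbers from the config:

```python
# Gains (Kp) and effort limits copied from the leg actuator config above.
leg_gains = {
    "hip_yaw": (150.0, 120.0),
    "hip_roll": (150.0, 120.0),
    "hip_pitch": (200.0, 120.0),
    "knee": (250.0, 200.0),
    "ankle": (40.0, 40.0),
}

def saturation_error(kp, effort_limit):
    """Position error (rad) at which the pure P-term alone
    hits the joint's torque limit."""
    return effort_limit / kp

for joint, (kp, limit) in leg_gains.items():
    print(f"{joint}: P-term saturates at {saturation_error(kp, limit):.2f} rad")
```

The knee saturates at 0.8 rad of error, comfortably beyond typical tracking errors during walking, so the effort limits should rarely clip the PD command.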

Adapting Reward Function for H1

Key Changes from G1

"""
H1-specific reward adjustments.
File: h1_rewards.py
"""

class H1RewardsCfg:
    """Rewards for H1 — adjusted from G1."""

    # === Different target height ===
    base_height = {
        "target_height": 0.98,    # G1: 0.72, H1: 0.98
        "weight": -0.8,
    }

    # === Higher foot clearance ===
    foot_clearance = {
        "min_height": 0.08,        # G1: 0.06, H1: 0.08
        "weight": 0.4,
    }

    # === Lateral balance penalty (missing ankle roll) ===
    lateral_velocity_penalty = {
        "weight": -0.3,
    }

    # === Hip roll compensation reward ===
    hip_roll_balance = {
        "weight": 0.2,
    }

    # === Stride length ===
    stride_length = {
        "target": 0.6,             # G1: 0.4, H1: 0.6
        "weight": 0.15,
    }

    # === Longer feet air time ===
    feet_air_time = {
        "target": 0.35,            # G1: 0.3, H1: 0.35
        "weight": 0.2,
    }

Complete Implementation

import torch

def compute_h1_rewards(state, action, prev_action, command):
    """
    Optimized reward function for H1.
    Key differences from G1 are commented.
    """
    rewards = {}

    # 1. Velocity tracking (same as G1)
    vel_error = torch.sum(
        torch.square(command[:, :2] - state["base_lin_vel"][:, :2]),
        dim=1
    )
    rewards["vel_tracking"] = 1.5 * torch.exp(-vel_error / 0.25)

    # 2. Angular velocity tracking
    yaw_error = torch.square(command[:, 2] - state["base_ang_vel"][:, 2])
    rewards["yaw_tracking"] = 0.8 * torch.exp(-yaw_error / 0.25)

    # 3. Base height (DIFFERENT: higher target, larger weight)
    height_error = torch.square(state["base_height"] - 0.98)
    rewards["height"] = -0.8 * height_error

    # 4. Upright (same concept but more important for H1)
    orientation_error = torch.sum(
        torch.square(state["projected_gravity"][:, :2]), dim=1
    )
    rewards["upright"] = -1.5 * orientation_error  # G1: -1.0

    # 5. Lateral velocity penalty (NEW for H1)
    lat_vel = torch.abs(state["base_lin_vel"][:, 1])
    rewards["lateral_penalty"] = -0.3 * lat_vel

    # 6. Hip roll balance (NEW for H1): reward hip-roll usage when the
    # base tilts laterally, since H1 has no ankle roll to correct with.
    # Columns 1 and 6 index the left/right hip-roll joints.
    hip_roll_activity = torch.abs(state["joint_pos"][:, [1, 6]])
    gravity_lateral = torch.abs(state["projected_gravity"][:, 1])
    rewards["hip_roll_balance"] = 0.2 * torch.sum(
        hip_roll_activity * gravity_lateral.unsqueeze(1), dim=1
    )

    # 7. Foot clearance (DIFFERENT: higher threshold)
    swing_mask = state["foot_contact"] < 0.5
    foot_height = state["foot_height"]
    clearance = torch.where(
        swing_mask,
        torch.clamp(foot_height - 0.08, min=0.0),
        torch.zeros_like(foot_height)
    )
    rewards["clearance"] = 0.4 * torch.sum(clearance, dim=1)

    # 8. Stride length (NEW)
    stride = torch.abs(
        state["foot_positions"][:, 0, 0] - state["foot_positions"][:, 1, 0]
    )
    stride_error = torch.square(stride - 0.6)
    rewards["stride"] = 0.15 * torch.exp(-stride_error / 0.1)

    # 9-13. Regularization (similar to G1, scaled for H1)
    rewards["action_rate"] = -0.01 * torch.sum(
        torch.square(action - prev_action), dim=1
    )
    rewards["torque"] = -3e-5 * torch.sum(
        torch.square(state["torques"]), dim=1
    )
    rewards["joint_accel"] = -1e-4 * torch.sum(
        torch.square(state["joint_accel"]), dim=1
    )
    rewards["termination"] = -200.0 * state["terminated"].float()
    rewards["lin_vel_z"] = -0.5 * torch.square(state["base_lin_vel"][:, 2])

    total = sum(rewards.values())
    return total, rewards
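The tracking terms all use the same exponential kernel, reward = w·exp(−err/σ). A few evaluations in plain Python (no simulator needed) show how sharply the velocity reward falls off with squared error for the weights used above:

```python
import math

def tracking_reward(sq_error, weight=1.5, sigma=0.25):
    """Exponential tracking kernel used for the velocity rewards above."""
    return weight * math.exp(-sq_error / sigma)

for err in (0.0, 0.1, 0.25, 1.0):
    print(f"squared vel error {err:.2f} -> reward {tracking_reward(err):.3f}")
```

With σ = 0.25, the reward drops to about a third of its maximum at 0.25 m²/s² of squared error and is nearly zero at 1.0, so the policy is pushed hard toward tight velocity tracking.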

PD Gains Tuning for H1

PD gains for H1 require careful tuning due to different moments of inertia:

class H1PDGainsTuner:
    """
    Guide for tuning PD gains for H1.
    Principle: Kp ~ proportional to mass * gravity * lever_arm
    """

    def compute_recommended_gains(self):
        """Compute PD gains based on physics."""

        mass_ratio = 47.0 / 35.0   # H1 / G1 total mass
        leg_ratio = 0.9 / 0.6      # H1 / G1 leg length (lever arm)
        # Kp ~ m * g * l, so H1 gains should be roughly
        # mass_ratio * leg_ratio (~2x) the corresponding G1 gains.
        print(f"Suggested Kp scale vs G1: {mass_ratio * leg_ratio:.2f}x")

        gains = {
            "hip_pitch": {
                "kp": 200.0,
                "kd": 8.0,
            },
            "hip_roll": {
                "kp": 180.0,   # Higher than G1 for lateral stability
                "kd": 6.0,
            },
            "hip_yaw": {
                "kp": 150.0,
                "kd": 5.0,
            },
            "knee": {
                "kp": 250.0,
                "kd": 10.0,
            },
            "ankle": {
                "kp": 50.0,
                "kd": 3.0,
            },
        }

        print("=== H1 PD Gains (recommended) ===")
        for joint, g in gains.items():
            kd_ratio = g["kd"] / g["kp"] * 100
            print(f"  {joint}: Kp={g['kp']:.0f}, Kd={g['kd']:.1f} "
                  f"(Kd/Kp={kd_ratio:.1f}%)")

        return gains
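The low Kd/Kp ratios above can be cross-checked against critical damping: a PD loop on a joint with effective inertia I behaves like a second-order system with damping ratio ζ = Kd / (2·√(Kp·I)). The inertia values below are rough illustrative assumptions, not measured H1 numbers:

```python
import math

def damping_ratio(kp, kd, inertia):
    """zeta = kd / (2 * sqrt(kp * I)) for a PD loop on a rigid joint."""
    return kd / (2.0 * math.sqrt(kp * inertia))

# (kp, kd, assumed effective inertia in kg*m^2) -- inertias are guesses
joints = {
    "knee":  (250.0, 10.0, 1.5),
    "ankle": (50.0,  3.0,  0.15),
}
for name, (kp, kd, inertia) in joints.items():
    print(f"{name}: zeta = {damping_ratio(kp, kd, inertia):.2f}")
```

Under these assumptions the joints are underdamped (ζ well below 1), which is common in RL locomotion setups: the policy itself injects additional damping through its actions, and overly stiff damping slows the learned gait.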

Training Pipeline for H1

# Training H1 — takes longer than G1 (~1.5h on RTX 4090)
python source/standalone/workflows/rsl_rl/train.py \
    --task=Isaac-Velocity-Flat-H1-v0 \
    --num_envs=4096 \
    --max_iterations=8000 \
    --headless \
    --logger wandb \
    --wandb_project h1-locomotion

Training comparison

Comparing Learning Curves: H1 vs G1

import matplotlib.pyplot as plt
import numpy as np

def compare_learning_curves():
    """Compare H1 vs G1 training progress.

    Note: the curves are illustrative (exponentials plus noise), shaped
    to match the trends we observed; swap in logged wandb rewards for
    real data.
    """

    iterations = np.arange(0, 8000, 100)

    g1_reward = 20 * (1 - np.exp(-iterations / 1500)) + \
                np.random.normal(0, 0.5, len(iterations))
    g1_reward = np.clip(g1_reward, -50, 25)

    h1_reward = 22 * (1 - np.exp(-iterations / 2500)) + \
                np.random.normal(0, 0.8, len(iterations))
    h1_reward = np.clip(h1_reward, -50, 25)

    fig, axes = plt.subplots(1, 3, figsize=(18, 5))

    axes[0].plot(iterations, g1_reward, label="G1", color="blue", alpha=0.7)
    axes[0].plot(iterations, h1_reward, label="H1", color="red", alpha=0.7)
    axes[0].set_xlabel("Iteration")
    axes[0].set_ylabel("Total Reward")
    axes[0].set_title("Learning Curve Comparison")
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)

    g1_ep = 20 * (1 - np.exp(-iterations / 800))
    h1_ep = 20 * (1 - np.exp(-iterations / 1200))
    axes[1].plot(iterations, g1_ep, label="G1", color="blue")
    axes[1].plot(iterations, h1_ep, label="H1", color="red")
    axes[1].set_xlabel("Iteration")
    axes[1].set_ylabel("Episode Length (s)")
    axes[1].set_title("Survival Time")
    axes[1].legend()

    g1_vel = 1.5 * (1 - np.exp(-iterations / 2000))
    h1_vel = 2.0 * (1 - np.exp(-iterations / 3000))
    axes[2].plot(iterations, g1_vel, label="G1 (max 2.0)", color="blue")
    axes[2].plot(iterations, h1_vel, label="H1 (max 3.3)", color="red")
    axes[2].set_xlabel("Iteration")
    axes[2].set_ylabel("Max Velocity (m/s)")
    axes[2].set_title("Velocity Achievement")
    axes[2].legend()

    plt.tight_layout()
    plt.savefig("h1_vs_g1_learning.png", dpi=150)

compare_learning_curves()

Key Observations

| Metric | G1 | H1 | Explanation |
|---|---|---|---|
| First stable walk | ~500 iter | ~800 iter | H1 harder to balance |
| Convergence | ~3000 iter | ~5000 iter | H1 needs more exploration |
| Max velocity | ~2.0 m/s | ~3.3 m/s | H1 has longer stride |
| CoT (cost of transport) | 0.65 | 0.72 | H1 uses more energy (heavier) |
| Fall recovery | Medium | Hard | High CoM makes recovery difficult |
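Cost of transport (CoT) in the table is the dimensionless P / (m·g·v). A quick estimate of what those numbers mean in watts (the CoT values come from the table; the 1 m/s walking speed is an assumption for illustration):

```python
def mech_power(cot, mass_kg, speed_mps, g=9.81):
    """Power implied by a cost of transport: P = CoT * m * g * v."""
    return cot * mass_kg * g * speed_mps

p_g1 = mech_power(0.65, 35.0, 1.0)   # G1 walking at 1 m/s
p_h1 = mech_power(0.72, 47.0, 1.0)   # H1 walking at 1 m/s
print(f"G1: {p_g1:.0f} W, H1: {p_h1:.0f} W")
```

Even though the CoT gap looks small (0.65 vs 0.72), H1's extra mass means roughly 50% more mechanical power at the same speed.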

For more on the humanoid landscape, see Humanoid Robotics Landscape. For RL locomotion on quadrupeds (a simpler problem), see Quadruped RL Locomotion.

Summary

Training H1 differs from G1 in several important ways:

  1. 1.8m height means higher CoM, requiring stronger upright rewards and larger PD gains
  2. Missing ankle roll must be compensated by hip roll, requiring new reward terms
  3. Longer legs mean 0.6m stride (G1: 0.4m) and 8cm foot clearance (G1: 6cm)
  4. Slower training at 8000 iterations (~1.5h) vs 5000 iterations (1h) for G1
  5. Higher max velocity of 3.3 m/s (G1: 2.0 m/s) when well-trained

Next post — Unitree H1: Running, Turning & Dynamic Motions — pushes the H1 to its limits with running, turning, and dynamic movements.


Nguyễn Anh Tuấn

Robotics & AI Engineer. Building VnRobo — sharing knowledge about robot learning, VLA models, and automation.
