Tags: humanoid, unitree-h1-2, locomotion, rl, dexterous

Unitree H1-2: Enhanced Locomotion with New Hardware

Explore H1-2 with dexterous hands and improved actuators, how hardware changes affect locomotion, and loco-manipulation basics.

Nguyễn Anh Tuấn · March 31, 2026 · 7 min read

The Unitree H1-2 is a significant evolution from the H1 — not only improving locomotion but adding dexterous hands that transform it into a complete loco-manipulation platform. In this post, we analyze the hardware changes, adapt the locomotion policy, and begin co-training locomotion with arm tasks.

Building on the H1's dynamic motions, this post extends that work into whole-body coordination.

H1-2 Hardware: What Changed from H1?

| Parameter | H1 | H1-2 | Improvement |
|---|---|---|---|
| Height | 1.80 m | 1.80 m | Same |
| Weight | 47 kg | 52 kg | +5 kg (arms + hands) |
| Total DOF | 19 | 35+ | +16 DOF (hands) |
| Hands | Simple gripper | Dexterous 5-finger | 6 DOF per hand |
| Wrist DOF | 0 | 2 (roll + pitch) | Increased dexterity |
| Arm actuators | Standard | Upgraded | Higher torque + bandwidth |
| Leg actuators | Standard | Improved | +15% torque |
| Sensors | IMU, encoders | + force/torque sensors | At wrist |
| Payload | 5 kg | 8 kg | Carry heavier objects |

Why Hardware Changes Affect Locomotion

1. Added 5 kg of arm/hand mass: the center of mass sits higher and total inertia grows, so the same disturbance perturbs balance more and the legs must work harder to recover.

2. Arm dynamics coupling: every arm motion produces reaction torques on the torso, so the locomotion policy must reject disturbances that the H1 never generated.

3. Improved leg actuators: the +15% torque headroom lets a retuned policy compensate for the extra mass without saturating the hips and knees.
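To build intuition for point 1, here is a back-of-envelope estimate of how much the added arm mass raises the center of mass. The masses and heights are rough assumptions for illustration, not official H1/H1-2 specs:

```python
# Back-of-envelope: how much does +5 kg of arm/hand mass raise the CoM?
# Masses and heights here are rough assumptions, not official specs.
base_mass = 47.0      # kg, H1 total mass
base_com_z = 0.95     # m, assumed H1 CoM height
added_mass = 5.0      # kg, new arms + hands
added_com_z = 1.30    # m, assumed height of the added mass (around shoulder level)

# Weighted average of the two mass groups
new_com_z = (base_mass * base_com_z + added_mass * added_com_z) / (base_mass + added_mass)
com_shift = new_com_z - base_com_z
print(f"New CoM height: {new_com_z:.3f} m (shift: +{com_shift * 100:.1f} cm)")
```

Even a few centimeters of CoM shift matters: the inverted-pendulum dynamics become more sensitive, which is why the reward section below lowers the target base height and strengthens the upright term.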

"""
H1-2 robot configuration.
File: h1_2_env_cfg.py
"""
import omni.isaac.lab.sim as sim_utils
from omni.isaac.lab.actuators import DCMotorCfg
from omni.isaac.lab.assets import ArticulationCfg

class H1_2RobotCfg:
    """Unitree H1-2 with dexterous hands."""

    robot = ArticulationCfg(
        prim_path="/World/envs/env_.*/Robot",
        spawn=sim_utils.UsdFileCfg(
            usd_path="datasets/robots/unitree/h1_2/h1_2.usd",
            activate_contact_sensors=True,
        ),
        init_state=ArticulationCfg.InitialStateCfg(
            pos=(0.0, 0.0, 1.05),
            joint_pos={
                # Legs (same as H1)
                "left_hip_yaw": 0.0,
                "left_hip_roll": 0.0,
                "left_hip_pitch": -0.2,
                "left_knee": 0.4,
                "left_ankle": -0.2,
                "right_hip_yaw": 0.0,
                "right_hip_roll": 0.0,
                "right_hip_pitch": -0.2,
                "right_knee": 0.4,
                "right_ankle": -0.2,
                # Torso
                "torso": 0.0,
                # Arms (upgraded)
                "left_shoulder_pitch": 0.3,
                "left_shoulder_roll": 0.2,
                "left_shoulder_yaw": 0.0,
                "left_elbow": 0.5,
                "left_wrist_roll": 0.0,
                "left_wrist_pitch": 0.0,
                "right_shoulder_pitch": 0.3,
                "right_shoulder_roll": -0.2,
                "right_shoulder_yaw": 0.0,
                "right_elbow": 0.5,
                "right_wrist_roll": 0.0,
                "right_wrist_pitch": 0.0,
                # Hands
                "left_thumb.*": 0.0,
                "left_index.*": 0.0,
                "left_middle.*": 0.0,
                "right_thumb.*": 0.0,
                "right_index.*": 0.0,
                "right_middle.*": 0.0,
            },
        ),
        actuators={
            "legs": DCMotorCfg(
                joint_names_expr=[".*hip.*", ".*knee.*", ".*ankle.*"],
                stiffness={
                    ".*hip_pitch.*": 230.0,
                    ".*hip_roll.*": 210.0,
                    ".*hip_yaw.*": 170.0,
                    ".*knee.*": 290.0,
                    ".*ankle.*": 46.0,
                },
                damping={
                    ".*hip.*": 7.0,
                    ".*knee.*": 12.0,
                    ".*ankle.*": 3.0,
                },
                effort_limit={
                    ".*hip.*": 138.0,
                    ".*knee.*": 230.0,
                    ".*ankle.*": 46.0,
                },
            ),
            "arms": DCMotorCfg(
                joint_names_expr=[".*shoulder.*", ".*elbow.*", ".*wrist.*"],
                stiffness=120.0,
                damping=6.0,
                effort_limit=80.0,
            ),
            "hands": DCMotorCfg(
                joint_names_expr=[".*thumb.*", ".*index.*", ".*middle.*"],
                stiffness=5.0,
                damping=0.5,
                effort_limit=5.0,
            ),
        },
    )
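The `joint_names_expr` patterns above partition joints into actuator groups by regular expression. A small sketch shows how that matching works; the joint names follow the config above, and the full-match semantics approximate Isaac Lab's pattern resolution:

```python
import re

# Sketch: how regex patterns like ".*hip.*" partition joints into actuator groups.
# Full-match semantics approximate Isaac Lab's joint-name pattern resolution.
joints = ["left_hip_pitch", "left_knee", "left_ankle", "left_shoulder_pitch",
          "left_elbow", "left_wrist_roll", "left_thumb_0"]
groups = {
    "legs": [".*hip.*", ".*knee.*", ".*ankle.*"],
    "arms": [".*shoulder.*", ".*elbow.*", ".*wrist.*"],
    "hands": [".*thumb.*", ".*index.*", ".*middle.*"],
}
# Assign each joint to the first group whose pattern list matches it
assignment = {
    j: next(g for g, pats in groups.items() if any(re.fullmatch(p, j) for p in pats))
    for j in joints
}
print(assignment)
```

Keeping the patterns non-overlapping matters: a joint that matches two groups would get whichever actuator config resolves first.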

[Image: H1-2 humanoid robot with hands]

Adapting the Locomotion Policy for H1-2

Extended Observation Space

import torch

class H1_2ObservationSpace:
    """Observation space for H1-2, including arm state."""

    def compute_observation(self, robot_state, command):
        obs_components = []

        # === Locomotion observations (same as H1) ===
        obs_components.append(robot_state["base_ang_vel"])
        obs_components.append(robot_state["projected_gravity"])
        obs_components.append(command["velocity"])

        # Leg joints only
        leg_joints = robot_state["leg_joint_positions"]
        obs_components.append(leg_joints - robot_state["leg_default_pos"])
        obs_components.append(robot_state["leg_joint_velocities"])
        obs_components.append(robot_state["prev_leg_actions"])

        # === NEW: Arm state (affects balance) ===
        arm_joints = robot_state["arm_joint_positions"]
        obs_components.append(arm_joints)

        arm_vel = robot_state["arm_joint_velocities"]
        obs_components.append(arm_vel * 0.05)  # scaled to normalize magnitudes

        # === NEW: Payload detection ===
        wrist_forces = robot_state["wrist_forces"]
        obs_components.append(wrist_forces * 0.1)  # scaled force/torque readings

        obs = torch.cat(obs_components, dim=-1)
        return obs
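A quick sanity check on the concatenated observation width catches shape bugs early. The joint counts below (10 leg DOF, 12 arm DOF, a 6-D wrist wrench) are illustrative assumptions, not exact H1-2 numbers:

```python
import torch

# Sketch: verify the concatenated observation width with dummy tensors.
# Joint counts and the 6-D wrist wrench are assumptions for illustration.
num_envs = 4
dims = {
    "base_ang_vel": 3, "projected_gravity": 3, "velocity_cmd": 3,
    "leg_pos": 10, "leg_vel": 10, "prev_leg_actions": 10,
    "arm_pos": 12, "arm_vel": 12, "wrist_forces": 6,
}
parts = [torch.zeros(num_envs, d) for d in dims.values()]
obs = torch.cat(parts, dim=-1)
print(obs.shape)  # torch.Size([4, 69])
```

Whatever the true per-robot dimensions are, asserting the final width once at startup is cheaper than debugging a silently misaligned policy input later.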

Reward Adjustments for Added Mass

import torch

def compute_h1_2_loco_rewards(state, action, prev_action, command):
    """
    Locomotion rewards for H1-2.
    Different from H1: compensate for arm mass, wider stability margin.
    """
    rewards = {}

    vel_error = torch.sum(
        torch.square(command[:, :2] - state["base_lin_vel"][:, :2]),
        dim=1
    )
    rewards["vel_tracking"] = 1.5 * torch.exp(-vel_error / 0.25)

    # Slightly lower target height due to added weight
    height_error = torch.square(state["base_height"] - 0.96)
    rewards["height"] = -0.8 * height_error

    # Stronger upright reward (higher CoM + heavier)
    orientation_error = torch.sum(
        torch.square(state["projected_gravity"][:, :2]), dim=1
    )
    rewards["upright"] = -2.0 * orientation_error

    # NEW: Arm disturbance compensation
    arm_accel = state["arm_joint_accelerations"]
    rewards["arm_disturbance"] = -0.005 * torch.sum(
        torch.square(arm_accel), dim=1
    )

    # NEW: CoM stability
    com_vel_z = torch.abs(state["com_velocity"][:, 2])
    rewards["com_stability"] = -0.2 * com_vel_z

    # Standard rewards
    rewards["action_rate"] = -0.01 * torch.sum(
        torch.square(action - prev_action), dim=1
    )
    rewards["torque"] = -3e-5 * torch.sum(
        torch.square(state["leg_torques"]), dim=1
    )
    rewards["termination"] = -200.0 * state["terminated"].float()

    yaw_error = torch.square(command[:, 2] - state["base_ang_vel"][:, 2])
    rewards["yaw_tracking"] = 0.8 * torch.exp(-yaw_error / 0.25)

    total = sum(rewards.values())
    return total, rewards
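The `exp(-error / 0.25)` kernel used for velocity tracking is worth a quick look in isolation: it keeps most of the 1.5 maximum reward for small errors, decays smoothly, and never goes negative, which keeps the tracking term from dominating the penalties:

```python
import torch

# Sketch: how the exp(-error / 0.25) kernel shapes the velocity-tracking reward.
vel_error = torch.tensor([0.0, 0.05, 0.25, 1.0])  # squared velocity error, (m/s)^2
reward = 1.5 * torch.exp(-vel_error / 0.25)
print([round(r, 3) for r in reward.tolist()])  # [1.5, 1.228, 0.552, 0.027]
```

The 0.25 temperature sets how forgiving tracking is; shrinking it sharpens the peak and makes early training harder, since random policies see near-zero reward everywhere.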

Loco-Manipulation Basics

Hierarchical Control Architecture

import torch

class LocoManipulationController:
    """
    Hierarchical controller: locomotion + arm control.

    Architecture:
    - Lower level: Locomotion policy (pre-trained)
    - Upper level: Arm policy (trained new)
    - Coordination: Shared observation + reward coupling
    """

    def __init__(self, loco_policy, arm_policy):
        self.loco_policy = loco_policy
        self.arm_policy = arm_policy

        for param in self.loco_policy.parameters():
            param.requires_grad = False

    def forward(self, obs):
        loco_obs = obs[:, :61]
        arm_obs = obs[:, 61:]

        leg_actions = self.loco_policy(loco_obs)
        arm_actions = self.arm_policy(
            torch.cat([arm_obs, loco_obs[:, :6]], dim=1)
        )

        full_action = torch.cat([leg_actions, arm_actions], dim=1)
        return full_action

    def unfreeze_locomotion(self):
        """Phase 2: Fine-tune locomotion together with arm."""
        for param in self.loco_policy.parameters():
            param.requires_grad = True
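The freeze/unfreeze pattern can be verified in miniature. The sketch below uses tiny `nn.Linear` stand-ins rather than the real policies, and confirms that only arm parameters receive gradients while the legs are frozen:

```python
import torch
import torch.nn as nn

# Sketch with stand-in networks (not the real policies): only arm parameters
# receive gradients in Phase 2 while the locomotion policy stays frozen.
loco_policy = nn.Linear(61, 10)   # stand-in: 61-D loco obs -> 10 leg actions
arm_policy = nn.Linear(20, 12)    # stand-in: 14-D arm obs + 6-D base state -> 12 arm actions

for p in loco_policy.parameters():   # Phase 2: freeze the legs
    p.requires_grad = False

obs = torch.randn(4, 61 + 14)
leg_actions = loco_policy(obs[:, :61])
arm_actions = arm_policy(torch.cat([obs[:, 61:], obs[:, :6]], dim=1))
full_action = torch.cat([leg_actions, arm_actions], dim=1)

full_action.sum().backward()
frozen = all(p.grad is None for p in loco_policy.parameters())
trained = all(p.grad is not None for p in arm_policy.parameters())
print(frozen, trained)  # True True
```

Feeding the arm policy the first 6 dimensions of the locomotion observation (base angular velocity and projected gravity) is what lets it react to balance state rather than moving open-loop.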

Simple Arm Task: Walk + Reach

import torch

class WalkAndReachTask:
    """Simple task: walk to position + reach hand to target."""

    def __init__(self):
        self.walk_target = None
        self.reach_target = None

    def reset(self, num_envs, device="cuda"):
        self.walk_target = torch.zeros(num_envs, 3, device=device)
        self.walk_target[:, 0] = torch.empty(num_envs, device=device).uniform_(2, 5)

        self.reach_target = torch.zeros(num_envs, 3, device=device)
        self.reach_target[:, 0] = self.walk_target[:, 0]
        self.reach_target[:, 1] = torch.empty(num_envs, device=device).uniform_(-0.3, 0.3)
        self.reach_target[:, 2] = 0.8

    def compute_reward(self, state, ee_pos):
        rewards = {}

        dist_to_walk = torch.norm(
            state["base_pos"][:, :2] - self.walk_target[:, :2], dim=1
        )
        rewards["walk_progress"] = 0.5 * torch.exp(-dist_to_walk / 2.0)

        close_enough = (dist_to_walk < 0.5).float()
        reach_dist = torch.norm(ee_pos - self.reach_target, dim=1)
        rewards["reach"] = 0.8 * close_enough * torch.exp(-reach_dist / 0.1)

        far_from_target = (dist_to_walk > 1.0).float()
        arm_activity = torch.norm(state["arm_joint_velocities"], dim=1)
        rewards["phase_coupling"] = -0.1 * far_from_target * arm_activity

        total = sum(rewards.values())
        return total, rewards
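The key design choice in `compute_reward` is the distance gate: the reach term only switches on once the base is within 0.5 m of the walk target, so the robot cannot farm reach reward from across the room. A numeric check:

```python
import torch

# Sketch: the reach reward is gated by proximity to the walk target.
dist_to_walk = torch.tensor([3.0, 0.4])   # env 0 still far, env 1 close
reach_dist = torch.tensor([0.05, 0.05])   # both end-effectors near the target
close_enough = (dist_to_walk < 0.5).float()
reach_reward = 0.8 * close_enough * torch.exp(-reach_dist / 0.1)
print([round(r, 3) for r in reach_reward.tolist()])  # [0.0, 0.485]
```

Together with the `phase_coupling` penalty on arm activity while far from the target, this produces the intended sequencing: walk first, reach second.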

Co-Training Pipeline

class CoTrainingPipeline:
    """3-phase co-training for loco-manipulation."""

    def train(self, env, total_iterations=20000):
        # Phase 1: Load pre-trained locomotion
        print("Phase 1: Loading pre-trained locomotion policy...")
        loco_policy = self.load_pretrained("h1_walking_policy.pt")
        controller = LocoManipulationController(loco_policy, self.arm_policy)

        # Phase 2: Train arm with frozen locomotion
        print("Phase 2: Training arm policy (frozen legs)...")
        for iteration in range(5000):
            obs = env.get_observations()
            actions = controller.forward(obs)
            obs, reward, done, info = env.step(actions)

            self.arm_optimizer.zero_grad()
            arm_loss = self.compute_arm_loss(reward, controller.arm_policy)
            arm_loss.backward()
            self.arm_optimizer.step()

            if iteration % 500 == 0:
                print(f"  Iter {iteration}: arm_reach_dist="
                      f"{info['reach_distance']:.3f}")

        # Phase 3: Fine-tune both
        print("Phase 3: Co-training legs + arms...")
        controller.unfreeze_locomotion()

        for iteration in range(15000):
            obs = env.get_observations()
            actions = controller.forward(obs)
            obs, reward, done, info = env.step(actions)

            self.joint_optimizer.zero_grad()
            total_loss = self.compute_total_loss(reward, controller)
            total_loss.backward()
            self.joint_optimizer.step()

            if iteration % 500 == 0:
                print(f"  Iter {iteration}: walk_dist={info['walk_distance']:.2f}, "
                      f"reach_dist={info['reach_distance']:.3f}, "
                      f"fall_rate={info['fall_rate']:.2f}")

        print("Co-training complete!")
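One detail the pipeline glosses over is how the two optimizers are constructed. A minimal sketch, where the learning rates and the 10x leg/arm LR split are assumptions rather than values from the original pipeline:

```python
import torch
import torch.nn as nn

# Sketch: per-phase optimizer setup (learning rates are assumptions).
# Phase 3 uses a much smaller LR on the pre-trained legs so fine-tuning
# does not destroy the walking skill.
loco_policy = nn.Linear(61, 10)   # stand-in networks
arm_policy = nn.Linear(20, 12)

# Phase 2: optimize only the arm
arm_optimizer = torch.optim.Adam(arm_policy.parameters(), lr=3e-4)

# Phase 3: optimize both, legs at 10x lower LR via per-parameter groups
joint_optimizer = torch.optim.Adam([
    {"params": loco_policy.parameters(), "lr": 3e-5},
    {"params": arm_policy.parameters(), "lr": 3e-4},
])
print(len(joint_optimizer.param_groups))  # 2
```

The asymmetric learning rates are the standard guard against catastrophic forgetting: the arm still has a lot to learn, while the legs should only drift far enough to accommodate it.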

[Image: Whole-body robot coordination]

Evaluation: Locomotion Quality with Active Arms

import numpy as np

def evaluate_loco_with_arm_tasks(env, controller, num_episodes=50):
    """Compare locomotion quality: arms idle vs arms active."""

    conditions = {
        "arms_idle": {"arm_task": False},
        "arms_reaching": {"arm_task": True, "payload": 0},
        "arms_carrying_2kg": {"arm_task": True, "payload": 2.0},
        "arms_carrying_5kg": {"arm_task": True, "payload": 5.0},
    }

    results = {}
    for condition_name, cfg in conditions.items():
        env.configure(cfg)
        vel_errors = []
        fall_count = 0

        for ep in range(num_episodes):
            obs = env.reset()
            for step in range(500):
                action = controller.forward(obs)
                obs, _, done, info = env.step(action)
                vel_errors.append(info["vel_tracking_error"])
                if done and info["fell"]:
                    fall_count += 1
                    break

        results[condition_name] = {
            "vel_error": np.mean(vel_errors),
            "fall_rate": fall_count / num_episodes,
        }

    print(f"{'Condition':<25} {'Vel Error':>10} {'Fall Rate':>10}")
    print("-" * 47)
    for name, r in results.items():
        print(f"{name:<25} {r['vel_error']:>9.3f}m/s {r['fall_rate']*100:>8.1f}%")

Typical results:

| Condition | Vel Error | Fall Rate |
|---|---|---|
| Arms idle | 0.08 m/s | 2% |
| Arms reaching | 0.12 m/s | 5% |
| Arms carrying 2 kg | 0.15 m/s | 8% |
| Arms carrying 5 kg | 0.22 m/s | 15% |

For more on RL for humanoids, see RL for Humanoid Robots. For detailed manipulation, see the next post Loco-Manipulation.

Summary

The H1-2 brings new challenges to locomotion:

  1. +5kg arm/hand mass changes dynamics, requiring retuned PD gains and stronger upright rewards
  2. Arm state in observation helps the locomotion policy anticipate disturbances
  3. Hierarchical control allows reusing the locomotion policy while training arms separately
  4. 3-phase co-training: frozen legs, train arm, fine-tune both
  5. Locomotion quality degrades when arms are active, especially with payload > 2kg

Next post — Loco-Manipulation — dives deep into walking while carrying and manipulating objects.

References

  1. HumanPlus: Humanoid Shadowing and Imitation from Humans — Fu et al., 2024
  2. Humanoid Loco-Manipulation with Learned Whole-Body Control — Chen et al., CoRL 2024
  3. Expressive Whole-Body Control for Humanoid Robots — Cheng et al., RSS 2024
  4. Unitree H1-2 Technical Report — Unitree Robotics
