The Unitree H1-2 is a significant evolution of the H1: it not only improves locomotion but adds dexterous hands, turning the robot into a complete loco-manipulation platform. In this post, we analyze the hardware changes, adapt the locomotion policy, and begin co-training locomotion with arm tasks.
Building on the dynamic motions developed for the H1, this post extends that work into whole-body coordination.
## H1-2 Hardware: What Changed from H1?
| Parameter | H1 | H1-2 | Improvement |
|---|---|---|---|
| Height | 1.80 m | 1.80 m | Same |
| Weight | 47 kg | 52 kg | +5 kg (arms + hands) |
| Total DOF | 19 | 35+ | +16 DOF (hands) |
| Hands | Simple gripper | Dexterous 5-finger | 6 DOF per hand |
| Wrist DOF | 0 | 2 (roll + pitch) | Increased dexterity |
| Arm actuators | Standard | Upgraded | Torque + bandwidth |
| Leg actuators | Standard | Improved | +15% torque |
| Sensors | IMU, encoders | + force/torque sensors | At wrist |
| Payload | 5 kg | 8 kg | Carry heavier objects |
## Why Hardware Changes Affect Locomotion
1. Added 5 kg of arm/hand mass:
- Center of mass shifts upward, making balance harder
- Inertia tensor changes, requiring retuned PD gains
- Arm motion creates disturbance forces on locomotion
2. Arm dynamics coupling:
- When hands carry objects, CoM shifts
- Arm swing pattern changes, affecting gait
- Policy needs to anticipate arm effects
3. Improved leg actuators:
- +15% torque allows more aggressive locomotion
- Higher bandwidth means faster response
- Can be exploited for high-speed gaits
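To get a feel for the first point, a back-of-the-envelope check shows how much the extra mass raises the combined center of mass. The CoM heights below are illustrative assumptions, not measured values:

```python
# Weighted-average CoM height of the combined system.
# Both heights are illustrative assumptions, not measured values.
m_body, h_body = 47.0, 0.95   # H1 base mass (kg), assumed CoM height (m)
m_arms, h_arms = 5.0, 1.30    # added arm/hand mass, carried near shoulder height

h_com = (m_body * h_body + m_arms * h_arms) / (m_body + m_arms)
shift_cm = (h_com - h_body) * 100
print(f"CoM rises by about {shift_cm:.1f} cm")
```

Even a roughly 3 cm upward shift of the CoM measurably tightens the stability margin, which is why the PD gains and upright rewards need retuning below.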
"""
H1-2 robot configuration.
File: h1_2_env_cfg.py
"""
import omni.isaac.lab.sim as sim_utils
from omni.isaac.lab.assets import ArticulationCfg
class H1_2RobotCfg:
"""Unitree H1-2 with dexterous hands."""
robot = ArticulationCfg(
prim_path="/World/envs/env_.*/Robot",
spawn=sim_utils.UsdFileCfg(
usd_path="datasets/robots/unitree/h1_2/h1_2.usd",
activate_contact_sensors=True,
),
init_state=ArticulationCfg.InitialStateCfg(
pos=(0.0, 0.0, 1.05),
joint_pos={
# Legs (same as H1)
"left_hip_yaw": 0.0,
"left_hip_roll": 0.0,
"left_hip_pitch": -0.2,
"left_knee": 0.4,
"left_ankle": -0.2,
"right_hip_yaw": 0.0,
"right_hip_roll": 0.0,
"right_hip_pitch": -0.2,
"right_knee": 0.4,
"right_ankle": -0.2,
# Torso
"torso": 0.0,
# Arms (upgraded)
"left_shoulder_pitch": 0.3,
"left_shoulder_roll": 0.2,
"left_shoulder_yaw": 0.0,
"left_elbow": 0.5,
"left_wrist_roll": 0.0,
"left_wrist_pitch": 0.0,
"right_shoulder_pitch": 0.3,
"right_shoulder_roll": -0.2,
"right_shoulder_yaw": 0.0,
"right_elbow": 0.5,
"right_wrist_roll": 0.0,
"right_wrist_pitch": 0.0,
# Hands
"left_thumb.*": 0.0,
"left_index.*": 0.0,
"left_middle.*": 0.0,
"right_thumb.*": 0.0,
"right_index.*": 0.0,
"right_middle.*": 0.0,
},
),
actuators={
"legs": sim_utils.DCMotorCfg(
joint_names_expr=[".*hip.*", ".*knee.*", ".*ankle.*"],
stiffness={
".*hip_pitch.*": 230.0,
".*hip_roll.*": 210.0,
".*hip_yaw.*": 170.0,
".*knee.*": 290.0,
".*ankle.*": 46.0,
},
damping={
".*hip.*": 7.0,
".*knee.*": 12.0,
".*ankle.*": 3.0,
},
effort_limit={
".*hip.*": 138.0,
".*knee.*": 230.0,
".*ankle.*": 46.0,
},
),
"arms": sim_utils.DCMotorCfg(
joint_names_expr=[".*shoulder.*", ".*elbow.*", ".*wrist.*"],
stiffness=120.0,
damping=6.0,
effort_limit=80.0,
),
"hands": sim_utils.DCMotorCfg(
joint_names_expr=[".*thumb.*", ".*index.*", ".*middle.*"],
stiffness=5.0,
damping=0.5,
effort_limit=5.0,
),
},
)
## Adapting the Locomotion Policy for H1-2
### Extended Observation Space
```python
import torch


class H1_2ObservationSpace:
    """Observation space for H1-2, including arm state."""

    def compute_observation(self, robot_state, command):
        obs_components = []

        # === Locomotion observations (same as H1) ===
        obs_components.append(robot_state["base_ang_vel"])
        obs_components.append(robot_state["projected_gravity"])
        obs_components.append(command["velocity"])

        # Leg joints only
        leg_joints = robot_state["leg_joint_positions"]
        obs_components.append(leg_joints - robot_state["leg_default_pos"])
        obs_components.append(robot_state["leg_joint_velocities"])
        obs_components.append(robot_state["prev_leg_actions"])

        # === NEW: Arm state (affects balance) ===
        arm_joints = robot_state["arm_joint_positions"]
        obs_components.append(arm_joints)
        arm_vel = robot_state["arm_joint_velocities"]
        obs_components.append(arm_vel * 0.05)

        # === NEW: Payload detection ===
        wrist_forces = robot_state["wrist_forces"]
        obs_components.append(wrist_forces * 0.1)

        obs = torch.cat(obs_components, dim=-1)
        return obs
```
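As a sanity check on the layout above, the sketch below concatenates dummy tensors and verifies the resulting feature dimension. The joint counts (11 leg + torso joints, 12 arm joints, 6 wrist force channels) are illustrative assumptions; the real sizes depend on the robot model:

```python
import torch

num_envs, n_leg, n_arm, n_wrist = 4, 11, 12, 6  # illustrative sizes
parts = [
    torch.zeros(num_envs, 3),        # base angular velocity
    torch.zeros(num_envs, 3),        # projected gravity
    torch.zeros(num_envs, 3),        # velocity command
    torch.zeros(num_envs, n_leg),    # leg joint position offsets
    torch.zeros(num_envs, n_leg),    # leg joint velocities
    torch.zeros(num_envs, n_leg),    # previous leg actions
    torch.zeros(num_envs, n_arm),    # arm joint positions
    torch.zeros(num_envs, n_arm),    # scaled arm joint velocities
    torch.zeros(num_envs, n_wrist),  # scaled wrist forces
]
obs = torch.cat(parts, dim=-1)
assert obs.shape == (num_envs, 9 + 3 * n_leg + 2 * n_arm + n_wrist)
```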
### Reward Adjustments for Added Mass
```python
def compute_h1_2_loco_rewards(state, action, prev_action, command):
    """
    Locomotion rewards for H1-2.

    Different from H1: compensate for arm mass, wider stability margin.
    """
    rewards = {}

    vel_error = torch.sum(
        torch.square(command[:, :2] - state["base_lin_vel"][:, :2]),
        dim=1,
    )
    rewards["vel_tracking"] = 1.5 * torch.exp(-vel_error / 0.25)

    # Slightly lower target height due to added weight
    height_error = torch.square(state["base_height"] - 0.96)
    rewards["height"] = -0.8 * height_error

    # Stronger upright reward (higher CoM + heavier)
    orientation_error = torch.sum(
        torch.square(state["projected_gravity"][:, :2]), dim=1
    )
    rewards["upright"] = -2.0 * orientation_error

    # NEW: Arm disturbance compensation
    arm_accel = state["arm_joint_accelerations"]
    rewards["arm_disturbance"] = -0.005 * torch.sum(
        torch.square(arm_accel), dim=1
    )

    # NEW: CoM stability
    com_vel_z = torch.abs(state["com_velocity"][:, 2])
    rewards["com_stability"] = -0.2 * com_vel_z

    # Standard rewards
    rewards["action_rate"] = -0.01 * torch.sum(
        torch.square(action - prev_action), dim=1
    )
    rewards["torque"] = -3e-5 * torch.sum(
        torch.square(state["leg_torques"]), dim=1
    )
    rewards["termination"] = -200.0 * state["terminated"].float()

    yaw_error = torch.square(command[:, 2] - state["base_ang_vel"][:, 2])
    rewards["yaw_tracking"] = 0.8 * torch.exp(-yaw_error / 0.25)

    total = sum(rewards.values())
    return total, rewards
```
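The exponential tracking terms decay smoothly with squared error. A quick check of the velocity-tracking term, with the same coefficients as in the function above, shows how fast the reward drops off:

```python
import torch

# Squared velocity errors in (m/s)^2 and the resulting tracking reward
vel_error = torch.tensor([0.0, 0.1, 0.25, 1.0])
reward = 1.5 * torch.exp(-vel_error / 0.25)
# Full reward (1.5) at zero error; roughly a third (~0.55) at 0.25 (m/s)^2
```

The 0.25 temperature sets how forgiving the tracking term is; a smaller value would demand tighter velocity tracking before paying out.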
## Loco-Manipulation Basics
### Hierarchical Control Architecture
```python
class LocoManipulationController:
    """
    Hierarchical controller: locomotion + arm control.

    Architecture:
    - Lower level: locomotion policy (pre-trained)
    - Upper level: arm policy (trained from scratch)
    - Coordination: shared observation + reward coupling
    """

    def __init__(self, loco_policy, arm_policy):
        self.loco_policy = loco_policy
        self.arm_policy = arm_policy
        # Phase 2: keep the pre-trained locomotion policy frozen
        for param in self.loco_policy.parameters():
            param.requires_grad = False

    def forward(self, obs):
        loco_obs = obs[:, :61]
        arm_obs = obs[:, 61:]
        leg_actions = self.loco_policy(loco_obs)
        # Arm policy also sees base angular velocity + projected gravity
        arm_actions = self.arm_policy(
            torch.cat([arm_obs, loco_obs[:, :6]], dim=1)
        )
        full_action = torch.cat([leg_actions, arm_actions], dim=1)
        return full_action

    def unfreeze_locomotion(self):
        """Phase 3: fine-tune locomotion together with the arm."""
        for param in self.loco_policy.parameters():
            param.requires_grad = True
```
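The freeze-then-compose pattern can be exercised with stand-in policies. Here single linear layers replace the trained networks, and the arm-observation width and action counts are assumptions for illustration; only the 61-dim locomotion slice matches the slicing above:

```python
import torch
import torch.nn as nn

# Stand-in policies; the real ones are trained MLPs.
loco_policy = nn.Linear(61, 10)     # 61 loco features -> 10 leg actions (assumed)
arm_policy = nn.Linear(28 + 6, 12)  # arm features + 6 base features -> 12 arm actions

# Phase 2: freeze the pre-trained locomotion policy
for p in loco_policy.parameters():
    p.requires_grad = False

obs = torch.zeros(4, 61 + 28)
leg_actions = loco_policy(obs[:, :61])
arm_actions = arm_policy(torch.cat([obs[:, 61:], obs[:, :6]], dim=1))
action = torch.cat([leg_actions, arm_actions], dim=1)  # shape (4, 22)
```

Because only the arm policy's parameters require gradients, backpropagating an arm loss leaves the locomotion weights untouched.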
### Simple Arm Task: Walk + Reach
```python
class WalkAndReachTask:
    """Simple task: walk to a position + reach a hand to a target."""

    def __init__(self):
        self.walk_target = None
        self.reach_target = None

    def reset(self, num_envs, device="cuda"):
        # Walk target: 2-5 m straight ahead
        self.walk_target = torch.zeros(num_envs, 3, device=device)
        self.walk_target[:, 0] = torch.empty(num_envs, device=device).uniform_(2, 5)
        # Reach target: near the walk target, at roughly chest height
        self.reach_target = torch.zeros(num_envs, 3, device=device)
        self.reach_target[:, 0] = self.walk_target[:, 0]
        self.reach_target[:, 1] = torch.empty(num_envs, device=device).uniform_(-0.3, 0.3)
        self.reach_target[:, 2] = 0.8

    def compute_reward(self, state, ee_pos):
        rewards = {}

        # Progress toward the walk target
        dist_to_walk = torch.norm(
            state["base_pos"][:, :2] - self.walk_target[:, :2], dim=1
        )
        rewards["walk_progress"] = 0.5 * torch.exp(-dist_to_walk / 2.0)

        # Reaching only pays off once the base is close to the target
        close_enough = (dist_to_walk < 0.5).float()
        reach_dist = torch.norm(ee_pos - self.reach_target, dim=1)
        rewards["reach"] = 0.8 * close_enough * torch.exp(-reach_dist / 0.1)

        # Penalize arm flailing while still far from the target
        far_from_target = (dist_to_walk > 1.0).float()
        arm_activity = torch.norm(state["arm_joint_velocities"], dim=1)
        rewards["phase_coupling"] = -0.1 * far_from_target * arm_activity

        total = sum(rewards.values())
        return total, rewards
```
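The effect of the close_enough gate is easy to see in isolation: with the same small reach error, the reach reward is zero while the robot is still far from the walk target and only pays out once it arrives. The distances here are made up for illustration:

```python
import torch

dist_to_walk = torch.tensor([3.0, 0.4])  # env 0 far away, env 1 has arrived
reach_dist = torch.tensor([0.05, 0.05])  # both hands equally close to target

close_enough = (dist_to_walk < 0.5).float()
reach_reward = 0.8 * close_enough * torch.exp(-reach_dist / 0.1)
# env 0 earns nothing for reaching; env 1 earns ~0.49
```

This gating keeps the policy from waving the arm around prematurely, which would destabilize the walk for no reward.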
## Co-Training Pipeline
```python
class CoTrainingPipeline:
    """3-phase co-training for loco-manipulation."""

    def train(self, env, total_iterations=20000):
        # Phase 1: Load pre-trained locomotion
        print("Phase 1: Loading pre-trained locomotion policy...")
        loco_policy = self.load_pretrained("h1_walking_policy.pt")
        controller = LocoManipulationController(loco_policy, self.arm_policy)

        # Phase 2: Train arm with frozen locomotion
        print("Phase 2: Training arm policy (frozen legs)...")
        for iteration in range(5000):
            obs = env.get_observations()
            actions = controller.forward(obs)
            obs, reward, done, info = env.step(actions)

            self.arm_optimizer.zero_grad()
            arm_loss = self.compute_arm_loss(reward, controller.arm_policy)
            arm_loss.backward()
            self.arm_optimizer.step()

            if iteration % 500 == 0:
                print(f"  Iter {iteration}: arm_reach_dist="
                      f"{info['reach_distance']:.3f}")

        # Phase 3: Fine-tune both
        print("Phase 3: Co-training legs + arms...")
        controller.unfreeze_locomotion()
        for iteration in range(15000):
            obs = env.get_observations()
            actions = controller.forward(obs)
            obs, reward, done, info = env.step(actions)

            self.joint_optimizer.zero_grad()
            total_loss = self.compute_total_loss(reward, controller)
            total_loss.backward()
            self.joint_optimizer.step()

            if iteration % 500 == 0:
                print(f"  Iter {iteration}: walk_dist={info['walk_distance']:.2f}, "
                      f"reach_dist={info['reach_distance']:.3f}, "
                      f"fall_rate={info['fall_rate']:.2f}")

        print("Co-training complete!")
```
## Evaluation: Locomotion Quality with Active Arms
```python
import numpy as np


def evaluate_loco_with_arm_tasks(env, controller, num_episodes=50):
    """Compare locomotion quality: arms idle vs arms active."""
    conditions = {
        "arms_idle": {"arm_task": False},
        "arms_reaching": {"arm_task": True, "payload": 0},
        "arms_carrying_2kg": {"arm_task": True, "payload": 2.0},
        "arms_carrying_5kg": {"arm_task": True, "payload": 5.0},
    }
    results = {}
    for condition_name, cfg in conditions.items():
        env.configure(cfg)
        vel_errors = []
        fall_count = 0
        for ep in range(num_episodes):
            obs = env.reset()
            for step in range(500):
                action = controller.forward(obs)
                obs, _, done, info = env.step(action)
                vel_errors.append(info["vel_tracking_error"])
                if done and info["fell"]:
                    fall_count += 1
                    break
        results[condition_name] = {
            "vel_error": np.mean(vel_errors),
            "fall_rate": fall_count / num_episodes,
        }

    print(f"{'Condition':<25} {'Vel Error':>10} {'Fall Rate':>10}")
    print("-" * 47)
    for name, r in results.items():
        print(f"{name:<25} {r['vel_error']:>9.3f}m/s {r['fall_rate']*100:>8.1f}%")
    return results
```
Typical results:
| Condition | Vel Error | Fall Rate |
|---|---|---|
| Arms idle | 0.08 m/s | 2% |
| Arms reaching | 0.12 m/s | 5% |
| Arms carrying 2kg | 0.15 m/s | 8% |
| Arms carrying 5kg | 0.22 m/s | 15% |
For more on RL for humanoids, see RL for Humanoid Robots. For detailed manipulation, see the next post Loco-Manipulation.
## Summary
The H1-2 brings new challenges to locomotion:
- The added 5 kg of arm/hand mass changes the dynamics, requiring retuned PD gains and stronger upright rewards
- Arm state in observation helps the locomotion policy anticipate disturbances
- Hierarchical control allows reusing the locomotion policy while training arms separately
- 3-phase co-training: frozen legs, train arm, fine-tune both
- Locomotion quality degrades when the arms are active, especially with payloads above 2 kg
Next post — Loco-Manipulation — dives deep into walking while carrying and manipulating objects.
## References
- HumanPlus: Humanoid Shadowing and Imitation from Humans — Fu et al., 2024
- Humanoid Loco-Manipulation with Learned Whole-Body Control — Chen et al., CoRL 2024
- Expressive Whole-Body Control for Humanoid Robots — Cheng et al., RSS 2024
- Unitree H1-2 Technical Report — Unitree Robotics