The Unitree H1-2 is a significant evolution of the H1: it not only improves locomotion but adds dexterous hands, turning the robot into a complete loco-manipulation platform. In this post, we analyze the hardware changes, adapt the locomotion policy, and begin co-training locomotion with arm tasks.
Building on the dynamic motions developed for the H1, this post extends that work into whole-body coordination.
## H1-2 Hardware: What Changed from H1?
| Parameter | H1 | H1-2 | Improvement |
|---|---|---|---|
| Height | 1.80 m | 1.80 m | Same |
| Weight | 47 kg | 52 kg | +5 kg (arms + hands) |
| Total DOF | 19 | 35+ | +16 DOF (hands) |
| Hands | Simple gripper | Dexterous 5-finger | 6 DOF per hand |
| Wrist DOF | 0 | 2 (roll + pitch) | Increased dexterity |
| Arm actuators | Standard | Upgraded | Torque + bandwidth |
| Leg actuators | Standard | Improved | +15% torque |
| Sensors | IMU, encoders | + force/torque sensors | At wrist |
| Payload | 5 kg | 8 kg | Carry heavier objects |
## Why Hardware Changes Affect Locomotion
1. Added 5 kg of arm/hand mass:
- Center of mass shifts upward, making balance harder
- Inertia tensor changes, requiring retuned PD gains
- Arm motion creates disturbance forces on locomotion
2. Arm dynamics coupling:
- When hands carry objects, CoM shifts
- Arm swing pattern changes, affecting gait
- Policy needs to anticipate arm effects
3. Improved leg actuators:
- +15% torque allows more aggressive locomotion
- Higher bandwidth means faster response
- Can be exploited for high-speed gaits
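To get a feel for the first point, a back-of-the-envelope check shows how much the extra mass raises the combined center of mass. The CoM heights below are illustrative assumptions, not measured values:

```python
# Weighted-average CoM height of the combined system.
# Both heights are illustrative assumptions, not measured values.
m_body, h_body = 47.0, 0.95   # H1 base mass (kg), assumed CoM height (m)
m_arms, h_arms = 5.0, 1.30    # added arm/hand mass, carried near shoulder height

h_com = (m_body * h_body + m_arms * h_arms) / (m_body + m_arms)
shift_cm = (h_com - h_body) * 100
print(f"CoM rises by about {shift_cm:.1f} cm")
```

Even a roughly 3 cm upward shift of the CoM measurably tightens the stability margin, which is why the PD gains and upright rewards need retuning below.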
"""
H1-2 robot configuration.
File: h1_2_env_cfg.py
"""
import omni.isaac.lab.sim as sim_utils
from omni.isaac.lab.assets import ArticulationCfg
class H1_2RobotCfg:
"""Unitree H1-2 with dexterous hands."""
robot = ArticulationCfg(
prim_path="/World/envs/env_.*/Robot",
spawn=sim_utils.UsdFileCfg(
usd_path="datasets/robots/unitree/h1_2/h1_2.usd",
activate_contact_sensors=True,
),
init_state=ArticulationCfg.InitialStateCfg(
pos=(0.0, 0.0, 1.05),
joint_pos={
# Legs (same as H1)
"left_hip_yaw": 0.0,
"left_hip_roll": 0.0,
"left_hip_pitch": -0.2,
"left_knee": 0.4,
"left_ankle": -0.2,
"right_hip_yaw": 0.0,
"right_hip_roll": 0.0,
"right_hip_pitch": -0.2,
"right_knee": 0.4,
"right_ankle": -0.2,
# Torso
"torso": 0.0,
# Arms (upgraded)
"left_shoulder_pitch": 0.3,
"left_shoulder_roll": 0.2,
"left_shoulder_yaw": 0.0,
"left_elbow": 0.5,
"left_wrist_roll": 0.0,
"left_wrist_pitch": 0.0,
"right_shoulder_pitch": 0.3,
"right_shoulder_roll": -0.2,
"right_shoulder_yaw": 0.0,
"right_elbow": 0.5,
"right_wrist_roll": 0.0,
"right_wrist_pitch": 0.0,
# Hands
"left_thumb.*": 0.0,
"left_index.*": 0.0,
"left_middle.*": 0.0,
"right_thumb.*": 0.0,
"right_index.*": 0.0,
"right_middle.*": 0.0,
},
),
actuators={
"legs": sim_utils.DCMotorCfg(
joint_names_expr=[".*hip.*", ".*knee.*", ".*ankle.*"],
stiffness={
".*hip_pitch.*": 230.0,
".*hip_roll.*": 210.0,
".*hip_yaw.*": 170.0,
".*knee.*": 290.0,
".*ankle.*": 46.0,
},
damping={
".*hip.*": 7.0,
".*knee.*": 12.0,
".*ankle.*": 3.0,
},
effort_limit={
".*hip.*": 138.0,
".*knee.*": 230.0,
".*ankle.*": 46.0,
},
),
"arms": sim_utils.DCMotorCfg(
joint_names_expr=[".*shoulder.*", ".*elbow.*", ".*wrist.*"],
stiffness=120.0,
damping=6.0,
effort_limit=80.0,
),
"hands": sim_utils.DCMotorCfg(
joint_names_expr=[".*thumb.*", ".*index.*", ".*middle.*"],
stiffness=5.0,
damping=0.5,
effort_limit=5.0,
),
},
)
## Adapting the Locomotion Policy for H1-2
### Extended Observation Space
```python
import torch


class H1_2ObservationSpace:
    """Observation space for H1-2, including arm state."""

    def compute_observation(self, robot_state, command):
        obs_components = []

        # === Locomotion observations (same as H1) ===
        obs_components.append(robot_state["base_ang_vel"])
        obs_components.append(robot_state["projected_gravity"])
        obs_components.append(command["velocity"])

        # Leg joints only
        leg_joints = robot_state["leg_joint_positions"]
        obs_components.append(leg_joints - robot_state["leg_default_pos"])
        obs_components.append(robot_state["leg_joint_velocities"])
        obs_components.append(robot_state["prev_leg_actions"])

        # === NEW: Arm state (affects balance) ===
        arm_joints = robot_state["arm_joint_positions"]
        obs_components.append(arm_joints)
        arm_vel = robot_state["arm_joint_velocities"]
        obs_components.append(arm_vel * 0.05)

        # === NEW: Payload detection ===
        wrist_forces = robot_state["wrist_forces"]
        obs_components.append(wrist_forces * 0.1)

        obs = torch.cat(obs_components, dim=-1)
        return obs
```
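As a sanity check on the layout above, the sketch below concatenates dummy tensors and verifies the resulting feature dimension. The joint counts (11 leg + torso joints, 12 arm joints, 6 wrist force channels) are illustrative assumptions; the real sizes depend on the robot model:

```python
import torch

num_envs, n_leg, n_arm, n_wrist = 4, 11, 12, 6  # illustrative sizes
parts = [
    torch.zeros(num_envs, 3),        # base angular velocity
    torch.zeros(num_envs, 3),        # projected gravity
    torch.zeros(num_envs, 3),        # velocity command
    torch.zeros(num_envs, n_leg),    # leg joint position offsets
    torch.zeros(num_envs, n_leg),    # leg joint velocities
    torch.zeros(num_envs, n_leg),    # previous leg actions
    torch.zeros(num_envs, n_arm),    # arm joint positions
    torch.zeros(num_envs, n_arm),    # scaled arm joint velocities
    torch.zeros(num_envs, n_wrist),  # scaled wrist forces
]
obs = torch.cat(parts, dim=-1)
assert obs.shape == (num_envs, 9 + 3 * n_leg + 2 * n_arm + n_wrist)
```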
### Reward Adjustments for Added Mass
```python
def compute_h1_2_loco_rewards(state, action, prev_action, command):
    """
    Locomotion rewards for H1-2.

    Different from H1: compensate for arm mass, wider stability margin.
    """
    rewards = {}

    vel_error = torch.sum(
        torch.square(command[:, :2] - state["base_lin_vel"][:, :2]),
        dim=1,
    )
    rewards["vel_tracking"] = 1.5 * torch.exp(-vel_error / 0.25)

    # Slightly lower target height due to added weight
    height_error = torch.square(state["base_height"] - 0.96)
    rewards["height"] = -0.8 * height_error

    # Stronger upright reward (higher CoM + heavier)
    orientation_error = torch.sum(
        torch.square(state["projected_gravity"][:, :2]), dim=1
    )
    rewards["upright"] = -2.0 * orientation_error

    # NEW: Arm disturbance compensation
    arm_accel = state["arm_joint_accelerations"]
    rewards["arm_disturbance"] = -0.005 * torch.sum(
        torch.square(arm_accel), dim=1
    )

    # NEW: CoM stability
    com_vel_z = torch.abs(state["com_velocity"][:, 2])
    rewards["com_stability"] = -0.2 * com_vel_z

    # Standard rewards
    rewards["action_rate"] = -0.01 * torch.sum(
        torch.square(action - prev_action), dim=1
    )
    rewards["torque"] = -3e-5 * torch.sum(
        torch.square(state["leg_torques"]), dim=1
    )
    rewards["termination"] = -200.0 * state["terminated"].float()

    yaw_error = torch.square(command[:, 2] - state["base_ang_vel"][:, 2])
    rewards["yaw_tracking"] = 0.8 * torch.exp(-yaw_error / 0.25)

    total = sum(rewards.values())
    return total, rewards
```
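The exponential tracking terms decay smoothly with squared error. A quick check of the velocity-tracking term, with the same coefficients as in the function above, shows how fast the reward drops off:

```python
import torch

# Squared velocity errors in (m/s)^2 and the resulting tracking reward
vel_error = torch.tensor([0.0, 0.1, 0.25, 1.0])
reward = 1.5 * torch.exp(-vel_error / 0.25)
# Full reward (1.5) at zero error; roughly a third (~0.55) at 0.25 (m/s)^2
```

The 0.25 temperature sets how forgiving the tracking term is; a smaller value would demand tighter velocity tracking before paying out.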
## Loco-Manipulation Basics
### Hierarchical Control Architecture
```python
class LocoManipulationController:
    """
    Hierarchical controller: locomotion + arm control.

    Architecture:
    - Lower level: locomotion policy (pre-trained)
    - Upper level: arm policy (trained from scratch)
    - Coordination: shared observation + reward coupling
    """

    def __init__(self, loco_policy, arm_policy):
        self.loco_policy = loco_policy
        self.arm_policy = arm_policy
        # Phase 2: keep the pre-trained locomotion policy frozen
        for param in self.loco_policy.parameters():
            param.requires_grad = False

    def forward(self, obs):
        loco_obs = obs[:, :61]
        arm_obs = obs[:, 61:]
        leg_actions = self.loco_policy(loco_obs)
        # Arm policy also sees base angular velocity + projected gravity
        arm_actions = self.arm_policy(
            torch.cat([arm_obs, loco_obs[:, :6]], dim=1)
        )
        full_action = torch.cat([leg_actions, arm_actions], dim=1)
        return full_action

    def unfreeze_locomotion(self):
        """Phase 3: fine-tune locomotion together with the arm."""
        for param in self.loco_policy.parameters():
            param.requires_grad = True
```
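The freeze-then-compose pattern can be exercised with stand-in policies. Here single linear layers replace the trained networks, and the arm-observation width and action counts are assumptions for illustration; only the 61-dim locomotion slice matches the slicing above:

```python
import torch
import torch.nn as nn

# Stand-in policies; the real ones are trained MLPs.
loco_policy = nn.Linear(61, 10)     # 61 loco features -> 10 leg actions (assumed)
arm_policy = nn.Linear(28 + 6, 12)  # arm features + 6 base features -> 12 arm actions

# Phase 2: freeze the pre-trained locomotion policy
for p in loco_policy.parameters():
    p.requires_grad = False

obs = torch.zeros(4, 61 + 28)
leg_actions = loco_policy(obs[:, :61])
arm_actions = arm_policy(torch.cat([obs[:, 61:], obs[:, :6]], dim=1))
action = torch.cat([leg_actions, arm_actions], dim=1)  # shape (4, 22)
```

Because only the arm policy's parameters require gradients, backpropagating an arm loss leaves the locomotion weights untouched.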
### Simple Arm Task: Walk + Reach
```python
class WalkAndReachTask:
    """Simple task: walk to a position + reach a hand to a target."""

    def __init__(self):
        self.walk_target = None
        self.reach_target = None

    def reset(self, num_envs, device="cuda"):
        # Walk target: 2-5 m straight ahead
        self.walk_target = torch.zeros(num_envs, 3, device=device)
        self.walk_target[:, 0] = torch.empty(num_envs, device=device).uniform_(2, 5)
        # Reach target: near the walk target, at roughly chest height
        self.reach_target = torch.zeros(num_envs, 3, device=device)
        self.reach_target[:, 0] = self.walk_target[:, 0]
        self.reach_target[:, 1] = torch.empty(num_envs, device=device).uniform_(-0.3, 0.3)
        self.reach_target[:, 2] = 0.8

    def compute_reward(self, state, ee_pos):
        rewards = {}

        # Progress toward the walk target
        dist_to_walk = torch.norm(
            state["base_pos"][:, :2] - self.walk_target[:, :2], dim=1
        )
        rewards["walk_progress"] = 0.5 * torch.exp(-dist_to_walk / 2.0)

        # Reaching only pays off once the base is close to the target
        close_enough = (dist_to_walk < 0.5).float()
        reach_dist = torch.norm(ee_pos - self.reach_target, dim=1)
        rewards["reach"] = 0.8 * close_enough * torch.exp(-reach_dist / 0.1)

        # Penalize arm flailing while still far from the target
        far_from_target = (dist_to_walk > 1.0).float()
        arm_activity = torch.norm(state["arm_joint_velocities"], dim=1)
        rewards["phase_coupling"] = -0.1 * far_from_target * arm_activity

        total = sum(rewards.values())
        return total, rewards
```
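The effect of the close_enough gate is easy to see in isolation: with the same small reach error, the reach reward is zero while the robot is still far from the walk target and only pays out once it arrives. The distances here are made up for illustration:

```python
import torch

dist_to_walk = torch.tensor([3.0, 0.4])  # env 0 far away, env 1 has arrived
reach_dist = torch.tensor([0.05, 0.05])  # both hands equally close to target

close_enough = (dist_to_walk < 0.5).float()
reach_reward = 0.8 * close_enough * torch.exp(-reach_dist / 0.1)
# env 0 earns nothing for reaching; env 1 earns ~0.49
```

This gating keeps the policy from waving the arm around prematurely, which would destabilize the walk for no reward.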
## Co-Training Pipeline
```python
class CoTrainingPipeline:
    """3-phase co-training for loco-manipulation."""

    def train(self, env, total_iterations=20000):
        # Phase 1: Load pre-trained locomotion
        print("Phase 1: Loading pre-trained locomotion policy...")
        loco_policy = self.load_pretrained("h1_walking_policy.pt")
        controller = LocoManipulationController(loco_policy, self.arm_policy)

        # Phase 2: Train arm with frozen locomotion
        print("Phase 2: Training arm policy (frozen legs)...")
        for iteration in range(5000):
            obs = env.get_observations()
            actions = controller.forward(obs)
            obs, reward, done, info = env.step(actions)

            self.arm_optimizer.zero_grad()
            arm_loss = self.compute_arm_loss(reward, controller.arm_policy)
            arm_loss.backward()
            self.arm_optimizer.step()

            if iteration % 500 == 0:
                print(f"  Iter {iteration}: arm_reach_dist="
                      f"{info['reach_distance']:.3f}")

        # Phase 3: Fine-tune both
        print("Phase 3: Co-training legs + arms...")
        controller.unfreeze_locomotion()
        for iteration in range(15000):
            obs = env.get_observations()
            actions = controller.forward(obs)
            obs, reward, done, info = env.step(actions)

            self.joint_optimizer.zero_grad()
            total_loss = self.compute_total_loss(reward, controller)
            total_loss.backward()
            self.joint_optimizer.step()

            if iteration % 500 == 0:
                print(f"  Iter {iteration}: walk_dist={info['walk_distance']:.2f}, "
                      f"reach_dist={info['reach_distance']:.3f}, "
                      f"fall_rate={info['fall_rate']:.2f}")

        print("Co-training complete!")
```
## Evaluation: Locomotion Quality with Active Arms
```python
import numpy as np


def evaluate_loco_with_arm_tasks(env, controller, num_episodes=50):
    """Compare locomotion quality: arms idle vs arms active."""
    conditions = {
        "arms_idle": {"arm_task": False},
        "arms_reaching": {"arm_task": True, "payload": 0},
        "arms_carrying_2kg": {"arm_task": True, "payload": 2.0},
        "arms_carrying_5kg": {"arm_task": True, "payload": 5.0},
    }
    results = {}
    for condition_name, cfg in conditions.items():
        env.configure(cfg)
        vel_errors = []
        fall_count = 0
        for ep in range(num_episodes):
            obs = env.reset()
            for step in range(500):
                action = controller.forward(obs)
                obs, _, done, info = env.step(action)
                vel_errors.append(info["vel_tracking_error"])
                if done and info["fell"]:
                    fall_count += 1
                    break
        results[condition_name] = {
            "vel_error": np.mean(vel_errors),
            "fall_rate": fall_count / num_episodes,
        }

    print(f"{'Condition':<25} {'Vel Error':>10} {'Fall Rate':>10}")
    print("-" * 47)
    for name, r in results.items():
        print(f"{name:<25} {r['vel_error']:>9.3f}m/s {r['fall_rate']*100:>8.1f}%")
    return results
```
Typical results:
| Condition | Vel Error | Fall Rate |
|---|---|---|
| Arms idle | 0.08 m/s | 2% |
| Arms reaching | 0.12 m/s | 5% |
| Arms carrying 2kg | 0.15 m/s | 8% |
| Arms carrying 5kg | 0.22 m/s | 15% |
For more on RL for humanoids, see RL for Humanoid Robots. For detailed manipulation, see the next post Loco-Manipulation.
## Summary
The H1-2 brings new challenges to locomotion:
- The added 5 kg of arm/hand mass changes the dynamics, requiring retuned PD gains and stronger upright rewards
- Arm state in observation helps the locomotion policy anticipate disturbances
- Hierarchical control allows reusing the locomotion policy while training arms separately
- 3-phase co-training: frozen legs, train arm, fine-tune both
- Locomotion quality degrades when the arms are active, especially with payloads above 2 kg
Next post — Loco-Manipulation — dives deep into walking while carrying and manipulating objects.
## References
- HumanPlus: Humanoid Shadowing and Imitation from Humans — Fu et al., 2024
- Humanoid Loco-Manipulation with Learned Whole-Body Control — Chen et al., CoRL 2024
- Expressive Whole-Body Control for Humanoid Robots — Cheng et al., RSS 2024
- Unitree H1-2 Technical Report — Unitree Robotics