The Unitree H1-2 is a significant evolution of the H1: it not only improves locomotion but also adds dexterous hands, turning the robot into a complete loco-manipulation platform. In this post, we analyze the hardware changes, adapt the locomotion policy, and begin co-training locomotion with arm tasks.
Continuing from the H1 dynamic-motions post, this one extends to whole-body coordination.
H1-2 Hardware: What's New Compared to the H1?
| Specification | H1 | H1-2 | Improvement |
|---|---|---|---|
| Height | 1.80 m | 1.80 m | Unchanged |
| Weight | 47 kg | 52 kg | +5 kg (arms + hands) |
| Total DOF | 19 | 35+ | +16 DOF (hands) |
| Hands | Simple gripper | Dexterous 5-finger | 6 DOF per hand |
| Wrist DOF | 0 | 2 (roll + pitch) | More dexterity |
| Arm actuators | Standard | Upgraded | Higher torque + bandwidth |
| Leg actuators | Standard | Improved | +15% torque |
| Sensors | IMU, encoders | + force/torque sensors | At the wrists |
| Payload | 5 kg | 8 kg | Can carry heavier objects |
Why do the hardware changes affect locomotion?
1. 5 kg of additional arm/hand mass:
- The center of mass shifts upward → harder to balance
- The inertia tensor changes → PD gains need retuning
- Arm motion applies disturbance forces to the locomotion controller
2. Arm dynamics coupling:
- When a hand holds a heavy object, the center of mass shifts
- The arm swing pattern changes → affects the gait
- The policy must anticipate arm effects
3. Improved leg actuators:
- +15% torque allows more aggressive locomotion
- Higher bandwidth → more responsive
- Can be exploited in high-speed gaits
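To see why 5 kg at arm height matters, here is a back-of-the-envelope CoM estimate. The 0.9 m and 1.3 m heights are illustrative assumptions, not official Unitree numbers:

```python
# Treat the H1 as a 47 kg body with CoM at ~0.9 m, and model the added
# arm/hand hardware as 5 kg concentrated at ~1.3 m (shoulder/arm region).
m_base, h_base = 47.0, 0.9   # kg, m (illustrative)
m_arms, h_arms = 5.0, 1.3    # kg, m (illustrative)

# Combined CoM height is the mass-weighted average of the two heights.
h_com = (m_base * h_base + m_arms * h_arms) / (m_base + m_arms)
shift = h_com - h_base
print(f"CoM height: {h_com:.3f} m, upward shift: {shift * 100:.1f} cm")
```

Even under these rough assumptions the CoM rises by roughly 4 cm, which is why the PD gains and the upright reward weight need retuning.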
"""
H1-2 robot configuration.
File: h1_2_env_cfg.py
"""
import omni.isaac.lab.sim as sim_utils
from omni.isaac.lab.actuators import DCMotorCfg
from omni.isaac.lab.assets import ArticulationCfg
class H1_2RobotCfg:
"""Unitree H1-2 với dexterous hands."""
robot = ArticulationCfg(
prim_path="/World/envs/env_.*/Robot",
spawn=sim_utils.UsdFileCfg(
usd_path="datasets/robots/unitree/h1_2/h1_2.usd",
activate_contact_sensors=True,
),
init_state=ArticulationCfg.InitialStateCfg(
pos=(0.0, 0.0, 1.05),
joint_pos={
# Legs (same as H1)
"left_hip_yaw": 0.0,
"left_hip_roll": 0.0,
"left_hip_pitch": -0.2,
"left_knee": 0.4,
"left_ankle": -0.2,
"right_hip_yaw": 0.0,
"right_hip_roll": 0.0,
"right_hip_pitch": -0.2,
"right_knee": 0.4,
"right_ankle": -0.2,
# Torso
"torso": 0.0,
# Arms (upgraded)
"left_shoulder_pitch": 0.3,
"left_shoulder_roll": 0.2,
"left_shoulder_yaw": 0.0,
"left_elbow": 0.5,
"left_wrist_roll": 0.0, # MỚI
"left_wrist_pitch": 0.0, # MỚI
# Right arm
"right_shoulder_pitch": 0.3,
"right_shoulder_roll": -0.2,
"right_shoulder_yaw": 0.0,
"right_elbow": 0.5,
"right_wrist_roll": 0.0,
"right_wrist_pitch": 0.0,
                # Hands (6 DOF per hand)
"left_thumb.*": 0.0,
"left_index.*": 0.0,
"left_middle.*": 0.0,
"right_thumb.*": 0.0,
"right_index.*": 0.0,
"right_middle.*": 0.0,
},
),
actuators={
"legs": sim_utils.DCMotorCfg(
joint_names_expr=[".*hip.*", ".*knee.*", ".*ankle.*"],
stiffness={
".*hip_pitch.*": 230.0, # +15% so với H1
".*hip_roll.*": 210.0,
".*hip_yaw.*": 170.0,
".*knee.*": 290.0,
".*ankle.*": 46.0,
},
damping={
".*hip.*": 7.0,
".*knee.*": 12.0,
".*ankle.*": 3.0,
},
effort_limit={
".*hip.*": 138.0, # +15% so với H1
".*knee.*": 230.0,
".*ankle.*": 46.0,
},
),
"arms": sim_utils.DCMotorCfg(
joint_names_expr=[".*shoulder.*", ".*elbow.*", ".*wrist.*"],
stiffness=120.0,
damping=6.0,
effort_limit=80.0,
),
"hands": sim_utils.DCMotorCfg(
joint_names_expr=[".*thumb.*", ".*index.*", ".*middle.*"],
stiffness=5.0,
damping=0.5,
effort_limit=5.0,
),
},
)
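The `stiffness`/`damping` values above act as PD gains in the actuator model. A minimal sketch of the resulting knee torque, using the configured gains and effort limit (the position error and velocity values are made up for illustration):

```python
import torch

kp, kd, tau_max = 290.0, 12.0, 230.0  # knee stiffness, damping, effort limit

q_err = torch.tensor([0.10])  # desired minus actual joint position (rad)
qd = torch.tensor([0.50])     # joint velocity (rad/s)

# PD torque, saturated at the actuator's effort limit.
tau = torch.clamp(kp * q_err - kd * qd, -tau_max, tau_max)
print(tau.item())  # 290*0.10 - 12*0.50 = 23.0 Nm, well inside the 230 Nm limit
```

The +15% torque headroom matters mostly at gait events (touchdown, push-off), where the unclamped PD torque can approach the limit.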
Adapting the Locomotion Policy for the H1-2
Expanded observation space
import torch
class H1_2ObservationSpace:
"""
    Observation space for the H1-2, including arm state.
"""
def compute_observation(self, robot_state, command):
obs_components = []
        # === Locomotion observations (same as H1) ===
obs_components.append(robot_state["base_ang_vel"]) # (3,)
obs_components.append(robot_state["projected_gravity"]) # (3,)
obs_components.append(command["velocity"]) # (3,)
# Leg joints only
leg_joints = robot_state["leg_joint_positions"] # (10,)
obs_components.append(leg_joints - robot_state["leg_default_pos"])
obs_components.append(robot_state["leg_joint_velocities"]) # (10,)
# Previous leg actions
obs_components.append(robot_state["prev_leg_actions"]) # (10,)
        # === NEW: Arm state (affects balance) ===
        # The arm configuration affects the CoM → the locomotion policy needs it
arm_joints = robot_state["arm_joint_positions"] # (8,)
obs_components.append(arm_joints)
        # Arm velocities (their motion creates disturbances)
arm_vel = robot_state["arm_joint_velocities"] # (8,)
obs_components.append(arm_vel * 0.05) # Scale down
        # === NEW: Payload detection ===
        # Wrist force/torque sensors detect held objects
wrist_forces = robot_state["wrist_forces"] # (6,) 3 per wrist
obs_components.append(wrist_forces * 0.1)
obs = torch.cat(obs_components, dim=-1)
# Total: 3+3+3+10+10+10+8+8+6 = 61 dims
return obs
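A quick sanity check of the 61-dimension total, using dummy tensors with the per-component sizes listed in the comments above (the dict keys here are just labels, not the real state keys):

```python
import torch

# Per-component observation sizes, as annotated in compute_observation.
component_dims = {
    "base_ang_vel": 3, "projected_gravity": 3, "velocity_command": 3,
    "leg_pos_offset": 10, "leg_vel": 10, "prev_leg_actions": 10,
    "arm_pos": 8, "arm_vel": 8, "wrist_forces": 6,
}

num_envs = 4
obs = torch.cat(
    [torch.zeros(num_envs, d) for d in component_dims.values()], dim=-1
)
print(obs.shape)  # torch.Size([4, 61])
```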
Reward adjustments for the added mass
def compute_h1_2_loco_rewards(state, action, prev_action, command):
"""
    Locomotion rewards for the H1-2.
    Differences vs H1: compensate for the arm mass, wider stability margin.
"""
rewards = {}
# Velocity tracking (same)
vel_error = torch.sum(
torch.square(command[:, :2] - state["base_lin_vel"][:, :2]),
dim=1
)
rewards["vel_tracking"] = 1.5 * torch.exp(-vel_error / 0.25)
    # Base height (DIFFERENT: slightly lower target since the robot is heavier)
height_error = torch.square(state["base_height"] - 0.96)
rewards["height"] = -0.8 * height_error
    # Upright (DIFFERENT: larger weight due to the higher CoM and extra mass)
orientation_error = torch.sum(
torch.square(state["projected_gravity"][:, :2]), dim=1
)
rewards["upright"] = -2.0 * orientation_error # H1: -1.5
    # NEW: Arm disturbance compensation
# Penalize excessive arm acceleration during locomotion
arm_accel = state["arm_joint_accelerations"]
rewards["arm_disturbance"] = -0.005 * torch.sum(
torch.square(arm_accel), dim=1
)
    # NEW: CoM stability
    # Reward smooth CoM velocity (not disturbed by arm motion)
com_vel_z = torch.abs(state["com_velocity"][:, 2])
rewards["com_stability"] = -0.2 * com_vel_z
# Standard rewards
rewards["action_rate"] = -0.01 * torch.sum(
torch.square(action - prev_action), dim=1
)
rewards["torque"] = -3e-5 * torch.sum(
torch.square(state["leg_torques"]), dim=1
)
rewards["termination"] = -200.0 * state["terminated"].float()
# Yaw tracking
yaw_error = torch.square(command[:, 2] - state["base_ang_vel"][:, 2])
rewards["yaw_tracking"] = 0.8 * torch.exp(-yaw_error / 0.25)
total = sum(rewards.values())
return total, rewards
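The tracking terms above use an exponential kernel: the reward is maximal at zero error and decays with temperature 0.25. A standalone sketch of how the velocity-tracking term behaves:

```python
import torch

# Squared velocity-tracking errors for four hypothetical situations.
vel_error = torch.tensor([0.0, 0.1, 0.5, 1.0])

# Same form as rewards["vel_tracking"]: scale 1.5, temperature 0.25.
reward = 1.5 * torch.exp(-vel_error / 0.25)
print(reward)  # monotonically decreasing, max 1.5 at zero error
```

The temperature sets how forgiving the term is: at a squared error of 0.25 the reward has already dropped to 1.5/e, so errors much beyond ~0.5 m/s contribute almost nothing, which keeps the gradient focused on near-success behavior.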
Loco-Manipulation Basics
Hierarchical Control Architecture
class LocoManipulationController:
"""
Hierarchical controller: locomotion + arm control.
    Architecture:
    - Lower level: locomotion policy (already trained)
    - Upper level: arm policy (trained from scratch)
    - Coordination: shared observation + reward coupling
"""
def __init__(self, loco_policy, arm_policy):
        self.loco_policy = loco_policy  # Pre-trained; can be frozen
        self.arm_policy = arm_policy    # Trained from scratch
        # Freeze the locomotion policy initially
for param in self.loco_policy.parameters():
param.requires_grad = False
def forward(self, obs):
"""
Tách observation và chạy 2 policies.
"""
# Tách observation
loco_obs = obs[:, :61] # Locomotion observations
arm_obs = obs[:, 61:] # Arm task observations
# Locomotion actions (legs)
leg_actions = self.loco_policy(loco_obs)
# Arm actions
arm_actions = self.arm_policy(
torch.cat([arm_obs, loco_obs[:, :6]], dim=1) # arm_obs + base state
)
# Combine
full_action = torch.cat([leg_actions, arm_actions], dim=1)
return full_action
def unfreeze_locomotion(self):
"""
        Phase 2: fine-tune locomotion together with the arm.
        Use this when the arm task strongly affects balance.
"""
for param in self.loco_policy.parameters():
param.requires_grad = True
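Freezing works by disabling gradients on the locomotion parameters, exactly as in `__init__` above. A self-contained sketch with stand-in `nn.Linear` networks (the layer sizes are illustrative, not the real policy architecture):

```python
import torch.nn as nn

# Stand-ins: 61-dim loco obs -> 10 leg actions, 30-dim arm obs -> 14 arm actions.
loco_policy = nn.Sequential(nn.Linear(61, 128), nn.Linear(128, 10))
arm_policy = nn.Sequential(nn.Linear(30, 128), nn.Linear(128, 14))

# Phase 2: freeze the legs so only the arm receives gradient updates.
for p in loco_policy.parameters():
    p.requires_grad = False

frozen = sum(p.numel() for p in loco_policy.parameters() if not p.requires_grad)
trainable = sum(p.numel() for p in arm_policy.parameters() if p.requires_grad)
print(f"frozen: {frozen}, trainable: {trainable}")
```

Note that frozen parameters still run in the forward pass; they simply accumulate no gradients, so an optimizer built over both networks would leave the legs untouched.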
Simple Arm Task: Walk + Reach
class WalkAndReachTask:
"""
    Simple task: walk to a target location and reach the hand to a target.
"""
def __init__(self):
self.walk_target = None
self.reach_target = None
def reset(self, num_envs, device="cuda"):
"""Random walk + reach targets."""
        # Walk target: 2-5 m ahead
self.walk_target = torch.zeros(num_envs, 3, device=device)
self.walk_target[:, 0] = torch.empty(num_envs, device=device).uniform_(2, 5)
# Reach target: table height (0.8m), random lateral
self.reach_target = torch.zeros(num_envs, 3, device=device)
self.reach_target[:, 0] = self.walk_target[:, 0] # Same x as walk
self.reach_target[:, 1] = torch.empty(num_envs, device=device).uniform_(-0.3, 0.3)
self.reach_target[:, 2] = 0.8 # Table height
def compute_reward(self, state, ee_pos):
"""
Multi-objective reward: walk + reach.
"""
rewards = {}
# 1. Walk toward target
dist_to_walk = torch.norm(
state["base_pos"][:, :2] - self.walk_target[:, :2], dim=1
)
rewards["walk_progress"] = 0.5 * torch.exp(-dist_to_walk / 2.0)
# 2. Reach target (only when close enough to walk)
close_enough = (dist_to_walk < 0.5).float()
reach_dist = torch.norm(ee_pos - self.reach_target, dim=1)
rewards["reach"] = 0.8 * close_enough * torch.exp(-reach_dist / 0.1)
# 3. Penalize reaching while walking (decouple phases)
far_from_target = (dist_to_walk > 1.0).float()
arm_activity = torch.norm(
state["arm_joint_velocities"], dim=1
)
rewards["phase_coupling"] = -0.1 * far_from_target * arm_activity
total = sum(rewards.values())
return total, rewards
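The `close_enough` and `far_from_target` masks implement the phase gating: the reach reward only pays out near the walk target, and arm activity is penalized while still walking. A standalone sketch with three hypothetical environments:

```python
import torch

# Distances to the walk target for three hypothetical environments:
# arrived (0.2 m), approaching (0.8 m), still far (2.0 m).
dist_to_walk = torch.tensor([0.2, 0.8, 2.0])
reach_dist = torch.tensor([0.05, 0.05, 0.05])  # all hands near the reach target

# Same gating as in compute_reward: reach reward only within 0.5 m.
close_enough = (dist_to_walk < 0.5).float()
reach_reward = 0.8 * close_enough * torch.exp(-reach_dist / 0.1)
print(reach_reward)  # nonzero only for the first environment
```

Only the first environment earns any reach reward, even though all three hands are equally close to the target; this is what keeps the policy from flailing the arm mid-walk.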
Co-Training Pipeline
class CoTrainingPipeline:
"""
    3-phase co-training for loco-manipulation.
"""
def train(self, env, total_iterations=20000):
"""Full training pipeline."""
# Phase 1: Locomotion only (reuse H1 policy)
print("Phase 1: Loading pre-trained locomotion policy...")
loco_policy = self.load_pretrained("h1_walking_policy.pt")
controller = LocoManipulationController(loco_policy, self.arm_policy)
# Phase 2: Train arm with frozen locomotion (5000 iter)
print("Phase 2: Training arm policy (frozen legs)...")
for iteration in range(5000):
obs = env.get_observations()
actions = controller.forward(obs)
obs, reward, done, info = env.step(actions)
            # Only update arm_policy
            self.arm_optimizer.zero_grad()
            arm_loss = self.compute_arm_loss(reward, controller.arm_policy)
            arm_loss.backward()
            self.arm_optimizer.step()
if iteration % 500 == 0:
print(f" Iter {iteration}: arm_reach_dist="
f"{info['reach_distance']:.3f}")
# Phase 3: Fine-tune both (15000 iter)
print("Phase 3: Co-training legs + arms...")
controller.unfreeze_locomotion()
for iteration in range(15000):
obs = env.get_observations()
actions = controller.forward(obs)
obs, reward, done, info = env.step(actions)
            # Update both policies
            self.joint_optimizer.zero_grad()
            total_loss = self.compute_total_loss(reward, controller)
            total_loss.backward()
            self.joint_optimizer.step()
if iteration % 500 == 0:
print(f" Iter {iteration}: walk_dist={info['walk_distance']:.2f}, "
f"reach_dist={info['reach_distance']:.3f}, "
f"fall_rate={info['fall_rate']:.2f}")
print("Co-training complete!")
Evaluation: Locomotion Quality with the Arm Active
import numpy as np

def evaluate_loco_with_arm_tasks(env, controller, num_episodes=50):
    """
    Compare locomotion quality: arms idle vs arms active.
    """
conditions = {
"arms_idle": {"arm_task": False},
"arms_reaching": {"arm_task": True, "payload": 0},
"arms_carrying_2kg": {"arm_task": True, "payload": 2.0},
"arms_carrying_5kg": {"arm_task": True, "payload": 5.0},
}
results = {}
for condition_name, cfg in conditions.items():
env.configure(cfg)
vel_errors = []
fall_count = 0
for ep in range(num_episodes):
obs = env.reset()
for step in range(500):
action = controller.forward(obs)
obs, _, done, info = env.step(action)
vel_errors.append(info["vel_tracking_error"])
if done and info["fell"]:
fall_count += 1
break
results[condition_name] = {
"vel_error": np.mean(vel_errors),
"fall_rate": fall_count / num_episodes,
}
print(f"{'Condition':<25} {'Vel Error':>10} {'Fall Rate':>10}")
print("-" * 47)
    for name, r in results.items():
        print(f"{name:<25} {r['vel_error']:>9.3f}m/s {r['fall_rate']*100:>8.1f}%")
    return results
Typical results:
| Condition | Vel Error | Fall Rate |
|---|---|---|
| Arms idle | 0.08 m/s | 2% |
| Arms reaching | 0.12 m/s | 5% |
| Arms carrying 2kg | 0.15 m/s | 8% |
| Arms carrying 5kg | 0.22 m/s | 15% |
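Reading the table as relative degradation versus the arms-idle baseline makes the trend explicit:

```python
# Velocity-tracking errors from the table above (m/s).
vel_errors = {
    "arms_idle": 0.08,
    "arms_reaching": 0.12,
    "arms_carrying_2kg": 0.15,
    "arms_carrying_5kg": 0.22,
}

baseline = vel_errors["arms_idle"]
degradation = {k: (v - baseline) / baseline for k, v in vel_errors.items()}
for name, d in degradation.items():
    print(f"{name:<20} {d * 100:+6.1f}%")
```

Carrying 5 kg inflates the tracking error by 175% relative to idle arms, which is why the next post treats heavy-payload walking as its own problem.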
To learn more about RL for humanoids, see RL for Humanoid Robots. For manipulation in detail, see Loco-Manipulation in the next post.
Summary
The H1-2 brings new challenges for locomotion:
- +5 kg of arm/hand mass changes the dynamics → retune PD gains, increase the upright reward weight
- Arm state in the observation lets the locomotion policy anticipate disturbances
- Hierarchical control allows reusing the locomotion policy and training the arm separately
- 3-phase co-training: frozen legs → train the arm → fine-tune both
- Locomotion quality degrades when the arm is active, especially with payloads above 2 kg
The next post, Loco-Manipulation, dives into walking while carrying and manipulating objects.
References
- HumanPlus: Humanoid Shadowing and Imitation from Humans — Fu et al., 2024
- Humanoid Loco-Manipulation with Learned Whole-Body Control — Chen et al., CoRL 2024
- Expressive Whole-Body Control for Humanoid Robots — Cheng et al., RSS 2024
- Unitree H1-2 Technical Report — Unitree Robotics