The Unitree H1-2 is a significant evolution of the H1: it not only improves locomotion but also adds dexterous hands, turning the robot into a complete loco-manipulation platform. In this post, we analyze the hardware changes, adapt the locomotion policy, and begin co-training locomotion with arm tasks.
Continuing from the H1 dynamic-motions post, this one extends to whole-body coordination.
H1-2 Hardware: What's New Compared to the H1?
| Specification | H1 | H1-2 | Improvement |
|---|---|---|---|
| Height | 1.80 m | 1.80 m | Unchanged |
| Weight | 47 kg | 52 kg | +5 kg (arms + hands) |
| Total DOF | 19 | 35+ | +16 DOF (hands) |
| Hands | Simple gripper | Dexterous 5-finger | 6 DOF per hand |
| Wrist DOF | 0 | 2 (roll + pitch) | More dexterity |
| Arm actuators | Standard | Upgraded | Higher torque + bandwidth |
| Leg actuators | Standard | Improved | +15% torque |
| Sensors | IMU, encoders | + force/torque sensors | At the wrists |
| Payload | 5 kg | 8 kg | Can carry heavier objects |
Why do the hardware changes affect locomotion?
1. 5 kg of additional arm/hand mass:
- The center of mass shifts upward → harder to balance
- The inertia tensor changes → PD gains need retuning
- Arm motion applies disturbance forces to the locomotion controller
2. Arm dynamics coupling:
- When a hand holds a heavy object, the center of mass shifts
- The arm swing pattern changes → affects the gait
- The policy must anticipate arm effects
3. Improved leg actuators:
- +15% torque allows more aggressive locomotion
- Higher bandwidth → more responsive
- Can be exploited in high-speed gaits
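To see why 5 kg at arm height matters, here is a back-of-the-envelope CoM estimate. The 0.9 m and 1.3 m heights are illustrative assumptions, not official Unitree numbers:

```python
# Treat the H1 as a 47 kg body with CoM at ~0.9 m, and model the added
# arm/hand hardware as 5 kg concentrated at ~1.3 m (shoulder/arm region).
m_base, h_base = 47.0, 0.9   # kg, m (illustrative)
m_arms, h_arms = 5.0, 1.3    # kg, m (illustrative)

# Combined CoM height is the mass-weighted average of the two heights.
h_com = (m_base * h_base + m_arms * h_arms) / (m_base + m_arms)
shift = h_com - h_base
print(f"CoM height: {h_com:.3f} m, upward shift: {shift * 100:.1f} cm")
```

Even under these rough assumptions the CoM rises by roughly 4 cm, which is why the PD gains and the upright reward weight need retuning.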
"""
H1-2 robot configuration.
File: h1_2_env_cfg.py
"""
import omni.isaac.lab.sim as sim_utils
from omni.isaac.lab.actuators import DCMotorCfg
from omni.isaac.lab.assets import ArticulationCfg
class H1_2RobotCfg:
"""Unitree H1-2 với dexterous hands."""
robot = ArticulationCfg(
prim_path="/World/envs/env_.*/Robot",
spawn=sim_utils.UsdFileCfg(
usd_path="datasets/robots/unitree/h1_2/h1_2.usd",
activate_contact_sensors=True,
),
init_state=ArticulationCfg.InitialStateCfg(
pos=(0.0, 0.0, 1.05),
joint_pos={
# Legs (same as H1)
"left_hip_yaw": 0.0,
"left_hip_roll": 0.0,
"left_hip_pitch": -0.2,
"left_knee": 0.4,
"left_ankle": -0.2,
"right_hip_yaw": 0.0,
"right_hip_roll": 0.0,
"right_hip_pitch": -0.2,
"right_knee": 0.4,
"right_ankle": -0.2,
# Torso
"torso": 0.0,
# Arms (upgraded)
"left_shoulder_pitch": 0.3,
"left_shoulder_roll": 0.2,
"left_shoulder_yaw": 0.0,
"left_elbow": 0.5,
"left_wrist_roll": 0.0, # MỚI
"left_wrist_pitch": 0.0, # MỚI
# Right arm
"right_shoulder_pitch": 0.3,
"right_shoulder_roll": -0.2,
"right_shoulder_yaw": 0.0,
"right_elbow": 0.5,
"right_wrist_roll": 0.0,
"right_wrist_pitch": 0.0,
                # Hands (6 DOF per hand)
"left_thumb.*": 0.0,
"left_index.*": 0.0,
"left_middle.*": 0.0,
"right_thumb.*": 0.0,
"right_index.*": 0.0,
"right_middle.*": 0.0,
},
),
actuators={
"legs": sim_utils.DCMotorCfg(
joint_names_expr=[".*hip.*", ".*knee.*", ".*ankle.*"],
stiffness={
".*hip_pitch.*": 230.0, # +15% so với H1
".*hip_roll.*": 210.0,
".*hip_yaw.*": 170.0,
".*knee.*": 290.0,
".*ankle.*": 46.0,
},
damping={
".*hip.*": 7.0,
".*knee.*": 12.0,
".*ankle.*": 3.0,
},
effort_limit={
".*hip.*": 138.0, # +15% so với H1
".*knee.*": 230.0,
".*ankle.*": 46.0,
},
),
"arms": sim_utils.DCMotorCfg(
joint_names_expr=[".*shoulder.*", ".*elbow.*", ".*wrist.*"],
stiffness=120.0,
damping=6.0,
effort_limit=80.0,
),
"hands": sim_utils.DCMotorCfg(
joint_names_expr=[".*thumb.*", ".*index.*", ".*middle.*"],
stiffness=5.0,
damping=0.5,
effort_limit=5.0,
),
},
)
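The `stiffness`/`damping` values above act as PD gains in the actuator model. A minimal sketch of the resulting knee torque, using the configured gains and effort limit (the position error and velocity values are made up for illustration):

```python
import torch

kp, kd, tau_max = 290.0, 12.0, 230.0  # knee stiffness, damping, effort limit

q_err = torch.tensor([0.10])  # desired minus actual joint position (rad)
qd = torch.tensor([0.50])     # joint velocity (rad/s)

# PD torque, saturated at the actuator's effort limit.
tau = torch.clamp(kp * q_err - kd * qd, -tau_max, tau_max)
print(tau.item())  # 290*0.10 - 12*0.50 = 23.0 Nm, well inside the 230 Nm limit
```

The +15% torque headroom matters mostly at gait events (touchdown, push-off), where the unclamped PD torque can approach the limit.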
Adapting the Locomotion Policy for the H1-2
Expanded observation space
import torch
class H1_2ObservationSpace:
"""
    Observation space for the H1-2, including arm state.
"""
def compute_observation(self, robot_state, command):
obs_components = []
        # === Locomotion observations (same as H1) ===
obs_components.append(robot_state["base_ang_vel"]) # (3,)
obs_components.append(robot_state["projected_gravity"]) # (3,)
obs_components.append(command["velocity"]) # (3,)
# Leg joints only
leg_joints = robot_state["leg_joint_positions"] # (10,)
obs_components.append(leg_joints - robot_state["leg_default_pos"])
obs_components.append(robot_state["leg_joint_velocities"]) # (10,)
# Previous leg actions
obs_components.append(robot_state["prev_leg_actions"]) # (10,)
        # === NEW: Arm state (affects balance) ===
        # The arm configuration affects the CoM → the locomotion policy needs it
arm_joints = robot_state["arm_joint_positions"] # (8,)
obs_components.append(arm_joints)
        # Arm velocities (their motion creates disturbances)
arm_vel = robot_state["arm_joint_velocities"] # (8,)
obs_components.append(arm_vel * 0.05) # Scale down
        # === NEW: Payload detection ===
        # Wrist force/torque sensors detect held objects
wrist_forces = robot_state["wrist_forces"] # (6,) 3 per wrist
obs_components.append(wrist_forces * 0.1)
obs = torch.cat(obs_components, dim=-1)
# Total: 3+3+3+10+10+10+8+8+6 = 61 dims
return obs
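A quick sanity check of the 61-dimension total, using dummy tensors with the per-component sizes listed in the comments above (the dict keys here are just labels, not the real state keys):

```python
import torch

# Per-component observation sizes, as annotated in compute_observation.
component_dims = {
    "base_ang_vel": 3, "projected_gravity": 3, "velocity_command": 3,
    "leg_pos_offset": 10, "leg_vel": 10, "prev_leg_actions": 10,
    "arm_pos": 8, "arm_vel": 8, "wrist_forces": 6,
}

num_envs = 4
obs = torch.cat(
    [torch.zeros(num_envs, d) for d in component_dims.values()], dim=-1
)
print(obs.shape)  # torch.Size([4, 61])
```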
Reward adjustments for the added mass
def compute_h1_2_loco_rewards(state, action, prev_action, command):
"""
    Locomotion rewards for the H1-2.
    Differences vs H1: compensate for the arm mass, wider stability margin.
"""
rewards = {}
# Velocity tracking (same)
vel_error = torch.sum(
torch.square(command[:, :2] - state["base_lin_vel"][:, :2]),
dim=1
)
rewards["vel_tracking"] = 1.5 * torch.exp(-vel_error / 0.25)
    # Base height (DIFFERENT: slightly lower target since the robot is heavier)
height_error = torch.square(state["base_height"] - 0.96)
rewards["height"] = -0.8 * height_error
    # Upright (DIFFERENT: larger weight due to the higher CoM and extra mass)
orientation_error = torch.sum(
torch.square(state["projected_gravity"][:, :2]), dim=1
)
rewards["upright"] = -2.0 * orientation_error # H1: -1.5
    # NEW: Arm disturbance compensation
# Penalize excessive arm acceleration during locomotion
arm_accel = state["arm_joint_accelerations"]
rewards["arm_disturbance"] = -0.005 * torch.sum(
torch.square(arm_accel), dim=1
)
    # NEW: CoM stability
    # Reward smooth CoM velocity (not disturbed by arm motion)
com_vel_z = torch.abs(state["com_velocity"][:, 2])
rewards["com_stability"] = -0.2 * com_vel_z
# Standard rewards
rewards["action_rate"] = -0.01 * torch.sum(
torch.square(action - prev_action), dim=1
)
rewards["torque"] = -3e-5 * torch.sum(
torch.square(state["leg_torques"]), dim=1
)
rewards["termination"] = -200.0 * state["terminated"].float()
# Yaw tracking
yaw_error = torch.square(command[:, 2] - state["base_ang_vel"][:, 2])
rewards["yaw_tracking"] = 0.8 * torch.exp(-yaw_error / 0.25)
total = sum(rewards.values())
return total, rewards
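The tracking terms above use an exponential kernel: the reward is maximal at zero error and decays with temperature 0.25. A standalone sketch of how the velocity-tracking term behaves:

```python
import torch

# Squared velocity-tracking errors for four hypothetical situations.
vel_error = torch.tensor([0.0, 0.1, 0.5, 1.0])

# Same form as rewards["vel_tracking"]: scale 1.5, temperature 0.25.
reward = 1.5 * torch.exp(-vel_error / 0.25)
print(reward)  # monotonically decreasing, max 1.5 at zero error
```

The temperature sets how forgiving the term is: at a squared error of 0.25 the reward has already dropped to 1.5/e, so errors much beyond ~0.5 m/s contribute almost nothing, which keeps the gradient focused on near-success behavior.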
Loco-Manipulation Basics
Hierarchical Control Architecture
class LocoManipulationController:
"""
Hierarchical controller: locomotion + arm control.
    Architecture:
    - Lower level: locomotion policy (already trained)
    - Upper level: arm policy (trained from scratch)
    - Coordination: shared observation + reward coupling
"""
def __init__(self, loco_policy, arm_policy):
        self.loco_policy = loco_policy  # Pre-trained; can be frozen
        self.arm_policy = arm_policy    # Trained from scratch
        # Freeze the locomotion policy initially
for param in self.loco_policy.parameters():
param.requires_grad = False
def forward(self, obs):
"""
Tách observation và chạy 2 policies.
"""
# Tách observation
loco_obs = obs[:, :61] # Locomotion observations
arm_obs = obs[:, 61:] # Arm task observations
# Locomotion actions (legs)
leg_actions = self.loco_policy(loco_obs)
# Arm actions
arm_actions = self.arm_policy(
torch.cat([arm_obs, loco_obs[:, :6]], dim=1) # arm_obs + base state
)
# Combine
full_action = torch.cat([leg_actions, arm_actions], dim=1)
return full_action
def unfreeze_locomotion(self):
"""
        Phase 2: fine-tune locomotion together with the arm.
        Use this when the arm task strongly affects balance.
"""
for param in self.loco_policy.parameters():
param.requires_grad = True
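Freezing works by disabling gradients on the locomotion parameters, exactly as in `__init__` above. A self-contained sketch with stand-in `nn.Linear` networks (the layer sizes are illustrative, not the real policy architecture):

```python
import torch.nn as nn

# Stand-ins: 61-dim loco obs -> 10 leg actions, 30-dim arm obs -> 14 arm actions.
loco_policy = nn.Sequential(nn.Linear(61, 128), nn.Linear(128, 10))
arm_policy = nn.Sequential(nn.Linear(30, 128), nn.Linear(128, 14))

# Phase 2: freeze the legs so only the arm receives gradient updates.
for p in loco_policy.parameters():
    p.requires_grad = False

frozen = sum(p.numel() for p in loco_policy.parameters() if not p.requires_grad)
trainable = sum(p.numel() for p in arm_policy.parameters() if p.requires_grad)
print(f"frozen: {frozen}, trainable: {trainable}")
```

Note that frozen parameters still run in the forward pass; they simply accumulate no gradients, so an optimizer built over both networks would leave the legs untouched.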
Simple Arm Task: Walk + Reach
class WalkAndReachTask:
"""
    Simple task: walk to a target location and reach the hand to a target.
"""
def __init__(self):
self.walk_target = None
self.reach_target = None
def reset(self, num_envs, device="cuda"):
"""Random walk + reach targets."""
        # Walk target: 2-5 m ahead
self.walk_target = torch.zeros(num_envs, 3, device=device)
self.walk_target[:, 0] = torch.empty(num_envs, device=device).uniform_(2, 5)
# Reach target: table height (0.8m), random lateral
self.reach_target = torch.zeros(num_envs, 3, device=device)
self.reach_target[:, 0] = self.walk_target[:, 0] # Same x as walk
self.reach_target[:, 1] = torch.empty(num_envs, device=device).uniform_(-0.3, 0.3)
self.reach_target[:, 2] = 0.8 # Table height
def compute_reward(self, state, ee_pos):
"""
Multi-objective reward: walk + reach.
"""
rewards = {}
# 1. Walk toward target
dist_to_walk = torch.norm(
state["base_pos"][:, :2] - self.walk_target[:, :2], dim=1
)
rewards["walk_progress"] = 0.5 * torch.exp(-dist_to_walk / 2.0)
# 2. Reach target (only when close enough to walk)
close_enough = (dist_to_walk < 0.5).float()
reach_dist = torch.norm(ee_pos - self.reach_target, dim=1)
rewards["reach"] = 0.8 * close_enough * torch.exp(-reach_dist / 0.1)
# 3. Penalize reaching while walking (decouple phases)
far_from_target = (dist_to_walk > 1.0).float()
arm_activity = torch.norm(
state["arm_joint_velocities"], dim=1
)
rewards["phase_coupling"] = -0.1 * far_from_target * arm_activity
total = sum(rewards.values())
return total, rewards
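The `close_enough` and `far_from_target` masks implement the phase gating: the reach reward only pays out near the walk target, and arm activity is penalized while still walking. A standalone sketch with three hypothetical environments:

```python
import torch

# Distances to the walk target for three hypothetical environments:
# arrived (0.2 m), approaching (0.8 m), still far (2.0 m).
dist_to_walk = torch.tensor([0.2, 0.8, 2.0])
reach_dist = torch.tensor([0.05, 0.05, 0.05])  # all hands near the reach target

# Same gating as in compute_reward: reach reward only within 0.5 m.
close_enough = (dist_to_walk < 0.5).float()
reach_reward = 0.8 * close_enough * torch.exp(-reach_dist / 0.1)
print(reach_reward)  # nonzero only for the first environment
```

Only the first environment earns any reach reward, even though all three hands are equally close to the target; this is what keeps the policy from flailing the arm mid-walk.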
Co-Training Pipeline
class CoTrainingPipeline:
"""
    3-phase co-training for loco-manipulation.
"""
def train(self, env, total_iterations=20000):
"""Full training pipeline."""
# Phase 1: Locomotion only (reuse H1 policy)
print("Phase 1: Loading pre-trained locomotion policy...")
loco_policy = self.load_pretrained("h1_walking_policy.pt")
controller = LocoManipulationController(loco_policy, self.arm_policy)
# Phase 2: Train arm with frozen locomotion (5000 iter)
print("Phase 2: Training arm policy (frozen legs)...")
for iteration in range(5000):
obs = env.get_observations()
actions = controller.forward(obs)
obs, reward, done, info = env.step(actions)
            # Only update arm_policy
            self.arm_optimizer.zero_grad()
            arm_loss = self.compute_arm_loss(reward, controller.arm_policy)
            arm_loss.backward()
            self.arm_optimizer.step()
if iteration % 500 == 0:
print(f" Iter {iteration}: arm_reach_dist="
f"{info['reach_distance']:.3f}")
# Phase 3: Fine-tune both (15000 iter)
print("Phase 3: Co-training legs + arms...")
controller.unfreeze_locomotion()
for iteration in range(15000):
obs = env.get_observations()
actions = controller.forward(obs)
obs, reward, done, info = env.step(actions)
            # Update both policies
            self.joint_optimizer.zero_grad()
            total_loss = self.compute_total_loss(reward, controller)
            total_loss.backward()
            self.joint_optimizer.step()
if iteration % 500 == 0:
print(f" Iter {iteration}: walk_dist={info['walk_distance']:.2f}, "
f"reach_dist={info['reach_distance']:.3f}, "
f"fall_rate={info['fall_rate']:.2f}")
print("Co-training complete!")
Evaluation: Locomotion Quality with the Arm Active
import numpy as np

def evaluate_loco_with_arm_tasks(env, controller, num_episodes=50):
    """
    Compare locomotion quality: arms idle vs arms active.
    """
conditions = {
"arms_idle": {"arm_task": False},
"arms_reaching": {"arm_task": True, "payload": 0},
"arms_carrying_2kg": {"arm_task": True, "payload": 2.0},
"arms_carrying_5kg": {"arm_task": True, "payload": 5.0},
}
results = {}
for condition_name, cfg in conditions.items():
env.configure(cfg)
vel_errors = []
fall_count = 0
for ep in range(num_episodes):
obs = env.reset()
for step in range(500):
action = controller.forward(obs)
obs, _, done, info = env.step(action)
vel_errors.append(info["vel_tracking_error"])
if done and info["fell"]:
fall_count += 1
break
results[condition_name] = {
"vel_error": np.mean(vel_errors),
"fall_rate": fall_count / num_episodes,
}
print(f"{'Condition':<25} {'Vel Error':>10} {'Fall Rate':>10}")
print("-" * 47)
    for name, r in results.items():
        print(f"{name:<25} {r['vel_error']:>9.3f}m/s {r['fall_rate']*100:>8.1f}%")
    return results
Typical results:
| Condition | Vel Error | Fall Rate |
|---|---|---|
| Arms idle | 0.08 m/s | 2% |
| Arms reaching | 0.12 m/s | 5% |
| Arms carrying 2kg | 0.15 m/s | 8% |
| Arms carrying 5kg | 0.22 m/s | 15% |
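Reading the table as relative degradation versus the arms-idle baseline makes the trend explicit:

```python
# Velocity-tracking errors from the table above (m/s).
vel_errors = {
    "arms_idle": 0.08,
    "arms_reaching": 0.12,
    "arms_carrying_2kg": 0.15,
    "arms_carrying_5kg": 0.22,
}

baseline = vel_errors["arms_idle"]
degradation = {k: (v - baseline) / baseline for k, v in vel_errors.items()}
for name, d in degradation.items():
    print(f"{name:<20} {d * 100:+6.1f}%")
```

Carrying 5 kg inflates the tracking error by 175% relative to idle arms, which is why the next post treats heavy-payload walking as its own problem.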
To learn more about RL for humanoids, see RL for Humanoid Robots. For manipulation in detail, see Loco-Manipulation in the next post.
Summary
The H1-2 brings new challenges for locomotion:
- +5 kg of arm/hand mass changes the dynamics → retune PD gains, increase the upright reward weight
- Arm state in the observation lets the locomotion policy anticipate disturbances
- Hierarchical control allows reusing the locomotion policy and training the arm separately
- 3-phase co-training: frozen legs → train the arm → fine-tune both
- Locomotion quality degrades when the arm is active, especially with payloads above 2 kg
The next post, Loco-Manipulation, dives into walking while carrying and manipulating objects.
References
- HumanPlus: Humanoid Shadowing and Imitation from Humans — Fu et al., 2024
- Humanoid Loco-Manipulation with Learned Whole-Body Control — Chen et al., CoRL 2024
- Expressive Whole-Body Control for Humanoid Robots — Cheng et al., RSS 2024
- Unitree H1-2 Technical Report — Unitree Robotics