Tags: humanoid, unitree-h1, running, dynamic-locomotion, rl

Unitree H1: Running, Turning & Dynamic Motions

Extend H1 policy to running at 2+ m/s, sharp turning, lateral walking, and a multi-gait command-conditioned policy.

Nguyễn Anh Tuấn · March 28, 2026 · 8 min read

Walking is the foundation, but a truly useful humanoid needs much more. In this post, we push the Unitree H1 beyond walking to achieve running (2+ m/s), sharp turning, lateral walking, backward walking, and a command-conditioned policy that flexibly switches between gaits.

Building on the H1 basic training post, this post focuses on dynamic motions — where physics and control become truly interesting.

From Walking to Running: What Changes?

Physics of Running

Walking and running differ fundamentally in the flight phase:

| Characteristic | Walking | Running |
|---|---|---|
| Flight phase | None (at least one foot on the ground) | Yes (both feet leave the ground) |
| Ground contact | ~60% of cycle per foot | ~30-40% of cycle per foot |
| Peak force | ~1.2x body weight | ~2.5-3x body weight |
| Speed | 0-1.5 m/s | 1.5-3.3+ m/s |
| Froude number | < 1 | > 1 |
| Energy mode | Inverted pendulum | Spring-mass |
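The Froude number row can be made concrete with a few lines of Python. This is a minimal sketch; the 0.9 m effective leg length is an illustrative assumption, not an official H1 spec:

```python
def froude_number(speed: float, leg_length: float = 0.9, g: float = 9.81) -> float:
    """Fr = v^2 / (g * L). Fr < 1 suggests a walking regime, Fr > 1 running."""
    return speed ** 2 / (g * leg_length)

print(froude_number(1.0))  # ≈ 0.11 -> walking regime
print(froude_number(3.0))  # ≈ 1.02 -> past the walk/run transition
```

This is why the reward function below switches regimes around 1.5-2.0 m/s: that is roughly where Fr crosses 1 for a robot of this leg length.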
import torch
import numpy as np

class RunningRewardExtension:
    """
    Extended reward function for running gaits.
    """

    def compute_running_rewards(self, state, command):
        rewards = {}

        # 1. Extended velocity tracking range
        vel_error = torch.sum(
            torch.square(command[:, :2] - state["base_lin_vel"][:, :2]),
            dim=1
        )
        sigma = torch.where(
            torch.abs(command[:, 0]) > 1.5,
            torch.tensor(0.5, device=command.device),   # Running: larger tolerance
            torch.tensor(0.25, device=command.device),  # Walking: tight tolerance
        )
        rewards["vel_tracking"] = torch.exp(-vel_error / sigma)

        # 2. Flight phase reward
        both_feet_air = (
            (state["foot_contact"][:, 0] < 0.5) &
            (state["foot_contact"][:, 1] < 0.5)
        ).float()
        running_command = (torch.abs(command[:, 0]) > 2.0).float()
        rewards["flight_phase"] = 0.3 * running_command * both_feet_air

        # 3. Ground reaction force symmetry (per env, vertical component;
        # the original used .max(), which collapses the batch to a scalar)
        left_grf = state["contact_forces"][:, 0, 2]
        right_grf = state["contact_forces"][:, 1, 2]
        grf_symmetry = 1.0 - torch.abs(left_grf - right_grf) / (
            left_grf + right_grf + 1e-6
        )
        rewards["grf_symmetry"] = 0.1 * grf_symmetry

        # 4. Knee bend during stance (spring-like energy storage)
        stance_mask = (state["foot_contact"] > 0.5).float()
        knee_bend = torch.abs(state["joint_pos"][:, [3, 8]])  # left/right knee indices
        running_knee_reward = stance_mask * torch.clamp(
            knee_bend - 0.2, min=0.0
        )
        rewards["knee_bend"] = 0.1 * torch.sum(running_knee_reward, dim=1)

        return rewards
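The speed-dependent tracking tolerance above can be distilled into a plain-Python sketch. For clarity this uses a single velocity component, whereas the class sums the squared error over x and y; the sigmas 0.25 and 0.5 are the same values used in the class:

```python
import math

def velocity_tracking_reward(cmd_vx: float, actual_vx: float) -> float:
    """Gaussian tracking reward with a wider tolerance for running commands,
    mirroring the torch.where sigma selection in RunningRewardExtension."""
    sigma = 0.5 if abs(cmd_vx) > 1.5 else 0.25
    return math.exp(-((cmd_vx - actual_vx) ** 2) / sigma)

# The same 0.3 m/s error is forgiven more at running speed than at walking speed
walk_r = velocity_tracking_reward(1.0, 0.7)
run_r = velocity_tracking_reward(2.5, 2.2)
```

Widening the tolerance at high speed keeps the reward gradient informative: a running policy inevitably has larger instantaneous velocity error, and a tight sigma would flatten the reward to near zero.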

(Figure: dynamic running motion)

Command-Conditioned Multi-Gait Policy

Instead of training multiple separate policies, we train a single policy that can walk, run, turn, and move laterally — depending on the velocity command.

Extended Command Space

class ExtendedVelocityCommand:
    """
    Extended velocity command for multi-gait policy.
    """

    def __init__(self):
        self.ranges = {
            "lin_vel_x": (-0.5, 3.5),    # Backward to Running
            "lin_vel_y": (-0.8, 0.8),     # Lateral walking
            "ang_vel_z": (-1.5, 1.5),     # Sharp turning
        }
        self.curriculum_factor = 0.0

    def sample_command(self, num_envs, device="cuda"):
        """Sample random velocity commands."""
        factor = self.curriculum_factor

        commands = torch.zeros(num_envs, 3, device=device)

        # Upper bound blends from 1.0 m/s (early training) to 3.5 m/s (full curriculum)
        commands[:, 0] = torch.empty(num_envs, device=device).uniform_(
            self.ranges["lin_vel_x"][0] * factor,
            self.ranges["lin_vel_x"][1] * factor + (1 - factor) * 1.0,
        )

        commands[:, 1] = torch.empty(num_envs, device=device).uniform_(
            self.ranges["lin_vel_y"][0] * factor,
            self.ranges["lin_vel_y"][1] * factor,
        )

        commands[:, 2] = torch.empty(num_envs, device=device).uniform_(
            self.ranges["ang_vel_z"][0] * factor,
            self.ranges["ang_vel_z"][1] * factor,
        )

        # 20% chance of zero command (standing)
        zero_mask = torch.rand(num_envs, device=device) < 0.2
        commands[zero_mask] = 0.0

        return commands

    def update_curriculum(self, iteration, total_iterations=10000):
        """Gradually expand command range."""
        self.curriculum_factor = min(iteration / (total_iterations * 0.5), 1.0)

Multi-Gait Reward Function

class MultiGaitReward:
    """
    Speed-adaptive reward function.
    Low speed -> walking rewards. High speed -> running rewards.
    """

    def compute(self, state, action, prev_action, command):
        rewards = {}
        cmd_speed = torch.abs(command[:, 0])

        # === Velocity tracking (universal) ===
        vel_error = torch.sum(
            torch.square(command[:, :2] - state["base_lin_vel"][:, :2]),
            dim=1
        )
        rewards["vel_tracking"] = 1.5 * torch.exp(-vel_error / 0.25)

        # === Yaw tracking ===
        yaw_error = torch.square(command[:, 2] - state["base_ang_vel"][:, 2])
        rewards["yaw_tracking"] = 0.8 * torch.exp(-yaw_error / 0.25)

        # === Speed-adaptive rewards ===
        left_c = state["foot_contact"][:, 0]
        right_c = state["foot_contact"][:, 1]
        both_air = (1 - left_c) * (1 - right_c)

        # Walking (< 1.5 m/s): penalize both feet in air
        walk_mask = (cmd_speed < 1.5).float()
        rewards["walk_contact"] = -0.5 * walk_mask * both_air

        # Running (> 2.0 m/s): reward flight phase
        run_mask = (cmd_speed > 2.0).float()
        rewards["run_flight"] = 0.3 * run_mask * both_air

        # === Adaptive foot clearance ===
        target_clearance = torch.where(
            cmd_speed > 2.0,
            torch.tensor(0.12, device=cmd_speed.device),  # higher swing for running
            torch.tensor(0.08, device=cmd_speed.device),
        )
        swing = state["foot_contact"] < 0.5
        clearance = torch.where(
            swing,
            torch.clamp(state["foot_height"] - target_clearance.unsqueeze(1), min=0.0),
            torch.zeros_like(state["foot_height"]),
        )
        rewards["clearance"] = 0.3 * torch.sum(clearance, dim=1)

        # === Adaptive feet air time ===
        target_air = torch.where(
            cmd_speed > 2.0,
            torch.tensor(0.25, device=cmd_speed.device),  # shorter, faster steps
            torch.tensor(0.35, device=cmd_speed.device),
        )
        air_error = torch.abs(state["feet_air_time"] - target_air.unsqueeze(1))
        rewards["air_time"] = 0.2 * torch.sum(
            torch.exp(-air_error / 0.1), dim=1
        )

        # === Turning rewards ===
        cmd_yaw = torch.abs(command[:, 2])
        turning_mask = (cmd_yaw > 0.5).float()
        vel_x_error_turning = torch.abs(
            state["base_lin_vel"][:, 0] - command[:, 0] * 0.7
        )
        rewards["turning_vel"] = 0.3 * turning_mask * torch.exp(
            -vel_x_error_turning
        )

        # === Regularization ===
        rewards["action_rate"] = -0.01 * torch.sum(
            torch.square(action - prev_action), dim=1
        )
        rewards["torque"] = -3e-5 * torch.sum(
            torch.square(state["torques"]), dim=1
        )
        rewards["upright"] = -1.5 * torch.sum(
            torch.square(state["projected_gravity"][:, :2]), dim=1
        )
        rewards["termination"] = -200.0 * state["terminated"].float()

        total = sum(rewards.values())
        return total, rewards
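Note the deliberate gap between the two contact masks: commands in the 1.5-2.0 m/s band trigger neither the flight-phase penalty nor the flight-phase bonus. A minimal sketch of that band logic:

```python
def gait_masks(cmd_speed: float) -> tuple[float, float]:
    """Walk/run masks as in MultiGaitReward. Between 1.5 and 2.0 m/s both
    masks are zero, so the policy chooses its own walk-to-run transition point."""
    walk_mask = 1.0 if cmd_speed < 1.5 else 0.0
    run_mask = 1.0 if cmd_speed > 2.0 else 0.0
    return walk_mask, run_mask

print(gait_masks(1.0))   # (1.0, 0.0) -> flight phase penalized
print(gait_masks(1.75))  # (0.0, 0.0) -> neutral transition band
print(gait_masks(2.5))   # (0.0, 1.0) -> flight phase rewarded
```

Leaving the transition band unconstrained avoids forcing an arbitrary switch speed; in practice the learned transition tends to land where it is energetically cheapest.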

Sharp Turning

Turning is challenging for humanoids because it requires coordinated hip yaw and lateral weight shift simultaneously:

class TurningAnalysis:
    """Analyze turning performance."""

    def evaluate_turning(self, env, policy, yaw_rates=(0.5, 1.0, 1.5)):
        """Evaluate turning at multiple yaw rates."""

        results = {}
        for target_yaw in yaw_rates:
            # Fixed command: 0.5 m/s forward while yawing at target_yaw.
            # The env must be configured to hold this command fixed
            # (e.g. via its command manager) rather than resampling it.
            command = torch.tensor([[0.5, 0.0, target_yaw]])

            actual_yaws = []
            turning_radii = []
            obs = env.reset()

            for step in range(500):
                action = policy(obs)
                obs, _, done, info = env.step(action)

                actual_yaw = info["base_ang_vel"][:, 2].mean().item()
                actual_yaws.append(actual_yaw)

                linear_vel = info["base_lin_vel"][:, 0].mean().item()
                if abs(actual_yaw) > 0.01:
                    radius = abs(linear_vel / actual_yaw)
                    turning_radii.append(radius)

            results[target_yaw] = {
                "tracking_error": abs(
                    np.mean(actual_yaws[-100:]) - target_yaw
                ),
                "avg_turning_radius": np.mean(turning_radii[-100:])
                    if turning_radii else float('inf'),
                "stability": np.std(actual_yaws[-100:]),
            }

        print(f"{'Target (rad/s)':<16} {'Error':>8} {'Radius (m)':>11} {'Stability':>10}")
        for yaw, r in results.items():
            print(f"{yaw:<16.1f} {r['tracking_error']:>8.3f} "
                  f"{r['avg_turning_radius']:>10.2f}m {r['stability']:>10.4f}")

        return results
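The radii reported by evaluate_turning can be sanity-checked against the kinematic relation r = v / ω, which holds when both commands are tracked exactly on a circular arc:

```python
def kinematic_turning_radius(forward_speed: float, yaw_rate: float) -> float:
    """Expected turning radius (m) when v (m/s) and omega (rad/s) track perfectly."""
    return abs(forward_speed / yaw_rate)

# For the 0.5 m/s forward command used in the evaluation above:
print(kinematic_turning_radius(0.5, 0.5))  # 2.0x slower yaw -> 1.0 m arc
print(kinematic_turning_radius(0.5, 1.5))  # ~0.33 m -> a tight pivot
```

A measured radius well above this value indicates yaw-rate undertracking; well below it indicates the policy is bleeding forward speed to turn.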

Adversarial Motion Prior (AMP)

To achieve natural-looking running gaits, we can use AMP — using motion capture data as a prior:

class AMPReward:
    """
    Adversarial Motion Prior for natural-looking gaits.
    Discriminator distinguishes policy motion vs reference motion.
    """

    def __init__(self, reference_motions, obs_dim):
        import torch.nn as nn

        self.discriminator = nn.Sequential(
            nn.Linear(obs_dim * 2, 1024),
            nn.ReLU(),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, 1),
        )

        self.reference_motions = reference_motions
        self.optimizer = torch.optim.Adam(
            self.discriminator.parameters(), lr=1e-5
        )

    def compute_style_reward(self, current_obs, next_obs):
        """
        Compute AMP style reward.
        High reward = policy motion looks like reference.
        Uses the least-squares form from the AMP paper, consistent with
        the LSGAN discriminator loss below: r = max(0, 1 - 0.25 (d - 1)^2).
        """
        transition = torch.cat([current_obs, next_obs], dim=1)

        with torch.no_grad():
            d_output = self.discriminator(transition)

        style_reward = torch.clamp(
            1.0 - 0.25 * torch.square(d_output - 1.0), min=0.0
        ).squeeze(-1)

        return style_reward

    def update_discriminator(self, policy_transitions, reference_transitions):
        """Train discriminator: policy = fake, reference = real."""
        real_output = self.discriminator(reference_transitions)
        fake_output = self.discriminator(policy_transitions.detach())

        real_loss = torch.mean(torch.square(real_output - 1))
        fake_loss = torch.mean(torch.square(fake_output))
        loss = 0.5 * (real_loss + fake_loss)

        # Gradient penalty on interpolated samples (detach the policy branch
        # so the penalty only trains the discriminator)
        alpha = torch.rand(reference_transitions.shape[0], 1,
                           device=reference_transitions.device)
        interp = (alpha * reference_transitions
                  + (1 - alpha) * policy_transitions.detach())
        interp.requires_grad_(True)
        interp_output = self.discriminator(interp)
        grad = torch.autograd.grad(
            interp_output, interp,
            grad_outputs=torch.ones_like(interp_output),
            create_graph=True
        )[0]
        gp = torch.mean(torch.square(grad.norm(dim=1) - 1))
        loss += 10.0 * gp

        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

        return loss.item()
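In training, the style reward is blended with the task (velocity-tracking) reward at every step. The 50/50 split below is an illustrative starting point, not a value prescribed by the AMP paper:

```python
def blended_reward(task_reward: float, style_reward: float,
                   w_task: float = 0.5, w_style: float = 0.5) -> float:
    """Per-step PPO reward: task tracking plus AMP style term."""
    return w_task * task_reward + w_style * style_reward

# A step that tracks velocity well but moves unnaturally
# still gets pulled down by the style term
r = blended_reward(task_reward=1.2, style_reward=0.1)
```

In practice w_style is tuned so the style term shapes the gait without overpowering command tracking; too large and the policy imitates the mocap clip instead of following commands.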

(Figure: robot dynamic motion)

Emergent Behaviors

With a multi-gait policy, the robot often develops emergent behaviors — not directly designed but naturally appearing:

| Behavior | Trigger condition | Explanation |
|---|---|---|
| Arm swing | Running > 2 m/s | Policy uses arm momentum for balance |
| Head bob | Walking 0.5-1.0 m/s | Natural vertical oscillation |
| Foot rotation | Sharp turning | Pivot foot rotates to reduce friction |
| Stance widening | Lateral walking | Wider stance for stability |
| Deep knee bend | Running acceleration | Energy storage during stance phase |

Full Training Config

# Multi-gait training — ~3h on RTX 4090
python source/standalone/workflows/rsl_rl/train.py \
    --task=Isaac-Velocity-Rough-H1-MultiGait-v0 \
    --num_envs=4096 \
    --max_iterations=15000 \
    --headless \
    --logger wandb \
    --wandb_project h1-multigait

# Evaluate multi-gait
python source/standalone/workflows/rsl_rl/play.py \
    --task=Isaac-Velocity-Rough-H1-MultiGait-v0 \
    --num_envs=4 \
    --checkpoint=logs/h1_multigait/model_15000.pt

For more on humanoid control methods, see Humanoid Control Methods. For parkour with legged robots, see Parkour Learning.

Summary

In this post, we extended the H1 policy to dynamic motions:

  1. Running gaits with flight phase, GRF management, and elastic knee energy
  2. Command-conditioned policy for walk/run/turn/lateral in a single policy
  3. Sharp turning with coordinated hip yaw and lateral weight shift
  4. AMP for natural-looking gaits using motion capture references
  5. Emergent behaviors like arm swing and stance widening

Next post — Unitree H1-2: Enhanced Locomotion — explores the H1-2 with new hardware and loco-manipulation basics.

References

  1. AMP: Adversarial Motion Priors for Stylized Physics-Based Character Animation — Peng et al., SIGGRAPH 2021
  2. Expressive Whole-Body Control for Humanoid Robots — Cheng et al., RSS 2024
  3. Walk These Ways: Tuning Robot Control for Generalization — Margolis & Agrawal, CoRL 2022
  4. Learning Humanoid Locomotion with Transformers — Radosavovic et al., 2024

Nguyễn Anh Tuấn

Robotics & AI Engineer. Building VnRobo — sharing knowledge about robot learning, VLA models, and automation.
