
Unitree H1: Running, Turning & Dynamic Motions

Extending the H1 policy to running at 2+ m/s, sharp turning, lateral walking, and a command-conditioned multi-gait policy.

Nguyễn Anh Tuấn · March 28, 2026 · 8 min read

Walking is the foundation, but a truly useful humanoid needs much more. In this post, we push the Unitree H1 beyond walking to achieve running (2+ m/s), sharp turning, lateral walking, backward walking, and a command-conditioned policy that flexibly switches between gaits.

Building on the H1 basic training post, this post focuses on dynamic motions — where physics and control become truly interesting.

From Walking to Running: What Changes?

Physics of Running

Walking and running differ fundamentally in the flight phase:

| Characteristic | Walking | Running |
| --- | --- | --- |
| Flight phase | None (at least 1 foot on ground) | Yes (both feet leave ground) |
| Ground contact | ~60% of cycle per foot | ~30-40% of cycle |
| Peak force | ~1.2x body weight | ~2.5-3x body weight |
| Speed | 0-1.5 m/s | 1.5-3.3+ m/s |
| Froude number | < 1 | > 1 |
| Energy mode | Inverted pendulum | Spring-mass |
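The Froude number row can be sanity-checked in a few lines. This is a minimal sketch, assuming a hip-to-ground leg length of roughly 0.9 m for the H1 (an illustrative value, not an official spec):

```python
import math

def froude_number(speed: float, leg_length: float = 0.9, g: float = 9.81) -> float:
    """Fr = v^2 / (g * L): dimensionless ratio used to classify gaits."""
    return speed ** 2 / (g * leg_length)

# Walking at 1.0 m/s stays well below the inverted-pendulum limit (Fr = 1)...
print(froude_number(1.0))  # ≈ 0.11
# ...while running at 3.0 m/s crosses it, where spring-mass dynamics take over.
print(froude_number(3.0))  # ≈ 1.02
```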
import torch
import numpy as np

class RunningRewardExtension:
    """
    Extended reward function for running gaits.
    """

    def compute_running_rewards(self, state, command):
        rewards = {}

        # 1. Extended velocity tracking range
        vel_error = torch.sum(
            torch.square(command[:, :2] - state["base_lin_vel"][:, :2]),
            dim=1
        )
        sigma = torch.where(
            torch.abs(command[:, 0]) > 1.5,
            torch.tensor(0.5),   # Running: larger tolerance
            torch.tensor(0.25),  # Walking: tight tolerance
        )
        rewards["vel_tracking"] = torch.exp(-vel_error / sigma)

        # 2. Flight phase reward
        both_feet_air = (
            (state["foot_contact"][:, 0] < 0.5) &
            (state["foot_contact"][:, 1] < 0.5)
        ).float()
        running_command = (torch.abs(command[:, 0]) > 2.0).float()
        rewards["flight_phase"] = 0.3 * running_command * both_feet_air

        # 3. Ground reaction force symmetry (per-env vertical GRF;
        # the previous .max() collapsed the batch dimension into one scalar)
        left_grf = state["contact_forces"][:, 0, 2]
        right_grf = state["contact_forces"][:, 1, 2]
        grf_symmetry = 1.0 - torch.abs(left_grf - right_grf) / (
            left_grf + right_grf + 1e-6
        )
        rewards["grf_symmetry"] = 0.1 * grf_symmetry

        # 4. Knee bend during stance (energy storage)
        stance_mask = (state["foot_contact"] > 0.5).float()
        knee_bend = torch.abs(state["joint_pos"][:, [3, 8]])
        running_knee_reward = stance_mask * torch.clamp(
            knee_bend - 0.2, min=0.0
        )
        rewards["knee_bend"] = 0.1 * torch.sum(running_knee_reward, dim=1)

        return rewards
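The speed-adaptive sigma in the velocity-tracking term widens the tolerance once the command exceeds 1.5 m/s. A scalar sketch of the same shaping (plain Python, mirroring the torch version above) shows the effect for an identical 0.3 m/s tracking error:

```python
import math

def vel_tracking_reward(vel_error_sq: float, cmd_speed: float) -> float:
    """exp(-err / sigma), with sigma widened above 1.5 m/s as in the torch code."""
    sigma = 0.5 if abs(cmd_speed) > 1.5 else 0.25
    return math.exp(-vel_error_sq / sigma)

err_sq = 0.3 ** 2  # squared velocity error: 0.3 m/s off the command
print(vel_tracking_reward(err_sq, cmd_speed=1.0))  # walking: exp(-0.36) ≈ 0.70
print(vel_tracking_reward(err_sq, cmd_speed=2.5))  # running: exp(-0.18) ≈ 0.84
```

The same error costs less at running speed, so the policy is not punished for the unavoidable velocity oscillations of a flight phase.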

Figure: Dynamic running motion

Command-Conditioned Multi-Gait Policy

Instead of training multiple separate policies, we train a single policy that can walk, run, turn, and move laterally — depending on the velocity command.

Extended Command Space

class ExtendedVelocityCommand:
    """
    Extended velocity command for multi-gait policy.
    """

    def __init__(self):
        self.ranges = {
            "lin_vel_x": (-0.5, 3.5),    # Backward to Running
            "lin_vel_y": (-0.8, 0.8),     # Lateral walking
            "ang_vel_z": (-1.5, 1.5),     # Sharp turning
        }
        self.curriculum_factor = 0.0

    def sample_command(self, num_envs, device="cuda"):
        """Sample random velocity commands."""
        factor = self.curriculum_factor

        commands = torch.zeros(num_envs, 3, device=device)

        commands[:, 0] = torch.empty(num_envs, device=device).uniform_(
            self.ranges["lin_vel_x"][0] * factor,
            self.ranges["lin_vel_x"][1] * factor + (1 - factor) * 1.0,
        )

        commands[:, 1] = torch.empty(num_envs, device=device).uniform_(
            self.ranges["lin_vel_y"][0] * factor,
            self.ranges["lin_vel_y"][1] * factor,
        )

        commands[:, 2] = torch.empty(num_envs, device=device).uniform_(
            self.ranges["ang_vel_z"][0] * factor,
            self.ranges["ang_vel_z"][1] * factor,
        )

        # 20% chance of zero command (standing)
        zero_mask = torch.rand(num_envs, device=device) < 0.2
        commands[zero_mask] = 0.0

        return commands

    def update_curriculum(self, iteration, total_iterations=10000):
        """Gradually expand command range."""
        self.curriculum_factor = min(iteration / (total_iterations * 0.5), 1.0)
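The half-schedule curriculum above reaches the full command range at 50% of training and then stays saturated. A standalone check of the ramp (same arithmetic as `update_curriculum`, no torch needed):

```python
def curriculum_factor(iteration: int, total_iterations: int = 10000) -> float:
    """Linear ramp that saturates at half of training, as in update_curriculum."""
    return min(iteration / (total_iterations * 0.5), 1.0)

print(curriculum_factor(0))     # 0.0 -> easiest commands only
print(curriculum_factor(2500))  # 0.5 -> half the command range
print(curriculum_factor(5000))  # 1.0 -> full range reached at 50% of training
print(curriculum_factor(9000))  # 1.0 -> stays saturated for the rest
```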

Multi-Gait Reward Function

class MultiGaitReward:
    """
    Speed-adaptive reward function.
    Low speed -> walking rewards. High speed -> running rewards.
    """

    def compute(self, state, action, prev_action, command):
        rewards = {}
        cmd_speed = torch.abs(command[:, 0])

        # === Velocity tracking (universal) ===
        vel_error = torch.sum(
            torch.square(command[:, :2] - state["base_lin_vel"][:, :2]),
            dim=1
        )
        rewards["vel_tracking"] = 1.5 * torch.exp(-vel_error / 0.25)

        # === Yaw tracking ===
        yaw_error = torch.square(command[:, 2] - state["base_ang_vel"][:, 2])
        rewards["yaw_tracking"] = 0.8 * torch.exp(-yaw_error / 0.25)

        # === Speed-adaptive rewards ===
        left_c = state["foot_contact"][:, 0]
        right_c = state["foot_contact"][:, 1]
        both_air = (1 - left_c) * (1 - right_c)

        # Walking (< 1.5 m/s): penalize both feet in air
        walk_mask = (cmd_speed < 1.5).float()
        rewards["walk_contact"] = -0.5 * walk_mask * both_air

        # Running (> 2.0 m/s): reward flight phase
        run_mask = (cmd_speed > 2.0).float()
        rewards["run_flight"] = 0.3 * run_mask * both_air

        # === Adaptive foot clearance ===
        target_clearance = torch.where(
            cmd_speed > 2.0,
            torch.tensor(0.12),
            torch.tensor(0.08),
        )
        swing = state["foot_contact"] < 0.5
        clearance = torch.where(
            swing,
            torch.clamp(state["foot_height"] - target_clearance.unsqueeze(1), min=0.0),
            torch.zeros_like(state["foot_height"]),
        )
        rewards["clearance"] = 0.3 * torch.sum(clearance, dim=1)

        # === Adaptive feet air time ===
        target_air = torch.where(
            cmd_speed > 2.0,
            torch.tensor(0.25),
            torch.tensor(0.35),
        )
        air_error = torch.abs(state["feet_air_time"] - target_air.unsqueeze(1))
        rewards["air_time"] = 0.2 * torch.sum(
            torch.exp(-air_error / 0.1), dim=1
        )

        # === Turning rewards ===
        cmd_yaw = torch.abs(command[:, 2])
        turning_mask = (cmd_yaw > 0.5).float()
        vel_x_error_turning = torch.abs(
            state["base_lin_vel"][:, 0] - command[:, 0] * 0.7
        )
        rewards["turning_vel"] = 0.3 * turning_mask * torch.exp(
            -vel_x_error_turning
        )

        # === Regularization ===
        rewards["action_rate"] = -0.01 * torch.sum(
            torch.square(action - prev_action), dim=1
        )
        rewards["torque"] = -3e-5 * torch.sum(
            torch.square(state["torques"]), dim=1
        )
        rewards["upright"] = -1.5 * torch.sum(
            torch.square(state["projected_gravity"][:, :2]), dim=1
        )
        rewards["termination"] = -200.0 * state["terminated"].float()

        total = sum(rewards.values())
        return total, rewards
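Note the deliberate gap between the walk mask (< 1.5 m/s) and the run mask (> 2.0 m/s): in the 1.5-2.0 m/s band neither contact term fires, leaving the policy free to blend between gaits. A scalar sketch of the two masked terms (plain Python, mirroring the torch code above):

```python
def gait_contact_term(cmd_speed: float, both_feet_air: float) -> float:
    """Speed-gated contact shaping: penalize flight when walking, reward it when running."""
    walk = -0.5 * both_feet_air if cmd_speed < 1.5 else 0.0
    run = 0.3 * both_feet_air if cmd_speed > 2.0 else 0.0
    return walk + run

print(gait_contact_term(1.0, both_feet_air=1.0))  # -0.5: flight penalized while walking
print(gait_contact_term(1.8, both_feet_air=1.0))  #  0.0: transition band, no shaping
print(gait_contact_term(2.5, both_feet_air=1.0))  #  0.3: flight rewarded while running
```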

Sharp Turning

Turning is challenging for humanoids because it requires coordinating hip yaw rotation with a lateral weight shift at the same time:

class TurningAnalysis:
    """Analyze turning performance."""

    def evaluate_turning(self, env, policy, yaw_rates=(0.5, 1.0, 1.5)):
        """Evaluate turning at multiple yaw rates."""

        results = {}
        for target_yaw in yaw_rates:
            # Fixed [vx, vy, yaw_rate] command; assumes the env applies it
            # (e.g. via its command manager) instead of resampling randomly
            command = torch.tensor([[0.5, 0.0, target_yaw]])

            actual_yaws = []
            turning_radii = []
            obs = env.reset()

            for step in range(500):
                action = policy(obs)
                obs, _, done, info = env.step(action)

                actual_yaw = info["base_ang_vel"][:, 2].mean().item()
                actual_yaws.append(actual_yaw)

                linear_vel = info["base_lin_vel"][:, 0].mean().item()
                if abs(actual_yaw) > 0.01:
                    radius = abs(linear_vel / actual_yaw)
                    turning_radii.append(radius)

            results[target_yaw] = {
                "tracking_error": abs(
                    np.mean(actual_yaws[-100:]) - target_yaw
                ),
                "avg_turning_radius": np.mean(turning_radii[-100:])
                    if turning_radii else float('inf'),
                "stability": np.std(actual_yaws[-100:]),
            }

        print(f"{'Target (rad/s)':<16} {'Error':>8} {'Radius (m)':>11} {'Stability':>10}")
        for yaw, r in results.items():
            print(f"{yaw:<16.1f} {r['tracking_error']:>8.3f} "
                  f"{r['avg_turning_radius']:>10.2f}m {r['stability']:>10.4f}")

        return results
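The turning radius reported above follows from the kinematic relation r = v / ω for a constant-curvature turn. A minimal standalone version of that computation:

```python
def turning_radius(lin_vel: float, yaw_rate: float) -> float:
    """Kinematic turning radius r = v / omega for a constant-curvature turn."""
    if abs(yaw_rate) < 1e-6:
        return float("inf")  # straight-line motion: no curvature
    return abs(lin_vel / yaw_rate)

# At the 0.5 m/s forward speed used in the evaluation command:
print(turning_radius(0.5, 0.5))  # 1.0 m   -> gentle turn
print(turning_radius(0.5, 1.5))  # ≈ 0.33 m -> sharp turn
```

This is why sharp turning gets harder at speed: holding the same yaw rate at a higher forward velocity forces a wider arc, so the policy must trade forward speed for turn rate (the 0.7 scaling in the turning reward above encodes exactly that trade-off).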

Adversarial Motion Prior (AMP)

To achieve natural-looking running gaits, we can use AMP — using motion capture data as a prior:

class AMPReward:
    """
    Adversarial Motion Prior for natural-looking gaits.
    Discriminator distinguishes policy motion vs reference motion.
    """

    def __init__(self, reference_motions, obs_dim):
        import torch.nn as nn

        self.discriminator = nn.Sequential(
            nn.Linear(obs_dim * 2, 1024),
            nn.ReLU(),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, 1),
        )

        self.reference_motions = reference_motions
        self.optimizer = torch.optim.Adam(
            self.discriminator.parameters(), lr=1e-5
        )

    def compute_style_reward(self, current_obs, next_obs):
        """
        Compute AMP style reward.
        High reward = policy motion looks like reference.
        """
        transition = torch.cat([current_obs, next_obs], dim=1)

        with torch.no_grad():
            d_output = self.discriminator(transition)

        style_reward = -torch.log(
            1 - torch.sigmoid(d_output) + 1e-6
        ).squeeze()

        return style_reward

    def update_discriminator(self, policy_transitions, reference_transitions):
        """Train discriminator: policy = fake, reference = real."""
        real_output = self.discriminator(reference_transitions)
        fake_output = self.discriminator(policy_transitions.detach())

        real_loss = torch.mean(torch.square(real_output - 1))
        fake_loss = torch.mean(torch.square(fake_output))
        loss = 0.5 * (real_loss + fake_loss)

        # WGAN-GP-style gradient penalty on interpolated samples; detach the
        # policy transitions so the penalty trains only the discriminator
        alpha = torch.rand(reference_transitions.shape[0], 1,
                           device=reference_transitions.device)
        interp = (alpha * reference_transitions
                  + (1 - alpha) * policy_transitions.detach())
        interp.requires_grad_(True)
        interp_output = self.discriminator(interp)
        grad = torch.autograd.grad(
            interp_output, interp,
            grad_outputs=torch.ones_like(interp_output),
            create_graph=True
        )[0]
        gp = torch.mean(torch.square(grad.norm(dim=1) - 1))
        loss += 10.0 * gp

        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

        return loss.item()
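The style reward maps raw discriminator logits into a non-negative bonus that grows as the motion looks more like the mocap reference. A scalar sketch of the mapping in `compute_style_reward` (plain Python, same formula):

```python
import math

def style_reward(d_logit: float, eps: float = 1e-6) -> float:
    """-log(1 - sigmoid(d)): larger when the discriminator scores the motion as 'real'."""
    sigmoid = 1.0 / (1.0 + math.exp(-d_logit))
    return -math.log(1.0 - sigmoid + eps)

print(style_reward(-2.0))  # ≈ 0.13 -> motion looks fake, little style reward
print(style_reward(0.0))   # ≈ 0.69 -> discriminator undecided
print(style_reward(2.0))   # ≈ 2.13 -> motion resembles the reference
```

The `eps` term caps the reward when the discriminator saturates, which keeps the style term from dominating the task rewards.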

Figure: Robot dynamic motion

Emergent Behaviors

With a multi-gait policy, the robot often develops emergent behaviors — not directly designed but naturally appearing:

| Behavior | Trigger condition | Explanation |
| --- | --- | --- |
| Arm swing | Running > 2 m/s | Policy uses arm momentum for balance |
| Head bob | Walking 0.5-1.0 m/s | Natural vertical oscillation |
| Foot rotation | Sharp turning | Pivot foot rotates to reduce friction |
| Stance widening | Lateral walking | Wider stance for stability |
| Deep knee bend | Running acceleration | Energy storage during stance phase |

Full Training Config

# Multi-gait training — ~3h on RTX 4090
python source/standalone/workflows/rsl_rl/train.py \
    --task=Isaac-Velocity-Rough-H1-MultiGait-v0 \
    --num_envs=4096 \
    --max_iterations=15000 \
    --headless \
    --logger wandb \
    --wandb_project h1-multigait

# Evaluate multi-gait
python source/standalone/workflows/rsl_rl/play.py \
    --task=Isaac-Velocity-Rough-H1-MultiGait-v0 \
    --num_envs=4 \
    --checkpoint=logs/h1_multigait/model_15000.pt

For more on humanoid control methods, see Humanoid Control Methods. For parkour with legged robots, see Parkour Learning.

Summary

In this post, we extended the H1 policy to dynamic motions:

  1. Running gaits with flight phase, GRF management, and elastic knee energy
  2. Command-conditioned policy for walk/run/turn/lateral in a single policy
  3. Sharp turning with coordinated hip yaw and lateral weight shift
  4. AMP for natural-looking gaits using motion capture references
  5. Emergent behaviors like arm swing and stance widening

Next post — Unitree H1-2: Enhanced Locomotion — explores the H1-2 with new hardware and loco-manipulation basics.

References

  1. AMP: Adversarial Motion Priors for Stylized Physics-Based Character Animation — Peng et al., SIGGRAPH 2021
  2. Expressive Whole-Body Control for Humanoid Robots — Cheng et al., RSS 2024
  3. Walk These Ways: Tuning Robot Control for Generalization — Margolis & Agrawal, CoRL 2022
  4. Learning Humanoid Locomotion with Transformers — Radosavovic et al., 2023
