Walking is the foundation, but a truly useful humanoid needs much more. In this post, we push the Unitree H1 beyond walking to achieve running (2+ m/s), sharp turning, lateral walking, backward walking, and a command-conditioned policy that flexibly switches between gaits.
Building on the H1 basic training post, here we focus on dynamic motions, where physics and control become genuinely interesting.
From Walking to Running: What Changes?
Physics of Running
Walking and running differ fundamentally in the flight phase:
| Characteristic | Walking | Running |
|---|---|---|
| Flight phase | None (at least 1 foot on ground) | Yes (both feet leave ground) |
| Ground contact | ~60% of stride per foot | ~30-40% of stride per foot |
| Peak force | ~1.2x body weight | ~2.5-3x body weight |
| Speed | 0-1.5 m/s | 1.5-3.3+ m/s |
| Froude number | < 1 | > 1 |
| Energy mode | Inverted pendulum | Spring-mass |
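The Froude number in the table can be computed as Fr = v² / (g·L), with L the characteristic leg length. A quick sketch; the ~0.9 m leg length is a rough assumption for the H1's hip height, not a value from the spec sheet:

```python
def froude_number(speed_mps, leg_length_m=0.9, g=9.81):
    """Froude number Fr = v^2 / (g * L) for legged locomotion.

    leg_length_m ~0.9 m is a rough guess for the H1 hip height,
    not an official spec value.
    """
    return speed_mps ** 2 / (g * leg_length_m)

print(round(froude_number(1.0), 3))  # walking speed -> ~0.113
print(round(froude_number(3.0), 3))  # running speed -> ~1.019
```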
```python
import torch


class RunningRewardExtension:
    """Extended reward terms for running gaits."""

    def compute_running_rewards(self, state, command):
        rewards = {}

        # 1. Velocity tracking with a speed-dependent tolerance:
        # running commands get a wider Gaussian than walking commands.
        vel_error = torch.sum(
            torch.square(command[:, :2] - state["base_lin_vel"][:, :2]),
            dim=1,
        )
        sigma = torch.where(
            torch.abs(command[:, 0]) > 1.5,
            torch.full_like(vel_error, 0.5),   # running: larger tolerance
            torch.full_like(vel_error, 0.25),  # walking: tight tolerance
        )
        rewards["vel_tracking"] = torch.exp(-vel_error / sigma)

        # 2. Flight phase: both feet off the ground while a fast
        # forward command is active.
        both_feet_air = (
            (state["foot_contact"][:, 0] < 0.5)
            & (state["foot_contact"][:, 1] < 0.5)
        ).float()
        running_command = (torch.abs(command[:, 0]) > 2.0).float()
        rewards["flight_phase"] = 0.3 * running_command * both_feet_air

        # 3. Ground reaction force symmetry, computed per environment
        # (a batch-wide .max() would collapse the reward to one scalar).
        left_grf = state["contact_forces"][:, 0, 2]
        right_grf = state["contact_forces"][:, 1, 2]
        grf_symmetry = 1.0 - torch.abs(left_grf - right_grf) / (
            left_grf + right_grf + 1e-6
        )
        rewards["grf_symmetry"] = 0.1 * grf_symmetry

        # 4. Knee bend during stance (elastic energy storage).
        stance_mask = (state["foot_contact"] > 0.5).float()
        knee_bend = torch.abs(state["joint_pos"][:, [3, 8]])  # left/right knee
        running_knee_reward = stance_mask * torch.clamp(knee_bend - 0.2, min=0.0)
        rewards["knee_bend"] = 0.1 * torch.sum(running_knee_reward, dim=1)

        return rewards
```
Command-Conditioned Multi-Gait Policy
Instead of training multiple separate policies, we train a single policy that can walk, run, turn, and move laterally — depending on the velocity command.
Extended Command Space
```python
class ExtendedVelocityCommand:
    """Extended velocity command sampler for the multi-gait policy."""

    def __init__(self):
        self.ranges = {
            "lin_vel_x": (-0.5, 3.5),   # backward walking to running
            "lin_vel_y": (-0.8, 0.8),   # lateral walking
            "ang_vel_z": (-1.5, 1.5),   # sharp turning
        }
        self.curriculum_factor = 0.0

    def sample_command(self, num_envs, device="cuda"):
        """Sample random velocity commands (vx, vy, yaw rate)."""
        factor = self.curriculum_factor
        commands = torch.zeros(num_envs, 3, device=device)
        # Forward range starts at (0, 1.0) m/s and expands with the curriculum.
        commands[:, 0] = torch.empty(num_envs, device=device).uniform_(
            self.ranges["lin_vel_x"][0] * factor,
            self.ranges["lin_vel_x"][1] * factor + (1 - factor) * 1.0,
        )
        commands[:, 1] = torch.empty(num_envs, device=device).uniform_(
            self.ranges["lin_vel_y"][0] * factor,
            self.ranges["lin_vel_y"][1] * factor,
        )
        commands[:, 2] = torch.empty(num_envs, device=device).uniform_(
            self.ranges["ang_vel_z"][0] * factor,
            self.ranges["ang_vel_z"][1] * factor,
        )
        # 20% chance of a zero command (standing still).
        zero_mask = torch.rand(num_envs, device=device) < 0.2
        commands[zero_mask] = 0.0
        return commands

    def update_curriculum(self, iteration, total_iterations=10000):
        """Expand the command range linearly; full range at 50% of training."""
        self.curriculum_factor = min(iteration / (total_iterations * 0.5), 1.0)
```
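The curriculum schedule can be traced without a simulator. The sketch below plugs the `lin_vel_x` range (-0.5, 3.5) and the default `total_iterations=10000` into the same arithmetic as `update_curriculum` and `sample_command`:

```python
def curriculum_factor(iteration, total_iterations=10000):
    # Reaches 1.0 halfway through training, then stays there.
    return min(iteration / (total_iterations * 0.5), 1.0)

def forward_speed_bounds(iteration):
    """Sampling bounds for the forward-velocity command at a given iteration."""
    lo, hi = -0.5, 3.5  # lin_vel_x range
    f = curriculum_factor(iteration)
    # Same interpolation as sample_command: start at (0, 1.0) m/s,
    # expand toward the full (-0.5, 3.5) m/s range.
    return lo * f, hi * f + (1 - f) * 1.0

for it in (0, 2500, 5000, 10000):
    print(it, forward_speed_bounds(it))
```

At iteration 0 the policy only sees 0 to 1 m/s; by iteration 2500 the upper bound is 2.25 m/s; from iteration 5000 onward the full range is sampled.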
Multi-Gait Reward Function
```python
class MultiGaitReward:
    """Speed-adaptive reward: low commanded speed activates walking terms,
    high commanded speed activates running terms."""

    def compute(self, state, action, prev_action, command):
        rewards = {}
        cmd_speed = torch.abs(command[:, 0])

        # === Velocity tracking (all gaits) ===
        vel_error = torch.sum(
            torch.square(command[:, :2] - state["base_lin_vel"][:, :2]),
            dim=1,
        )
        rewards["vel_tracking"] = 1.5 * torch.exp(-vel_error / 0.25)

        # === Yaw-rate tracking ===
        yaw_error = torch.square(command[:, 2] - state["base_ang_vel"][:, 2])
        rewards["yaw_tracking"] = 0.8 * torch.exp(-yaw_error / 0.25)

        # === Speed-adaptive contact schedule ===
        left_c = state["foot_contact"][:, 0]
        right_c = state["foot_contact"][:, 1]
        both_air = (1 - left_c) * (1 - right_c)

        # Walking (< 1.5 m/s): penalize losing ground contact entirely.
        walk_mask = (cmd_speed < 1.5).float()
        rewards["walk_contact"] = -0.5 * walk_mask * both_air

        # Running (> 2.0 m/s): reward the flight phase.
        run_mask = (cmd_speed > 2.0).float()
        rewards["run_flight"] = 0.3 * run_mask * both_air

        # === Adaptive foot clearance (higher swing target when running) ===
        target_clearance = torch.where(
            cmd_speed > 2.0,
            torch.full_like(cmd_speed, 0.12),
            torch.full_like(cmd_speed, 0.08),
        )
        swing = state["foot_contact"] < 0.5
        clearance = torch.where(
            swing,
            torch.clamp(state["foot_height"] - target_clearance.unsqueeze(1), min=0.0),
            torch.zeros_like(state["foot_height"]),
        )
        rewards["clearance"] = 0.3 * torch.sum(clearance, dim=1)

        # === Adaptive feet air time (shorter, faster steps when running) ===
        target_air = torch.where(
            cmd_speed > 2.0,
            torch.full_like(cmd_speed, 0.25),
            torch.full_like(cmd_speed, 0.35),
        )
        air_error = torch.abs(state["feet_air_time"] - target_air.unsqueeze(1))
        rewards["air_time"] = 0.2 * torch.sum(torch.exp(-air_error / 0.1), dim=1)

        # === Turning: allow ~30% slowdown while tracking sharp yaw commands ===
        cmd_yaw = torch.abs(command[:, 2])
        turning_mask = (cmd_yaw > 0.5).float()
        vel_x_error_turning = torch.abs(
            state["base_lin_vel"][:, 0] - command[:, 0] * 0.7
        )
        rewards["turning_vel"] = 0.3 * turning_mask * torch.exp(-vel_x_error_turning)

        # === Regularization ===
        rewards["action_rate"] = -0.01 * torch.sum(
            torch.square(action - prev_action), dim=1
        )
        rewards["torque"] = -3e-5 * torch.sum(torch.square(state["torques"]), dim=1)
        rewards["upright"] = -1.5 * torch.sum(
            torch.square(state["projected_gravity"][:, :2]), dim=1
        )
        rewards["termination"] = -200.0 * state["terminated"].float()

        total = sum(rewards.values())
        return total, rewards
```
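One detail worth noticing: the contact-schedule terms deliberately leave a dead zone between 1.5 and 2.0 m/s where neither the walking penalty nor the flight bonus is active, so the policy can blend gaits during the transition. A torch-free sketch of the gating for a single scalar command:

```python
def gait_masks(cmd_speed):
    """Replicates the speed gating from MultiGaitReward for one scalar
    commanded speed (the absolute forward-velocity command)."""
    walk_mask = 1.0 if cmd_speed < 1.5 else 0.0   # penalize flight
    run_mask = 1.0 if cmd_speed > 2.0 else 0.0    # reward flight
    return walk_mask, run_mask

print(gait_masks(1.0))  # walking regime
print(gait_masks(1.8))  # transition dead zone: neither term active
print(gait_masks(2.5))  # running regime
```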
Sharp Turning
Turning is challenging for humanoids because it requires coordinating hip yaw rotation with a simultaneous lateral weight shift:
```python
import numpy as np


class TurningAnalysis:
    """Analyze turning performance of a trained policy."""

    def evaluate_turning(self, env, policy, yaw_rates=(0.5, 1.0, 1.5)):
        """Evaluate yaw-rate tracking at several commanded turn rates."""
        results = {}
        for target_yaw in yaw_rates:
            # Fixed command: 0.5 m/s forward plus the target yaw rate.
            # How the command is fed to the env depends on the wrapper.
            command = torch.tensor([[0.5, 0.0, target_yaw]])
            actual_yaws = []
            turning_radii = []
            obs = env.reset()
            for step in range(500):
                action = policy(obs)
                obs, _, done, info = env.step(action)
                actual_yaw = info["base_ang_vel"][:, 2].mean().item()
                actual_yaws.append(actual_yaw)
                linear_vel = info["base_lin_vel"][:, 0].mean().item()
                if abs(actual_yaw) > 0.01:
                    # Kinematic turning radius r = v / |omega|.
                    turning_radii.append(abs(linear_vel / actual_yaw))
            # Statistics over the last 100 steps (steady-state turning).
            results[target_yaw] = {
                "tracking_error": abs(np.mean(actual_yaws[-100:]) - target_yaw),
                "avg_turning_radius": np.mean(turning_radii[-100:])
                if turning_radii else float("inf"),
                "stability": np.std(actual_yaws[-100:]),
            }
        print(f"{'Target (rad/s)':<16} {'Error':>8} {'Radius (m)':>11} {'Stability':>10}")
        for yaw, r in results.items():
            print(f"{yaw:<16.1f} {r['tracking_error']:>8.3f} "
                  f"{r['avg_turning_radius']:>11.2f} {r['stability']:>10.4f}")
        return results
```
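The turning radius reported by the evaluation follows the kinematic relation r = v / |ω|. With the 0.5 m/s forward command used above, the three yaw rates correspond to circles of 1.0 m, 0.5 m, and roughly 0.33 m radius:

```python
def turning_radius(forward_speed, yaw_rate):
    """Kinematic turning radius r = v / |omega| (the same formula
    TurningAnalysis applies to measured velocities)."""
    if abs(yaw_rate) < 1e-6:
        return float("inf")  # straight-line motion
    return abs(forward_speed / yaw_rate)

for yaw in (0.5, 1.0, 1.5):
    print(yaw, round(turning_radius(0.5, yaw), 3))
```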
Adversarial Motion Prior (AMP)
To achieve natural-looking running gaits, we can use AMP, which treats motion-capture clips as a style prior:
```python
import torch.nn as nn


class AMPReward:
    """Adversarial Motion Prior: a discriminator learns to tell policy
    transitions from reference (mocap) transitions, and its output
    becomes a style reward for the policy."""

    def __init__(self, reference_motions, obs_dim):
        self.discriminator = nn.Sequential(
            nn.Linear(obs_dim * 2, 1024),  # input: (obs_t, obs_t+1) pair
            nn.ReLU(),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, 1),
        )
        self.reference_motions = reference_motions
        self.optimizer = torch.optim.Adam(
            self.discriminator.parameters(), lr=1e-5
        )

    def compute_style_reward(self, current_obs, next_obs):
        """High reward = the transition looks like reference motion."""
        transition = torch.cat([current_obs, next_obs], dim=1)
        with torch.no_grad():
            d_output = self.discriminator(transition)
            style_reward = -torch.log(
                1 - torch.sigmoid(d_output) + 1e-6
            ).squeeze()
        return style_reward

    def update_discriminator(self, policy_transitions, reference_transitions):
        """Least-squares GAN update: reference = real (target 1),
        policy = fake (target 0)."""
        policy_transitions = policy_transitions.detach()
        real_output = self.discriminator(reference_transitions)
        fake_output = self.discriminator(policy_transitions)
        real_loss = torch.mean(torch.square(real_output - 1))
        fake_loss = torch.mean(torch.square(fake_output))
        loss = 0.5 * (real_loss + fake_loss)

        # Gradient penalty on interpolated samples (WGAN-GP style)
        # to keep the discriminator smooth.
        alpha = torch.rand(
            reference_transitions.shape[0], 1,
            device=reference_transitions.device,
        )
        interp = alpha * reference_transitions + (1 - alpha) * policy_transitions
        interp.requires_grad_(True)
        interp_output = self.discriminator(interp)
        grad = torch.autograd.grad(
            interp_output, interp,
            grad_outputs=torch.ones_like(interp_output),
            create_graph=True,
        )[0]
        gp = torch.mean(torch.square(grad.norm(dim=1) - 1))
        loss += 10.0 * gp

        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return loss.item()
```
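The style reward r = -log(1 - sigmoid(d) + 1e-6) maps the raw discriminator score d to a positive reward that grows as the discriminator becomes more convinced a transition is "real". A quick numeric check of that mapping, in plain math without torch:

```python
import math

def style_reward(d_score, eps=1e-6):
    """AMP style reward: -log(1 - sigmoid(d) + eps)."""
    sig = 1.0 / (1.0 + math.exp(-d_score))
    return -math.log(1.0 - sig + eps)

print(round(style_reward(0.0), 3))   # undecided discriminator -> ~0.693
print(round(style_reward(2.0), 3))   # looks like reference    -> ~2.127
print(round(style_reward(-2.0), 3))  # looks like policy/fake  -> ~0.127
```

Because the reward is always positive, it never directly encourages early termination, which is one practical reason this form is popular for AMP-style training.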
Emergent Behaviors
With a multi-gait policy, the robot often develops emergent behaviors, patterns that were never explicitly rewarded but arise naturally during training:
| Behavior | Trigger condition | Explanation |
|---|---|---|
| Arm swing | Running > 2 m/s | Policy uses arm momentum for balance |
| Head bob | Walking 0.5-1.0 m/s | Natural vertical oscillation |
| Foot rotation | Sharp turning | Pivot foot rotates to reduce friction |
| Stance widening | Lateral walking | Wider stance for stability |
| Deep knee bend | Running acceleration | Energy storage during stance phase |
Full Training Config
```bash
# Multi-gait training (about 3 h on an RTX 4090)
python source/standalone/workflows/rsl_rl/train.py \
    --task=Isaac-Velocity-Rough-H1-MultiGait-v0 \
    --num_envs=4096 \
    --max_iterations=15000 \
    --headless \
    --logger wandb \
    --wandb_project h1-multigait

# Evaluate the multi-gait policy
python source/standalone/workflows/rsl_rl/play.py \
    --task=Isaac-Velocity-Rough-H1-MultiGait-v0 \
    --num_envs=4 \
    --checkpoint=logs/h1_multigait/model_15000.pt
```
For more on humanoid control methods, see Humanoid Control Methods. For parkour with legged robots, see Parkour Learning.
Summary
In this post, we extended the H1 policy to dynamic motions:
- Running gaits with flight phase, GRF management, and elastic knee energy
- Command-conditioned policy for walk/run/turn/lateral in a single policy
- Sharp turning with coordinated hip yaw and lateral weight shift
- AMP for natural-looking gaits using motion capture references
- Emergent behaviors like arm swing and stance widening
Next post — Unitree H1-2: Enhanced Locomotion — explores the H1-2 with new hardware and loco-manipulation basics.
References
- AMP: Adversarial Motion Priors for Stylized Physics-Based Character Animation — Peng et al., SIGGRAPH 2021
- Expressive Whole-Body Control for Humanoid Robots — Cheng et al., RSS 2024
- Walk These Ways: Tuning Robot Control for Generalization with Multiplicity of Behavior — Margolis & Agrawal, CoRL 2022
- Learning Humanoid Locomotion with Transformers — Radosavovic et al., 2024