Walking is the foundation, but a truly useful humanoid needs more than that. In this post, we push the Unitree H1 past walking to achieve running (2+ m/s), sharp turning, lateral walking, backward walking, and a command-conditioned policy that switches fluidly between gaits.
Continuing from the H1 basic training post, this one focuses on dynamic motions, where the physics and control get genuinely interesting.
From Walking to Running: What Changes?
The physics of running
Walking and running differ fundamentally in the flight phase:
| Property | Walking | Running |
|---|---|---|
| Flight phase | None (at least one foot on the ground) | Present (both feet leave the ground) |
| Ground contact | ~60% of the time per foot | ~30-40% of the time |
| Peak force | ~1.2x body weight | ~2.5-3x body weight |
| Speed | 0-1.5 m/s | 1.5-3.3+ m/s |
| Froude number | < 1 | > 1 |
| Energy mode | Inverted pendulum | Spring-mass |
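The Froude number row can be made concrete: Fr = v² / (g·L), where L is the leg length. A minimal sketch (the ~0.9 m leg length for H1 is an assumption for illustration):

```python
import math

def froude_number(speed_mps: float, leg_length_m: float, g: float = 9.81) -> float:
    """Fr = v^2 / (g * L); Fr > 1 roughly marks the walk-to-run transition."""
    return speed_mps ** 2 / (g * leg_length_m)

# With a ~0.9 m leg: 1.5 m/s gives Fr ≈ 0.25 (walking regime),
# while 3.0 m/s gives Fr ≈ 1.02 (running regime).
```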
import torch
import numpy as np

class RunningRewardExtension:
    """
    Reward-function extension for running gaits.
    """

    def compute_running_rewards(self, state, command):
        rewards = {}

        # 1. Velocity tracking with an extended range
        # Walking: max 1.5 m/s → Running: max 3.5 m/s
        vel_error = torch.sum(
            torch.square(command[:, :2] - state["base_lin_vel"][:, :2]),
            dim=1
        )
        # Larger sigma for high-speed tracking
        sigma = torch.where(
            torch.abs(command[:, 0]) > 1.5,
            torch.tensor(0.5, device=command.device),   # Running: looser tolerance
            torch.tensor(0.25, device=command.device),  # Walking: tighter tolerance
        )
        rewards["vel_tracking"] = torch.exp(-vel_error / sigma)

        # 2. Flight-phase reward
        # When the command exceeds 2 m/s, encourage a flight phase
        both_feet_air = (
            (state["foot_contact"][:, 0] < 0.5) &
            (state["foot_contact"][:, 1] < 0.5)
        ).float()
        running_command = (torch.abs(command[:, 0]) > 2.0).float()
        rewards["flight_phase"] = 0.3 * running_command * both_feet_air

        # 3. Ground reaction force symmetry
        # Running needs high, symmetric GRF peaks. As a simple per-env
        # proxy, compare the instantaneous vertical forces of the two feet.
        left_grf = state["contact_forces"][:, 0, 2]   # z-component
        right_grf = state["contact_forces"][:, 1, 2]
        grf_symmetry = 1.0 - torch.abs(left_grf - right_grf) / (
            left_grf + right_grf + 1e-6
        )
        rewards["grf_symmetry"] = 0.1 * grf_symmetry

        # 4. Knee bend during stance (energy storage)
        # Running exploits elastic energy → knees must bend more
        stance_mask = (state["foot_contact"] > 0.5).float()
        knee_bend = torch.abs(state["joint_pos"][:, [3, 8]])  # knee joints
        running_knee_reward = stance_mask * torch.clamp(
            knee_bend - 0.2, min=0.0
        )
        rewards["knee_bend"] = 0.1 * torch.sum(running_knee_reward, dim=1)

        return rewards
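The flight-phase term is easy to sanity-check in isolation. A standalone sketch, with tensor shapes mirroring the hypothetical state dict above:

```python
import torch

# Three envs: left-foot stance, flight at running speed, flight at walking speed
foot_contact = torch.tensor([[1.0, 0.0],
                             [0.0, 0.0],
                             [0.0, 0.0]])
command = torch.tensor([[2.5, 0.0, 0.0],
                        [2.5, 0.0, 0.0],
                        [1.0, 0.0, 0.0]])

both_feet_air = ((foot_contact[:, 0] < 0.5) & (foot_contact[:, 1] < 0.5)).float()
running_command = (torch.abs(command[:, 0]) > 2.0).float()
flight_reward = 0.3 * running_command * both_feet_air
# Only env 1 is rewarded: airborne while commanded to run → [0.0, 0.3, 0.0]
```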
Command-Conditioned Multi-Gait Policy
Instead of training several separate policies, we train a single policy that can walk, run, turn, and sidestep, depending on the velocity command.
Extended Command Space
class ExtendedVelocityCommand:
    """
    Extended velocity command for the multi-gait policy.
    """

    def __init__(self):
        # Command ranges
        self.ranges = {
            "lin_vel_x": (-0.5, 3.5),  # Backward → Running
            "lin_vel_y": (-0.8, 0.8),  # Lateral walking
            "ang_vel_z": (-1.5, 1.5),  # Sharp turning
        }
        # Curriculum: gradually widen the ranges
        self.curriculum_factor = 0.0  # 0 → 1 over training

    def sample_command(self, num_envs, device="cuda"):
        """Sample random velocity commands."""
        # Apply the curriculum
        factor = self.curriculum_factor
        commands = torch.zeros(num_envs, 3, device=device)

        # Forward/backward velocity
        commands[:, 0] = torch.empty(num_envs, device=device).uniform_(
            self.ranges["lin_vel_x"][0] * factor,
            self.ranges["lin_vel_x"][1] * factor + (1 - factor) * 1.0,
        )
        # Lateral velocity
        commands[:, 1] = torch.empty(num_envs, device=device).uniform_(
            self.ranges["lin_vel_y"][0] * factor,
            self.ranges["lin_vel_y"][1] * factor,
        )
        # Yaw rate
        commands[:, 2] = torch.empty(num_envs, device=device).uniform_(
            self.ranges["ang_vel_z"][0] * factor,
            self.ranges["ang_vel_z"][1] * factor,
        )

        # 20% chance of a zero command (standing)
        zero_mask = torch.rand(num_envs, device=device) < 0.2
        commands[zero_mask] = 0.0
        return commands

    def update_curriculum(self, iteration, total_iterations=10000):
        """Gradually widen the command range."""
        self.curriculum_factor = min(iteration / (total_iterations * 0.5), 1.0)
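The curriculum schedule itself is just a clipped linear ramp. Restated standalone for a quick check (same constants as `update_curriculum` above):

```python
def curriculum_factor(iteration: int, total_iterations: int = 10000) -> float:
    # Ramps linearly from 0 to 1 over the first half of training, then saturates
    return min(iteration / (total_iterations * 0.5), 1.0)

# iteration 0 → 0.0, 2500 → 0.5, 5000 and beyond → 1.0
```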
Multi-Gait Reward Function
class MultiGaitReward:
    """
    Reward function that adapts to the commanded speed.
    Low speed → walking rewards. High speed → running rewards.
    """

    def compute(self, state, action, prev_action, command):
        rewards = {}
        cmd_speed = torch.abs(command[:, 0])

        # === Velocity tracking (universal) ===
        vel_error = torch.sum(
            torch.square(command[:, :2] - state["base_lin_vel"][:, :2]),
            dim=1
        )
        rewards["vel_tracking"] = 1.5 * torch.exp(-vel_error / 0.25)

        # === Yaw tracking ===
        yaw_error = torch.square(command[:, 2] - state["base_ang_vel"][:, 2])
        rewards["yaw_tracking"] = 0.8 * torch.exp(-yaw_error / 0.25)

        # === Speed-adaptive rewards ===
        # Contact pattern: walk = alternating stance, run = flight phases
        left_c = state["foot_contact"][:, 0]
        right_c = state["foot_contact"][:, 1]
        both_air = (1 - left_c) * (1 - right_c)

        # Walking (< 1.5 m/s): penalize both feet in the air
        walk_mask = (cmd_speed < 1.5).float()
        rewards["walk_contact"] = -0.5 * walk_mask * both_air

        # Running (> 2.0 m/s): reward the flight phase
        run_mask = (cmd_speed > 2.0).float()
        rewards["run_flight"] = 0.3 * run_mask * both_air

        # === Foot clearance (adaptive) ===
        target_clearance = torch.where(
            cmd_speed > 2.0,
            torch.tensor(0.12, device=cmd_speed.device),  # Running: higher steps
            torch.tensor(0.08, device=cmd_speed.device),  # Walking: normal
        )
        swing = state["foot_contact"] < 0.5
        clearance = torch.where(
            swing,
            torch.clamp(state["foot_height"] - target_clearance.unsqueeze(1), min=0.0),
            torch.zeros_like(state["foot_height"]),
        )
        rewards["clearance"] = 0.3 * torch.sum(clearance, dim=1)

        # === Feet air time (adaptive) ===
        target_air = torch.where(
            cmd_speed > 2.0,
            torch.tensor(0.25, device=cmd_speed.device),  # Running: shorter
            torch.tensor(0.35, device=cmd_speed.device),  # Walking: longer
        )
        air_error = torch.abs(state["feet_air_time"] - target_air.unsqueeze(1))
        rewards["air_time"] = 0.2 * torch.sum(
            torch.exp(-air_error / 0.1), dim=1
        )

        # === Turning rewards ===
        cmd_yaw = torch.abs(command[:, 2])
        turning_mask = (cmd_yaw > 0.5).float()
        # Allow a lower forward speed during sharp turns
        vel_x_error_turning = torch.abs(
            state["base_lin_vel"][:, 0] - command[:, 0] * 0.7
        )
        rewards["turning_vel"] = 0.3 * turning_mask * torch.exp(
            -vel_x_error_turning
        )

        # === Regularization ===
        rewards["action_rate"] = -0.01 * torch.sum(
            torch.square(action - prev_action), dim=1
        )
        rewards["torque"] = -3e-5 * torch.sum(
            torch.square(state["torques"]), dim=1
        )
        rewards["upright"] = -1.5 * torch.sum(
            torch.square(state["projected_gravity"][:, :2]), dim=1
        )
        rewards["termination"] = -200.0 * state["terminated"].float()

        total = sum(rewards.values())
        return total, rewards
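Note that the walk and run masks above leave a dead zone between 1.5 and 2.0 m/s where neither contact term fires, so the policy is free to blend gaits there. A quick standalone check:

```python
import torch

cmd_speed = torch.tensor([0.8, 1.7, 2.6])  # walking, dead zone, running
walk_mask = (cmd_speed < 1.5).float()
run_mask = (cmd_speed > 2.0).float()
# walk_mask = [1, 0, 0], run_mask = [0, 0, 1]: at 1.7 m/s neither term is active
```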
Sharp Turning
Turning is a hard skill for a humanoid because it demands coordinated hip yaw and lateral weight shift at the same time:
class TurningAnalysis:
    """Analyze turning performance."""

    def evaluate_turning(self, env, policy, yaw_rates=(0.5, 1.0, 1.5)):
        """Evaluate turning at several yaw rates."""
        results = {}
        for target_yaw in yaw_rates:
            # Command: forward 0.5 m/s + turn
            command = torch.tensor([[0.5, 0.0, target_yaw]])

            actual_yaws = []
            turning_radii = []
            obs = env.reset()
            # Fix the command for this rollout (assumes the env wrapper
            # exposes a writable command buffer)
            env.commands[:] = command
            for step in range(500):  # 10 s at 50 Hz control
                action = policy(obs)
                obs, _, done, info = env.step(action)

                actual_yaw = info["base_ang_vel"][:, 2].mean().item()
                actual_yaws.append(actual_yaw)

                # Turning radius: r = v / ω
                linear_vel = info["base_lin_vel"][:, 0].mean().item()
                if abs(actual_yaw) > 0.01:
                    turning_radii.append(abs(linear_vel / actual_yaw))

            results[target_yaw] = {
                "tracking_error": abs(
                    np.mean(actual_yaws[-100:]) - target_yaw
                ),
                "avg_turning_radius": np.mean(turning_radii[-100:])
                if turning_radii else float("inf"),
                "stability": np.std(actual_yaws[-100:]),
            }

        # Print results
        print(f"{'Target (rad/s)':<16} {'Error':>8} {'Radius (m)':>11} {'Stability':>10}")
        for yaw, r in results.items():
            print(f"{yaw:<16.1f} {r['tracking_error']:>8.3f} "
                  f"{r['avg_turning_radius']:>10.2f}m {r['stability']:>10.4f}")
        return results
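The turning radius reported above follows directly from r = v / ω: at a fixed forward speed, a higher yaw rate tightens the turn. A trivial helper to illustrate:

```python
def turning_radius(forward_vel: float, yaw_rate: float) -> float:
    """r = v / ω for a steady-state turn (yaw_rate must be nonzero)."""
    return abs(forward_vel / yaw_rate)

# At 0.5 m/s forward: 0.5 rad/s → 1.0 m radius, 1.5 rad/s → ~0.33 m radius
```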
Adversarial Motion Prior (AMP)
To get a natural-looking running gait, we can use AMP, which uses motion capture data as a prior:
import torch.nn as nn

class AMPReward:
    """
    Adversarial Motion Prior for natural-looking gaits.
    A discriminator separates policy motion from reference motion.
    """

    def __init__(self, reference_motions, obs_dim):
        # Discriminator network
        self.discriminator = nn.Sequential(
            nn.Linear(obs_dim * 2, 1024),  # current + next state
            nn.ReLU(),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, 1),
        )
        # Reference motion clips
        # Format: (num_clips, num_frames, obs_dim)
        self.reference_motions = reference_motions
        self.optimizer = torch.optim.Adam(
            self.discriminator.parameters(), lr=1e-5
        )

    def compute_style_reward(self, current_obs, next_obs):
        """
        Compute the AMP style reward.
        High reward = the policy's motion looks like the reference.
        """
        # Concatenate the transition
        transition = torch.cat([current_obs, next_obs], dim=1)
        # Discriminator output
        with torch.no_grad():
            d_output = self.discriminator(transition)
        # AMP reward = -log(1 - sigmoid(D))
        style_reward = -torch.log(
            1 - torch.sigmoid(d_output) + 1e-6
        ).squeeze()
        return style_reward

    def update_discriminator(self, policy_transitions, reference_transitions):
        """
        Train the discriminator to separate policy from reference.
        Policy = "fake", Reference = "real".
        """
        # Real transitions from the reference motions
        real_output = self.discriminator(reference_transitions)
        # Fake transitions from the policy
        fake_output = self.discriminator(policy_transitions.detach())

        # Least-squares GAN loss
        real_loss = torch.mean(torch.square(real_output - 1))
        fake_loss = torch.mean(torch.square(fake_output))
        loss = 0.5 * (real_loss + fake_loss)

        # Gradient penalty on interpolated samples
        alpha = torch.rand(reference_transitions.shape[0], 1,
                           device=reference_transitions.device)
        interp = alpha * reference_transitions + (1 - alpha) * policy_transitions.detach()
        interp.requires_grad_(True)
        interp_output = self.discriminator(interp)
        grad = torch.autograd.grad(
            interp_output, interp,
            grad_outputs=torch.ones_like(interp_output),
            create_graph=True
        )[0]
        gp = torch.mean(torch.square(grad.norm(dim=1) - 1))
        loss += 10.0 * gp

        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return loss.item()
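The reward mapping -log(1 - sigmoid(d)) is monotonic in the discriminator score: the more reference-like a transition looks, the larger the style reward. A scalar sketch of the same formula:

```python
import math

def style_reward(d: float) -> float:
    """AMP-style reward from a raw discriminator score d (same formula as above)."""
    return -math.log(1.0 - 1.0 / (1.0 + math.exp(-d)) + 1e-6)

# style_reward(-2) < style_reward(0) < style_reward(2): higher scores,
# i.e. more "real"-looking transitions, earn a larger reward.
```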
Emergent Behaviors
With a multi-gait policy, the robot often develops emergent behaviors: ones that were never designed in directly but appear on their own:
| Behavior | When it appears | Explanation |
|---|---|---|
| Arm swing | Running > 2 m/s | The policy uses arm momentum for balance |
| Head bob | Walking 0.5-1.0 m/s | Natural vertical oscillation |
| Foot rotation | Sharp turning | The pivot foot rotates to reduce friction |
| Stance widening | Lateral walking | A wider stance for stability |
| Knee bend deepening | Running acceleration | Energy storage during the stance phase |
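Emergence of a flight phase is easy to measure from logged contacts. A small hypothetical detector (assumes a (T, 2) contact log with 1.0 = foot on the ground):

```python
import torch

def flight_phase_fraction(foot_contact: torch.Tensor) -> float:
    """Fraction of timesteps with both feet airborne; near 0 for walking,
    noticeably positive once a running gait with flight phases emerges."""
    both_air = (foot_contact[:, 0] < 0.5) & (foot_contact[:, 1] < 0.5)
    return both_air.float().mean().item()
```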
Full Training Config
# Multi-gait training: ~3h on an RTX 4090
python source/standalone/workflows/rsl_rl/train.py \
--task=Isaac-Velocity-Rough-H1-MultiGait-v0 \
--num_envs=4096 \
--max_iterations=15000 \
--headless \
--logger wandb \
--wandb_project h1-multigait
# Evaluate multi-gait
python source/standalone/workflows/rsl_rl/play.py \
--task=Isaac-Velocity-Rough-H1-MultiGait-v0 \
--num_envs=4 \
--checkpoint=logs/h1_multigait/model_15000.pt
To learn more about control approaches for humanoids, see Humanoid Control Methods. For parkour with legged robots, see Parkour Learning.
Summary
In this post, we extended the H1 policy to dynamic motions:
- Running gaits with a flight phase, GRF management, and elastic knee energy
- A command-conditioned policy that covers walk/run/turn/lateral in a single network
- Sharp turning with coordinated hip yaw and lateral weight shift
- AMP for natural-looking gaits using motion capture references
- Emergent behaviors such as arm swing and stance widening
The next post, Unitree H1-2: Enhanced Locomotion, explores the H1-2 with updated hardware and loco-manipulation basics.
References
- AMP: Adversarial Motion Priors for Stylized Physics-Based Character Animation, Peng et al., SIGGRAPH 2021
- Expressive Whole-Body Control for Humanoid Robots, Cheng et al., RSS 2024
- Walk These Ways: Tuning Robot Control for Generalization, Margolis & Agrawal, CoRL 2022
- Learning Humanoid Locomotion with Transformers, Radosavovic et al., 2023