The Problem: One Gait, One Policy?
In previous parts of this series (Part 1, Part 2, Part 3), we trained locomotion policies, but each policy typically learns only one type of movement. Want the robot to trot? Train one policy. Want it to gallop? Train another. Slow walking, fast walking, sideways walking? Each needs yet another policy.
This approach has several problems:
- Inefficient: N movement types means N policies and N training sessions
- Hard gait transitions: how do you smoothly blend between two separate policies?
- Inflexible: every new gait requires training from scratch
The paper "Walk These Ways: Tuning Robot Control for Generalization with Multiplicity of Behavior" (arXiv:2212.03238) by Gabriel Margolis and Pulkit Agrawal (MIT CSAIL, CoRL 2022) solves exactly this problem.
Core Idea: Multiplicity of Behavior (MoB)
Key Insight
When training locomotion with RL, there are many ways to solve the same task. For example, to move forward at 1 m/s, a robot can:
- Trot (diagonal legs simultaneously)
- Walk (one leg at a time)
- Bound (front legs then back legs)
- Pronk (all four legs together)
RL typically converges to a single strategy (usually trotting, because it is the most stable). This paper asks: how can one policy learn MANY strategies simultaneously?
Solution: Command Conditioning
Instead of sending only a velocity command (vx, vy, yaw_rate), Walk These Ways adds an extended command vector that controls how the robot moves:
# Standard locomotion command
standard_command = {
    "vx": 1.0,        # m/s, forward
    "vy": 0.0,        # m/s, lateral
    "yaw_rate": 0.0,  # rad/s, turning
}

# Walk These Ways EXTENDED command
wtw_command = {
    # === Velocity (same as standard) ===
    "vx": 1.0,
    "vy": 0.0,
    "yaw_rate": 0.0,
    # === Gait parameters (NEW) ===
    "body_height": 0.0,     # [-1, 1] low/high
    "step_frequency": 3.0,  # Hz
    "gait": [1, 0, 0],      # one-hot: trot/pace/bound
    "swing_height": 0.08,   # m, foot lift height
    "stance_width": 0.0,    # [-1, 1] narrow/wide
    "body_pitch": 0.0,      # rad, forward/backward lean
    "body_roll": 0.0,       # rad, left/right lean
    # TOTAL: 15 dimensions in the full command (vs 3 before);
    # the dict above lists the main entries
}
Key idea: Policy receives 15-dim command vector and learns to execute any combination of these parameters. One policy, infinite movement types.
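Before the policy network can consume the command, the dict has to be flattened into a fixed-order vector. A minimal NumPy sketch, assuming a hypothetical `flatten_command` helper and an illustrative key order (the real repo defines its own ordering, and the full command has 15 entries):

```python
import numpy as np

def flatten_command(cmd: dict) -> np.ndarray:
    """Flatten a command dict into one vector for the policy.

    Hypothetical helper: the key order must match whatever order
    the policy was trained with.
    """
    keys = ["vx", "vy", "yaw_rate", "body_height", "step_frequency",
            "gait", "swing_height", "stance_width", "body_pitch", "body_roll"]
    parts = [np.atleast_1d(np.asarray(cmd[k], dtype=np.float32)) for k in keys]
    return np.concatenate(parts)

cmd = {
    "vx": 1.0, "vy": 0.0, "yaw_rate": 0.0,
    "body_height": 0.0, "step_frequency": 3.0,
    "gait": [1, 0, 0],  # one-hot: trot
    "swing_height": 0.08, "stance_width": 0.0,
    "body_pitch": 0.0, "body_roll": 0.0,
}
vec = flatten_command(cmd)
print(vec.shape)  # (12,) for the keys listed here
```

The one-hot gait entry expands to three slots, so the keys above yield a 12-dim vector; the paper's remaining entries bring it to 15.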
Architecture and Training
Observation Space
observation = {
    # Proprioception (same as standard)
    "base_angular_velocity": 3,
    "projected_gravity": 3,
    "joint_positions": 12,
    "joint_velocities": 12,
    "previous_actions": 12,
    # Extended command (vs 3 dims standard)
    "extended_command": 15,
    # TOTAL: 57 dimensions
}
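As a sanity check, the per-part dimensions above do add up to 57. A small sketch that assembles a dummy observation in the same order (the zero arrays stand in for values the simulator would provide):

```python
import numpy as np

# Placeholder observations; in practice these come from the simulator.
obs_parts = {
    "base_angular_velocity": np.zeros(3, dtype=np.float32),
    "projected_gravity": np.array([0.0, 0.0, -1.0], dtype=np.float32),
    "joint_positions": np.zeros(12, dtype=np.float32),
    "joint_velocities": np.zeros(12, dtype=np.float32),
    "previous_actions": np.zeros(12, dtype=np.float32),
    "extended_command": np.zeros(15, dtype=np.float32),
}

obs = np.concatenate(list(obs_parts.values()))
print(obs.shape)  # (57,) = 3 + 3 + 12 + 12 + 12 + 15
```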
Reward Function
Beyond standard rewards (velocity tracking, energy penalty), Walk These Ways adds gait-specific rewards:
import torch

def compute_gait_reward(env):
    """
    Reward for gait pattern matching.
    Based on the commanded gait, encourage the correct contact pattern.
    (compute_step_frequency, get_desired_phases, and compute_phase_error
    are environment-specific helpers.)
    """
    rewards = {}

    # 1. Step frequency tracking:
    # count actual steps vs. commanded frequency
    actual_freq = compute_step_frequency(env.foot_contacts)
    freq_error = (actual_freq - env.commands.step_frequency).square()
    rewards["step_freq"] = torch.exp(-freq_error / 0.25)

    # 2. Gait pattern tracking:
    # each gait has desired phase offsets between the 4 legs
    gait_phases = {
        "trot": [0.0, 0.5, 0.5, 0.0],   # FL-RR in phase, FR-RL in phase
        "pace": [0.0, 0.5, 0.0, 0.5],   # FL-RL in phase, FR-RR in phase
        "bound": [0.0, 0.0, 0.5, 0.5],  # front in phase, rear in phase
    }
    desired_phases = get_desired_phases(env.commands.gait, gait_phases)
    phase_error = compute_phase_error(env.foot_contacts, desired_phases)
    rewards["gait_phase"] = torch.exp(-phase_error / 0.5)

    # 3. Swing height tracking
    actual_swing = env.foot_heights.max(dim=-1).values
    swing_error = (actual_swing - env.commands.swing_height).square()
    rewards["swing_height"] = torch.exp(-swing_error / 0.01)

    # 4. Body height tracking
    height_error = (env.base_height - env.commands.body_height_target).square()
    rewards["body_height"] = torch.exp(-height_error / 0.01)

    # 5. Body orientation tracking
    pitch_error = (env.base_euler[:, 1] - env.commands.body_pitch).square()
    roll_error = (env.base_euler[:, 0] - env.commands.body_roll).square()
    rewards["orientation"] = torch.exp(-(pitch_error + roll_error) / 0.1)

    return rewards
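The gait-phase term compares actual foot contacts against a desired contact schedule derived from those phase offsets. A minimal sketch of such a schedule, using a hard square wave for clarity (the paper smooths the expected contact states; `desired_contacts` and its duty-cycle parameter are illustrative assumptions):

```python
import numpy as np

def desired_contacts(t: float, step_freq: float, phase_offsets, duty: float = 0.5):
    """Desired contact state per leg (1 = stance, 0 = swing).

    All legs share one duty cycle; each leg is shifted by its
    phase offset, which is what distinguishes trot/pace/bound.
    """
    base_phase = (t * step_freq) % 1.0
    leg_phases = (base_phase + np.asarray(phase_offsets)) % 1.0
    return (leg_phases < duty).astype(np.float32)

trot_offsets = [0.0, 0.5, 0.5, 0.0]  # FL, FR, RL, RR
print(desired_contacts(0.1, 3.0, trot_offsets))  # [1. 0. 0. 1.] diagonal pair down
```

Half a gait cycle later the other diagonal pair is in stance, which is exactly the alternating pattern the reward encourages.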
Training Procedure
training_config = {
    "num_envs": 4096,
    "max_iterations": 3000,  # more than the standard 1500; the task is harder

    # Command sampling (CRITICAL)
    "command_sampling": {
        "vx_range": [-1.0, 2.0],
        "vy_range": [-0.5, 0.5],
        "yaw_range": [-1.0, 1.0],
        "body_height_range": [-0.1, 0.1],
        "step_freq_range": [2.0, 4.0],
        "gait": "uniform_categorical",  # randomly select gait
        "swing_height_range": [0.04, 0.12],
        "stance_width_range": [-0.05, 0.05],
        "body_pitch_range": [-0.3, 0.3],
        "body_roll_range": [-0.2, 0.2],
    },

    # Each episode, every environment samples a RANDOM command combination,
    # so the policy must learn all combinations.
    "command_resample_interval": 500,  # steps
}
The brilliant insight: at every resample interval, each environment draws a fresh random command combination. With 4096 parallel environments, every iteration runs 4096 different combinations simultaneously, so after 3000 iterations the policy has seen millions of combinations.
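The sampling step can be sketched directly from the ranges in the config above (the function name and dict layout are illustrative, not the repo's API):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_commands(n: int) -> dict:
    """Sample one independent command combination per environment,
    using the ranges from the training config above."""
    gait_idx = rng.integers(0, 3, size=n)  # trot / pace / bound
    return {
        "vx": rng.uniform(-1.0, 2.0, size=n),
        "vy": rng.uniform(-0.5, 0.5, size=n),
        "yaw_rate": rng.uniform(-1.0, 1.0, size=n),
        "body_height": rng.uniform(-0.1, 0.1, size=n),
        "step_frequency": rng.uniform(2.0, 4.0, size=n),
        "gait": np.eye(3)[gait_idx],  # uniform categorical -> one-hot
        "swing_height": rng.uniform(0.04, 0.12, size=n),
    }

cmds = sample_commands(4096)
print(cmds["gait"].shape)  # (4096, 3)
```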
Results and Demo
What One Policy Can Do
With Walk These Ways, one single policy can:
| Behavior | Command |
|---|---|
| Trot 2 m/s | vx=2.0, gait=trot, freq=3.0 |
| Slow walk | vx=0.3, gait=trot, freq=1.5 |
| Crouch walk | vx=0.5, body_height=-0.08 |
| High-step march | vx=0.5, swing_height=0.12 |
| Bound gallop | vx=1.5, gait=bound |
| Strafe left | vy=-0.5, gait=trot |
| Spin in place | vx=0, yaw_rate=1.5 |
| Lean forward | vx=0, body_pitch=0.3 |
| Dance rhythm | Oscillate swing_height and body_height |
| Brace against push | body_height=-0.05, stance_width=0.05 |
And all transitions between behaviors are smooth — it's just changing continuous command values.
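Since behaviors live in one continuous command space, a transition is just a ramp on the command values; the policy itself handles the leg coordination. A minimal sketch of such a ramp (velocity part of the command only):

```python
import numpy as np

def blend_commands(cmd_a: np.ndarray, cmd_b: np.ndarray, alpha: float) -> np.ndarray:
    """Linearly interpolate between two command vectors (alpha in [0, 1])."""
    return (1.0 - alpha) * cmd_a + alpha * cmd_b

slow_walk = np.array([0.3, 0.0, 0.0])  # vx, vy, yaw_rate
fast_trot = np.array([2.0, 0.0, 0.0])

for alpha in np.linspace(0.0, 1.0, 5):
    print(blend_commands(slow_walk, fast_trot, alpha))
```

One caveat: the gait one-hot is the exception; it is normally switched discretely (as the joystick buttons below do) rather than blended, since a mixed one-hot has no trained meaning.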
Comparison with Single-Task Policies
| Metric | Single-task policy | Walk These Ways |
|---|---|---|
| Tracking accuracy | Higher (~5%) | Good, slightly lower |
| Gait diversity | 1 gait | Multiple gaits |
| Transition quality | None | Smooth |
| Training time | 20 min × N gaits | 60 min (once) |
| Deployment complexity | N models | 1 model |
| Novel behaviors | No | Yes (by tuning commands) |
Hardware Demo
The paper's demos use a Unitree A1 (a predecessor of the Go2), with the policy deployed on the onboard Jetson Xavier running inference at 50 Hz. The robot can:
- Walk on grass, dirt, rocks
- Climb 25-degree slopes
- Withstand push/kick perturbations
- Switch gait in real-time via joystick
- Navigate 10cm steps
How to Replicate
Step 1: Clone Repo
git clone https://github.com/Improbable-AI/walk-these-ways.git
cd walk-these-ways
pip install -e .
Step 2: Adjust Command Ranges
Main config file:
# walk_these_ways/envs/configs/go2_config.py
class Go2WTWCfg:
    class commands:
        # Adjust ranges for your robot
        lin_vel_x_range = [-1.0, 2.0]
        lin_vel_y_range = [-0.5, 0.5]
        ang_vel_yaw_range = [-1.0, 1.0]
        body_height_range = [-0.05, 0.05]
        step_frequency_range = [2.0, 4.0]
        gait_types = ["trot", "pace", "bound"]
        swing_height_range = [0.04, 0.10]
Step 3: Train
python train.py --task go2_wtw --num_envs 4096 --max_iterations 3000
# Training takes ~60 minutes on RTX 4090
# Longer than standard because observation and reward are more complex
Step 4: Deploy
Export to ONNX and run on the Go2 the same way as in Part 3. The only differences: the observation has 15 command dims instead of 3, and you need a GUI or joystick to adjust commands in real time.
# Joystick mapping for Walk These Ways
joystick_mapping = {
    "left_stick_x": "vy",
    "left_stick_y": "vx",
    "right_stick_x": "yaw_rate",
    "right_stick_y": "body_height",
    "dpad_up": "swing_height += 0.01",
    "dpad_down": "swing_height -= 0.01",
    "button_a": "gait = trot",
    "button_b": "gait = pace",
    "button_x": "gait = bound",
    "L1": "step_frequency -= 0.5",
    "R1": "step_frequency += 0.5",
}
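One detail the mapping hides: stick axes come in as [-1, 1] and must be scaled and clamped to the training ranges, because commands outside what the policy saw in training behave unpredictably. A sketch of that step (the `apply_joystick` helper and scale factors are assumptions, not the repo's code):

```python
def apply_joystick(cmd: dict, sticks: dict) -> dict:
    """Map joystick axes (in [-1, 1]) onto the command dict,
    clamped to the training ranges. Scale factors are illustrative."""
    clamp = lambda v, lo, hi: max(lo, min(hi, v))
    cmd = dict(cmd)
    cmd["vx"] = clamp(sticks["left_stick_y"] * 2.0, -1.0, 2.0)
    cmd["vy"] = clamp(sticks["left_stick_x"] * 0.5, -0.5, 0.5)
    cmd["yaw_rate"] = clamp(sticks["right_stick_x"] * 1.0, -1.0, 1.0)
    cmd["body_height"] = clamp(sticks["right_stick_y"] * 0.1, -0.1, 0.1)
    return cmd

cmd = apply_joystick(
    {"vx": 0.0, "vy": 0.0, "yaw_rate": 0.0, "body_height": 0.0},
    {"left_stick_x": 0.0, "left_stick_y": 1.0,
     "right_stick_x": 0.0, "right_stick_y": 0.0},
)
print(cmd["vx"])  # 2.0
```

Note the asymmetric clamp on vx: pushing the stick fully back still caps at -1.0 m/s, matching the training range.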
Impact and Related Works
Walk These Ways is one of the most influential papers in locomotion RL. It showed that RL policies can be generalizable — not just across terrains, but across behaviors.
Papers Building on Walk These Ways
- Extreme Parkour with Legged Robots (Cheng et al., 2024) — Extends from flat terrain to parkour (jumping, climbing, crawling), still uses command conditioning approach
- DTC: Deep Tracking Control — Uses Walk These Ways policy as low-level controller, adds high-level vision policy
- Humanoid locomotion — Teams like Agility Robotics (Digit) and Tesla (Optimus) have applied similar ideas to bipedal robots
Comparison with Other Approaches
| Approach | Paper | Strengths | Weaknesses |
|---|---|---|---|
| Walk These Ways | Margolis & Agrawal, 2022 | 1 policy, many gaits, open-source | Command design requires experience |
| AMP (Adversarial Motion Priors) | Peng et al., 2021 | Natural motion from mocap | Needs motion capture data |
| DribbleBot | Ji et al., 2023 | Soccer + locomotion | Task-specific |
| Parkour | Cheng et al., 2024 | Extreme terrain | Requires depth camera |
Lessons from the Paper
1. Command Space Design is the Core
Designing the command space is the most important decision. Too few dimensions → not expressive enough. Too many → hard to train. Walk These Ways chose 15 dims after extensive experimentation.
2. Reward Engineering is Still an Art
Even with RL, the reward function still needs domain knowledge. Knowing what gaits are, what phase means, what swing height is — all comes from classical locomotion knowledge (Part 1).
3. Open-Source Changes Everything
Walk These Ways was fully open-sourced — code, config, trained weights. Anyone with a GPU and Unitree A1/Go2 can replicate it. This is why the paper has such large impact.
4. Sim-to-Real Remains the Bottleneck
Even with an excellent policy in sim, sim-to-real transfer is still the hardest step. The paper uses strong domain randomization (friction, mass, motor strength) but still requires fine-tuning for new robots.
Series Summary: Locomotion from Zero to Hero
Across these four posts, we've gone from theoretical foundations to state-of-the-art papers:
- Part 1: ZMP, CPG, IK — classical methods and why they're being replaced
- Part 2: RL formulation — MDP, reward shaping, PPO, curriculum learning
- Part 3: Hands-on — legged_gym, Unitree Go2, sim-to-real deployment
- Part 4 (this post): Walk These Ways — multi-gait learning from one policy
Locomotion RL is evolving rapidly. Hot new directions include:
- Whole-body control: Not just walking but also using arms to manipulate (loco-manipulation)
- Vision-based locomotion: Using cameras to "see" terrain ahead
- Foundation models for locomotion: Pre-train on many robots, fine-tune for specific robot
- Humanoid locomotion: From quadruped to bipedal — much harder but seeing many breakthroughs (Agility Digit, Tesla Optimus, Fourier GR-2)
Related Posts
- Locomotion Basics: From ZMP to CPG — Part 1 of the series
- RL for Locomotion: PPO, reward shaping and curriculum — Part 2 of the series
- Quadruped Locomotion: legged_gym to Unitree Go2 — Part 3 of the series
- RL Basics for Robotics: From Markov to PPO — RL foundations
- Sim-to-Real Transfer: Train Simulation, Run Reality — Domain randomization in detail