Sim-to-Real for Locomotion: Not Just Domain Randomization
If you've read the general sim-to-real overview (Sim-to-Real Transfer: Train in Simulation, Run in Reality, Domain Randomization), you understand the basics. This post dives deep into sim-to-real specific to locomotion -- the challenges and solutions that only appear when training robots to walk, run, and jump.
Locomotion sim-to-real differs from manipulation sim-to-real in three key ways:
- More complex contact dynamics: Robot feet continuously contact the ground, with impact forces varying by terrain, velocity, and contact angle
- More critical actuator dynamics: Locomotion pushes actuators to their limits (high torque, high speed) -- exactly where motor response is most nonlinear
- Faster error accumulation: Each step's error affects the next. After a few seconds, small errors snowball until the robot falls
Let me analyze each gap and corresponding solutions, sharing best practices from leading teams.
Primary Sim-to-Real Gaps in Locomotion
Gap 1: Actuator Dynamics
Problem: In simulation, motors are typically modeled as ideal torque sources -- you command torque, motor outputs that torque instantly. Reality is completely different:
- Torque response has delay: 5-20ms latency from command to actual torque
- Velocity-dependent torque: Motor weaker at high speeds (back-EMF effect)
- Temperature-dependent: Motor hot → output torque decreases
- Friction and backlash: Gears have friction, backlash causes hysteresis
- Torque saturation: Non-linear clipping near maximum torque
This is the biggest gap for locomotion because robots continuously operate near actuator limits (jumping, running fast, recovery from perturbation).
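To make these effects concrete, here is a minimal sketch of a speed-dependent, saturating motor model with command delay. The constants (33.5 N·m, 21 rad/s) are illustrative, loosely in the range of quadruped-class actuators, and not taken from any specific robot:

```python
import numpy as np

def realistic_torque(tau_cmd, joint_vel, tau_max=33.5, vel_max=21.0, delay_buf=None):
    """Sketch of a non-ideal motor model (illustrative constants).

    - Back-EMF: available torque drops linearly with joint speed.
    - Saturation: output is clipped to the speed-dependent limit.
    - Delay: commands pass through a FIFO to emulate 5-20 ms latency.
    """
    if delay_buf is not None:
        delay_buf.append(tau_cmd)
        tau_cmd = delay_buf.pop(0)  # apply an older command instead
    # Speed-dependent torque limit (linear torque-speed curve)
    limit = tau_max * max(0.0, 1.0 - abs(joint_vel) / vel_max)
    return float(np.clip(tau_cmd, -limit, limit))
```

At standstill the full commanded torque passes through, while near maximum joint speed almost nothing does -- precisely the regime where an ideal torque source in sim diverges from reality.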
Gap 2: Terrain Friction and Contact
Problem: Simulation uses simplified contact models (usually Coulomb friction with fixed coefficient). Reality is much more complex:
- Friction varies: Wood floor (mu=0.4), carpet (mu=0.8), wet floor (mu=0.2)
- Anisotropic friction: Friction differs by direction (carpets, grooved surfaces)
- Deformable terrain: Soft ground, grass, sand -- robot feet sink
- Foot geometry: Simulation uses simple collision shapes, real feet have complex geometry
Gap 3: Sensor Noise and Latency
Locomotion-specific issues:
- IMU drift: Gyroscope drift accumulates over time, especially problematic for balance
- Joint encoder noise: +/- 0.1-0.5 degrees, accumulates through kinematic chain
- Foot contact detection: Binary contact sensors inaccurate -- usually threshold foot force
- Communication latency: 1-5ms bus delay between sensors and controller
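A minimal sketch of injecting these imperfections into simulated observations. The noise magnitudes, drift rate, and 20 N contact threshold are illustrative assumptions, not values from any specific platform:

```python
import numpy as np

class NoisySensors:
    """Sketch: add locomotion-style sensor imperfections in sim."""

    def __init__(self, n_joints=12, delay_steps=1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.gyro_bias = np.zeros(3)   # slowly drifting IMU bias
        self.buffer = []               # FIFO emulating bus latency
        self.delay_steps = delay_steps
        self.n_joints = n_joints

    def __call__(self, gyro, joint_pos, foot_force):
        # IMU drift: random-walk bias plus white noise
        self.gyro_bias += self.rng.normal(0, 1e-4, 3)
        gyro = gyro + self.gyro_bias + self.rng.normal(0, 0.01, 3)
        # Encoder noise on the order of a few tenths of a degree
        joint_pos = joint_pos + self.rng.normal(0, np.deg2rad(0.3), self.n_joints)
        # Contact from thresholded foot force, not a clean binary sensor
        contact = (foot_force > 20.0).astype(np.float32)
        obs = np.concatenate([gyro, joint_pos, contact])
        self.buffer.append(obs)
        if len(self.buffer) > self.delay_steps:
            return self.buffer.pop(0)  # return a delayed observation
        return self.buffer[0]
```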
Gap 4: Unmodeled Dynamics
Things simulation usually ignores:
- Cable routing: Wires affect joint movement
- Structural flexibility: Mechanics not perfectly rigid
- Thermal effects: Motor heats → performance changes over time
- Battery voltage drop: Voltage decreases → motor output changes
Solution 1: Actuator Network
Actuator network is the most important technique for locomotion sim-to-real, introduced by Hwangbo et al. in "Learning Agile and Dynamic Motor Skills for Legged Robots" on ANYmal.
Concept
Instead of an ideal motor model in simulation, train a neural network to predict the actual motor response:
Input: [position_error_history, velocity_history, command_history]
(window 50-100 timesteps)
Output: predicted_actual_torque
Data Collection
- Place real robot on stand (feet off ground)
- Send random joint commands (sinusoidal, step, random)
- Record: commanded position, actual position, actual velocity, actual torque (if available)
- ~30 minutes of data is sufficient for one robot
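The collection procedure above can be sketched as follows, assuming a hypothetical `robot` object with `num_joints`, `set_position_target()`, and `get_state()` methods (excitation amplitudes and frequencies are illustrative):

```python
import numpy as np

def collect_actuator_data(robot, duration_s=1800, dt=0.0025):
    """Sketch: excite joints with mixed sinusoids plus random steps
    while the robot hangs on a stand, logging (cmd, pos, vel, tau)."""
    log = {"cmd": [], "pos": [], "vel": [], "tau": []}
    rng = np.random.default_rng(0)
    n = robot.num_joints
    offset = rng.uniform(-0.3, 0.3, n)
    for i in range(int(duration_s / dt)):
        t = i * dt
        if i % 400 == 0:  # new random step offset every second (at 400 Hz)
            offset = rng.uniform(-0.3, 0.3, n)
        # Sum of sinusoids sweeps a range of frequencies
        cmd = offset + 0.4 * np.sin(2 * np.pi * 1.0 * t) \
                     + 0.2 * np.sin(2 * np.pi * 3.5 * t)
        robot.set_position_target(cmd)
        pos, vel, tau = robot.get_state()
        for key, val in zip(log, (cmd, pos, vel, tau)):
            log[key].append(np.asarray(val))
    return {k: np.stack(v) for k, v in log.items()}
```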
Training Actuator Network
```python
import torch
import torch.nn as nn

class ActuatorNet(nn.Module):
    def __init__(self, history_length=50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(history_length * 3, 256),  # pos_err, vel, cmd histories
            nn.ELU(),
            nn.Linear(256, 128),
            nn.ELU(),
            nn.Linear(128, 1),  # predicted torque
        )

    def forward(self, pos_error_hist, vel_hist, cmd_hist):
        x = torch.cat([pos_error_hist, vel_hist, cmd_hist], dim=-1)
        return self.net(x)
```
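A minimal supervised fit for the network above: plain MSE regression against measured torque, with illustrative hyperparameters (the original ANYmal work used a comparable regression setup):

```python
import torch
import torch.nn as nn

def train_actuator_net(net, dataset, epochs=100, lr=1e-3):
    """Sketch: dataset yields (pos_error_hist, vel_hist, cmd_hist,
    measured_torque) batches; fit the net by MSE regression."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for pos_err, vel, cmd, tau in dataset:
            pred = net(pos_err, vel, cmd)
            loss = loss_fn(pred.squeeze(-1), tau)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return net
```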
Integrating into Simulation
Replace the default motor model with the actuator network:
```python
# In the simulation loop
for step in range(num_steps):
    action = policy(obs)  # RL policy output
    # Instead of: sim.set_torque(action)
    predicted_torque = actuator_net(  # pass through actuator network
        pos_error_history, vel_history, action_history
    )
    sim.set_torque(predicted_torque)  # apply predicted torque
    sim.step()
```
Results
ANYmal with actuator network:
- ~500K timesteps/second simulation speed (fast enough for RL)
- Significantly reduced sim-to-real gap: Policy transfers successfully without heavy domain randomization
- Actuator network runs at ~50K Hz inference -- lighter than physics simulation
Limitations
- Requires torque sensing on robot (not all robots have it)
- Actuator network is specific to each robot -- must re-collect data and retrain for new robots
- Doesn't capture temperature-dependent changes (unless data collected at multiple temperatures)
Solution 2: Domain Randomization for Locomotion
Domain randomization (DR) is a "brute force" approach -- instead of modeling the real world accurately, randomize simulation parameters so the policy is robust to all variations.
Recommended Ranges for Locomotion
Based on papers from ETH Zurich (ANYmal), CMU (Unitree), Berkeley (Cassie):
| Parameter | Range | Notes |
|---|---|---|
| Body mass | +/- 15-25% | Simulate payload variation |
| CoM position | +/- 3-5 cm | Manufacturing tolerance |
| Friction coefficient | 0.3 - 2.0 | Diverse floor types |
| Restitution | 0.0 - 0.5 | Bounce on contact |
| Motor strength | +/- 10-20% | Motor variation |
| Joint damping | +/- 30% | Mechanical wear |
| Action delay | 0 - 20 ms | Communication latency |
| PD gains | +/- 20% | Controller variation |
| Terrain height noise | +/- 2-5 cm | Uneven ground |
| Push force | 0 - 80 N | External perturbation |
| Gravity direction | +/- 3 degrees | IMU calibration error |
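In practice, these parameters are resampled once per episode at reset. A sketch, with ranges drawn from the table (parameter names are illustrative):

```python
import numpy as np

def sample_dr_params(rng):
    """Sketch: draw one episode's randomized physics parameters.
    Ranges mirror the table above."""
    return {
        "mass_scale":       rng.uniform(0.8, 1.2),        # +/- 20% body mass
        "com_offset_m":     rng.uniform(-0.04, 0.04, 3),  # CoM shift per axis
        "friction":         rng.uniform(0.3, 2.0),
        "restitution":      rng.uniform(0.0, 0.5),
        "motor_strength":   rng.uniform(0.85, 1.15),
        "action_delay_s":   rng.uniform(0.0, 0.02),
        "push_force_n":     rng.uniform(0.0, 80.0),
        "gravity_tilt_deg": rng.uniform(-3.0, 3.0),
    }

rng = np.random.default_rng(42)
params = sample_dr_params(rng)  # call at every episode reset
```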
Adaptive Domain Randomization
Instead of fixed ranges, adaptive DR adjusts ranges based on performance:
```python
# Automatic Domain Randomization (ADR) sketch
for epoch in range(num_epochs):
    success_rate = evaluate(policy, dr_ranges)
    if success_rate > 0.8:
        dr_ranges *= 1.1  # expand ranges (make harder)
    elif success_rate < 0.5:
        dr_ranges *= 0.9  # shrink ranges (make easier)
```
OpenAI used ADR for Rubik's Cube manipulation. A similar approach applies to locomotion but requires careful tuning -- overly aggressive expansion can create impossible scenarios.
Solution 3: Terrain Curriculum
Specific to locomotion, terrain curriculum is a form of domain randomization for ground geometry.
Progressive Difficulty
Phase 1 (0-500 epochs): Flat ground only
Phase 2 (500-1000 epochs): + Rough terrain (2cm noise)
Phase 3 (1000-1500 epochs): + Slopes (5-15 degrees)
Phase 4 (1500-2000 epochs): + Stairs (10-20cm steps)
Phase 5 (2000+ epochs): + Mixed (all terrains random)
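The schedule above can be expressed as a small lookup (phase boundaries copied from the text; the function name is ours):

```python
def terrain_phase(epoch):
    """Sketch: map training epoch to the active terrain set."""
    phases = [
        (500,  ["flat"]),
        (1000, ["flat", "rough"]),
        (1500, ["flat", "rough", "slope"]),
        (2000, ["flat", "rough", "slope", "stairs"]),
    ]
    for end_epoch, terrains in phases:
        if epoch < end_epoch:
            return terrains
    return ["flat", "rough", "slope", "stairs", "mixed"]
```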
Important Terrain Types
- Height noise terrain: Random height perturbations, simulate uneven ground
- Discrete obstacles: Random boxes/barriers on flat ground
- Gaps: Missing ground sections (robot must step over)
- Slopes: Inclines and declines
- Stairs: Regular steps, ascending and descending
- Stepping stones: Discrete footholds with gaps between
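As one concrete example, a height-noise terrain can be generated as a heightfield; grid size, amplitude, and the box-blur smoothing below are illustrative choices:

```python
import numpy as np

def height_noise_terrain(rows=64, cols=64, amplitude_m=0.03, seed=0):
    """Sketch: random height perturbations, smoothed with a 3x3 box
    filter so neighboring cells don't jump more than a step can handle.
    Returns heights in meters on a regular grid."""
    rng = np.random.default_rng(seed)
    h = rng.uniform(-amplitude_m, amplitude_m, (rows, cols))
    padded = np.pad(h, 1, mode="edge")
    smooth = sum(
        padded[i:i + rows, j:j + cols] for i in range(3) for j in range(3)
    ) / 9.0
    return smooth
```

Most legged-RL frameworks accept such a heightfield directly as terrain geometry; the other terrain types (stairs, gaps, stepping stones) are built procedurally in a similar way.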
Real-World Success Rates
Based on published results and reported numbers from research teams:
| Platform | Task | Sim Success | Real Success | Gap |
|---|---|---|---|---|
| ANYmal-D | Rough terrain walking | 98% | ~93% | 5% |
| Unitree A1 | Stair climbing | 95% | ~91% | 4% |
| Cassie | Flat walking | 99% | ~95% | 4% |
| Digit | Warehouse walking | 92% | ~84% | 8% |
| Unitree H1 | Flat walking | 97% | ~90% | 7% |
| ANYmal-C | Parkour (mixed) | 85% | ~75% | 10% |
Observations:
- Quadruped > Bipedal: ANYmal/A1 consistently transfer better than Digit/H1
- Flat > Complex terrain: Gap increases as terrain difficulty increases
- Actuator quality matters: QDD actuators (Berkeley Humanoid) have smaller gaps than geared motors
Step-by-Step: Train → Deploy Locomotion Policy
Step 1: Setup Simulation
```bash
# Clone Humanoid-Gym or legged_gym
git clone https://github.com/roboterax/humanoid-gym
cd humanoid-gym
pip install -e .

# Or, for quadrupeds
git clone https://github.com/leggedrobotics/legged_gym
```
Configure robot URDF, reward weights, DR ranges.
Step 2: Train with Progressive Curriculum
```python
# Example training config
train_cfg = {
    "terrain": {
        "curriculum": True,
        "terrain_types": ["flat", "rough", "slope", "stairs"],
        "difficulty_scale": [0.0, 0.25, 0.5, 0.75, 1.0],
    },
    "domain_randomization": {
        "mass_range": [-0.15, 0.15],         # +/- 15%
        "friction_range": [0.4, 1.8],
        "motor_strength_range": [0.9, 1.1],
        "push_force_range": [0, 50],         # Newtons
        "action_delay": [0, 0.02],           # seconds
    },
    "reward": {
        "forward_vel": 1.0,
        "upright": 0.5,
        "energy": -0.001,
        "action_rate": -0.01,
        "feet_air_time": 0.5,  # encourage lifting feet
    },
}
```
Train for 2000-5000 epochs: roughly 2-8 hours on an RTX 4090 with 4096 parallel environments.
Step 3: Evaluate in Simulation
Before deployment, evaluate thoroughly in simulation:
```python
# Test scenarios
test_scenarios = [
    "flat_ground_forward_1.0ms",
    "flat_ground_turning_0.5rads",
    "rough_terrain_forward_0.5ms",
    "slope_15deg_ascending",
    "stairs_15cm_ascending",
    "push_recovery_30N_lateral",
    "push_recovery_50N_lateral",
]

for scenario in test_scenarios:
    success_rate = evaluate(policy, scenario, n_trials=100)
    print(f"{scenario}: {success_rate:.1%}")
    assert success_rate > 0.85, f"FAIL: {scenario}"
```
Red flags in sim evaluation:
- Success rate < 90% on flat ground → policy hasn't converged
- Robot falls when pushed 30N → balance reward too low
- Shuffling gait → need periodic gait reward
- High energy consumption → energy penalty too low
Step 4: Sim-to-Sim Verification (Optional but Recommended)
Transfer policy from Isaac Gym to MuJoCo (or vice versa):
```python
# Load the policy trained in Isaac Gym
policy = load_policy("checkpoints/best_policy.pt")

# Test in MuJoCo
mujoco_env = create_mujoco_env(robot_urdf)
for episode in range(50):
    obs = mujoco_env.reset()
    for step in range(1000):
        action = policy(obs)
        obs, reward, done, info = mujoco_env.step(action)
        if done:
            break
    print(f"Episode {episode}: survived {step} steps")
```
If the policy works well in both simulators, it's likely to transfer well to the real robot.
Step 5: Real-World Deployment
Deployment checklist:
- Safety harness for robot (especially for bipedal)
- Flat ground first, complex terrain after
- Start with low velocity commands (0.3 m/s)
- Monitor motor temperatures
- Record data for debugging
```python
# Deploy script (simplified)
import numpy as np
import robot_sdk

robot = robot_sdk.connect("192.168.1.100")
policy = load_policy("checkpoints/best_policy.pt")

# Safety limits
MAX_VEL = 0.5                         # start slow
MAX_TORQUE = 0.8 * robot.max_torque   # 80% torque limit

while True:
    obs = robot.get_observation()
    action = policy(obs)
    action = np.clip(action, -MAX_TORQUE, MAX_TORQUE)
    robot.set_action(action)
    robot.step()

    # Safety check
    if robot.is_fallen():
        robot.stop()
        break
```
Step 6: Iterate
After initial deployment, observe failure modes and return to sim:
- Robot slips on floor → increase friction randomization range
- Motor overheats → add torque penalty, reduce max velocity
- Robot unstable on real terrain → add terrain noise in sim
- Latency causes instability → increase action delay randomization
Common Failure Modes and Fixes
Failure 1: Robot Falls Immediately When Set Down
Cause: Initial state differs between sim and real. Sim usually starts from a perfect standing pose; the real robot is set down with slight imbalance.
Fix: Randomize initial state in sim (body orientation +/- 5 degrees, velocity +/- 0.1 m/s).
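A sketch of that fix, using the ranges from the text (the joint-offset range is an illustrative addition):

```python
import numpy as np

def randomize_initial_state(rng, n_joints=12):
    """Sketch: perturb the reset state instead of a perfect stand."""
    return {
        "roll_pitch_yaw_deg": rng.uniform(-5.0, 5.0, 3),     # body orientation
        "base_lin_vel_ms":    rng.uniform(-0.1, 0.1, 3),     # residual velocity
        "joint_pos_offset":   rng.uniform(-0.05, 0.05, n_joints),  # rad
    }

rng = np.random.default_rng(0)
state = randomize_initial_state(rng)  # apply at each episode reset
```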
Failure 2: Gait is Not Smooth, Jerky
Cause: Action delay in the real system is larger than in sim. The policy output changes quickly, but the robot responds slowly.
Fix: (a) Add action delay randomization (0-20ms), (b) Add action rate penalty in reward, (c) Reduce control frequency.
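Fix (a) can be implemented as a FIFO that holds back policy actions by a random number of control steps, resampled each episode (at a 200 Hz step rate, 0-4 steps covers the 0-20 ms range; the class name is ours):

```python
import collections
import random

class DelayedActions:
    """Sketch: delay policy actions by a random step count per episode."""

    def __init__(self, max_delay_steps=4):
        self.max_delay = max_delay_steps
        self.reset()

    def reset(self):
        # Resample the delay at every episode reset
        self.delay = random.randint(0, self.max_delay)
        self.buf = collections.deque(maxlen=self.delay + 1)

    def __call__(self, action):
        self.buf.append(action)
        return self.buf[0]  # oldest action still in the window
```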
Failure 3: Robot Slips When Turning
Cause: Friction coefficient in sim differs from real. Policy learned based on higher friction.
Fix: Randomize friction coefficient wider (0.3-2.0). Add lateral velocity penalty.
Failure 4: Motor Overheats After a Few Minutes
Cause: Policy uses too much torque. Sim doesn't model thermal limits.
Fix: (a) Increase energy penalty weight, (b) Add torque RMS constraint, (c) Reduce max action magnitude.
Failure 5: Policy Performs Differently Each Deployment
Cause: Policy is sensitive to initial conditions or sensor noise. Not robust.
Fix: Increase domain randomization ranges, especially sensor noise and initial state randomization.
Important Papers on Sim-to-Real Locomotion
Essential papers to read:
- Learning Agile and Dynamic Motor Skills for Legged Robots (arXiv:1901.08652) -- Hwangbo et al., 2019. Actuator network on ANYmal; the foundational paper.
- Learning Robust, Agile, Natural Legged Locomotion Skills in the Wild (arXiv:2304.10888) -- Adversarial training + teacher-student for natural locomotion on wild terrain.
- Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control (arXiv:2401.16889) -- Comprehensive bipedal RL framework on Cassie.
- Humanoid-Gym: RL for Humanoid Robot with Zero-Shot Sim2Real Transfer (arXiv:2404.05695) -- Open-source framework with verified zero-shot transfer.
- Berkeley Humanoid: A Research Platform for Learning-based Control (arXiv:2407.21781) -- Hardware co-design that makes sim-to-real easier.
- Towards Bridging the Gap: Systematic Sim-to-Real Transfer for Diverse Legged Robots (arXiv:2509.06342) -- PACE: joint-space dynamics alignment as an alternative to actuator networks.
Best Practices Summary
From experience of leading teams (ETH RSL, CMU, Berkeley, Unitree):
1. Start Simple, Add Complexity Gradually
- Flat ground walking first
- Add terrain curriculum after flat walking is stable
- Add vision after proprioceptive policy is robust
2. Actuator Modeling > Domain Randomization
- If you have torque sensor → train actuator network (best)
- If not → aggressive motor strength randomization (+/- 20%)
- QDD actuators reduce need for both
3. Sim-to-Sim Before Sim-to-Real
- Train Isaac Gym → test MuJoCo (or vice versa)
- If transfer works between two sims → likely works on real robot
4. Safety First When Deploying
- Harness for bipedal (always)
- Start low velocity, increase gradually
- Monitor motor temperature real-time
- Accessible kill switch
5. Log Everything
- Record all observations, actions, rewards in real deployment
- Compare distributions with sim data → identify gaps
- Use real data to refine sim parameters
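The distribution comparison can be sketched as a per-dimension report: given logged observation arrays of shape (n_samples, n_dims) from sim and real, compute the mean shift in sim standard deviations and the std ratio. Large values flag the dimensions to fix first (the function and field names are ours):

```python
import numpy as np

def obs_gap_report(sim_obs, real_obs, names):
    """Sketch: per-dimension sim-vs-real observation statistics."""
    report = {}
    for i, name in enumerate(names):
        mu_s, mu_r = sim_obs[:, i].mean(), real_obs[:, i].mean()
        sd_s = sim_obs[:, i].std() + 1e-8  # avoid division by zero
        report[name] = {
            "mean_shift_z": float(abs(mu_r - mu_s) / sd_s),
            "std_ratio": float(real_obs[:, i].std() / sd_s),
        }
    return report
```

A gyro channel with a large `mean_shift_z`, for example, points at uncompensated IMU bias; an inflated `std_ratio` on joint velocities points at under-randomized sensor noise.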
Conclusion
Sim-to-real for locomotion combines multiple techniques: actuator networks for accurate motor modeling, domain randomization for robustness, terrain curriculum for progressive difficulty, and careful engineering for deployment. There's no single silver bullet -- each robot and task needs a different combination.
Key takeaway: Hardware co-design matters. Berkeley Humanoid with QDD actuators needs less sim-to-real effort than robots with complex geared transmissions. If you're designing a new robot, choose actuators that are easy to simulate.
Read the full series:
- Part 1: Locomotion Basics -- From ZMP to CPG
- Part 2: RL for Locomotion -- PPO, reward shaping, curriculum
- Part 3: Quadruped Locomotion -- legged_gym to Unitree Go2
- Part 4: Walk These Ways -- Adaptive locomotion
- Part 5: Robot Parkour -- Jumping and climbing
- Part 6: Bipedal Walking -- 2-legged robots with RL
Related Articles
- Domain Randomization: Technique to Reduce Sim-to-Real Gap -- Detailed on visual and dynamics randomization
- Complete Sim-to-Real Pipeline -- End-to-end pipeline from simulation to real
- Bipedal Walking: Controlling 2-Legged Robots with RL -- Platforms and reward design for bipedal
- Robot Parkour: Jumping and Climbing with RL -- Parkour as the ultimate sim-to-real test
- Simulation for Robotics: MuJoCo vs Isaac Sim vs Gazebo -- Choosing the right simulator