Sim-to-Real for Locomotion: Not Just Domain Randomization
If you've read the general sim-to-real overview (Sim-to-Real Transfer: Train in Simulation, Run in Reality, Domain Randomization), you understand the basics. This post dives deep into sim-to-real specific to locomotion -- the challenges and solutions that only appear when training robots to walk, run, and jump.
Locomotion sim-to-real differs from manipulation sim-to-real in three key ways:
- More complex contact dynamics: Robot feet continuously contact the ground, with impact forces varying by terrain, velocity, and contact angle
- More critical actuator dynamics: Locomotion pushes actuators to their limits (high torque, high speed) -- exactly where motor response is most nonlinear
- Faster error accumulation: Each step's error affects the next. After a few seconds, small errors snowball until the robot falls
Let me analyze each gap and corresponding solutions, sharing best practices from leading teams.
Primary Sim-to-Real Gaps in Locomotion
Gap 1: Actuator Dynamics
Problem: In simulation, motors are typically modeled as ideal torque sources -- you command torque, motor outputs that torque instantly. Reality is completely different:
- Torque response has delay: 5-20ms latency from command to actual torque
- Velocity-dependent torque: Motor weaker at high speeds (back-EMF effect)
- Temperature-dependent: Motor hot → output torque decreases
- Friction and backlash: Gears have friction, backlash causes hysteresis
- Torque saturation: Non-linear clipping near maximum torque
This is the biggest gap for locomotion because robots continuously operate near actuator limits (jumping, running fast, recovery from perturbation).
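To make these effects concrete, here is a minimal sketch of a speed-dependent, saturating motor model with command delay. The constants (33.5 N·m, 21 rad/s) are illustrative, loosely in the range of quadruped-class actuators, and not taken from any specific robot:

```python
import numpy as np

def realistic_torque(tau_cmd, joint_vel, tau_max=33.5, vel_max=21.0, delay_buf=None):
    """Sketch of a non-ideal motor model (illustrative constants).

    - Back-EMF: available torque drops linearly with joint speed.
    - Saturation: output is clipped to the speed-dependent limit.
    - Delay: commands pass through a FIFO to emulate 5-20 ms latency.
    """
    if delay_buf is not None:
        delay_buf.append(tau_cmd)
        tau_cmd = delay_buf.pop(0)  # apply an older command instead
    # Speed-dependent torque limit (linear torque-speed curve)
    limit = tau_max * max(0.0, 1.0 - abs(joint_vel) / vel_max)
    return float(np.clip(tau_cmd, -limit, limit))
```

At standstill the full commanded torque passes through, while near maximum joint speed almost nothing does -- precisely the regime where an ideal torque source in sim diverges from reality.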
Gap 2: Terrain Friction and Contact
Problem: Simulation uses simplified contact models (usually Coulomb friction with fixed coefficient). Reality is much more complex:
- Friction varies: Wood floor (mu=0.4), carpet (mu=0.8), wet floor (mu=0.2)
- Anisotropic friction: Friction differs by direction (carpets, grooved surfaces)
- Deformable terrain: Soft ground, grass, sand -- robot feet sink
- Foot geometry: Simulation uses simple collision shapes, real feet have complex geometry
Gap 3: Sensor Noise and Latency
Locomotion-specific issues:
- IMU drift: Gyroscope drift accumulates over time, especially problematic for balance
- Joint encoder noise: +/- 0.1-0.5 degrees, accumulates through kinematic chain
- Foot contact detection: Binary contact sensors inaccurate -- usually threshold foot force
- Communication latency: 1-5ms bus delay between sensors and controller
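A minimal sketch of injecting these imperfections into simulated observations. The noise magnitudes, drift rate, and 20 N contact threshold are illustrative assumptions, not values from any specific platform:

```python
import numpy as np

class NoisySensors:
    """Sketch: add locomotion-style sensor imperfections in sim."""

    def __init__(self, n_joints=12, delay_steps=1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.gyro_bias = np.zeros(3)   # slowly drifting IMU bias
        self.buffer = []               # FIFO emulating bus latency
        self.delay_steps = delay_steps
        self.n_joints = n_joints

    def __call__(self, gyro, joint_pos, foot_force):
        # IMU drift: random-walk bias plus white noise
        self.gyro_bias += self.rng.normal(0, 1e-4, 3)
        gyro = gyro + self.gyro_bias + self.rng.normal(0, 0.01, 3)
        # Encoder noise on the order of a few tenths of a degree
        joint_pos = joint_pos + self.rng.normal(0, np.deg2rad(0.3), self.n_joints)
        # Contact from thresholded foot force, not a clean binary sensor
        contact = (foot_force > 20.0).astype(np.float32)
        obs = np.concatenate([gyro, joint_pos, contact])
        self.buffer.append(obs)
        if len(self.buffer) > self.delay_steps:
            return self.buffer.pop(0)  # return a delayed observation
        return self.buffer[0]
```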
Gap 4: Unmodeled Dynamics
Things simulation usually ignores:
- Cable routing: Wires affect joint movement
- Structural flexibility: Mechanics not perfectly rigid
- Thermal effects: Motor heats → performance changes over time
- Battery voltage drop: Voltage decreases → motor output changes
Solution 1: Actuator Network
Actuator network is the most important technique for locomotion sim-to-real, introduced by Hwangbo et al. in "Learning Agile and Dynamic Motor Skills for Legged Robots" on ANYmal.
Concept
Instead of an ideal motor model in simulation, train a neural network to predict the actual motor response:
Input: [position_error_history, velocity_history, command_history]
(window 50-100 timesteps)
Output: predicted_actual_torque
Data Collection
- Place real robot on stand (feet off ground)
- Send random joint commands (sinusoidal, step, random)
- Record: commanded position, actual position, actual velocity, actual torque (if available)
- ~30 minutes of data is sufficient for one robot
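The collection procedure above can be sketched as follows, assuming a hypothetical `robot` object with `num_joints`, `set_position_target()`, and `get_state()` methods (excitation amplitudes and frequencies are illustrative):

```python
import numpy as np

def collect_actuator_data(robot, duration_s=1800, dt=0.0025):
    """Sketch: excite joints with mixed sinusoids plus random steps
    while the robot hangs on a stand, logging (cmd, pos, vel, tau)."""
    log = {"cmd": [], "pos": [], "vel": [], "tau": []}
    rng = np.random.default_rng(0)
    n = robot.num_joints
    offset = rng.uniform(-0.3, 0.3, n)
    for i in range(int(duration_s / dt)):
        t = i * dt
        if i % 400 == 0:  # new random step offset every second (at 400 Hz)
            offset = rng.uniform(-0.3, 0.3, n)
        # Sum of sinusoids sweeps a range of frequencies
        cmd = offset + 0.4 * np.sin(2 * np.pi * 1.0 * t) \
                     + 0.2 * np.sin(2 * np.pi * 3.5 * t)
        robot.set_position_target(cmd)
        pos, vel, tau = robot.get_state()
        for key, val in zip(log, (cmd, pos, vel, tau)):
            log[key].append(np.asarray(val))
    return {k: np.stack(v) for k, v in log.items()}
```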
Training Actuator Network
```python
import torch
import torch.nn as nn

class ActuatorNet(nn.Module):
    def __init__(self, history_length=50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(history_length * 3, 256),  # pos_err, vel, cmd histories
            nn.ELU(),
            nn.Linear(256, 128),
            nn.ELU(),
            nn.Linear(128, 1),  # predicted torque
        )

    def forward(self, pos_error_hist, vel_hist, cmd_hist):
        x = torch.cat([pos_error_hist, vel_hist, cmd_hist], dim=-1)
        return self.net(x)
```
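A minimal supervised fit for the network above: plain MSE regression against measured torque, with illustrative hyperparameters (the original ANYmal work used a comparable regression setup):

```python
import torch
import torch.nn as nn

def train_actuator_net(net, dataset, epochs=100, lr=1e-3):
    """Sketch: dataset yields (pos_error_hist, vel_hist, cmd_hist,
    measured_torque) batches; fit the net by MSE regression."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for pos_err, vel, cmd, tau in dataset:
            pred = net(pos_err, vel, cmd)
            loss = loss_fn(pred.squeeze(-1), tau)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return net
```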
Integrating into Simulation
Replace the default motor model with the actuator network:
```python
# In the simulation loop
for step in range(num_steps):
    action = policy(obs)  # RL policy output
    # Instead of: sim.set_torque(action)
    predicted_torque = actuator_net(  # pass through actuator network
        pos_error_history, vel_history, action_history
    )
    sim.set_torque(predicted_torque)  # apply predicted torque
    sim.step()
```
Results
ANYmal with actuator network:
- ~500K timesteps/second simulation speed (fast enough for RL)
- Significantly reduced sim-to-real gap: Policy transfers successfully without heavy domain randomization
- Actuator network runs at ~50K Hz inference -- lighter than physics simulation
Limitations
- Requires torque sensing on robot (not all robots have it)
- Actuator network is specific to each robot -- must re-collect data and retrain for new robots
- Doesn't capture temperature-dependent changes (unless data collected at multiple temperatures)
Solution 2: Domain Randomization for Locomotion
Domain randomization (DR) is a "brute force" approach -- instead of modeling the real world accurately, randomize simulation parameters so the policy is robust to all variations.
Recommended Ranges for Locomotion
Based on papers from ETH Zurich (ANYmal), CMU (Unitree), Berkeley (Cassie):
| Parameter | Range | Notes |
|---|---|---|
| Body mass | +/- 15-25% | Simulate payload variation |
| CoM position | +/- 3-5 cm | Manufacturing tolerance |
| Friction coefficient | 0.3 - 2.0 | Diverse floor types |
| Restitution | 0.0 - 0.5 | Bounce on contact |
| Motor strength | +/- 10-20% | Motor variation |
| Joint damping | +/- 30% | Mechanical wear |
| Action delay | 0 - 20 ms | Communication latency |
| PD gains | +/- 20% | Controller variation |
| Terrain height noise | +/- 2-5 cm | Uneven ground |
| Push force | 0 - 80 N | External perturbation |
| Gravity direction | +/- 3 degrees | IMU calibration error |
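In practice, these parameters are resampled once per episode at reset. A sketch, with ranges drawn from the table (parameter names are illustrative):

```python
import numpy as np

def sample_dr_params(rng):
    """Sketch: draw one episode's randomized physics parameters.
    Ranges mirror the table above."""
    return {
        "mass_scale":       rng.uniform(0.8, 1.2),        # +/- 20% body mass
        "com_offset_m":     rng.uniform(-0.04, 0.04, 3),  # CoM shift per axis
        "friction":         rng.uniform(0.3, 2.0),
        "restitution":      rng.uniform(0.0, 0.5),
        "motor_strength":   rng.uniform(0.85, 1.15),
        "action_delay_s":   rng.uniform(0.0, 0.02),
        "push_force_n":     rng.uniform(0.0, 80.0),
        "gravity_tilt_deg": rng.uniform(-3.0, 3.0),
    }

rng = np.random.default_rng(42)
params = sample_dr_params(rng)  # call at every episode reset
```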
Adaptive Domain Randomization
Instead of fixed ranges, adaptive DR adjusts ranges based on performance:
```python
# Automatic Domain Randomization (ADR) sketch
for epoch in range(num_epochs):
    success_rate = evaluate(policy, dr_ranges)
    if success_rate > 0.8:
        dr_ranges *= 1.1  # expand ranges (make harder)
    elif success_rate < 0.5:
        dr_ranges *= 0.9  # shrink ranges (make easier)
```
OpenAI used ADR for Rubik's Cube manipulation. A similar approach applies to locomotion but requires careful tuning -- overly aggressive expansion can create impossible scenarios.
Solution 3: Terrain Curriculum
Specific to locomotion, terrain curriculum is a form of domain randomization for ground geometry.
Progressive Difficulty
Phase 1 (0-500 epochs): Flat ground only
Phase 2 (500-1000 epochs): + Rough terrain (2cm noise)
Phase 3 (1000-1500 epochs): + Slopes (5-15 degrees)
Phase 4 (1500-2000 epochs): + Stairs (10-20cm steps)
Phase 5 (2000+ epochs): + Mixed (all terrains random)
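The schedule above can be expressed as a small lookup (phase boundaries copied from the text; the function name is ours):

```python
def terrain_phase(epoch):
    """Sketch: map training epoch to the active terrain set."""
    phases = [
        (500,  ["flat"]),
        (1000, ["flat", "rough"]),
        (1500, ["flat", "rough", "slope"]),
        (2000, ["flat", "rough", "slope", "stairs"]),
    ]
    for end_epoch, terrains in phases:
        if epoch < end_epoch:
            return terrains
    return ["flat", "rough", "slope", "stairs", "mixed"]
```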
Important Terrain Types
- Height noise terrain: Random height perturbations, simulate uneven ground
- Discrete obstacles: Random boxes/barriers on flat ground
- Gaps: Missing ground sections (robot must step over)
- Slopes: Inclines and declines
- Stairs: Regular steps, ascending and descending
- Stepping stones: Discrete footholds with gaps between
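As one concrete example, a height-noise terrain can be generated as a heightfield; grid size, amplitude, and the box-blur smoothing below are illustrative choices:

```python
import numpy as np

def height_noise_terrain(rows=64, cols=64, amplitude_m=0.03, seed=0):
    """Sketch: random height perturbations, smoothed with a 3x3 box
    filter so neighboring cells don't jump more than a step can handle.
    Returns heights in meters on a regular grid."""
    rng = np.random.default_rng(seed)
    h = rng.uniform(-amplitude_m, amplitude_m, (rows, cols))
    padded = np.pad(h, 1, mode="edge")
    smooth = sum(
        padded[i:i + rows, j:j + cols] for i in range(3) for j in range(3)
    ) / 9.0
    return smooth
```

Most legged-RL frameworks accept such a heightfield directly as terrain geometry; the other terrain types (stairs, gaps, stepping stones) are built procedurally in a similar way.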
Real-World Success Rates
Based on published results and reported numbers from research teams:
| Platform | Task | Sim Success | Real Success | Gap |
|---|---|---|---|---|
| ANYmal-D | Rough terrain walking | 98% | ~93% | 5% |
| Unitree A1 | Stair climbing | 95% | ~91% | 4% |
| Cassie | Flat walking | 99% | ~95% | 4% |
| Digit | Warehouse walking | 92% | ~84% | 8% |
| Unitree H1 | Flat walking | 97% | ~90% | 7% |
| ANYmal-C | Parkour (mixed) | 85% | ~75% | 10% |
Observations:
- Quadruped > Bipedal: ANYmal/A1 consistently transfer better than Digit/H1
- Flat > Complex terrain: Gap increases as terrain difficulty increases
- Actuator quality matters: QDD actuators (Berkeley Humanoid) have smaller gaps than geared motors
Step-by-Step: Train → Deploy Locomotion Policy
Step 1: Setup Simulation
```bash
# Clone Humanoid-Gym or legged_gym
git clone https://github.com/roboterax/humanoid-gym
cd humanoid-gym
pip install -e .

# Or, for quadrupeds
git clone https://github.com/leggedrobotics/legged_gym
```
Configure robot URDF, reward weights, DR ranges.
Step 2: Train with Progressive Curriculum
```python
# Example training config
train_cfg = {
    "terrain": {
        "curriculum": True,
        "terrain_types": ["flat", "rough", "slope", "stairs"],
        "difficulty_scale": [0.0, 0.25, 0.5, 0.75, 1.0],
    },
    "domain_randomization": {
        "mass_range": [-0.15, 0.15],         # +/- 15%
        "friction_range": [0.4, 1.8],
        "motor_strength_range": [0.9, 1.1],
        "push_force_range": [0, 50],         # Newtons
        "action_delay": [0, 0.02],           # seconds
    },
    "reward": {
        "forward_vel": 1.0,
        "upright": 0.5,
        "energy": -0.001,
        "action_rate": -0.01,
        "feet_air_time": 0.5,  # encourage lifting feet
    },
}
```
Train for 2000-5000 epochs: roughly 2-8 hours on an RTX 4090 with 4096 parallel environments.
Step 3: Evaluate in Simulation
Before deployment, evaluate thoroughly in simulation:
```python
# Test scenarios
test_scenarios = [
    "flat_ground_forward_1.0ms",
    "flat_ground_turning_0.5rads",
    "rough_terrain_forward_0.5ms",
    "slope_15deg_ascending",
    "stairs_15cm_ascending",
    "push_recovery_30N_lateral",
    "push_recovery_50N_lateral",
]

for scenario in test_scenarios:
    success_rate = evaluate(policy, scenario, n_trials=100)
    print(f"{scenario}: {success_rate:.1%}")
    assert success_rate > 0.85, f"FAIL: {scenario}"
```
Red flags in sim evaluation:
- Success rate < 90% on flat ground → policy hasn't converged
- Robot falls when pushed 30N → balance reward too low
- Shuffling gait → need periodic gait reward
- High energy consumption → energy penalty too low
Step 4: Sim-to-Sim Verification (Optional but Recommended)
Transfer policy from Isaac Gym to MuJoCo (or vice versa):
```python
# Load the policy trained in Isaac Gym
policy = load_policy("checkpoints/best_policy.pt")

# Test in MuJoCo
mujoco_env = create_mujoco_env(robot_urdf)
for episode in range(50):
    obs = mujoco_env.reset()
    for step in range(1000):
        action = policy(obs)
        obs, reward, done, info = mujoco_env.step(action)
        if done:
            break
    print(f"Episode {episode}: survived {step} steps")
```
If the policy works well in both simulators, it's likely to transfer well to the real robot.
Step 5: Real-World Deployment
Deployment checklist:
- Safety harness for robot (especially for bipedal)
- Flat ground first, complex terrain after
- Start with low velocity commands (0.3 m/s)
- Monitor motor temperatures
- Record data for debugging
```python
# Deploy script (simplified)
import numpy as np
import robot_sdk

robot = robot_sdk.connect("192.168.1.100")
policy = load_policy("checkpoints/best_policy.pt")

# Safety limits
MAX_VEL = 0.5                         # start slow
MAX_TORQUE = 0.8 * robot.max_torque   # 80% torque limit

while True:
    obs = robot.get_observation()
    action = policy(obs)
    action = np.clip(action, -MAX_TORQUE, MAX_TORQUE)
    robot.set_action(action)
    robot.step()

    # Safety check
    if robot.is_fallen():
        robot.stop()
        break
```
Step 6: Iterate
After initial deployment, observe failure modes and return to sim:
- Robot slips on floor → increase friction randomization range
- Motor overheats → add torque penalty, reduce max velocity
- Robot unstable on real terrain → add terrain noise in sim
- Latency causes instability → increase action delay randomization
Common Failure Modes and Fixes
Failure 1: Robot Falls Immediately When Set Down
Cause: Initial state differs between sim and real. Sim usually starts from a perfect standing pose; the real robot is set down with slight imbalance.
Fix: Randomize initial state in sim (body orientation +/- 5 degrees, velocity +/- 0.1 m/s).
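A sketch of that fix, using the ranges from the text (the joint-offset range is an illustrative addition):

```python
import numpy as np

def randomize_initial_state(rng, n_joints=12):
    """Sketch: perturb the reset state instead of a perfect stand."""
    return {
        "roll_pitch_yaw_deg": rng.uniform(-5.0, 5.0, 3),     # body orientation
        "base_lin_vel_ms":    rng.uniform(-0.1, 0.1, 3),     # residual velocity
        "joint_pos_offset":   rng.uniform(-0.05, 0.05, n_joints),  # rad
    }

rng = np.random.default_rng(0)
state = randomize_initial_state(rng)  # apply at each episode reset
```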
Failure 2: Gait is Not Smooth, Jerky
Cause: Action delay in the real system is larger than in sim. The policy output changes quickly, but the robot responds slowly.
Fix: (a) Add action delay randomization (0-20ms), (b) Add action rate penalty in reward, (c) Reduce control frequency.
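Fix (a) can be implemented as a FIFO that holds back policy actions by a random number of control steps, resampled each episode (at a 200 Hz step rate, 0-4 steps covers the 0-20 ms range; the class name is ours):

```python
import collections
import random

class DelayedActions:
    """Sketch: delay policy actions by a random step count per episode."""

    def __init__(self, max_delay_steps=4):
        self.max_delay = max_delay_steps
        self.reset()

    def reset(self):
        # Resample the delay at every episode reset
        self.delay = random.randint(0, self.max_delay)
        self.buf = collections.deque(maxlen=self.delay + 1)

    def __call__(self, action):
        self.buf.append(action)
        return self.buf[0]  # oldest action still in the window
```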
Failure 3: Robot Slips When Turning
Cause: Friction coefficient in sim differs from real. Policy learned based on higher friction.
Fix: Randomize friction coefficient wider (0.3-2.0). Add lateral velocity penalty.
Failure 4: Motor Overheats After a Few Minutes
Cause: Policy uses too much torque. Sim doesn't model thermal limits.
Fix: (a) Increase energy penalty weight, (b) Add torque RMS constraint, (c) Reduce max action magnitude.
Failure 5: Policy Performs Differently Each Deployment
Cause: Policy is sensitive to initial conditions or sensor noise. Not robust.
Fix: Increase domain randomization ranges, especially sensor noise and initial state randomization.
Important Papers on Sim-to-Real Locomotion
Essential papers to read:
- Learning Agile and Dynamic Motor Skills for Legged Robots (arXiv:1901.08652) -- Hwangbo et al., 2019. Actuator network on ANYmal; the foundational paper.
- Learning Robust, Agile, Natural Legged Locomotion Skills in the Wild (arXiv:2304.10888) -- Adversarial training + teacher-student for natural locomotion on wild terrain.
- Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control (arXiv:2401.16889) -- Comprehensive bipedal RL framework on Cassie.
- Humanoid-Gym: RL for Humanoid Robot with Zero-Shot Sim2Real Transfer (arXiv:2404.05695) -- Open-source framework with verified zero-shot transfer.
- Berkeley Humanoid: A Research Platform for Learning-based Control (arXiv:2407.21781) -- Hardware co-design that makes sim-to-real easier.
- Towards Bridging the Gap: Systematic Sim-to-Real Transfer for Diverse Legged Robots (arXiv:2509.06342) -- PACE: joint-space dynamics alignment as an alternative to actuator networks.
Best Practices Summary
From experience of leading teams (ETH RSL, CMU, Berkeley, Unitree):
1. Start Simple, Add Complexity Gradually
- Flat ground walking first
- Add terrain curriculum after flat walking is stable
- Add vision after proprioceptive policy is robust
2. Actuator Modeling > Domain Randomization
- If you have torque sensor → train actuator network (best)
- If not → aggressive motor strength randomization (+/- 20%)
- QDD actuators reduce need for both
3. Sim-to-Sim Before Sim-to-Real
- Train Isaac Gym → test MuJoCo (or vice versa)
- If transfer works between two sims → likely works on real robot
4. Safety First When Deploying
- Harness for bipedal (always)
- Start low velocity, increase gradually
- Monitor motor temperature real-time
- Accessible kill switch
5. Log Everything
- Record all observations, actions, rewards in real deployment
- Compare distributions with sim data → identify gaps
- Use real data to refine sim parameters
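The distribution comparison can be sketched as a per-dimension report: given logged observation arrays of shape (n_samples, n_dims) from sim and real, compute the mean shift in sim standard deviations and the std ratio. Large values flag the dimensions to fix first (the function and field names are ours):

```python
import numpy as np

def obs_gap_report(sim_obs, real_obs, names):
    """Sketch: per-dimension sim-vs-real observation statistics."""
    report = {}
    for i, name in enumerate(names):
        mu_s, mu_r = sim_obs[:, i].mean(), real_obs[:, i].mean()
        sd_s = sim_obs[:, i].std() + 1e-8  # avoid division by zero
        report[name] = {
            "mean_shift_z": float(abs(mu_r - mu_s) / sd_s),
            "std_ratio": float(real_obs[:, i].std() / sd_s),
        }
    return report
```

A gyro channel with a large `mean_shift_z`, for example, points at uncompensated IMU bias; an inflated `std_ratio` on joint velocities points at under-randomized sensor noise.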
Conclusion
Sim-to-real for locomotion combines multiple techniques: actuator networks for accurate motor modeling, domain randomization for robustness, terrain curriculum for progressive difficulty, and careful engineering for deployment. There's no single silver bullet -- each robot and task needs a different combination.
Key takeaway: Hardware co-design matters. Berkeley Humanoid with QDD actuators needs less sim-to-real effort than robots with complex geared transmissions. If you're designing a new robot, choose actuators that are easy to simulate.
Read the full series:
- Part 1: Locomotion Basics -- From ZMP to CPG
- Part 2: RL for Locomotion -- PPO, reward shaping, curriculum
- Part 3: Quadruped Locomotion -- legged_gym to Unitree Go2
- Part 4: Walk These Ways -- Adaptive locomotion
- Part 5: Robot Parkour -- Jumping and climbing
- Part 6: Bipedal Walking -- 2-legged robots with RL
Related Articles
- Domain Randomization: Technique to Reduce Sim-to-Real Gap -- Detailed on visual and dynamics randomization
- Complete Sim-to-Real Pipeline -- End-to-end pipeline from simulation to real
- Bipedal Walking: Controlling 2-Legged Robots with RL -- Platforms and reward design for bipedal
- Robot Parkour: Jumping and Climbing with RL -- Parkour as the ultimate sim-to-real test
- Simulation for Robotics: MuJoCo vs Isaac Sim vs Gazebo -- Choosing the right simulator