
Sim-to-Real for Locomotion: Reality and Experience

Transferring a locomotion policy from simulation to a real robot -- domain randomization, actuator networks, and best practices.

Nguyen Anh Tuan · March 1, 2026 · 12 min read

Sim-to-Real for Locomotion: Not Just Domain Randomization

If you've read the general sim-to-real overview (Sim-to-Real Transfer: Train in Simulation, Run in Reality, Domain Randomization), you understand the basics. This post dives deep into sim-to-real specific to locomotion -- the challenges and solutions that only appear when training robots to walk, run, and jump.

Locomotion sim-to-real differs from manipulation sim-to-real in three key ways:

  1. Contact dynamics are more complex: robot feet make and break contact with the ground continuously, with impact forces that vary by terrain, velocity, and contact angle
  2. Actuator dynamics are more critical: locomotion pushes actuators to their limits (high torque, high speed) -- exactly where motor response is most nonlinear
  3. Errors accumulate faster: each step's error affects the next step, and after a few seconds small errors snowball until the robot falls

Let me analyze each gap and corresponding solutions, sharing best practices from leading teams.

Primary Sim-to-Real Gaps in Locomotion

Gap 1: Actuator Dynamics

Problem: In simulation, motors are typically modeled as ideal torque sources -- you command a torque and the motor delivers it instantly. Real motors are nothing like this: they have torque-speed limits, gearbox friction and backlash, communication delays, and behavior that drifts with temperature.

This is the biggest gap for locomotion because the robot continuously operates near actuator limits (jumping, running fast, recovering from perturbations).

Gap 2: Terrain Friction and Contact

Problem: Simulation uses simplified contact models (usually Coulomb friction with a fixed coefficient). Real contact is far more complex: friction varies with surface material, contamination, and velocity; real surfaces deform under load; and feet can slip, roll, or catch on edges.

Gap 3: Sensor Noise and Latency

Locomotion-specific issues: IMU drift and bias, joint encoder noise, and control-loop latency all feed directly into the policy's observations at every step -- and a walking robot has no pause in which to recover from a bad state estimate.

Gap 4: Unmodeled Dynamics

Things simulation usually ignores: cable drag, battery voltage sag, link flexibility, loose mechanical tolerances, and motor heating that changes behavior over a session.

Gap between simulation and reality in robot locomotion

Solution 1: Actuator Network

Actuator network is the most important technique for locomotion sim-to-real, introduced by Hwangbo et al. in "Learning Agile and Dynamic Motor Skills for Legged Robots" on ANYmal.

Concept

Instead of using an ideal motor model in simulation, train a neural network to predict the actual motor response:

Input: [position_error_history, velocity_history, command_history]
       (window 50-100 timesteps)
Output: predicted_actual_torque

Data Collection

  1. Place real robot on stand (feet off ground)
  2. Send random joint commands (sinusoidal, step, random)
  3. Record: commanded position, actual position, actual velocity, actual torque (if available)
  4. ~30 minutes of data is sufficient for one robot
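The excitation commands in step 2 can be sketched as follows. This is a minimal sketch: the segment lengths, frequencies, and amplitude limits are illustrative placeholders, not tuned for any particular robot.

```python
import numpy as np

def excitation_commands(duration_s=30 * 60, dt=0.005, seed=0):
    """Generate a mix of sinusoidal, step, and random joint position
    commands for actuator system identification (illustrative limits)."""
    rng = np.random.default_rng(seed)
    t = np.arange(0, duration_s, dt)
    cmds = np.zeros_like(t)
    i = 0
    while i < len(t):
        seg = int(rng.integers(200, 1000))          # 1-5 s segments at 200 Hz
        kind = rng.choice(["sine", "step", "noise"])
        if kind == "sine":
            f = rng.uniform(0.5, 5.0)               # Hz
            a = rng.uniform(0.1, 0.8)               # rad
            cmds[i:i + seg] = a * np.sin(2 * np.pi * f * t[i:i + seg])
        elif kind == "step":
            cmds[i:i + seg] = rng.uniform(-0.8, 0.8)
        else:
            cmds[i:i + seg] = rng.uniform(-0.8, 0.8, size=min(seg, len(t) - i))
        i += seg
    return t, np.clip(cmds, -1.0, 1.0)              # clip to joint limits

t, cmds = excitation_commands(duration_s=10.0)
```

Mixing signal types matters: sinusoids cover the frequency response, steps expose delay and overshoot, and noise fills in the rest of the input space.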

Training Actuator Network

import torch
import torch.nn as nn

class ActuatorNet(nn.Module):
    def __init__(self, history_length=50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(history_length * 3, 256),  # pos_err, vel, cmd
            nn.ELU(),
            nn.Linear(256, 128),
            nn.ELU(),
            nn.Linear(128, 1)  # predicted torque
        )

    def forward(self, pos_error_hist, vel_hist, cmd_hist):
        x = torch.cat([pos_error_hist, vel_hist, cmd_hist], dim=-1)
        return self.net(x)
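Training is plain supervised regression on the logged data. A minimal sketch follows -- synthetic tensors stand in for the real logs, and the target function, sizes, and hyperparameters are illustrative:

```python
import torch
import torch.nn as nn

# Synthetic stand-ins for logged data: N windows of length H per channel
N, H = 1024, 50
pos_err = torch.randn(N, H)
vel = torch.randn(N, H)
cmd = torch.randn(N, H)
# Fake "measured" torque as a nonlinear function of the most recent inputs
target = torch.tanh(2.0 * pos_err[:, -1:] - 0.1 * vel[:, -1:])

net = nn.Sequential(
    nn.Linear(H * 3, 256), nn.ELU(),
    nn.Linear(256, 128), nn.ELU(),
    nn.Linear(128, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x = torch.cat([pos_err, vel, cmd], dim=-1)

for _ in range(300):                                 # full-batch regression
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), target)
    loss.backward()
    opt.step()
```

With real logs you would split train/validation by time (not randomly), since adjacent windows are heavily correlated.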

Integrating into Simulation

Replace the default motor model with the actuator network:

# In simulation loop
for step in range(num_steps):
    action = policy(obs)                    # RL policy output
    # Instead of: sim.set_torque(action)
    predicted_torque = actuator_net(        # Pass through actuator network
        pos_error_history, vel_history, action_history
    )
    sim.set_torque(predicted_torque)        # Apply predicted torque
    sim.step()

Results

ANYmal with the actuator network (Hwangbo et al., Science Robotics, 2019): policies trained entirely in simulation transferred zero-shot -- the robot tracked commanded velocities, ran faster than its previous model-based controller, and could recover from a fall on its own.

Limitations

The actuator network is specific to one robot, and even to one state of wear: new hardware means new data collection and retraining. Data gathered on a stand also misses load-dependent effects that only appear under the robot's full weight on the ground.

Solution 2: Domain Randomization for Locomotion

Domain randomization (DR) is a "brute force" approach -- instead of modeling the real world accurately, randomize simulation parameters so the policy is robust to all variations.

Recommended Ranges for Locomotion

Based on papers from ETH Zurich (ANYmal), CMU (Unitree), Berkeley (Cassie):

Parameter             Range           Notes
Body mass             +/- 15-25%      Simulate payload variation
CoM position          +/- 3-5 cm      Manufacturing tolerance
Friction coefficient  0.3 - 2.0       Diverse floor types
Restitution           0.0 - 0.5       Bounce on contact
Motor strength        +/- 10-20%      Motor variation
Joint damping         +/- 30%         Mechanical wear
Action delay          0 - 20 ms       Communication latency
PD gains              +/- 20%         Controller variation
Terrain height noise  +/- 2-5 cm      Uneven ground
Push force            0 - 80 N        External perturbation
Gravity direction     +/- 3 degrees   IMU calibration error
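A common way to apply the table is to sample a fresh parameter set per environment at every reset. A minimal sketch -- the dictionary keys and exact ranges here are illustrative, not any simulator's API:

```python
import numpy as np

# Ranges loosely following the table above (illustrative names and values)
DR_RANGES = {
    "mass_scale":     (0.80, 1.20),   # +/- 20% body mass
    "friction":       (0.3, 2.0),
    "restitution":    (0.0, 0.5),
    "motor_strength": (0.85, 1.15),
    "joint_damping":  (0.7, 1.3),
    "action_delay_s": (0.0, 0.02),
    "pd_gain_scale":  (0.8, 1.2),
    "push_force_n":   (0.0, 80.0),
}

def sample_dr(rng):
    """Draw one randomized physics parameter set per environment reset."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in DR_RANGES.items()}

rng = np.random.default_rng(42)
params = sample_dr(rng)
```

Resampling at every reset (rather than once per training run) is what forces the policy to be robust across the whole range instead of overfitting one draw.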

Adaptive Domain Randomization

Instead of fixed ranges, adaptive DR adjusts ranges based on performance:

# Automatic Domain Randomization (ADR) -- simplified
for epoch in range(num_epochs):
    train_one_epoch(policy, dr_ranges)
    success_rate = evaluate(policy, dr_ranges)
    if success_rate > 0.8:
        dr_ranges *= 1.1   # expand ranges (make harder)
    elif success_rate < 0.5:
        dr_ranges *= 0.9   # shrink ranges (make easier)

OpenAI used ADR for Rubik's Cube manipulation. A similar approach applies to locomotion but requires careful tuning -- overly aggressive expansion can create impossible scenarios.

Solution 3: Terrain Curriculum

Specific to locomotion, terrain curriculum is a form of domain randomization for ground geometry.

Progressive Difficulty

Phase 1 (0-500 epochs):     Flat ground only
Phase 2 (500-1000 epochs):  + Rough terrain (2cm noise)
Phase 3 (1000-1500 epochs): + Slopes (5-15 degrees)
Phase 4 (1500-2000 epochs): + Stairs (10-20cm steps)
Phase 5 (2000+ epochs):     + Mixed (all terrains random)
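The schedule above can be written as a simple lookup from epoch to the active terrain set (a sketch using the post's example thresholds):

```python
def terrain_phase(epoch):
    """Map training epoch to the set of active terrain types,
    following the phase schedule above (thresholds are example values)."""
    phases = [
        (500,  ["flat"]),
        (1000, ["flat", "rough"]),
        (1500, ["flat", "rough", "slope"]),
        (2000, ["flat", "rough", "slope", "stairs"]),
    ]
    for limit, terrains in phases:
        if epoch < limit:
            return terrains
    return ["flat", "rough", "slope", "stairs", "mixed"]
```

Frameworks like legged_gym implement a finer-grained version of this, promoting individual environments to harder terrain as they succeed rather than switching globally by epoch.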

Important Terrain Types

  1. Height noise terrain: Random height perturbations, simulate uneven ground
  2. Discrete obstacles: Random boxes/barriers on flat ground
  3. Gaps: Missing ground sections (robot must step over)
  4. Slopes: Inclines and declines
  5. Stairs: Regular steps, ascending and descending
  6. Stepping stones: Discrete footholds with gaps between
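Type 1 (height-noise terrain) is also the simplest to generate: a per-cell random offset quantized to a vertical step. A sketch, with illustrative sizes and heights -- a real pipeline would bake this heightfield into the simulator's terrain mesh:

```python
import numpy as np

def height_noise_terrain(rows=100, cols=100, max_height=0.05,
                         step=0.005, seed=0):
    """Generate a height-noise heightfield in meters: random per-cell
    offsets in [-max_height, max_height], quantized to `step`."""
    rng = np.random.default_rng(seed)
    heights = rng.uniform(-max_height, max_height, size=(rows, cols))
    return np.round(heights / step) * step            # quantize to steps

hf = height_noise_terrain()
```

Quantizing to discrete steps produces sharp edges that feet can catch on, which stresses the policy more than smooth noise of the same amplitude.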

Real-World Success Rates

Based on published results and reported numbers from research teams:

Platform    Task                   Sim Success   Real Success   Gap
ANYmal-D    Rough terrain walking  98%           ~93%           5%
Unitree A1  Stair climbing         95%           ~91%           4%
Cassie      Flat walking           99%           ~95%           4%
Digit       Warehouse walking      92%           ~84%           8%
Unitree H1  Flat walking           97%           ~90%           7%
ANYmal-C    Parkour (mixed)        85%           ~75%           10%

Observations: the gap grows with task complexity -- flat and rough-terrain walking lose roughly 4-7 points in transfer, while mixed parkour loses about 10. Larger, heavier platforms (Digit, H1) tend to show bigger gaps than lightweight quadrupeds on comparable tasks.

Step-by-Step: Train → Deploy Locomotion Policy

Step 1: Setup Simulation

# Clone Humanoid-Gym or legged_gym
git clone https://github.com/roboterax/humanoid-gym
cd humanoid-gym
pip install -e .

# Or for quadruped
git clone https://github.com/leggedrobotics/legged_gym

Configure robot URDF, reward weights, DR ranges.

Step 2: Train with Progressive Curriculum

# Config example
train_cfg = {
    "terrain": {
        "curriculum": True,
        "terrain_types": ["flat", "rough", "slope", "stairs"],
        "difficulty_scale": [0.0, 0.25, 0.5, 0.75, 1.0],
    },
    "domain_randomization": {
        "mass_range": [-0.15, 0.15],        # +/- 15%
        "friction_range": [0.4, 1.8],
        "motor_strength_range": [0.9, 1.1],
        "push_force_range": [0, 50],          # Newtons
        "action_delay": [0, 0.02],            # seconds
    },
    "reward": {
        "forward_vel": 1.0,
        "upright": 0.5,
        "energy": -0.001,
        "action_rate": -0.01,
        "feet_air_time": 0.5,                # Encourage lifting feet
    }
}

Train for 2000-5000 epochs -- roughly 2-8 hours on an RTX 4090 with 4096 parallel environments.

Step 3: Evaluate in Simulation

Before deployment, evaluate thoroughly in simulation:

# Test scenarios
test_scenarios = [
    "flat_ground_forward_1.0ms",
    "flat_ground_turning_0.5rads",
    "rough_terrain_forward_0.5ms",
    "slope_15deg_ascending",
    "stairs_15cm_ascending",
    "push_recovery_30N_lateral",
    "push_recovery_50N_lateral",
]

for scenario in test_scenarios:
    success_rate = evaluate(policy, scenario, n_trials=100)
    print(f"{scenario}: {success_rate:.1%}")
    assert success_rate > 0.85, f"FAIL: {scenario}"

Red flags in sim evaluation: success rates that collapse when DR ranges widen slightly, jittery high-frequency actions, sustained torque saturation, and a gait that only works at one commanded velocity.

Step 4: Sim-to-Sim Verification (Optional but Recommended)

Transfer policy from Isaac Gym to MuJoCo (or vice versa):

# Load policy trained in Isaac Gym
policy = load_policy("checkpoints/best_policy.pt")

# Test in MuJoCo
mujoco_env = create_mujoco_env(robot_urdf)
for episode in range(50):
    obs = mujoco_env.reset()
    for step in range(1000):
        action = policy(obs)
        obs, reward, done, info = mujoco_env.step(action)
        if done:
            break
    print(f"Episode {episode}: survived {step} steps")

If the policy works well in both simulators, it's likely to transfer well to the real robot.

Sim-to-real deployment process for robot locomotion

Step 5: Real-World Deployment

Deployment checklist: start at low commanded velocity, cap torque below hardware limits, keep an e-stop within reach, use a harness or soft ground for the first trials, and stop immediately on fall detection:

# Deploy script (simplified)
import numpy as np
import robot_sdk

robot = robot_sdk.connect("192.168.1.100")
policy = load_policy("checkpoints/best_policy.pt")

# Safety limits
MAX_VEL = 0.5  # Start slow
MAX_TORQUE = 0.8 * robot.max_torque  # 80% limit

while True:
    obs = robot.get_observation()
    action = policy(obs)
    action = np.clip(action, -MAX_TORQUE, MAX_TORQUE)

    robot.set_action(action)
    robot.step()

    # Safety check
    if robot.is_fallen():
        robot.stop()
        break

Step 6: Iterate

After initial deployment, observe failure modes and return to sim:

  1. Robot slips on floor → increase friction randomization range
  2. Motor overheats → add torque penalty, reduce max velocity
  3. Robot unstable on real terrain → add terrain noise in sim
  4. Latency causes instability → increase action delay randomization

Common Failure Modes and Fixes

Failure 1: Robot Falls Immediately When Set Down

Cause: The initial state differs between sim and real. Sim usually starts from a perfect standing pose; the real robot is set down with slight imbalance.

Fix: Randomize the initial state in sim (body orientation +/- 5 degrees, velocity +/- 0.1 m/s).
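A sketch of that fix -- the ranges follow the numbers above, and the returned state dict is illustrative rather than any particular simulator's reset API:

```python
import numpy as np

def randomize_init_state(rng):
    """Perturb the reset state: body orientation within +/- 5 degrees,
    base linear velocity within +/- 0.1 m/s (illustrative ranges)."""
    max_angle = np.deg2rad(5.0)
    rpy = rng.uniform(-max_angle, max_angle, size=3)   # roll, pitch, yaw
    lin_vel = rng.uniform(-0.1, 0.1, size=3)           # m/s
    return {"rpy": rpy, "lin_vel": lin_vel}

state = randomize_init_state(np.random.default_rng(0))
```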

Failure 2: Gait is Not Smooth, Jerky

Cause: The action delay in the real system is larger than in sim. The policy's output changes quickly, but the robot responds slowly.

Fix: (a) Add action delay randomization (0-20ms), (b) Add action rate penalty in reward, (c) Reduce control frequency.
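Fix (a) is usually implemented as a FIFO buffer in the training environment, so the policy learns to act through stale actions. A minimal sketch -- the delay is in control ticks, and in training you would randomize it per episode:

```python
from collections import deque

class DelayedActions:
    """Simulate actuation latency: each action is queued and applied
    `delay_steps` control ticks later (sketch; randomize per episode)."""
    def __init__(self, delay_steps, default_action):
        self.buf = deque([default_action] * (delay_steps + 1),
                         maxlen=delay_steps + 1)

    def step(self, action):
        self.buf.append(action)      # newest action enters the queue
        return self.buf[0]           # oldest queued action is applied now

d = DelayedActions(delay_steps=2, default_action=0.0)
outs = [d.step(a) for a in [1.0, 2.0, 3.0, 4.0]]
# outs == [0.0, 0.0, 1.0, 2.0]: each action takes effect 2 ticks late
```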

Failure 3: Robot Slips When Turning

Cause: The friction coefficient in sim differs from reality, and the policy learned to rely on the higher sim friction.

Fix: Randomize the friction coefficient over a wider range (0.3-2.0) and add a lateral velocity penalty.

Failure 4: Motor Overheats After a Few Minutes

Cause: The policy uses too much torque, and sim doesn't model thermal limits.

Fix: (a) Increase energy penalty weight, (b) Add torque RMS constraint, (c) Reduce max action magnitude.
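Fixes (a) and (b) translate into reward terms like the following -- the weights and the 25 Nm RMS threshold are illustrative, not tuned values:

```python
import numpy as np

def thermal_penalties(step_torques, torque_window):
    """(a) per-step energy penalty on squared torque, and (b) a soft
    RMS-torque constraint over a sliding window (illustrative weights)."""
    energy_pen = -0.001 * float(np.sum(np.square(step_torques)))
    rms = float(np.sqrt(np.mean(np.square(torque_window))))
    rms_pen = -1.0 * max(0.0, rms - 25.0)   # only penalize above 25 Nm RMS
    return energy_pen, rms_pen

e, r = thermal_penalties(np.zeros(12), np.full(100, 30.0))
```

The soft-constraint form is convenient because it costs nothing in the normal operating range and only bites when the policy sustains high torque, which is exactly the thermal failure mode.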

Failure 5: Policy Performs Differently Each Deployment

Cause: The policy is sensitive to initial conditions or sensor noise -- it isn't robust.

Fix: Increase domain randomization ranges, especially sensor noise and initial state randomization.

Important Papers on Sim-to-Real Locomotion

Essential papers to read:

  1. Learning Agile and Dynamic Motor Skills for Legged Robots (arXiv:1901.08652) -- Hwangbo et al., 2019. Actuator network for ANYmal, foundational paper.

  2. Learning Robust, Agile, Natural Legged Locomotion Skills in the Wild (arXiv:2304.10888) -- Adversarial training + teacher-student for natural locomotion on wild terrain.

  3. Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control (arXiv:2401.16889) -- Comprehensive bipedal RL framework on Cassie.

  4. Humanoid-Gym: RL for Humanoid Robot with Zero-Shot Sim2Real Transfer (arXiv:2404.05695) -- Open-source framework, verified zero-shot transfer.

  5. Berkeley Humanoid: A Research Platform for Learning-based Control (arXiv:2407.21781) -- Hardware co-design makes sim-to-real easier.

  6. Towards Bridging the Gap: Systematic Sim-to-Real Transfer for Diverse Legged Robots (arXiv:2509.06342) -- PACE: joint-space dynamics alignment, alternative to actuator networks.

Best Practices Summary

From experience of leading teams (ETH RSL, CMU, Berkeley, Unitree):

1. Start Simple, Add Complexity Gradually

Get flat-ground walking to transfer first; only then add rough terrain, wider DR ranges, and harder tasks.

2. Actuator Modeling > Domain Randomization

An accurate actuator model shrinks the gap directly; DR only papers over it. Do both, but invest in the actuator model first.

3. Sim-to-Sim Before Sim-to-Real

A policy that breaks when moved between simulators will break on hardware -- cross-simulator transfer is a cheap early warning.

4. Safety First When Deploying

Velocity and torque caps, an e-stop, and fall detection on every deployment, no matter how well sim evaluation went.

5. Log Everything

Record observations, actions, and torques on the real robot; failure logs are exactly what you need to close the loop back in sim.

Conclusion

Sim-to-real for locomotion combines multiple techniques: actuator networks for accurate motor modeling, domain randomization for robustness, terrain curriculum for progressive difficulty, and careful engineering for deployment. There's no single silver bullet -- each robot and task needs a different combination.

Key takeaway: Hardware co-design matters. Berkeley Humanoid with QDD actuators needs less sim-to-real effort than robots with complex geared transmissions. If you're designing a new robot, choose actuators that are easy to simulate.

