VnRobo
AboutPricingBlogContact
🇻🇳VISign InStart Free Trial
🇻🇳VI
VnRobo logo

AI infrastructure for next-generation industrial robots.

Product

  • Features
  • Pricing
  • Knowledge Base
  • Services

Company

  • About Us
  • Blog
  • Contact

Legal

  • Privacy Policy
  • Terms of Service

© 2026 VnRobo. All rights reserved.

Made with♥in Vietnam
VnRobo
AboutPricingBlogContact
🇻🇳VISign InStart Free Trial
🇻🇳VI
  1. Home
  2. Blog
  3. Sim-to-Real for Locomotion: Reality and Experience
locomotionlocomotionsim2realrl

Sim-to-Real for Locomotion: Reality and Experience

Transferring locomotion policy from simulation to real robot — domain randomization, actuator networks, and best practices.

Nguyen Anh TuanMarch 1, 202612 min readUpdated: Jun 14, 2026
Sim-to-Real for Locomotion: Reality and Experience

Sim-to-Real for Locomotion: Not Just Domain Randomization

If you've read the general sim-to-real overview (Sim-to-Real Transfer: Train in Simulation, Run in Reality, Domain Randomization), you understand the basics. This post dives deep into sim-to-real specific to locomotion -- the challenges and solutions that only appear when training robots to walk, run, and jump.

Locomotion sim-to-real differs from manipulation sim-to-real in 3 key ways:

  1. Contact dynamics more complex: Robot feet continuously contact the ground, with impact forces varying by terrain, velocity, and contact angle
  2. Actuator dynamics more critical: Locomotion pushes actuators to their limits (high torque, high speed) -- where motor response is most nonlinear
  3. Error accumulation faster: Each step error affects the next step. After a few seconds, small errors snowball into robot falling

Let me analyze each gap and corresponding solutions, sharing best practices from leading teams.

Primary Sim-to-Real Gaps in Locomotion

Gap 1: Actuator Dynamics

Problem: In simulation, motors are typically modeled as ideal torque sources -- you command torque, motor outputs that torque instantly. Reality is completely different:

  • Torque response has delay: 5-20ms latency from command to actual torque
  • Velocity-dependent torque: Motor weaker at high speeds (back-EMF effect)
  • Temperature-dependent: Motor hot → output torque decreases
  • Friction and backlash: Gears have friction, backlash causes hysteresis
  • Torque saturation: Non-linear clipping near maximum torque

This is the biggest gap for locomotion because robots continuously operate near actuator limits (jumping, running fast, recovery from perturbation).

Gap 2: Terrain Friction and Contact

Problem: Simulation uses simplified contact models (usually Coulomb friction with fixed coefficient). Reality is much more complex:

  • Friction varies: Wood floor (mu=0.4), carpet (mu=0.8), wet floor (mu=0.2)
  • Anisotropic friction: Friction differs by direction (carpets, grooved surfaces)
  • Deformable terrain: Soft ground, grass, sand -- robot feet sink
  • Foot geometry: Simulation uses simple collision shapes, real feet have complex geometry

Gap 3: Sensor Noise and Latency

Locomotion-specific issues:

  • IMU drift: Gyroscope drift accumulates over time, especially problematic for balance
  • Joint encoder noise: +/- 0.1-0.5 degrees, accumulates through kinematic chain
  • Foot contact detection: Binary contact sensors inaccurate -- usually threshold foot force
  • Communication latency: 1-5ms bus delay between sensors and controller

Gap 4: Unmodeled Dynamics

Things simulation usually ignores:

  • Cable routing: Wires affect joint movement
  • Structural flexibility: Mechanics not perfectly rigid
  • Thermal effects: Motor heats → performance changes over time
  • Battery voltage drop: Voltage decreases → motor output changes

Gap between simulation and reality in robot locomotion
Gap between simulation and reality in robot locomotion

Solution 1: Actuator Network

Actuator network is the most important technique for locomotion sim-to-real, introduced by Hwangbo et al. in "Learning Agile and Dynamic Motor Skills for Legged Robots" on ANYmal.

Concept

Instead of ideal motor model in simulation, train a neural network to predict actual motor response:

Input: [position_error_history, velocity_history, command_history]
       (window 50-100 timesteps)
Output: predicted_actual_torque

Data Collection

  1. Place real robot on stand (feet off ground)
  2. Send random joint commands (sinusoidal, step, random)
  3. Record: commanded position, actual position, actual velocity, actual torque (if available)
  4. ~30 minutes data sufficient for one robot

Training Actuator Network

class ActuatorNet(nn.Module):
    def __init__(self, history_length=50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(history_length * 3, 256),  # pos_err, vel, cmd
            nn.ELU(),
            nn.Linear(256, 128),
            nn.ELU(),
            nn.Linear(128, 1)  # predicted torque
        )

    def forward(self, pos_error_hist, vel_hist, cmd_hist):
        x = torch.cat([pos_error_hist, vel_hist, cmd_hist], dim=-1)
        return self.net(x)

Integrating into Simulation

Replace the default motor model with the actuator network:

# In simulation loop
for step in range(num_steps):
    action = policy(obs)                    # RL policy output
    # Instead of: sim.set_torque(action)
    predicted_torque = actuator_net(        # Pass through actuator network
        pos_error_history, vel_history, action_history
    )
    sim.set_torque(predicted_torque)        # Apply predicted torque
    sim.step()

Results

ANYmal with actuator network:

  • ~500K timesteps/second simulation speed (fast enough for RL)
  • Significantly reduced sim-to-real gap: Policy transfers successfully without heavy domain randomization
  • Actuator network runs at ~50K Hz inference -- lighter than physics simulation

Limitations

  • Requires torque sensing on robot (not all robots have it)
  • Actuator network is specific to each robot -- must re-collect data and retrain for new robots
  • Doesn't capture temperature-dependent changes (unless data collected at multiple temperatures)

Solution 2: Domain Randomization for Locomotion

Domain randomization (DR) is a "brute force" approach -- instead of modeling the real world accurately, randomize simulation parameters so the policy is robust to all variations.

Recommended Ranges for Locomotion

Based on papers from ETH Zurich (ANYmal), CMU (Unitree), Berkeley (Cassie):

Parameter Range Notes
Body mass +/- 15-25% Simulate payload variation
CoM position +/- 3-5 cm Manufacturing tolerance
Friction coefficient 0.3 - 2.0 Diverse floor types
Restitution 0.0 - 0.5 Bounce on contact
Motor strength +/- 10-20% Motor variation
Joint damping +/- 30% Mechanical wear
Action delay 0 - 20 ms Communication latency
PD gains +/- 20% Controller variation
Terrain height noise +/- 2-5 cm Uneven ground
Push force 0 - 80 N External perturbation
Gravity direction +/- 3 degrees IMU calibration error

Adaptive Domain Randomization

Instead of fixed ranges, adaptive DR adjusts ranges based on performance:

# Automatic Domain Randomization (ADR)
for epoch in training:
    success_rate = evaluate(policy, current_dr_ranges)
    if success_rate > 0.8:
        # Expand ranges (make harder)
        dr_ranges *= 1.1
    elif success_rate < 0.5:
        # Shrink ranges (make easier)
        dr_ranges *= 0.9

OpenAI used ADR for Rubik's Cube manipulation. A similar approach applies to locomotion but requires careful tuning -- overly aggressive expansion can create impossible scenarios.

Solution 3: Terrain Curriculum

Specific to locomotion, terrain curriculum is a form of domain randomization for ground geometry.

Progressive Difficulty

Phase 1 (0-500 epochs):     Flat ground only
Phase 2 (500-1000 epochs):  + Rough terrain (2cm noise)
Phase 3 (1000-1500 epochs): + Slopes (5-15 degrees)
Phase 4 (1500-2000 epochs): + Stairs (10-20cm steps)
Phase 5 (2000+ epochs):     + Mixed (all terrains random)

Important Terrain Types

  1. Height noise terrain: Random height perturbations, simulate uneven ground
  2. Discrete obstacles: Random boxes/barriers on flat ground
  3. Gaps: Missing ground sections (robot must step over)
  4. Slopes: Inclines and declines
  5. Stairs: Regular steps, ascending and descending
  6. Stepping stones: Discrete footholds with gaps between

Real-World Success Rates

Based on published results and reported numbers from research teams:

Platform Task Sim Success Real Success Gap
ANYmal-D Rough terrain walking 98% ~93% 5%
Unitree A1 Stair climbing 95% ~91% 4%
Cassie Flat walking 99% ~95% 4%
Digit Warehouse walking 92% ~84% 8%
Unitree H1 Flat walking 97% ~90% 7%
ANYmal-C Parkour (mixed) 85% ~75% 10%

Observations:

  • Quadruped > Bipedal: ANYmal/A1 consistently transfer better than Digit/H1
  • Flat > Complex terrain: Gap increases as terrain difficulty increases
  • Actuator quality matters: QDD actuators (Berkeley Humanoid) have smaller gaps than geared motors

Step-by-Step: Train → Deploy Locomotion Policy

Step 1: Setup Simulation

# Clone Humanoid-Gym or legged_gym
git clone https://github.com/roboterax/humanoid-gym
cd humanoid-gym
pip install -e .

# Or for quadruped
git clone https://github.com/leggedrobotics/legged_gym

Configure robot URDF, reward weights, DR ranges.

Step 2: Train with Progressive Curriculum

# Config example
train_cfg = {
    "terrain": {
        "curriculum": True,
        "terrain_types": ["flat", "rough", "slope", "stairs"],
        "difficulty_scale": [0.0, 0.25, 0.5, 0.75, 1.0],
    },
    "domain_randomization": {
        "mass_range": [-0.15, 0.15],        # +/- 15%
        "friction_range": [0.4, 1.8],
        "motor_strength_range": [0.9, 1.1],
        "push_force_range": [0, 50],          # Newtons
        "action_delay": [0, 0.02],            # seconds
    },
    "reward": {
        "forward_vel": 1.0,
        "upright": 0.5,
        "energy": -0.001,
        "action_rate": -0.01,
        "feet_air_time": 0.5,                # Encourage lifting feet
    }
}

Train 2000-5000 epochs, ~2-8 hours on RTX 4090 (4096 parallel envs).

Step 3: Evaluate in Simulation

Before deployment, evaluate thoroughly in simulation:

# Test scenarios
test_scenarios = [
    "flat_ground_forward_1.0ms",
    "flat_ground_turning_0.5rads",
    "rough_terrain_forward_0.5ms",
    "slope_15deg_ascending",
    "stairs_15cm_ascending",
    "push_recovery_30N_lateral",
    "push_recovery_50N_lateral",
]

for scenario in test_scenarios:
    success_rate = evaluate(policy, scenario, n_trials=100)
    print(f"{scenario}: {success_rate:.1%}")
    assert success_rate > 0.85, f"FAIL: {scenario}"

Red flags in sim evaluation:

  • Success rate < 90% on flat ground → policy hasn't converged
  • Robot falls when pushed 30N → balance reward too low
  • Shuffling gait → need periodic gait reward
  • High energy consumption → energy penalty too low

Step 4: Sim-to-Sim Verification (Optional but Recommended)

Transfer policy from Isaac Gym to MuJoCo (or vice versa):

# Load policy trained in Isaac Gym
policy = load_policy("checkpoints/best_policy.pt")

# Test in MuJoCo
mujoco_env = create_mujoco_env(robot_urdf)
for episode in range(50):
    obs = mujoco_env.reset()
    for step in range(1000):
        action = policy(obs)
        obs, reward, done, info = mujoco_env.step(action)
        if done:
            break
    print(f"Episode {episode}: survived {step} steps")

If the policy works well in both simulators, it's likely to transfer well to the real robot.

Sim-to-real deployment process for robot locomotion
Sim-to-real deployment process for robot locomotion

Step 5: Real-World Deployment

Deployment checklist:

  • Safety harness for robot (especially for bipedal)
  • Flat ground first, complex terrain after
  • Start with low velocity commands (0.3 m/s)
  • Monitor motor temperatures
  • Record data for debugging
# Deploy script (simplified)
import robot_sdk

robot = robot_sdk.connect("192.168.1.100")
policy = load_policy("checkpoints/best_policy.pt")

# Safety limits
MAX_VEL = 0.5  # Start slow
MAX_TORQUE = 0.8 * robot.max_torque  # 80% limit

while True:
    obs = robot.get_observation()
    action = policy(obs)
    action = clip(action, -MAX_TORQUE, MAX_TORQUE)

    robot.set_action(action)
    robot.step()

    # Safety check
    if robot.is_fallen():
        robot.stop()
        break

Step 6: Iterate

After initial deployment, observe failure modes and return to sim:

  1. Robot slips on floor → increase friction randomization range
  2. Motor overheats → add torque penalty, reduce max velocity
  3. Robot unstable on real terrain → add terrain noise in sim
  4. Latency causes instability → increase action delay randomization

Common Failure Modes and Fixes

Failure 1: Robot Falls Immediately When Set Down

Cause: Initial state differs between sim and real. Sim usually starts from perfect standing, real robot has slight imbalance.

Fix: Randomize initial state in sim (body orientation +/- 5 degrees, velocity +/- 0.1 m/s).

Failure 2: Gait is Not Smooth, Jerky

Cause: Action delay in real system is larger than sim. Policy output changes fast but robot responds slow.

Fix: (a) Add action delay randomization (0-20ms), (b) Add action rate penalty in reward, (c) Reduce control frequency.

Failure 3: Robot Slips When Turning

Cause: Friction coefficient in sim differs from real. Policy learned based on higher friction.

Fix: Randomize friction coefficient wider (0.3-2.0). Add lateral velocity penalty.

Failure 4: Motor Overheats After a Few Minutes

Cause: Policy uses too much torque. Sim doesn't model thermal limits.

Fix: (a) Increase energy penalty weight, (b) Add torque RMS constraint, (c) Reduce max action magnitude.

Failure 5: Policy Performs Differently Each Deployment

Cause: Policy is sensitive to initial conditions or sensor noise. Not robust.

Fix: Increase domain randomization ranges, especially sensor noise and initial state randomization.

Important Papers on Sim-to-Real Locomotion

Essential papers to read:

  1. Learning Agile and Dynamic Motor Skills for Legged Robots (arXiv:1901.08652) -- Hwangbo et al., 2019. Actuator network for ANYmal, foundational paper.

  2. Learning Robust, Agile, Natural Legged Locomotion Skills in the Wild (arXiv:2304.10888) -- Adversarial training + teacher-student for natural locomotion on wild terrain.

  3. Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control (arXiv:2401.16889) -- Comprehensive bipedal RL framework on Cassie.

  4. Humanoid-Gym: RL for Humanoid Robot with Zero-Shot Sim2Real Transfer (arXiv:2404.05695) -- Open-source framework, verified zero-shot transfer.

  5. Berkeley Humanoid: A Research Platform for Learning-based Control (arXiv:2407.21781) -- Hardware co-design makes sim-to-real easier.

  6. Towards Bridging the Gap: Systematic Sim-to-Real Transfer for Diverse Legged Robots (arXiv:2509.06342) -- PACE: joint-space dynamics alignment, alternative to actuator networks.

Best Practices Summary

From experience of leading teams (ETH RSL, CMU, Berkeley, Unitree):

1. Start Simple, Add Complexity Gradually

  • Flat ground walking first
  • Add terrain curriculum after flat walking is stable
  • Add vision after proprioceptive policy is robust

2. Actuator Modeling > Domain Randomization

  • If you have torque sensor → train actuator network (best)
  • If not → aggressive motor strength randomization (+/- 20%)
  • QDD actuators reduce need for both

3. Sim-to-Sim Before Sim-to-Real

  • Train Isaac Gym → test MuJoCo (or vice versa)
  • If transfer works between two sims → likely works on real robot

4. Safety First When Deploying

  • Harness for bipedal (always)
  • Start low velocity, increase gradually
  • Monitor motor temperature real-time
  • Accessible kill switch

5. Log Everything

  • Record all observations, actions, rewards in real deployment
  • Compare distributions with sim data → identify gaps
  • Use real data to refine sim parameters

Conclusion

Sim-to-real for locomotion combines multiple techniques: actuator networks for accurate motor modeling, domain randomization for robustness, terrain curriculum for progressive difficulty, and careful engineering for deployment. There's no single silver bullet -- each robot and task needs a different combination.

Key takeaway: Hardware co-design matters. Berkeley Humanoid with QDD actuators needs less sim-to-real effort than robots with complex geared transmissions. If you're designing a new robot, choose actuators that are easy to simulate.

Read the full series:

  • Part 1: Locomotion Basics -- From ZMP to CPG
  • Part 2: RL for Locomotion -- PPO, reward shaping, curriculum
  • Part 3: Quadruped Locomotion -- legged_gym to Unitree Go2
  • Part 4: Walk These Ways -- Adaptive locomotion
  • Part 5: Robot Parkour -- Jumping and climbing
  • Part 6: Bipedal Walking -- 2-legged robots with RL

Related articles in Simulation series:

  • Domain Randomization Detailed -- Visual and dynamics randomization
  • Complete Sim-to-Real Pipeline -- End-to-end from sim to real

Related Articles

  • Domain Randomization: Technique to Reduce Sim-to-Real Gap -- Detailed on visual and dynamics randomization
  • Complete Sim-to-Real Pipeline -- End-to-end pipeline from simulation to real
  • Bipedal Walking: Controlling 2-Legged Robots with RL -- Platforms and reward design for bipedal
  • Robot Parkour: Jumping and Climbing with RL -- Parkour as the ultimate sim-to-real test
  • Simulation for Robotics: MuJoCo vs Isaac Sim vs Gazebo -- Choosing the right simulator
NT

Nguyễn Anh Tuấn

Robotics & AI Engineer. Building VnRobo — sharing knowledge about robot learning, VLA models, and automation.

Khám phá VnRobo

Fleet MonitoringROS 2 IntegrationAMR Solutions
locomotion-zero-to-hero — Phần 7/7
← Bipedal Walking: Controlling 2-Legged Robots with RL

Related Posts

Research
Robot Parkour: Nhảy, leo cầu thang bằng RL
locomotionrlparkourPart 5
locomotion

Robot Parkour: Nhảy, leo cầu thang bằng RL

Phân tích Extreme Parkour và SoloParkour — cách dạy robot nhảy, leo trèo và vượt địa hình phức tạp bằng RL.

2/21/20269 min read
NT
Research
Walk These Ways: Adaptive locomotion một policy
locomotionrlresearchPart 4
locomotion

Walk These Ways: Adaptive locomotion một policy

Phân tích paper Walk-These-Ways — train 1 policy duy nhất cho nhiều kiểu di chuyển với multiplicity of behavior.

2/17/202610 min read
NT
Tutorial
Quadruped Locomotion: legged_gym đến Unitree Go2
locomotionquadrupedrlPart 3
locomotion

Quadruped Locomotion: legged_gym đến Unitree Go2

Hands-on train locomotion policy với legged_gym và deploy lên robot 4 chân Unitree Go2 — từ sim đến real.

2/13/202611 min read
NT
VnRobo logo

AI infrastructure for next-generation industrial robots.

Product

  • Features
  • Pricing
  • Knowledge Base
  • Services

Company

  • About Us
  • Blog
  • Contact

Legal

  • Privacy Policy
  • Terms of Service

© 2026 VnRobo. All rights reserved.

Made with♥in Vietnam