
Sim-to-Real for Locomotion: Reality and Experience

Transferring locomotion policies from simulation to real robots -- domain randomization, actuator networks, and best practices.

Nguyen Anh Tuan · March 1, 2026 · 12 min read

Sim-to-Real for Locomotion: Not Just Domain Randomization

If you've read the general sim-to-real overview (Sim-to-Real Transfer: Train in Simulation, Run in Reality, Domain Randomization), you understand the basics. This post dives deep into sim-to-real specific to locomotion -- the challenges and solutions that only appear when training robots to walk, run, and jump.

Locomotion sim-to-real differs from manipulation sim-to-real in three key ways:

  1. More complex contact dynamics: Robot feet continuously make and break contact with the ground, with impact forces that vary with terrain, velocity, and contact angle
  2. More critical actuator dynamics: Locomotion pushes actuators to their limits (high torque, high speed) -- exactly where motor response is most nonlinear
  3. Faster error accumulation: Each step's error affects the next step. Within a few seconds, small errors snowball into a fall

Let me analyze each gap and corresponding solutions, sharing best practices from leading teams.

Primary Sim-to-Real Gaps in Locomotion

Gap 1: Actuator Dynamics

Problem: In simulation, motors are typically modeled as ideal torque sources -- you command torque, motor outputs that torque instantly. Reality is completely different:

  • Torque response has delay: 5-20 ms latency from command to actual torque
  • Velocity-dependent torque: The motor is weaker at high speeds (back-EMF effect)
  • Temperature-dependent: As the motor heats up, output torque drops
  • Friction and backlash: Gears have friction; backlash causes hysteresis
  • Torque saturation: Non-linear clipping near maximum torque

This is the biggest gap for locomotion because the robot continuously operates near actuator limits (jumping, running fast, recovering from perturbations).
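To make this concrete, here is a minimal sketch of what a more realistic torque model looks like. The constants (stall torque, no-load speed, lag time constant) are illustrative assumptions, not specs for any particular motor:

import numpy as np

# Illustrative torque model: first-order lag + back-EMF speed limit + clipping.
# All constants are assumptions for illustration, not real motor specs.
TAU_MAX = 33.5     # stall torque [Nm]
OMEGA_MAX = 21.0   # no-load speed [rad/s]
LAG_TC = 0.010     # ~10 ms torque response time constant [s]

def realistic_torque(tau_cmd, tau_prev, joint_vel, dt=0.002):
    # Back-EMF: available torque shrinks linearly with joint speed
    tau_limit = TAU_MAX * max(0.0, 1.0 - abs(joint_vel) / OMEGA_MAX)
    # Saturation: clip the command to what the motor can actually deliver
    tau_target = float(np.clip(tau_cmd, -tau_limit, tau_limit))
    # First-order lag: the command takes ~LAG_TC seconds to take effect
    alpha = dt / (LAG_TC + dt)
    return tau_prev + alpha * (tau_target - tau_prev)

An ideal torque source skips all three effects; the actuator network described below learns them from data instead of hand-modeling them.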

Gap 2: Terrain Friction and Contact

Problem: Simulation uses simplified contact models (usually Coulomb friction with fixed coefficient). Reality is much more complex:

  • Friction varies: Wood floor (mu=0.4), carpet (mu=0.8), wet floor (mu=0.2)
  • Anisotropic friction: Friction differs by direction (carpets, grooved surfaces)
  • Deformable terrain: Soft ground, grass, sand -- robot feet sink
  • Foot geometry: Simulation uses simple collision shapes, real feet have complex geometry

Gap 3: Sensor Noise and Latency

Locomotion-specific issues:

  • IMU drift: Gyroscope drift accumulates over time, especially problematic for balance
  • Joint encoder noise: +/- 0.1-0.5 degrees, accumulates through kinematic chain
  • Foot contact detection: Binary contact sensors are inaccurate -- contact is usually detected by thresholding measured foot force
  • Communication latency: 1-5ms bus delay between sensors and controller
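These effects are cheap to approximate in simulation. A minimal sketch (the noise magnitudes and delay are assumptions -- tune them to your sensors):

import numpy as np
from collections import deque

class NoisySensors:
    """Add IMU drift, encoder noise, and latency to sim observations (sketch)."""
    def __init__(self, delay_steps=2):
        self.gyro_bias = np.zeros(3)                  # slow random-walk drift
        self.buffer = deque(maxlen=delay_steps + 1)   # emulates bus latency

    def __call__(self, gyro, joint_pos):
        self.gyro_bias += np.random.normal(0.0, 1e-4, 3)
        noisy_gyro = gyro + self.gyro_bias + np.random.normal(0.0, 0.01, 3)
        # Encoder noise: ~0.2 degrees = 0.0035 rad per joint
        noisy_pos = joint_pos + np.random.normal(0.0, 0.0035, joint_pos.shape)
        self.buffer.append((noisy_gyro, noisy_pos))
        return self.buffer[0]    # oldest buffered reading = delayed observation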

Gap 4: Unmodeled Dynamics

Things simulation usually ignores:

  • Cable routing: Wires affect joint movement
  • Structural flexibility: Mechanics not perfectly rigid
  • Thermal effects: Motor heats → performance changes over time
  • Battery voltage drop: Voltage decreases → motor output changes

[Figure: The gap between simulation and reality in robot locomotion]

Solution 1: Actuator Network

Actuator network is the most important technique for locomotion sim-to-real, introduced by Hwangbo et al. in "Learning Agile and Dynamic Motor Skills for Legged Robots" on ANYmal.

Concept

Instead of using an ideal motor model in simulation, train a neural network to predict the actual motor response:

Input: [position_error_history, velocity_history, command_history]
       (window 50-100 timesteps)
Output: predicted_actual_torque

Data Collection

  1. Place real robot on stand (feet off ground)
  2. Send random joint commands (sinusoidal, step, random)
  3. Record: commanded position, actual position, actual velocity, actual torque (if available)
  4. ~30 minutes of data is sufficient for one robot (see the logging sketch below)
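A sketch of the excitation-and-logging loop. Here robot_sdk is the same placeholder interface used in the deployment script later in this post; the method names are assumptions, substitute your robot's actual API:

import time
import numpy as np
import robot_sdk  # placeholder SDK -- not a real library

robot = robot_sdk.connect("192.168.1.100")
log = []
t0 = time.time()
while time.time() - t0 < 30 * 60:    # ~30 minutes of data
    t = time.time() - t0
    # Chirp-like excitation: sinusoid with slowly increasing frequency
    cmd = 0.5 * np.sin(2 * np.pi * (0.5 + 0.05 * t) * t)
    robot.set_position_target(cmd)
    log.append((cmd, robot.get_position(), robot.get_velocity(),
                robot.get_torque()))
    time.sleep(0.002)                # log at ~500 Hz
np.save("actuator_data.npy", np.array(log))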

Training Actuator Network

import torch
import torch.nn as nn

class ActuatorNet(nn.Module):
    def __init__(self, history_length=50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(history_length * 3, 256),  # pos_err, vel, cmd histories
            nn.ELU(),
            nn.Linear(256, 128),
            nn.ELU(),
            nn.Linear(128, 1)  # predicted torque
        )

    def forward(self, pos_error_hist, vel_hist, cmd_hist):
        # Each history has shape [batch, history_length]
        x = torch.cat([pos_error_hist, vel_hist, cmd_hist], dim=-1)
        return self.net(x)
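Training is plain supervised regression against the measured torque. A minimal sketch, assuming the logged data has already been windowed into history tensors of shape [N, history_length] (with targets of shape [N, 1]):

import torch
import torch.nn as nn

# pos_err_hist, vel_hist, cmd_hist, true_torque: pre-windowed tensors
# built from the logged data above.
model = ActuatorNet(history_length=50)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):
    pred = model(pos_err_hist, vel_hist, cmd_hist)  # [N, 1]
    loss = loss_fn(pred, true_torque)               # regress measured torque
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()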

Integrating into Simulation

Replace the default motor model with the actuator network:

# In the simulation loop -- the actuator net sits between policy and physics.
# pos_error_history, vel_history, action_history are rolling buffers that
# must be updated with the newest measurements every step.
for step in range(num_steps):
    action = policy(obs)                    # RL policy output
    # Instead of: sim.set_torque(action)
    predicted_torque = actuator_net(        # pass through actuator network
        pos_error_history, vel_history, action_history
    )
    sim.set_torque(predicted_torque)        # apply predicted torque
    sim.step()

Results

ANYmal with actuator network:

  • ~500K timesteps/second simulation speed (fast enough for RL)
  • Significantly reduced sim-to-real gap: Policy transfers successfully without heavy domain randomization
  • Actuator network runs at ~50K Hz inference -- lighter than physics simulation

Limitations

  • Requires torque sensing on robot (not all robots have it)
  • Actuator network is specific to each robot -- must re-collect data and retrain for new robots
  • Doesn't capture temperature-dependent changes (unless data collected at multiple temperatures)

Solution 2: Domain Randomization for Locomotion

Domain randomization (DR) is a "brute force" approach -- instead of modeling the real world accurately, randomize simulation parameters so the policy is robust to all variations.

Based on papers from ETH Zurich (ANYmal), CMU (Unitree), Berkeley (Cassie):

| Parameter | Range | Notes |
| --- | --- | --- |
| Body mass | +/- 15-25% | Simulates payload variation |
| CoM position | +/- 3-5 cm | Manufacturing tolerance |
| Friction coefficient | 0.3 - 2.0 | Diverse floor types |
| Restitution | 0.0 - 0.5 | Bounce on contact |
| Motor strength | +/- 10-20% | Motor variation |
| Joint damping | +/- 30% | Mechanical wear |
| Action delay | 0 - 20 ms | Communication latency |
| PD gains | +/- 20% | Controller variation |
| Terrain height noise | +/- 2-5 cm | Uneven ground |
| Push force | 0 - 80 N | External perturbation |
| Gravity direction | +/- 3 degrees | IMU calibration error |
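These parameters are typically resampled per environment at every episode reset. A minimal sketch using ranges from the table (the env attributes are placeholders for your simulator's API):

import numpy as np

def randomize_physics(env):
    """Resample physics parameters at episode reset (placeholder env API)."""
    env.body_mass *= np.random.uniform(0.80, 1.20)        # +/- 20% mass
    env.com_offset = np.random.uniform(-0.04, 0.04, 3)    # CoM shift [m]
    env.friction = np.random.uniform(0.3, 2.0)            # floor types
    env.restitution = np.random.uniform(0.0, 0.5)
    env.motor_strength = np.random.uniform(0.85, 1.15)    # +/- 15%
    env.action_delay = np.random.uniform(0.0, 0.02)       # up to 20 ms
    env.kp *= np.random.uniform(0.80, 1.20)               # PD gain variation
    env.kd *= np.random.uniform(0.80, 1.20)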

Adaptive Domain Randomization

Instead of fixed ranges, adaptive DR adjusts ranges based on performance:

# Automatic Domain Randomization (ADR) -- simplified sketch
for epoch in range(num_epochs):
    success_rate = evaluate(policy, dr_ranges)
    if success_rate > 0.8:
        dr_ranges *= 1.1   # expand ranges (make training harder)
    elif success_rate < 0.5:
        dr_ranges *= 0.9   # shrink ranges (make training easier)
OpenAI used ADR for Rubik's Cube manipulation. A similar approach applies to locomotion but requires careful tuning -- overly aggressive expansion can create impossible scenarios.

Solution 3: Terrain Curriculum

Specific to locomotion, terrain curriculum is a form of domain randomization for ground geometry.

Progressive Difficulty

Phase 1 (0-500 epochs):     Flat ground only
Phase 2 (500-1000 epochs):  + Rough terrain (2cm noise)
Phase 3 (1000-1500 epochs): + Slopes (5-15 degrees)
Phase 4 (1500-2000 epochs): + Stairs (10-20cm steps)
Phase 5 (2000+ epochs):     + Mixed (all terrains random)
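A minimal scheduler implementing this progression might look like the following sketch (epoch thresholds match the phases above):

def active_terrains(epoch):
    """Map training epoch to the set of terrain types to sample from."""
    terrains = ["flat"]
    if epoch >= 500:
        terrains.append("rough")    # 2 cm height noise
    if epoch >= 1000:
        terrains.append("slope")    # 5-15 degree inclines
    if epoch >= 1500:
        terrains.append("stairs")   # 10-20 cm steps
    return terrains                 # from epoch 2000 on: all types mixed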

Important Terrain Types

  1. Height noise terrain: Random height perturbations, simulate uneven ground
  2. Discrete obstacles: Random boxes/barriers on flat ground
  3. Gaps: Missing ground sections (robot must step over)
  4. Slopes: Inclines and declines
  5. Stairs: Regular steps, ascending and descending
  6. Stepping stones: Discrete footholds with gaps between

Real-World Success Rates

Based on published results and reported numbers from research teams:

| Platform | Task | Sim Success | Real Success | Gap |
| --- | --- | --- | --- | --- |
| ANYmal-D | Rough-terrain walking | 98% | ~93% | 5% |
| Unitree A1 | Stair climbing | 95% | ~91% | 4% |
| Cassie | Flat walking | 99% | ~95% | 4% |
| Digit | Warehouse walking | 92% | ~84% | 8% |
| Unitree H1 | Flat walking | 97% | ~90% | 7% |
| ANYmal-C | Parkour (mixed) | 85% | ~75% | 10% |

Observations:

  • Quadruped > Bipedal: ANYmal/A1 consistently transfer better than Digit/H1
  • Flat > Complex terrain: Gap increases as terrain difficulty increases
  • Actuator quality matters: QDD (quasi-direct-drive) actuators (Berkeley Humanoid) show smaller gaps than geared motors

Step-by-Step: Train → Deploy Locomotion Policy

Step 1: Setup Simulation

# Clone Humanoid-Gym or legged_gym
git clone https://github.com/roboterax/humanoid-gym
cd humanoid-gym
pip install -e .

# Or for quadruped
git clone https://github.com/leggedrobotics/legged_gym

Configure robot URDF, reward weights, DR ranges.

Step 2: Train with Progressive Curriculum

# Config example
train_cfg = {
    "terrain": {
        "curriculum": True,
        "terrain_types": ["flat", "rough", "slope", "stairs"],
        "difficulty_scale": [0.0, 0.25, 0.5, 0.75, 1.0],
    },
    "domain_randomization": {
        "mass_range": [-0.15, 0.15],        # +/- 15%
        "friction_range": [0.4, 1.8],
        "motor_strength_range": [0.9, 1.1],
        "push_force_range": [0, 50],          # Newtons
        "action_delay": [0, 0.02],            # seconds
    },
    "reward": {
        "forward_vel": 1.0,
        "upright": 0.5,
        "energy": -0.001,
        "action_rate": -0.01,
        "feet_air_time": 0.5,                # Encourage lifting feet
    }
}

Training takes 2000-5000 epochs, roughly 2-8 hours on an RTX 4090 with 4096 parallel environments.

Step 3: Evaluate in Simulation

Before deployment, evaluate thoroughly in simulation:

# Test scenarios
test_scenarios = [
    "flat_ground_forward_1.0ms",
    "flat_ground_turning_0.5rads",
    "rough_terrain_forward_0.5ms",
    "slope_15deg_ascending",
    "stairs_15cm_ascending",
    "push_recovery_30N_lateral",
    "push_recovery_50N_lateral",
]

for scenario in test_scenarios:
    success_rate = evaluate(policy, scenario, n_trials=100)
    print(f"{scenario}: {success_rate:.1%}")
    assert success_rate > 0.85, f"FAIL: {scenario}"

Red flags in sim evaluation:

  • Success rate < 90% on flat ground → policy hasn't converged
  • Robot falls when pushed 30N → balance reward too low
  • Shuffling gait → need periodic gait reward
  • High energy consumption → energy penalty too low

Step 4: Sim-to-Sim Validation

Transfer the policy from Isaac Gym to MuJoCo (or vice versa):

# Load policy trained in Isaac Gym
policy = load_policy("checkpoints/best_policy.pt")

# Test in MuJoCo
mujoco_env = create_mujoco_env(robot_urdf)
for episode in range(50):
    obs = mujoco_env.reset()
    for step in range(1000):
        action = policy(obs)
        obs, reward, done, info = mujoco_env.step(action)
        if done:
            break
    print(f"Episode {episode}: survived {step} steps")

If the policy works well in both simulators, it's likely to transfer well to the real robot.

[Figure: Sim-to-real deployment process for robot locomotion]

Step 5: Real-World Deployment

Deployment checklist:

  • Safety harness for robot (especially for bipedal)
  • Flat ground first, complex terrain after
  • Start with low velocity commands (0.3 m/s)
  • Monitor motor temperatures
  • Record data for debugging

# Deploy script (simplified)
import numpy as np
import robot_sdk

robot = robot_sdk.connect("192.168.1.100")
policy = load_policy("checkpoints/best_policy.pt")

# Safety limits
MAX_VEL = 0.5                         # m/s -- cap on the commanded velocity
MAX_TORQUE = 0.8 * robot.max_torque   # cap actions at 80% of rated torque

while True:
    obs = robot.get_observation()
    action = policy(obs)
    action = np.clip(action, -MAX_TORQUE, MAX_TORQUE)

    robot.set_action(action)
    robot.step()

    # Safety check
    if robot.is_fallen():
        robot.stop()
        break

Step 6: Iterate

After initial deployment, observe failure modes and return to sim:

  1. Robot slips on floor → increase friction randomization range
  2. Motor overheats → add torque penalty, reduce max velocity
  3. Robot unstable on real terrain → add terrain noise in sim
  4. Latency causes instability → increase action delay randomization

Common Failure Modes and Fixes

Failure 1: Robot Falls Immediately When Set Down

Cause: Initial state differs between sim and real. Simulation usually starts from a perfect standing pose; the real robot starts with a slight imbalance.

Fix: Randomize initial state in sim (body orientation +/- 5 degrees, velocity +/- 0.1 m/s).

Failure 2: Jerky, Unsmooth Gait

Cause: The action delay in the real system is larger than in sim. The policy output changes quickly, but the robot responds slowly.

Fix: (a) Add action delay randomization (0-20 ms), (b) add an action rate penalty to the reward (see the sketch below), (c) reduce the control frequency.
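Fix (b) is a one-line reward term. A sketch, with the weight matching the reward config from Step 2:

import numpy as np

def action_rate_penalty(action, prev_action, weight=0.01):
    """Penalize fast changes between consecutive actions (smoother gait)."""
    return -weight * np.sum((action - prev_action) ** 2)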

Failure 3: Robot Slips When Turning

Cause: The friction coefficient in sim differs from reality; the policy learned to rely on higher friction.

Fix: Randomize the friction coefficient over a wider range (0.3-2.0). Add a lateral velocity penalty.

Failure 4: Motor Overheats After a Few Minutes

Cause: The policy uses too much torque, and sim doesn't model thermal limits.

Fix: (a) Increase energy penalty weight, (b) Add torque RMS constraint, (c) Reduce max action magnitude.

Failure 5: Policy Performs Differently Each Deployment

Cause: The policy is sensitive to initial conditions or sensor noise -- it isn't robust.

Fix: Increase domain randomization ranges, especially sensor noise and initial state randomization.

Important Papers on Sim-to-Real Locomotion

Essential papers to read:

  1. Learning Agile and Dynamic Motor Skills for Legged Robots (arXiv:1901.08652) -- Hwangbo et al., 2019. Actuator network for ANYmal, foundational paper.

  2. Learning Robust, Agile, Natural Legged Locomotion Skills in the Wild (arXiv:2304.10888) -- Adversarial training + teacher-student for natural locomotion on wild terrain.

  3. Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control (arXiv:2401.16889) -- Comprehensive bipedal RL framework on Cassie.

  4. Humanoid-Gym: RL for Humanoid Robot with Zero-Shot Sim2Real Transfer (arXiv:2404.05695) -- Open-source framework, verified zero-shot transfer.

  5. Berkeley Humanoid: A Research Platform for Learning-based Control (arXiv:2407.21781) -- Hardware co-design makes sim-to-real easier.

  6. Towards Bridging the Gap: Systematic Sim-to-Real Transfer for Diverse Legged Robots (arXiv:2509.06342) -- PACE: joint-space dynamics alignment, alternative to actuator networks.

Best Practices Summary

From experience of leading teams (ETH RSL, CMU, Berkeley, Unitree):

1. Start Simple, Add Complexity Gradually

  • Flat ground walking first
  • Add terrain curriculum after flat walking is stable
  • Add vision after proprioceptive policy is robust

2. Actuator Modeling > Domain Randomization

  • If you have torque sensing → train an actuator network (best option)
  • If not → aggressive motor strength randomization (+/- 20%)
  • QDD actuators reduce the need for both

3. Sim-to-Sim Before Sim-to-Real

  • Train Isaac Gym → test MuJoCo (or vice versa)
  • If transfer works between two sims → likely works on real robot

4. Safety First When Deploying

  • Harness for bipedal (always)
  • Start low velocity, increase gradually
  • Monitor motor temperature real-time
  • Accessible kill switch

5. Log Everything

  • Record all observations, actions, rewards in real deployment
  • Compare distributions with sim data → identify gaps (see the sketch below)
  • Use real data to refine sim parameters
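A minimal sketch of the distribution comparison, assuming observations were saved as [steps, obs_dim] NumPy arrays (the file paths are placeholders):

import numpy as np

sim_obs = np.load("logs/sim_observations.npy")    # [steps, obs_dim]
real_obs = np.load("logs/real_observations.npy")  # [steps, obs_dim]

mean_gap = np.abs(sim_obs.mean(axis=0) - real_obs.mean(axis=0))
std_ratio = real_obs.std(axis=0) / (sim_obs.std(axis=0) + 1e-8)

# Print the 10 observation dimensions with the largest mean mismatch
for i in np.argsort(mean_gap)[::-1][:10]:
    print(f"obs[{i:3d}]  mean gap {mean_gap[i]:.3f}  std ratio {std_ratio[i]:.2f}")

Dimensions with a large mean gap or a std ratio far from 1.0 point at the sim parameters most worth refining.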

Conclusion

Sim-to-real for locomotion combines multiple techniques: actuator networks for accurate motor modeling, domain randomization for robustness, terrain curriculum for progressive difficulty, and careful engineering for deployment. There's no single silver bullet -- each robot and task needs a different combination.

Key takeaway: Hardware co-design matters. Berkeley Humanoid with QDD actuators needs less sim-to-real effort than robots with complex geared transmissions. If you're designing a new robot, choose actuators that are easy to simulate.

Nguyễn Anh Tuấn

Robotics & AI Engineer. Building VnRobo — sharing knowledge about robot learning, VLA models, and automation.

12/4/202612 phút đọc