# Sim-to-Real Transfer — Why Train in Simulation?
Sim-to-real transfer is the technique of training an AI model in simulation and deploying it directly on a real robot. It is one of the most active topics in robotics research today, for one simple reason: collecting data on real robots is extremely slow and expensive.

In simulation, you can run 10,000 robots in parallel, each attempting 1,000 episodes per hour. On a real robot? One robot, roughly 10 episodes per hour, and every failure risks breaking hardware. Simulation is orders of magnitude faster and far safer.

But there is one big problem: the *reality gap* — a model trained in simulation often fails in the real world.
## The Reality Gap — The Core Problem

The reality gap arises from differences between simulation and the real world:
### Visual Gap
| Simulation | Reality |
|---|---|
| Perfect lighting | Shadows, reflections, glare |
| Clean textures | Scratches, dust, wear |
| Exact object models | Shape variations, deformation |
| No motion blur | Camera blur, rolling shutter |
| Perfect camera model | Lens distortion, noise |
### Dynamics Gap
| Simulation | Reality |
|---|---|
| Rigid body physics | Soft contacts, deformation |
| Perfect friction model | Variable friction, surface conditions |
| No actuator delay | Motor latency, communication delay |
| Exact mass/inertia | Manufacturing tolerances |
| Deterministic | Stochastic, non-repeatable |
A policy that runs perfectly in simulation can fail completely on a real robot just because the friction coefficient is 10% off or the camera exposure differs.
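To get a feel for this sensitivity, consider a back-of-the-envelope sliding-block model (a toy illustration, not part of any simulator): a block released at velocity v slides d = v²/(2μg) before kinetic friction stops it, so a 10% friction error shifts the stopping point by roughly 9%:

```python
def sliding_distance(v: float, mu: float, g: float = 9.81) -> float:
    """Distance a block slides before kinetic friction stops it: v^2 / (2*mu*g)."""
    return v ** 2 / (2 * mu * g)

d_sim = sliding_distance(v=1.0, mu=0.50)   # friction assumed in simulation
d_real = sliding_distance(v=1.0, mu=0.55)  # real friction, 10% higher

print(f"sim: {d_sim:.3f} m, real: {d_real:.3f} m")
print(f"relative error: {abs(d_real - d_sim) / d_sim:.1%}")
```

For a pushing or placing task, a ~9% error in where the object comes to rest is easily the difference between success and failure.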
## Technique 1: Domain Randomization
Domain randomization (DR) is the first and still most popular technique for bridging the reality gap. The idea: if the model has seen enough variation in simulation, reality is just one more variation.
### Visual Domain Randomization
Randomize everything related to rendering:
"""
Visual Domain Randomization for robot manipulation
Using Isaac Lab / Isaac Sim
"""
import omni.isaac.lab as lab
from omni.isaac.lab.envs import ManagerBasedRLEnv
import numpy as np
class VisualRandomizationConfig:
"""Config for visual domain randomization."""
def __init__(self):
# Lighting randomization
self.light_intensity_range = (300, 3000) # lux
self.light_color_range = (0.8, 1.2) # RGB multiplier
self.num_random_lights = (1, 5) # Number of random lights
self.light_position_range = (-2.0, 2.0) # meters
# Camera randomization
self.camera_fov_range = (55, 75) # degrees
self.camera_position_noise = 0.02 # meters
self.camera_rotation_noise = 3.0 # degrees
# Texture randomization
self.table_color_range = (0.2, 0.9) # RGB
self.object_color_range = (0.1, 1.0)
self.randomize_background = True
# Distractor objects
self.num_distractors = (0, 5)
self.distractor_scale_range = (0.5, 2.0)
def apply_visual_randomization(env, config: VisualRandomizationConfig):
"""Apply visual randomization on each episode reset."""
# Random lighting
num_lights = np.random.randint(*config.num_random_lights)
for i in range(num_lights):
intensity = np.random.uniform(*config.light_intensity_range)
color = np.random.uniform(*config.light_color_range, size=3)
position = np.random.uniform(
config.light_position_range,
config.light_position_range,
size=3,
)
env.scene.add_light(
position=position,
intensity=intensity,
color=color,
)
# Random camera pose (small perturbation)
cam_pos_noise = np.random.normal(0, config.camera_position_noise, size=3)
cam_rot_noise = np.random.normal(0, config.camera_rotation_noise, size=3)
env.scene.camera.set_pose(
position=env.scene.camera.default_position + cam_pos_noise,
rotation=env.scene.camera.default_rotation + cam_rot_noise,
)
# Random table texture
table_color = np.random.uniform(*config.table_color_range, size=3)
env.scene.table.set_color(table_color)
return env
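Scene randomization can be complemented by corrupting the rendered observation itself, to mimic sensor noise and exposure drift. A minimal NumPy sketch — the function and its parameters are illustrative, not an Isaac Lab API:

```python
import numpy as np

def corrupt_camera_image(
    image: np.ndarray,
    rng: np.random.Generator,
    noise_std: float = 0.02,
    brightness_range: tuple = (0.8, 1.2),
) -> np.ndarray:
    """Apply camera-style corruptions to a float image with values in [0, 1]."""
    out = image * rng.uniform(*brightness_range)         # exposure / gain drift
    out = out + rng.normal(0.0, noise_std, image.shape)  # per-pixel sensor noise
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
frame = rng.uniform(0.0, 1.0, size=(64, 64, 3))  # stand-in for a rendered frame
noisy = corrupt_camera_image(frame, rng)
```

Applying this on every frame during training forces the vision encoder to ignore exactly the artifacts listed in the visual-gap table above.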
### Dynamics Domain Randomization
Randomize physical parameters — for control tasks this matters even more than visual randomization:
```python
import numpy as np


class DynamicsRandomizationConfig:
    """Config for dynamics domain randomization."""

    def __init__(self):
        # Object properties
        self.mass_scale_range = (0.5, 2.0)           # multiplier on nominal mass
        self.friction_range = (0.3, 1.5)             # coefficient of friction
        self.restitution_range = (0.0, 0.5)          # bounciness

        # Robot properties
        self.joint_damping_scale = (0.8, 1.2)
        self.joint_friction_scale = (0.5, 2.0)
        self.actuator_strength_scale = (0.85, 1.15)  # ±15%

        # Control properties
        self.action_delay_range = (0, 3)             # frames of delay
        self.action_noise_std = 0.02                 # Gaussian noise on actions
        self.observation_noise_std = 0.01            # noise on sensor readings

        # Gravity
        self.gravity_noise = 0.05                    # m/s^2


def apply_dynamics_randomization(env, config: DynamicsRandomizationConfig):
    """Apply dynamics randomization on each episode reset."""
    # Randomize object mass
    mass_scale = np.random.uniform(*config.mass_scale_range)
    env.scene.target_object.set_mass(
        env.scene.target_object.default_mass * mass_scale
    )

    # Randomize friction (table gets a correlated but distinct value)
    friction = np.random.uniform(*config.friction_range)
    env.scene.target_object.set_friction(friction)
    env.scene.table.set_friction(friction * np.random.uniform(0.8, 1.2))

    # Randomize actuator strength
    for joint in env.robot.joints:
        strength_scale = np.random.uniform(*config.actuator_strength_scale)
        joint.set_max_force(joint.default_max_force * strength_scale)

    # Random action delay
    env.action_delay = np.random.randint(*config.action_delay_range)
    return env
```
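The `action_delay` set above still has to be honored somewhere in the control loop. One simple way is a FIFO buffer that returns the action sent N steps ago, sketched here (a hypothetical helper, not part of any framework):

```python
from collections import deque

import numpy as np

class DelayedActionBuffer:
    """Hold actions in a FIFO so the env executes the action sent N steps ago."""

    def __init__(self, delay_steps: int, action_dim: int):
        # Pre-fill with zeros so the first `delay_steps` executed actions are no-ops.
        self.buffer = deque(
            [np.zeros(action_dim) for _ in range(delay_steps)],
            maxlen=delay_steps + 1,
        )

    def step(self, action: np.ndarray) -> np.ndarray:
        """Push the newest action; pop the one whose turn it is to execute."""
        self.buffer.append(action)
        return self.buffer.popleft()

buf = DelayedActionBuffer(delay_steps=2, action_dim=3)
executed = buf.step(np.ones(3))  # returns zeros: the command is delayed 2 steps
```

Training against this buffer (with the delay resampled per episode) makes the policy tolerant of the motor and communication latency listed in the dynamics-gap table.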
### Automatic Domain Randomization (ADR)
OpenAI pushed DR further with ADR — the randomization range expands automatically whenever the policy masters the current range:

```
initially:        friction ∈ [0.9, 1.1]   (close to realistic)
policy masters it → expand: friction ∈ [0.7, 1.3]
policy masters it → expand: friction ∈ [0.3, 1.5]
...
finally:          friction ∈ [0.1, 2.0]   (extremely diverse)
```
The result: the policy becomes so robust that reality is just "easy mode" compared to training.
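A minimal version of the expansion logic might look like this — note that OpenAI's full ADR additionally samples at the range boundaries and tracks per-boundary performance buffers, which this sketch omits:

```python
import numpy as np

class ADRParameter:
    """One randomized parameter whose range widens as the policy improves."""

    def __init__(self, nominal: float, step: float,
                 lower_bound: float, upper_bound: float):
        self.low = self.high = nominal  # start with no randomization
        self.step = step                # how much to widen per expansion
        self.lower_bound = lower_bound  # hard limits on the range
        self.upper_bound = upper_bound

    def sample(self, rng: np.random.Generator) -> float:
        return rng.uniform(self.low, self.high)

    def update(self, success_rate: float, threshold: float = 0.9) -> None:
        """Widen the range once the policy masters the current one."""
        if success_rate >= threshold:
            self.low = max(self.lower_bound, self.low - self.step)
            self.high = min(self.upper_bound, self.high + self.step)

friction = ADRParameter(nominal=1.0, step=0.2, lower_bound=0.1, upper_bound=2.0)
friction.update(success_rate=0.95)  # range widens to roughly [0.8, 1.2]
friction.update(success_rate=0.95)  # range widens to roughly [0.6, 1.4]
```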
## Technique 2: System Identification (SysID)
Instead of randomizing everything, SysID precisely measures the physical parameters of the real robot and then calibrates the simulation to match:
"""
System Identification workflow
Measure real robot → calibrate simulation
"""
import numpy as np
from scipy.optimize import minimize
def measure_joint_dynamics(robot, joint_id: int, num_trials: int = 20):
"""
Measure response of 1 joint on real robot.
Send step command, record trajectory.
"""
trajectories = []
for _ in range(num_trials):
# Send step command
robot.set_joint_position(joint_id, target=1.0)
# Record response
traj = []
for t in range(100): # 100 timesteps
pos = robot.get_joint_position(joint_id)
vel = robot.get_joint_velocity(joint_id)
torque = robot.get_joint_torque(joint_id)
traj.append([pos, vel, torque])
trajectories.append(np.array(traj))
return np.array(trajectories) # shape: (trials, timesteps, 3)
def optimize_sim_parameters(real_trajectories, sim_env):
"""
Optimize simulation parameters to match real robot.
"""
def cost_function(params):
damping, friction, stiffness = params
# Set simulation parameters
sim_env.set_joint_damping(damping)
sim_env.set_joint_friction(friction)
sim_env.set_joint_stiffness(stiffness)
# Run same trajectory in sim
sim_traj = sim_env.step_response(target=1.0, timesteps=100)
# MSE between sim and real
error = np.mean((real_trajectories.mean(axis=0) - sim_traj) ** 2)
return error
# Optimize
result = minimize(
cost_function,
x0=[1.0, 0.1, 100.0], # Initial guess
method="Nelder-Mead",
options={"maxiter": 1000},
)
print(f"Optimized params: damping={result.x[0]:.3f}, "
f"friction={result.x[1]:.3f}, stiffness={result.x[2]:.1f}")
return result.x
### SysID vs Domain Randomization
| | Domain Randomization | System Identification |
|---|---|---|
| Approach | Train on many variations | Match sim exactly to real |
| Effort | Low (just set ranges) | High (must measure the robot) |
| Robustness | High (diverse training) | Lower (overfits to one config) |
| Best for | Vision + diverse environments | Precise control + known robot |
| Combine? | Yes — DR around the SysID values | Yes — SysID provides the center |
Best practice: use SysID to find nominal parameters, then apply DR around those values. Example: SysID measures friction = 0.8, so set the DR range to [0.5, 1.1] instead of [0.1, 2.0].
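That combination is easy to encode as a small helper (a hypothetical utility, assuming an absolute spread around the measured value):

```python
def dr_range_around_nominal(nominal: float, spread: float = 0.3,
                            hard_min: float = 0.0) -> tuple:
    """Build a DR range of nominal ± spread, clipped at a physical minimum."""
    return (max(hard_min, nominal - spread), nominal + spread)

# SysID measured friction = 0.8 → randomize in a tight band around it
friction_range = dr_range_around_nominal(0.8, spread=0.3)  # ≈ (0.5, 1.1)
```

The `hard_min` clip keeps the range physically valid (e.g. friction cannot go negative) when the measured nominal is small.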
## Technique 3: Curriculum Learning
Instead of throwing the policy into a fully randomized environment from the start, gradually increase the difficulty:
"""
Curriculum Learning for sim-to-real transfer
Gradually increase domain randomization range with training progress
"""
class CurriculumScheduler:
def __init__(self, total_steps: int = 1_000_000):
self.total_steps = total_steps
def get_randomization_scale(self, current_step: int) -> float:
"""
Return scale factor 0→1 for randomization ranges.
0 = no randomization (easy), 1 = full randomization (hard)
"""
progress = current_step / self.total_steps
# Smooth ramp-up
return min(1.0, progress * 2) # Full randomization at 50% training
def get_task_difficulty(self, current_step: int) -> dict:
scale = self.get_randomization_scale(current_step)
return {
# Visual randomization
"light_intensity_range": (
1000 - 700 * scale,
1000 + 2000 * scale,
),
"camera_noise": 0.02 * scale,
"num_distractors": int(5 * scale),
# Dynamics randomization
"friction_range": (
0.8 - 0.5 * scale,
0.8 + 0.7 * scale,
),
"mass_scale_range": (
1.0 - 0.5 * scale,
1.0 + 1.0 * scale,
),
"action_delay": int(3 * scale),
"action_noise": 0.02 * scale,
# Task difficulty
"object_position_range": 0.1 + 0.3 * scale, # meters
"goal_tolerance": 0.05 - 0.03 * scale, # tighter goal
}
# Use in training loop
scheduler = CurriculumScheduler(total_steps=2_000_000)
for step in range(2_000_000):
difficulty = scheduler.get_task_difficulty(step)
env.set_randomization(difficulty)
obs = env.reset()
action = policy(obs)
next_obs, reward, done, info = env.step(action)
# ... update policy
## Simulation Tools Comparison
| | Isaac Lab | MuJoCo | Gazebo |
|---|---|---|---|
| Engine | PhysX 5 (GPU) | MuJoCo (CPU/GPU) | ODE/Bullet/DART |
| GPU parallel | 10,000+ envs | 1,000+ (MJX) | No |
| Rendering | RTX ray tracing | Basic OpenGL | OGRE/Ignition |
| ROS integration | Isaac ROS | ros2_mujoco | Native |
| Domain rand. | Built-in, extensive | Manual | Plugin-based |
| License | Free (NVIDIA) | Apache 2.0 | Apache 2.0 |
| Best for | Large-scale RL, visual sim-to-real | Contact-rich manipulation | ROS 2 integration, multi-robot |
| Learning curve | Steep (Omniverse) | Medium | Medium |
| Typical throughput | 100K+ steps/s | 50K+ steps/s | 1K steps/s |
Which tool to use?
- Isaac Lab: You have NVIDIA GPU, need massive parallelism (RL training), or visual sim-to-real
- MuJoCo: Contact-rich manipulation tasks, research benchmark, or need accurate contact physics
- Gazebo: Prototype quickly with ROS 2, multi-robot simulation, or don't need RL training
## End-to-End: Train a Manipulation Policy in Isaac Lab
Example: train a robot arm on pick-and-place in Isaac Lab, then transfer it to a real robot.
### Step 1: Set Up the Environment
"""
Isaac Lab environment for pick-and-place
GPU-accelerated, 4096 parallel envs
"""
import omni.isaac.lab as lab
from omni.isaac.lab.envs import ManagerBasedRLEnvCfg
from omni.isaac.lab.assets import ArticulationCfg, RigidObjectCfg
from omni.isaac.lab.managers import (
ObservationGroupCfg,
RewardTermCfg,
EventTermCfg,
)
class PickPlaceEnvCfg(ManagerBasedRLEnvCfg):
"""Config for pick-and-place environment."""
# Scene
num_envs = 4096
env_spacing = 2.0
# Robot: Franka Panda
robot = ArticulationCfg(
prim_path="/World/envs/env_.*/Robot",
spawn=lab.sim.schemas.FrankaCfg(),
actuators={
"panda_shoulder": lab.actuators.ImplicitActuatorCfg(
joint_names_expr=["panda_joint[1-4]"],
stiffness=80.0,
damping=4.0,
),
"panda_forearm": lab.actuators.ImplicitActuatorCfg(
joint_names_expr=["panda_joint[5-7]"],
stiffness=40.0,
damping=2.0,
),
},
)
# Target object
object = RigidObjectCfg(
prim_path="/World/envs/env_.*/Object",
spawn=lab.sim.schemas.CuboidCfg(
size=(0.04, 0.04, 0.04),
mass=0.1,
),
)
# Observations
observations = ObservationGroupCfg(
policy=ObservationGroupCfg(
joint_pos=True,
joint_vel=True,
object_pos=True,
object_rot=True,
goal_pos=True,
)
)
# Rewards
rewards = {
"reaching": RewardTermCfg(
func=reaching_reward,
weight=1.0,
),
"grasping": RewardTermCfg(
func=grasping_reward,
weight=5.0,
),
"placing": RewardTermCfg(
func=placing_reward,
weight=10.0,
),
}
# Domain Randomization events
events = {
"reset_object_position": EventTermCfg(
func=randomize_object_position,
mode="reset",
params={"range": 0.15},
),
"randomize_mass": EventTermCfg(
func=randomize_object_mass,
mode="reset",
params={"scale_range": (0.5, 2.0)},
),
"randomize_friction": EventTermCfg(
func=randomize_friction,
mode="reset",
params={"range": (0.4, 1.2)},
),
}
### Step 2: Train with PPO
```bash
# Train with 4096 parallel envs for ~50M timesteps
# (on an RTX 4090: roughly 2 hours)
python -m omni.isaac.lab_tasks.train \
    --task Isaac-Pick-Place-v0 \
    --num_envs 4096 \
    --max_iterations 5000 \
    --algo ppo \
    --headless

# Evaluate in sim
python -m omni.isaac.lab_tasks.play \
    --task Isaac-Pick-Place-v0 \
    --checkpoint logs/ppo/model_5000.pt
```
### Step 3: Transfer to the Real Robot
"""
Deploy trained policy to Franka Panda
Using Isaac ROS + Franka ROS 2 driver
"""
import rclpy
from rclpy.node import Node
import torch
import numpy as np
from sensor_msgs.msg import JointState
from geometry_msgs.msg import PoseStamped
class PolicyDeployNode(Node):
def __init__(self):
super().__init__("policy_deploy")
# Load trained policy
self.policy = torch.jit.load("/models/pick_place_policy.pt")
self.policy.eval()
# Observation buffer
self.joint_pos = np.zeros(7)
self.joint_vel = np.zeros(7)
self.object_pos = np.zeros(3)
# Subscribers
self.create_subscription(
JointState, "/franka/joint_states", self.joint_callback, 10
)
self.create_subscription(
PoseStamped, "/object/pose", self.object_callback, 10
)
# Publisher
self.cmd_pub = self.create_publisher(
JointState, "/franka/joint_commands", 10
)
# Control loop at 20 Hz
self.create_timer(0.05, self.control_loop)
def joint_callback(self, msg):
self.joint_pos = np.array(msg.position[:7])
self.joint_vel = np.array(msg.velocity[:7])
def object_callback(self, msg):
self.object_pos = np.array([
msg.pose.position.x,
msg.pose.position.y,
msg.pose.position.z,
])
def control_loop(self):
# Build observation (same format as simulation)
obs = np.concatenate([
self.joint_pos, # 7
self.joint_vel, # 7
self.object_pos, # 3
self.goal_pos, # 3 (set beforehand)
])
# Inference
with torch.no_grad():
obs_tensor = torch.FloatTensor(obs).unsqueeze(0)
action = self.policy(obs_tensor).squeeze().numpy()
# Safety clamp
action = np.clip(action, -0.05, 0.05) # Max delta per step
# Publish command
cmd = JointState()
cmd.position = (self.joint_pos + action[:7]).tolist()
self.cmd_pub.publish(cmd)
## Success Stories
### OpenAI Rubik's Cube (2019)
Solving Rubik's Cube with a Robot Hand — trained entirely in simulation with Automatic Domain Randomization and deployed zero-shot to a real Shadow Dexterous Hand. It remains one of the most impressive sim-to-real demonstrations of dexterous manipulation.
Key insights:
- ADR randomizes hundreds of parameters simultaneously
- The policy develops emergent meta-learning — it adapts online to new dynamics
- Training required an enormous amount of compute on a large cluster, but the results were unprecedented
### Agility Robotics Digit Locomotion
Real-World Humanoid Locomotion with Reinforcement Learning — a causal transformer policy trained in simulation and deployed zero-shot to a real Digit humanoid. The robot walks on terrains never seen in training.
Key insights:
- Uses a history of proprioceptive observations instead of a single frame
- Teacher-student distillation: the teacher gets privileged information, the student uses only onboard sensors
- Massive domain randomization over terrain, friction, and mass distribution
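The teacher-student idea can be illustrated with a deliberately tiny linear stand-in (real systems use neural networks and DAgger-style training — every name and number here is illustrative): the teacher acts on privileged state, including friction and payload mass, and the student is regressed onto the teacher's actions from proprioception alone:

```python
import numpy as np

rng = np.random.default_rng(0)

# Privileged state: [joint_pos, joint_vel, friction, payload_mass].
# The student only observes the proprioceptive part (first two dims).
states = rng.normal(size=(1000, 4))
W_teacher = np.array([[1.0], [0.5], [0.8], [-0.3]])  # teacher uses everything
teacher_actions = states @ W_teacher

proprio = states[:, :2]  # student observation: no friction / payload info

# Distillation: least squares here stands in for supervised training
# of the student network on teacher action labels.
W_student, *_ = np.linalg.lstsq(proprio, teacher_actions, rcond=None)
student_actions = proprio @ W_student

imitation_error = np.mean((student_actions - teacher_actions) ** 2)
```

The residual `imitation_error` comes from the privileged dimensions the student cannot see — which is exactly why real systems feed the student an observation *history*, letting it infer those hidden dynamics over time.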
### Boston Dynamics Spot + Isaac Lab
An NVIDIA showcase trained Spot quadruped locomotion in Isaac Lab with thousands of parallel environments and deployed it zero-shot to the real robot.
## Best Practices Summary
1. Start with SysID — measure the friction, mass, and joint dynamics of the real robot, and set the simulation's nominal values to match.
2. Apply DR around the nominal — randomize roughly ±30% around the SysID values; don't randomize blindly.
3. Use curriculum learning — start with little randomization and increase it gradually, so the policy learns the task first and robustness second.
4. Observation design matters more than reward — prefer proprioceptive signals (joints, IMU) over vision alone; proprioception transfers better.
5. Action space: use delta position control instead of torque control — position control absorbs more of the dynamics mismatch.
6. Low-pass filter actions — smooth out jerky policy outputs; real robots can't handle high-frequency commands.
7. Safety wrapper — always enforce velocity/torque limits, workspace boundaries, and an emergency stop outside the policy.
8. Evaluate in sim first — if the policy fails at 50% of the randomization range, it will fail on the real robot. Target: >90% success rate over the full DR range.
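The action-filtering and safety-clamp advice above is cheap to implement. A sketch combining an exponential moving average with a hard clamp (a hypothetical helper; the parameter values are illustrative):

```python
import numpy as np

class SafeActionFilter:
    """Low-pass filter policy actions, then clamp to hard safety limits."""

    def __init__(self, action_dim: int, alpha: float = 0.3,
                 max_delta: float = 0.05):
        self.alpha = alpha          # 0 = frozen, 1 = no filtering
        self.max_delta = max_delta  # hard per-step limit (e.g. radians)
        self.smoothed = np.zeros(action_dim)

    def __call__(self, raw_action: np.ndarray) -> np.ndarray:
        # Exponential moving average removes high-frequency jitter
        self.smoothed = (
            self.alpha * raw_action + (1.0 - self.alpha) * self.smoothed
        )
        # Clamp AFTER filtering so the hard limit always holds
        return np.clip(self.smoothed, -self.max_delta, self.max_delta)

filt = SafeActionFilter(action_dim=7)
safe = filt(np.full(7, 0.2))  # a jerky 0.2 command is clamped to 0.05
```

Note that the clamp sits outside the filter state, so even an adversarial policy output can never push the commanded delta past `max_delta`.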
## Conclusion
Sim-to-real transfer is one of the most important enablers for robot learning today. Combine domain randomization, system identification, and curriculum learning for the best results. Tools like Isaac Lab and MuJoCo are mature enough for production use.
Recommended pipeline:

```
SysID (measure the real robot)
  → set up sim (Isaac Lab / MuJoCo)
  → train with DR + curriculum (4096 parallel envs)
  → evaluate in sim (>90% success at full DR)
  → deploy zero-shot to the real robot
  → fine-tune with real data if needed
```
## Related Articles
- Foundation Models for Robots: RT-2, Octo, OpenVLA in Practice — Combine sim-to-real with foundation models
- Edge AI with NVIDIA Jetson — Deploy model to edge device after sim-to-real
- SLAM and Navigation for Autonomous Robots — Navigation stack often uses sim-to-real for local planner