
Sim-to-Real Transfer: Train in Simulation, Run in Reality

Techniques for transferring models from simulation to real robots — domain randomization, system identification, and best practices.

Nguyen Anh Tuan · April 1, 2026 · 12 min read

Sim-to-Real Transfer — Why Train in Simulation?

Sim-to-real transfer is the technique of training an AI model in simulation and deploying it directly to a real robot. This is one of the hottest topics in robotics research today, for one simple reason: collecting data on real robots is extremely expensive and slow.

In simulation, you can run 10,000 robots in parallel, each trying 1,000 episodes/hour. On a real robot? 1 robot, 10 episodes/hour, and each failure risks breaking hardware. Simulation is 1 million times faster and infinitely safer.

But there's one big problem: the reality gap — a model trained in simulation often fails in the real world.

Robot simulation environment for training AI perception

The Reality Gap — The Core Problem

The reality gap comes from differences between simulation and the real world:

Visual Gap

Simulation             | Reality
Perfect lighting       | Shadows, reflections, glare
Clean textures         | Scratches, dust, wear
Exact object models    | Shape variations, deformation
No motion blur         | Camera blur, rolling shutter
Perfect camera model   | Lens distortion, noise

Dynamics Gap

Simulation             | Reality
Rigid body physics     | Soft contacts, deformation
Perfect friction model | Variable friction, surface conditions
No actuator delay      | Motor latency, communication delay
Exact mass/inertia     | Manufacturing tolerances
Deterministic          | Stochastic, non-repeatable

A policy that runs perfectly in simulation can fail completely on a real robot just because the friction coefficient is 10% off or the camera exposure differs.

Technique 1: Domain Randomization

Domain randomization (DR) is the first and most popular technique for bridging the reality gap. The idea: if the model has seen enough variations in simulation, reality is just another variation.

Paper: Tobin et al., 2017 — Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World

Visual Domain Randomization

Randomize everything related to rendering:

"""
Visual Domain Randomization for robot manipulation
Using Isaac Lab / Isaac Sim
"""
import numpy as np


class VisualRandomizationConfig:
    """Config for visual domain randomization."""

    def __init__(self):
        # Lighting randomization
        self.light_intensity_range = (300, 3000)      # lux
        self.light_color_range = (0.8, 1.2)            # RGB multiplier
        self.num_random_lights = (1, 5)                 # Number of random lights
        self.light_position_range = (-2.0, 2.0)        # meters

        # Camera randomization
        self.camera_fov_range = (55, 75)                # degrees
        self.camera_position_noise = 0.02               # meters
        self.camera_rotation_noise = 3.0                # degrees

        # Texture randomization
        self.table_color_range = (0.2, 0.9)             # RGB
        self.object_color_range = (0.1, 1.0)
        self.randomize_background = True

        # Distractor objects
        self.num_distractors = (0, 5)
        self.distractor_scale_range = (0.5, 2.0)


def apply_visual_randomization(env, config: VisualRandomizationConfig):
    """Apply visual randomization on each episode reset."""

    # Random lighting
    num_lights = np.random.randint(
        config.num_random_lights[0], config.num_random_lights[1] + 1  # inclusive upper bound
    )
    for _ in range(num_lights):
        intensity = np.random.uniform(*config.light_intensity_range)
        color = np.random.uniform(*config.light_color_range, size=3)
        position = np.random.uniform(
            config.light_position_range[0],
            config.light_position_range[1],
            size=3,
        )
        env.scene.add_light(
            position=position,
            intensity=intensity,
            color=color,
        )

    # Random camera pose (small perturbation)
    cam_pos_noise = np.random.normal(0, config.camera_position_noise, size=3)
    cam_rot_noise = np.random.normal(0, config.camera_rotation_noise, size=3)
    env.scene.camera.set_pose(
        position=env.scene.camera.default_position + cam_pos_noise,
        rotation=env.scene.camera.default_rotation + cam_rot_noise,
    )

    # Random table texture
    table_color = np.random.uniform(*config.table_color_range, size=3)
    env.scene.table.set_color(table_color)

    return env

Dynamics Domain Randomization

Randomize physical parameters — for control tasks, this matters more than visual randomization:

class DynamicsRandomizationConfig:
    """Config for dynamics domain randomization."""

    def __init__(self):
        # Object properties
        self.mass_scale_range = (0.5, 2.0)          # Multiplier on nominal mass
        self.friction_range = (0.3, 1.5)             # Coefficient of friction
        self.restitution_range = (0.0, 0.5)          # Bounciness

        # Robot properties
        self.joint_damping_scale = (0.8, 1.2)
        self.joint_friction_scale = (0.5, 2.0)
        self.actuator_strength_scale = (0.85, 1.15)  # ±15%

        # Control properties
        self.action_delay_range = (0, 3)              # Frames of delay
        self.action_noise_std = 0.02                  # Gaussian noise on actions
        self.observation_noise_std = 0.01             # Noise on sensor readings

        # Gravity
        self.gravity_noise = 0.05                     # m/s^2


def apply_dynamics_randomization(env, config: DynamicsRandomizationConfig):
    """Apply dynamics randomization on each episode."""

    # Randomize object mass
    mass_scale = np.random.uniform(*config.mass_scale_range)
    env.scene.target_object.set_mass(
        env.scene.target_object.default_mass * mass_scale
    )

    # Randomize friction
    friction = np.random.uniform(*config.friction_range)
    env.scene.target_object.set_friction(friction)
    env.scene.table.set_friction(friction * np.random.uniform(0.8, 1.2))

    # Randomize actuator strength
    for joint in env.robot.joints:
        strength_scale = np.random.uniform(*config.actuator_strength_scale)
        joint.set_max_force(joint.default_max_force * strength_scale)

    # Random action delay
    env.action_delay = np.random.randint(
        config.action_delay_range[0], config.action_delay_range[1] + 1  # inclusive
    )

    return env
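
The config above also defines action and observation noise that apply_dynamics_randomization does not consume. These are typically injected per step in a thin wrapper around the environment; a minimal sketch, reusing DynamicsRandomizationConfig and the generic env interface assumed above:

class NoiseWrapper:
    """Inject Gaussian action/observation noise each step, per DynamicsRandomizationConfig."""

    def __init__(self, env, config: DynamicsRandomizationConfig):
        self.env = env
        self.config = config

    def step(self, action):
        # Corrupt the command the way a real actuator would
        noisy_action = action + np.random.normal(
            0.0, self.config.action_noise_std, size=np.shape(action)
        )
        obs, reward, done, info = self.env.step(noisy_action)
        # Corrupt sensor readings the way real encoders/IMUs would
        noisy_obs = obs + np.random.normal(
            0.0, self.config.observation_noise_std, size=np.shape(obs)
        )
        return noisy_obs, reward, done, info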

Automatic Domain Randomization (ADR)

OpenAI pushed DR further with ADR — automatically expanding the randomization range whenever the policy masters the current one:

Initially:  friction ∈ [0.9, 1.1]  (close to realistic)
Policy masters → expand: friction ∈ [0.7, 1.3]
Policy masters → expand: friction ∈ [0.3, 1.5]
...
Finally: friction ∈ [0.1, 2.0]  (extremely diverse)

Result: policy becomes so robust that reality is just "easy mode" compared to training.
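
A minimal sketch of the expand-on-mastery loop (the thresholds, step size, and bounds are illustrative, not OpenAI's exact values):

import numpy as np


class ADRParameter:
    """One randomized parameter whose range widens as the policy masters it."""

    def __init__(self, low, high, step=0.1, min_low=0.1, max_high=2.0):
        self.low, self.high = low, high
        self.step, self.min_low, self.max_high = step, min_low, max_high

    def sample(self) -> float:
        """Draw a value for the next episode."""
        return float(np.random.uniform(self.low, self.high))

    def update(self, success_rate: float, threshold: float = 0.9):
        """Expand the range when the policy masters the current one."""
        if success_rate >= threshold:
            self.low = max(self.min_low, self.low - self.step)
            self.high = min(self.max_high, self.high + self.step)


friction = ADRParameter(low=0.9, high=1.1)
# After each evaluation batch: friction.update(success_rate)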

Technique 2: System Identification (SysID)

Instead of randomizing everything, SysID precisely measures the physical parameters of the real robot, then calibrates the simulation to match:

"""
System Identification workflow
Measure real robot → calibrate simulation
"""
import time

import numpy as np
from scipy.optimize import minimize


def measure_joint_dynamics(robot, joint_id: int, num_trials: int = 20):
    """
    Measure the step response of one joint on the real robot:
    send a step command and record the trajectory.
    """
    trajectories = []
    for _ in range(num_trials):
        # Return the joint to home and let it settle before each trial
        robot.set_joint_position(joint_id, target=0.0)
        time.sleep(1.0)

        # Send step command
        robot.set_joint_position(joint_id, target=1.0)

        # Record response at a fixed sample rate
        traj = []
        for _ in range(100):  # 100 timesteps at 100 Hz
            pos = robot.get_joint_position(joint_id)
            vel = robot.get_joint_velocity(joint_id)
            torque = robot.get_joint_torque(joint_id)
            traj.append([pos, vel, torque])
            time.sleep(0.01)

        trajectories.append(np.array(traj))

    return np.array(trajectories)  # shape: (trials, timesteps, 3)


def optimize_sim_parameters(real_trajectories, sim_env):
    """
    Optimize simulation parameters to match real robot.
    """
    def cost_function(params):
        damping, friction, stiffness = params

        # Set simulation parameters
        sim_env.set_joint_damping(damping)
        sim_env.set_joint_friction(friction)
        sim_env.set_joint_stiffness(stiffness)

        # Run same trajectory in sim
        sim_traj = sim_env.step_response(target=1.0, timesteps=100)

        # MSE between sim and real
        error = np.mean((real_trajectories.mean(axis=0) - sim_traj) ** 2)
        return error

    # Optimize
    result = minimize(
        cost_function,
        x0=[1.0, 0.1, 100.0],  # Initial guess
        method="Nelder-Mead",
        options={"maxiter": 1000},
    )

    print(f"Optimized params: damping={result.x[0]:.3f}, "
          f"friction={result.x[1]:.3f}, stiffness={result.x[2]:.1f}")
    return result.x
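
A usage sketch, where robot and sim_env are the handles the functions above assume:

real_trajs = measure_joint_dynamics(robot, joint_id=3, num_trials=20)
damping, friction, stiffness = optimize_sim_parameters(real_trajs, sim_env)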

SysID vs Domain Randomization

                | Domain Randomization          | System Identification
Approach        | Train on many variations      | Match sim exactly to real
Effort          | Low (just set ranges)         | High (must measure the robot)
Robustness      | High (diverse training)       | Lower (overfits to one config)
Best for        | Vision + diverse environments | Precise control + known robot
Combine?        | Yes — use SysID for the center, DR around it

Best practice: Use SysID to find nominal parameters, then apply DR around those values. Example: SysID measures friction = 0.8, then DR range = [0.5, 1.1] instead of [0.1, 2.0].
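
A minimal sketch of this practice (the parameter names and the fractional spread are illustrative):

def dr_ranges_around_nominal(nominal: dict, spread: float = 0.3) -> dict:
    """Build (low, high) DR ranges as ±spread around each SysID nominal value."""
    return {name: (value * (1 - spread), value * (1 + spread))
            for name, value in nominal.items()}


# SysID measured friction = 0.8 → range (0.56, 1.04) at ±30%
ranges = dr_ranges_around_nominal({"friction": 0.8, "object_mass": 0.12})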

Technique 3: Curriculum Learning

Instead of throwing the policy into a fully randomized environment immediately, gradually increase the difficulty:

"""
Curriculum Learning for sim-to-real transfer
Gradually increase domain randomization range with training progress
"""

class CurriculumScheduler:
    def __init__(self, total_steps: int = 1_000_000):
        self.total_steps = total_steps

    def get_randomization_scale(self, current_step: int) -> float:
        """
        Return scale factor 0→1 for randomization ranges.
        0 = no randomization (easy), 1 = full randomization (hard)
        """
        progress = current_step / self.total_steps
        # Smooth ramp-up
        return min(1.0, progress * 2)  # Full randomization at 50% training

    def get_task_difficulty(self, current_step: int) -> dict:
        scale = self.get_randomization_scale(current_step)

        return {
            # Visual randomization
            "light_intensity_range": (
                1000 - 700 * scale,
                1000 + 2000 * scale,
            ),
            "camera_noise": 0.02 * scale,
            "num_distractors": int(5 * scale),

            # Dynamics randomization
            "friction_range": (
                0.8 - 0.5 * scale,
                0.8 + 0.7 * scale,
            ),
            "mass_scale_range": (
                1.0 - 0.5 * scale,
                1.0 + 1.0 * scale,
            ),
            "action_delay": int(3 * scale),
            "action_noise": 0.02 * scale,

            # Task difficulty
            "object_position_range": 0.1 + 0.3 * scale,  # meters
            "goal_tolerance": 0.05 - 0.03 * scale,        # tighter goal
        }


# Use in training loop
scheduler = CurriculumScheduler(total_steps=2_000_000)

obs = env.reset()
for step in range(2_000_000):
    # Update randomization ranges — they take effect on the next reset
    difficulty = scheduler.get_task_difficulty(step)
    env.set_randomization(difficulty)

    action = policy(obs)
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
    # ... update policy

Simulation environment for robot training with domain randomization

Simulation Tools Comparison

                | Isaac Lab                          | MuJoCo                    | Gazebo
Engine          | PhysX 5 (GPU)                      | MuJoCo (CPU/GPU)          | ODE/Bullet/DART
GPU parallel    | 10,000+ envs                       | 1,000+ (MJX)              | No
Rendering       | RTX ray-tracing                    | Basic OpenGL              | OGRE/Ignition
ROS integration | Isaac ROS                          | ros2_mujoco               | Native
Domain Rand     | Built-in, extensive                | Manual                    | Plugin-based
License         | Free (NVIDIA)                      | Apache 2.0                | Apache 2.0
Best for        | Large-scale RL, visual sim-to-real | Contact-rich manipulation | ROS 2 integration, multi-robot
Learning curve  | Steep (Omniverse)                  | Medium                    | Medium
Typical FPS     | 100K+ steps/s                      | 50K+ steps/s              | 1K steps/s

Which tool to use? Isaac Lab for large-scale RL and visual sim-to-real, MuJoCo for contact-rich manipulation research, and Gazebo when ROS 2 integration or multi-robot simulation matters most.

End-to-End: Train Manipulation Policy in Isaac Lab

Example: train robot arm pick-and-place in Isaac Lab, transfer to real robot.

Step 1: Setup Environment

"""
Isaac Lab environment for pick-and-place
GPU-accelerated, 4096 parallel envs
"""
import omni.isaac.lab as lab
from omni.isaac.lab.envs import ManagerBasedRLEnvCfg
from omni.isaac.lab.assets import ArticulationCfg, RigidObjectCfg
from omni.isaac.lab.managers import (
    ObservationGroupCfg,
    RewardTermCfg,
    EventTermCfg,
)


class PickPlaceEnvCfg(ManagerBasedRLEnvCfg):
    """Config for pick-and-place environment."""

    # Scene
    num_envs = 4096
    env_spacing = 2.0

    # Robot: Franka Panda
    robot = ArticulationCfg(
        prim_path="/World/envs/env_.*/Robot",
        spawn=lab.sim.schemas.FrankaCfg(),
        actuators={
            "panda_shoulder": lab.actuators.ImplicitActuatorCfg(
                joint_names_expr=["panda_joint[1-4]"],
                stiffness=80.0,
                damping=4.0,
            ),
            "panda_forearm": lab.actuators.ImplicitActuatorCfg(
                joint_names_expr=["panda_joint[5-7]"],
                stiffness=40.0,
                damping=2.0,
            ),
        },
    )

    # Target object
    object = RigidObjectCfg(
        prim_path="/World/envs/env_.*/Object",
        spawn=lab.sim.schemas.CuboidCfg(
            size=(0.04, 0.04, 0.04),
            mass=0.1,
        ),
    )

    # Observations
    observations = ObservationGroupCfg(
        policy=ObservationGroupCfg(
            joint_pos=True,
            joint_vel=True,
            object_pos=True,
            object_rot=True,
            goal_pos=True,
        )
    )

    # Rewards
    rewards = {
        "reaching": RewardTermCfg(
            func=reaching_reward,
            weight=1.0,
        ),
        "grasping": RewardTermCfg(
            func=grasping_reward,
            weight=5.0,
        ),
        "placing": RewardTermCfg(
            func=placing_reward,
            weight=10.0,
        ),
    }

    # Domain Randomization events
    events = {
        "reset_object_position": EventTermCfg(
            func=randomize_object_position,
            mode="reset",
            params={"range": 0.15},
        ),
        "randomize_mass": EventTermCfg(
            func=randomize_object_mass,
            mode="reset",
            params={"scale_range": (0.5, 2.0)},
        ),
        "randomize_friction": EventTermCfg(
            func=randomize_friction,
            mode="reset",
            params={"range": (0.4, 1.2)},
        ),
    }
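
The reward and event entries above reference functions (reaching_reward, randomize_object_mass, ...) that the config assumes are defined elsewhere. A minimal sketch of one reward term, assuming batched per-env tensors and hypothetical scene attributes:

import torch


def reaching_reward(env) -> torch.Tensor:
    """Dense shaping term: near 1 when the gripper touches the object, decays with distance."""
    ee_pos = env.robot.ee_pos_w        # (num_envs, 3) end-effector position (assumed attribute)
    obj_pos = env.scene.object.pos_w   # (num_envs, 3) object position (assumed attribute)
    dist = torch.norm(ee_pos - obj_pos, dim=-1)
    return torch.exp(-4.0 * dist)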

Step 2: Train with PPO

# Train 4096 parallel envs, 50M timesteps
# On RTX 4090: ~2 hours
python -m omni.isaac.lab_tasks.train \
    --task Isaac-Pick-Place-v0 \
    --num_envs 4096 \
    --max_iterations 5000 \
    --algo ppo \
    --headless

# Evaluate in sim
python -m omni.isaac.lab_tasks.play \
    --task Isaac-Pick-Place-v0 \
    --checkpoint logs/ppo/model_5000.pt

Step 3: Transfer to Real Robot

"""
Deploy trained policy to Franka Panda
Using Isaac ROS + Franka ROS 2 driver
"""
import rclpy
from rclpy.node import Node
import torch
import numpy as np
from sensor_msgs.msg import JointState
from geometry_msgs.msg import PoseStamped


class PolicyDeployNode(Node):
    def __init__(self):
        super().__init__("policy_deploy")

        # Load trained policy
        self.policy = torch.jit.load("/models/pick_place_policy.pt")
        self.policy.eval()

        # Observation buffer
        self.joint_pos = np.zeros(7)
        self.joint_vel = np.zeros(7)
        self.object_pos = np.zeros(3)
        self.goal_pos = np.zeros(3)  # Place target — set for the task before running

        # Subscribers
        self.create_subscription(
            JointState, "/franka/joint_states", self.joint_callback, 10
        )
        self.create_subscription(
            PoseStamped, "/object/pose", self.object_callback, 10
        )

        # Publisher
        self.cmd_pub = self.create_publisher(
            JointState, "/franka/joint_commands", 10
        )

        # Control loop at 20 Hz
        self.create_timer(0.05, self.control_loop)

    def joint_callback(self, msg):
        self.joint_pos = np.array(msg.position[:7])
        self.joint_vel = np.array(msg.velocity[:7])

    def object_callback(self, msg):
        self.object_pos = np.array([
            msg.pose.position.x,
            msg.pose.position.y,
            msg.pose.position.z,
        ])

    def control_loop(self):
        # Build observation (same format as simulation)
        obs = np.concatenate([
            self.joint_pos,      # 7
            self.joint_vel,      # 7
            self.object_pos,     # 3
            self.goal_pos,       # 3 (set beforehand)
        ])

        # Inference
        with torch.no_grad():
            obs_tensor = torch.FloatTensor(obs).unsqueeze(0)
            action = self.policy(obs_tensor).squeeze().numpy()

        # Safety clamp
        action = np.clip(action, -0.05, 0.05)  # Max delta per step

        # Publish command
        cmd = JointState()
        cmd.position = (self.joint_pos + action[:7]).tolist()
        self.cmd_pub.publish(cmd)


def main():
    rclpy.init()
    node = PolicyDeployNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == "__main__":
    main()

Success Stories

OpenAI Rubik's Cube (2019)

Solving Rubik's Cube with a Robot Hand — trained entirely in simulation with Automatic Domain Randomization and deployed zero-shot to a real Shadow Dexterous Hand. It remains one of the most impressive sim-to-real demos for dexterous manipulation to date.

Key insights: ADR automatically widened randomization ranges as the policy improved; the recurrent (LSTM) policy adapted online to dynamics it had never seen; vision and control were trained separately, both entirely in simulation.

Agility Robotics Digit Locomotion

Real-World Humanoid Locomotion with Reinforcement Learning — trains a causal transformer policy in simulation and deploys it zero-shot to the real Digit humanoid; the robot walks on terrains never seen in training.

Key insights: a causal transformer over the history of observations and actions gives the policy in-context adaptation to new terrain and dynamics; training used large-scale randomization in simulation, with zero-shot deployment outdoors.

Boston Dynamics Spot + Isaac Lab

An NVIDIA showcase trained Spot quadruped locomotion in Isaac Lab with thousands of parallel environments and deployed it zero-shot to the real robot.

Best Practices Summary

  1. Start with SysID — measure friction, mass, joint dynamics of real robot. Set simulation nominal values to match.

  2. Apply DR around nominal — randomize ±30% around SysID values, don't randomize blindly.

  3. Curriculum learning — start with little randomization, increase gradually. Policy learns task first, robustness second.

  4. Observation design matters more than reward — use proprioception (joints, IMU) instead of just vision; proprioceptive signals transfer better.

  5. Action space: Use delta position control instead of torque control. Position control absorbs more dynamics mismatch.

  6. Low-pass filter actions — smooth out jerky actions from the policy; real robots can't handle high-frequency commands (see the sketch after this list).

  7. Safety wrapper — always have velocity/torque limits, workspace boundaries, and emergency stop outside the policy.

  8. Evaluate in sim first — if policy fails at 50% randomization range, it will fail on real robot. Target: >90% success rate in full DR range.
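
A minimal sketch of points 6 and 7 combined: an exponential moving average on actions, followed by hard safety clamps outside the policy (the cutoff and limits are illustrative):

import numpy as np


class SafeActionFilter:
    """EMA low-pass on policy actions, then hard safety limits."""

    def __init__(self, action_dim: int, alpha: float = 0.3, max_delta: float = 0.05):
        self.alpha = alpha          # 1 = no smoothing, smaller = heavier filtering
        self.max_delta = max_delta  # max joint delta per step (rad)
        self.prev = np.zeros(action_dim)

    def __call__(self, action: np.ndarray) -> np.ndarray:
        # Low-pass filter: blend the new action with the previous output
        smoothed = self.alpha * action + (1 - self.alpha) * self.prev
        self.prev = smoothed
        # Safety clamp, enforced outside the policy
        return np.clip(smoothed, -self.max_delta, self.max_delta)


action_filter = SafeActionFilter(action_dim=7)
# In the control loop: cmd_delta = action_filter(policy_action)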

Conclusion

Sim-to-real transfer is the most important enabler for robot learning today. Combine domain randomization + system identification + curriculum learning for best results. Tools like Isaac Lab and MuJoCo are mature enough for production use.

Recommended pipeline:

SysID (measure real robot)
  → Setup sim (Isaac Lab / MuJoCo)
  → Train with DR + curriculum (4096 parallel envs)
  → Evaluate in sim (>90% success at full DR)
  → Deploy zero-shot to real robot
  → Fine-tune with real data if needed

Related Posts

IROS 2026: Navigation and manipulation papers worth following
Analysis of standout papers on autonomous navigation and manipulation — preparing for IROS 2026 Pittsburgh.
April 2, 2026 · 7 min read

IROS 2026 Preview: What to expect
IROS 2026 Pittsburgh — previewing the workshops, competitions, and leading navigation and manipulation research.
March 30, 2026 · 7 min read

Embodied AI 2026: Landscape and trends
An overview of embodied AI — from foundation models and sim-to-real to robot learning at scale with open-source tools.
March 25, 2026 · 12 min read