
Domain Randomization: The Key to Sim-to-Real Transfer

Theory and practice of domain randomization — visual, dynamics, and sensor randomization, with successful case studies.

Nguyen Anh Tuan · March 26, 2026 · 9 min read

What is Domain Randomization and Why Does It Work?

Domain Randomization (DR) is a technique for training policies on many randomized variants of a simulation instead of a single fixed configuration. The core idea is simple: if a model has seen enough variation in sim, then reality is just one more variation.

Instead of trying to make the simulation match reality exactly (which is nearly impossible), DR accepts that sim will never be perfect -- but if the training distribution is wide enough, the policy learns to generalize.

The original paper by Tobin et al. (2017) -- Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World -- showed that with sufficient randomization, an object detector trained entirely in sim can detect objects in the real world with high accuracy.

Why Does DR Work? -- Bayesian Perspective

From a Bayesian viewpoint, DR amounts to marginalizing the policy's success probability over a distribution of environment parameters:

P(policy works | real world) ≈ ∫ P(policy works | env_params) · P(env_params) d(env_params)

When the randomization range is wide enough, P(env_params) covers the real-world parameters, so the policy naturally becomes robust. Reality is then just one more sample from the distribution the policy was trained on.
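This marginalization can be illustrated with a toy Monte Carlo estimate (the success model and all numbers below are purely illustrative, not from any paper): sample environment parameters from the training distribution and average a per-environment success indicator. Note how coverage, not fidelity, determines the estimate.

```python
import numpy as np

def estimated_robustness(success_fn, param_sampler, n_samples=10_000, seed=0):
    """Monte Carlo estimate of P(works) = E over params of P(works | params)."""
    rng = np.random.default_rng(seed)
    params = param_sampler(rng, n_samples)
    return float(np.mean(success_fn(params)))

# Toy model: the policy succeeds whenever friction lies inside its trained band.
succeeds = lambda friction: (friction > 0.2) & (friction < 1.5)

# Narrow vs. wide randomization over a single friction parameter.
narrow = estimated_robustness(succeeds, lambda rng, n: rng.uniform(0.9, 1.1, n))
wide = estimated_robustness(succeeds, lambda rng, n: rng.uniform(0.1, 2.0, n))
```

If the real world's friction lands outside the narrow band, the narrow-trained policy has seen nothing like it; the wide distribution trades some in-distribution success for coverage.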

[Figure: domain randomization visualization for robot training]

3 Types of Domain Randomization

1. Visual Domain Randomization

Visual DR randomizes everything related to what the camera sees. This is the most important type of DR for vision-based policies.

Parameters to randomize:

| Parameter | Reference range | Reason |
|---|---|---|
| Light intensity | 200-5000 lux | Real-world lighting changes continuously |
| Light color temperature | 3000-8000 K | From tungsten to daylight |
| Light position | Entire workspace | Shadow patterns change |
| Camera FOV | 55-75 degrees | Lens variations |
| Camera noise sigma | 0.01-0.05 | Sensor noise |
| Texture colors | Full RGB range | Surface appearance varies |
| Background | Random images | Distractor backgrounds |
| Distractor objects | 0-10 objects | Clutter in real environments |

Implementation in Isaac Lab using YAML config:

# config/visual_randomization.yaml
# Isaac Lab visual domain randomization config

visual_randomization:
  lighting:
    intensity:
      distribution: "uniform"
      range: [300.0, 3000.0]
    color_temperature:
      distribution: "uniform"
      range: [3000.0, 7000.0]
    num_random_lights:
      distribution: "uniform_int"
      range: [1, 5]
    position:
      distribution: "uniform"
      range: [[-2.0, -2.0, 1.0], [2.0, 2.0, 3.0]]

  camera:
    fov:
      distribution: "uniform"
      range: [55.0, 75.0]
    position_noise:
      distribution: "gaussian"
      mean: 0.0
      std: 0.02
    rotation_noise_deg:
      distribution: "gaussian"
      mean: 0.0
      std: 3.0

  textures:
    table_color:
      distribution: "uniform"
      range: [[0.2, 0.2, 0.2], [0.9, 0.9, 0.9]]
    object_color:
      distribution: "uniform"
      range: [[0.1, 0.1, 0.1], [1.0, 1.0, 1.0]]
    randomize_background: true

  distractors:
    num_objects:
      distribution: "uniform_int"
      range: [0, 5]
    scale:
      distribution: "uniform"
      range: [0.02, 0.1]
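A minimal sketch of a sampler that could consume entries of this config once parsed into a Python dict (this helper is not Isaac Lab API; it treats `uniform_int` ranges as inclusive on both ends, matching how the ranges above read):

```python
import numpy as np

def sample_param(spec: dict, rng: np.random.Generator):
    """Sample one value from a {distribution, range | mean/std} spec dict."""
    dist = spec["distribution"]
    if dist == "uniform":
        low, high = np.asarray(spec["range"][0]), np.asarray(spec["range"][1])
        return rng.uniform(low, high)  # broadcasts for vector ranges (positions)
    if dist == "uniform_int":
        low, high = spec["range"]
        return int(rng.integers(low, high + 1))  # inclusive upper bound
    if dist == "gaussian":
        return rng.normal(spec["mean"], spec["std"])
    raise ValueError(f"unknown distribution: {dist}")

rng = np.random.default_rng(0)
fov = sample_param({"distribution": "uniform", "range": [55.0, 75.0]}, rng)
n_lights = sample_param({"distribution": "uniform_int", "range": [1, 5]}, rng)
```

Calling `sample_param` once per entry at each environment reset yields a fresh visual configuration per episode.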

2. Dynamics Domain Randomization

Dynamics DR randomizes physics parameters -- this is the most important type for control policies since it directly affects robot behavior.

"""
Dynamics Domain Randomization implementation.
Compatible with Isaac Lab ManagerBasedRLEnv.
"""
import numpy as np
from dataclasses import dataclass, field


@dataclass
class DynamicsRandomizationConfig:
    """Dynamics DR config with validated ranges."""

    # Object properties
    mass_scale_range: tuple = (0.5, 2.0)
    friction_range: tuple = (0.3, 1.5)
    restitution_range: tuple = (0.0, 0.5)

    # Robot joint properties
    joint_damping_scale: tuple = (0.8, 1.2)
    joint_friction_scale: tuple = (0.5, 2.0)
    actuator_strength_scale: tuple = (0.85, 1.15)

    # Control latency and noise
    action_delay_frames: tuple = (0, 3)
    action_noise_std: float = 0.02
    observation_noise_std: float = 0.01

    # Environment
    gravity_noise: float = 0.05  # m/s^2 variation around 9.81


@dataclass
class RandomizedParams:
    """Container for randomized parameters."""

    mass_scale: float = 1.0
    friction: float = 0.8
    restitution: float = 0.1
    damping_scale: float = 1.0
    joint_friction_scale: float = 1.0
    actuator_scale: float = 1.0
    action_delay: int = 0
    gravity: np.ndarray = field(
        default_factory=lambda: np.array([0.0, 0.0, -9.81])
    )


def sample_dynamics_params(
    config: DynamicsRandomizationConfig,
    rng: np.random.Generator | None = None,
) -> RandomizedParams:
    """Sample a set of parameters from DR config."""
    if rng is None:
        rng = np.random.default_rng()

    gravity_z = -9.81 + rng.normal(0, config.gravity_noise)

    return RandomizedParams(
        mass_scale=rng.uniform(*config.mass_scale_range),
        friction=rng.uniform(*config.friction_range),
        restitution=rng.uniform(*config.restitution_range),
        damping_scale=rng.uniform(*config.joint_damping_scale),
        joint_friction_scale=rng.uniform(*config.joint_friction_scale),
        actuator_scale=rng.uniform(*config.actuator_strength_scale),
        # endpoint=True so the (0, 3) tuple is inclusive, like the other ranges
        action_delay=int(rng.integers(*config.action_delay_frames, endpoint=True)),
        gravity=np.array([0.0, 0.0, gravity_z]),
    )


def apply_dynamics_params(env, params: RandomizedParams):
    """Apply randomized parameters to environment."""
    # Object physics
    env.scene.target_object.set_mass(
        env.scene.target_object.default_mass * params.mass_scale
    )
    env.scene.target_object.set_friction(params.friction)
    env.scene.target_object.set_restitution(params.restitution)

    # Robot joints
    for joint in env.robot.joints:
        joint.set_damping(joint.default_damping * params.damping_scale)
        joint.set_friction(
            joint.default_friction * params.joint_friction_scale
        )
        joint.set_max_force(
            joint.default_max_force * params.actuator_scale
        )

    # Gravity
    env.sim.set_gravity(params.gravity)

    # Action delay buffer
    env.action_delay = params.action_delay

    return env
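The `env.action_delay` assignment above assumes the environment buffers actions before executing them. A minimal sketch of such a delay buffer (hypothetical; not an Isaac Lab class):

```python
from collections import deque
import numpy as np

class ActionDelayBuffer:
    """Delay actions by a fixed number of control frames.

    Until the buffer fills, the executed action is a neutral (zero) action,
    mimicking a startup transient on real hardware.
    """

    def __init__(self, delay_frames: int, action_dim: int):
        neutral = np.zeros(action_dim)
        self.buffer = deque([neutral] * delay_frames, maxlen=delay_frames + 1)

    def step(self, action: np.ndarray) -> np.ndarray:
        """Push the newest commanded action, pop the one to actually execute."""
        self.buffer.append(action)
        return self.buffer.popleft()

# With a 2-frame delay, the first two executed actions are the neutral zeros.
buf = ActionDelayBuffer(delay_frames=2, action_dim=3)
executed = [buf.step(np.full(3, t)) for t in (1.0, 2.0, 3.0)]
```

A `delay_frames` of 0 degenerates to a pass-through, so the same code path handles the undelayed case.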

3. Sensor Domain Randomization

Sensor DR simulates real sensor imperfections -- noise, delay, dropout, and bias. This type is often overlooked but is critical for sim-to-real transfer.

"""
Sensor Domain Randomization.
Wrap observation pipeline to add realistic noise.
"""
import numpy as np
from dataclasses import dataclass


@dataclass
class SensorRandomizationConfig:
    """Config for sensor noise and imperfections."""

    # Joint encoder noise
    joint_position_noise_std: float = 0.005    # radians
    joint_velocity_noise_std: float = 0.05     # rad/s
    joint_position_bias_range: tuple = (-0.01, 0.01)

    # IMU noise (based on MPU-6050 datasheet)
    imu_accel_noise_std: float = 0.1           # m/s^2
    imu_gyro_noise_std: float = 0.01           # rad/s
    imu_bias_instability: float = 0.001        # random walk

    # Force/torque sensor
    ft_noise_std: float = 0.5                  # Newton
    ft_dropout_prob: float = 0.01              # 1% chance of signal loss

    # Communication delay
    observation_delay_frames: tuple = (0, 2)


class SensorRandomizer:
    """Apply realistic sensor noise to observations."""

    def __init__(
        self,
        config: SensorRandomizationConfig,
        seed: int = 42,
    ):
        self.config = config
        self.rng = np.random.default_rng(seed)
        self.joint_bias = None
        self.imu_bias = None
        self._reset_biases()

    def _reset_biases(self):
        """Reset systematic biases each episode."""
        self.joint_bias = self.rng.uniform(
            *self.config.joint_position_bias_range, size=7
        )
        self.imu_bias = self.rng.normal(0, 0.01, size=6)

    def add_joint_noise(
        self, joint_pos: np.ndarray, joint_vel: np.ndarray
    ) -> tuple[np.ndarray, np.ndarray]:
        """Add noise to joint encoder readings."""
        noisy_pos = (
            joint_pos
            + self.rng.normal(
                0, self.config.joint_position_noise_std,
                size=joint_pos.shape,
            )
            + self.joint_bias[: len(joint_pos)]
        )
        noisy_vel = joint_vel + self.rng.normal(
            0, self.config.joint_velocity_noise_std,
            size=joint_vel.shape,
        )
        return noisy_pos, noisy_vel

    def add_imu_noise(
        self, accel: np.ndarray, gyro: np.ndarray
    ) -> tuple[np.ndarray, np.ndarray]:
        """Add noise to IMU readings with bias drift."""
        noisy_accel = accel + self.rng.normal(
            0, self.config.imu_accel_noise_std, size=3
        )
        noisy_gyro = gyro + self.rng.normal(
            0, self.config.imu_gyro_noise_std, size=3
        )
        # Bias instability (random walk)
        self.imu_bias[:3] += self.rng.normal(
            0, self.config.imu_bias_instability, size=3
        )
        self.imu_bias[3:] += self.rng.normal(
            0, self.config.imu_bias_instability, size=3
        )
        return noisy_accel + self.imu_bias[:3], noisy_gyro + self.imu_bias[3:]

    def add_ft_noise(self, force_torque: np.ndarray) -> np.ndarray:
        """Add noise and dropout to force/torque sensor."""
        noisy_ft = force_torque + self.rng.normal(
            0, self.config.ft_noise_std, size=force_torque.shape
        )
        if self.rng.random() < self.config.ft_dropout_prob:
            return np.zeros_like(force_torque)
        return noisy_ft

    def randomize_observation(self, obs: dict) -> dict:
        """Apply all sensor noise to observation dict."""
        noisy_obs = {}

        if "joint_pos" in obs and "joint_vel" in obs:
            noisy_obs["joint_pos"], noisy_obs["joint_vel"] = (
                self.add_joint_noise(obs["joint_pos"], obs["joint_vel"])
            )

        if "imu_accel" in obs and "imu_gyro" in obs:
            noisy_obs["imu_accel"], noisy_obs["imu_gyro"] = (
                self.add_imu_noise(obs["imu_accel"], obs["imu_gyro"])
            )

        if "force_torque" in obs:
            noisy_obs["force_torque"] = self.add_ft_noise(
                obs["force_torque"]
            )

        # Copy over non-randomized keys
        for key in obs:
            if key not in noisy_obs:
                noisy_obs[key] = obs[key]

        return noisy_obs
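The `imu_bias_instability` term above implements a random walk, so the accumulated gyro bias drifts with the square root of elapsed steps rather than linearly. A quick standalone check of that property, reusing the same 0.001 constant as the config:

```python
import numpy as np

rng = np.random.default_rng(0)
bias_instability = 0.001  # rad/s per step, as in SensorRandomizationConfig

def bias_std_after(n_steps: int, n_trials: int = 2000) -> float:
    """Empirical std of the accumulated random-walk bias after n_steps."""
    walks = rng.normal(0.0, bias_instability, size=(n_trials, n_steps)).sum(axis=1)
    return float(walks.std())

# Random-walk variance adds linearly, so std scales as sqrt(n_steps):
# 4x the steps should roughly double the drift.
short, long = bias_std_after(100), bias_std_after(400)
```

This is why long episodes need either bias resets (as in `_reset_biases`) or a policy with enough memory to estimate and cancel the drift.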

[Figure: sensor noise visualization in robotics simulation]

Successful Case Studies

OpenAI Rubik's Cube (2019)

Paper: Solving Rubik's Cube with a Robot Hand

This is the most impressive demonstration of domain randomization to date. OpenAI trained the Shadow Dexterous Hand entirely in simulation using Automatic Domain Randomization (ADR) -- a technique that automatically expands each randomization range once the policy masters the current one.

Results:

  • Hundreds of parameters randomized simultaneously: friction, mass, damping, lighting, camera, textures
  • The policy develops emergent adaptation -- it adjusts in real time to new dynamics
  • Sim-to-real success rate: ~84% for finger repositioning -- an unprecedented result for dexterous manipulation

ADR workflow:

Step 1: Start with narrow range -- friction in [0.9, 1.1]
Step 2: Train policy until success rate > 80%
Step 3: Expand range -- friction in [0.7, 1.3]
Step 4: Repeat until max range
Result: friction in [0.1, 2.0] -- a policy so robust that the real world is just "easy mode"
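The expand-on-success loop above can be sketched in a few lines. This is a simplification: real ADR adjusts each parameter boundary independently based on boundary-specific evaluation episodes, and the success rate here is stubbed rather than measured.

```python
def adr_expand(low, high, success_rate, threshold=0.8, step=0.1,
               min_low=0.1, max_high=2.0):
    """Widen a randomization range once the policy masters the current one."""
    if success_rate > threshold:
        low = max(min_low, low - step)
        high = min(max_high, high + step)
    return low, high

# Friction range growing from [0.9, 1.1] toward the full [0.1, 2.0].
low, high = 0.9, 1.1
for _ in range(10):
    # In a real loop, success_rate would come from evaluation rollouts.
    success_rate = 0.9  # stub: assume the policy keeps mastering each range
    low, high = adr_expand(low, high, success_rate)
```

Note that a struggling policy (success below threshold) leaves the range untouched, so the curriculum never outruns the learner.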

Agility Digit Humanoid Walking (2024)

Paper: Real-World Humanoid Locomotion with Reinforcement Learning

Agility Robotics trained a locomotion policy for the Digit humanoid robot:

  • Trained in Isaac Gym with 4096 parallel environments
  • DR on: terrain (flat, rough, slopes), friction (0.3-1.5), mass distribution (+-20%), motor strength (+-15%)
  • Used teacher-student distillation: teacher has privileged terrain info, student uses only proprioception
  • Sim-to-real success rate: ~91% on diverse real-world terrains

ANYmal Parkour (2023)

Paper: Extreme Parkour with Legged Robots

ETH Zurich trained the ANYmal quadruped to traverse extreme terrain:

  • DR on terrain geometry, friction (0.2-1.5), external perturbations (random pushes)
  • Curriculum from flat ground to parkour obstacles
  • Sim-to-real success rate: ~93% for basic locomotion, ~87% for parkour tasks

Metrics and Evaluation

| Metric | Description | Target |
|---|---|---|
| Sim success rate (no DR) | Baseline performance | >95% |
| Sim success rate (full DR) | Robust performance | >85% |
| Sim-to-real success rate | Effective transfer | >80% |
| Sim-to-real gap | Difference between sim and real | <15% |
| Adaptation time | Time to adapt on real hardware | <10 episodes |
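As used in the table, the sim-to-real gap is simply the difference between sim and real success rates; a trivial helper (names and numbers are illustrative):

```python
def sim_to_real_gap(sim_success: float, real_success: float) -> float:
    """Sim-to-real gap as an absolute success-rate difference, in percent."""
    return abs(sim_success - real_success) * 100.0

# Illustrative values: 88% success in sim (full DR), 81% on hardware.
gap = sim_to_real_gap(sim_success=0.88, real_success=0.81)
meets_target = gap < 15.0  # target from the table above
```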

Anti-patterns: Common Mistakes

1. Over-randomization

Randomizing too widely prevents the policy from learning anything. For example, randomizing friction from 0.01 to 10.0 is nonsensical, since real-world friction coefficients mostly fall between 0.2 and 1.5.

Solution: Use System Identification to find nominal values, then randomize ±30-50% around them.

2. Correlated Parameters

In reality, many parameters are correlated. For example: heavy objects usually have higher friction (metal vs plastic). Independent randomization creates nonsensical combinations.

# WRONG: randomize independently
mass = np.random.uniform(0.1, 5.0)
friction = np.random.uniform(0.1, 2.0)

# RIGHT: randomize with correlation by material
material = np.random.choice(["plastic", "metal", "rubber", "wood"])
material_params = {
    "plastic": {"mass_range": (0.05, 0.5), "friction_range": (0.3, 0.6)},
    "metal":   {"mass_range": (0.5, 5.0),  "friction_range": (0.2, 0.5)},
    "rubber":  {"mass_range": (0.1, 1.0),  "friction_range": (0.8, 1.5)},
    "wood":    {"mass_range": (0.1, 2.0),  "friction_range": (0.4, 0.8)},
}
params = material_params[material]
mass = np.random.uniform(*params["mass_range"])
friction = np.random.uniform(*params["friction_range"])

3. No Curriculum

Throwing a policy into full randomization from the start often prevents convergence. Always start with a narrow range and widen it gradually as training progresses.
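A fixed linear schedule is the simplest way to implement this gradual widening (a sketch; ADR's feedback-driven expansion is the adaptive alternative, and the ranges below are illustrative):

```python
def curriculum_range(progress: float, start=(0.9, 1.1), final=(0.1, 2.0)):
    """Linearly interpolate from a narrow to a full randomization range.

    progress: training fraction in [0, 1], e.g. current_step / total_steps.
    """
    t = min(max(progress, 0.0), 1.0)  # clamp so late training stays at final
    low = start[0] + t * (final[0] - start[0])
    high = start[1] + t * (final[1] - start[1])
    return low, high

# Early, mid, and late training see progressively wider friction ranges.
early, mid, late = (curriculum_range(p) for p in (0.0, 0.5, 1.0))
```

The trade-off versus ADR: a fixed schedule cannot slow down if the policy struggles, but it is deterministic and trivial to tune.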

4. Only Randomize Visuals, Ignore Dynamics

Many people focus only on visual DR because it is easier to implement, but for control tasks dynamics DR matters far more: a policy failing because friction differs by 20% is much more common than one failing because the lighting changed.


References

Important papers on domain randomization:

  1. Tobin et al., 2017 -- Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World -- Foundational paper
  2. OpenAI, 2019 -- Solving Rubik's Cube (ADR) -- Automatic Domain Randomization
  3. Mehta et al., 2020 -- Active Domain Randomization -- Smarter DR parameter selection
  4. Muratore et al., 2022 -- Robot Learning from Randomized Simulations -- Comprehensive survey


Nguyễn Anh Tuấn

Robotics & AI Engineer. Building VnRobo — sharing knowledge about robot learning, VLA models, and automation.
