What is Domain Randomization and Why Does It Work?
Domain Randomization (DR) is a technique for training AI policies on many randomized variations of a simulation rather than a single fixed configuration. The core idea is simple: if a model has seen enough variation in sim, then reality is just one more variation.
Instead of trying to make simulation match reality exactly (which is nearly impossible), DR accepts that sim will never be perfect; if you train on a wide enough distribution, the policy learns to generalize across it.
The original paper by Tobin et al. (2017), "Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World," showed that with sufficient randomization, an object detector trained entirely in simulation can localize objects in the real world accurately enough to be useful for grasping.
Why Does DR Work? -- Bayesian Perspective
From a Bayesian viewpoint, DR amounts to optimizing the policy's expected performance under a prior distribution over environment parameters:

P(policy works | real world) ≈ ∫ P(policy works | θ) p(θ) dθ

When the randomization ranges are wide enough, p(θ) puts probability mass on the real-world parameters, so robustness to them falls out of training. Reality is then just one more sample from the distribution the policy was trained on.
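In code, this marginal is just a Monte Carlo average: sample environment parameters from the randomization distribution, evaluate the policy under each sample, and average. A minimal sketch, where `policy_succeeds` is a made-up stand-in for a real rollout:

```python
import numpy as np

rng = np.random.default_rng(0)

def policy_succeeds(friction: float) -> float:
    # Hypothetical stand-in for a real rollout: this policy was trained
    # on friction in [0.3, 1.5] and succeeds anywhere inside that range.
    return 1.0 if 0.3 <= friction <= 1.5 else 0.0

# P(success) ~ (1/N) * sum_i P(success | theta_i),  theta_i ~ p(theta)
thetas = rng.uniform(0.3, 1.5, size=10_000)  # the wide training distribution
estimate = np.mean([policy_succeeds(t) for t in thetas])

# A real-world friction of, say, 0.8 is just one more sample from p(theta),
# so it is already covered by the estimate above.
```

The same averaging is what DR training does implicitly: each episode draws a fresh θ, so the gradient estimates the expected performance under p(θ).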
3 Types of Domain Randomization
1. Visual Domain Randomization
Visual DR randomizes everything related to what the camera sees. This is the most important type of DR for vision-based policies.
Parameters to randomize:
| Parameter | Reference Range | Reason |
|---|---|---|
| Light intensity | 200-5000 lux | Real-world lighting changes continuously |
| Light color | 3000-8000K | From tungsten to daylight |
| Light position | Entire workspace | Shadow patterns change |
| Camera FOV | 55-75 degrees | Lens variations |
| Camera noise | sigma 0.01-0.05 | Sensor noise |
| Texture colors | Full RGB range | Surface appearance varies |
| Background | Random images | Distractor background |
| Distractor objects | 0-10 objects | Clutter in real environment |
Implementation in Isaac Lab using YAML config:
# config/visual_randomization.yaml
# Isaac Lab visual domain randomization config
visual_randomization:
  lighting:
    intensity:
      distribution: "uniform"
      range: [300.0, 3000.0]
    color_temperature:
      distribution: "uniform"
      range: [3000.0, 7000.0]
    num_random_lights:
      distribution: "uniform_int"
      range: [1, 5]
    position:
      distribution: "uniform"
      range: [[-2.0, -2.0, 1.0], [2.0, 2.0, 3.0]]
  camera:
    fov:
      distribution: "uniform"
      range: [55.0, 75.0]
    position_noise:
      distribution: "gaussian"
      mean: 0.0
      std: 0.02
    rotation_noise_deg:
      distribution: "gaussian"
      mean: 0.0
      std: 3.0
  textures:
    table_color:
      distribution: "uniform"
      range: [[0.2, 0.2, 0.2], [0.9, 0.9, 0.9]]
    object_color:
      distribution: "uniform"
      range: [[0.1, 0.1, 0.1], [1.0, 1.0, 1.0]]
    randomize_background: true
  distractors:
    num_objects:
      distribution: "uniform_int"
      range: [0, 5]
    scale:
      distribution: "uniform"
      range: [0.02, 0.1]
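A small helper can interpret entries of this form. Note that the `distribution`/`range` schema above is this article's own convention, not an Isaac Lab API, so treat this as a sketch:

```python
import numpy as np

def sample_entry(spec: dict, rng: np.random.Generator):
    """Draw one value from a {distribution, range | mean/std} spec."""
    dist = spec["distribution"]
    if dist == "uniform":
        lo = np.asarray(spec["range"][0])
        hi = np.asarray(spec["range"][1])
        return rng.uniform(lo, hi)  # broadcasts over scalars and vectors
    if dist == "uniform_int":
        lo, hi = spec["range"]
        return int(rng.integers(lo, hi, endpoint=True))  # inclusive upper bound
    if dist == "gaussian":
        return rng.normal(spec["mean"], spec["std"])
    raise ValueError(f"unknown distribution: {dist}")

rng = np.random.default_rng(42)
fov = sample_entry({"distribution": "uniform", "range": [55.0, 75.0]}, rng)
n_lights = sample_entry({"distribution": "uniform_int", "range": [1, 5]}, rng)
light_pos = sample_entry(
    {"distribution": "uniform",
     "range": [[-2.0, -2.0, 1.0], [2.0, 2.0, 3.0]]},
    rng,
)
```

In practice you would load the YAML (e.g. with PyYAML) into nested dicts and walk them, calling `sample_entry` on each leaf at every environment reset.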
2. Dynamics Domain Randomization
Dynamics DR randomizes physics parameters -- this is the most important type for control policies since it directly affects robot behavior.
"""
Dynamics Domain Randomization implementation.
Compatible with Isaac Lab ManagerBasedRLEnv.
"""
import numpy as np
from dataclasses import dataclass, field
@dataclass
class DynamicsRandomizationConfig:
"""Dynamics DR config with validated ranges."""
# Object properties
mass_scale_range: tuple = (0.5, 2.0)
friction_range: tuple = (0.3, 1.5)
restitution_range: tuple = (0.0, 0.5)
# Robot joint properties
joint_damping_scale: tuple = (0.8, 1.2)
joint_friction_scale: tuple = (0.5, 2.0)
actuator_strength_scale: tuple = (0.85, 1.15)
# Control latency and noise
action_delay_frames: tuple = (0, 3)
action_noise_std: float = 0.02
observation_noise_std: float = 0.01
# Environment
gravity_noise: float = 0.05 # m/s^2 variation around 9.81
@dataclass
class RandomizedParams:
"""Container for randomized parameters."""
mass_scale: float = 1.0
friction: float = 0.8
restitution: float = 0.1
damping_scale: float = 1.0
joint_friction_scale: float = 1.0
actuator_scale: float = 1.0
action_delay: int = 0
gravity: np.ndarray = field(
default_factory=lambda: np.array([0.0, 0.0, -9.81])
)
def sample_dynamics_params(
config: DynamicsRandomizationConfig,
rng: np.random.Generator | None = None,
) -> RandomizedParams:
"""Sample a set of parameters from DR config."""
if rng is None:
rng = np.random.default_rng()
gravity_z = -9.81 + rng.normal(0, config.gravity_noise)
return RandomizedParams(
mass_scale=rng.uniform(*config.mass_scale_range),
friction=rng.uniform(*config.friction_range),
restitution=rng.uniform(*config.restitution_range),
damping_scale=rng.uniform(*config.joint_damping_scale),
joint_friction_scale=rng.uniform(*config.joint_friction_scale),
actuator_scale=rng.uniform(*config.actuator_strength_scale),
action_delay=rng.integers(*config.action_delay_frames),
gravity=np.array([0.0, 0.0, gravity_z]),
)
def apply_dynamics_params(env, params: RandomizedParams):
"""Apply randomized parameters to environment."""
# Object physics
env.scene.target_object.set_mass(
env.scene.target_object.default_mass * params.mass_scale
)
env.scene.target_object.set_friction(params.friction)
env.scene.target_object.set_restitution(params.restitution)
# Robot joints
for joint in env.robot.joints:
joint.set_damping(joint.default_damping * params.damping_scale)
joint.set_friction(
joint.default_friction * params.joint_friction_scale
)
joint.set_max_force(
joint.default_max_force * params.actuator_scale
)
# Gravity
env.sim.set_gravity(params.gravity)
# Action delay buffer
env.action_delay = params.action_delay
return env
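The code above only stores `params.action_delay` on the environment; the delay itself has to be applied in the control loop. One way to do that is a fixed-length FIFO (a sketch, not part of any Isaac Lab API):

```python
from collections import deque
import numpy as np

class ActionDelayBuffer:
    """Delay actions by a fixed number of control frames."""

    def __init__(self, delay_frames: int, action_dim: int):
        # Pre-fill with neutral (zero) actions so the first `delay_frames`
        # steps output a safe command instead of garbage.
        self.buffer = deque(
            [np.zeros(action_dim) for _ in range(delay_frames)],
            maxlen=delay_frames + 1,
        )

    def step(self, action: np.ndarray) -> np.ndarray:
        """Push the newest action, return the one from `delay_frames` ago."""
        self.buffer.append(action)
        return self.buffer.popleft()

# With a delay of 2 frames, the first two outputs are the zero pre-fill.
buf = ActionDelayBuffer(delay_frames=2, action_dim=1)
outputs = [buf.step(np.array([float(i)])) for i in (1, 2, 3)]
```

With `delay_frames=0` the buffer degenerates to a pass-through, so the same code path covers the undelayed case sampled from `action_delay_frames`.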
3. Sensor Domain Randomization
Sensor DR simulates real sensor imperfections -- noise, delay, dropout, bias. This type is often overlooked but very important for sim-to-real transfer.
"""
Sensor Domain Randomization.
Wrap observation pipeline to add realistic noise.
"""
import numpy as np
from dataclasses import dataclass
@dataclass
class SensorRandomizationConfig:
"""Config for sensor noise and imperfections."""
# Joint encoder noise
joint_position_noise_std: float = 0.005 # radians
joint_velocity_noise_std: float = 0.05 # rad/s
joint_position_bias_range: tuple = (-0.01, 0.01)
# IMU noise (based on MPU-6050 datasheet)
imu_accel_noise_std: float = 0.1 # m/s^2
imu_gyro_noise_std: float = 0.01 # rad/s
imu_bias_instability: float = 0.001 # random walk
# Force/torque sensor
ft_noise_std: float = 0.5 # Newton
ft_dropout_prob: float = 0.01 # 1% chance of signal loss
# Communication delay
observation_delay_frames: tuple = (0, 2)
class SensorRandomizer:
"""Apply realistic sensor noise to observations."""
def __init__(
self,
config: SensorRandomizationConfig,
seed: int = 42,
):
self.config = config
self.rng = np.random.default_rng(seed)
self.joint_bias = None
self.imu_bias = None
self._reset_biases()
def _reset_biases(self):
"""Reset systematic biases each episode."""
self.joint_bias = self.rng.uniform(
*self.config.joint_position_bias_range, size=7
)
self.imu_bias = self.rng.normal(0, 0.01, size=6)
def add_joint_noise(
self, joint_pos: np.ndarray, joint_vel: np.ndarray
) -> tuple[np.ndarray, np.ndarray]:
"""Add noise to joint encoder readings."""
noisy_pos = (
joint_pos
+ self.rng.normal(
0, self.config.joint_position_noise_std,
size=joint_pos.shape,
)
+ self.joint_bias[: len(joint_pos)]
)
noisy_vel = joint_vel + self.rng.normal(
0, self.config.joint_velocity_noise_std,
size=joint_vel.shape,
)
return noisy_pos, noisy_vel
def add_imu_noise(
self, accel: np.ndarray, gyro: np.ndarray
) -> tuple[np.ndarray, np.ndarray]:
"""Add noise to IMU readings with bias drift."""
noisy_accel = accel + self.rng.normal(
0, self.config.imu_accel_noise_std, size=3
)
noisy_gyro = gyro + self.rng.normal(
0, self.config.imu_gyro_noise_std, size=3
)
# Bias instability (random walk)
self.imu_bias[:3] += self.rng.normal(
0, self.config.imu_bias_instability, size=3
)
self.imu_bias[3:] += self.rng.normal(
0, self.config.imu_bias_instability, size=3
)
return noisy_accel + self.imu_bias[:3], noisy_gyro + self.imu_bias[3:]
def add_ft_noise(self, force_torque: np.ndarray) -> np.ndarray:
"""Add noise and dropout to force/torque sensor."""
noisy_ft = force_torque + self.rng.normal(
0, self.config.ft_noise_std, size=force_torque.shape
)
if self.rng.random() < self.config.ft_dropout_prob:
return np.zeros_like(force_torque)
return noisy_ft
def randomize_observation(self, obs: dict) -> dict:
"""Apply all sensor noise to observation dict."""
noisy_obs = {}
if "joint_pos" in obs and "joint_vel" in obs:
noisy_obs["joint_pos"], noisy_obs["joint_vel"] = (
self.add_joint_noise(obs["joint_pos"], obs["joint_vel"])
)
if "imu_accel" in obs and "imu_gyro" in obs:
noisy_obs["imu_accel"], noisy_obs["imu_gyro"] = (
self.add_imu_noise(obs["imu_accel"], obs["imu_gyro"])
)
if "force_torque" in obs:
noisy_obs["force_torque"] = self.add_ft_noise(
obs["force_torque"]
)
# Copy over non-randomized keys
for key in obs:
if key not in noisy_obs:
noisy_obs[key] = obs[key]
return noisy_obs
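The bias-instability random walk deserves a sanity check: its standard deviation grows as the square root of the step count, so over a long episode the accumulated gyro drift can dwarf the per-step noise. A quick numeric check of that scaling (the `step_std` value mirrors `imu_bias_instability` above; the episode length is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
step_std = 0.001  # per-step random-walk increment, as in the config above
n_steps = 2_500   # assumed control steps in one episode

# Simulate an ensemble of bias random walks and measure the final spread.
increments = rng.normal(0.0, step_std, size=(2000, n_steps))
final_bias = increments.sum(axis=1)  # accumulated bias at episode end
final_std = final_bias.std()

# Random-walk std grows as sqrt(n): 0.001 * sqrt(2500) = 0.05 rad/s,
# five times the per-step gyro noise of 0.01 rad/s.
expected = step_std * np.sqrt(n_steps)
```

This is why the drift is resampled and re-accumulated per episode: a policy trained against it learns to rely on relative rather than absolute IMU readings.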
Successful Case Studies
OpenAI Rubik's Cube (2019)
Paper: Solving Rubik's Cube with a Robot Hand
This is one of the most impressive demonstrations of domain randomization. OpenAI trained the Shadow Dexterous Hand entirely in simulation using Automatic Domain Randomization (ADR), a technique that automatically widens the randomization ranges once the policy masters the current ones.
Results:
- Hundreds of parameters randomized simultaneously: friction, mass, damping, lighting, camera, textures
- The policy develops emergent adaptation: it adjusts in real time to new dynamics it has never seen
- Sim-to-real success rate: ~84% for finger repositioning, an unprecedented result for dexterous manipulation
ADR workflow:
Step 1: Start with narrow range -- friction in [0.9, 1.1]
Step 2: Train policy until success rate > 80%
Step 3: Expand range -- friction in [0.7, 1.3]
Step 4: Repeat until max range
Result: friction in [0.1, 2.0]; the policy is so robust that the real world becomes "easy mode"
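The loop above can be sketched in a few lines. The expansion step size and the 80% threshold are placeholders for whatever trainer and evaluation you use, not the exact values from the paper:

```python
def adr_expand(
    lo: float,
    hi: float,
    success_rate: float,
    threshold: float = 0.8,
    step: float = 0.1,
    max_range: tuple = (0.1, 2.0),
) -> tuple[float, float]:
    """Widen [lo, hi] by `step` per side once the policy masters it."""
    if success_rate > threshold:
        lo = max(lo - step, max_range[0])
        hi = min(hi + step, max_range[1])
    return lo, hi

# Toy run: pretend the policy clears the bar every evaluation round.
lo, hi = 0.9, 1.1  # start narrow, around the nominal friction
for _ in range(100):
    lo, hi = adr_expand(lo, hi, success_rate=0.85)
# The range converges to the configured maximum, (0.1, 2.0).
```

Full ADR additionally expands each parameter's bound independently, based on per-boundary evaluation episodes, but the widen-on-success logic is the core of it.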
Agility Digit Humanoid Walking (2024)
Paper: Real-World Humanoid Locomotion with Reinforcement Learning
Researchers at UC Berkeley (Radosavovic et al.) trained a locomotion policy for Agility Robotics' Digit humanoid:
- Trained in Isaac Gym with 4096 parallel environments
- DR on: terrain (flat, rough, slopes), friction (0.3-1.5), mass distribution (+-20%), motor strength (+-15%)
- Used teacher-student distillation: teacher has privileged terrain info, student uses only proprioception
- Sim-to-real success rate: ~91% on diverse real-world terrains
ANYmal Parkour (2023)
Paper: ANYmal Parkour: Learning Agile Navigation for Quadrupedal Robots
ETH Zurich trained ANYmal quadruped to traverse extreme terrain:
- DR on terrain geometry, friction (0.2-1.5), external perturbations (random pushes)
- Curriculum from flat ground to parkour obstacles
- Sim-to-real success rate: ~93% for basic locomotion, ~87% for parkour tasks
Metrics and Evaluation
| Metric | Description | Target |
|---|---|---|
| Sim success rate (no DR) | Baseline performance | >95% |
| Sim success rate (full DR) | Robust performance | >85% |
| Sim-to-real success rate | Effective transfer | >80% |
| Sim-to-real gap | Difference sim vs real | <15% |
| Adaptation time | Time to adapt on real | <10 episodes |
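The gap metric in the table is simply the difference between sim and real success rates over matched evaluation rollouts. A tiny helper (the function name is this article's, not a standard API):

```python
def sim_to_real_metrics(
    sim_successes: list[bool], real_successes: list[bool]
) -> dict:
    """Compute success rates and the sim-to-real gap from rollout results."""
    sim_rate = sum(sim_successes) / len(sim_successes)
    real_rate = sum(real_successes) / len(real_successes)
    return {
        "sim_success_rate": sim_rate,
        "real_success_rate": real_rate,
        "gap": sim_rate - real_rate,
    }

# 18/20 successes in sim, 16/20 on the robot: a gap of 0.10,
# within the <15% target from the table.
metrics = sim_to_real_metrics(
    [True] * 18 + [False] * 2,
    [True] * 16 + [False] * 4,
)
```

With only 20 real rollouts the rate estimates are noisy, so in practice it is worth reporting confidence intervals alongside the point estimates.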
Anti-patterns: Common Mistakes
1. Over-randomization
Randomizing too widely can prevent the policy from learning anything at all. For example, randomizing friction from 0.01 to 10.0 is nonsensical: real-world friction coefficients mostly fall in the 0.2-1.5 range.
Solution: use system identification to find nominal values, then randomize ±30-50% around them.
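Building ranges around identified nominals can be a one-liner; the ±40% and the nominal values below are illustrative numbers inside the recommended band, not measurements:

```python
def range_around(nominal: float, rel_width: float = 0.4) -> tuple[float, float]:
    """Randomization range as nominal ± rel_width * nominal."""
    return nominal * (1 - rel_width), nominal * (1 + rel_width)

# Nominal values from system identification (illustrative numbers)
nominals = {"friction": 0.8, "mass_kg": 1.2, "damping": 0.05}
ranges = {name: range_around(value) for name, value in nominals.items()}
# e.g. friction maps to roughly (0.48, 1.12) instead of a blind (0.01, 10.0)
```

The relative width is a knob: start near 30% for parameters you identified well, and go toward 50% for parameters you could only measure coarsely.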
2. Correlated Parameters
In reality, many physical parameters are correlated: an object's material largely determines both its mass and its surface friction (compare metal vs. plastic). Randomizing them independently creates combinations that never occur in the real world.
import numpy as np

rng = np.random.default_rng()

# WRONG: randomize independently
mass = rng.uniform(0.1, 5.0)
friction = rng.uniform(0.1, 2.0)

# RIGHT: randomize with correlation by material
material_params = {
    "plastic": {"mass_range": (0.05, 0.5), "friction_range": (0.3, 0.6)},
    "metal": {"mass_range": (0.5, 5.0), "friction_range": (0.2, 0.5)},
    "rubber": {"mass_range": (0.1, 1.0), "friction_range": (0.8, 1.5)},
    "wood": {"mass_range": (0.1, 2.0), "friction_range": (0.4, 0.8)},
}
material = rng.choice(list(material_params))
params = material_params[material]
mass = rng.uniform(*params["mass_range"])
friction = rng.uniform(*params["friction_range"])
3. No Curriculum
Throwing the policy into full randomization from the start often prevents convergence. Start with narrow ranges and widen them gradually as training progresses.
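A linear DR curriculum can be as simple as interpolating each range from narrow to full as training progresses. A sketch; the linear schedule and the friction numbers are illustrative choices:

```python
def curriculum_range(
    narrow: tuple[float, float],
    full: tuple[float, float],
    progress: float,
) -> tuple[float, float]:
    """Interpolate a DR range from `narrow` to `full` as progress goes 0 -> 1."""
    t = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    lo = narrow[0] + t * (full[0] - narrow[0])
    hi = narrow[1] + t * (full[1] - narrow[1])
    return lo, hi

# Friction: start near nominal, end at the full DR range.
start, end = (0.7, 0.9), (0.3, 1.5)
early = curriculum_range(start, end, 0.0)   # (0.7, 0.9)
mid = curriculum_range(start, end, 0.5)     # (0.5, 1.2)
late = curriculum_range(start, end, 1.0)    # full (0.3, 1.5)
```

ADR (above) is the adaptive alternative: instead of a fixed schedule, the range widens only when the policy's measured success rate permits.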
4. Only Randomize Visuals, Ignore Dynamics
Many people focus only on visual DR because it is easier to implement, but for control tasks dynamics DR matters far more. A policy failing because of a 20% friction mismatch is more common than one failing because of different lighting.
References
Important papers on domain randomization:
- Tobin et al., 2017 -- Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World -- Foundational paper
- OpenAI, 2019 -- Solving Rubik's Cube (ADR) -- Automatic Domain Randomization
- Mehta et al., 2020 -- Active Domain Randomization -- Smarter DR parameter selection
- Muratore et al., 2022 -- Robot Learning from Randomized Simulations -- Comprehensive survey
Related Articles
- Sim-to-Real Transfer: Train in Simulation, Run in Reality -- Overview of sim-to-real transfer and key techniques
- Sim-to-Real Pipeline: From Training to Real Robot -- End-to-end guide to deploy policies on real robots
- Digital Twins and ROS 2: Simulation in Manufacturing -- Application of simulation in industry
- RL for Bipedal Walking: From MuJoCo to Reality -- Case study of RL locomotion