Isaac Lab -- Unified Framework for Robot Learning
In Part 1 I introduced Isaac Sim/Lab, and in Part 2 I walked you through MuJoCo from scratch. Now we get to the most exciting part: training locomotion policies with 4,096 parallel environments on the GPU using Isaac Lab.
Isaac Lab (the successor to Isaac Gym and Orbit) is NVIDIA's unified framework for robot learning. It combines:
- PhysX 5 (GPU-accelerated physics)
- RTX rendering (photorealistic)
- Domain randomization (built-in)
- RL training (RSL-RL, rl_games, Stable-Baselines3)
Latest version: Isaac Lab 2.2 (GA at SIGGRAPH 2025), running on Isaac Sim 5.0.
System Requirements
Before starting, verify:
| Requirement | Minimum | Recommended |
|---|---|---|
| GPU | NVIDIA RTX 3070 (8GB) | RTX 4080/4090 (16GB+) |
| VRAM | 8 GB | 16 GB+ |
| RAM | 32 GB | 64 GB |
| OS | Ubuntu 22.04 | Ubuntu 22.04 |
| Driver | NVIDIA 535+ | 550+ |
| CUDA | 12.1+ | 12.4+ |
| Disk | 50 GB free | 100 GB+ (Omniverse + assets) |
Important: Isaac Lab does not support native Windows or macOS; Ubuntu is the only officially supported OS. WSL2 can run it, but it is not recommended for training.
Step 1: Install Isaac Sim and Isaac Lab
1a. Install Isaac Sim 5.0
# Method 1: Pip install (recommended for Isaac Lab)
pip install isaacsim==5.0.0 --extra-index-url https://pypi.nvidia.com
# Method 2: Omniverse Launcher
# Download from https://developer.nvidia.com/omniverse
# Install Isaac Sim from Launcher
1b. Clone and Install Isaac Lab
# Clone Isaac Lab
git clone https://github.com/isaac-sim/IsaacLab.git
cd IsaacLab
# Install Isaac Lab and dependencies
./isaaclab.sh --install
# Choose: rsl_rl (for locomotion training)
# Verify installation
./isaaclab.sh -p scripts/tutorials/00_sim/create_empty.py
If the Omniverse viewport appears with an empty scene, the installation was successful.
1c. Verify GPU
"""
Check GPU and Isaac Sim are ready.
"""
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"VRAM: {torch.cuda.get_device_properties(0).total_mem / 1e9:.1f} GB")
# Test Isaac Sim import
from isaacsim import SimulationApp
app = SimulationApp({"headless": True})
print("Isaac Sim initialized successfully!")
app.close()
Step 2: Understand Isaac Lab Architecture
Isaac Lab uses Manager-Based architecture:
ManagerBasedRLEnv
├── InteractiveScene      # Spawn robots, objects, lights
│   ├── ArticulationCfg   # Robot definition
│   ├── RigidObjectCfg    # Objects
│   └── Sensor configs    # Cameras, LiDAR, contact sensors
├── ObservationManager    # Build the observation vector
├── ActionManager         # Process actions from the policy
├── CommandManager        # Sample commands (e.g. target velocities)
├── RewardManager         # Compute reward terms
├── TerminationManager    # Check whether episodes are done
├── EventManager          # Domain randomization
└── CurriculumManager     # Gradually increase difficulty
Each manager is configured via Python dataclasses -- no XML/YAML needed.
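As a quick illustration of how the pieces fit together, here is a minimal sketch that builds a manager-based environment directly from its config dataclass and steps it with zero actions. It assumes the completed QuadrupedFlatEnvCfg assembled in Steps 3-4 (including the top-level config sketched at the end of Step 4), and that Isaac Sim is already running, e.g. headless via SimulationApp as in Step 1c.

import torch
from isaaclab.envs import ManagerBasedRLEnv

# Build the config (a plain Python dataclass) and override fields in code.
env_cfg = QuadrupedFlatEnvCfg()
env_cfg.scene.num_envs = 64

env = ManagerBasedRLEnv(cfg=env_cfg)
obs, _ = env.reset()
for _ in range(200):
    # Zero actions just exercise the managers; a trained policy goes here.
    actions = torch.zeros(
        env.num_envs, env.action_manager.total_action_dim, device=env.device
    )
    obs, reward, terminated, truncated, info = env.step(actions)
env.close()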
Step 3: Create Locomotion Environment
This is the environment config for quadruped locomotion training. The robot will learn to walk on flat terrain with velocity commands.
"""
Isaac Lab Quadruped Locomotion Environment Config
File: envs/quadruped_flat_env_cfg.py
"""
import math
from dataclasses import MISSING
import isaaclab.sim as sim_utils
from isaaclab.assets import ArticulationCfg, AssetBaseCfg
from isaaclab.envs import ManagerBasedRLEnvCfg
from isaaclab.managers import (
CurriculumTermCfg,
EventTermCfg,
ObservationGroupCfg,
ObservationTermCfg,
RewardTermCfg,
SceneEntityCfg,
TerminationTermCfg,
)
from isaaclab.scene import InteractiveSceneCfg
from isaaclab.terrains import TerrainImporterCfg
from isaaclab.utils import configclass
from isaaclab.utils.noise import AdditiveUniformNoiseCfg
import isaaclab_tasks.manager_based.locomotion.velocity.mdp as mdp
@configclass
class QuadrupedFlatSceneCfg(InteractiveSceneCfg):
"""Scene configuration -- ground plane + robot."""
# Ground plane
ground = AssetBaseCfg(
prim_path="/World/defaultGroundPlane",
spawn=sim_utils.GroundPlaneCfg(),
)
# Distant light
light = AssetBaseCfg(
prim_path="/World/Light",
spawn=sim_utils.DistantLightCfg(
intensity=3000.0,
color=(0.75, 0.75, 0.75),
),
)
# Robot: ANYmal-D quadruped
robot: ArticulationCfg = ArticulationCfg(
prim_path="{ENV_REGEX_NS}/Robot",
spawn=sim_utils.UsdFileCfg(
usd_path=(
"isaaclab_assets/Robots/ANYbotics/ANYmal-D/anymal_d.usd"
),
activate_contact_sensors=True,
rigid_props=sim_utils.RigidBodyPropertiesCfg(
disable_gravity=False,
retain_accelerations=False,
),
articulation_props=sim_utils.ArticulationRootPropertiesCfg(
enabled_self_collisions=False,
solver_position_iteration_count=4,
solver_velocity_iteration_count=4,
),
),
init_state=ArticulationCfg.InitialStateCfg(
pos=(0.0, 0.0, 0.6),
joint_pos={
".*HAA": 0.0, # Hip Abduction/Adduction
".*HFE": 0.4, # Hip Flexion/Extension
".*KFE": -0.8, # Knee Flexion/Extension
},
),
)
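    # Contact sensor on every robot body. The contact-based reward and
    # termination terms further below reference it by the name "contact_forces".
    # Sketch following the official velocity-task configs; it also requires
    # `from isaaclab.sensors import ContactSensorCfg` in the imports above.
    contact_forces = ContactSensorCfg(
        prim_path="{ENV_REGEX_NS}/Robot/.*",
        history_length=3,
        track_air_time=True,
    )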
@configclass
class ObservationsCfg:
"""Observation space -- what the policy sees."""
@configclass
class PolicyCfg(ObservationGroupCfg):
# Base velocity (3) -- linear velocity
base_lin_vel = ObservationTermCfg(
func=mdp.base_lin_vel,
noise=AdditiveUniformNoiseCfg(n_min=-0.1, n_max=0.1),
)
# Base angular velocity (3)
base_ang_vel = ObservationTermCfg(
func=mdp.base_ang_vel,
noise=AdditiveUniformNoiseCfg(n_min=-0.2, n_max=0.2),
)
# Projected gravity (3) -- orientation info
projected_gravity = ObservationTermCfg(
func=mdp.projected_gravity,
noise=AdditiveUniformNoiseCfg(n_min=-0.05, n_max=0.05),
)
# Velocity command (3) -- target vx, vy, yaw_rate
velocity_commands = ObservationTermCfg(
func=mdp.generated_commands,
params={"command_name": "base_velocity"},
)
# Joint positions (12)
joint_pos = ObservationTermCfg(
func=mdp.joint_pos_rel,
noise=AdditiveUniformNoiseCfg(n_min=-0.01, n_max=0.01),
)
# Joint velocities (12)
joint_vel = ObservationTermCfg(
func=mdp.joint_vel_rel,
noise=AdditiveUniformNoiseCfg(n_min=-1.5, n_max=1.5),
)
# Previous actions (12)
actions = ObservationTermCfg(func=mdp.last_action)
policy: PolicyCfg = PolicyCfg()
@configclass
class RewardsCfg:
"""Reward terms -- guide the policy."""
# === Positive rewards ===
# Track linear velocity command
track_lin_vel_xy_exp = RewardTermCfg(
func=mdp.track_lin_vel_xy_exp,
weight=1.5,
params={"command_name": "base_velocity", "std": math.exp(-0.25)},
)
# Track angular velocity command
track_ang_vel_z_exp = RewardTermCfg(
func=mdp.track_ang_vel_z_exp,
weight=0.75,
params={"command_name": "base_velocity", "std": math.exp(-0.25)},
)
# === Negative rewards (penalties) ===
# Penalize z-axis linear velocity (bouncing)
lin_vel_z_l2 = RewardTermCfg(
func=mdp.lin_vel_z_l2, weight=-2.0
)
# Penalize xy angular velocity (rolling/pitching)
ang_vel_xy_l2 = RewardTermCfg(
func=mdp.ang_vel_xy_l2, weight=-0.05
)
# Penalize joint torques
joint_torques_l2 = RewardTermCfg(
func=mdp.joint_torques_l2, weight=-0.0001
)
# Penalize joint acceleration
joint_acc_l2 = RewardTermCfg(
func=mdp.joint_acc_l2, weight=-2.5e-7
)
# Penalize action rate (smooth actions)
action_rate_l2 = RewardTermCfg(
func=mdp.action_rate_l2, weight=-0.01
)
# Penalize undesired contacts (body touching ground)
undesired_contacts = RewardTermCfg(
func=mdp.undesired_contacts,
weight=-1.0,
params={
"sensor_cfg": SceneEntityCfg(
"contact_forces", body_names=".*THIGH"
),
"threshold": 1.0,
},
)
@configclass
class TerminationsCfg:
"""When to end an episode."""
time_out = TerminationTermCfg(
func=mdp.time_out, time_out=True
)
base_contact = TerminationTermCfg(
func=mdp.illegal_contact,
params={
"sensor_cfg": SceneEntityCfg(
"contact_forces", body_names="base"
),
"threshold": 1.0,
},
)
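The observation terms above reference a "base_velocity" command and the previous action, so the same config file also needs action and command definitions. The sketch below is modeled on Isaac Lab's velocity-task configs; the parameter values are illustrative assumptions, not tuned settings.

@configclass
class ActionsCfg:
    """Action space -- joint position targets for all 12 joints."""

    joint_pos = mdp.JointPositionActionCfg(
        asset_name="robot",
        joint_names=[".*"],
        scale=0.5,
        use_default_offset=True,
    )

@configclass
class CommandsCfg:
    """Velocity commands the policy must track."""

    base_velocity = mdp.UniformVelocityCommandCfg(
        asset_name="robot",
        resampling_time_range=(10.0, 10.0),
        rel_standing_envs=0.02,
        heading_command=True,
        heading_control_stiffness=0.5,
        ranges=mdp.UniformVelocityCommandCfg.Ranges(
            lin_vel_x=(-1.0, 1.0),
            lin_vel_y=(-1.0, 1.0),
            ang_vel_z=(-1.0, 1.0),
            heading=(-math.pi, math.pi),
        ),
    )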
Step 4: Domain Randomization Config
This is the most important part for sim-to-real transfer. Isaac Lab has a built-in event system for domain randomization:
"""
Domain Randomization Events
File: envs/quadruped_flat_env_cfg.py (continued)
"""
@configclass
class EventsCfg:
"""Domain randomization -- key to sim-to-real transfer."""
# === Reset events (each episode) ===
# Random initial base position/orientation
reset_base = EventTermCfg(
func=mdp.reset_root_state_uniform,
mode="reset",
params={
"pose_range": {
"x": (-0.5, 0.5),
"y": (-0.5, 0.5),
"yaw": (-3.14, 3.14),
},
"velocity_range": {
"x": (-0.5, 0.5),
"y": (-0.5, 0.5),
"z": (-0.5, 0.5),
},
},
)
# Random initial joint positions
reset_joints = EventTermCfg(
func=mdp.reset_joints_by_offset,
mode="reset",
params={
"position_range": (-0.2, 0.2),
"velocity_range": (-0.5, 0.5),
},
)
# === Interval events (periodically within episode) ===
# Random external push (test balance recovery)
push_robot = EventTermCfg(
func=mdp.push_by_setting_velocity,
mode="interval",
interval_range_s=(10.0, 15.0),
params={
"velocity_range": {
"x": (-0.5, 0.5),
"y": (-0.5, 0.5),
},
},
)
# === Startup events (once when creating env) ===
# Randomize physics material (friction)
physics_material = EventTermCfg(
func=mdp.randomize_rigid_body_material,
mode="startup",
params={
"asset_cfg": SceneEntityCfg("robot", body_names=".*"),
"static_friction_range": (0.4, 1.2),
"dynamic_friction_range": (0.4, 1.0),
"restitution_range": (0.0, 0.4),
"num_buckets": 64,
},
)
# Randomize robot mass
add_base_mass = EventTermCfg(
func=mdp.randomize_rigid_body_mass,
mode="startup",
params={
"asset_cfg": SceneEntityCfg("robot", body_names="base"),
"mass_distribution_params": (-3.0, 3.0),
"operation": "add",
},
)
# Randomize joint properties
randomize_actuator_gains = EventTermCfg(
func=mdp.randomize_actuator_gains,
mode="startup",
params={
"asset_cfg": SceneEntityCfg("robot", joint_names=".*"),
"stiffness_distribution_params": (0.8, 1.2),
"damping_distribution_params": (0.8, 1.2),
"operation": "scale",
},
)
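One piece the listings above leave out is the top-level class that wires all of these managers into a single environment config. Below is a sketch of how it typically looks, using the ActionsCfg and CommandsCfg sketched in Step 3; the values for decimation, episode length, and physics step are illustrative, and the registered Isaac-Velocity-Flat-Anymal-D-v0 task ships its own tuned version.

@configclass
class QuadrupedFlatEnvCfg(ManagerBasedRLEnvCfg):
    """Top-level environment config tying all managers together."""

    scene: QuadrupedFlatSceneCfg = QuadrupedFlatSceneCfg(num_envs=4096, env_spacing=2.5)
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()
    commands: CommandsCfg = CommandsCfg()
    rewards: RewardsCfg = RewardsCfg()
    terminations: TerminationsCfg = TerminationsCfg()
    events: EventsCfg = EventsCfg()

    def __post_init__(self):
        # 50 Hz control (decimation of 4) on top of 200 Hz physics
        self.decimation = 4
        self.episode_length_s = 20.0
        self.sim.dt = 0.005
        self.sim.render_interval = self.decimation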
Step 5: Train with RSL-RL
RSL-RL Config
"""
RSL-RL PPO training config.
File: agents/rsl_rl_ppo_cfg.py
"""
from isaaclab_rl.rsl_rl import (
    RslRlOnPolicyRunnerCfg,
    RslRlPpoActorCriticCfg,
    RslRlPpoAlgorithmCfg,
)

QUADRUPED_PPO_CFG = RslRlOnPolicyRunnerCfg(
    num_steps_per_env=24,
    max_iterations=1500,
    save_interval=100,
    experiment_name="quadruped_flat",
    empirical_normalization=False,
    # Actor-Critic MLP
    policy=RslRlPpoActorCriticCfg(
        init_noise_std=1.0,
        actor_hidden_dims=[512, 256, 128],
        critic_hidden_dims=[512, 256, 128],
        activation="elu",
    ),
    # PPO hyperparameters
    algorithm=RslRlPpoAlgorithmCfg(
        num_learning_epochs=5,
        num_mini_batches=4,
        clip_param=0.2,
        gamma=0.99,
        lam=0.95,
        learning_rate=1e-3,
        max_grad_norm=1.0,
        value_loss_coef=1.0,
        use_clipped_value_loss=True,
        entropy_coef=0.01,
        # Adaptive LR schedule driven by the KL divergence
        desired_kl=0.01,
        schedule="adaptive",
    ),
)
Launch training
# Train with 4096 parallel environments (headless mode)
cd IsaacLab
./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \
--task Isaac-Velocity-Flat-Anymal-D-v0 \
--num_envs 4096 \
--headless \
--max_iterations 1500
# Performance benchmark:
# RTX 4090: ~90,000 FPS -> ~1500 iterations in ~20 minutes
# RTX 3080: ~50,000 FPS -> ~1500 iterations in ~35 minutes
Training output will look like:
[INFO] Starting training...
[INFO] Env: Isaac-Velocity-Flat-Anymal-D-v0 | Num envs: 4096
[INFO] Policy: MLP [512, 256, 128] | Activation: ELU
---
Iter | Reward | Ep. Len | FPS | LR
0 | -12.34 | 200 | 89542 | 1.00e-03
100 | 2.15 | 350 | 91203 | 8.50e-04
200 | 8.73 | 500 | 90876 | 6.20e-04
500 | 15.42 | 800 | 91105 | 3.10e-04
1000 | 19.87 | 1000 | 90234 | 1.50e-04
1500 | 21.56 | 1000 | 91342 | 1.00e-04
---
[INFO] Training complete! Model saved to: logs/rsl_rl/quadruped_flat/
Monitor with TensorBoard
# In a different terminal
tensorboard --logdir logs/rsl_rl/quadruped_flat/ --port 6006
# Open browser: http://localhost:6006
Important metrics to track:
- Mean reward: Gradually increases and converges -- sign of successful training
- Episode length: Increases (robot doesn't fall early)
- Policy loss: Decreases gradually and stabilizes
- Value loss: Decreases gradually
- Learning rate: Decreases per adaptive schedule
- KL divergence: Stays around desired_kl (0.01)
Step 6: Evaluate and Visualize
# Play trained policy with rendering
./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/play.py \
--task Isaac-Velocity-Flat-Anymal-D-v0 \
--num_envs 32 \
--checkpoint logs/rsl_rl/quadruped_flat/model_1500.pt
Or evaluate in Python:
"""
Evaluate trained policy and print statistics.
"""
import torch
import numpy as np
from isaaclab.envs import ManagerBasedRLEnv
def evaluate_policy(env, policy, num_episodes=100):
"""Run policy and calculate success metrics."""
episode_rewards = []
episode_lengths = []
obs, _ = env.reset()
current_rewards = torch.zeros(env.num_envs, device=env.device)
current_lengths = torch.zeros(env.num_envs, device=env.device)
while len(episode_rewards) < num_episodes:
        with torch.no_grad():
            # ManagerBasedRLEnv returns observations as a dict of groups;
            # the policy consumes the "policy" group.
            actions = policy(obs["policy"])
obs, rewards, dones, truncated, infos = env.step(actions)
current_rewards += rewards
current_lengths += 1
# Collect finished episodes
done_mask = dones | truncated
if done_mask.any():
done_indices = done_mask.nonzero(as_tuple=False).squeeze(-1)
for idx in done_indices:
episode_rewards.append(current_rewards[idx].item())
episode_lengths.append(current_lengths[idx].item())
current_rewards[done_indices] = 0
current_lengths[done_indices] = 0
# Statistics
rewards_arr = np.array(episode_rewards[:num_episodes])
lengths_arr = np.array(episode_lengths[:num_episodes])
print(f"\n=== Evaluation ({num_episodes} episodes) ===")
print(f"Reward: {rewards_arr.mean():.2f} +/- {rewards_arr.std():.2f}")
print(f"Ep Length: {lengths_arr.mean():.0f} +/- {lengths_arr.std():.0f}")
print(f"Max reward: {rewards_arr.max():.2f}")
print(f"Min reward: {rewards_arr.min():.2f}")
return rewards_arr, lengths_arr
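To call this helper, the task has to be created the same way the play script does it and a policy loaded, e.g. from the TorchScript export in Step 7. The snippet below is a minimal sketch under those assumptions; it presumes Isaac Sim is already launched (headless is fine) and that the isaaclab_tasks utilities (parse_env_cfg) match your installed version.

import gymnasium as gym
import torch
import isaaclab_tasks  # noqa: F401  (registers the Isaac-* Gym tasks)
from isaaclab_tasks.utils import parse_env_cfg

env_cfg = parse_env_cfg("Isaac-Velocity-Flat-Anymal-D-v0", num_envs=32)
env = gym.make("Isaac-Velocity-Flat-Anymal-D-v0", cfg=env_cfg)

policy = torch.jit.load("quadruped_policy.pt").to(env.unwrapped.device)
evaluate_policy(env.unwrapped, policy, num_episodes=100)
env.close()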
Step 7: Export Policy for Deployment
"""
Export trained policy to TorchScript for deployment.
Can run on Jetson, robot controller, or ROS 2 node.
"""
import torch
import os
def export_policy(checkpoint_path, output_path, obs_dim=48, act_dim=12):
"""Export RSL-RL policy to TorchScript."""
# Load checkpoint
checkpoint = torch.load(checkpoint_path, map_location="cpu")
print(f"Loaded checkpoint: {checkpoint_path}")
# Create dummy input for tracing
dummy_obs = torch.randn(1, obs_dim)
# Load actor network
from rsl_rl.modules import ActorCritic
    # RSL-RL's ActorCritic takes separate observation sizes for actor and critic.
    actor_critic = ActorCritic(
        num_actor_obs=obs_dim,
        num_critic_obs=obs_dim,
        num_actions=act_dim,
        actor_hidden_dims=[512, 256, 128],
        critic_hidden_dims=[512, 256, 128],
        activation="elu",
    )
actor_critic.load_state_dict(checkpoint["model_state_dict"])
actor_critic.eval()
# Export only actor (don't need critic for inference)
class PolicyWrapper(torch.nn.Module):
def __init__(self, actor):
super().__init__()
self.actor = actor
def forward(self, obs):
return self.actor(obs)
wrapper = PolicyWrapper(actor_critic.actor)
traced = torch.jit.trace(wrapper, dummy_obs)
traced.save(output_path)
# Verify
loaded = torch.jit.load(output_path)
test_out = loaded(dummy_obs)
print(f"Exported to: {output_path}")
print(f"Input shape: {dummy_obs.shape}")
print(f"Output shape: {test_out.shape}")
print(f"File size: {os.path.getsize(output_path) / 1024:.1f} KB")
# Export
export_policy(
checkpoint_path="logs/rsl_rl/quadruped_flat/model_1500.pt",
output_path="quadruped_policy.pt",
obs_dim=48,
act_dim=12,
)
The exported policy can be deployed via the following (a minimal loading sketch follows the list):
- ROS 2 node (see sim-to-real article for details)
- NVIDIA Jetson (see Edge AI article)
- Isaac ROS for NVIDIA robot stack
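Whichever target you choose, the runtime side boils down to loading the TorchScript file and calling it inside the control loop. The sketch below is framework-agnostic; the observation assembly is schematic (the 48-dim vector must be built from the robot's state estimate, the velocity command, and the last action, with the same ordering and scaling used during training), and helper names like assemble_observation are hypothetical.

import torch

# Load the exported policy once at startup (CPU is fine for a small MLP).
policy = torch.jit.load("quadruped_policy.pt")
policy.eval()

def compute_action(obs_vector):
    """obs_vector: 48-dim observation built exactly as in ObservationsCfg."""
    obs = torch.as_tensor(obs_vector, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        action = policy(obs)  # 12 joint position offsets
    return action.squeeze(0).numpy()

# Inside a 50 Hz control loop (pseudocode):
#   obs = assemble_observation(state_estimate, velocity_command, last_action)
#   q_target = default_joint_pos + action_scale * compute_action(obs)
#   send_to_joint_controllers(q_target)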
Tips and Best Practices
1. VRAM Management
# Check VRAM usage during training
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 5
# Reduce num_envs if running out of VRAM:
# RTX 3070 (8GB): --num_envs 2048
# RTX 3080 (10GB): --num_envs 3072
# RTX 4080 (16GB): --num_envs 4096
# RTX 4090 (24GB): --num_envs 8192
2. Reward Tuning
- Start with positive rewards (velocity tracking) first; add penalties afterwards
- Use exponential tracking rewards (track_lin_vel_xy_exp) instead of plain L2 terms -- they are more stable
- The action-rate penalty is critical -- without it the policy outputs jerky actions
3. Domain Randomization
- Start with minimal randomization, increase gradually once policy learns basic locomotion
- Friction randomization is most important for sim-to-real
- External push tests balance recovery -- critical for real-world robustness
- Mass randomization of ±3 kg is reasonable for a quadruped
4. Debug
# Run with rendering to see what robot does
./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \
--task Isaac-Velocity-Flat-Anymal-D-v0 \
--num_envs 16
# Don't use --headless -> viewport will appear
# Reduce num_envs to observe individual robots
Summary
In this article, you learned:
- How to install Isaac Sim 5.0 + Isaac Lab 2.2
- The architecture of Isaac Lab's manager-based design
- How to create an environment for quadruped locomotion
- How to configure domain randomization (friction, mass, pushes, actuator gains)
- How to train with RSL-RL PPO across 4,096 parallel envs (~20 minutes on an RTX 4090)
- How to evaluate a policy and export it for deployment
Complete pipeline:
Isaac Lab env config
→ Domain randomization setup
→ RSL-RL PPO training (4096 envs, ~1500 iters)
→ TensorBoard monitoring
→ Evaluate (>90% success)
→ Export TorchScript
→ Deploy (ROS 2 / Jetson / Isaac ROS)
Isaac Lab is currently the most powerful tool for large-scale robot RL training. Combined with good domain randomization, you can train policies that transfer zero-shot to real robots.
Related Articles
- Simulation for Robotics: MuJoCo vs Isaac Sim vs Gazebo -- Compare 3 leading simulators
- Getting Started with MuJoCo: Installation to First Robot Simulation -- Hands-on MuJoCo tutorial
- Sim-to-Real Transfer: Train in Simulation, Run in Reality -- Domain randomization and sim-to-real best practices
- RL for Bipedal Walking -- Reinforcement learning for bipedal robots
- Edge AI with NVIDIA Jetson -- Deploy models to edge devices