
NVIDIA Isaac Lab: GPU-Accelerated RL Training from Zero

Set up Isaac Lab and train locomotion policies with 4,096 parallel environments and domain randomization, all on the GPU.

Nguyen Anh Tuan · April 1, 2026 · 11 min read

Isaac Lab -- Unified Framework for Robot Learning

In Part 1 I introduced Isaac Sim/Lab, and in Part 2 I walked you through MuJoCo from zero. Now comes the most exciting part: training locomotion policies with 4,096 parallel environments on the GPU using Isaac Lab.

Isaac Lab (formerly Isaac Gym + Orbit) is NVIDIA's unified framework for robot learning. It combines the massively parallel GPU physics of Isaac Gym with the modular, manager-based environment design of Orbit, all built on top of Isaac Sim.

Latest version: Isaac Lab 2.2 (GA at SIGGRAPH 2025), running on Isaac Sim 5.0.

System Requirements

Before starting, verify:

Requirement   Minimum                   Recommended
GPU           NVIDIA RTX 3070 (8 GB)    RTX 4080/4090 (16 GB+)
VRAM          8 GB                      16 GB+
RAM           32 GB                     64 GB
OS            Ubuntu 22.04              Ubuntu 22.04
Driver        NVIDIA 535+               550+
CUDA          12.1+                     12.4+
Disk          50 GB free                100 GB+ (Omniverse + assets)

Important: Isaac Lab does NOT support native Windows or macOS; only Ubuntu is officially supported. WSL2 can run it, but it is not recommended for training.

Step 1: Install Isaac Sim and Isaac Lab

1a. Install Isaac Sim 5.0

# Method 1: Pip install (recommended for Isaac Lab)
pip install isaacsim==5.0.0 --extra-index-url https://pypi.nvidia.com

# Method 2: Omniverse Launcher
# Download from https://developer.nvidia.com/omniverse
# Install Isaac Sim from Launcher

1b. Clone and Install Isaac Lab

# Clone Isaac Lab
git clone https://github.com/isaac-sim/IsaacLab.git
cd IsaacLab

# Install Isaac Lab and its dependencies, passing the RL framework
# directly (rsl_rl is what we use for locomotion training)
./isaaclab.sh --install rsl_rl

# Verify installation
./isaaclab.sh -p scripts/tutorials/00_sim/create_empty.py

If you see the Omniverse viewport appear with an empty scene, installation was successful.

1c. Verify GPU

"""
Check GPU and Isaac Sim are ready.
"""
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"VRAM: {torch.cuda.get_device_properties(0).total_mem / 1e9:.1f} GB")

# Test Isaac Sim import
from isaacsim import SimulationApp
app = SimulationApp({"headless": True})
print("Isaac Sim initialized successfully!")
app.close()

[Image: NVIDIA GPU cluster for robot learning training]

Step 2: Understand Isaac Lab Architecture

Isaac Lab uses Manager-Based architecture:

ManagerBasedRLEnv
├── SceneManager         # Spawn robots, objects, lights
│   ├── ArticulationCfg  # Robot definition
│   ├── RigidObjectCfg   # Objects
│   └── SensorsCfg       # Cameras, LiDAR, contact
├── ObservationManager   # Create observation vector
├── ActionManager        # Process actions from policy
├── RewardManager        # Calculate reward signals
├── TerminationManager   # Check episode done
├── EventManager         # Domain randomization
└── CurriculumManager    # Gradually increase difficulty

Each manager is configured via Python dataclasses -- no XML/YAML needed.
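Because each term is just a Python function over batched tensors, adding a custom one is easy. Below is a minimal sketch of a hypothetical custom reward term (base_height_penalty is illustrative, not part of the built-in mdp module):

"""
Hypothetical custom reward term -- a sketch, not built-in.
Every term receives the env and returns one value per parallel env.
"""
import torch

from isaaclab.assets import Articulation
from isaaclab.envs import ManagerBasedRLEnv
from isaaclab.managers import SceneEntityCfg


def base_height_penalty(
    env: ManagerBasedRLEnv,
    target_height: float,
    asset_cfg: SceneEntityCfg = SceneEntityCfg("robot"),
) -> torch.Tensor:
    """Squared deviation of the base height from a target, per env."""
    asset: Articulation = env.scene[asset_cfg.name]
    return torch.square(asset.data.root_pos_w[:, 2] - target_height)


# Wiring it into a RewardsCfg (weight and target are assumptions):
# base_height = RewardTermCfg(
#     func=base_height_penalty, weight=-1.0, params={"target_height": 0.6}
# )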

Step 3: Create Locomotion Environment

This is the environment config for quadruped locomotion training. The robot will learn to walk on flat terrain with velocity commands.

"""
Isaac Lab Quadruped Locomotion Environment Config
File: envs/quadruped_flat_env_cfg.py
"""
import math
from dataclasses import MISSING

import isaaclab.sim as sim_utils
from isaaclab.assets import ArticulationCfg, AssetBaseCfg
from isaaclab.envs import ManagerBasedRLEnvCfg
from isaaclab.managers import (
    CurriculumTermCfg,
    EventTermCfg,
    ObservationGroupCfg,
    ObservationTermCfg,
    RewardTermCfg,
    SceneEntityCfg,
    TerminationTermCfg,
)
from isaaclab.actuators import ImplicitActuatorCfg
from isaaclab.scene import InteractiveSceneCfg
from isaaclab.sensors import ContactSensorCfg
from isaaclab.utils import configclass
from isaaclab.utils.assets import ISAACLAB_NUCLEUS_DIR
from isaaclab.utils.noise import AdditiveUniformNoiseCfg

import isaaclab_tasks.manager_based.locomotion.velocity.mdp as mdp


@configclass
class QuadrupedFlatSceneCfg(InteractiveSceneCfg):
    """Scene configuration -- ground plane + robot."""

    # Ground plane
    ground = AssetBaseCfg(
        prim_path="/World/defaultGroundPlane",
        spawn=sim_utils.GroundPlaneCfg(),
    )

    # Distant light
    light = AssetBaseCfg(
        prim_path="/World/Light",
        spawn=sim_utils.DistantLightCfg(
            intensity=3000.0,
            color=(0.75, 0.75, 0.75),
        ),
    )

    # Robot: ANYmal-D quadruped
    robot: ArticulationCfg = ArticulationCfg(
        prim_path="{ENV_REGEX_NS}/Robot",
        spawn=sim_utils.UsdFileCfg(
            usd_path=(
                f"{ISAACLAB_NUCLEUS_DIR}/Robots/ANYbotics/ANYmal-D/anymal_d.usd"
            ),
            activate_contact_sensors=True,
            rigid_props=sim_utils.RigidBodyPropertiesCfg(
                disable_gravity=False,
                retain_accelerations=False,
            ),
            articulation_props=sim_utils.ArticulationRootPropertiesCfg(
                enabled_self_collisions=False,
                solver_position_iteration_count=4,
                solver_velocity_iteration_count=4,
            ),
        ),
        init_state=ArticulationCfg.InitialStateCfg(
            pos=(0.0, 0.0, 0.6),
            joint_pos={
                ".*HAA": 0.0,     # Hip Abduction/Adduction
                ".*HFE": 0.4,     # Hip Flexion/Extension
                ".*KFE": -0.8,    # Knee Flexion/Extension
            },
        ),
        # ArticulationCfg requires an actuator model; simple implicit PD
        # gains here (the official ANYMAL_D_CFG uses a learned actuator
        # network instead, and the gains below are illustrative)
        actuators={
            "legs": ImplicitActuatorCfg(
                joint_names_expr=[".*HAA", ".*HFE", ".*KFE"],
                stiffness=80.0,
                damping=2.0,
            ),
        },
    )

    # Contact sensor on every robot body -- required by the
    # "contact_forces" lookups in RewardsCfg and TerminationsCfg
    contact_forces = ContactSensorCfg(
        prim_path="{ENV_REGEX_NS}/Robot/.*",
        history_length=3,
        track_air_time=True,
    )


@configclass
class ObservationsCfg:
    """Observation space -- what the policy sees."""

    @configclass
    class PolicyCfg(ObservationGroupCfg):
        # Base velocity (3) -- linear velocity
        base_lin_vel = ObservationTermCfg(
            func=mdp.base_lin_vel,
            noise=AdditiveUniformNoiseCfg(n_min=-0.1, n_max=0.1),
        )
        # Base angular velocity (3)
        base_ang_vel = ObservationTermCfg(
            func=mdp.base_ang_vel,
            noise=AdditiveUniformNoiseCfg(n_min=-0.2, n_max=0.2),
        )
        # Projected gravity (3) -- orientation info
        projected_gravity = ObservationTermCfg(
            func=mdp.projected_gravity,
            noise=AdditiveUniformNoiseCfg(n_min=-0.05, n_max=0.05),
        )
        # Velocity command (3) -- target vx, vy, yaw_rate
        velocity_commands = ObservationTermCfg(
            func=mdp.generated_commands,
            params={"command_name": "base_velocity"},
        )
        # Joint positions (12)
        joint_pos = ObservationTermCfg(
            func=mdp.joint_pos_rel,
            noise=AdditiveUniformNoiseCfg(n_min=-0.01, n_max=0.01),
        )
        # Joint velocities (12)
        joint_vel = ObservationTermCfg(
            func=mdp.joint_vel_rel,
            noise=AdditiveUniformNoiseCfg(n_min=-1.5, n_max=1.5),
        )
        # Previous actions (12)
        actions = ObservationTermCfg(func=mdp.last_action)

    policy: PolicyCfg = PolicyCfg()


@configclass
class RewardsCfg:
    """Reward terms -- guide the policy."""

    # === Positive rewards ===
    # Track linear velocity command
    track_lin_vel_xy_exp = RewardTermCfg(
        func=mdp.track_lin_vel_xy_exp,
        weight=1.5,
        params={"command_name": "base_velocity", "std": math.exp(-0.25)},
    )
    # Track angular velocity command
    track_ang_vel_z_exp = RewardTermCfg(
        func=mdp.track_ang_vel_z_exp,
        weight=0.75,
        params={"command_name": "base_velocity", "std": math.exp(-0.25)},
    )

    # === Negative rewards (penalties) ===
    # Penalize z-axis linear velocity (bouncing)
    lin_vel_z_l2 = RewardTermCfg(
        func=mdp.lin_vel_z_l2, weight=-2.0
    )
    # Penalize xy angular velocity (rolling/pitching)
    ang_vel_xy_l2 = RewardTermCfg(
        func=mdp.ang_vel_xy_l2, weight=-0.05
    )
    # Penalize joint torques
    joint_torques_l2 = RewardTermCfg(
        func=mdp.joint_torques_l2, weight=-0.0001
    )
    # Penalize joint acceleration
    joint_acc_l2 = RewardTermCfg(
        func=mdp.joint_acc_l2, weight=-2.5e-7
    )
    # Penalize action rate (smooth actions)
    action_rate_l2 = RewardTermCfg(
        func=mdp.action_rate_l2, weight=-0.01
    )
    # Penalize undesired contacts (body touching ground)
    undesired_contacts = RewardTermCfg(
        func=mdp.undesired_contacts,
        weight=-1.0,
        params={
            "sensor_cfg": SceneEntityCfg(
                "contact_forces", body_names=".*THIGH"
            ),
            "threshold": 1.0,
        },
    )


@configclass
class TerminationsCfg:
    """When to end an episode."""

    time_out = TerminationTermCfg(
        func=mdp.time_out, time_out=True
    )
    base_contact = TerminationTermCfg(
        func=mdp.illegal_contact,
        params={
            "sensor_cfg": SceneEntityCfg(
                "contact_forces", body_names="base"
            ),
            "threshold": 1.0,
        },
    )

Step 4: Domain Randomization Config

This is the most important part for sim-to-real transfer. Isaac Lab has a built-in event system for domain randomization:

"""
Domain Randomization Events
File: envs/quadruped_flat_env_cfg.py (continued)
"""

@configclass
class EventsCfg:
    """Domain randomization -- key to sim-to-real transfer."""

    # === Reset events (each episode) ===

    # Random initial base position/orientation
    reset_base = EventTermCfg(
        func=mdp.reset_root_state_uniform,
        mode="reset",
        params={
            "pose_range": {
                "x": (-0.5, 0.5),
                "y": (-0.5, 0.5),
                "yaw": (-3.14, 3.14),
            },
            "velocity_range": {
                "x": (-0.5, 0.5),
                "y": (-0.5, 0.5),
                "z": (-0.5, 0.5),
            },
        },
    )

    # Random initial joint positions
    reset_joints = EventTermCfg(
        func=mdp.reset_joints_by_offset,
        mode="reset",
        params={
            "position_range": (-0.2, 0.2),
            "velocity_range": (-0.5, 0.5),
        },
    )

    # === Interval events (periodically within episode) ===

    # Random external push (test balance recovery)
    push_robot = EventTermCfg(
        func=mdp.push_by_setting_velocity,
        mode="interval",
        interval_range_s=(10.0, 15.0),
        params={
            "velocity_range": {
                "x": (-0.5, 0.5),
                "y": (-0.5, 0.5),
            },
        },
    )

    # === Startup events (once when creating env) ===

    # Randomize physics material (friction)
    physics_material = EventTermCfg(
        func=mdp.randomize_rigid_body_material,
        mode="startup",
        params={
            "asset_cfg": SceneEntityCfg("robot", body_names=".*"),
            "static_friction_range": (0.4, 1.2),
            "dynamic_friction_range": (0.4, 1.0),
            "restitution_range": (0.0, 0.4),
            "num_buckets": 64,
        },
    )

    # Randomize robot mass
    add_base_mass = EventTermCfg(
        func=mdp.randomize_rigid_body_mass,
        mode="startup",
        params={
            "asset_cfg": SceneEntityCfg("robot", body_names="base"),
            "mass_distribution_params": (-3.0, 3.0),
            "operation": "add",
        },
    )

    # Randomize joint properties
    randomize_actuator_gains = EventTermCfg(
        func=mdp.randomize_actuator_gains,
        mode="startup",
        params={
            "asset_cfg": SceneEntityCfg("robot", joint_names=".*"),
            "stiffness_distribution_params": (0.8, 1.2),
            "damping_distribution_params": (0.8, 1.2),
            "operation": "scale",
        },
    )
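The classes above are not yet a complete environment: the config still needs an action term, a velocity command generator, and a top-level ManagerBasedRLEnvCfg that ties everything together. Here is a minimal sketch of that glue, continuing the same file and assuming the standard velocity-task helpers from the mdp module; the parameter values are illustrative defaults, not tuned numbers:

"""
Actions, commands, and the top-level env config.
File: envs/quadruped_flat_env_cfg.py (continued)
"""

@configclass
class ActionsCfg:
    """Policy outputs are offsets on the default joint positions."""
    joint_pos = mdp.JointPositionActionCfg(
        asset_name="robot",
        joint_names=[".*"],
        scale=0.5,
        use_default_offset=True,
    )


@configclass
class CommandsCfg:
    """Resample a random velocity command every 10 s."""
    base_velocity = mdp.UniformVelocityCommandCfg(
        asset_name="robot",
        resampling_time_range=(10.0, 10.0),
        rel_standing_envs=0.02,
        rel_heading_envs=1.0,
        heading_command=True,
        heading_control_stiffness=0.5,
        ranges=mdp.UniformVelocityCommandCfg.Ranges(
            lin_vel_x=(-1.0, 1.0),
            lin_vel_y=(-1.0, 1.0),
            ang_vel_z=(-1.0, 1.0),
            heading=(-math.pi, math.pi),
        ),
    )


@configclass
class QuadrupedFlatEnvCfg(ManagerBasedRLEnvCfg):
    """Ties all managers together into one RL environment."""
    scene: QuadrupedFlatSceneCfg = QuadrupedFlatSceneCfg(
        num_envs=4096, env_spacing=2.5
    )
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()
    commands: CommandsCfg = CommandsCfg()
    rewards: RewardsCfg = RewardsCfg()
    terminations: TerminationsCfg = TerminationsCfg()
    events: EventsCfg = EventsCfg()

    def __post_init__(self):
        self.decimation = 4            # policy runs at 50 Hz
        self.episode_length_s = 20.0
        self.sim.dt = 0.005            # physics at 200 Hz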

Step 5: Train with RSL-RL

RSL-RL Config

"""
RSL-RL PPO training config.
File: agents/rsl_rl_ppo_cfg.py
"""
from isaaclab_rl.rsl_rl import RslRlOnPolicyRunnerCfg, RslRlPpoCfg


QUADRUPED_PPO_CFG = RslRlOnPolicyRunnerCfg(
    num_steps_per_env=24,
    max_iterations=1500,
    save_interval=100,
    experiment_name="quadruped_flat",
    empirical_normalization=False,
    policy=RslRlPpoCfg(
        # Actor-Critic MLP
        actor_hidden_dims=[512, 256, 128],
        critic_hidden_dims=[512, 256, 128],
        activation="elu",
        # PPO hyperparameters
        init_noise_std=1.0,
        num_learning_epochs=5,
        num_mini_batches=4,
        clip_param=0.2,
        discount_factor=0.99,
        gae_lambda=0.95,
        learning_rate=1e-3,
        max_grad_norm=1.0,
        value_loss_coef=1.0,
        entropy_coef=0.01,
        # Schedule
        desired_kl=0.01,
        schedule="adaptive",
    ),
)
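How does train.py find these configs? Isaac Lab tasks register a Gymnasium id whose kwargs point at the env and agent config entry points. A sketch of what that registration looks like; the task id and module paths below are illustrative for our custom files:

"""
Task registration -- how train.py maps --task to configs.
File: envs/__init__.py (illustrative paths)
"""
import gymnasium as gym

gym.register(
    id="Isaac-Velocity-Flat-Quadruped-Custom-v0",
    entry_point="isaaclab.envs:ManagerBasedRLEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": "envs.quadruped_flat_env_cfg:QuadrupedFlatEnvCfg",
        "rsl_rl_cfg_entry_point": "agents.rsl_rl_ppo_cfg:QUADRUPED_PPO_CFG",
    },
)

The built-in Isaac-Velocity-Flat-Anymal-D-v0 task used below is registered the same way inside isaaclab_tasks.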

Launch Training

# Train with 4096 parallel environments (headless mode)
cd IsaacLab

./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \
    --task Isaac-Velocity-Flat-Anymal-D-v0 \
    --num_envs 4096 \
    --headless \
    --max_iterations 1500

# Performance benchmark:
# RTX 4090: ~90,000 FPS -> ~1500 iterations in ~20 minutes
# RTX 3080: ~50,000 FPS -> ~1500 iterations in ~35 minutes

Training output will look like:

[INFO] Starting training...
[INFO] Env: Isaac-Velocity-Flat-Anymal-D-v0 | Num envs: 4096
[INFO] Policy: MLP [512, 256, 128] | Activation: ELU
---
Iter    | Reward    | Ep. Len | FPS     | LR
0       | -12.34    | 200     | 89542   | 1.00e-03
100     | 2.15      | 350     | 91203   | 8.50e-04
200     | 8.73      | 500     | 90876   | 6.20e-04
500     | 15.42     | 800     | 91105   | 3.10e-04
1000    | 19.87     | 1000    | 90234   | 1.50e-04
1500    | 21.56     | 1000    | 91342   | 1.00e-04
---
[INFO] Training complete! Model saved to: logs/rsl_rl/quadruped_flat/

[Image: TensorBoard monitoring training progress of the robot locomotion policy]

Monitor with TensorBoard

# In a different terminal
tensorboard --logdir logs/rsl_rl/quadruped_flat/ --port 6006
# Open browser: http://localhost:6006

Important metrics to track: mean reward and mean episode length (both should climb steadily, as in the table above), the per-term reward breakdown (does one penalty dominate?), and the learning rate driven by the adaptive KL schedule.

Step 6: Evaluate and Visualize

# Play trained policy with rendering
./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/play.py \
    --task Isaac-Velocity-Flat-Anymal-D-v0 \
    --num_envs 32 \
    --checkpoint logs/rsl_rl/quadruped_flat/model_1500.pt

Or evaluate in Python:

"""
Evaluate trained policy and print statistics.
"""
import torch
import numpy as np
from isaaclab.envs import ManagerBasedRLEnv


def evaluate_policy(env, policy, num_episodes=100):
    """Run policy and calculate success metrics."""
    episode_rewards = []
    episode_lengths = []

    obs, _ = env.reset()
    current_rewards = torch.zeros(env.num_envs, device=env.device)
    current_lengths = torch.zeros(env.num_envs, device=env.device)

    while len(episode_rewards) < num_episodes:
        with torch.no_grad():
            # obs is a dict of observation groups; the policy consumes
            # the "policy" group defined in ObservationsCfg
            actions = policy(obs["policy"])

        obs, rewards, dones, truncated, infos = env.step(actions)
        current_rewards += rewards
        current_lengths += 1

        # Collect finished episodes
        done_mask = dones | truncated
        if done_mask.any():
            done_indices = done_mask.nonzero(as_tuple=False).squeeze(-1)
            for idx in done_indices:
                episode_rewards.append(current_rewards[idx].item())
                episode_lengths.append(current_lengths[idx].item())
            current_rewards[done_indices] = 0
            current_lengths[done_indices] = 0

    # Statistics
    rewards_arr = np.array(episode_rewards[:num_episodes])
    lengths_arr = np.array(episode_lengths[:num_episodes])

    print(f"\n=== Evaluation ({num_episodes} episodes) ===")
    print(f"Reward: {rewards_arr.mean():.2f} +/- {rewards_arr.std():.2f}")
    print(f"Ep Length: {lengths_arr.mean():.0f} +/- {lengths_arr.std():.0f}")
    print(f"Max reward: {rewards_arr.max():.2f}")
    print(f"Min reward: {rewards_arr.min():.2f}")

    return rewards_arr, lengths_arr
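Calling this requires a running simulator and a registered task. A hedged sketch of the wiring, reusing the TorchScript policy exported in Step 7 below:

"""
Sketch: launch Isaac Sim headless, build the env, run evaluation.
"""
from isaaclab.app import AppLauncher

# Isaac Sim must be started before importing any env code
app = AppLauncher(headless=True).app

import gymnasium as gym
import torch

import isaaclab_tasks  # noqa: F401  (registers the Isaac-* tasks)
from isaaclab_tasks.utils import parse_env_cfg

env_cfg = parse_env_cfg("Isaac-Velocity-Flat-Anymal-D-v0", num_envs=32)
env = gym.make("Isaac-Velocity-Flat-Anymal-D-v0", cfg=env_cfg)

policy = torch.jit.load("quadruped_policy.pt")  # exported in Step 7
evaluate_policy(env.unwrapped, policy, num_episodes=100)

env.close()
app.close()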

Step 7: Export Policy for Deployment

"""
Export trained policy to TorchScript for deployment.
Can run on Jetson, robot controller, or ROS 2 node.
"""
import torch
import os


def export_policy(checkpoint_path, output_path, obs_dim=48, act_dim=12):
    """Export RSL-RL policy to TorchScript."""

    # Load checkpoint
    # weights_only=False: rsl_rl checkpoints bundle optimizer state too
    checkpoint = torch.load(checkpoint_path, map_location="cpu", weights_only=False)
    print(f"Loaded checkpoint: {checkpoint_path}")

    # Create dummy input for tracing
    dummy_obs = torch.randn(1, obs_dim)

    # Load actor network
    from rsl_rl.modules import ActorCritic

    actor_critic = ActorCritic(
        num_actor_obs=obs_dim,
        num_critic_obs=obs_dim,
        num_actions=act_dim,
        actor_hidden_dims=[512, 256, 128],
        critic_hidden_dims=[512, 256, 128],
        activation="elu",
    )
    actor_critic.load_state_dict(checkpoint["model_state_dict"])
    actor_critic.eval()

    # Export only actor (don't need critic for inference)
    class PolicyWrapper(torch.nn.Module):
        def __init__(self, actor):
            super().__init__()
            self.actor = actor

        def forward(self, obs):
            return self.actor(obs)

    wrapper = PolicyWrapper(actor_critic.actor)
    traced = torch.jit.trace(wrapper, dummy_obs)
    traced.save(output_path)

    # Verify
    loaded = torch.jit.load(output_path)
    test_out = loaded(dummy_obs)
    print(f"Exported to: {output_path}")
    print(f"Input shape: {dummy_obs.shape}")
    print(f"Output shape: {test_out.shape}")
    print(f"File size: {os.path.getsize(output_path) / 1024:.1f} KB")


# Export
export_policy(
    checkpoint_path="logs/rsl_rl/quadruped_flat/model_1500.pt",
    output_path="quadruped_policy.pt",
    obs_dim=48,
    act_dim=12,
)

The exported policy can be deployed via a ROS 2 control node, directly on an NVIDIA Jetson onboard computer, or through the Isaac ROS stack. A concrete sketch follows below.
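The snippet below sketches a bare-bones 50 Hz control loop around the exported policy; get_observation and the joint command call are hypothetical placeholders for your robot's own API:

"""
Hypothetical deployment loop -- a sketch, not a complete driver.
"""
import time
import torch

policy = torch.jit.load("quadruped_policy.pt")
policy.eval()


def get_observation() -> torch.Tensor:
    """Placeholder: build the 48-dim observation from robot state
    (base velocities, projected gravity, command, joint pos/vel,
    last action) in the same order as ObservationsCfg."""
    return torch.zeros(1, 48)


with torch.no_grad():
    while True:
        obs = get_observation()
        actions = policy(obs)           # (1, 12) joint position targets
        # send_joint_targets(actions)   # robot-specific command API
        time.sleep(0.02)                # 50 Hz control loop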

Tips and Best Practices

1. VRAM Management

# Check VRAM usage during training
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 5

# Reduce num_envs if running out of VRAM:
# RTX 3070 (8GB):  --num_envs 2048
# RTX 3080 (10GB): --num_envs 3072
# RTX 4080 (16GB): --num_envs 4096
# RTX 4090 (24GB): --num_envs 8192

2. Reward Tuning

Watch the per-term reward breakdown in TensorBoard: if one penalty dominates, the robot will optimize for standing still. Tune one weight at a time, starting from the values in Step 3.

3. Domain Randomization

Start with narrow ranges and widen them gradually; too much randomization early on can prevent the policy from learning at all. The ranges in Step 4 are a reasonable starting point for flat-terrain locomotion.

4. Debug

# Run with rendering to see what robot does
./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \
    --task Isaac-Velocity-Flat-Anymal-D-v0 \
    --num_envs 16

# Don't use --headless -> viewport will appear
# Reduce num_envs to observe individual robots

Summary

In this article, you learned:

  1. How to install Isaac Sim 5.0 + Isaac Lab 2.2
  2. Isaac Lab's Manager-Based architecture
  3. How to create a quadruped locomotion environment
  4. How to configure domain randomization (friction, mass, pushes, actuator gains)
  5. How to train with RSL-RL PPO: 4,096 parallel envs, ~20 minutes on an RTX 4090
  6. How to evaluate and export a policy for deployment

Complete pipeline:

Isaac Lab env config
  → Domain randomization setup
    → RSL-RL PPO training (4096 envs, ~1500 iters)
      → TensorBoard monitoring
        → Evaluate (>90% success)
          → Export TorchScript
            → Deploy (ROS 2 / Jetson / Isaac ROS)

Isaac Lab is currently the most powerful tool for large-scale robot RL training. Combined with good domain randomization, you can train policies that transfer zero-shot to real robots.

