
NVIDIA Isaac Lab: GPU-accelerated RL training from scratch

Set up Isaac Lab, then train a locomotion policy with 4,096 parallel environments and domain randomization on the GPU.

Nguyễn Anh Tuấn · April 1, 2026 · 11 min read

Isaac Lab -- a unified framework for robot learning

In Part 1 I introduced Isaac Sim/Lab, and in Part 2 I walked through MuJoCo from scratch. Now for the most exciting part: training a locomotion policy with 4,096 parallel environments on the GPU using Isaac Lab.

Isaac Lab (formerly Isaac Gym + Orbit) is NVIDIA's unified framework for robot learning. It combines:

- Isaac Gym's massively parallel, GPU-native physics simulation
- Orbit's modular, manager-based environment design
- Isaac Sim's USD scene description and RTX rendering

The latest release is Isaac Lab 2.2 (GA at SIGGRAPH 2025), running on Isaac Sim 5.0.

System requirements

Before you begin, check your system:

| Requirement | Minimum | Recommended |
| --- | --- | --- |
| GPU | NVIDIA RTX 3070 (8 GB) | RTX 4080/4090 (16 GB+) |
| VRAM | 8 GB | 16 GB+ |
| RAM | 32 GB | 64 GB |
| OS | Ubuntu 22.04 | Ubuntu 22.04 |
| Driver | NVIDIA 535+ | 550+ |
| CUDA | 12.1+ | 12.4+ |
| Disk | 50 GB free | 100 GB+ (Omniverse + assets) |

Important: Isaac Lab has no native Windows or macOS support -- only Ubuntu is officially supported. It can run under WSL2, but that is not recommended for training.

Step 1: Install Isaac Sim and Isaac Lab

1a. Install Isaac Sim 5.0

# Option 1: pip install (recommended for Isaac Lab)
pip install isaacsim==5.0.0 --extra-index-url https://pypi.nvidia.com

# Option 2: Omniverse Launcher
# Download from https://developer.nvidia.com/omniverse
# Install Isaac Sim from the Launcher

1b. Clone and install Isaac Lab

# Clone Isaac Lab
git clone https://github.com/isaac-sim/IsaacLab.git
cd IsaacLab

# Install Isaac Lab and its dependencies
./isaaclab.sh --install
# Select: rsl_rl (for locomotion training)

# Verify installation
./isaaclab.sh -p scripts/tutorials/00_sim/create_empty.py

If the Omniverse viewport opens with an empty scene, the installation succeeded.

1c. Verify GPU

"""
Check that the GPU and Isaac Sim are ready.
"""
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"VRAM: {torch.cuda.get_device_properties(0).total_mem / 1e9:.1f} GB")

# Test Isaac Sim import
from isaacsim import SimulationApp
app = SimulationApp({"headless": True})
print("Isaac Sim initialized successfully!")
app.close()

NVIDIA GPU cluster for robot learning training

Step 2: Understand the Isaac Lab architecture

Isaac Lab uses a Manager-Based architecture:

ManagerBasedRLEnv
├── SceneManager         # Spawn robots, objects, lights
│   ├── ArticulationCfg  # Robot definition
│   ├── RigidObjectCfg   # Objects
│   └── SensorsCfg       # Cameras, LiDAR, contact
├── ObservationManager   # Builds the observation vector
├── ActionManager        # Processes actions from the policy
├── RewardManager        # Computes reward signals
├── TerminationManager   # Check episode done
├── EventManager         # Domain randomization
└── CurriculumManager    # Gradually increases difficulty

Each manager is configured through Python dataclasses -- no XML/YAML needed.
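
As a quick illustration of what that means in practice (a minimal sketch -- the class and field names here are made up for the example), every setting is a typed field on a @configclass, so you override it with plain attribute access instead of editing markup files:

from isaaclab.utils import configclass


@configclass
class MyRewardCfg:
    """Hypothetical reward settings -- shown only to illustrate the pattern."""
    weight: float = 1.0
    std: float = 0.25


cfg = MyRewardCfg()
cfg.weight = 2.0   # override directly in Python -- no XML/YAML round-trip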

Step 3: Create the Locomotion Environment

Here is the environment config for quadruped locomotion training. The robot will learn to walk on flat ground while tracking velocity commands.

"""
Isaac Lab Quadruped Locomotion Environment Config
File: envs/quadruped_flat_env_cfg.py
"""
import math
from dataclasses import MISSING

import isaaclab.sim as sim_utils
from isaaclab.actuators import ImplicitActuatorCfg
from isaaclab.assets import ArticulationCfg, AssetBaseCfg
from isaaclab.envs import ManagerBasedRLEnvCfg
from isaaclab.managers import (
    CurriculumTermCfg,
    EventTermCfg,
    ObservationGroupCfg,
    ObservationTermCfg,
    RewardTermCfg,
    SceneEntityCfg,
    TerminationTermCfg,
)
from isaaclab.scene import InteractiveSceneCfg
from isaaclab.sensors import ContactSensorCfg
from isaaclab.terrains import TerrainImporterCfg
from isaaclab.utils import configclass
from isaaclab.utils.noise import AdditiveUniformNoiseCfg

import isaaclab_tasks.manager_based.locomotion.velocity.mdp as mdp


@configclass
class QuadrupedFlatSceneCfg(InteractiveSceneCfg):
    """Scene configuration -- ground plane + robot."""

    # Ground plane
    ground = AssetBaseCfg(
        prim_path="/World/defaultGroundPlane",
        spawn=sim_utils.GroundPlaneCfg(),
    )

    # Distant light
    light = AssetBaseCfg(
        prim_path="/World/Light",
        spawn=sim_utils.DistantLightCfg(
            intensity=3000.0,
            color=(0.75, 0.75, 0.75),
        ),
    )

    # Robot: ANYmal-D quadruped
    robot: ArticulationCfg = ArticulationCfg(
        prim_path="{ENV_REGEX_NS}/Robot",
        spawn=sim_utils.UsdFileCfg(
            usd_path=(
                "isaaclab_assets/Robots/ANYbotics/ANYmal-D/anymal_d.usd"
            ),
            activate_contact_sensors=True,
            rigid_props=sim_utils.RigidBodyPropertiesCfg(
                disable_gravity=False,
                retain_accelerations=False,
            ),
            articulation_props=sim_utils.ArticulationRootPropertiesCfg(
                enabled_self_collisions=False,
                solver_position_iteration_count=4,
                solver_velocity_iteration_count=4,
            ),
        ),
        init_state=ArticulationCfg.InitialStateCfg(
            pos=(0.0, 0.0, 0.6),
            joint_pos={
                ".*HAA": 0.0,     # Hip Abduction/Adduction
                ".*HFE": 0.4,     # Hip Flexion/Extension
                ".*KFE": -0.8,    # Knee Flexion/Extension
            },
        ),
        # PD actuators for all 12 leg joints (illustrative gains)
        actuators={
            "legs": ImplicitActuatorCfg(
                joint_names_expr=[".*HAA", ".*HFE", ".*KFE"],
                stiffness=80.0,
                damping=2.0,
            ),
        },
    )

    # Contact sensor -- required by the "contact_forces" reward and
    # termination terms defined below
    contact_forces = ContactSensorCfg(
        prim_path="{ENV_REGEX_NS}/Robot/.*",
        history_length=3,
        track_air_time=True,
    )


@configclass
class ObservationsCfg:
    """Observation space -- what the policy sees."""

    @configclass
    class PolicyCfg(ObservationGroupCfg):
        # Base velocity (3) -- linear velocity
        base_lin_vel = ObservationTermCfg(
            func=mdp.base_lin_vel,
            noise=AdditiveUniformNoiseCfg(n_min=-0.1, n_max=0.1),
        )
        # Base angular velocity (3)
        base_ang_vel = ObservationTermCfg(
            func=mdp.base_ang_vel,
            noise=AdditiveUniformNoiseCfg(n_min=-0.2, n_max=0.2),
        )
        # Projected gravity (3) -- orientation info
        projected_gravity = ObservationTermCfg(
            func=mdp.projected_gravity,
            noise=AdditiveUniformNoiseCfg(n_min=-0.05, n_max=0.05),
        )
        # Velocity command (3) -- target vx, vy, yaw_rate
        velocity_commands = ObservationTermCfg(
            func=mdp.generated_commands,
            params={"command_name": "base_velocity"},
        )
        # Joint positions (12)
        joint_pos = ObservationTermCfg(
            func=mdp.joint_pos_rel,
            noise=AdditiveUniformNoiseCfg(n_min=-0.01, n_max=0.01),
        )
        # Joint velocities (12)
        joint_vel = ObservationTermCfg(
            func=mdp.joint_vel_rel,
            noise=AdditiveUniformNoiseCfg(n_min=-1.5, n_max=1.5),
        )
        # Previous actions (12)
        actions = ObservationTermCfg(func=mdp.last_action)

    policy: PolicyCfg = PolicyCfg()


@configclass
class RewardsCfg:
    """Reward terms -- guide the policy."""

    # === Positive rewards ===
    # Track linear velocity command
    track_lin_vel_xy_exp = RewardTermCfg(
        func=mdp.track_lin_vel_xy_exp,
        weight=1.5,
        params={"command_name": "base_velocity", "std": math.exp(-0.25)},
    )
    # Track angular velocity command
    track_ang_vel_z_exp = RewardTermCfg(
        func=mdp.track_ang_vel_z_exp,
        weight=0.75,
        params={"command_name": "base_velocity", "std": math.exp(-0.25)},
    )

    # === Negative rewards (penalties) ===
    # Penalize z-axis linear velocity (bouncing)
    lin_vel_z_l2 = RewardTermCfg(
        func=mdp.lin_vel_z_l2, weight=-2.0
    )
    # Penalize xy angular velocity (rolling/pitching)
    ang_vel_xy_l2 = RewardTermCfg(
        func=mdp.ang_vel_xy_l2, weight=-0.05
    )
    # Penalize joint torques
    joint_torques_l2 = RewardTermCfg(
        func=mdp.joint_torques_l2, weight=-0.0001
    )
    # Penalize joint acceleration
    joint_acc_l2 = RewardTermCfg(
        func=mdp.joint_acc_l2, weight=-2.5e-7
    )
    # Penalize action rate (smooth actions)
    action_rate_l2 = RewardTermCfg(
        func=mdp.action_rate_l2, weight=-0.01
    )
    # Penalize undesired contacts (body touching ground)
    undesired_contacts = RewardTermCfg(
        func=mdp.undesired_contacts,
        weight=-1.0,
        params={
            "sensor_cfg": SceneEntityCfg(
                "contact_forces", body_names=".*THIGH"
            ),
            "threshold": 1.0,
        },
    )


@configclass
class TerminationsCfg:
    """When to end an episode."""

    time_out = TerminationTermCfg(
        func=mdp.time_out, time_out=True
    )
    base_contact = TerminationTermCfg(
        func=mdp.illegal_contact,
        params={
            "sensor_cfg": SceneEntityCfg(
                "contact_forces", body_names="base"
            ),
            "threshold": 1.0,
        },
    )

Step 4: Domain Randomization Config

This is the most important part for sim-to-real transfer. Isaac Lab has a built-in event system for domain randomization:

"""
Domain Randomization Events
File: envs/quadruped_flat_env_cfg.py (continued)
"""

@configclass
class EventsCfg:
    """Domain randomization -- chìa khóa cho sim-to-real transfer."""

    # === Reset events (every episode) ===

    # Random initial base position/orientation
    reset_base = EventTermCfg(
        func=mdp.reset_root_state_uniform,
        mode="reset",
        params={
            "pose_range": {
                "x": (-0.5, 0.5),
                "y": (-0.5, 0.5),
                "yaw": (-3.14, 3.14),
            },
            "velocity_range": {
                "x": (-0.5, 0.5),
                "y": (-0.5, 0.5),
                "z": (-0.5, 0.5),
            },
        },
    )

    # Random initial joint positions
    reset_joints = EventTermCfg(
        func=mdp.reset_joints_by_offset,
        mode="reset",
        params={
            "position_range": (-0.2, 0.2),
            "velocity_range": (-0.5, 0.5),
        },
    )

    # === Interval events (periodic within an episode) ===

    # Random external push (test balance recovery)
    push_robot = EventTermCfg(
        func=mdp.push_by_setting_velocity,
        mode="interval",
        interval_range_s=(10.0, 15.0),
        params={
            "velocity_range": {
                "x": (-0.5, 0.5),
                "y": (-0.5, 0.5),
            },
        },
    )

    # === Startup events (once at env creation) ===

    # Randomize physics material (friction)
    physics_material = EventTermCfg(
        func=mdp.randomize_rigid_body_material,
        mode="startup",
        params={
            "asset_cfg": SceneEntityCfg("robot", body_names=".*"),
            "static_friction_range": (0.4, 1.2),
            "dynamic_friction_range": (0.4, 1.0),
            "restitution_range": (0.0, 0.4),
            "num_buckets": 64,
        },
    )

    # Randomize robot mass
    add_base_mass = EventTermCfg(
        func=mdp.randomize_rigid_body_mass,
        mode="startup",
        params={
            "asset_cfg": SceneEntityCfg("robot", body_names="base"),
            "mass_distribution_params": (-3.0, 3.0),
            "operation": "add",
        },
    )

    # Randomize joint properties
    randomize_actuator_gains = EventTermCfg(
        func=mdp.randomize_actuator_gains,
        mode="startup",
        params={
            "asset_cfg": SceneEntityCfg("robot", joint_names=".*"),
            "stiffness_distribution_params": (0.8, 1.2),
            "damping_distribution_params": (0.8, 1.2),
            "operation": "scale",
        },
    )
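
One piece of glue the snippets above leave out: the velocity-command generator, the action term, and the top-level config that composes every manager into a ManagerBasedRLEnvCfg. Below is a minimal sketch of that wiring using the mdp helpers already imported -- the ranges, action scale, and timing values are illustrative assumptions, not values from a reference config.

@configclass
class CommandsCfg:
    """Velocity commands resampled during the episode (sketch)."""

    base_velocity = mdp.UniformVelocityCommandCfg(
        asset_name="robot",
        resampling_time_range=(10.0, 10.0),
        heading_command=True,
        heading_control_stiffness=0.5,
        ranges=mdp.UniformVelocityCommandCfg.Ranges(
            lin_vel_x=(-1.0, 1.0),
            lin_vel_y=(-1.0, 1.0),
            ang_vel_z=(-1.0, 1.0),
            heading=(-math.pi, math.pi),
        ),
    )


@configclass
class ActionsCfg:
    """Policy outputs become scaled joint-position targets (sketch)."""

    joint_pos = mdp.JointPositionActionCfg(
        asset_name="robot",
        joint_names=[".*"],
        scale=0.5,
        use_default_offset=True,
    )


@configclass
class QuadrupedFlatEnvCfg(ManagerBasedRLEnvCfg):
    """Top-level env config composing every manager."""

    scene: QuadrupedFlatSceneCfg = QuadrupedFlatSceneCfg(
        num_envs=4096, env_spacing=2.5
    )
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()
    commands: CommandsCfg = CommandsCfg()
    rewards: RewardsCfg = RewardsCfg()
    terminations: TerminationsCfg = TerminationsCfg()
    events: EventsCfg = EventsCfg()

    def __post_init__(self):
        self.decimation = 4            # policy runs at 50 Hz
        self.episode_length_s = 20.0
        self.sim.dt = 0.005            # physics runs at 200 Hz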

Step 5: Train with RSL-RL

RSL-RL Config

"""
RSL-RL PPO training config.
File: agents/rsl_rl_ppo_cfg.py
"""
from isaaclab_rl.rsl_rl import (
    RslRlOnPolicyRunnerCfg,
    RslRlPpoActorCriticCfg,
    RslRlPpoAlgorithmCfg,
)


QUADRUPED_PPO_CFG = RslRlOnPolicyRunnerCfg(
    num_steps_per_env=24,
    max_iterations=1500,
    save_interval=100,
    experiment_name="quadruped_flat",
    empirical_normalization=False,
    # Actor-Critic MLP
    policy=RslRlPpoActorCriticCfg(
        init_noise_std=1.0,
        actor_hidden_dims=[512, 256, 128],
        critic_hidden_dims=[512, 256, 128],
        activation="elu",
    ),
    # PPO hyperparameters
    algorithm=RslRlPpoAlgorithmCfg(
        value_loss_coef=1.0,
        use_clipped_value_loss=True,
        clip_param=0.2,
        entropy_coef=0.01,
        num_learning_epochs=5,
        num_mini_batches=4,
        learning_rate=1e-3,
        schedule="adaptive",
        gamma=0.99,
        lam=0.95,
        desired_kl=0.01,
        max_grad_norm=1.0,
    ),
)
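
Worth knowing: a task id like Isaac-Velocity-Flat-Anymal-D-v0 is just a Gymnasium registration pointing at config classes like the ones above. If you want to train your own configs, the wiring looks roughly like this (the id and module paths here are illustrative):

import gymnasium as gym

gym.register(
    id="Quadruped-Flat-Custom-v0",
    entry_point="isaaclab.envs:ManagerBasedRLEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": "envs.quadruped_flat_env_cfg:QuadrupedFlatEnvCfg",
        "rsl_rl_cfg_entry_point": "agents.rsl_rl_ppo_cfg:QUADRUPED_PPO_CFG",
    },
)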

Launch training

# Train with 4096 parallel environments (headless mode)
cd IsaacLab

./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \
    --task Isaac-Velocity-Flat-Anymal-D-v0 \
    --num_envs 4096 \
    --headless \
    --max_iterations 1500

# Performance benchmark:
# RTX 4090: ~90,000 FPS -> ~1500 iterations in ~20 minutes
# RTX 3080: ~50,000 FPS -> ~1500 iterations in ~35 minutes

The training output will look something like this:

[INFO] Starting training...
[INFO] Env: Isaac-Velocity-Flat-Anymal-D-v0 | Num envs: 4096
[INFO] Policy: MLP [512, 256, 128] | Activation: ELU
---
Iter    | Reward    | Ep. Len | FPS     | LR
0       | -12.34    | 200     | 89542   | 1.00e-03
100     | 2.15      | 350     | 91203   | 8.50e-04
200     | 8.73      | 500     | 90876   | 6.20e-04
500     | 15.42     | 800     | 91105   | 3.10e-04
1000    | 19.87     | 1000    | 90234   | 1.50e-04
1500    | 21.56     | 1000    | 91342   | 1.00e-04
---
[INFO] Training complete! Model saved to: logs/rsl_rl/quadruped_flat/

TensorBoard monitoring the training progress of the robot locomotion policy

Monitoring with TensorBoard

# In another terminal
tensorboard --logdir logs/rsl_rl/quadruped_flat/ --port 6006
# Open a browser at http://localhost:6006

The key metrics to watch:

- Mean episode reward -- should rise steadily and eventually plateau
- Mean episode length -- should grow toward the episode limit as falls become rare
- Per-term reward logs -- tracking terms should dominate; penalties should shrink
- Learning rate -- with the adaptive KL schedule it decays over training
- FPS -- sudden drops usually mean VRAM pressure or a CPU bottleneck

Step 6: Evaluate and Visualize

# Play the trained policy with rendering
./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/play.py \
    --task Isaac-Velocity-Flat-Anymal-D-v0 \
    --num_envs 32 \
    --checkpoint logs/rsl_rl/quadruped_flat/model_1500.pt

Or evaluate in Python:

"""
Evaluate the trained policy and print statistics.
"""
import torch
import numpy as np
from isaaclab.envs import ManagerBasedRLEnv


def evaluate_policy(env, policy, num_episodes=100):
    """Run policy và tính success metrics."""
    episode_rewards = []
    episode_lengths = []

    obs, _ = env.reset()
    current_rewards = torch.zeros(env.num_envs, device=env.device)
    current_lengths = torch.zeros(env.num_envs, device=env.device)

    while len(episode_rewards) < num_episodes:
        with torch.no_grad():
            actions = policy(obs)

        obs, rewards, dones, truncated, infos = env.step(actions)
        current_rewards += rewards
        current_lengths += 1

        # Collect finished episodes
        done_mask = dones | truncated
        if done_mask.any():
            done_indices = done_mask.nonzero(as_tuple=False).squeeze(-1)
            for idx in done_indices:
                episode_rewards.append(current_rewards[idx].item())
                episode_lengths.append(current_lengths[idx].item())
            current_rewards[done_indices] = 0
            current_lengths[done_indices] = 0

    # Statistics
    rewards_arr = np.array(episode_rewards[:num_episodes])
    lengths_arr = np.array(episode_lengths[:num_episodes])

    print(f"\n=== Evaluation ({num_episodes} episodes) ===")
    print(f"Reward: {rewards_arr.mean():.2f} +/- {rewards_arr.std():.2f}")
    print(f"Ep Length: {lengths_arr.mean():.0f} +/- {lengths_arr.std():.0f}")
    print(f"Max reward: {rewards_arr.max():.2f}")
    print(f"Min reward: {rewards_arr.min():.2f}")

    return rewards_arr, lengths_arr
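
A usage sketch for this function: rebuild the same actor-critic the training run used (the architecture matches the PPO config from Step 5), load the checkpoint, and pass its deterministic inference method as the policy. Creating env via the AppLauncher/gymnasium boilerplate is omitted here, and paths are illustrative.

import torch
from rsl_rl.modules import ActorCritic

actor_critic = ActorCritic(
    num_actor_obs=48,
    num_critic_obs=48,
    num_actions=12,
    actor_hidden_dims=[512, 256, 128],
    critic_hidden_dims=[512, 256, 128],
    activation="elu",
).to(env.device)

ckpt = torch.load(
    "logs/rsl_rl/quadruped_flat/model_1500.pt", map_location=env.device
)
actor_critic.load_state_dict(ckpt["model_state_dict"])
actor_critic.eval()

# act_inference returns the deterministic (mean) action
rewards_arr, lengths_arr = evaluate_policy(
    env, actor_critic.act_inference, num_episodes=100
)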

Step 7: Export the policy for deployment

"""
Export trained policy sang TorchScript cho deployment.
Có thể chạy trên Jetson, robot controller, hoặc ROS 2 node.
"""
import torch


def export_policy(checkpoint_path, output_path, obs_dim=48, act_dim=12):
    """Export RSL-RL policy sang TorchScript."""

    # Load checkpoint
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    print(f"Loaded checkpoint: {checkpoint_path}")

    # Create a dummy input for tracing
    dummy_obs = torch.randn(1, obs_dim)

    # Load actor network
    from rsl_rl.modules import ActorCritic

    actor_critic = ActorCritic(
        num_actor_obs=obs_dim,
        num_critic_obs=obs_dim,
        num_actions=act_dim,
        actor_hidden_dims=[512, 256, 128],
        critic_hidden_dims=[512, 256, 128],
        activation="elu",
    )
    actor_critic.load_state_dict(checkpoint["model_state_dict"])
    actor_critic.eval()

    # Export only the actor (the critic is not needed for inference)
    class PolicyWrapper(torch.nn.Module):
        def __init__(self, actor):
            super().__init__()
            self.actor = actor

        def forward(self, obs):
            return self.actor(obs)

    wrapper = PolicyWrapper(actor_critic.actor)
    traced = torch.jit.trace(wrapper, dummy_obs)
    traced.save(output_path)

    # Verify
    loaded = torch.jit.load(output_path)
    test_out = loaded(dummy_obs)
    print(f"Exported to: {output_path}")
    print(f"Input shape: {dummy_obs.shape}")
    print(f"Output shape: {test_out.shape}")
    print(f"File size: {os.path.getsize(output_path) / 1024:.1f} KB")


# Export
export_policy(
    checkpoint_path="logs/rsl_rl/quadruped_flat/model_1500.pt",
    output_path="quadruped_policy.pt",
    obs_dim=48,
    act_dim=12,
)

The exported policy can be deployed via:

- a ROS 2 control node (PyTorch, or libtorch in C++)
- an NVIDIA Jetson on the robot, running the TorchScript module directly
- Isaac ROS pipelines
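
To make that concrete, here is a minimal on-robot control-loop sketch. get_observation and apply_joint_targets are hypothetical, robot-specific functions you would implement yourself; the 50 Hz rate mirrors the control rate assumed during training.

import time

import torch

policy = torch.jit.load("quadruped_policy.pt")
policy.eval()

while True:
    # Hypothetical helper: assemble the same 48-dim observation vector
    # (base velocities, gravity, commands, joint states, last action)
    obs = get_observation()
    with torch.no_grad():
        action = policy(obs.unsqueeze(0)).squeeze(0)
    # Hypothetical helper: send joint-position targets to the actuators
    apply_joint_targets(action)
    time.sleep(0.02)  # ~50 Hz control loop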

Tips and Best Practices

1. VRAM management

# Check VRAM usage during training
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 5

# Reduce num_envs if you run out of VRAM:
# RTX 3070 (8GB):  --num_envs 2048
# RTX 3080 (10GB): --num_envs 3072
# RTX 4080 (16GB): --num_envs 4096
# RTX 4090 (24GB): --num_envs 8192
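
The same check from Python, if you prefer (a small sketch using PyTorch's CUDA memory query):

import torch

# Free and total device memory, in bytes
free, total = torch.cuda.mem_get_info()
print(f"VRAM used: {(total - free) / 1e9:.1f} / {total / 1e9:.1f} GB")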

2. Reward tuning

Start with just the two tracking rewards, verify the robot learns to follow commands, then add penalties one at a time. Watch the per-term reward logs in TensorBoard: if a penalty overwhelms the tracking terms, its weight is too large. A convenient way to iterate on weights is shown in the sketch below.
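
A sketch of that iteration pattern: subclass the env config (here the QuadrupedFlatEnvCfg composed earlier) and override weights in __post_init__ instead of editing the base file.

@configclass
class MyTunedEnvCfg(QuadrupedFlatEnvCfg):
    def __post_init__(self):
        super().__post_init__()
        # Push velocity tracking harder, demand smoother actions
        self.rewards.track_lin_vel_xy_exp.weight = 2.0
        self.rewards.action_rate_l2.weight = -0.02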

3. Domain randomization

Start with narrow randomization ranges and widen them gradually. Ranges that are too aggressive from the start can keep the policy from learning at all; ranges that are too mild let it overfit to the simulator and fail on real hardware.

4. Debugging

# Run with rendering to watch what the robots are doing
./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \
    --task Isaac-Velocity-Flat-Anymal-D-v0 \
    --num_envs 16

# Without --headless, the viewport will appear
# Reduce num_envs so you can watch each robot

Summary

In this post, you learned:

  1. Installing Isaac Sim 5.0 + Isaac Lab 2.2
  2. Isaac Lab's Manager-Based architecture
  3. Building an environment for quadruped locomotion
  4. Configuring domain randomization (friction, mass, pushes, actuator gains)
  5. Training with RSL-RL PPO: 4,096 parallel envs, ~20 minutes on an RTX 4090
  6. Evaluating and exporting the policy for deployment

The full pipeline:

Isaac Lab env config
  → Domain randomization setup
    → RSL-RL PPO training (4096 envs, ~1500 iters)
      → TensorBoard monitoring
        → Evaluate (>90% success)
          → Export TorchScript
            → Deploy (ROS 2 / Jetson / Isaac ROS)

Isaac Lab is one of the most powerful tools available today for large-scale robot RL training. Combined with well-tuned domain randomization, you can train policies that transfer zero-shot to real robots.

