Isaac Lab -- A unified framework for robot learning
In Part 1 I introduced Isaac Sim/Lab, and in Part 2 I walked through MuJoCo from scratch. Now for the most exciting part: training a locomotion policy with 4,096 parallel environments on the GPU using Isaac Lab.
Isaac Lab (the successor to Isaac Gym and Orbit) is NVIDIA's unified framework for robot learning. It combines:
- PhysX 5 (GPU-accelerated physics)
- RTX rendering (photorealistic)
- Domain randomization (built-in)
- RL training (RSL-RL, rl_games, Stable-Baselines3)
Latest version: Isaac Lab 2.2 (GA at SIGGRAPH 2025), running on Isaac Sim 5.0.
System requirements
Before starting, check your setup against the following:
| Requirement | Minimum | Recommended |
|---|---|---|
| GPU | NVIDIA RTX 3070 (8GB) | RTX 4080/4090 (16GB+) |
| VRAM | 8 GB | 16 GB+ |
| RAM | 32 GB | 64 GB |
| OS | Ubuntu 22.04 | Ubuntu 22.04 |
| Driver | NVIDIA 535+ | 550+ |
| CUDA | 12.1+ | 12.4+ |
| Disk | 50 GB free | 100 GB+ (Omniverse + assets) |
Important: Isaac Lab does not support native Windows or macOS; only Ubuntu is officially supported. WSL2 can run it, but it is not recommended for training.
Step 1: Install Isaac Sim and Isaac Lab
1a. Install Isaac Sim 5.0
# Option 1: pip install (recommended for Isaac Lab)
pip install isaacsim==5.0.0 --extra-index-url https://pypi.nvidia.com
# Option 2: Omniverse Launcher
# Download from https://developer.nvidia.com/omniverse
# Install Isaac Sim from the Launcher
1b. Clone and install Isaac Lab
# Clone Isaac Lab
git clone https://github.com/isaac-sim/IsaacLab.git
cd IsaacLab
# Install Isaac Lab and its dependencies
./isaaclab.sh --install
# When prompted, choose: rsl_rl (for locomotion training)
# Verify installation
./isaaclab.sh -p scripts/tutorials/00_sim/create_empty.py
If the Omniverse viewport opens with an empty scene, the installation succeeded.
1c. Verify GPU
"""
Check that the GPU and Isaac Sim are ready.
"""
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"VRAM: {torch.cuda.get_device_properties(0).total_mem / 1e9:.1f} GB")
# Test Isaac Sim import
from isaacsim import SimulationApp
app = SimulationApp({"headless": True})
print("Isaac Sim initialized successfully!")
app.close()
Step 2: Understand the Isaac Lab architecture
Isaac Lab uses a manager-based architecture:
ManagerBasedRLEnv
├── SceneManager # Spawn robots, objects, lights
│ ├── ArticulationCfg # Robot definition
│ ├── RigidObjectCfg # Objects
│ └── SensorsCfg # Cameras, LiDAR, contact
├── ObservationManager # Build the observation vector
├── ActionManager # Process actions from the policy
├── RewardManager # Compute reward signals
├── TerminationManager # Check episode termination
├── EventManager # Domain randomization
└── CurriculumManager # Gradually increase difficulty
Each manager is configured through Python dataclasses -- no XML/YAML needed.
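To make the pattern concrete, here is a minimal sketch of a custom term (my_upright_bonus is a hypothetical name, not an Isaac Lab built-in): every manager term is just a Python function that receives the env (plus optional params) and returns one value per parallel environment, and a config wires it in by reference.
import torch
from isaaclab.envs import ManagerBasedRLEnv

def my_upright_bonus(env: ManagerBasedRLEnv) -> torch.Tensor:
    """Reward for keeping the base upright, shape (num_envs,)."""
    # projected gravity z approaches -1.0 when the base is perfectly level
    robot = env.scene["robot"]
    return -robot.data.projected_gravity_b[:, 2]

# Wired into a config exactly like the built-in mdp terms:
# upright = RewardTermCfg(func=my_upright_bonus, weight=0.5)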
Step 3: Create the locomotion environment
Below is the environment config for quadruped locomotion training. The robot learns to walk on flat ground while following velocity commands.
"""
Isaac Lab Quadruped Locomotion Environment Config
File: envs/quadruped_flat_env_cfg.py
"""
import math
from dataclasses import MISSING
import isaaclab.sim as sim_utils
from isaaclab.assets import ArticulationCfg, AssetBaseCfg
from isaaclab.envs import ManagerBasedRLEnvCfg
from isaaclab.managers import (
CurriculumTermCfg,
EventTermCfg,
ObservationGroupCfg,
ObservationTermCfg,
RewardTermCfg,
SceneEntityCfg,
TerminationTermCfg,
)
from isaaclab.scene import InteractiveSceneCfg
from isaaclab.sensors import ContactSensorCfg
from isaaclab.terrains import TerrainImporterCfg
from isaaclab.utils import configclass
from isaaclab.utils.noise import AdditiveUniformNoiseCfg
import isaaclab_tasks.manager_based.locomotion.velocity.mdp as mdp
@configclass
class QuadrupedFlatSceneCfg(InteractiveSceneCfg):
"""Scene configuration -- ground plane + robot."""
# Ground plane
ground = AssetBaseCfg(
prim_path="/World/defaultGroundPlane",
spawn=sim_utils.GroundPlaneCfg(),
)
# Distant light
light = AssetBaseCfg(
prim_path="/World/Light",
spawn=sim_utils.DistantLightCfg(
intensity=3000.0,
color=(0.75, 0.75, 0.75),
),
)
# Robot: ANYmal-D quadruped
robot: ArticulationCfg = ArticulationCfg(
prim_path="{ENV_REGEX_NS}/Robot",
spawn=sim_utils.UsdFileCfg(
usd_path=(
"isaaclab_assets/Robots/ANYbotics/ANYmal-D/anymal_d.usd"
),
activate_contact_sensors=True,
rigid_props=sim_utils.RigidBodyPropertiesCfg(
disable_gravity=False,
retain_accelerations=False,
),
articulation_props=sim_utils.ArticulationRootPropertiesCfg(
enabled_self_collisions=False,
solver_position_iteration_count=4,
solver_velocity_iteration_count=4,
),
),
init_state=ArticulationCfg.InitialStateCfg(
pos=(0.0, 0.0, 0.6),
joint_pos={
".*HAA": 0.0, # Hip Abduction/Adduction
".*HFE": 0.4, # Hip Flexion/Extension
".*KFE": -0.8, # Knee Flexion/Extension
},
),
    )

    # Contact sensor -- required by the contact-based reward and termination
    # terms below, which reference it as "contact_forces"
    contact_forces = ContactSensorCfg(
        prim_path="{ENV_REGEX_NS}/Robot/.*",
        history_length=3,
        track_air_time=True,
    )
@configclass
class ObservationsCfg:
"""Observation space -- what the policy sees."""
@configclass
class PolicyCfg(ObservationGroupCfg):
# Base velocity (3) -- linear velocity
base_lin_vel = ObservationTermCfg(
func=mdp.base_lin_vel,
noise=AdditiveUniformNoiseCfg(n_min=-0.1, n_max=0.1),
)
# Base angular velocity (3)
base_ang_vel = ObservationTermCfg(
func=mdp.base_ang_vel,
noise=AdditiveUniformNoiseCfg(n_min=-0.2, n_max=0.2),
)
# Projected gravity (3) -- orientation info
projected_gravity = ObservationTermCfg(
func=mdp.projected_gravity,
noise=AdditiveUniformNoiseCfg(n_min=-0.05, n_max=0.05),
)
# Velocity command (3) -- target vx, vy, yaw_rate
velocity_commands = ObservationTermCfg(
func=mdp.generated_commands,
params={"command_name": "base_velocity"},
)
# Joint positions (12)
joint_pos = ObservationTermCfg(
func=mdp.joint_pos_rel,
noise=AdditiveUniformNoiseCfg(n_min=-0.01, n_max=0.01),
)
# Joint velocities (12)
joint_vel = ObservationTermCfg(
func=mdp.joint_vel_rel,
noise=AdditiveUniformNoiseCfg(n_min=-1.5, n_max=1.5),
)
# Previous actions (12)
actions = ObservationTermCfg(func=mdp.last_action)
policy: PolicyCfg = PolicyCfg()
@configclass
class RewardsCfg:
"""Reward terms -- guide the policy."""
# === Positive rewards ===
# Track linear velocity command
track_lin_vel_xy_exp = RewardTermCfg(
func=mdp.track_lin_vel_xy_exp,
weight=1.5,
params={"command_name": "base_velocity", "std": math.exp(-0.25)},
)
# Track angular velocity command
track_ang_vel_z_exp = RewardTermCfg(
func=mdp.track_ang_vel_z_exp,
weight=0.75,
params={"command_name": "base_velocity", "std": math.exp(-0.25)},
)
# === Negative rewards (penalties) ===
# Penalize z-axis linear velocity (bouncing)
lin_vel_z_l2 = RewardTermCfg(
func=mdp.lin_vel_z_l2, weight=-2.0
)
# Penalize xy angular velocity (rolling/pitching)
ang_vel_xy_l2 = RewardTermCfg(
func=mdp.ang_vel_xy_l2, weight=-0.05
)
# Penalize joint torques
joint_torques_l2 = RewardTermCfg(
func=mdp.joint_torques_l2, weight=-0.0001
)
# Penalize joint acceleration
joint_acc_l2 = RewardTermCfg(
func=mdp.joint_acc_l2, weight=-2.5e-7
)
# Penalize action rate (smooth actions)
action_rate_l2 = RewardTermCfg(
func=mdp.action_rate_l2, weight=-0.01
)
# Penalize undesired contacts (body touching ground)
undesired_contacts = RewardTermCfg(
func=mdp.undesired_contacts,
weight=-1.0,
params={
"sensor_cfg": SceneEntityCfg(
"contact_forces", body_names=".*THIGH"
),
"threshold": 1.0,
},
)
@configclass
class TerminationsCfg:
"""When to end an episode."""
time_out = TerminationTermCfg(
func=mdp.time_out, time_out=True
)
base_contact = TerminationTermCfg(
func=mdp.illegal_contact,
params={
"sensor_cfg": SceneEntityCfg(
"contact_forces", body_names="base"
),
"threshold": 1.0,
},
)
Step 4: Domain randomization config
This is the most important part for sim-to-real transfer. Isaac Lab has a built-in event system for domain randomization:
"""
Domain Randomization Events
File: envs/quadruped_flat_env_cfg.py (continued)
"""
@configclass
class EventsCfg:
"""Domain randomization -- chìa khóa cho sim-to-real transfer."""
# === Reset events (mỗi episode) ===
# Random initial base position/orientation
reset_base = EventTermCfg(
func=mdp.reset_root_state_uniform,
mode="reset",
params={
"pose_range": {
"x": (-0.5, 0.5),
"y": (-0.5, 0.5),
"yaw": (-3.14, 3.14),
},
"velocity_range": {
"x": (-0.5, 0.5),
"y": (-0.5, 0.5),
"z": (-0.5, 0.5),
},
},
)
# Random initial joint positions
reset_joints = EventTermCfg(
func=mdp.reset_joints_by_offset,
mode="reset",
params={
"position_range": (-0.2, 0.2),
"velocity_range": (-0.5, 0.5),
},
)
    # === Interval events (fire periodically during the episode) ===
# Random external push (test balance recovery)
push_robot = EventTermCfg(
func=mdp.push_by_setting_velocity,
mode="interval",
interval_range_s=(10.0, 15.0),
params={
"velocity_range": {
"x": (-0.5, 0.5),
"y": (-0.5, 0.5),
},
},
)
    # === Startup events (run once when the env is created) ===
# Randomize physics material (friction)
physics_material = EventTermCfg(
func=mdp.randomize_rigid_body_material,
mode="startup",
params={
"asset_cfg": SceneEntityCfg("robot", body_names=".*"),
"static_friction_range": (0.4, 1.2),
"dynamic_friction_range": (0.4, 1.0),
"restitution_range": (0.0, 0.4),
"num_buckets": 64,
},
)
# Randomize robot mass
add_base_mass = EventTermCfg(
func=mdp.randomize_rigid_body_mass,
mode="startup",
params={
"asset_cfg": SceneEntityCfg("robot", body_names="base"),
"mass_distribution_params": (-3.0, 3.0),
"operation": "add",
},
)
# Randomize joint properties
randomize_actuator_gains = EventTermCfg(
func=mdp.randomize_actuator_gains,
mode="startup",
params={
"asset_cfg": SceneEntityCfg("robot", joint_names=".*"),
"stiffness_distribution_params": (0.8, 1.2),
"damping_distribution_params": (0.8, 1.2),
"operation": "scale",
},
)
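The file above omits the action and command configs and the top-level class that ties all the managers together. For orientation, here is a minimal sketch of that assembly, continuing the same file and reusing its imports (the parameter values are illustrative assumptions, not the exact built-in Anymal-D task config):
@configclass
class ActionsCfg:
    """Policy outputs scaled joint-position offsets from the defaults."""
    joint_pos = mdp.JointPositionActionCfg(
        asset_name="robot",
        joint_names=[".*"],
        scale=0.5,
        use_default_offset=True,
    )

@configclass
class CommandsCfg:
    """Resample a random base velocity command every 10 s."""
    base_velocity = mdp.UniformVelocityCommandCfg(
        asset_name="robot",
        resampling_time_range=(10.0, 10.0),
        heading_command=True,
        heading_control_stiffness=0.5,
        ranges=mdp.UniformVelocityCommandCfg.Ranges(
            lin_vel_x=(-1.0, 1.0),
            lin_vel_y=(-1.0, 1.0),
            ang_vel_z=(-1.0, 1.0),
            heading=(-math.pi, math.pi),
        ),
    )

@configclass
class QuadrupedFlatEnvCfg(ManagerBasedRLEnvCfg):
    """Top-level env config: one field per manager."""
    scene: QuadrupedFlatSceneCfg = QuadrupedFlatSceneCfg(num_envs=4096, env_spacing=2.5)
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()
    commands: CommandsCfg = CommandsCfg()
    rewards: RewardsCfg = RewardsCfg()
    terminations: TerminationsCfg = TerminationsCfg()
    events: EventsCfg = EventsCfg()

    def __post_init__(self):
        self.decimation = 4          # policy runs at 50 Hz
        self.episode_length_s = 20.0
        self.sim.dt = 0.005          # physics runs at 200 Hz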
Step 5: Train with RSL-RL
RSL-RL Config
"""
RSL-RL PPO training config.
File: agents/rsl_rl_ppo_cfg.py
"""
from isaaclab_rl.rsl_rl import (
    RslRlOnPolicyRunnerCfg,
    RslRlPpoActorCriticCfg,
    RslRlPpoAlgorithmCfg,
)

QUADRUPED_PPO_CFG = RslRlOnPolicyRunnerCfg(
    num_steps_per_env=24,
    max_iterations=1500,
    save_interval=100,
    experiment_name="quadruped_flat",
    empirical_normalization=False,
    # Actor-Critic MLP
    policy=RslRlPpoActorCriticCfg(
        init_noise_std=1.0,
        actor_hidden_dims=[512, 256, 128],
        critic_hidden_dims=[512, 256, 128],
        activation="elu",
    ),
    # PPO hyperparameters
    algorithm=RslRlPpoAlgorithmCfg(
        value_loss_coef=1.0,
        use_clipped_value_loss=True,
        clip_param=0.2,
        entropy_coef=0.01,
        num_learning_epochs=5,
        num_mini_batches=4,
        learning_rate=1e-3,
        # Adaptive schedule driven by KL divergence
        schedule="adaptive",
        gamma=0.99,
        lam=0.95,
        desired_kl=0.01,
        max_grad_norm=1.0,
    ),
)
Launch training
# Train with 4,096 parallel environments (headless mode)
cd IsaacLab
./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \
--task Isaac-Velocity-Flat-Anymal-D-v0 \
--num_envs 4096 \
--headless \
--max_iterations 1500
# Performance benchmark:
# RTX 4090: ~90,000 FPS -> ~1500 iterations in ~20 minutes
# RTX 3080: ~50,000 FPS -> ~1500 iterations in ~35 minutes
The training output will look something like this:
[INFO] Starting training...
[INFO] Env: Isaac-Velocity-Flat-Anymal-D-v0 | Num envs: 4096
[INFO] Policy: MLP [512, 256, 128] | Activation: ELU
---
Iter | Reward | Ep. Len | FPS | LR
0 | -12.34 | 200 | 89542 | 1.00e-03
100 | 2.15 | 350 | 91203 | 8.50e-04
200 | 8.73 | 500 | 90876 | 6.20e-04
500 | 15.42 | 800 | 91105 | 3.10e-04
1000 | 19.87 | 1000 | 90234 | 1.50e-04
1500 | 21.56 | 1000 | 91342 | 1.00e-04
---
[INFO] Training complete! Model saved to: logs/rsl_rl/quadruped_flat/
Monitor with TensorBoard
# In another terminal
tensorboard --logdir logs/rsl_rl/quadruped_flat/ --port 6006
# Open a browser at http://localhost:6006
Key metrics to watch:
- Mean reward: rises and converges -- the sign of successful training
- Episode length: increases (the robot stops falling early)
- Policy loss: decreases and stabilizes
- Value loss: decreases
- Learning rate: decays under the adaptive schedule
- KL divergence: stays around desired_kl (0.01)
Step 6: Evaluate and visualize
# Play the trained policy with rendering
./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/play.py \
--task Isaac-Velocity-Flat-Anymal-D-v0 \
--num_envs 32 \
--checkpoint logs/rsl_rl/quadruped_flat/model_1500.pt
Or evaluate from Python:
"""
Evaluate the trained policy and print statistics.
"""
import torch
import numpy as np
from isaaclab.envs import ManagerBasedRLEnv
def evaluate_policy(env: ManagerBasedRLEnv, policy, num_episodes: int = 100):
    """Run the policy and collect episode statistics."""
episode_rewards = []
episode_lengths = []
obs, _ = env.reset()
current_rewards = torch.zeros(env.num_envs, device=env.device)
current_lengths = torch.zeros(env.num_envs, device=env.device)
while len(episode_rewards) < num_episodes:
        with torch.no_grad():
            # ManagerBasedRLEnv returns observations grouped by name;
            # the policy network consumes the "policy" group
            actions = policy(obs["policy"])
        obs, rewards, dones, truncated, infos = env.step(actions)
current_rewards += rewards
current_lengths += 1
# Collect finished episodes
done_mask = dones | truncated
if done_mask.any():
done_indices = done_mask.nonzero(as_tuple=False).squeeze(-1)
for idx in done_indices:
episode_rewards.append(current_rewards[idx].item())
episode_lengths.append(current_lengths[idx].item())
current_rewards[done_indices] = 0
current_lengths[done_indices] = 0
# Statistics
rewards_arr = np.array(episode_rewards[:num_episodes])
lengths_arr = np.array(episode_lengths[:num_episodes])
print(f"\n=== Evaluation ({num_episodes} episodes) ===")
print(f"Reward: {rewards_arr.mean():.2f} +/- {rewards_arr.std():.2f}")
print(f"Ep Length: {lengths_arr.mean():.0f} +/- {lengths_arr.std():.0f}")
print(f"Max reward: {rewards_arr.max():.2f}")
print(f"Min reward: {rewards_arr.min():.2f}")
return rewards_arr, lengths_arr
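A hypothetical call site (creating the env and loading the policy are omitted for brevity; both names below are assumptions):
# `env` is a ManagerBasedRLEnv built from the config above;
# `policy` maps observation tensors to action tensors
rewards_arr, lengths_arr = evaluate_policy(env, policy, num_episodes=100)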
Step 7: Export the policy for deployment
"""
Export the trained policy to TorchScript for deployment.
The result can run on a Jetson, a robot controller, or a ROS 2 node.
"""
import os

import torch
def export_policy(checkpoint_path, output_path, obs_dim=48, act_dim=12):
"""Export RSL-RL policy sang TorchScript."""
# Load checkpoint
checkpoint = torch.load(checkpoint_path, map_location="cpu")
print(f"Loaded checkpoint: {checkpoint_path}")
    # Create a dummy input for tracing
dummy_obs = torch.randn(1, obs_dim)
    # Rebuild the actor-critic with the same architecture used in training
    from rsl_rl.modules import ActorCritic
    actor_critic = ActorCritic(
        num_actor_obs=obs_dim,
        num_critic_obs=obs_dim,
        num_actions=act_dim,
        actor_hidden_dims=[512, 256, 128],
        critic_hidden_dims=[512, 256, 128],
        activation="elu",
    )
actor_critic.load_state_dict(checkpoint["model_state_dict"])
actor_critic.eval()
    # Export only the actor (the critic is not needed for inference)
class PolicyWrapper(torch.nn.Module):
def __init__(self, actor):
super().__init__()
self.actor = actor
def forward(self, obs):
return self.actor(obs)
wrapper = PolicyWrapper(actor_critic.actor)
traced = torch.jit.trace(wrapper, dummy_obs)
traced.save(output_path)
# Verify
loaded = torch.jit.load(output_path)
test_out = loaded(dummy_obs)
print(f"Exported to: {output_path}")
print(f"Input shape: {dummy_obs.shape}")
print(f"Output shape: {test_out.shape}")
print(f"File size: {os.path.getsize(output_path) / 1024:.1f} KB")
# Export
export_policy(
checkpoint_path="logs/rsl_rl/quadruped_flat/model_1500.pt",
output_path="quadruped_policy.pt",
obs_dim=48,
act_dim=12,
)
The exported policy can be deployed via (a minimal inference sketch follows the list):
- A ROS 2 node (see the sim-to-real post for details)
- NVIDIA Jetson (see the Edge AI post)
- Isaac ROS for the NVIDIA robot stack
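For reference on the receiving end, here is a minimal inference sketch using the exported file (the zero observation is a placeholder you would replace with the robot's state estimator output):
import torch

# Load the TorchScript policy exported above
policy = torch.jit.load("quadruped_policy.pt")
policy.eval()

# Placeholder observation -- on the robot, assemble the 48-dim vector
# (base velocities, projected gravity, command, joint state, last action)
obs = torch.zeros(1, 48)
with torch.no_grad():
    joint_targets = policy(obs)  # shape (1, 12): target joint positions
print(joint_targets.shape)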
Tips and best practices
1. VRAM management
# Check VRAM usage during training
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 5
# Reduce num_envs if you run out of VRAM:
# RTX 3070 (8GB): --num_envs 2048
# RTX 3080 (10GB): --num_envs 3072
# RTX 4080 (16GB): --num_envs 4096
# RTX 4090 (24GB): --num_envs 8192
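If you prefer checking from inside Python (for example in a training script), torch exposes the same numbers:
import torch

# Free and total VRAM on the current device, in bytes
free, total = torch.cuda.mem_get_info()
print(f"VRAM free: {free / 1e9:.1f} / {total / 1e9:.1f} GB")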
2. Reward tuning
- Start with the positive rewards (velocity tracking) first, add penalties later
- Use exponential tracking rewards (track_lin_vel_xy_exp) rather than L2 terms -- they are more stable (see the sketch below)
- The action rate penalty matters a lot -- without it, the policy outputs jerky actions
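For intuition, the exponential tracking term has roughly this shape (a sketch of the idea, not the library's exact code):
import torch

def tracking_reward_sketch(v_cmd: torch.Tensor, v: torch.Tensor, std: float) -> torch.Tensor:
    """Returns 1.0 at perfect tracking, decaying smoothly with squared error."""
    error = torch.sum(torch.square(v_cmd - v), dim=-1)
    return torch.exp(-error / std**2)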
3. Domain randomization
- Start with light randomization and increase it gradually once the policy has learned basic locomotion
- Friction randomization matters most for sim-to-real
- External pushes test balance recovery -- critical for real-world robustness
- Mass randomization of +-3 kg is reasonable for a quadruped
4. Debugging
# Run with rendering to watch what the robot is doing
./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \
--task Isaac-Velocity-Flat-Anymal-D-v0 \
--num_envs 16
# Omit --headless -> the viewport will open
# Reduce num_envs so you can observe individual robots
Summary
In this post, you learned how to:
- Install Isaac Sim 5.0 + Isaac Lab 2.2
- Work with Isaac Lab's manager-based architecture
- Build an environment for quadruped locomotion
- Configure domain randomization (friction, mass, pushes, actuator gains)
- Train with RSL-RL PPO, 4,096 parallel envs, ~20 minutes on an RTX 4090
- Evaluate and export the policy for deployment
The full pipeline:
Isaac Lab env config
→ Domain randomization setup
→ RSL-RL PPO training (4096 envs, ~1500 iters)
→ TensorBoard monitoring
→ Evaluate (>90% success)
→ Export TorchScript
→ Deploy (ROS 2 / Jetson / Isaac ROS)
Isaac Lab is the most powerful tool available today for large-scale robot RL training. Combined with good domain randomization, you can train policies that transfer zero-shot to real robots.
Related posts
- Simulation for Robotics: MuJoCo vs Isaac Sim vs Gazebo -- a comparison of three leading simulators
- Getting Started with MuJoCo: from installation to your first robot simulation -- a hands-on MuJoCo tutorial
- Sim-to-Real Transfer: train in simulation, run in the real world -- domain randomization and sim-to-real best practices
- RL for Bipedal Walking -- reinforcement learning for walking robots
- Edge AI with NVIDIA Jetson -- deploying models to edge devices