SO-101 sim-to-real guide with Isaac Lab & LeRobot

You have just unboxed an SO-101 robot arm: 6 degrees of freedom, Feetech STS3215 servos, and a price under 100 USD. It is the most affordable robot in HuggingFace's LeRobot community. The next question is the hard one: how do you teach it to pick and place objects reliably?

The answer is sim-to-real transfer: train a policy in simulation, then deploy it on the real robot. NVIDIA has published a comprehensive learning path that combines Isaac Lab (physics-accurate simulation), LeRobot (data collection and training framework), and GR00T N1.5 (a 3B-parameter vision-language-action foundation model). This article walks through the whole pipeline end to end.

Why sim-to-real instead of real-world teleop only?

Teleoperating the real robot directly has two big problems:

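Data bottleneck: Collecting 100 demo episodes on the real robot takes several hours, is tiring for the operator, and yields data that is not diverse enough.
Risk: A small error in a demo teaches the policy the wrong behavior, and the robot then collides with something or executes the task incorrectly.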

In simulation you can collect thousands of demos in minutes, apply domain randomization (varying lighting, textures, and physics) to make the policy more robust, and test safely before running on real hardware. For more background on the platform, see Isaac Lab from zero: a sim environment for robot learning.

Required hardware

The SO-101 comes as a pair of arms: the Leader (which you move by hand) and the Follower (which executes the policy). Both use Feetech STS3215 servos.

Component	Specification
Degrees of freedom	6 DOF (Base, Shoulder, Elbow, Wrist Pitch, Wrist Roll, Gripper)
Follower motors	6× Feetech STS3215, gear ratio 1/345
Leader motors	Gear ratio varies by joint (1/147 to 1/345) so the arm is easy to move by hand
Camera	2× USB webcam, 640×480 @ 30 fps (wrist + front)
Controller board	Waveshare Bus Servo Adapter
Training/inference PC	GPU with ≥25 GB VRAM recommended (the training run in Step 5 uses an RTX 4080 with 16 GB together with `--no-tune_diffusion_model`)

Most SO-101 parts are designed to be 3D printed. You can buy a kit from Hiwonder, or order the BOM from TheRobotStudio's GitHub yourself and print the parts in PLA/PETG.

Technical stack overview

Isaac Sim (NVIDIA Omniverse)
   └── Physically accurate 3D environment for SO-101
       │
Isaac Lab
   └── Training framework: RL/IL, domain randomization, scripted policy
       │
LeRobot (HuggingFace)
   └── Data collection, Hub upload, policy training, robot control
       │
GR00T N1.5 (3B params)
   └── VLA foundation model: vision + language → action sequences
       │
SO-101 Follower Arm
   └── Real hardware deploy via inference server

Isaac Lab provides an accurate physics environment for SO-101 with the "vial-to-rack" task (pick up a test tube and place it in a rack). LeRobot is the "glue layer": it collects data, uploads to the HuggingFace Hub, trains models, and controls the real robot. GR00T N1.5 takes camera images and a language instruction and outputs action sequences for each joint. For a deeper look at the LeRobot framework, see LeRobot Framework: an imitation learning platform for real robots.
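To make the data flow concrete, here is a minimal sketch of what goes into and comes out of the policy. The class names `Observation` and `ActionChunk` and their fields are hypothetical, chosen for illustration only; they are not GR00T or LeRobot types. The joint names are the ones this guide uses for motor IDs 1 to 6.

```python
from dataclasses import dataclass
from typing import Dict, List

# Joint order follows the motor IDs 1-6 used in this guide.
JOINT_NAMES = [
    "shoulder_pan", "shoulder_lift", "elbow_flex",
    "wrist_flex", "wrist_roll", "gripper",
]


@dataclass
class Observation:
    """Hypothetical container for one policy input."""
    images: Dict[str, bytes]      # camera name ("wrist", "front") -> encoded frame
    joint_positions: List[float]  # current state, one value per joint
    instruction: str              # natural-language task description


@dataclass
class ActionChunk:
    """Hypothetical container for one policy output."""
    steps: List[List[float]]      # steps[t][j] = target for joint j at future step t

    def is_well_formed(self) -> bool:
        # Every future step must carry exactly one target per joint.
        return all(len(step) == len(JOINT_NAMES) for step in self.steps)
```

The point of the sketch is the shape of the interface: images plus text in, a short sequence of per-joint targets out, rather than a single action per call.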

Step 1: Set up the environment

System requirements

Ubuntu 22.04 (recommended)
Python 3.10
CUDA 12.x
GPU ≥25GB VRAM

Install Isaac GR00T

git clone https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T
conda create -n gr00t python=3.10
conda activate gr00t
pip install --upgrade setuptools
pip install -e ".[base]"
# Flash Attention speeds up training by roughly 2×
pip install --no-build-isolation flash-attn==2.7.1.post4

Install LeRobot with the Feetech SDK

# The feetech extra pulls in the Feetech SDK needed to talk to the STS3215 servos
pip install "lerobot[feetech]"

Download the GR00T N1.5 model

huggingface-cli download nvidia/GR00T-N1.5-3B

Step 2: Assemble and configure the SO-101

Find the USB ports

Connect each arm to the computer, then run the following command to identify its port:

lerobot-find-port
# Unplug the USB cable when prompted; the script works out which port disappeared
# Sample output: /dev/ttyACM0 (follower), /dev/ttyACM1 (leader)

On Linux, grant access to the USB serial devices:

sudo chmod 666 /dev/ttyACM0
sudo chmod 666 /dev/ttyACM1
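Before moving on, it can help to confirm that both ports exist and are readable and writable by your user. The following is a small sketch using only the Python standard library; the function name `check_port_access` is hypothetical, and the port paths are the sample ones from above, so substitute your own.

```python
import os
from typing import Dict, Iterable


def check_port_access(ports: Iterable[str]) -> Dict[str, str]:
    """Return a short status string for each serial device path."""
    status = {}
    for port in ports:
        if not os.path.exists(port):
            status[port] = "missing"
        elif os.access(port, os.R_OK | os.W_OK):
            status[port] = "ok"
        else:
            status[port] = "no permission"
    return status


if __name__ == "__main__":
    # Sample paths from this guide; replace with the output of lerobot-find-port.
    for port, state in check_port_access(["/dev/ttyACM0", "/dev/ttyACM1"]).items():
        print(f"{port}: {state}")
```

Keep in mind that permissions set with chmod usually do not survive unplugging and replugging the device, so rerun the check after reconnecting an arm.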

Set motor IDs and baud rate

Each motor needs a unique ID from 1 to 6. Connect the motors to the controller board one at a time and run the setup script:

# Setup follower arm
lerobot-setup-motors \
    --robot.type=so101_follower \
    --robot.port=/dev/ttyACM0 \
    --robot.id=my_follower_arm

The script guides you through plugging in each motor in turn and assigns the IDs automatically: 1 (shoulder pan), 2 (shoulder lift), 3 (elbow flex), 4 (wrist flex), 5 (wrist roll), 6 (gripper).

# Setup leader arm
lerobot-setup-motors \
    --teleop.type=so101_leader \
    --teleop.port=/dev/ttyACM1 \
    --teleop.id=my_leader_arm
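The ID assignment described above can be written down as a lookup table and used to sanity-check an arm after setup. This is a sketch, not part of LeRobot: `SO101_MOTOR_IDS` and `check_bus_ids` are hypothetical names, and the list of IDs found on the bus is something you would obtain yourself (for example from a bus scan).

```python
from collections import Counter
from typing import List, Tuple

# Motor name -> ID, as listed in this guide.
SO101_MOTOR_IDS = {
    "shoulder_pan": 1,
    "shoulder_lift": 2,
    "elbow_flex": 3,
    "wrist_flex": 4,
    "wrist_roll": 5,
    "gripper": 6,
}


def check_bus_ids(found_ids: List[int]) -> Tuple[List[int], List[int], List[int]]:
    """Compare IDs seen on the bus with the expected set.

    Returns (missing, unexpected, duplicated), each as a sorted list.
    """
    expected = set(SO101_MOTOR_IDS.values())
    counts = Counter(found_ids)
    missing = sorted(expected - set(counts))
    unexpected = sorted(set(counts) - expected)
    duplicated = sorted(i for i, n in counts.items() if n > 1)
    return missing, unexpected, duplicated
```

For example, `check_bus_ids([1, 1, 2, 3, 4, 5])` reports ID 6 as missing and ID 1 as duplicated, which is the typical symptom of a motor that was skipped during setup.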

Calibration

Calibration ensures the leader and follower report the same position values when they are in the same physical pose. This step is mandatory if you want to transfer a policy between robots: a neural network trained on one robot needs the offsets to behave correctly on another.

# Calibrate follower
lerobot-calibrate \
    --robot.type=so101_follower \
    --robot.port=/dev/ttyACM0 \
    --robot.id=my_follower_arm

# Calibrate leader
lerobot-calibrate \
    --teleop.type=so101_leader \
    --teleop.port=/dev/ttyACM1 \
    --teleop.id=my_leader_arm

During calibration you are asked to move the robot to reference poses (neutral, then the limit of each joint). LeRobot writes the resulting offsets to a config file under ~/.cache/huggingface/lerobot/calibration/.
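To see why the offsets matter, consider a simplified model of what calibration does. This is an illustration only and not LeRobot's actual calibration code: it assumes the encoder reports 4096 ticks per turn and that calibration boils down to one integer offset per joint. The name `ticks_to_degrees` and the numbers in the example are made up.

```python
TICKS_PER_TURN = 4096  # assumption: 12-bit position encoder, values 0..4095


def ticks_to_degrees(raw_ticks: int, homing_offset: int) -> float:
    """Convert a raw encoder reading to degrees relative to the calibrated zero."""
    centered = raw_ticks - homing_offset
    return centered * 360.0 / TICKS_PER_TURN


# Two arms in the same physical pose report different raw values because the
# servo horns were mounted at slightly different angles (made-up numbers).
leader_angle = ticks_to_degrees(raw_ticks=2148, homing_offset=100)
follower_angle = ticks_to_degrees(raw_ticks=1990, homing_offset=-58)
print(leader_angle, follower_angle)
```

Under this model, both arms map to the same angle once their own offset is applied, which is what lets a policy trained on one robot's joint values run on another.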

Step 3: Collect data in Isaac Lab

Instead of teleoperating the real robot directly (slow and tiring), you teleoperate in Isaac Sim to collect demos faster and with more variety. For more on teleop data collection techniques, see LeRobot teleop and real-world data collection.

# Launch teleoperation in Isaac Lab with domain randomization enabled
lerobot_agent \
    --task Lerobot-So101-Teleop-Vials-To-Rack-DR \
    --repo_id ${HF_USER}/so101_teleop_vials \
    --repo_root $(pwd)/datasets/so101_teleop_vials

Controls while recording:

Key	Function
`S`	Start/stop recording an episode
`C`	Cancel the episode (it is not saved to the dataset)
`R`	Reset the environment with new randomization parameters

Domain randomization (DR) is applied on every reset and randomizes:

Lighting: exposure from −4 to +3 stops, color temperature 2500K–9500K, random HDRI
Camera pose: ±0.02 m position offset, ±0.05 rad rotation offset
Objects: random vial and rack positions on the table, with a 33% chance that a vial already sits in a slot
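The ranges listed above can be sketched as a sampler that draws one set of parameters per reset. This is not the Isaac Lab implementation: the `DRSample` class and function name are hypothetical, and the sketch assumes uniform distributions and independent per-axis offsets, which the source does not specify. HDRI selection and object placement are left out.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass
class DRSample:
    exposure_stops: float                          # -4 .. +3 stops
    color_temp_k: float                            # 2500 K .. 9500 K
    cam_pos_offset_m: Tuple[float, float, float]   # +/- 0.02 m per axis
    cam_rot_offset_rad: Tuple[float, float, float] # +/- 0.05 rad per axis
    vial_preplaced: bool                           # True with probability 0.33


def sample_domain_randomization(rng: random.Random) -> DRSample:
    """Draw one randomization sample (uniform ranges are an assumption)."""
    return DRSample(
        exposure_stops=rng.uniform(-4.0, 3.0),
        color_temp_k=rng.uniform(2500.0, 9500.0),
        cam_pos_offset_m=tuple(rng.uniform(-0.02, 0.02) for _ in range(3)),
        cam_rot_offset_rad=tuple(rng.uniform(-0.05, 0.05) for _ in range(3)),
        vial_preplaced=rng.random() < 0.33,
    )


if __name__ == "__main__":
    print(sample_domain_randomization(random.Random(0)))
```

Passing in a seeded `random.Random` keeps a recording session reproducible, which is handy when you want to revisit the conditions of a failed episode.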

Goal: collect at least 70 episodes so there is enough data for good-quality training.

Step 4: Upload the dataset to the HuggingFace Hub

huggingface-cli login  # enter your HF token
lerobot-upload \
    --repo_id ${HF_USER}/so101_teleop_vials \
    --repo_root $(pwd)/datasets/so101_teleop_vials

The dataset is stored in LeRobot v2 format: JSON metadata, video observations (640×480 @ 30 fps), joint positions, and gripper state. Episodes are numbered and can be browsed directly on the HuggingFace Hub.

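Step 5: Fine-tune GR00T N1.5

Prepare the modality config

SO-101 is a new embodiment (it was not part of GR00T pre-training). GR00T uses EmbodimentTag = new_embodiment for robots it has not seen before, and you need to copy the modality file that matches your camera setup. The commands below expect the dataset under ./demo_data/so101-vials; if you recorded into a different folder in Step 3, adjust the paths accordingly.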

# Dual-camera setup (wrist camera + front camera)
cp getting_started/examples/so100_dualcam__modality.json \
   ./demo_data/so101-vials/meta/modality.json

# If you only have a single camera:
# cp getting_started/examples/so100__modality.json \
#    ./demo_data/so101-vials/meta/modality.json

Verify dataset

python scripts/load_dataset.py \
    --dataset-path ./demo_data/so101-vials \
    --plot-state-action \
    --video-backend torchvision_av

If the dataset loads successfully, you will see state/action plots and a video preview for each episode.

Training command

python scripts/gr00t_finetune.py \
   --dataset-path ./demo_data/so101-vials/ \
   --num-gpus 1 \
   --output-dir ./checkpoints/so101-policy \
   --max-steps 10000 \
   --data-config so100_dualcam \
   --video-backend torchvision_av \
   --no-tune_diffusion_model \
   --batch-size 16 \
   --lora-rank 16 \
   --dataloader-num-workers 16

Key flags:

Flag	Explanation	When to use
`--no-tune_diffusion_model`	Skips fine-tuning the diffusion head and trains only the LoRA adapters	GPU VRAM < 40 GB
`--max-steps 10000`	10K steps for a simple task	Simple pick-and-place
`--max-steps 20000`	20K steps for a complex task	Multi-step manipulation
`--lora-rank 16`	LoRA rank for parameter-efficient fine-tuning	Balances quality against compute
`--batch-size 16`	Batch size	Suits an RTX 4080 (16 GB) with `--no-tune_diffusion_model`

Training takes roughly 2–4 hours on an RTX 4080 with 10K steps and 70 episodes.
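The numbers above imply a per-step budget that is useful as a quick check while a run is in progress. The helper below is plain arithmetic on the figures from this guide; the function name `training_budget` is hypothetical.

```python
from typing import Tuple


def training_budget(max_steps: int, batch_size: int, hours: float) -> Tuple[int, float]:
    """Return (total samples processed, seconds per optimizer step)."""
    samples = max_steps * batch_size
    sec_per_step = hours * 3600 / max_steps
    return samples, sec_per_step


if __name__ == "__main__":
    for hours in (2, 4):
        samples, sec_per_step = training_budget(10000, 16, hours)
        print(f"{hours} h: {samples} samples, {sec_per_step:.2f} s/step")
```

With 10K steps and batch size 16 the run processes 160,000 samples, and a 2 to 4 hour run corresponds to about 0.72 to 1.44 seconds per step. If your logs show a much slower step time, the data loader or video decoding is a likely place to look first.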

Monitoring convergence

The average MSE on action prediction should reach about 50–60 after roughly 5000 steps. If the loss has not decreased after 2000 steps, check:

Whether the dataset is in LeRobot v2 format (v3 is not compatible with GR00T N1.5; use PR #2109 to convert)
Whether modality.json points at the right camera keys (default: wrist and front)
Whether --data-config matches the number of cameras
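The first two checks in this list can be automated with a short pre-flight script. This is a sketch under stated assumptions: it assumes meta/info.json carries a `codebase_version` field and that meta/modality.json has a top-level `video` mapping keyed by camera name. Verify both against your own files before relying on it. The function name `preflight` is hypothetical.

```python
import json
from pathlib import Path
from typing import List, Sequence


def preflight(dataset_dir: str, expected_cameras: Sequence[str] = ("wrist", "front")) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    meta = Path(dataset_dir) / "meta"

    info_path = meta / "info.json"
    if not info_path.is_file():
        problems.append("meta/info.json not found")
    else:
        # Assumption: the dataset format version is stored under "codebase_version".
        version = str(json.loads(info_path.read_text()).get("codebase_version", ""))
        if not version.startswith("v2"):
            problems.append(f"codebase_version is {version!r}, expected v2.x")

    modality_path = meta / "modality.json"
    if not modality_path.is_file():
        problems.append("meta/modality.json not found")
    else:
        # Assumption: cameras are listed as keys of a top-level "video" mapping.
        video_keys = set(json.loads(modality_path.read_text()).get("video", {}))
        missing = [cam for cam in expected_cameras if cam not in video_keys]
        if missing:
            problems.append(f"modality.json has no video entry for: {missing}")

    return problems


if __name__ == "__main__":
    for problem in preflight("./demo_data/so101-vials"):
        print("WARNING:", problem)
```

For a single-camera setup, pass the one camera name you actually use as `expected_cameras`.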

Step 6: Evaluation in simulation (open-loop)

Before deploying to the real robot, run an open-loop evaluation to check that the model predicts the right actions:

python scripts/eval_policy.py --plot \
   --embodiment_tag new_embodiment \
   --model_path ./checkpoints/so101-policy \
   --data_config so100_dualcam \
   --dataset_path ./demo_data/so101-vials/ \
   --video_backend torchvision_av \
   --modality_keys single_arm gripper

The output is a set of action trajectory plots comparing ground-truth and predicted actions. A good open-loop result is necessary but not sufficient: the policy can still fail in closed-loop execution because errors accumulate.
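The comparison behind those plots is a mean squared error between recorded and predicted actions. As a minimal sketch of that metric (not the code in eval_policy.py), the helper below computes the MSE per joint so you can see which joint the model struggles with. The function name `per_joint_mse` and the toy trajectories are hypothetical.

```python
from typing import List, Sequence


def per_joint_mse(ground_truth: Sequence[Sequence[float]],
                  predicted: Sequence[Sequence[float]]) -> List[float]:
    """MSE per joint over a trajectory given as [step][joint] values."""
    if len(ground_truth) != len(predicted) or not ground_truth:
        raise ValueError("trajectories must be non-empty and of equal length")
    n_steps = len(ground_truth)
    n_joints = len(ground_truth[0])
    totals = [0.0] * n_joints
    for gt_step, pred_step in zip(ground_truth, predicted):
        for j in range(n_joints):
            totals[j] += (gt_step[j] - pred_step[j]) ** 2
    return [total / n_steps for total in totals]


if __name__ == "__main__":
    gt = [[0.0, 0.0], [1.0, 1.0]]
    pred = [[0.0, 1.0], [1.0, 1.0]]
    errors = per_joint_mse(gt, pred)
    print(errors, sum(errors) / len(errors))
```

Because the predictions are scored against recorded states rather than states the policy itself produced, this number says nothing about how the policy recovers from its own mistakes, which is why a closed-loop test is still needed.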

Step 7: Deploy to the real robot

Deployment uses a separate server and client. The server runs model inference on the GPU; the client reads the cameras and drives the servos.

Terminal 1 — Inference server

python scripts/inference_service.py --server \
    --model_path ./checkpoints/so101-policy \
    --embodiment-tag new_embodiment \
    --data-config so100_dualcam \
    --denoising-steps 4

The server listens on the default port. If the robot's motion is jerky, raise the setting to --denoising-steps 16 for a smoother trajectory (tradeoff: inference is about 4× slower).
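A rough way to reason about this tradeoff is a linear latency model. The sketch assumes, and this is not stated in the source, that each inference call pays a fixed cost once (for example the vision-language backbone) and then runs the denoising steps one after another. The function name and the millisecond figures are hypothetical placeholders; measure your own.

```python
def inference_latency_ms(denoising_steps: int, fixed_ms: float, per_step_ms: float) -> float:
    """Estimated latency of one inference call under a simple linear model."""
    return fixed_ms + denoising_steps * per_step_ms


if __name__ == "__main__":
    # Hypothetical timings, for illustration only.
    fast = inference_latency_ms(4, fixed_ms=20.0, per_step_ms=10.0)
    smooth = inference_latency_ms(16, fixed_ms=20.0, per_step_ms=10.0)
    print(fast, smooth, smooth / fast)
```

Under this model the slowdown from 4 to 16 steps approaches 4× only when the denoising steps dominate; any fixed cost pulls the ratio below that, so it is worth timing both settings on your own GPU.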

Terminal 2 — Robot client

python getting_started/examples/eval_lerobot.py \
    --robot.type=so101_follower \
    --robot.port=/dev/ttyACM0 \
    --robot.id=my_follower_arm \
    --robot.cameras="{ \
        wrist: {type: opencv, index_or_path: 9, width: 640, height: 480, fps: 30}, \
        front: {type: opencv, index_or_path: 15, width: 640, height: 480, fps: 30}}" \
    --policy_host=127.0.0.1 \
    --lang_instruction="Pick up the vial and place it in the yellow rack."

Find the correct camera index:

v4l2-ctl --list-devices
# Or list the device nodes and try each index: ls /dev/video*
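If you prefer to collect the candidate indices programmatically, the device node names can be parsed with the standard library. This is a small sketch with a hypothetical function name; it only lists indices and does not open the cameras.

```python
import glob
import re
from typing import Iterable, List


def video_indices(paths: Iterable[str]) -> List[int]:
    """Extract the numeric index from /dev/videoN paths, sorted ascending."""
    indices = []
    for path in paths:
        match = re.fullmatch(r"/dev/video(\d+)", path)
        if match:
            indices.append(int(match.group(1)))
    return sorted(indices)


if __name__ == "__main__":
    print(video_indices(glob.glob("/dev/video*")))
```

A single webcam often shows up as more than one /dev/video node, so expect to try a few of the listed indices before finding the ones that deliver frames for the wrist and front cameras.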

The two main sim-to-real strategies

Strategy 1: Domain Randomization (DR), sim-only

Train entirely in simulation and randomize aggressively so the policy generalizes to the real world. No demos on the real robot are needed.

Advantages:

No real-world data collection needed
Scales well with more randomization parameters
Easy to implement

Disadvantages:

The policy tends to be conservative (slow, "defensive" motion)
Tuning sensible randomization ranges takes expertise
Lower visual accuracy than co-training

Recommendation: Use it when you do not have the real robot yet, or when you want to validate the pipeline quickly.

Strategy 2: Co-training (Sim + Real)

Combine a small amount of real data with a lot of sim data. Just 5 real episodes plus 70–100 sim episodes are enough.

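# Record 5 real episodes with the leader and follower arms and upload them to the Hub
# Pass your cameras with --robot.cameras as in Step 7 so the episodes include images
lerobot-record \
    --robot.type=so101_follower \
    --robot.port=/dev/ttyACM0 \
    --robot.id=my_follower_arm \
    --teleop.type=so101_leader \
    --teleop.port=/dev/ttyACM1 \
    --teleop.id=my_leader_arm \
    --dataset.repo_id=${HF_USER}/so101_real_5eps \
    --dataset.num_episodes=5 \
    --dataset.single_task="Pick up the vial and place it in the yellow rack."

# Merge the datasets before training (concept)
# In practice: train with a --dataset.repo_ids list in the GR00T finetune config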
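With 5 real and 70 to 100 sim episodes, real data is only about 5 to 7 percent of the merged dataset. One common way to control how often the real episodes are seen is to sample the source explicitly. The sketch below illustrates that idea and is not how the GR00T fine-tuning script mixes datasets: the function names and the `real_prob` parameter are hypothetical.

```python
import random
from typing import List, Sequence


def real_fraction(n_real: int, n_sim: int) -> float:
    """Share of real episodes when the two datasets are simply concatenated."""
    return n_real / (n_real + n_sim)


def sample_episode(rng: random.Random, real_eps: Sequence[str],
                   sim_eps: Sequence[str], real_prob: float) -> str:
    """Pick one episode, choosing the real pool with probability real_prob."""
    pool = real_eps if rng.random() < real_prob else sim_eps
    return rng.choice(pool)


if __name__ == "__main__":
    print(real_fraction(5, 70), real_fraction(5, 100))
    real: List[str] = [f"real_{i}" for i in range(5)]
    sim: List[str] = [f"sim_{i}" for i in range(70)]
    rng = random.Random(0)
    # real_prob=0.2 is an arbitrary illustration, not a recommended value.
    print([sample_episode(rng, real, sim, real_prob=0.2) for _ in range(5)])
```

Setting `real_prob` above the natural fraction oversamples the few real episodes; whether that helps for this task is something to check empirically.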