You just unboxed an SO-101 robot arm — 6 degrees of freedom, Feetech STS3215 servos, under $100. It's the most affordable robot arm in HuggingFace's LeRobot community. But the next question is the hard one: how do you teach it to pick and place objects reliably?
The answer is sim-to-real transfer — train a policy in simulation, then deploy it on the real robot. NVIDIA has published a comprehensive learning path combining Isaac Lab (physics-accurate simulation), LeRobot (data collection + training framework), and GR00T N1.5 (a 3B-parameter vision-language-action foundation model). This article walks through the entire pipeline from scratch.
Why Sim-to-Real Instead of Just Real Teleoperation?
Teleoperation directly on the physical robot has two major problems:
- Data bottleneck: Collecting 100 demo episodes on a real robot takes hours, is fatiguing, and produces data that lacks diversity.
- Risk: Small errors in demonstration data → policy learns wrong behavior → robot crashes.
With simulation, you can collect thousands of demos in minutes, apply domain randomization (randomize lighting, textures, physics) to make the policy robust, and safely test before running on real hardware. For a deeper look at Isaac Lab fundamentals, see Isaac Lab from Scratch: Simulation for Robot Learning.
Required Hardware
SO-101 consists of a pair of arms: a Leader (you move by hand for demonstrations) and a Follower (runs the learned policy). Both use Feetech STS3215 servos.
| Component | Specification |
|---|---|
| Degrees of freedom | 6 DOF (Base, Shoulder, Elbow, Wrist Pitch, Wrist Roll, Gripper) |
| Follower motors | 6× Feetech STS3215, gear ratio 1/345 |
| Leader motors | Mixed gear ratios per joint (1/191 to 1/345) for easy hand-held movement |
| Cameras | 2× USB webcam 640×480 @30fps (wrist + front) |
| Controller board | Waveshare Bus Servo Adapter |
| Training/inference PC | GPU with ≥25GB VRAM for full fine-tuning; 16GB (e.g. RTX 4080) suffices with LoRA and `--no-tune_diffusion_model` |
Most SO-101 structural parts are 3D-printed. You can buy a kit from Hiwonder or order your own BOM from TheRobotStudio's GitHub and print in PLA or PETG.
Technical Stack Overview
Isaac Sim (NVIDIA Omniverse)
└── Physically-accurate 3D environment for SO-101
│
Isaac Lab
└── Training framework: RL/IL, domain randomization, scripted policies
│
LeRobot (HuggingFace)
└── Data collection, Hub upload, policy training, robot control
│
GR00T N1.5 (3B params)
└── VLA foundation model: vision + language → action sequences
│
SO-101 Follower Arm
└── Real hardware deployment via inference server
Isaac Lab provides a physically-accurate environment for the SO-101 with the "vial-to-rack" task (pick up a centrifuge vial and place it in a rack). LeRobot is the "glue layer" — data collection, HuggingFace Hub upload, policy training, and robot control. GR00T N1.5 takes camera images and a language instruction, then outputs action sequences for each joint. For a deep introduction to the LeRobot framework, read LeRobot Framework: Imitation Learning for Real Robots.
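Conceptually, the VLA policy is just a function from observations plus a language instruction to a short chunk of joint actions. A minimal Python sketch of that interface (the names, shapes, and horizon here are illustrative stand-ins, not GR00T's actual API):

```python
import random

JOINTS = ["shoulder_pan", "shoulder_lift", "elbow_flex",
          "wrist_flex", "wrist_roll", "gripper"]

def predict_action_chunk(wrist_img, front_img, joint_state, instruction, horizon=16):
    """Toy stand-in for a VLA policy: returns `horizon` steps of
    6-DOF joint targets. A real model conditions on the images and
    the instruction; here we just hold the current pose with noise."""
    assert len(joint_state) == len(JOINTS)
    return [[q + random.uniform(-0.01, 0.01) for q in joint_state]
            for _ in range(horizon)]

chunk = predict_action_chunk(None, None, [0.0] * 6,
                             "Pick up the vial and place it in the rack.")
print(len(chunk), len(chunk[0]))  # 16 steps x 6 joints
```

The key point is the chunked output: the model predicts a short action sequence per inference call rather than a single step, which is why inference latency matters less than you might expect.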
Step 1: Environment Setup
System Requirements
- Ubuntu 22.04 (recommended)
- Python 3.10
- CUDA 12.x
- GPU with ≥25GB VRAM for full fine-tuning (16GB works with LoRA and `--no-tune_diffusion_model`)
Install Isaac GR00T
git clone https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T
conda create -n gr00t python=3.10
conda activate gr00t
pip install --upgrade setuptools
pip install -e ".[base]"
# Flash Attention speeds up training by ~2×
pip install --no-build-isolation flash-attn==2.7.1.post4
Install LeRobot with Feetech SDK
# Feetech SDK required to communicate with STS3215 servos
pip install 'lerobot[feetech]'
Download the GR00T N1.5 Model
huggingface-cli download nvidia/GR00T-N1.5-3B
Step 2: Assemble and Configure the SO-101
Find USB Ports
Connect each arm to your computer and run:
lerobot-find-port
# When prompted: unplug the USB cable of the arm you're configuring
# Example output: /dev/ttyACM0 (follower), /dev/ttyACM1 (leader)
On Linux, grant USB access permissions:
sudo chmod 666 /dev/ttyACM0
sudo chmod 666 /dev/ttyACM1
Set Motor IDs and Baudrate
Each motor needs a unique ID from 1–6. Connect one motor at a time to the controller board and run:
# Configure follower arm
lerobot-setup-motors \
--robot.type=so101_follower \
--robot.port=/dev/ttyACM0 \
--robot.id=my_follower_arm
The script guides you through connecting each motor in sequence, automatically assigning IDs: 1 (shoulder pan) → 2 (shoulder lift) → 3 (elbow flex) → 4 (wrist flex) → 5 (wrist roll) → 6 (gripper).
# Configure leader arm
lerobot-setup-motors \
--teleop.type=so101_leader \
--teleop.port=/dev/ttyACM1 \
--teleop.id=my_leader_arm
Calibration
Calibration ensures leader and follower report the same position values when in the same physical pose. This is mandatory if you want to transfer policies between robots — a neural network trained on one robot needs to know position offsets to run correctly on another.
# Calibrate follower
lerobot-calibrate \
--robot.type=so101_follower \
--robot.port=/dev/ttyACM0 \
--robot.id=my_follower_arm
# Calibrate leader
lerobot-calibrate \
--teleop.type=so101_leader \
--teleop.port=/dev/ttyACM1 \
--teleop.id=my_leader_arm
During calibration, you'll move each arm to reference poses (neutral position, joint limits). LeRobot saves offsets to ~/.cache/lerobot/calibration/.
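The effect of calibration can be sketched as subtracting a per-joint homing offset so both arms report the same value in the same physical pose (a conceptual illustration with made-up tick values, not LeRobot's actual calibration code):

```python
# Raw servo readings (ticks) for the same physical pose on two arms.
leader_raw   = {"shoulder_pan": 2105, "elbow_flex": 1498}
follower_raw = {"shoulder_pan": 1987, "elbow_flex": 1523}

# Offsets recorded during calibration at the shared reference pose.
leader_offset   = {"shoulder_pan": 105, "elbow_flex": -2}
follower_offset = {"shoulder_pan": -13, "elbow_flex": 23}

def calibrated(raw, offset):
    """Map raw ticks into a common frame by removing the homing offset."""
    return {j: raw[j] - offset[j] for j in raw}

print(calibrated(leader_raw, leader_offset))
print(calibrated(follower_raw, follower_offset))
# Both arms now report 2000 and 1500 for these joints.
```

This shared frame is what lets a policy trained against one arm's positions command another arm correctly.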
Step 3: Collect Data in Isaac Lab
Instead of direct real-robot teleoperation (time-consuming and tiring), teleop inside Isaac Sim to collect demos faster and with more variety. For more on teleoperation data collection techniques, see LeRobot Teleop and Real-World Data Collection.
# Launch teleoperation in Isaac Lab with Domain Randomization enabled
lerobot_agent \
--task Lerobot-So101-Teleop-Vials-To-Rack-DR \
--repo_id ${HF_USER}/so101_teleop_vials \
--repo_root $(pwd)/datasets/so101_teleop_vials
Recording controls:
| Key | Function |
|---|---|
| `S` | Start/Stop recording an episode |
| `C` | Cancel episode (discard without saving) |
| `R` | Reset environment with new randomization parameters |
Domain Randomization (DR) is applied on each reset, randomizing:
- Lighting: exposure −4 to +3 stops, color temperature 2500K–9500K, random HDRI selection
- Camera pose: ±0.02m position offset, ±0.05 rad rotation offset
- Objects: random vial and rack positions on the table, 33% chance a vial is pre-placed in a rack slot
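The per-reset randomization above can be sketched as a simple sampler over the listed ranges (illustrative only; not the Isaac Lab DR API):

```python
import random

def sample_domain_randomization(rng=random):
    """Draw one set of per-reset randomization parameters,
    using the ranges from the list above."""
    return {
        "exposure_stops": rng.uniform(-4.0, 3.0),
        "color_temp_k": rng.uniform(2500, 9500),
        "camera_pos_offset_m": [rng.uniform(-0.02, 0.02) for _ in range(3)],
        "camera_rot_offset_rad": [rng.uniform(-0.05, 0.05) for _ in range(3)],
        "vial_preplaced": rng.random() < 0.33,  # 33% chance
    }

params = sample_domain_randomization()
print(params)
```

Every `R` reset draws a fresh set of these parameters, so no two recorded episodes share the same visual conditions.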
Target: collect at least 70 episodes for quality policy training results.
Step 4: Upload Dataset to HuggingFace Hub
huggingface-cli login # enter your HF token
lerobot-upload \
--repo_id ${HF_USER}/so101_teleop_vials \
--repo_root $(pwd)/datasets/so101_teleop_vials
The dataset is saved in LeRobot v2 format: JSON metadata, video observations (640×480 @30fps), joint positions, and gripper state. Each episode is indexed and can be browsed directly on the HuggingFace Hub.
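The episode indexing can be illustrated with a small sketch of v2-style metadata that maps each episode to its global frame range (field names simplified here, not the exact on-disk schema):

```python
# Episode lengths in frames, as recorded.
episode_lengths = [412, 388, 455]

# Build cumulative (from, to) frame ranges, one entry per episode.
index, start = [], 0
for ep, n in enumerate(episode_lengths):
    index.append({"episode_index": ep, "from": start, "to": start + n})
    start += n

print(index)

# Look up which episode a global frame belongs to.
frame = 600
ep = next(e for e in index if e["from"] <= frame < e["to"])
print(ep["episode_index"])  # frame 600 falls in episode 1 (frames 412..799)
```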
Step 5: Fine-Tune GR00T N1.5
Prepare the Modality Config
SO-101 is a new embodiment (not in GR00T's pre-training corpus). GR00T uses EmbodimentTag = new_embodiment for unseen robots. Copy the appropriate modality config for your camera setup:
# Dual-camera setup (wrist + front camera)
cp getting_started/examples/so100_dualcam__modality.json \
./demo_data/so101-vials/meta/modality.json
# For single-camera setups:
# cp getting_started/examples/so100__modality.json \
# ./demo_data/so101-vials/meta/modality.json
Verify Dataset Loading
python scripts/load_dataset.py \
--dataset-path ./demo_data/so101-vials \
--plot-state-action \
--video-backend torchvision_av
If successful, you'll see state/action plots and video preview for each episode.
Training Command
python scripts/gr00t_finetune.py \
--dataset-path ./demo_data/so101-vials/ \
--num-gpus 1 \
--output-dir ./checkpoints/so101-policy \
--max-steps 10000 \
--data-config so100_dualcam \
--video-backend torchvision_av \
--no-tune_diffusion_model \
--batch-size 16 \
--lora-rank 16 \
--dataloader-num-workers 16
Key training flags:
| Flag | Purpose | When to Use |
|---|---|---|
| `--no-tune_diffusion_model` | Skip fine-tuning the diffusion head; only train LoRA | GPU VRAM < 40GB |
| `--max-steps 10000` | 10K steps for simple tasks | Basic pick-and-place |
| `--max-steps 20000` | 20K steps for complex tasks | Multi-step manipulation |
| `--lora-rank 16` | LoRA rank for parameter-efficient fine-tuning | Balances quality vs compute |
| `--batch-size 16` | Batch size | Fits RTX 4080 16GB with `--no-tune_diffusion_model` |
Training takes approximately 2–4 hours on an RTX 4080 with 10K steps and 70 episodes.
Monitoring Convergence
Average MSE on action prediction should drop to roughly 50–60 by around step 5000. If the loss doesn't decrease after 2000 steps, check that:
- The dataset format is LeRobot v2, not v3 (use PR #2109 to convert if needed)
- The `modality.json` camera keys match your setup (default: `wrist` and `front`)
- `--data-config` matches your number of cameras
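The MSE metric is just the mean squared difference between predicted and ground-truth action chunks, averaged over steps and joints. A minimal sketch of the computation (the units depend on the action space; this is not the training script's exact logging code):

```python
def action_mse(pred, target):
    """Mean squared error over all steps and joints of an action chunk."""
    assert len(pred) == len(target)
    n = err = 0
    for p_step, t_step in zip(pred, target):
        for p, t in zip(p_step, t_step):
            err += (p - t) ** 2
            n += 1
    return err / n

# Toy 2-step, 2-joint chunk.
gt   = [[0.0, 1.0], [2.0, 3.0]]
pred = [[0.5, 1.0], [2.0, 2.0]]
print(action_mse(pred, gt))  # (0.25 + 0 + 0 + 1.0) / 4 = 0.3125
```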
Step 6: Sim Evaluation (Open-Loop)
Before deploying on real hardware, run open-loop evaluation to verify action prediction quality:
python scripts/eval_policy.py --plot \
--embodiment_tag new_embodiment \
--model_path ./checkpoints/so101-policy \
--data_config so100_dualcam \
--dataset_path ./demo_data/so101-vials/ \
--video_backend torchvision_av \
--modality_keys single_arm gripper
The output shows action trajectory plots comparing ground truth vs predicted actions. Good open-loop performance is a necessary but not sufficient condition — policies can still fail in closed-loop execution due to error accumulation.
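The caveat can be made concrete: open-loop evaluation scores each predicted step against the logged trajectory, while closed-loop execution feeds the policy's own output back, so small per-step errors compound. A toy illustration with a 1-D trajectory and a constant prediction bias:

```python
# Ground truth advances by 0.1 per step; the "policy" has a +0.01 bias.
bias = 0.01
ground_truth = [0.1 * t for t in range(50)]

# Open-loop: predict each step from the *logged* previous state.
open_loop_err = [abs((gt_prev + 0.1 + bias) - gt)
                 for gt_prev, gt in zip(ground_truth, ground_truth[1:])]

# Closed-loop: predict from the policy's *own* previous output.
state, closed_loop_err = ground_truth[0], []
for gt in ground_truth[1:]:
    state = state + 0.1 + bias      # the bias is carried forward
    closed_loop_err.append(abs(state - gt))

print(open_loop_err[-1])    # ~0.01: stays at the per-step bias
print(closed_loop_err[-1])  # ~0.49: bias accumulated over 49 steps
```

This is why a policy with good open-loop plots can still drift off-task on the real robot.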
Step 7: Deploy on Real Hardware
Deploy using a server-client split architecture. The server runs model inference on the GPU; the client reads cameras and commands the servos.
Terminal 1 — Inference Server
python scripts/inference_service.py --server \
--model_path ./checkpoints/so101-policy \
--embodiment-tag new_embodiment \
--data-config so100_dualcam \
--denoising-steps 4
If the robot moves jerkily, increase to --denoising-steps 16 for smoother trajectories (tradeoff: inference is ~4× slower).
Terminal 2 — Robot Client
python getting_started/examples/eval_lerobot.py \
--robot.type=so101_follower \
--robot.port=/dev/ttyACM0 \
--robot.id=my_follower_arm \
--robot.cameras="{ \
wrist: {type: opencv, index_or_path: 9, width: 640, height: 480, fps: 30}, \
front: {type: opencv, index_or_path: 15, width: 640, height: 480, fps: 30}}" \
--policy_host=127.0.0.1 \
--lang_instruction="Pick up the vial and place it in the yellow rack."
Finding the right camera index:
v4l2-ctl --list-devices
# Or enumerate: ls /dev/video*
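Under the hood, the client runs a simple loop: read observations, request an action chunk from the server, execute each step on the servos, repeat. A structural sketch (illustrative stand-in functions only; `eval_lerobot.py` handles all of this for you):

```python
import random

def get_observation():
    """Stand-in for camera capture + joint-state read."""
    return {"wrist": None, "front": None, "state": [0.0] * 6}

def request_chunk(obs, instruction, horizon=8):
    """Stand-in for the RPC to the GPU inference server."""
    return [[random.uniform(-0.01, 0.01) for _ in range(6)]
            for _ in range(horizon)]

executed = 0
for _ in range(3):  # control loop, truncated to 3 chunks for the sketch
    obs = get_observation()
    chunk = request_chunk(obs, "Pick up the vial and place it in the yellow rack.")
    for action in chunk:  # each step would be sent to the servos
        executed += 1

print(executed)  # 3 chunks x 8 steps = 24
```

Because the policy returns a chunk of actions per request, the robot keeps moving smoothly between (relatively slow) inference calls.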
Two Core Sim-to-Real Strategies
Strategy 1: Domain Randomization (DR) — Sim-Only
Train entirely in simulation with aggressive randomization so the policy generalizes to real-world conditions. No real demonstrations needed.
Strengths:
- No real-world data collection required
- Scales well across many randomization parameters
- Simple to implement
Weaknesses:
- Policies tend to be conservative (slow, "defensive" movements)
- Requires expertise to tune randomization ranges appropriately
- Visual accuracy lower than co-training approaches
Recommended when: You don't yet have a physical robot, or want to quickly validate the training pipeline.
Strategy 2: Co-Training (Sim + Real)
Combine a small amount of real teleoperation data with a large sim dataset. Just 5 real episodes combined with 70–100 sim episodes is typically sufficient.
# Collect real teleoperation data
lerobot-record \
--robot.type=so101_follower \
--robot.port=/dev/ttyACM0 \
--teleop.type=so101_leader \
--teleop.port=/dev/ttyACM1 \
--dataset.repo_id=${HF_USER}/so101_real_5eps \
--dataset.num_episodes=5
Strengths:
- Higher accuracy than DR-only because real data anchors the policy to visual reality
- Much less real data required than real-only training
Weaknesses:
- Still requires some physical robot teleoperation
- Slightly more complex pipeline setup
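The sim/real mix can be sketched as weighted sampling during training, so the handful of real episodes appears often enough to anchor the policy to real-world visuals (illustrative only; the 30% ratio is an assumption, and this is not LeRobot's actual co-training configuration):

```python
import random

random.seed(0)
sim_episodes  = [f"sim_{i}" for i in range(70)]
real_episodes = [f"real_{i}" for i in range(5)]

# Upweight real data so it forms ~30% of sampled batches
# despite being under 7% of the episode count.
population = sim_episodes + real_episodes
weights = ([0.7 / len(sim_episodes)] * len(sim_episodes) +
           [0.3 / len(real_episodes)] * len(real_episodes))

batch = random.choices(population, weights=weights, k=1000)
real_frac = sum(e.startswith("real") for e in batch) / len(batch)
print(round(real_frac, 2))  # ~0.3
```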
Pre-trained co-training checkpoint available at: aravindhs-NV/grootn16-finetune_sreetz-so101_teleop_vials_rack_left_sim_and_real/checkpoint-10000
Common Errors and Solutions
| Error | Cause | Fix |
|---|---|---|
| `ModuleNotFoundError: service` | Wrong Isaac-GR00T branch | Checkout tag `n1.5-release` |
| Model ignores language instruction | Overfitting on visual patterns | Reduce `--max-steps`, add language variation in demos |
| Jerky robot movement | `denoising-steps` too low | Increase to 16 |
| v3 dataset incompatible | Dataset format newer than GR00T supports | Use LeRobot v2 or convert via PR #2109 |
| CUDA out of memory | Insufficient VRAM | Add `--no-tune_diffusion_model` flag |
| Camera not recognized | Wrong index | Use `v4l2-ctl --list-devices` to find correct index |
Next Steps
Once you've mastered the basic SO-101 + Isaac Lab + GR00T N1.5 pipeline, consider exploring:
- Cosmos Augmentation — Use NVIDIA Cosmos to generate synthetic variations of real camera images (change textures, lighting), increasing visual diversity without recording more demos.
- SAGE + GapONet — Quantitatively measure the sim-to-real gap by comparing actuator dynamics between sim and real, then compensate with a learned residual model.
- SmolVLA as a lightweight alternative — A smaller (~0.5B parameter) model suitable for deployment on lower-spec GPUs. See LeRobot SmolVLA: Lightweight Training for Real Robots.
The key insight from this entire pipeline: diverse data matters more than abundant data. Seventy episodes with strong domain randomization consistently outperform 500 episodes collected under fixed conditions — because the policy needs to generalize to the real world, not just overfit to simulation conditions.
For a deeper dive into sim-to-real transfer fundamentals, see Complete Sim-to-Real Pipeline: From Isaac Lab to Real Hardware.