GR00T N1 + G1 (Bài 3): fine-tune GR00T N1 — GPU, config, training script

Đây là bài 3 trong series GR00T N1 + Unitree G1. Bài trước thu được LeRobot dataset. Bài này: chạy fine-tune GR00T N1 2B params với dataset đó.

GR00T N1 là pretrained foundation model — nó đã biết nhiều task chung. Fine-tune chỉ cần dạy nó task cụ thể của bạn trên robot cụ thể. Với 50-100 demos, training thường mất 2-4 giờ trên RTX 4090.

GPU requirements thực tế

Setup	VRAM	Batch size	Training time (50 demos)
RTX 4090	24GB	8	~3 giờ
A100 40GB	40GB	16	~1.5 giờ
A100 80GB	80GB	32	~45 phút
2× RTX 4090	48GB	16	~1.5 giờ (DDP)

Minimum: RTX 4090 (24GB). Dưới 24GB VRAM sẽ bị OOM với default config. Có thể dùng gradient checkpointing để giảm VRAM nhưng training sẽ chậm hơn ~40%.

Cài đặt môi trường training

git clone https://github.com/NVIDIA/Isaac-GR00T.git
cd Isaac-GR00T

# Tạo conda environment
conda create -n groot python=3.10
conda activate groot

# Cài dependencies
pip install -e ".[train]"

# Verify CUDA + torch
python -c "import torch; print(torch.cuda.get_device_name(0)); print(torch.version.cuda)"

# Download GR00T N1 pretrained checkpoint
python scripts/download_model.py --model GR00T-N1-2B
# checkpoint sẽ về: ./checkpoints/GR00T-N1-2B/

Config file

GR00T dùng YAML config. Tạo file config cho task của bạn:

# configs/finetune_g1_pickplace.yaml

# Model
model:
  name: "GR00T-N1-2B"
  checkpoint_path: "./checkpoints/GR00T-N1-2B"
  freeze_vision_encoder: false    # true để train nhanh hơn, false cho accuracy tốt hơn
  freeze_language_model: true     # LUÔN freeze LM để tiết kiệm VRAM

# Dataset
dataset:
  path: "./data/g1_pickplace_lerobot"
  robot: "g1"
  cameras:
    - "observation.images.left_wrist"
    - "observation.images.right_wrist"
    - "observation.images.head"      # bỏ nếu không có head camera
  action_keys:
    - "action/left_ee_pose"     # 7-dim: xyz + quat
    - "action/right_ee_pose"    # 7-dim
    - "action/gripper"          # 2-dim: left, right
  
  # Train/val split
  train_split: 0.9
  val_split: 0.1

# Training hyperparams
training:
  batch_size: 8                 # giảm xuống 4 nếu OOM
  learning_rate: 1.0e-4
  lr_scheduler: "cosine"
  warmup_steps: 100
  num_epochs: 100
  gradient_clip: 1.0
  
  # Gradient checkpointing — bật nếu VRAM < 24GB
  gradient_checkpointing: false
  
  # Mixed precision
  bf16: true                    # A100 dùng bf16; RTX 4090 dùng fp16
  fp16: false

# Action head — flow matching
action_head:
  num_timesteps: 100           # denoising steps (10 cho fast inference)
  action_chunk_size: 16        # predict 16 steps ahead
  
# Logging
output_dir: "./runs/g1_pickplace"
log_every: 10                  # log loss mỗi 10 steps
eval_every: 500                # evaluate trên val set mỗi 500 steps
save_every: 500                # save checkpoint mỗi 500 steps

Chạy training

# Single GPU
python scripts/finetune.py \
  --config configs/finetune_g1_pickplace.yaml

# Multi-GPU (2× RTX 4090)
torchrun --nproc_per_node=2 scripts/finetune.py \
  --config configs/finetune_g1_pickplace.yaml

# Xem training logs
tensorboard --logdir ./runs/g1_pickplace/tensorboard

Output trong terminal:

Epoch 1/100 | Step 50/500 | Loss: 0.342 | Action loss: 0.298 | LR: 8.3e-5
Epoch 1/100 | Step 100/500 | Loss: 0.241 | Action loss: 0.198 | LR: 1.0e-4
...
[Eval] Step 500 | Val loss: 0.187 | Saved checkpoint: ./runs/g1_pickplace/checkpoint_500/

Theo dõi training — khi nào stop?

Loss curve bình thường:

Train loss:   0.35 → 0.20 → 0.12 → 0.09 → 0.07 (giảm đều)
Val loss:     0.33 → 0.19 → 0.13 → 0.10 → 0.08 (theo sát train)

Dấu hiệu overfitting — stop sớm:

Train loss:   0.35 → 0.15 → 0.08 → 0.04 → 0.02 (giảm rất nhanh)
Val loss:     0.33 → 0.18 → 0.16 → 0.18 → 0.22 (bắt đầu tăng lại)

Khi val loss tăng 3 eval liên tiếp → dùng checkpoint trước đó (early stopping).

Adapt config cho robot khác

Thay robot: "g1" và action_keys theo robot của bạn:

# Ví dụ cho GR1 (Fourier Intelligence)
dataset:
  robot: "gr1"
  cameras:
    - "observation.images.left_wrist"
    - "observation.images.right_wrist"
  action_keys:
    - "action/left_ee_pose"
    - "action/right_ee_pose"
    - "action/gripper"

# Ví dụ cho robot arm đơn (không phải humanoid)
dataset:
  robot: "franka"
  cameras:
    - "observation.images.wrist"
    - "observation.images.overhead"
  action_keys:
    - "action/ee_pose"     # 7-dim
    - "action/gripper"     # 1-dim

GR00T N1 hỗ trợ variable action dimension — chỉ cần khai báo đúng action_keys.

Evaluate checkpoint trong sim

Trước khi deploy lên robot thật, luôn evaluate trong sim:

# Rollout checkpoint trong Isaac Lab
python scripts/evaluate_checkpoint.py \
  --checkpoint ./runs/g1_pickplace/checkpoint_best/ \
  --robot g1 \
  --task PickPlace \
  --num_episodes 20 \
  --render         # hiện Isaac Sim GUI

# Output:
# Episode 0: SUCCESS (12.3s)
# Episode 1: SUCCESS (11.8s)
# Episode 2: FAIL — gripper missed object at step 45
# ...
# Success rate: 17/20 = 85%

Threshold trước khi deploy real: ≥ 80% success rate trong sim cho task đơn giản (pick-place). Tasks phức tạp hơn có thể thấp hơn nhưng phải có trend cải thiện.

Troubleshooting thường gặp

OOM (Out of Memory):

# Giảm batch size
training:
  batch_size: 4   # từ 8 xuống 4

# Hoặc bật gradient checkpointing
  gradient_checkpointing: true

# Hoặc freeze vision encoder
model:
  freeze_vision_encoder: true

Loss không giảm sau 20 epochs:

# Thử tăng learning rate
training:
  learning_rate: 3.0e-4   # từ 1e-4 lên 3e-4

# Hoặc unfreeze vision encoder
model:
  freeze_vision_encoder: false

Val loss tốt nhưng sim performance tệ: → Dataset quality issue. Xem lại demos: có demo fail trong dataset không? Vị trí object có đủ diverse không? Camera calibration đúng chưa?

Best checkpoint: checkpoint_best vs checkpoint_last

Script tự động lưu checkpoint_best (val loss thấp nhất) và checkpoint_last (epoch cuối).

ls ./runs/g1_pickplace/
# checkpoint_500/   checkpoint_1000/  checkpoint_best/  checkpoint_last/

# Dùng checkpoint_best cho deploy
python scripts/evaluate_checkpoint.py \
  --checkpoint ./runs/g1_pickplace/checkpoint_best/

Luôn dùng checkpoint_best — không phải checkpoint_last. Training thêm không phải lúc nào cũng tốt hơn.

Bài tiếp theo: Deploy GR00T-WBC trên G1 thật — GEAR + SONIC.

Nguồn tham khảo

GR00T N1 + G1 (Bài 3): fine-tune GR00T N1 — GPU, config, training script

Đây là bài 3 trong series GR00T N1 + Unitree G1. Bài trước thu được LeRobot dataset. Bài này: chạy fine-tune GR00T N1 2B params với dataset đó.

GPU requirements thực tế

Setup	VRAM	Batch size	Training time (50 demos)
RTX 4090	24GB	8	~3 giờ
A100 40GB	40GB	16	~1.5 giờ
A100 80GB	80GB	32	~45 phút
2× RTX 4090	48GB	16	~1.5 giờ (DDP)

Minimum: RTX 4090 (24GB). Dưới 24GB VRAM sẽ bị OOM với default config. Có thể dùng gradient checkpointing để giảm VRAM nhưng training sẽ chậm hơn ~40%.

Cài đặt môi trường training

git clone https://github.com/NVIDIA/Isaac-GR00T.git
cd Isaac-GR00T

# Tạo conda environment
conda create -n groot python=3.10
conda activate groot

# Cài dependencies
pip install -e ".[train]"

# Verify CUDA + torch
python -c "import torch; print(torch.cuda.get_device_name(0)); print(torch.version.cuda)"

# Download GR00T N1 pretrained checkpoint
python scripts/download_model.py --model GR00T-N1-2B
# checkpoint sẽ về: ./checkpoints/GR00T-N1-2B/

Config file

GR00T dùng YAML config. Tạo file config cho task của bạn:

# configs/finetune_g1_pickplace.yaml

# Model
model:
  name: "GR00T-N1-2B"
  checkpoint_path: "./checkpoints/GR00T-N1-2B"
  freeze_vision_encoder: false    # true để train nhanh hơn, false cho accuracy tốt hơn
  freeze_language_model: true     # LUÔN freeze LM để tiết kiệm VRAM

# Dataset
dataset:
  path: "./data/g1_pickplace_lerobot"
  robot: "g1"
  cameras:
    - "observation.images.left_wrist"
    - "observation.images.right_wrist"
    - "observation.images.head"      # bỏ nếu không có head camera
  action_keys:
    - "action/left_ee_pose"     # 7-dim: xyz + quat
    - "action/right_ee_pose"    # 7-dim
    - "action/gripper"          # 2-dim: left, right
  
  # Train/val split
  train_split: 0.9
  val_split: 0.1

# Training hyperparams
training:
  batch_size: 8                 # giảm xuống 4 nếu OOM
  learning_rate: 1.0e-4
  lr_scheduler: "cosine"
  warmup_steps: 100
  num_epochs: 100
  gradient_clip: 1.0
  
  # Gradient checkpointing — bật nếu VRAM < 24GB
  gradient_checkpointing: false
  
  # Mixed precision
  bf16: true                    # A100 dùng bf16; RTX 4090 dùng fp16
  fp16: false

# Action head — flow matching
action_head:
  num_timesteps: 100           # denoising steps (10 cho fast inference)
  action_chunk_size: 16        # predict 16 steps ahead
  
# Logging
output_dir: "./runs/g1_pickplace"
log_every: 10                  # log loss mỗi 10 steps
eval_every: 500                # evaluate trên val set mỗi 500 steps
save_every: 500                # save checkpoint mỗi 500 steps

Chạy training

# Single GPU
python scripts/finetune.py \
  --config configs/finetune_g1_pickplace.yaml

# Multi-GPU (2× RTX 4090)
torchrun --nproc_per_node=2 scripts/finetune.py \
  --config configs/finetune_g1_pickplace.yaml

# Xem training logs
tensorboard --logdir ./runs/g1_pickplace/tensorboard

Output trong terminal:

Epoch 1/100 | Step 50/500 | Loss: 0.342 | Action loss: 0.298 | LR: 8.3e-5
Epoch 1/100 | Step 100/500 | Loss: 0.241 | Action loss: 0.198 | LR: 1.0e-4
...
[Eval] Step 500 | Val loss: 0.187 | Saved checkpoint: ./runs/g1_pickplace/checkpoint_500/

Theo dõi training — khi nào stop?

Loss curve bình thường:

Train loss:   0.35 → 0.20 → 0.12 → 0.09 → 0.07 (giảm đều)
Val loss:     0.33 → 0.19 → 0.13 → 0.10 → 0.08 (theo sát train)

Dấu hiệu overfitting — stop sớm:

Train loss:   0.35 → 0.15 → 0.08 → 0.04 → 0.02 (giảm rất nhanh)
Val loss:     0.33 → 0.18 → 0.16 → 0.18 → 0.22 (bắt đầu tăng lại)

Khi val loss tăng 3 eval liên tiếp → dùng checkpoint trước đó (early stopping).

Adapt config cho robot khác

Thay robot: "g1" và action_keys theo robot của bạn:

# Ví dụ cho GR1 (Fourier Intelligence)
dataset:
  robot: "gr1"
  cameras:
    - "observation.images.left_wrist"
    - "observation.images.right_wrist"
  action_keys:
    - "action/left_ee_pose"
    - "action/right_ee_pose"
    - "action/gripper"

# Ví dụ cho robot arm đơn (không phải humanoid)
dataset:
  robot: "franka"
  cameras:
    - "observation.images.wrist"
    - "observation.images.overhead"
  action_keys:
    - "action/ee_pose"     # 7-dim
    - "action/gripper"     # 1-dim

GR00T N1 hỗ trợ variable action dimension — chỉ cần khai báo đúng action_keys.

Evaluate checkpoint trong sim

Trước khi deploy lên robot thật, luôn evaluate trong sim:

# Rollout checkpoint trong Isaac Lab
python scripts/evaluate_checkpoint.py \
  --checkpoint ./runs/g1_pickplace/checkpoint_best/ \
  --robot g1 \
  --task PickPlace \
  --num_episodes 20 \
  --render         # hiện Isaac Sim GUI

# Output:
# Episode 0: SUCCESS (12.3s)
# Episode 1: SUCCESS (11.8s)
# Episode 2: FAIL — gripper missed object at step 45
# ...
# Success rate: 17/20 = 85%

Threshold trước khi deploy real: ≥ 80% success rate trong sim cho task đơn giản (pick-place). Tasks phức tạp hơn có thể thấp hơn nhưng phải có trend cải thiện.

Troubleshooting thường gặp

OOM (Out of Memory):

# Giảm batch size
training:
  batch_size: 4   # từ 8 xuống 4

# Hoặc bật gradient checkpointing
  gradient_checkpointing: true

# Hoặc freeze vision encoder
model:
  freeze_vision_encoder: true

Loss không giảm sau 20 epochs:

# Thử tăng learning rate
training:
  learning_rate: 3.0e-4   # từ 1e-4 lên 3e-4

# Hoặc unfreeze vision encoder
model:
  freeze_vision_encoder: false

Best checkpoint: checkpoint_best vs checkpoint_last

Script tự động lưu checkpoint_best (val loss thấp nhất) và checkpoint_last (epoch cuối).

ls ./runs/g1_pickplace/
# checkpoint_500/   checkpoint_1000/  checkpoint_best/  checkpoint_last/

# Dùng checkpoint_best cho deploy
python scripts/evaluate_checkpoint.py \
  --checkpoint ./runs/g1_pickplace/checkpoint_best/

Luôn dùng checkpoint_best — không phải checkpoint_last. Training thêm không phải lúc nào cũng tốt hơn.

Bài tiếp theo: Deploy GR00T-WBC trên G1 thật — GEAR + SONIC.

GR00T N1 + G1 (Bài 3): fine-tune GR00T N1 — GPU, config, training script

GR00T N1 + G1 (Bài 3): fine-tune GR00T N1 — GPU, config, training script

GPU requirements thực tế

Cài đặt môi trường training

Config file

Chạy training

Theo dõi training — khi nào stop?

Adapt config cho robot khác

Evaluate checkpoint trong sim

Troubleshooting thường gặp

Best checkpoint: checkpoint_best vs checkpoint_last

Nguồn tham khảo

Bài viết liên quan

Nguyễn Anh Tuấn

Bài viết liên quan

GR00T N1 + G1 (Bài 5): sim2real transfer, domain randomization, và eval với humanoid-bench

GR00T N1 + G1 (Bài 2): thu data trong Isaac Lab và xr_teleoperate → LeRobot

GR00T N1 + Unitree G1: kiến trúc WBC+VLA decoupled từ 6Hz đến 500Hz

GR00T N1 + G1 (Bài 3): fine-tune GR00T N1 — GPU, config, training script

GR00T N1 + G1 (Bài 3): fine-tune GR00T N1 — GPU, config, training script

GPU requirements thực tế

Cài đặt môi trường training

Config file

Chạy training

Theo dõi training — khi nào stop?

Adapt config cho robot khác

Evaluate checkpoint trong sim

Troubleshooting thường gặp

Best checkpoint: checkpoint_best vs checkpoint_last

Nguồn tham khảo

Bài viết liên quan

Nguyễn Anh Tuấn

Bài viết liên quan

GR00T N1 + G1 (Bài 5): sim2real transfer, domain randomization, và eval với humanoid-bench

GR00T N1 + G1 (Bài 2): thu data trong Isaac Lab và xr_teleoperate → LeRobot

GR00T N1 + Unitree G1: kiến trúc WBC+VLA decoupled từ 6Hz đến 500Hz