Run GR00T-VisualSim2Real on G1

Train VIRAL and DoorMan for Unitree G1 in Isaac Lab: setup, teacher-student training, DAgger, GRPO, inference, and sim-to-real.

Nguyễn Anh Tuấn · June 7, 2026 · 14 min read

GR00T-VisualSim2Real is NVIDIA's open-source repository for two important humanoid robotics projects: VIRAL and DoorMan. Both target the same practical question: can we train a policy entirely in simulation, use RGB cameras plus proprioception, and transfer it zero-shot to a real Unitree G1 without collecting more real-world fine-tuning data?

This guide is written so a beginner can follow the workflow, but the workload itself is not beginner-sized. VIRAL and DoorMan are whole-body visual loco-manipulation systems. The robot must balance, walk, position its body, use dexterous hands, interact with objects or doors, and recover from small closed-loop errors. The VIRAL paper reports that reliable teacher and student training depends heavily on compute scale, with experiments running on tens of GPUs (up to 64). DoorMan also reports multi-GPU phases for student distillation and GRPO. A realistic goal for a small lab is therefore: install the stack, run smoke tests, train a reduced teacher/student setup, evaluate in Isaac Lab, export ONNX, and only deploy on real hardware after camera alignment, safety, and whole-body control are solid.

If you are new to robot VLA and policy training, read fine-tuning GR00T N1, the WholeBodyVLA open-source guide, and humanoid sim-to-real transfer first. GR00T-VisualSim2Real sits at the intersection of those topics: large-scale simulation, privileged RL teachers, vision students, domain randomization, and deployment on a real humanoid.

Original papers and repository

Official repository: NVlabs/GR00T-VisualSim2Real. The README states that the repository contains application code for VIRAL and DoorMan, built on Isaac Lab/Isaac Sim 5.1 with TRL and Hydra. It supports PPO teacher training, DAgger student distillation, evaluation, and ONNX export.

The two original papers:

| Project | Paper | Main task | Robot | Key result |
| --- | --- | --- | --- | --- |
| VIRAL | Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation | Walk, stand, grasp, drop, turn, repeat | Unitree G1 | RGB policy runs zero-shot for up to 54 loco-manipulation cycles |
| DoorMan | Opening the Sim-to-Real Door for Humanoid Pixel-to-Action Policy Transfer | Open diverse doors from RGB | Humanoid/Unitree G1-style stack | Teacher-student-bootstrap with GRPO, up to 31.7% faster than human teleoperators |

The core idea is simple but powerful. A teacher policy is trained in simulation with privileged state: full robot state, object state, door state, contact information, target poses, or other values that are easy to read in simulation but unavailable on the real robot. That teacher may be strong, but it is not directly deployable. A student policy then learns to imitate the teacher using only deployment-like observations, mainly RGB camera input and proprioception. In DoorMan, NVIDIA adds a GRPO fine-tuning stage after DAgger so the student becomes more consistent under partial observability during a long articulated-object task.

Minimal pipeline:

Isaac Lab simulation
        |
        v
Privileged teacher (PPO)
  obs: full state, object/door state, targets, contacts
  action: whole-body delta action / joint targets
        |
        | rollouts + teacher actions
        v
Vision student (DAgger + BC)
  obs: RGB camera + proprioception + action history
  action: same low-level action space
        |
        | optional for DoorMan
        v
Student bootstrap / GRPO fine-tuning
  reward: mostly task success, closed-loop consistency
        |
        v
Evaluation -> ONNX export -> real robot deployment

VIRAL versus DoorMan

VIRAL is the broader framework for humanoid loco-manipulation. Its project page shows Unitree G1 walking to a table, standing, grasping an object, dropping or turning, and repeating the behavior across many cycles. The paper emphasizes three main technical groups: delta action space and reference state initialization for the teacher, tiled rendering and online DAgger for the student, and visual domain randomization plus real-to-sim alignment for cameras and dexterous hands.
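To make the delta action space concrete, here is a minimal sketch of one way such an action could be turned into a joint target. It assumes the policy outputs a normalized delta that is accumulated onto the previous target and clipped to joint limits; the function name, the scale, and the limits are illustrative and are not taken from the repository.

```python
import numpy as np

def apply_delta_action(q_target_prev, action, scale, q_min, q_max):
    """Accumulate a normalized delta action onto the previous joint target.

    Assumption: actions live in [-1, 1] and `scale` (radians per step)
    bounds how far the target can move in one control step.
    """
    delta = scale * np.clip(action, -1.0, 1.0)
    # Clip to joint limits so the accumulated target never leaves the valid range.
    return np.clip(q_target_prev + delta, q_min, q_max)

if __name__ == "__main__":
    q_prev = np.zeros(2)
    q_next = apply_delta_action(q_prev, np.array([2.0, -0.5]), 0.1,
                                np.array([-1.0, -1.0]), np.array([1.0, 1.0]))
    print(q_next)
```

A small `scale` limits per-step motion, which is one reason a delta parameterization can be easier to explore with than absolute joint targets.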

DoorMan is narrower but physically harder. Opening a door is not just "reach the handle." The robot must estimate distance from RGB, move the base to the right pose, grasp the handle, pull or push along the hinge dynamics, maintain balance under changing contact forces, and adjust when the door state is only partially visible. DoorMan therefore adds staged-reset exploration for long-horizon teacher training and GRPO fine-tuning for the student. The DoorMan project page reports a three-stage training budget: PPO teacher on 1 L40s GPU for about 6 hours, DAgger student distillation on 32 L40s GPUs for about 24 hours, and GRPO on 64 L40s GPUs for about 12 hours. Treat those as sizing references, not as mandatory settings for a small smoke test.

Policy architecture

The main code lives under gr00t/rl/. The important pieces are:

| Path | Role |
| --- | --- |
| train_agent_trl.py | Training entry point for both teacher and student |
| eval_agent_trl.py | Evaluates checkpoints and exports ONNX when num_envs=1 |
| config/exp/loco_manip/ | Hydra experiment configs for loco-manipulation tasks |
| config/robot/g1/ | Unitree G1 robot configuration, including the G1 43-DOF model |
| config/obs/ | Observation groups: privileged observations, RGB, proprioception |
| config/domain_rand/ | Visual and physics randomization |
| envs/loco_manip/ | Task implementations |
| trl/trainer/ | PPO and distillation trainers |

A common beginner mistake is treating GR00T-VisualSim2Real like VLM fine-tuning on a static text-image dataset. It is not. This is closed-loop robot policy training. The simulator produces a rollout, the policy outputs actions, those actions move the robot, and the next state depends on the policy's previous mistakes. If observation timing, camera intrinsics, action scaling, or hand dynamics differ between simulation and the real robot, the policy can fail even when simulated reward looks strong.

The teacher learns the hard skill first with privileged observations. The student does not explore from scratch. It learns, "given this RGB frame and this proprioceptive state, what would the teacher do?" DAgger is more robust than offline behavior cloning because the student runs in the loop; when it drifts away from the teacher's distribution, the teacher can still provide corrective actions. That matters for humanoids because a few centimeters of base or wrist error can break a grasp.
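The DAgger loop described above can be illustrated with a toy problem. This is a sketch under strong simplifying assumptions, not the repository's trainer: the "robot" is a single scalar state, the teacher is a hand-written controller that sees that state, and the student is a linear policy fitted by least squares on an encoded observation. All names here are hypothetical.

```python
import numpy as np

def teacher_action(x):
    # Privileged teacher: sees the true state x and pushes it toward zero.
    return -0.5 * x

def observe(x):
    # Student-side observation: an encoded view of x plus a bias feature
    # (a stand-in for RGB + proprioception).
    return np.array([2.0 * x + 1.0, 1.0])

def run_dagger(iterations=5, horizon=20, seed=0):
    rng = np.random.default_rng(seed)
    weights = np.zeros(2)        # untrained linear student
    obs_buf, act_buf = [], []    # dataset aggregated across all iterations
    for _ in range(iterations):
        x = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            obs = observe(x)
            obs_buf.append(obs)
            act_buf.append(teacher_action(x))  # teacher labels the visited state
            x = x + float(obs @ weights)       # the student's action drives the rollout
        # Refit the student on everything collected so far.
        weights, *_ = np.linalg.lstsq(np.array(obs_buf), np.array(act_buf), rcond=None)
    return weights

if __name__ == "__main__":
    print(run_dagger())
```

The key line is the one where the student's own action advances the state: the dataset then contains the states the student actually reaches, each labeled with what the teacher would have done there. Practical DAgger variants often mix teacher and student actions early in training; this sketch uses pure student rollouts for brevity.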

Hardware and software requirements

The repository README lists the baseline environment:

| Item | Recommendation |
| --- | --- |
| OS | Ubuntu 22.04 |
| NVIDIA driver | >= 535 |
| Python | 3.11 |
| PyTorch | 2.7.0 with CUDA 12.8 wheels |
| Isaac Sim | 5.1 |
| Isaac Lab | Version compatible with Isaac Sim 5.1 |
| GPU | At least one strong NVIDIA GPU for smoke tests; many GPUs for serious training |
| Tracking | Weights & Biases by default |

If you only have a single RTX 4090 or one L40s, start with small num_envs, run headless, evaluate often, and do not expect to reproduce the full paper. Visual policies consume VRAM through the simulator, camera rendering, rollout batches, and the network. For student training, the README's num_envs=8 debug command is a much more realistic starting point than the teacher example with num_envs=48.
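As a rough sizing aid, the arithmetic for stored camera frames alone can be sketched as follows. The image resolution, rollout length, and uint8 storage are assumptions chosen for illustration, not values from the repository configs, and the result covers only the frame buffer, not the simulator, renderer, or network.

```python
def rgb_rollout_bytes(num_envs, steps, height, width, channels=3, bytes_per_value=1):
    """Bytes needed to hold one rollout of raw frames for all environments."""
    return num_envs * steps * height * width * channels * bytes_per_value

if __name__ == "__main__":
    # Hypothetical 224x224 RGB frames over a 24-step rollout.
    for envs in (8, 48):
        mib = rgb_rollout_bytes(envs, 24, 224, 224) / 2**20
        print(f"num_envs={envs}: {mib:.1f} MiB of raw frames")
```

The buffer grows linearly with num_envs, so moving from 8 to 48 environments multiplies this term by six before any other memory consumer is counted.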

Installation

The commands below closely follow the official README. Use a dedicated conda or mamba environment because Isaac Sim, PyTorch, and NumPy version conflicts are common.

conda create -n viral python=3.11 -y
conda activate viral

pip install torch==2.7.0 torchvision==0.22.0 \
  --index-url https://download.pytorch.org/whl/cu128

pip install isaacsim==5.1.0.0 isaacsim-rl==5.1.0.0

Install Isaac Lab from source:

pip install setuptools poetry-core flatdict

pip install --no-build-isolation -e <path-to-IsaacLab>/source/isaaclab
pip install --no-build-isolation -e <path-to-IsaacLab>/source/isaaclab_assets \
  -e <path-to-IsaacLab>/source/isaaclab_tasks \
  -e "<path-to-IsaacLab>/source/isaaclab_rl[all]"

pip install numpy==1.26.0
python -c "import isaaclab; print(isaaclab.__file__)"

Install GR00T-VisualSim2Real:

cd <path-to-GR00T-VisualSim2Real>
pip install -e .
pip install numpy==1.26.0

python -c "from gr00t.rl.envs.base_task.base_task import BaseTask; print('OK')"

If import isaaclab fails, the usual causes are a mismatched Isaac Sim version, the wrong Python environment, or NumPy being upgraded by another dependency. The repository README explicitly pins numpy==1.26.0 again after installing the package.
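A quick way to catch version drift is to compare the installed packages against the pins used in the commands above. The sketch below uses only the standard library; the pin table mirrors this article's install commands, and the helper name is hypothetical.

```python
from importlib import metadata

# Pins copied from the install commands in this guide.
PINS = {
    "torch": "2.7.0",
    "torchvision": "0.22.0",
    "numpy": "1.26.0",
    "isaacsim": "5.1.0.0",
}

def check_pins(pins=PINS, get_version=metadata.version):
    """Return {package: problem} for every package that is missing or off-pin."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Ignore local build tags such as "+cu128" when comparing.
        if found.split("+")[0] != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems

if __name__ == "__main__":
    for name, issue in check_pins().items():
        print(f"{name}: {issue}")
```

Running it inside the activated environment after each install step shows immediately whether a later dependency has upgraded NumPy.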

Train a VIRAL teacher with PPO

The teacher uses full state in simulation. The README gives this command:

HYDRA_FULL_ERROR=1 accelerate launch --num_processes 1 \
  gr00t/rl/train_agent_trl.py \
  +exp=loco_manip/walk_stand_place_grasp_turn_homie \
  num_envs=48 \
  project_name=wsdpt_teacher

Key arguments:

| Argument | Meaning |
| --- | --- |
| +exp=... | Selects the Hydra experiment config |
| num_envs=48 | Number of parallel environments; more environments raise throughput but use more VRAM |
| project_name | Weights & Biases project name |
| headless=True/False | Run without a GUI, or open Isaac Sim for inspection |
| env.config.reset_from_dataset.enable | Enables reset from demonstration data if the config supports it |

For a first run, reduce the scale:

HYDRA_FULL_ERROR=1 accelerate launch --num_processes 1 \
  gr00t/rl/train_agent_trl.py \
  +exp=loco_manip/walk_stand_place_grasp_turn_homie \
  num_envs=8 \
  headless=True \
  experiment_name=teacher_smoke_g1 \
  project_name=viral_debug

You do not need a good reward curve immediately. A smoke test only needs to prove that Isaac Lab launches, the environment resets, GPU memory is stable, checkpoints are written under logs_rl/<experiment_name>/, and metrics reach W&B. After that, increase num_envs, tune reward scales, or open the GUI to debug posture and contact behavior.
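To confirm that checkpoints are actually being written, a small helper can pick the newest one from the experiment directory. This sketch assumes checkpoints sit directly under logs_rl/<experiment_name>/ and follow the model_step_NNNNNN.pt naming seen in this guide; the function names are hypothetical.

```python
import re
from pathlib import Path

STEP_RE = re.compile(r"^model_step_(\d+)\.pt$")

def pick_latest(names):
    """Return the checkpoint file name with the highest step, or None."""
    best_step, best_name = -1, None
    for name in names:
        match = STEP_RE.match(name)
        if match and int(match.group(1)) > best_step:
            best_step, best_name = int(match.group(1)), name
    return best_name

def latest_checkpoint(experiment_dir):
    """Return the Path of the newest checkpoint in experiment_dir, or None."""
    directory = Path(experiment_dir)
    name = pick_latest(p.name for p in directory.glob("*.pt"))
    return directory / name if name else None

if __name__ == "__main__":
    print(latest_checkpoint("logs_rl/teacher_smoke_g1"))
```

The returned path can be passed to the evaluation script as the +checkpoint override.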

Evaluate the teacher

Once you have a checkpoint:

python gr00t/rl/eval_agent_trl.py \
  +checkpoint=logs_rl/<teacher_experiment>/model_step_044500.pt

Evaluation checklist:

| Symptom | Interpretation |
| --- | --- |
| Robot stands but does not reach | Locomotion is acceptable; manipulation reward or target observations may be wrong |
| Reaches correctly but fails to grasp | Hand SysID, finger action scale, or contact model may be off |
| Sim success but jerky actions | Action smoothing, delta action scale, latency, or randomization need work |
| GUI looks good but headless metrics are bad | Seed, environment count, or a Hydra override changed between runs |

Train a VIRAL student with DAgger

The student needs a trained teacher checkpoint. The README asks you to set teacher_actor_path inside the student experiment config:

# gr00t/rl/config/exp/loco_manip/wsdpt_student_for_teacher_v8q8.002_resnet_rgb_delay.yaml
teacher_actor_path: logs_rl/<your_teacher_experiment>/model_step_XXXXXX.pt

Then launch training:

HYDRA_FULL_ERROR=1 accelerate launch --num_processes 1 \
  gr00t/rl/train_agent_trl.py \
  +exp=loco_manip/wsdpt_student_for_teacher_v8q8.002_resnet_rgb_delay \
  num_envs=8 \
  headless=True \
  experiment_name=wsdpt_student \
  project_name=wsdpt_student_debug

The config name tells you something important: the student uses ResNet RGB input and delay randomization. On a real robot, images do not arrive at the policy at exactly the same time as joint states. Camera exposure, networking, preprocessing, and inference all add latency. If simulation does not randomize delay, the student can learn a clean timing assumption that breaks during deployment.
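The idea of delay randomization can be sketched with a small frame buffer that returns a stale observation. This is an illustration only: it assumes one delay is sampled per episode, in whole control steps, and the class name is hypothetical rather than taken from the repository.

```python
import random
from collections import deque

class DelayedObservation:
    """Returns frames that are a randomly chosen number of steps old."""

    def __init__(self, max_delay_steps, rng=None):
        self.max_delay = max_delay_steps
        self.buffer = deque(maxlen=max_delay_steps + 1)
        self.rng = rng or random.Random()
        self.delay = 0

    def reset(self, first_frame):
        # Fill the history with the first frame so early steps have something to return.
        self.buffer.clear()
        self.buffer.extend([first_frame] * (self.max_delay + 1))
        # Assumption: one delay per episode, sampled uniformly in [0, max_delay].
        self.delay = self.rng.randint(0, self.max_delay)

    def step(self, new_frame):
        self.buffer.append(new_frame)
        # Index -1 is the newest frame; go back `delay` steps from there.
        return self.buffer[-1 - self.delay]

if __name__ == "__main__":
    cam = DelayedObservation(max_delay_steps=3, rng=random.Random(0))
    cam.reset("frame0")
    print(cam.delay, [cam.step(f"frame{i}") for i in range(1, 6)])
```

Training against such a wrapper forces the student to act on images that lag behind the joint state, which is closer to what it will see on hardware.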

Evaluate the student:

python gr00t/rl/eval_agent_trl.py \
  +checkpoint=logs_rl/<student_experiment>/model_step_XXXXXX.pt

When you evaluate with num_envs=1, the repository exports ONNX automatically:

python gr00t/rl/eval_agent_trl.py \
  +checkpoint=logs_rl/<student_experiment>/model_step_XXXXXX.pt \
  num_envs=1

The export is saved under:

logs_rl/<student_experiment>/exported/
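Before wiring the exported file into a deployment stack, it helps to print its declared inputs and outputs and compare them with the tensors you plan to feed. The sketch below assumes onnxruntime is installed; the model path and the shapes you compare against are placeholders, since the exported tensor names are not documented here.

```python
def shape_compatible(declared, actual):
    """True if `actual` fits the declared ONNX shape.

    Dynamic dimensions appear as None or as a symbolic string and match anything.
    """
    if len(declared) != len(actual):
        return False
    return all(d is None or isinstance(d, str) or d == a
               for d, a in zip(declared, actual))

def describe_onnx_io(model_path):
    """Return ({input_name: (shape, type)}, {output_name: (shape, type)})."""
    import onnxruntime as ort  # imported lazily so shape_compatible works without it
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    inputs = {i.name: (i.shape, i.type) for i in session.get_inputs()}
    outputs = {o.name: (o.shape, o.type) for o in session.get_outputs()}
    return inputs, outputs

if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:  # pass the path to the exported .onnx file
        for group in describe_onnx_io(sys.argv[1]):
            for name, (shape, dtype) in group.items():
                print(name, shape, dtype)
```

Checking each planned input with shape_compatible catches channel-order and batch-dimension mistakes before the robot is involved.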

DoorMan: training a door-opening policy

DoorMan uses the same teacher-student philosophy, but the task is articulated loco-manipulation. Its project page describes randomization over mass, handle type, hinge damping, stiffness, texture, and background. That is a major difference from simple pick-and-place: the door physics determine reaction forces on the robot, and a small grasping error can destabilize the base.

DoorMan pipeline:

Phase 1: PPO privileged teacher
  - staged reset so the teacher does not explore the full horizon from scratch
  - privileged obs include door state, handle pose, hinge state

Phase 2: DAgger vision student
  - RGB-only perception + proprioception
  - learns teacher actions in closed loop

Phase 3: GRPO bootstrap
  - fine-tunes the student with task-success rewards
  - improves consistency when observations are missing or delayed
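The GRPO phase relies on comparing rollouts within a group rather than on a learned value function. The standard group-relative advantage can be sketched as below; how DoorMan forms its groups (for example, several rollouts from the same initial door scene) is an assumption here, and the function name is hypothetical.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within each group.

    rewards: array-like of shape (num_groups, group_size), one scalar
    return per rollout. Each row is standardized independently.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    # eps keeps the division finite when every rollout in a group scored the same.
    return (rewards - mean) / (std + eps)

if __name__ == "__main__":
    # Two groups of four rollouts with binary task-success rewards.
    print(group_relative_advantages([[1, 0, 0, 1], [0, 0, 0, 0]]))
```

A group in which every rollout fails (or every rollout succeeds) produces zero advantages, so it contributes no gradient signal; only groups with mixed outcomes move the policy.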

Because the public README does not list a dedicated DoorMan command the way it lists the VIRAL commands, the practical workflow is to locate the door experiment configs in the version of the repository you have:

find gr00t/rl/config/exp -iname "*door*"

Then run the same training entry point:

HYDRA_FULL_ERROR=1 accelerate launch --num_processes 1 \
  gr00t/rl/train_agent_trl.py \
  +exp=<door_or_doorman_teacher_exp> \
  num_envs=8 \
  headless=True \
  experiment_name=doorman_teacher_smoke \
  project_name=doorman_debug

For DoorMan, do not skip staged reset. If the robot starts far from the door, does not know where the handle is, does not know the hinge response, and receives sparse success reward, PPO can spend a long time discovering useful trajectories. Staged reset breaks the horizon into easier subproblems: near the handle, already grasped, actively pulling, near completion. Once the teacher is stable, the student can learn from diverse rollouts.
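One way to picture staged reset is as a sampler over starting stages whose weights shift toward the full task as training progresses. The stage names below paraphrase the subproblems listed above, and the linear annealing schedule is purely an assumption for illustration, not the paper's method.

```python
import random

# Hypothetical stage labels; index 0 is the full task from a far start.
STAGES = ["far_from_door", "near_handle", "grasped", "pulling", "near_completion"]

def stage_weights(progress):
    """Sampling weights over STAGES for training progress in [0, 1].

    Assumed schedule: early training resets only into intermediate stages;
    the full-task start takes over linearly as progress approaches 1.
    """
    p = min(max(progress, 0.0), 1.0)
    later = (1.0 - p) / (len(STAGES) - 1)
    return [p] + [later] * (len(STAGES) - 1)

def sample_reset_stage(progress, rng):
    """Draw the stage that the next episode should start from."""
    return rng.choices(STAGES, weights=stage_weights(progress), k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    for progress in (0.0, 0.5, 1.0):
        print(progress, [sample_reset_stage(progress, rng) for _ in range(5)])
```

In a real environment each stage would map to a stored simulator state, such as a robot pose with the hand already on the handle, from which the episode resumes.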

Inference and deployment on Unitree G1

Real deployment needs two layers: policy inference and whole-body control. The GR00T-WholeBodyControl documentation describes a ZMQ-based PolicyServer pipeline: the inference client reads camera/state, queries the policy server, and publishes actions to a C++ deploy stack. With GR00T-VisualSim2Real, if you export a student policy to ONNX, the principle is the same: the policy must receive the correct RGB/proprioception tensors, output the action space expected by the controller, and run at a stable control cadence.

Deployment diagram:

Camera server on G1  --->  inference client  --->  action publisher
Joint/IMU state      --->        |                    |
                                 v                    v
                         policy checkpoint       C++ whole-body control
                              or ONNX                  |
                                                       v
                                                Unitree G1 actuators

If you use the GR00T-WholeBodyControl PolicyServer, the manual setup looks like:

# GPU machine
uv run python gr00t/eval/run_gr00t_server.py \
  --model-path /path/to/your/finetuned_model \
  --embodiment-tag UNITREE_G1_SONIC \
  --device cuda:0 \
  --port 5550

Inference client:

source .venv_inference/bin/activate
python gear_sonic/scripts/run_vla_inference.py \
  --host <policy_server_ip> \
  --port 5550 \
  --embodiment-tag unitree_g1_sonic \
  --prompt "open the door" \
  --camera-host 192.168.123.164

A DoorMan or VIRAL policy that is not a language-conditioned GR00T N1 model may not need a prompt, but the same interface constraints apply: action publish rate, action horizon, camera host, latency compensation, and initial pose must all match what the controller expects. The inference documentation lists defaults such as 50 Hz action publishing, 2.5 Hz inference rate, and action horizon 40 for the VLA stack. For a custom ONNX policy, measure end-to-end latency (camera capture through action publish) rather than only model forward time.
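The quoted defaults imply a simple timing budget, which the sketch below works out. It is one reading of those numbers, assuming each inference returns a chunk of `horizon` actions that is played back at the publish rate; the function name is hypothetical.

```python
def chunk_timing(publish_hz=50.0, inference_hz=2.5, horizon=40):
    """Derive timing figures from publish rate, inference rate, and chunk length."""
    steps_per_inference = publish_hz / inference_hz  # actions consumed between inferences
    horizon_seconds = horizon / publish_hz           # how long one chunk lasts
    inference_period = 1.0 / inference_hz            # time between new chunks
    return {
        "steps_per_inference": steps_per_inference,
        "horizon_seconds": horizon_seconds,
        # Slack before the current chunk runs out if the next one is late.
        "latency_budget_seconds": horizon_seconds - inference_period,
    }

if __name__ == "__main__":
    for key, value in chunk_timing().items():
        print(f"{key}: {value:.2f}")
```

With the defaults, roughly half of each chunk is consumed before the next one is requested, which leaves the remainder as headroom for camera, network, and inference latency. A single-step ONNX policy has no such buffer, so its end-to-end latency must fit inside one control period.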

Paper results and how to interpret them

VIRAL reports an RGB-based policy on Unitree G1 performing continuous loco-manipulation for up to 54 cycles, with generalization across tray position, cylinder position, robot position, table height, lighting, table cloth color, table type, and object variety. The project page also shows a realistic development timeline: many failures before stable grasping, then walk-stand-grasp, and finally long repeated cycles.

DoorMan reports a policy trained entirely in simulation with pure RGB perception, transferring zero-shot across diverse doors, handles, textures, and locations. The paper and project page report performance up to 31.7% faster than human teleoperators in completion time, and the GRPO fine-tuning stage improves success rate by roughly 20-30%.

Those numbers do not mean cloning the repository will reproduce the paper immediately. They depend on compute scale, randomization quality, camera alignment, hand SysID, the controller, safety setup, and hardware calibration. For a small lab, define milestones instead:

| Stage | Goal |
| --- | --- |
| 1 | Import the package, launch Isaac Lab, reset the G1 task |
| 2 | Run a teacher smoke test and avoid NaN checkpoints |
| 3 | Evaluate the teacher and inspect reasonable behavior |
| 4 | Distill a small RGB student |
| 5 | Export ONNX and verify tensor I/O |
| 6 | Run sim-to-sim with latency and camera randomization |
| 7 | Perform real-robot dry runs with no payload and E-stop |
| 8 | Attempt the real task with speed limits and a safety spotter |

Troubleshooting

| Problem | Common cause | Fix |
| --- | --- | --- |
| import isaaclab fails | Isaac Lab is not installed editable or the wrong env is active | Activate the right conda env and reinstall source packages |
| Isaac Sim crashes when cameras are enabled | Not enough VRAM or incompatible driver | Reduce num_envs, run headless, update driver |
| Training becomes NaN | Reward/action scale too large, unstable contacts | Lower learning rate, check action clipping |
| Student does not learn | Wrong teacher path or RGB observation mismatch | Print Hydra config and verify teacher_actor_path |
| Sim works but real robot fails | Camera FOV, delay, or hand SysID mismatch | Align camera, randomize delay, remeasure finger response |
| Door task does not progress | Reward is too sparse | Use staged reset and curriculum |

Conclusion

GR00T-VisualSim2Real is important because it turns visual sim-to-real humanoid manipulation from isolated demos into a reusable research pipeline: privileged teachers, vision students, large-scale rendering, domain randomization, real-to-sim alignment, and export for deployment. VIRAL shows that Unitree G1 can execute repeated RGB-based loco-manipulation. DoorMan shows that articulated objects such as doors can also be handled by policies trained in simulation.

If you start today, do not start with the real robot. Start with a teacher smoke test in Isaac Lab, then a small student distillation run, then sim-to-sim evaluation with latency and camera randomization. Once the policy survives those steps, deployment on Unitree G1 becomes an engineering process rather than a guess.
