
VIRAL: RGB Sim2Real for G1 Loco-Manip

Build VIRAL with Isaac Sim 5.1: PPO teacher, RGB DAgger student, Hydra fields, ONNX export, and EgoHumanoid comparison.

Nguyễn Anh Tuấn, June 11, 2026 (16 min read)

Why VIRAL is the fourth stack

In Part 1 on OpenWBT, we started with debuggable whole-body teleoperation in MuJoCo and Isaac before touching real hardware. In Part 2 on TWIST2, the focus moved to direct robot data collection with PICO teleoperation, a Redis bus, and a low-level controller. In Part 3 on EgoHumanoid, the question became data scale: can egocentric human demonstrations, plus a limited amount of robot data, co-train a VLA policy for a G1 humanoid?

VIRAL takes a different route. The NVlabs/GR00T-VisualSim2Real repository describes VIRAL as a visual sim-to-real framework for humanoid loco-manipulation on the Unitree G1. The paper "VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation" emphasizes that the robot learns entirely in simulation and then deploys zero-shot to real hardware from RGB and proprioception. In short: EgoHumanoid uses human egocentric data to add real-world diversity, while VIRAL tries to make simulation rich enough for an RGB student policy to transfer to the real G1.

This post does not try to reproduce the full paper at multi-GPU scale. The practical goal is narrower:

  1. Set up NVlabs/GR00T-VisualSim2Real and understand the teacher-student workflow.
  2. Run the PPO teacher through gr00t/rl/train_agent_trl.py +exp=loco_manip/walk_stand_place_grasp_turn_homie.
  3. Distill an RGB DAgger student through wsdpt_student_for_teacher_v8q8.002_resnet_rgb_delay.yaml.
  4. Read important Hydra fields such as teacher_actor_path, num_envs, and env.config.reset_from_dataset.enable.
  5. Export ONNX with gr00t/rl/eval_agent_trl.py num_envs=1.

For broader context outside this series, also read "Running GR00T-VisualSim2Real for G1" and the WholeBodyVLA open-source guide. This post focuses on one concrete path: VIRAL's walk_stand_place_grasp_turn_homie task.

Series roadmap

  1. OpenWBT: G1 Teleop in MuJoCo/Isaac: build the environment, verify ONNX policies, and understand lower-body joystick plus upper-body IK.
  2. TWIST2: PICO Teleop and G1 Sim2Real: use PICO teleoperation, Redis, and a low-level controller to collect direct robot data.
  3. EgoHumanoid: Human Demos to G1 VLA: turn egocentric human demos into robot-ready data through view and action alignment.
  4. VIRAL: RGB Sim2Real for G1 Loco-Manip: train a privileged teacher in simulation, distill an RGB student, randomize visuals, and export the policy.
  5. FromW1: Moving Skills onto Real Hardware: handle latency, contacts, and actuators when moving from sim to hardware.
  6. CLONE: Closed-Loop Whole-Body Teleop: treat closed-loop teleoperation as a long-horizon data stack.

Technical references to keep open

| Source | Why it matters | Detail to remember |
| --- | --- | --- |
| GR00T-VisualSim2Real README | Install, teacher training, student training, evaluation, ONNX export | The repository uses Isaac Sim 5.1, Isaac Lab, TRL, and Hydra |
| VIRAL paper | Understand the teacher-student design and sim-to-real recipe | The teacher has privileged full state; the student uses RGB; domain randomization and camera/hand alignment matter |
| VIRAL project page | Inspect tasks, failure cases, and generalization | The page shows variation across tray position, objects, table height, and lighting, plus 54 loco-manipulation cycles |
| Student config YAML | Read teacher_actor_path, cameras, DAgger, and ResNet RGB delay | This is the distillation config for a student trained from a teacher checkpoint |
| EgoHumanoid paper | Compare against a human-egocentric-data pipeline | EgoHumanoid co-trains on human and robot data through view/action alignment, not on simulation alone |

Figure: VIRAL visual sim-to-real pipeline

Mental model: the teacher sees full state, the student sees RGB

The hardest part of VIRAL is not the training command. It is the split into two policies:

Isaac Sim 5.1 / Isaac Lab
  G1 robot, objects, table, tray, contacts, task stage
        |
        v
Privileged PPO teacher
  obs: full state, object pose, hand-object transform, target, contact-like signals
  action: homie command + right arm + finger primitive
        |
        | rollouts + teacher action labels
        v
RGB DAgger student
  obs: minimal proprioception + delayed RGB image
  backbone: ResNet18 vision encoder + MLP
  action: same action space as the teacher
        |
        v
eval_agent_trl.py num_envs=1
  checkpoint -> exported ONNX
        |
        v
G1 deployment stack

The teacher is allowed to "cheat" in the training sense: it can use information that the real robot will not directly measure at runtime, such as object position, hand-object transforms, target place/lift positions, and task stage. That makes the long-horizon PPO problem easier. But such a teacher is not a deployable visual policy. The student is the deployable policy: it receives observations closer to the real system, mainly RGB camera input and proprioception.

DAgger matters because the student is not only trained on a static set of clean teacher states. The student runs in closed loop inside simulation; when it drifts away from the teacher's ideal trajectory, the teacher can still provide the corrective action at that new state. For humanoid loco-manipulation, this difference is large. A few centimeters of base error can change the camera view, move the wrist out of reach, and make an offline behavior cloning policy leave its training distribution. Online DAgger produces more "student is slightly wrong but still recoverable" states, which is exactly what the real robot needs.
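To make the DAgger loop concrete, here is a minimal sketch on a toy linear system; it is not VIRAL's implementation. The "teacher" is a hypothetical linear controller that reads the true state, the "student" only sees a noisy copy of that state (a stand-in for RGB plus proprioception), and each iteration aggregates teacher labels at the states that the partly student-driven rollouts actually visit. K_TEACHER, the dynamics, and the beta schedule are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 4, 2
A = 0.9 * np.eye(STATE_DIM)                                   # toy dynamics
B = np.vstack([0.1 * np.eye(ACTION_DIM), np.zeros((2, ACTION_DIM))])
K_TEACHER = np.array([[1.0, 0.0, 0.5, 0.0],                   # hypothetical
                      [0.0, 1.0, 0.0, 0.5]])                  # teacher gain

def teacher_action(state):
    # Privileged teacher: acts on the full, true state.
    return -K_TEACHER @ state

def student_obs(state):
    # Student only sees a noisy view of the state.
    return state + 0.05 * rng.normal(size=state.shape)

def rollout(policy_w, beta, horizon=50):
    """One episode; every visited state is labeled by the teacher.

    With probability beta the teacher drives the step, otherwise the
    student does, so later iterations visit student-induced states.
    """
    state = rng.normal(size=STATE_DIM)
    obs_buf, act_buf = [], []
    for _ in range(horizon):
        obs = student_obs(state)
        label = teacher_action(state)
        obs_buf.append(obs)
        act_buf.append(label)
        if policy_w is None or rng.random() < beta:
            action = label                  # teacher in control
        else:
            action = policy_w @ obs         # student in control
        state = A @ state + B @ action
    return obs_buf, act_buf

def dagger(iterations=5, episodes_per_iter=20):
    dataset_obs, dataset_act = [], []
    policy_w = None
    for i in range(iterations):
        beta = 0.5 ** i                     # hand control to the student over time
        for _ in range(episodes_per_iter):
            o, a = rollout(policy_w, beta)
            dataset_obs += o                # aggregate, never discard old data
            dataset_act += a
        X = np.asarray(dataset_obs)         # (N, STATE_DIM)
        Y = np.asarray(dataset_act)         # (N, ACTION_DIM)
        W, *_ = np.linalg.lstsq(X, Y, rcond=None)   # linear behavior cloning
        policy_w = W.T                      # (ACTION_DIM, STATE_DIM)
    return policy_w

student_w = dagger()
```

In VIRAL the student is a ResNet18 + MLP trained on images by gradient descent; the least-squares fit only keeps this loop short. The repository exposes its own rollout knobs (enforce_teacher_rollout, ratio_teacher_rollout), whose exact semantics this sketch does not model.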

Step 1: install the VIRAL environment

The official README expects Ubuntu 22.04, NVIDIA driver 535 or newer, Conda or Mamba, Isaac Sim 5.1, and Isaac Lab. The setup uses Python 3.11 and PyTorch 2.7.0 with CUDA 12.8 wheels, then installs Isaac Sim through pip:

conda create -n viral python=3.11 -y
conda activate viral

pip install torch==2.7.0 torchvision==0.22.0 \
  --index-url https://download.pytorch.org/whl/cu128

pip install isaacsim==5.1.0.0 isaacsim-rl==5.1.0.0

Install Isaac Lab from source, then install the repository:

pip install setuptools poetry-core flatdict

pip install --no-build-isolation -e <path-to-IsaacLab>/source/isaaclab
pip install --no-build-isolation -e <path-to-IsaacLab>/source/isaaclab_assets \
  -e <path-to-IsaacLab>/source/isaaclab_tasks \
  -e "<path-to-IsaacLab>/source/isaaclab_rl[all]"

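# Pin numpy before installing the repository
pip install numpy==1.26.0

cd <path-to-GR00T-VisualSim2Real>
pip install -e .
# Re-apply the pin in case the editable install pulled in a newer numpy
pip install numpy==1.26.0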

Run two smoke checks before training:

python -c "import isaaclab; print(isaaclab.__file__)"
python -c "from gr00t.rl.envs.base_task.base_task import BaseTask; print('OK')"

If the second command fails, do not start editing YAML yet. The issue is usually the editable install, the Isaac Lab path, or a numpy/Python version conflict. With Isaac Sim and Isaac Lab, a small version mismatch can produce a very noisy error. Lock the environment first; tune num_envs later.
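Beyond the two import checks, it can help to confirm the pinned versions before a long run. The script below is a small standard-library sketch; the expected versions are copied from the install commands above, and stripping a local tag such as +cu128 before comparing is an assumption about how the CUDA wheels report their version.

```python
import sys
from importlib import metadata

# Versions from the install steps above; adjust if your README differs.
EXPECTED = {
    "torch": "2.7.0",
    "torchvision": "0.22.0",
    "numpy": "1.26.0",
    "isaacsim": "5.1.0.0",
}

def check_environment(expected=EXPECTED, version_of=metadata.version,
                      python=sys.version_info[:2]):
    """Return a list of human-readable problems (empty list means OK)."""
    problems = []
    if tuple(python) != (3, 11):
        problems.append(f"python {python[0]}.{python[1]} != 3.11")
    for package, want in expected.items():
        try:
            got = version_of(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} not installed (want {want})")
            continue
        # Drop a local build tag such as "+cu128" before comparing.
        if got.split("+")[0] != want:
            problems.append(f"{package} {got} != {want}")
    return problems

if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "environment OK")
```

This complements, rather than replaces, the BaseTask import check: a correct version list does not prove the editable installs point at the right source trees.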

Step 2: train the PPO teacher

The teacher path in this post uses the experiment below:

HYDRA_FULL_ERROR=1 accelerate launch --num_processes 1 \
  gr00t/rl/train_agent_trl.py \
  +exp=loco_manip/walk_stand_place_grasp_turn_homie \
  num_envs=48 \
  project_name=wsdpt_teacher

The walk_stand_place_grasp_turn_homie.yaml experiment composes several Hydra config groups:

| Config group | Main value | Meaning |
| --- | --- | --- |
| /algo | ppo | Train the teacher with PPO |
| /env | walk_stand_place_grasp_turn_homie | The walk, stand, place, grasp, turn task |
| /simulator | isaacsim | Isaac Sim backend |
| /robot | g1/g1_43dof | Unitree G1 43-DOF robot config |
| /obs | obs_walk_stand_place_grasp_turn_homie | Rich observation set for the teacher |
| /rewards | reward_wsdpt_butterflyV8_q_2_teacher | Reward shaping for the WSDPT task |
| /trainer | trl_homie_api | Trainer wrapper used by the repository |

The YAML itself may set num_envs to 2048 for serious training. The README example uses num_envs=48. The important point for beginners is that the command-line value overrides the experiment default through Hydra. If your GPU only has 24 GB of memory, start much lower:

HYDRA_FULL_ERROR=1 accelerate launch --num_processes 1 \
  gr00t/rl/train_agent_trl.py \
  +exp=loco_manip/walk_stand_place_grasp_turn_homie \
  num_envs=8 \
  headless=True \
  project_name=wsdpt_teacher_smoke \
  experiment_name=wspgt_teacher_smoke
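The override rule is easy to get wrong, so here is a toy re-implementation of it on a plain nested dict. It is not Hydra's parser: it only mimics the behavior that a plain key=value must hit an existing key while +key=value adds a new one, and it does not handle config-group selections such as +exp=.... The values in experiment_defaults are hypothetical.

```python
import copy

def parse_value(text):
    """Tiny subset of Hydra's value grammar: bools, ints, floats, strings."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text

def apply_overrides(config, overrides):
    """Apply 'a.b.c=value' overrides to a copy of a nested dict."""
    cfg = copy.deepcopy(config)
    for item in overrides:
        append = item.startswith("+")
        path, _, raw = (item[1:] if append else item).partition("=")
        *parents, leaf = path.split(".")
        node = cfg
        for key in parents:
            if key not in node:
                if not append:
                    raise KeyError(f"Could not override '{path}': no key '{key}'")
                node[key] = {}
            node = node[key]
        exists = leaf in node
        if append and exists:
            raise KeyError(f"Could not append '{path}': key already exists")
        if not append and not exists:
            raise KeyError(f"Could not override '{path}'; use +{path}=... to add it")
        node[leaf] = parse_value(raw)
    return cfg

# Hypothetical experiment defaults, overridden as on the command line above.
experiment_defaults = {
    "num_envs": 2048, "headless": True,
    "project_name": "default", "experiment_name": "default",
    "env": {"config": {"reset_from_dataset": {"enable": True}}},
}
cfg = apply_overrides(experiment_defaults, [
    "num_envs=8", "headless=True",
    "env.config.reset_from_dataset.enable=False",
])
```

The useful property is the failure on a typo such as num_env=8: the real Hydra also refuses a plain override of a key that does not exist, which is why keeping HYDRA_FULL_ERROR=1 pays off.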

num_envs is the number of parallel simulation environments. Raising it collects rollouts faster, but memory use grows with robots, objects, contact sensors, observation buffers, and camera rendering when cameras are enabled. The teacher does not use RGB rendering, so it is usually lighter than the student. Still, this loco-manipulation task is heavy because it includes the G1 model, a table, objects, a tray, hand primitives, and many termination conditions.

To visually inspect the simulation, add:

headless=False

Use the GUI as a debugging tool, not as the default long-run mode. It is useful for checking whether the robot spawns correctly, whether objects are in reasonable positions, and whether the task stage changes as expected. For real training, run headless.

Step 3: understand env.config.reset_from_dataset.enable

The environment config includes an important reset block:

env:
  config:
    reset_from_dataset:
      enable: True
      use_motion_file_dir: True
      motion_file_dir: "gr00t/rl/data/motions/g1_wsdpt/33demos_675_775"
      num_per_sample: 10
      sample_interval_s: 0.1
      resample_every: 1000

This tells the simulator that episodes can be reset from a motion/demo dataset instead of always starting from the same initial state. For a long task such as walk-stand-place-grasp-turn, this acts like reference state initialization. It lets the teacher encounter more stages of the task: walking to the table, preparing to place, transitioning into grasp, and turning after manipulation. If every episode starts from frame zero, PPO may spend a long time before it sees reward from later stages.
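One way to picture reference state initialization is the sampler below. It is a hypothetical sketch: the field names mirror the YAML block, but the meaning given to num_per_sample, sample_interval_s, and resample_every (a pool of evenly spaced start frames per demo, rebuilt every N resets) is an assumption, as are the demo lengths and the 50 Hz frame rate.

```python
import random
from dataclasses import dataclass

@dataclass
class ResetFromDatasetCfg:
    # Names mirror the YAML block; the semantics below are assumed.
    enable: bool = True
    num_per_sample: int = 10
    sample_interval_s: float = 0.1
    resample_every: int = 1000

class ReferenceStateSampler:
    """Hypothetical sampler: draws reset states from demo frames."""

    def __init__(self, demo_lengths_s, cfg, fps=50, seed=0):
        self.demo_lengths_s = list(demo_lengths_s)
        self.cfg = cfg
        self.fps = fps
        self.rng = random.Random(seed)
        self.resets_since_rebuild = 0
        self.pool = self._build_pool()

    def _build_pool(self):
        pool = []
        window_s = (self.cfg.num_per_sample - 1) * self.cfg.sample_interval_s
        for demo_id, length_s in enumerate(self.demo_lengths_s):
            last_frame = int(length_s * self.fps) - 1
            # Random offset so different rebuilds cover different task stages.
            offset_s = self.rng.uniform(0.0, max(length_s - window_s, 0.0))
            for k in range(self.cfg.num_per_sample):
                t = offset_s + k * self.cfg.sample_interval_s
                pool.append((demo_id, min(int(round(t * self.fps)), last_frame)))
        return pool

    def sample_reset(self):
        """Return (demo_id, frame_index), or None to use the default spawn."""
        if not self.cfg.enable:
            return None
        if self.resets_since_rebuild >= self.cfg.resample_every:
            self.pool = self._build_pool()
            self.resets_since_rebuild = 0
        self.resets_since_rebuild += 1
        return self.rng.choice(self.pool)
```

The None branch corresponds to setting env.config.reset_from_dataset.enable=False: every episode then starts from the default spawn, and PPO has to walk through every earlier stage before it can see reward from the later ones.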

For debugging, use the field intentionally:

| Goal | Override to try | Why |
| --- | --- | --- |
| Inspect clean environment spawn | env.config.reset_from_dataset.enable=False | Easier to reason about the initial state |
| Follow the repository recipe | Keep it True | Better for learning the long-horizon task |
| Diagnose dataset path errors | Use HYDRA_FULL_ERROR=1 and num_envs=1 | The stack trace is easier to read |

Do not disable reset_from_dataset only because it looks complex. In long-horizon whole-body tasks, reset and curriculum logic often matter as much as reward design. If the dataset path is missing or the demos are in the wrong location, fix the data path first instead of silently training a different task.

Step 4: evaluate the teacher checkpoint

Once you have a checkpoint, evaluate it with:

python gr00t/rl/eval_agent_trl.py \
  +checkpoint=logs_rl/<experiment_dir>/model_step_044500.pt

At this point, you are asking three practical questions:

| Question | Good sign | Bad sign |
| --- | --- | --- |
| Is the robot walking stably? | No knee contact, no strong drift, no fall during stage changes | Termination from knee contact, low height, or excessive tilt (gravity) |
| Does the right arm approach the object? | The wrist reaches the object region before the finger primitive closes | The arm swings too fast or knocks the object away |
| Does the task stage progress? | Walk, stand, place, grasp, and turn happen in order | The robot stays in one stage or resets early |

The teacher must be good enough before training the student. If the teacher cannot grasp reliably in simulation, the RGB student will not fix that. The student is learning to imitate the teacher under harder observations; it is not a magic upgrade from a weak policy.
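To turn the questions above into numbers, a small summary over evaluation episodes helps. This sketch assumes a hypothetical per-episode log holding the sequence of task stages and a termination reason; the repository's evaluation output may look different, so adapt the field names.

```python
from collections import Counter

# Stage order taken from the task name; the log format is hypothetical.
STAGES = ("walk", "stand", "place", "grasp", "turn")

def stages_completed(stage_log):
    """Count how many task stages were reached in order (repeats are ignored)."""
    reached = 0
    for stage in stage_log:
        if reached < len(STAGES) and stage == STAGES[reached]:
            reached += 1
    return reached

def summarize(episodes):
    """episodes: list of dicts with 'stages' (list[str]) and 'termination' (str)."""
    progress = [stages_completed(ep["stages"]) for ep in episodes]
    return {
        "success_rate": sum(p == len(STAGES) for p in progress) / len(episodes),
        "mean_stages": sum(progress) / len(episodes),
        "terminations": Counter(ep["termination"] for ep in episodes),
    }

episodes = [
    {"stages": ["walk", "walk", "stand", "place", "grasp", "turn"], "termination": "timeout"},
    {"stages": ["walk", "stand"], "termination": "knee_contact"},
]
report = summarize(episodes)
```

A teacher with a high mean stage count but many knee_contact terminations points at locomotion, while one that stalls at place or grasp points at the arm and finger primitive; both should be fixed before distillation.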

Step 5: distill the RGB student with DAgger

The student uses this experiment config:

gr00t/rl/config/exp/loco_manip/wsdpt_student_for_teacher_v8q8.002_resnet_rgb_delay.yaml

The README workflow is to set teacher_actor_path in the student YAML, then launch training:

teacher_actor_path: logs_rl/<your_teacher_experiment>/model_step_XXXXXX.pt

HYDRA_FULL_ERROR=1 accelerate launch --num_processes 1 \
  gr00t/rl/train_agent_trl.py \
  +exp=loco_manip/wsdpt_student_for_teacher_v8q8.002_resnet_rgb_delay \
  num_envs=8 \
  headless=True \
  experiment_name=wsdpt_student \
  project_name=wsdpt_student_debug

Because teacher_actor_path is a top-level field, you can also pass it on the command line instead of editing the YAML, as the lab checklist below does.
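Because teacher_actor_path is a common copy-paste failure, a small wrapper can refuse to launch while the path still contains a placeholder or points at a missing file. This is a hypothetical helper, not part of the repository; it only assembles the same student command, with teacher_actor_path passed as a Hydra override.

```python
import shlex
from pathlib import Path

STUDENT_EXP = "loco_manip/wsdpt_student_for_teacher_v8q8.002_resnet_rgb_delay"

def student_launch_cmd(teacher_ckpt, num_envs=8, experiment_name="wsdpt_student",
                       project_name="wsdpt_student_debug", must_exist=True):
    """Build the distillation argv, rejecting placeholder or missing teacher paths."""
    path = str(teacher_ckpt)
    if "<" in path or "XXXXXX" in path:
        raise ValueError(f"teacher_actor_path still contains a placeholder: {path}")
    if must_exist and not Path(path).is_file():
        raise FileNotFoundError(f"teacher checkpoint not found: {path}")
    return [
        "accelerate", "launch", "--num_processes", "1",
        "gr00t/rl/train_agent_trl.py",
        f"+exp={STUDENT_EXP}",
        f"teacher_actor_path={path}",
        f"num_envs={num_envs}",
        "headless=True",
        f"experiment_name={experiment_name}",
        f"project_name={project_name}",
    ]

def as_shell(argv):
    # HYDRA_FULL_ERROR=1 keeps full Hydra stack traces, as in the commands above.
    return "HYDRA_FULL_ERROR=1 " + shlex.join(argv)
```

For example, print(as_shell(student_launch_cmd("logs_rl/teacher_run/model_step_044500.pt"))) prints a copy-pasteable command, where teacher_run is a made-up directory name. Remember that the path must be the PPO teacher checkpoint, never a student checkpoint.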

The original student YAML may set num_envs: 1024, enable RGB cameras at a resolution of [108, 192], and disable depth. The README uses num_envs=8 for a debugging run. The student is heavier than the teacher because every environment needs rendered images, so a jump in VRAM usage compared with teacher training is expected.
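A back-of-envelope estimate shows how quickly image buffers grow with num_envs. The function below sizes only an RGB history buffer and assumes float32 storage with one stored frame per delay step up to rgb_image_delay_step_max=5; both are assumptions, and renderer memory and ResNet activations are not included.

```python
def rgb_buffer_bytes(num_envs, height=108, width=192, channels=3,
                     frames_kept=5, bytes_per_value=4):
    """Rough size of a per-env RGB history buffer (float32 assumed)."""
    return num_envs * frames_kept * height * width * channels * bytes_per_value

for n in (8, 1024):
    print(f"num_envs={n:5d}: {rgb_buffer_bytes(n) / 2**20:8.1f} MiB")
```

Under these assumptions one environment needs 1,244,160 bytes, so 8 environments need about 9.5 MiB while 1024 need about 1.2 GiB for this buffer alone.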

Important fields:

| Field | Location | Practical meaning |
| --- | --- | --- |
| teacher_actor_path | Top level of the student YAML | Path to the PPO teacher checkpoint, loaded by network_load_dict.teacher_actor.path |
| num_envs | Top level or command line | Number of parallel environments; start small for RGB students |
| enable_cameras | Top level and simulator config | Required for the vision student |
| simulator.config.cameras.camera_resolutions | Student YAML | RGB image size, here [108, 192] |
| obs.rgb_image_delay_step | RGB-delay obs config | Selects the latest or a delayed RGB frame |
| algo.config.use_dagger | DAgger algo config | Enables DAgger/BC-style training from the teacher |
| algo.config.enforce_teacher_rollout | Student YAML | Forces teacher rollout logic during distillation |
| algo.config.ratio_teacher_rollout | Student YAML | Controls the teacher rollout ratio |
| algo.config.network_load_dict.teacher_actor.path | Student YAML | Points to ${teacher_actor_path} |
| algo.config.actor.backbone.vision_module | Student YAML | ResNet vision encoder, defaulting to pretrained ResNet18 |

The student's observations are much less privileged than the teacher's. actor_obs includes base angular velocity, projected gravity, previous actions, DOF position/velocity without fingers, delta actions, and homie commands. vision_obs uses rgb_image_delayed. The teacher, meanwhile, still sees object and target state. That is the teacher-student gap: the teacher knows where the object is in state; the student must infer it from pixels.
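The split can be made explicit in code. In this sketch the key names paraphrase the observation terms listed above and are hypothetical, as is the HWC image layout; the point is the guard, which fails loudly if a teacher-only term ends up in the student's actor observation.

```python
import numpy as np

# Hypothetical key names paraphrasing the observation lists above.
STUDENT_ACTOR_KEYS = ("base_ang_vel", "projected_gravity", "prev_actions",
                      "dof_pos_no_fingers", "dof_vel_no_fingers",
                      "delta_actions", "homie_commands")
PRIVILEGED_KEYS = {"object_pos", "hand_object_transform", "target_pos", "task_stage"}

def build_student_obs(signals, actor_keys=STUDENT_ACTOR_KEYS):
    """Concatenate proprioceptive terms and return (actor_obs, vision_obs).

    Raises if a privileged (teacher-only) term is requested, so a config
    edit cannot silently leak simulator state into the deployable policy.
    """
    leaked = PRIVILEGED_KEYS.intersection(actor_keys)
    if leaked:
        raise ValueError(f"privileged terms in student obs: {sorted(leaked)}")
    actor_obs = np.concatenate([np.ravel(signals[k]) for k in actor_keys])
    vision_obs = signals["rgb_image_delayed"]
    return actor_obs, vision_obs
```

The simulator can keep computing object_pos and the other privileged terms for the teacher labels; they simply never reach the student's input.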

Step 6: RGB delay and domain randomization

The config name includes rgb_delay because the student should not assume the camera image arrives instantly. On a real robot, camera capture, drivers, preprocessing, policy inference, and actuator commands all introduce latency. The observation config supports:

obs:
  rgb_image_delay_random: False
  rgb_image_delay_resample_on_reset: False
  rgb_image_delay_step: 1
  rgb_image_delay_step_min: 1
  rgb_image_delay_step_max: 5
  history_save_interval: 1

rgb_image_delay_step=1 means the student uses the latest frame in the buffer. To train with randomized delay, you can override:

obs.rgb_image_delay_random=True \
obs.rgb_image_delay_resample_on_reset=True \
obs.rgb_image_delay_step_min=1 \
obs.rgb_image_delay_step_max=5
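A delay buffer of this kind can be sketched as a fixed-length deque. This is an illustration, not the repository's code: delay_step=1 returns the newest frame as described above, while the behavior of random delays (drawn once, then redrawn on each reset when resample_on_reset is set) is an assumption.

```python
import random
from collections import deque

class DelayedRGBBuffer:
    """Sketch of an RGB delay buffer; delay=1 returns the newest frame."""

    def __init__(self, delay_step=1, delay_random=False, resample_on_reset=False,
                 step_min=1, step_max=5, seed=0):
        self.delay_random = delay_random
        self.resample_on_reset = resample_on_reset
        self.step_min, self.step_max = step_min, step_max
        self.rng = random.Random(seed)
        self.delay = self._draw() if delay_random else delay_step
        self.frames = deque(maxlen=max(step_max, delay_step))

    def _draw(self):
        return self.rng.randint(self.step_min, self.step_max)  # inclusive

    def reset(self, first_frame):
        if self.delay_random and self.resample_on_reset:
            self.delay = self._draw()
        self.frames.clear()
        # Fill the history with the first frame so early steps have a valid lag.
        self.frames.extend([first_frame] * self.frames.maxlen)

    def push(self, frame):
        self.frames.append(frame)

    def get(self):
        # delay=1 -> newest frame, delay=k -> the frame from k-1 steps ago.
        return self.frames[-self.delay]
```

Redrawing the delay per episode exposes the student to a range of latencies without making any single episode inconsistent, which is one reasonable reading of the resample_on_reset flag.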

Do not turn on random delay, heavy visual randomization, and aggressive camera extrinsic noise all at once in the first run. Layer the difficulty so failures are attributable. A practical order is:

  1. Train an RGB student without random delay at low num_envs.
  2. Add mild camera extrinsics randomization.
  3. Add image, material, and lighting randomization according to the recipe.
  4. Add randomized RGB delay if the real rollout stack has noticeable latency.

The VIRAL paper emphasizes large-scale visual domain randomization across lighting, materials, camera parameters, image quality, and sensor delays. For a smaller lab, randomization that is too strong too early can prevent the student from learning at all. Keep a clean baseline run for comparison.

Step 7: export ONNX

The README states that evaluation with num_envs=1 automatically exports the policy as ONNX:

python gr00t/rl/eval_agent_trl.py \
  +checkpoint=logs_rl/<student_experiment_dir>/model_step_XXXXXX.pt \
  num_envs=1

The model is written under:

<experiment_dir>/exported/

Why does num_envs=1 matter? Real deployment does not run 1024 robots in parallel. Export should produce a graph with input and output shapes suitable for one robot. If you export with an unexpected batch shape or observation layout, the ONNX file may exist but fail in the runtime deployment path. Before moving the ONNX file forward, check:

| Check | How |
| --- | --- |
| Correct observation inputs | Print actor obs and vision obs shapes during eval |
| Correct camera feed | Run one short episode with headless=False and inspect the RGB frame |
| Correct action dimension | The student config uses robot.actions_dim: 31 (15 + 14 + 2, according to repo comments) |
| Export artifact exists | Inspect the exported/ folder after eval |
| No privileged dependency | Confirm the actor uses actor_obs + vision_obs, not teacher_obs |
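The shape checks in the table can be scripted. The helper below is a sketch: the 31-dimensional action comes from the student config, while actor_dim and the (1, 108, 192, 3) image layout are assumptions to replace with the shapes printed during eval.

```python
import numpy as np

def check_single_robot_io(actor_obs, vision_obs, action, actor_dim,
                          image_shape=(108, 192, 3), action_dim=31):
    """Return a list of problems with one-robot policy I/O (empty = OK)."""
    problems = []
    if actor_obs.shape != (1, actor_dim):
        problems.append(f"actor_obs {actor_obs.shape} != {(1, actor_dim)}")
    if vision_obs.shape != (1, *image_shape):
        problems.append(f"vision_obs {vision_obs.shape} != {(1, *image_shape)}")
    if action.shape != (1, action_dim):
        problems.append(f"action {action.shape} != {(1, action_dim)}")
    if not np.all(np.isfinite(action)):
        problems.append("action contains NaN/inf")
    return problems
```

Running the same check on both the PyTorch eval outputs and the exported graph's outputs catches the case where the ONNX file exists but was traced with a multi-environment batch.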

VIRAL versus EgoHumanoid

Both stacks target visual policy learning for G1 loco-manipulation, but they assume different data sources:

| Axis | VIRAL | EgoHumanoid |
| --- | --- | --- |
| Main data source | Large-scale simulation in Isaac Sim/Isaac Lab | Egocentric human demos plus robot teleoperation data |
| Learning recipe | Privileged RL teacher, RGB DAgger student | Co-train a VLA policy on aligned human/robot data |
| Runtime observation | RGB plus minimal proprioception | Egocentric RGB plus a unified language/action schema |
| Embodiment gap handling | Real-to-sim camera/hand alignment and visual domain randomization | View alignment and action alignment between humans and robots |
| Strength | Does not require large real human/robot demo collection; the simulator is controllable | Human data provides real scene diversity and helps generalization outside the lab |
| Weakness | Requires a high-quality simulator, significant compute, and careful randomization | Human-to-robot alignment is difficult and the data pipeline is long |
| Best use case | A lab with strong Isaac Sim compute and limited robot time | A lab that can collect diverse human demos and a smaller set of good robot demos |

A common misunderstanding is that VIRAL is "data-light." It is not. It replaces real-world data collection with simulation compute, curriculum, and randomization. EgoHumanoid is not simply "no simulation" either. It shifts the burden to data alignment so human video and human motion can become robot-compatible. For a practical lab, choose based on constraints:

| Lab condition | Stack to prioritize |
| --- | --- |
| Strong GPU workstation, limited robot time | VIRAL smoke tests and small-scale teacher/student training |
| PICO/ZED setup and many people able to collect demos in diverse places | EgoHumanoid-style data pipeline |
| Need a real robot demo quickly | OpenWBT or TWIST2 first, VIRAL/EgoHumanoid later |
| Serious RGB sim-to-real research goal | VIRAL, because its domain randomization and teacher-student design directly target that question |

Common beginner mistakes

Training the student before the teacher is good enough. This is the most expensive mistake. An RGB student cannot magically exceed a poor teacher on this task. Evaluate the teacher with both GUI inspection and metrics first.

Setting num_envs too high. The YAML may use 1024 or 2048 environments, but that is not a universal machine setting. On one GPU, begin with 4, 8, or 16 environments to validate the pipeline.

Misunderstanding Hydra overrides. +exp=... selects the experiment config; num_envs=8 overrides a top-level value; obs.rgb_image_delay_random=True overrides a nested field. A plain key=value override of a path that does not exist makes Hydra fail, while prefixing it with + creates a new field that the code may never read, so a typo can silently change nothing. Keep HYDRA_FULL_ERROR=1 enabled.

Disabling cameras while training the student. The student config needs enable_cameras: true and simulator RGB cameras. If you disable cameras to save memory, you are no longer training the RGB student.

Confusing teacher_actor_path with the student checkpoint. teacher_actor_path must point to a PPO teacher checkpoint. The student checkpoint is passed to +checkpoint=... when evaluating the student.

Lab checklist

# 1. Verify install
python -c "from gr00t.rl.envs.base_task.base_task import BaseTask; print('OK')"

# 2. Teacher smoke test
HYDRA_FULL_ERROR=1 accelerate launch --num_processes 1 \
  gr00t/rl/train_agent_trl.py \
  +exp=loco_manip/walk_stand_place_grasp_turn_homie \
  num_envs=4 \
  headless=False \
  project_name=wsdpt_teacher_smoke \
  experiment_name=wspgt_teacher_gui

# 3. Teacher training, headless
HYDRA_FULL_ERROR=1 accelerate launch --num_processes 1 \
  gr00t/rl/train_agent_trl.py \
  +exp=loco_manip/walk_stand_place_grasp_turn_homie \
  num_envs=48 \
  headless=True \
  project_name=wsdpt_teacher

# 4. Teacher evaluation
python gr00t/rl/eval_agent_trl.py \
  +checkpoint=logs_rl/<teacher_experiment>/model_step_XXXXXX.pt

# 5. Student distillation
HYDRA_FULL_ERROR=1 accelerate launch --num_processes 1 \
  gr00t/rl/train_agent_trl.py \
  +exp=loco_manip/wsdpt_student_for_teacher_v8q8.002_resnet_rgb_delay \
  teacher_actor_path=logs_rl/<teacher_experiment>/model_step_XXXXXX.pt \
  num_envs=8 \
  headless=True \
  experiment_name=wsdpt_student_rgb_delay \
  project_name=wsdpt_student_debug

# 6. Student evaluation + ONNX export
python gr00t/rl/eval_agent_trl.py \
  +checkpoint=logs_rl/<student_experiment>/model_step_XXXXXX.pt \
  num_envs=1

For a beginner, the first success condition is not the real robot picking up an object immediately. The first success condition is: the environment spawns, the teacher checkpoint evaluates, the RGB student trains without crashing, an ONNX artifact appears under exported/, and you understand what each Hydra override changes.
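The ONNX part of that success condition is easy to automate. A minimal pathlib sketch, assuming only the <experiment_dir>/exported/ location described in Step 7:

```python
import sys
from pathlib import Path

def exported_onnx_files(experiment_dir):
    """List .onnx files under <experiment_dir>/exported/ (empty if none)."""
    export_dir = Path(experiment_dir) / "exported"
    if not export_dir.is_dir():
        return []
    return sorted(p for p in export_dir.rglob("*.onnx") if p.is_file())

if __name__ == "__main__" and len(sys.argv) > 1:
    files = exported_onnx_files(sys.argv[1])
    print("\n".join(map(str, files)) if files else "no ONNX artifact yet")
```

An empty list after an eval run with num_envs=1 usually means the eval pointed at a different experiment directory than the one you are checking.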

Conclusion

VIRAL is worth studying because it cleanly shows a modern sim-to-real path for humanoids: train a difficult skill with a privileged teacher in Isaac Sim, distill an RGB student through DAgger, bridge sim-to-real with domain randomization and alignment, then export a deployable policy. Compared with EgoHumanoid, it depends less on human egocentric data but requires a stronger simulator, more compute, and stricter debugging discipline.

In this series, VIRAL is the simulation-first visual policy stack. It does not replace OpenWBT, TWIST2, or EgoHumanoid. It adds another route toward whole-body humanoid VLA: first make the simulator strong enough, then force the student to learn from observations that resemble the real robot.


Nguyễn Anh Tuấn

Robotics & AI Engineer. Building VnRobo — sharing knowledge about robot learning, VLA models, and automation.
