VIRAL: RGB Sim2Real for G1 Loco-Manip

Why VIRAL is the fourth stack

In Part 1 on OpenWBT, we started with debuggable whole-body teleoperation in MuJoCo and Isaac before touching real hardware. In Part 2 on TWIST2, the focus moved to direct robot data collection with PICO teleoperation, a Redis bus, and a low-level controller. In Part 3 on EgoHumanoid, the question became data scale: can egocentric human demonstrations, plus a limited amount of robot data, co-train a VLA policy for a G1 humanoid?

VIRAL takes a different route. The NVlabs/GR00T-VisualSim2Real repository describes VIRAL as a visual sim-to-real framework for humanoid loco-manipulation on the Unitree G1. The paper, "VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation", emphasizes that the robot learns entirely in simulation and then deploys zero-shot to real hardware using only RGB and proprioception. In short: EgoHumanoid uses human egocentric data to add real-world diversity, while VIRAL tries to make simulation rich enough for an RGB student policy to transfer to the real G1.

This post does not try to reproduce the full paper at multi-GPU scale. The practical goal is narrower: set up NVlabs/GR00T-VisualSim2Real and understand the teacher-student workflow; train the PPO teacher with gr00t/rl/train_agent_trl.py +exp=loco_manip/walk_stand_place_grasp_turn_homie; distill an RGB DAgger student through wsdpt_student_for_teacher_v8q8.002_resnet_rgb_delay.yaml; read the important Hydra fields (teacher_actor_path, num_envs, and env.config.reset_from_dataset.enable); and export ONNX with gr00t/rl/eval_agent_trl.py num_envs=1.

For broader context outside this series, also read Running GR00T-VisualSim2Real for G1 and the WholeBodyVLA open-source guide. This post focuses on one concrete path: VIRAL's walk_stand_place_grasp_turn_homie task.

Series roadmap

OpenWBT: G1 Teleop in MuJoCo/Isaac: build the environment, verify ONNX policies, and understand lower-body joystick plus upper-body IK.
TWIST2: PICO Teleop and G1 Sim2Real: use PICO teleoperation, Redis, and a low-level controller to collect direct robot data.
EgoHumanoid: Human Demos to G1 VLA: turn egocentric human demos into robot-ready data through view and action alignment.
VIRAL: RGB Sim2Real for G1 Loco-Manip: train a privileged teacher in simulation, distill an RGB student, randomize visuals, and export the policy.
FromW1: Moving Skills onto Real Hardware: handle latency, contacts, and actuators when moving from sim to hardware.
CLONE: Closed-Loop Whole-Body Teleop: treat closed-loop teleoperation as a long-horizon data stack.

Technical references to keep open

Source	Why it matters	Detail to remember
GR00T-VisualSim2Real README	Install, teacher training, student training, evaluation, ONNX export	The repository uses Isaac Sim 5.1, Isaac Lab, TRL, and Hydra
VIRAL paper	Understand the teacher-student design and sim-to-real recipe	The teacher has privileged full state; the student uses RGB; domain randomization and camera/hand alignment matter
VIRAL project page	Inspect tasks, failure cases, and generalization	The page shows variation across tray position, objects, table height, lighting, and 54 loco-manipulation cycles
Student config YAML	Read `teacher_actor_path`, cameras, DAgger, and ResNet RGB delay	This is the distillation config for a student trained from a teacher checkpoint
EgoHumanoid paper	Compare against a human-egocentric-data pipeline	EgoHumanoid co-trains human and robot data through view/action alignment, not simulation alone

Mental model: the teacher sees full state, the student sees RGB

The hardest part of VIRAL is not the training command. It is the split into two policies:

Isaac Sim 5.1 / Isaac Lab
  G1 robot, objects, table, tray, contacts, task stage
        |
        v
Privileged PPO teacher
  obs: full state, object pose, hand-object transform, target, contact-like signals
  action: homie command + right arm + finger primitive
        |
        | rollouts + teacher action labels
        v
RGB DAgger student
  obs: minimal proprioception + delayed RGB image
  backbone: ResNet18 vision encoder + MLP
  action: same action space as the teacher
        |
        v
eval_agent_trl.py num_envs=1
  checkpoint -> exported ONNX
        |
        v
G1 deployment stack

The teacher is allowed to "cheat" in the training sense: it can use information that the real robot will not directly measure at runtime, such as object position, hand-object transforms, target place/lift positions, and task stage. That makes the long-horizon PPO problem easier. But such a teacher is not a deployable visual policy. The student is the deployable policy: it receives observations closer to the real system, mainly RGB camera input and proprioception.

DAgger matters because the student is not only trained on a static set of clean teacher states. The student runs in closed loop inside simulation; when it drifts away from the teacher's ideal trajectory, the teacher can still provide the corrective action at that new state. For humanoid loco-manipulation, this difference is large. A few centimeters of base error can change the camera view, move the wrist out of reach, and make an offline behavior cloning policy leave its training distribution. Online DAgger produces more "student is slightly wrong but still recoverable" states, which is exactly what the real robot needs.
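To make the closed-loop argument concrete, here is a minimal sketch of the DAgger data-collection pattern on a toy 1-D system. Everything in it is an assumption made for illustration (the scalar dynamics, `teacher_action`, `observe`, and the one-parameter student); it is not VIRAL's training code. The only point is the ordering: the student's action advances the rollout, while the teacher labels every state the student actually visits.

```python
import random


def teacher_action(state):
    """Privileged expert: reads the true state and pushes it toward zero."""
    return -0.5 * state


def observe(state, rng):
    """Student observation: a rescaled, slightly noisy view of the state."""
    return 2.0 * state + rng.gauss(0.0, 0.01)


def fit_gain(dataset):
    """Least-squares fit of action = gain * observation."""
    num = sum(obs * act for obs, act in dataset)
    den = sum(obs * obs for obs, _ in dataset)
    return num / den if den > 0.0 else 0.0


def run_dagger(iterations=5, horizon=20, seed=0):
    rng = random.Random(seed)
    dataset = []  # aggregated (observation, teacher label) pairs
    gain = 0.0    # untrained student: always outputs zero
    for _ in range(iterations):
        state = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            obs = observe(state, rng)
            # The teacher labels the state the student actually reached.
            dataset.append((obs, teacher_action(state)))
            # The student's own action, not the teacher's, advances the rollout.
            state = state + gain * obs
        gain = fit_gain(dataset)  # retrain on everything collected so far
    return gain


if __name__ == "__main__":
    print(f"learned student gain: {run_dagger():.3f}")
```

In this toy setup the student gain should approach -0.25, the value that reproduces the teacher's action through the rescaled observation. Offline behavior cloning would instead roll out with `teacher_action` and never record the states produced by an imperfect student.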

Step 1: install the VIRAL environment

The official README expects Ubuntu 22.04, NVIDIA driver version 535 or newer, Conda or Mamba, Isaac Sim 5.1, and Isaac Lab. The setup uses Python 3.11 and PyTorch 2.7.0 with CUDA 12.8 wheels, then installs Isaac Sim through pip:

conda create -n viral python=3.11 -y
conda activate viral

pip install torch==2.7.0 torchvision==0.22.0 \
  --index-url https://download.pytorch.org/whl/cu128

pip install isaacsim==5.1.0.0 isaacsim-rl==5.1.0.0

Install Isaac Lab from source, then install the repository:

pip install setuptools poetry-core flatdict

pip install --no-build-isolation -e <path-to-IsaacLab>/source/isaaclab
pip install --no-build-isolation -e <path-to-IsaacLab>/source/isaaclab_assets \
  -e <path-to-IsaacLab>/source/isaaclab_tasks \
  -e "<path-to-IsaacLab>/source/isaaclab_rl[all]"

pip install numpy==1.26.0

cd <path-to-GR00T-VisualSim2Real>
pip install -e .
# Pin numpy again in case the editable install pulled in a different version.
pip install numpy==1.26.0

Run two smoke checks before training:

python -c "import isaaclab; print(isaaclab.__file__)"
python -c "from gr00t.rl.envs.base_task.base_task import BaseTask; print('OK')"

If the second command fails, do not start editing YAML yet. The issue is usually the editable install, the Isaac Lab path, or a numpy/Python version conflict. With Isaac Sim and Isaac Lab, a small version mismatch can produce a very noisy error. Lock the environment first; tune num_envs later.
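One way to lock the environment before touching YAML is to compare the installed package versions against the pins from the install commands. The sketch below uses only the standard library's `importlib.metadata`; the helper name `check_pins` is hypothetical, and the pin list is copied from the commands in this step rather than from any file in the repository.

```python
from importlib import metadata

# Pins taken from the install commands in this step.
EXPECTED = {
    "torch": "2.7.0",
    "torchvision": "0.22.0",
    "isaacsim": "5.1.0.0",
    "numpy": "1.26.0",
}


def check_pins(expected):
    """Return {package: problem} for every missing or mismatched package."""
    problems = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # CUDA wheels may report a local suffix such as "2.7.0+cu128".
        if found.split("+")[0] != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_pins(EXPECTED)
    for name, problem in issues.items():
        print(f"{name}: {problem}")
    print("environment OK" if not issues else "fix the environment before training")
```

Running this inside the `viral` Conda environment gives a short, readable report, which is easier to act on than a long Isaac Sim import traceback.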

Step 2: train the PPO teacher

The teacher path in this post uses the experiment below:

HYDRA_FULL_ERROR=1 accelerate launch --num_processes 1 \
  gr00t/rl/train_agent_trl.py \
  +exp=loco_manip/walk_stand_place_grasp_turn_homie \
  num_envs=48 \
  project_name=wsdpt_teacher

The walk_stand_place_grasp_turn_homie.yaml experiment composes several Hydra config groups:

Config group	Main value	Meaning
`/algo`	`ppo`	Train the teacher with PPO
`/env`	`walk_stand_place_grasp_turn_homie`	The walk, stand, place, grasp, turn task
`/simulator`	`isaacsim`	Isaac Sim backend
`/robot`	`g1/g1_43dof`	Unitree G1 43-DOF robot config
`/obs`	`obs_walk_stand_place_grasp_turn_homie`	Rich observation set for the teacher
`/rewards`	`reward_wsdpt_butterflyV8_q_2_teacher`	Reward shaping for the WSDPT task
`/trainer`	`trl_homie_api`	Trainer wrapper used by the repository

The YAML itself may set num_envs to 2048 for serious training. The README example uses num_envs=48. The important point for beginners is that the command-line value overrides the experiment default through Hydra. If your GPU only has 24 GB of memory, start much lower:

HYDRA_FULL_ERROR=1 accelerate launch --num_processes 1 \
  gr00t/rl/train_agent_trl.py \
  +exp=loco_manip/walk_stand_place_grasp_turn_homie \
  num_envs=8 \
  headless=True \
  project_name=wsdpt_teacher_smoke \
  experiment_name=wspgt_teacher_smoke

num_envs is the number of parallel simulation environments. Raising it collects rollouts faster, but memory use grows with robots, objects, contact sensors, observation buffers, and camera rendering when cameras are enabled. The teacher does not use RGB rendering, so it is usually lighter than the student. Still, this loco-manipulation task is heavy because it includes the G1 model, a table, objects, a tray, hand primitives, and many termination conditions.

To visually inspect the simulation, add:

headless=False

Use the GUI as a debugging tool, not as the default long-run mode. It is useful for checking whether the robot spawns correctly, whether objects are in reasonable positions, and whether the task stage changes as expected. For real training, run headless.

Step 3: understand `env.config.reset_from_dataset.enable`

The environment config includes an important reset block:

env:
  config:
    reset_from_dataset:
      enable: True
      use_motion_file_dir: True
      motion_file_dir: "gr00t/rl/data/motions/g1_wsdpt/33demos_675_775"
      num_per_sample: 10
      sample_interval_s: 0.1
      resample_every: 1000

This tells the simulator that episodes can be reset from a motion/demo dataset instead of always starting from the same initial state. For a long task such as walk-stand-place-grasp-turn, this acts like reference state initialization. It lets the teacher encounter more stages of the task: walking to the table, preparing to place, transitioning into grasp, and turning after manipulation. If every episode starts from frame zero, PPO may spend a long time before it sees reward from later stages.
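The effect of resetting from demo frames can be illustrated with a toy sketch. It assumes a single demo whose frames split evenly into the five task stages, and it does not model the real `num_per_sample`, `sample_interval_s`, or `resample_every` fields; the function names are hypothetical. It only compares which stages the policy starts in when every reset uses frame zero versus a randomly sampled demo frame.

```python
import random

STAGES = ["walk", "stand", "place", "grasp", "turn"]


def stage_of(frame, num_frames):
    """Map a demo frame index to a task stage, assuming equal-length stages."""
    index = min(frame * len(STAGES) // num_frames, len(STAGES) - 1)
    return STAGES[index]


def sample_reset_frame(num_frames, use_dataset, rng):
    """Frame zero without dataset resets, otherwise a uniformly random demo frame."""
    return rng.randrange(num_frames) if use_dataset else 0


def stage_histogram(num_resets, num_frames, use_dataset, seed=0):
    """Count how many resets begin in each stage."""
    rng = random.Random(seed)
    counts = {stage: 0 for stage in STAGES}
    for _ in range(num_resets):
        frame = sample_reset_frame(num_frames, use_dataset, rng)
        counts[stage_of(frame, num_frames)] += 1
    return counts


if __name__ == "__main__":
    print("always frame 0:", stage_histogram(1000, 100, use_dataset=False))
    print("from dataset:  ", stage_histogram(1000, 100, use_dataset=True))
```

With frame-zero resets, every episode starts in the walk stage, so reward from grasp or turn is only seen after the policy has already solved everything before it. Sampling from the demo spreads the starting states across all stages.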

For debugging, use the field intentionally:

Goal	Override to try	Why
Inspect clean environment spawn	`env.config.reset_from_dataset.enable=False`	Easier to reason about the initial state
Follow the repository recipe	Keep it `True`	Better for learning the long-horizon task
Diagnose dataset path errors	Use `HYDRA_FULL_ERROR=1` and `num_envs=1`	The stack trace is easier to read

Do not disable reset_from_dataset only because it looks complex. In long-horizon whole-body tasks, reset and curriculum logic often matter as much as reward design. If the dataset path is missing or the demos are in the wrong location, fix the data path first instead of silently training a different task.

Step 4: evaluate the teacher checkpoint

Once you have a checkpoint, evaluate it with:

python gr00t/rl/eval_agent_trl.py \
  +checkpoint=logs_rl/<experiment_dir>/model_step_044500.pt

At this point, you are asking three practical questions:

Question	Good sign	Bad sign
Is the robot walking stably?	No knee contact, no strong drift, no fall during stage changes	Termination from knee contact, low height, or gravity
Does the right arm approach the object?	The wrist reaches the object region before the finger primitive closes	The arm swings too fast or knocks the object away
Does the task stage progress?	Walk, stand, place, grasp, turn happen in order	The robot stays in one stage or resets early

The teacher must be good enough before training the student. If the teacher cannot grasp reliably in simulation, the RGB student will not fix that. The student is learning to imitate the teacher under harder observations; it is not a magic upgrade from a weak policy.
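A simple way to turn "good enough" into a decision is to log each evaluation episode and summarize the outcomes. The sketch below assumes you record one dict per episode with a `success` flag and a `termination` label yourself; these field names, the helper functions, and the 0.8 threshold are hypothetical and are not produced by `eval_agent_trl.py`.

```python
from collections import Counter


def summarize_episodes(episodes):
    """episodes: list of dicts with 'success' (bool) and 'termination' (str)."""
    total = len(episodes)
    successes = sum(1 for episode in episodes if episode["success"])
    reasons = Counter(
        episode["termination"] for episode in episodes if not episode["success"]
    )
    return {
        "success_rate": successes / total if total else 0.0,
        "failure_reasons": dict(reasons),
    }


def teacher_ready(summary, min_success_rate=0.8):
    """Hypothetical gate: only start distillation above the chosen threshold."""
    return summary["success_rate"] >= min_success_rate


if __name__ == "__main__":
    logged = (
        [{"success": True, "termination": "done"}] * 7
        + [{"success": False, "termination": "knee_contact"}] * 2
        + [{"success": False, "termination": "low_height"}]
    )
    summary = summarize_episodes(logged)
    print(summary)
    print("ready for student training:", teacher_ready(summary))
```

The failure breakdown matters as much as the rate: a teacher that mostly fails on knee contact needs locomotion work, while one that fails at the grasp stage needs reward or reset changes.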

Step 5: distill the RGB student with DAgger

The student uses this experiment config:

gr00t/rl/config/exp/loco_manip/wsdpt_student_for_teacher_v8q8.002_resnet_rgb_delay.yaml

The README workflow is to set teacher_actor_path, then launch training:

teacher_actor_path: logs_rl/<your_teacher_experiment>/model_step_XXXXXX.pt

HYDRA_FULL_ERROR=1 accelerate launch --num_processes 1 \
  gr00t/rl/train_agent_trl.py \
  +exp=loco_manip/wsdpt_student_for_teacher_v8q8.002_resnet_rgb_delay \
  num_envs=8 \
  headless=True \
  experiment_name=wsdpt_student \
  project_name=wsdpt_student_debug

The original student YAML may set num_envs: 1024, enable RGB cameras at resolution [108, 192], and disable depth. The README uses num_envs=8 as a debugging run. The student is heavier than the teacher because every environment must render images, so a jump in VRAM usage relative to teacher training is expected.
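As a rough lower bound, the size of the RGB observation tensor alone can be computed from the resolution and the number of environments. The sketch below assumes `[108, 192]` means height by width, three channels, and one frame held per environment; the helper name is hypothetical. Renderer buffers, scene assets, and ResNet activations are not modelled and can be much larger.

```python
def rgb_buffer_bytes(num_envs, height=108, width=192, channels=3, bytes_per_value=1):
    """Bytes needed to hold one RGB frame per environment."""
    return num_envs * height * width * channels * bytes_per_value


if __name__ == "__main__":
    for num_envs in (8, 1024):
        as_uint8 = rgb_buffer_bytes(num_envs)
        as_float32 = rgb_buffer_bytes(num_envs, bytes_per_value=4)
        print(
            f"num_envs={num_envs}: "
            f"{as_uint8 / 2**20:.2f} MiB as uint8, "
            f"{as_float32 / 2**20:.2f} MiB as float32"
        )
```

The estimate scales linearly with `num_envs`, which is why dropping from 1024 to 8 environments is the first knob to turn when the student does not fit in memory.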

Important fields:

Field	Location	Practical meaning
`teacher_actor_path`	top-level student YAML	Path to the PPO teacher checkpoint, loaded by `network_load_dict.teacher_actor.path`
`num_envs`	top-level or command line	Number of parallel environments; start small for RGB students
`enable_cameras`	top-level and simulator config	Required for the vision student
`simulator.config.cameras.camera_resolutions`	student YAML	RGB image size, here 108 x 192
`obs.rgb_image_delay_step`	RGB-delay obs config	Selects latest or delayed RGB frames
`algo.config.use_dagger`	DAgger algo config	Enables DAgger/BC-style training from the teacher
`algo.config.enforce_teacher_rollout`	student YAML	Forces teacher rollout logic during distillation
`algo.config.ratio_teacher_rollout`	student YAML	Controls the teacher rollout ratio
`algo.config.network_load_dict.teacher_actor.path`	student YAML	Points to `${teacher_actor_path}`
`algo.config.actor.backbone.vision_module`	student YAML	ResNet vision encoder, defaulting to pretrained ResNet18

The student's observations are much less privileged than the teacher's. actor_obs includes base angular velocity, projected gravity, previous actions, DOF position/velocity without fingers, delta actions, and homie commands. vision_obs uses rgb_image_delayed. The teacher, meanwhile, still sees object and target state. That is the teacher-student gap: the teacher knows where the object is in state; the student must infer it from pixels.

Step 6: RGB delay and domain randomization

The config name includes rgb_delay because the student should not assume the camera image arrives instantly. On a real robot, camera capture, drivers, preprocessing, policy inference, and actuator commands all introduce latency. The observation config supports:

obs:
  rgb_image_delay_random: False
  rgb_image_delay_resample_on_reset: False
  rgb_image_delay_step: 1
  rgb_image_delay_step_min: 1
  rgb_image_delay_step_max: 5
  history_save_interval: 1

rgb_image_delay_step=1 means the student uses the latest frame in the buffer. To train with randomized delay, you can override:

obs.rgb_image_delay_random=True \
obs.rgb_image_delay_resample_on_reset=True \
obs.rgb_image_delay_step_min=1 \
obs.rgb_image_delay_step_max=5
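The delay semantics can be pictured as a small ring buffer of recent frames. The sketch below is an illustration under assumptions, not the repository's implementation: the class `DelayedFrameBuffer` and the helper `sample_delay_step` are hypothetical, and the only behavior taken from the text is that a delay step of 1 returns the latest frame.

```python
import random
from collections import deque


class DelayedFrameBuffer:
    """Keeps the most recent frames; delay_step=1 returns the newest one."""

    def __init__(self, max_delay_step):
        if max_delay_step < 1:
            raise ValueError("max_delay_step must be >= 1")
        self._frames = deque(maxlen=max_delay_step)

    def push(self, frame):
        """Store a new frame, dropping the oldest once the buffer is full."""
        self._frames.append(frame)

    def get(self, delay_step):
        """Return the frame that is delay_step pushes old (1 = newest)."""
        if not self._frames:
            raise LookupError("no frames pushed yet")
        if delay_step < 1:
            raise ValueError("delay_step must be >= 1")
        # Early in an episode the buffer is short, so fall back to the oldest frame.
        index = min(delay_step, len(self._frames))
        return self._frames[-index]


def sample_delay_step(rng, step_min=1, step_max=5):
    """Draw a delay uniformly from [step_min, step_max], for example once per reset."""
    return rng.randint(step_min, step_max)


if __name__ == "__main__":
    buffer = DelayedFrameBuffer(max_delay_step=5)
    for frame_id in range(7):
        buffer.push(frame_id)
    delay = sample_delay_step(random.Random(0))
    print("newest frame:", buffer.get(1))
    print(f"frame with delay_step={delay}:", buffer.get(delay))
```

Resampling the delay on reset keeps it constant within an episode, which is closer to a fixed but unknown pipeline latency than redrawing it every step.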

Do not turn on random delay, heavy visual randomization, and aggressive camera extrinsic noise all at once in the first run. Layer the difficulty so failures are attributable. A practical order is:

1. Train an RGB student without random delay at low num_envs.
2. Add mild camera extrinsics randomization.
3. Add image, material, and lighting randomization according to the recipe.
4. Add randomized RGB delay if the real rollout stack has noticeable latency.

The VIRAL paper emphasizes large-scale visual domain randomization across lighting, materials, camera parameters, image quality, and sensor delays. For a smaller lab, randomization that is too strong too early can prevent the student from learning at all. Keep a clean baseline run for comparison.
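For intuition about what "mild" versus "strong" image randomization means, here is a minimal sketch of brightness and contrast jitter on a tiny grayscale image stored as nested lists with values in [0, 1]. It is a generic illustration, not the randomization used by VIRAL; the function `jitter_image` and its strength parameters are hypothetical.

```python
import random


def jitter_image(image, rng, brightness=0.2, contrast=0.2):
    """Apply one random brightness/contrast draw to every pixel, clamped to [0, 1]."""
    gain = 1.0 + rng.uniform(-contrast, contrast)   # contrast around mid-gray
    offset = rng.uniform(-brightness, brightness)   # global brightness shift
    return [
        [min(1.0, max(0.0, 0.5 + gain * (pixel - 0.5) + offset)) for pixel in row]
        for row in image
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[0.0, 0.25], [0.5, 1.0]]
    print("mild:  ", jitter_image(image, rng, brightness=0.05, contrast=0.05))
    print("strong:", jitter_image(image, rng, brightness=0.5, contrast=0.5))
```

Setting both strengths to zero returns the original image, which makes it easy to keep a clean baseline run and raise the strengths one at a time.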

Step 7: export ONNX

The README states that evaluation with num_envs=1 automatically exports the policy as ONNX:

python gr00t/rl/eval_agent_trl.py \
  +checkpoint=logs_rl/<student_experiment_dir>/model_step_XXXXXX.pt \
  num_envs=1

The model is written under:

<experiment_dir>/exported/

Why does num_envs=1 matter? Real deployment does not run 1024 robots in parallel. Export should produce a graph with input and output shapes suitable for one robot. If you export with an unexpected batch shape or observation layout, the ONNX file may exist but fail in the runtime deployment path. Before moving the ONNX file forward, check:

Check	How
Correct observation inputs	Print actor obs and vision obs shapes during eval
Correct camera feed	Run one short episode with `headless=False` and inspect the RGB frame
Correct action dimension	The student config uses `robot.actions_dim: 31` for 15 + 14 + 2 according to repo comments
Export artifact exists	Inspect the `exported/` folder after eval
No privileged dependency	Confirm the actor uses `actor_obs` + `vision_obs`, not `teacher_obs`
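Several of these checks can be automated once you have the input and output shapes of the exported graph, for example as read from an ONNX runtime session. The sketch below works on plain dicts so it stays runtime-agnostic. The function name, the `action` output name, the `actor_obs` width of 100, and the channel-first image layout are all assumptions for illustration; only the action dimension of 31 and the observation group names come from the text above.

```python
def validate_export(input_shapes, output_shapes, expected_action_dim=31):
    """Return a list of problems; an empty list means the shapes look plausible."""
    problems = []
    all_shapes = list(input_shapes.items()) + list(output_shapes.items())
    for name, shape in all_shapes:
        # Deployment drives one robot, so every tensor should have batch size 1.
        if len(shape) == 0 or shape[0] != 1:
            problems.append(f"{name}: expected batch size 1, got shape {tuple(shape)}")
    for name, shape in output_shapes.items():
        if len(shape) == 0 or shape[-1] != expected_action_dim:
            problems.append(
                f"{name}: expected last dim {expected_action_dim}, got shape {tuple(shape)}"
            )
    for name in input_shapes:
        # The deployable student must not consume privileged teacher inputs.
        if "teacher" in name:
            problems.append(f"{name}: privileged input present in exported graph")
    return problems


if __name__ == "__main__":
    # Placeholder shapes: replace them with the ones read from your exported model.
    inputs = {"actor_obs": (1, 100), "vision_obs": (1, 3, 108, 192)}
    outputs = {"action": (1, 31)}
    issues = validate_export(inputs, outputs)
    print("export looks plausible" if not issues else "\n".join(issues))
```

A dynamic batch dimension, which some exporters report as a string, is also flagged here, since the check requires the first dimension to be exactly 1.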

VIRAL versus EgoHumanoid

Both stacks target visual policy learning for G1 loco-manipulation, but they assume different data sources:

Axis	VIRAL	EgoHumanoid
Main data source	Large-scale simulation in Isaac Sim/Isaac Lab	Egocentric human demos plus robot teleoperation data
Learning recipe	Privileged RL teacher, RGB DAgger student	Co-train a VLA policy on aligned human/robot data
Runtime observation	RGB plus minimal proprioception	Egocentric RGB plus a unified language/action schema
Embodiment gap handling	Real-to-sim camera/hand alignment and visual domain randomization	View alignment and action alignment between humans and robots
Strength	Does not require large real human/robot demo collection; simulator is controllable	Human data provides real scene diversity and helps generalization outside the lab
Weakness	Requires a high-quality simulator, significant compute, and careful randomization	Human-to-robot alignment is difficult and the data pipeline is long
Best use case	A lab with strong Isaac Sim compute and limited robot time	A lab that can collect diverse human demos and a smaller set of good robot demos

A common misunderstanding is that VIRAL is "data-light." It is not. It replaces real-world data collection with simulation compute, curriculum, and randomization. EgoHumanoid is not simply "no simulation" either. It shifts the burden to data alignment so human video and human motion can become robot-compatible. For a practical lab, choose based on constraints:

Lab condition	Stack to prioritize
Strong GPU workstation, limited robot time	VIRAL smoke tests and small-scale teacher/student training
PICO/ZED setup and many people able to collect demos in diverse places	EgoHumanoid-style data pipeline
Need a real robot demo quickly	OpenWBT or TWIST2 first, VIRAL/EgoHumanoid later
Serious RGB sim-to-real research goal	VIRAL, because its domain randomization and teacher-student design directly target that question

Common beginner mistakes

Training the student before the teacher is good enough. This is the most expensive mistake. An RGB student cannot magically exceed a poor teacher on this task. Evaluate the teacher with both GUI inspection and metrics first.

Setting num_envs too high. The YAML may use 1024 or 2048 environments, but that is not a universal machine setting. On one GPU, begin with 4, 8, or 16 environments to validate the pipeline.

Misunderstanding Hydra overrides. +exp=... selects the experiment; num_envs=8 overrides a top-level value; obs.rgb_image_delay_random=True overrides a nested field. If the path is wrong, Hydra normally rejects an override for a key that does not exist, whereas adding a + prefix creates a new field that nothing in the code reads. Keep HYDRA_FULL_ERROR=1 enabled so the full stack trace is visible.

Disabling cameras while training the student. The student config needs enable_cameras: true and simulator RGB cameras. If you disable cameras to save memory, you are no longer training the RGB student.

Confusing teacher_actor_path with the student checkpoint. teacher_actor_path must point to a PPO teacher checkpoint. The student checkpoint is passed to +checkpoint=... when evaluating the student.
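The Hydra override mistake above is easier to reason about with a simplified model of dotted-path overrides on a nested dict. This sketch is not Hydra's implementation: `apply_override` and `parse_value` are hypothetical helpers, value parsing is reduced to booleans and numbers, and config-group selection such as `+exp=...` is not modelled. It only reproduces the two behaviors that matter here: a plain override of a missing key fails, and a `+` prefix adds the key instead.

```python
import copy


def parse_value(raw):
    """Parse 'True'/'False', ints, and floats; leave everything else as a string."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def apply_override(cfg, override):
    """Apply 'a.b.c=value' to a copy of cfg; a '+' prefix allows new keys."""
    key, _, raw_value = override.partition("=")
    add_new = key.startswith("+")
    parts = key.lstrip("+").split(".")
    result = copy.deepcopy(cfg)
    node = result
    for part in parts[:-1]:
        if part not in node:
            if not add_new:
                raise KeyError(f"'{key}' is not in the config")
            node[part] = {}
        node = node[part]
    leaf = parts[-1]
    if leaf not in node and not add_new:
        raise KeyError(f"'{key}' is not in the config; use '+' to add it")
    node[leaf] = parse_value(raw_value)
    return result


if __name__ == "__main__":
    defaults = {"num_envs": 2048, "obs": {"rgb_image_delay_random": False}}
    print(apply_override(defaults, "num_envs=8"))
    try:
        apply_override(defaults, "obs.rgb_delay_random=True")  # typo in the key
    except KeyError as error:
        print("rejected:", error)
```

The typo case is the dangerous one in practice: with a `+` prefix the run starts normally, but the field that the code actually reads keeps its default value.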

Lab checklist

# 1. Verify install
python -c "from gr00t.rl.envs.base_task.base_task import BaseTask; print('OK')"

# 2. Teacher smoke test
HYDRA_FULL_ERROR=1 accelerate launch --num_processes 1 \
  gr00t/rl/train_agent_trl.py \
  +exp=loco_manip/walk_stand_place_grasp_turn_homie \
  num_envs=4 \
  headless=False \
  project_name=wsdpt_teacher_smoke \
  experiment_name=wspgt_teacher_gui

# 3. Teacher training, headless
HYDRA_FULL_ERROR=1 accelerate launch --num_processes 1 \
  gr00t/rl/train_agent_trl.py \
  +exp=loco_manip/walk_stand_place_grasp_turn_homie \
  num_envs=48 \
  headless=True \
  project_name=wsdpt_teacher

# 4. Teacher evaluation
python gr00t/rl/eval_agent_trl.py \
  +checkpoint=logs_rl/<teacher_experiment>/model_step_XXXXXX.pt

# 5. Student distillation
HYDRA_FULL_ERROR=1 accelerate launch --num_processes 1 \
  gr00t/rl/train_agent_trl.py \
  +exp=loco_manip/wsdpt_student_for_teacher_v8q8.002_resnet_rgb_delay \
  teacher_actor_path=logs_rl/<teacher_experiment>/model_step_XXXXXX.pt \
  num_envs=8 \
  headless=True \
  experiment_name=wsdpt_student_rgb_delay \
  project_name=wsdpt_student_debug

# 6. Student evaluation + ONNX export
python gr00t/rl/eval_agent_trl.py \
  +checkpoint=logs_rl/<student_experiment>/model_step_XXXXXX.pt \
  num_envs=1

For a beginner, the first success condition is not the real robot picking up an object immediately. The first success condition is: the environment spawns, the teacher checkpoint evaluates, the RGB student trains without crashing, an ONNX artifact appears under exported/, and you understand what each Hydra override changes.

Conclusion

VIRAL is worth studying because it cleanly shows a modern sim-to-real path for humanoids: train a difficult skill with a privileged teacher in Isaac Sim, distill an RGB student through DAgger, bridge sim-to-real with domain randomization and alignment, then export a deployable policy. Compared with EgoHumanoid, it depends less on human egocentric data but requires a stronger simulator, more compute, and stricter debugging discipline.

In this series, VIRAL is the simulation-first visual policy stack. It does not replace OpenWBT, TWIST2, or EgoHumanoid. It adds another route toward whole-body humanoid VLA: first make the simulator strong enough, then force the student to learn from observations that resemble the real robot.



Nguyễn Anh Tuấn
