GR00T Whole-Body VLA Data: Open Datasets

Part 1 of the GR00T whole-body VLA data pipeline: download public datasets, validate GR00T-LeRobot format, fine-tune, and run inference.

Nguyen Anh Tuan | June 6, 2026 | 13 min read

Disclosure: This article may contain affiliate or referral links. If you buy or sign up through those links, VnRobo may earn a commission or service credit. The technical recommendations prioritize engineering fit first.

This series focuses on the part of NVIDIA Isaac-GR00T that most often breaks in practice: data. A GR00T N1.5/N1.7 whole-body VLA workflow needs more than images and language. It also needs the correct dataset layout, the correct state/action slicing, a matching meta/modality.json, and the correct embodiment tag (--embodiment-tag). For UNITREE_G1 / GEAR-SONIC style whole-body control, the action space must additionally match the decoder/controller.

Part 1 uses open/public datasets. By the end, you should be able to download a Hugging Face dataset, validate it as GR00T-LeRobot, and run a fine-tune or at least open-loop inference.

Primary Sources And Assumptions

According to the current NVIDIA/Isaac-GR00T README, the N1.7 workflow is: prepare data, run inference, fine-tune, evaluate, then deploy. The dataset format is based on LeRobot v2 with an additional meta/modality.json file that describes the state, action, video, and annotation mapping. N1.7 is still an early-access-style workflow, so some paths and flags may change before a stable release.

Public datasets worth checking:

| Dataset | Use case | Notes |
| --- | --- | --- |
| nvidia/Arena-G1-Loco-Manipulation-Task | Sim/public G1 loco-manipulation | Includes a lerobot folder, meta/modality.json, and is described as GR00T-Lerobot formatted data. |
| nvidia/PhysicalAI-Robotics-GR00T-Teleop-G1 | Real/teleop G1 public examples | Has task subfolders such as g1-pick-pear, LeRobot-style metadata, and modality.json. |
| PID0930/g1-inspire-pick-cube-gr00t-lerobot | Small Unitree G1 + Inspire hands smoke test | The dataset card mentions NEW_EMBODIMENT and a custom config. Verify config path in your repo branch. |
| SensoriRobotics/g1_locomanipulation_sdg | Synthetic G1 loco-manipulation | Dataset card says it is LeRobot v2.0 for GR00T N1.6. Useful for format learning and sim/public mixing. |

Not every G1 dataset can be used with UNITREE_G1_SONIC. If actions are raw joint targets, base velocities, or custom hand commands, use NEW_EMBODIMENT and a custom modality config. Use UNITREE_G1_SONIC only when the action space is the SONIC latent whole-body action format expected by that stack.

Practical Success Path

If your goal is to actually train and infer successfully, do not start with custom simulation or real robot data. Use this order:

1. Download a public GR00T-LeRobot dataset that already has meta/modality.json
2. Verify the dataset and print one parquet row
3. Run a 100-step smoke fine-tune with a config that matches the dataset
4. Load the checkpoint with open-loop eval or PolicyServer
5. Only then replace the data with your sim or real dataset
6. For G1 + SONIC, keep the two pipelines separate:
   - GR00T VLA fine-tune: learns the 78-dim action (64-dim SONIC latent + 14 hand dimensions)
   - SONIC controller training: learns the decoder/controller, covered in Part 4

Key point: Parts 1-3 do not train the SONIC controller from motion data. They train/fine-tune the GR00T VLA on LeRobot datasets. You can often use the released SONIC checkpoint; train/fine-tune SONIC only when you need a new controller, motion foundation, or embodiment support.

1.1 Download An Open Dataset

Goal

Download a public dataset without destroying its directory structure, then classify it:

  • SONIC latent whole-body data: likely use UNITREE_G1_SONIC.
  • G1 raw/custom action data: use NEW_EMBODIMENT plus a custom modality config.
  • Arm/tabletop data: useful for loader tests, but not directly a whole-body dataset.

Environment Requirements

For downloading:

  • Ubuntu 22.04/24.04 or similar Linux.
  • Python 3.10+.
  • git-lfs, uv, huggingface_hub.
  • Disk space: at least 2-10 GB for small public datasets; much more for video-heavy datasets.

For training/inference:

  • Small open-loop inference: 16-24 GB VRAM can be enough for smoke tests, but OOM is likely.
  • Debug fine-tune: 1 GPU with 48-80 GB VRAM is more practical.
  • Serious whole-body fine-tuning: prepare 4+ GPUs with 80 GB VRAM if possible. NVIDIA GEAR/SONIC style docs strongly point toward multi-GPU training.
  • GR00T base models are large, so stable network and cache storage matter.

Setup Isaac-GR00T

sudo apt update
sudo apt install -y git git-lfs
git lfs install

git clone --recurse-submodules https://github.com/NVIDIA/Isaac-GR00T.git
cd Isaac-GR00T

curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync --all-extras

# Required if the dataset/model needs auth or license acceptance.
uv run hf auth login

If you cloned without submodules:

git submodule update --init --recursive

Download Arena G1 LeRobot Data

nvidia/Arena-G1-Loco-Manipulation-Task is a good first dataset because its card documents a lerobot folder converted to GR00T-Lerobot format.

mkdir -p datasets

uv run hf download nvidia/Arena-G1-Loco-Manipulation-Task \
  --repo-type dataset \
  --include "lerobot/**" \
  --local-dir datasets/arena_g1_loco

find datasets/arena_g1_loco -maxdepth 3 -type f | head -40

If the dataset root is directly under lerobot, set:

export DATASET_ROOT="$PWD/datasets/arena_g1_loco/lerobot"

If Hugging Face created an extra directory level, find the metadata:

find datasets/arena_g1_loco -name modality.json -print
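
The same search can be done from Python when you want to set DATASET_ROOT programmatically. This is a minimal sketch: the helper name find_dataset_roots is hypothetical, and it only assumes that a usable dataset root is any directory containing meta/modality.json.

```python
from pathlib import Path


def find_dataset_roots(download_dir: str) -> list[Path]:
    """Return directories that contain meta/modality.json, shallowest first."""
    roots = []
    for modality in Path(download_dir).rglob("modality.json"):
        # Accept only files that sit directly inside a meta/ folder;
        # the dataset root is the parent of that meta/ folder.
        if modality.parent.name == "meta":
            roots.append(modality.parent.parent)
    # Shallow paths first, so an unexpected extra nesting level is easy to spot.
    return sorted(roots, key=lambda p: (len(p.parts), str(p)))
```

Calling find_dataset_roots("datasets/arena_g1_loco") returns a list of candidate roots. If it returns more than one entry, the repository probably contains several task datasets, and each one needs its own DATASET_ROOT.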

Download A G1 Teleoperation Example

uv run hf download nvidia/PhysicalAI-Robotics-GR00T-Teleop-G1 \
  --repo-type dataset \
  --include "g1-pick-pear/**" \
  --local-dir datasets/g1_teleop_public

export DATASET_ROOT="$PWD/datasets/g1_teleop_public/g1-pick-pear"

This dataset is useful for learning whole-body modality slicing. Its modality.json contains body parts such as left_leg, right_leg, waist, left_arm, left_hand, right_arm, and right_hand. The exact training command still depends on the config and embodiment supported by your checkpoint/repo branch.

Download A Small Smoke-Test Dataset

uv run hf download PID0930/g1-inspire-pick-cube-gr00t-lerobot \
  --repo-type dataset \
  --local-dir datasets/g1_inspire_pick_cube

export DATASET_ROOT="$PWD/datasets/g1_inspire_pick_cube"

The dataset card mentions --embodiment-tag NEW_EMBODIMENT and examples/G1Inspire/g1_inspire_config.py. That path must be verified against your Isaac-GR00T branch. If the file does not exist, create a custom modality config from meta/modality.json.

1.2 Convert Or Verify GR00T-LeRobot Format

Goal

A valid GR00T-LeRobot dataset should have at least:

dataset_root/
├── meta/
│   ├── info.json
│   ├── episodes.jsonl
│   ├── tasks.jsonl
│   └── modality.json
├── data/
│   └── chunk-000/
│       ├── episode_000000.parquet
│       └── episode_000001.parquet
└── videos/
    └── chunk-000/
        └── observation.images.ego_view/
            ├── episode_000000.mp4
            └── episode_000001.mp4

Some LeRobot exporters store files as data/train-00000.parquet instead of data/chunk-000/episode_*.parquet. GR00T docs show the chunk/episode style, while GR00T-WholeBodyControl collection docs show data/train-00000.parquet. This is version-dependent:

  • If the exporter and loader come from matching branches, it may work.
  • If the Isaac-GR00T loader fails, convert to the chunk/episode layout or use the matching repo branch. This needs verification on your exact version.
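
To see which of the two layouts a download actually uses before blaming the loader, a small check like the following can help. It is a sketch under the assumption that the chunk/episode layout always follows the chunk-NNN/episode_NNNNNN.parquet naming shown above; the function name and the returned labels are hypothetical.

```python
import re
from pathlib import Path

# Naming patterns taken from the chunk/episode layout shown in this section.
EPISODE_RE = re.compile(r"^episode_\d+\.parquet$")
CHUNK_RE = re.compile(r"^chunk-\d+$")


def detect_parquet_layout(dataset_root: str) -> str:
    """Classify data/ as 'chunk_episode', 'other', 'mixed', or 'empty'."""
    data_dir = Path(dataset_root) / "data"
    chunked = 0
    other = 0
    for path in data_dir.rglob("*.parquet"):
        if EPISODE_RE.match(path.name) and CHUNK_RE.match(path.parent.name):
            chunked += 1
        else:
            # For example data/train-00000.parquet from a different exporter.
            other += 1
    if chunked and other:
        return "mixed"
    if chunked:
        return "chunk_episode"
    if other:
        return "other"
    return "empty"
```

A result of "other" or "mixed" does not prove the dataset is unusable. It only tells you that the files do not follow the chunk/episode layout, so the exporter and loader versions are worth comparing first.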

Check Required Files

test -f "$DATASET_ROOT/meta/info.json"
test -f "$DATASET_ROOT/meta/episodes.jsonl"
test -f "$DATASET_ROOT/meta/tasks.jsonl"
test -f "$DATASET_ROOT/meta/modality.json"

python -m json.tool "$DATASET_ROOT/meta/modality.json" | head -80
head -3 "$DATASET_ROOT/meta/episodes.jsonl"
head -3 "$DATASET_ROOT/meta/tasks.jsonl"
find "$DATASET_ROOT/data" -type f | head
find "$DATASET_ROOT/videos" -type f | head

Quick Verification Script

mkdir -p tools
cat > tools/verify_groot_lerobot_dataset.py <<'PY'
import argparse
import json
from pathlib import Path

import pandas as pd

def read_jsonl(path):
    rows = []
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                rows.append(json.loads(line))
    return rows

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("dataset")
    args = parser.parse_args()

    root = Path(args.dataset)
    meta = root / "meta"
    required = ["info.json", "episodes.jsonl", "tasks.jsonl", "modality.json"]
    for name in required:
        path = meta / name
        if not path.exists():
            raise SystemExit(f"missing {path}")

    modality = json.loads((meta / "modality.json").read_text())
    for section in ["state", "action"]:
        if section not in modality:
            raise SystemExit(f"modality.json missing {section}")

    episodes = read_jsonl(meta / "episodes.jsonl")
    tasks = read_jsonl(meta / "tasks.jsonl")
    parquet_files = sorted((root / "data").rglob("*.parquet"))
    video_files = sorted((root / "videos").rglob("*.mp4"))

    if not episodes:
        raise SystemExit("episodes.jsonl is empty")
    if not parquet_files:
        raise SystemExit("no parquet files under data/")

    print(f"episodes: {len(episodes)}")
    print(f"tasks: {len(tasks)}")
    print(f"parquet files: {len(parquet_files)}")
    print(f"video files: {len(video_files)}")
    print("state keys:", list(modality["state"].keys()))
    print("action keys:", list(modality["action"].keys()))

    sample = pd.read_parquet(parquet_files[0])
    print("sample parquet:", parquet_files[0])
    print("columns:", list(sample.columns)[:50])
    print("rows:", len(sample))
    print("OK")

if __name__ == "__main__":
    main()
PY

uv pip install pandas pyarrow
uv run python tools/verify_groot_lerobot_dataset.py "$DATASET_ROOT"

Expected output:

episodes: ...
tasks: ...
parquet files: ...
video files: ...
state keys: [...]
action keys: [...]
sample parquet: ...
columns: [...]
rows: ...
OK

Inspect modality.json

A raw whole-body dataset often uses slices like:

{
  "state": {
    "left_leg": { "start": 0, "end": 6 },
    "right_leg": { "start": 6, "end": 12 },
    "waist": { "start": 12, "end": 15 },
    "left_arm": { "start": 15, "end": 22 },
    "left_hand": { "start": 22, "end": 29 },
    "right_arm": { "start": 29, "end": 36 },
    "right_hand": { "start": 36, "end": 43 }
  },
  "action": {
    "left_leg": { "start": 0, "end": 6 },
    "right_leg": { "start": 6, "end": 12 }
  },
  "video": {
    "ego_view": {
      "original_key": "observation.images.ego_view"
    }
  }
}

For SONIC latent actions, the action is not a raw joint target. GR00T-WholeBodyControl docs describe the UNITREE_G1_SONIC action shape as 78 dimensions per inference step: 64-dimensional motion token plus 7 left hand and 7 right hand dimensions. If your dataset does not use this action space, do not force UNITREE_G1_SONIC.
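
The dimension arithmetic is easy to check in code. The sketch below assumes each entry in a modality.json section has integer start and end fields with an exclusive end, as in the example above. The helper names are hypothetical, and a gap between slices is reported only as a hint, since some datasets may leave unused columns on purpose.

```python
# 64-dim motion token + 7 left-hand + 7 right-hand dimensions, per the docs cited above.
SONIC_ACTION_DIM = 64 + 7 + 7  # = 78


def slice_widths(section: dict) -> dict:
    """Map each modality key to the number of dimensions its slice covers."""
    return {key: spec["end"] - spec["start"] for key, spec in section.items()}


def total_dim(section: dict) -> int:
    """Sum of all slice widths in one section (state or action)."""
    return sum(slice_widths(section).values())


def find_layout_problems(section: dict) -> list[str]:
    """Report empty slices, gaps, and overlaps, scanning slices by start index."""
    problems = []
    expected_start = 0
    for key, spec in sorted(section.items(), key=lambda kv: kv[1]["start"]):
        if spec["end"] <= spec["start"]:
            problems.append(f"{key}: empty or reversed slice")
        if spec["start"] > expected_start:
            problems.append(f"{key}: gap before index {spec['start']}")
        elif spec["start"] < expected_start:
            problems.append(f"{key}: overlaps previous slice")
        expected_start = max(expected_start, spec["end"])
    return problems
```

For the example state section above, total_dim returns 43 and find_layout_problems returns an empty list. An action section whose total is not 78 cannot be the UNITREE_G1_SONIC action space. A total of 78 is necessary but not sufficient, because the values must also be SONIC latents rather than raw joint targets.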

If The Dataset Is LeRobot v3

The Isaac-GR00T docs state that the current workflow uses LeRobot v2, and the repo provides a v3-to-v2 conversion script:

uv run python scripts/lerobot_conversion/convert_v3_to_v2.py \
  --input-dir /path/to/lerobot_v3_dataset \
  --output-dir datasets/my_dataset_v2

Verify exact flags on your branch:

uv run python scripts/lerobot_conversion/convert_v3_to_v2.py --help

After conversion, check or create meta/modality.json.

1.3 Fine-Tuning

Goal

Run GR00T fine-tuning on the public dataset. There are two cases:

  • Dataset matches a built-in embodiment: use the built-in tag.
  • Dataset is custom G1/raw whole-body data: use NEW_EMBODIMENT plus a custom modality config.

Choose The Embodiment Tag

According to the policy API docs:

| Case | Embodiment tag |
| --- | --- |
| DROID relative EEF/joint | OXE_DROID_RELATIVE_EEF_RELATIVE_JOINT |
| LIBERO Panda | LIBERO_PANDA |
| G1 + SONIC latent WBC | UNITREE_G1_SONIC |
| Custom robot/dataset | NEW_EMBODIMENT |

Practical rule:

If modality/action = 64-dim SONIC latent + hands:
  use UNITREE_G1_SONIC
If modality/action = raw joints / custom hands / custom base:
  use NEW_EMBODIMENT + custom config

Preflight Before Fine-Tuning

Before copying a long command, check the launcher and config first. This avoids most shape/tag failures:

cd Isaac-GR00T

uv run python gr00t/experiment/launch_finetune.py --help | \
  grep -E "dataset|embodiment|modality|base-model|max-steps|num-gpus"

export DATASET_ROOT=/abs/path/to/dataset
export MODALITY_CONFIG=/abs/path/to/configs/my_robot_config.py

test -d "$DATASET_ROOT"
test -f "$DATASET_ROOT/meta/modality.json"
test -f "$MODALITY_CONFIG"
python -m json.tool "$DATASET_ROOT/meta/modality.json" >/tmp/modality.pretty.json

If your Isaac-GR00T branch renamed a flag, follow that branch's --help output. Do not guess from stale commands.

Smoke-Test Fine-Tuning With NEW_EMBODIMENT

export NUM_GPUS=1
export DATASET_ROOT=/abs/path/to/dataset
export MODALITY_CONFIG=/abs/path/to/configs/my_robot_config.py
export OUT=/tmp/groot_public_g1_smoke

test -f "$MODALITY_CONFIG"

CUDA_VISIBLE_DEVICES=0 uv run python \
  gr00t/experiment/launch_finetune.py \
  --base-model-path nvidia/GR00T-N1.7-3B \
  --dataset-path "$DATASET_ROOT" \
  --embodiment-tag NEW_EMBODIMENT \
  --modality-config-path "$MODALITY_CONFIG" \
  --num-gpus $NUM_GPUS \
  --output-dir "$OUT" \
  --save-total-limit 2 \
  --save-steps 100 \
  --max-steps 100 \
  --global-batch-size 4 \
  --dataloader-num-workers 2

Do not use an SO100 or unrelated robot config for a G1 dataset. For Unitree G1 / Inspire / raw whole-body datasets, MODALITY_CONFIG must match meta/modality.json. If your repo branch does not contain examples/G1Inspire/g1_inspire_config.py, create your own config. This requires verification per dataset and branch.

Fine-Tuning With UNITREE_G1_SONIC

Use this only when the dataset is truly a SONIC latent action dataset:

export NUM_GPUS=4
export DATASET_ROOT=/abs/path/to/sonic_lerobot_dataset
export MODALITY_CONFIG=gr00t/configs/data/embodiment_configs.py
export OUT=/mnt/checkpoints/groot_g1_sonic_public

test -f "$DATASET_ROOT/meta/modality.json"
test -f "$MODALITY_CONFIG"

uv run torchrun --nproc_per_node=$NUM_GPUS --master_port=29500 \
  gr00t/experiment/launch_finetune.py \
  --base-model-path nvidia/GR00T-N1.7-3B \
  --dataset-path "$DATASET_ROOT" \
  --embodiment-tag UNITREE_G1_SONIC \
  --modality-config-path "$MODALITY_CONFIG" \
  --num-gpus $NUM_GPUS \
  --output-dir "$OUT" \
  --save-total-limit 5 \
  --save-steps 5000 \
  --max-steps 20000 \
  --use-wandb \
  --global-batch-size 32 \
  --color-jitter-params brightness 0.3 contrast 0.4 saturation 0.5 hue 0.08 \
  --dataloader-num-workers 4

If you use torchrun, the README notes that it should be launched with uv run torchrun so it uses the correct virtual environment.

Output Directory Example

/mnt/checkpoints/groot_g1_sonic_public/
├── checkpoint-5000/
├── checkpoint-10000/
├── checkpoint-15000/
├── checkpoint-20000/
├── config.json
├── processor_config.json
└── runs/ or wandb/

Done-Correct Criteria

  • Training passes the first 100-500 steps without loader errors.
  • GPU memory is stable.
  • Loss decreases or at least does not become NaN.
  • Checkpoints contain config, processor config, and model weights.
  • If using NEW_EMBODIMENT, the checkpoint/config can be loaded later without guessing modality keys again.
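
The loss criterion can be checked mechanically once you have the per-step loss values as a list. How you extract them (console log, TensorBoard, or W&B export) depends on your setup and is not shown here. The function name and the window size in this sketch are arbitrary, hypothetical choices.

```python
import math


def check_losses(losses: list[float], window: int = 10) -> list[str]:
    """Return a list of problems found in a sequence of training losses."""
    problems = []
    # Any NaN or Inf is a hard failure, so report the first one and stop.
    bad = [i for i, value in enumerate(losses) if not math.isfinite(value)]
    if bad:
        problems.append(f"non-finite loss at step index {bad[0]}")
        return problems
    # With enough steps, compare the mean of the first and last windows.
    if len(losses) >= 2 * window:
        head = sum(losses[:window]) / window
        tail = sum(losses[-window:]) / window
        if tail >= head:
            problems.append(
                f"loss did not decrease: first {window} mean {head:.4f}, "
                f"last {window} mean {tail:.4f}"
            )
    return problems
```

An empty result only means the run passed this coarse check. A 100-step smoke test is mainly about loader and shape errors, so a flat loss over so few steps is a warning to investigate, not proof of a bug.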

1.4 Inference And Evaluation

Open-Loop Evaluation

Open-loop evaluation does not prove real robot success, but it catches shape, modality, and normalization errors quickly.

uv run python gr00t/eval/open_loop_eval.py \
  --dataset-path "$DATASET_ROOT" \
  --embodiment-tag NEW_EMBODIMENT \
  --model-path "$OUT/checkpoint-100" \
  --traj-ids 0 \
  --action-horizon 16 \
  --steps 200 \
  --modality-keys single_arm gripper

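The --modality-keys values must match the keys defined in your modality config. The single_arm gripper values above are placeholders; for raw whole-body G1 data they are probably wrong. Check the accepted flags: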

uv run python gr00t/eval/open_loop_eval.py --help

and inspect modality keys from your config and dataset.

PolicyServer For Whole-Body Deployment

If the checkpoint is UNITREE_G1_SONIC, start the server on the GPU machine:

uv run python gr00t/eval/run_gr00t_server.py \
  --model-path /mnt/checkpoints/groot_g1_sonic_public/checkpoint-20000 \
  --embodiment-tag UNITREE_G1_SONIC \
  --device cuda:0 \
  --port 5550

Then from GR00T-WholeBodyControl:

python gear_sonic/scripts/launch_inference.py \
  --policy-host <gpu_machine_ip> \
  --policy-port 5550 \
  --camera-host 192.168.123.164 \
  --prompt "pick up the soda can and place it in the bin"

Simulation:

python gear_sonic/scripts/launch_inference.py --sim \
  --policy-host 127.0.0.1 \
  --policy-port 5550 \
  --prompt "pick up the apple"

Done-Correct Criteria

  • run_gr00t_server.py loads the checkpoint without embodiment mismatch.
  • The client can ping or request an action.
  • Action shape is correct:
    • UNITREE_G1_SONIC: 78 dimensions per inference step according to docs.
    • Custom NEW_EMBODIMENT: action dimension matches modality.json.
  • Predicted actions are not NaN/Inf.
  • In simulation, the robot does not stay frozen because of all-zero actions or a control-loop mismatch.
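
The action checks in this list can be sketched as a small validator. It assumes you have already taken one step of the predicted action and flattened it into a plain list of floats. The exact structure returned by the policy (for example a chunk of several steps, or a dict keyed by modality) depends on your repo version and is not modeled here. The function name is hypothetical.

```python
import math


def check_action(action: list[float], expected_dim: int) -> list[str]:
    """Validate one flattened action step against the criteria above."""
    problems = []
    if len(action) != expected_dim:
        problems.append(f"expected {expected_dim} dims, got {len(action)}")
    if any(not math.isfinite(value) for value in action):
        problems.append("action contains NaN or Inf")
    elif action and all(value == 0.0 for value in action):
        # All-zero output is a common reason for a frozen robot in simulation.
        problems.append("action is all zeros")
    return problems
```

For a UNITREE_G1_SONIC checkpoint, pass expected_dim=78. For NEW_EMBODIMENT, pass the total action width computed from modality.json.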

Common Errors And Fixes

| Error | Likely cause | Fix |
| --- | --- | --- |
| missing meta/modality.json | Dataset is LeRobot but not GR00T-LeRobot | Create meta/modality.json with state/action/video slices. |
| unknown embodiment tag | Tag does not exist in your repo version | Check policy.md and --help; use NEW_EMBODIMENT for custom data. |
| shape mismatch | Parquet state/action dimension does not match modality/config | Print a sample parquet file and compare dimensions with modality.json. |
| video key not found | Video key in modality.json does not match videos/ | Fix original_key and folder names. |
| OOM during fine-tune | Batch too large or GPU VRAM too low | Reduce --global-batch-size, workers, shards, and use more GPUs. |
| Address already in use | ZMQ/server port is occupied | Use --port 5551 and match the client port. |
| UNITREE_G1_SONIC with raw joint actions | Wrong action space | Use NEW_EMBODIMENT or convert actions to the official SONIC latent space if you have that pipeline. |
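
For the "Address already in use" case, you can test a port before starting the server. This is a generic TCP bind check using only the Python standard library, not part of Isaac-GR00T. The function name is hypothetical, and the default host is an assumption about how the server binds.

```python
import socket


def is_port_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: another process already holds the port.
            return False
    return True
```

If is_port_free(5550) returns False, choose another port such as 5551 and pass the same value to both the server and the client.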
