wholebody-vlahumanoid-vlaisaac-labisaac-lab-mimicsynthetic-dataqarobomimicbehavior-cloninggr1t2

Synthetic Data and QA with Isaac Lab

An Isaac Lab Mimic workflow for filtering demos, expanding 5 demos into 1000 trials, and QA before BC training.

Nguyễn Anh TuấnJune 10, 202613 min read
Synthetic Data and QA with Isaac Lab

What this article gives you

In part 2, we focused on teleoperation that produces actions a policy can actually learn from. In part 3, we kept raw evidence in MCAP for replay and audit. In part 4, we organized training datasets into a LeRobot and Robo-DM data lake.

Part 5 adds the practical middle layer: synthetic data and QA with Isaac Lab Mimic. The goal is not to create random simulation data. The goal is to take a small number of successful teleoperation demonstrations, expand them into a much larger generated dataset, and reject bad data before it reaches behavior cloning or VLA fine-tuning.

The workflow is:

record_demos.py
  -> annotate_demos.py
  -> generate_dataset.py
  -> robomimic/train.py
  -> play.py + QA report

We will use the humanoid nut pouring example:

Isaac-NutPour-GR1T2-Pink-IK-Abs-v0
Isaac-NutPour-GR1T2-Pink-IK-Abs-Mimic-v0

Isaac Lab documents this as a GR1T2 task where the robot picks up the red beaker, pours the contents into the yellow bowl, drops the beaker into the blue bin, then places the bowl onto the white scale. The same documentation provides useful expectations: for the nut pouring visuomotor task, BC trained on 1000 generated demonstrations is typically around 50-60% success over 50 rollouts; for locomanipulation pick-and-place, data generation success is typically 65-82% over 1000 demonstrations, depending on demo quality, annotation quality, and hardware.

If you are entering the series here, the humanoid robot software stack and LeRobot ecosystem guide provide the broader context for why generated datasets still need raw logs, manifests, and versioning.

What "synthetic data" means here

In Isaac Lab Mimic, synthetic data does not mean diffusion-generated videos or random rendered images. It means data generation from real demonstrations. You collect a few successful demonstrations, split each trajectory into reusable subtasks, and let Mimic transform, stitch, and replay those subtask segments under new scene conditions.

For nut pouring, the task can be thought of as four phases:

Phase Operational meaning Why QA matters
Pick red beaker Approach and grasp the red beaker Jerky grasp demos produce many failed generated trials
Pour into yellow bowl Tilt the beaker so the object falls into the bowl Object pose and timing matter, not just end-effector pose
Drop beaker into blue bin Move the beaker to the bin and release Ending the annotation too early can make replay miss success
Place bowl on scale Put the yellow bowl on the white scale A small late-stage error can fail the full task

Mimic does not rescue bad demonstrations. It amplifies good demonstrations, and it also amplifies defects. The correct workflow is therefore conservative: collect more demos than you need, replay them, keep only smooth successes, annotate them carefully, generate with metrics, and train only after the generated dataset passes QA.

Step 1: Record 5 source demos

Isaac Lab uses record_demos.py to collect five source demonstrations for Isaac-NutPour-GR1T2-Pink-IK-Abs-v0. For the hand-tracking and Pink IK path, the reference command looks like this:

./isaaclab.sh -p scripts/tools/record_demos.py \
  --device cpu \
  --task Isaac-NutPour-GR1T2-Pink-IK-Abs-v0 \
  --teleop_device handtracking \
  --dataset_file ./datasets/dataset_gr1_nut_pouring.hdf5 \
  --num_demos 5 \
  --enable_pinocchio

For beginners, the important detail is that the recording task does not use the Mimic suffix. This is the original teleoperation environment. The output is an HDF5 file containing trajectory data, actions, observations, and environment metadata that the annotation script can read later.

Before annotation, run a small QA pass. Do not wait until after generating 1000 trials to discover that one source demo has a long pause or a broken grasp.

python - <<'PY'
import h5py
from pathlib import Path

path = Path("./datasets/dataset_gr1_nut_pouring.hdf5")
with h5py.File(path, "r") as f:
    demos = sorted(f["data"].keys())
    print("num demos:", len(demos))
    for key in demos:
        grp = f["data"][key]
        n = grp.attrs.get("num_samples", None)
        model = grp.attrs.get("model_file", None)
        print(key, "samples=", n, "has_model=", model is not None)
PY

Reject or recollect a source demo if you see these symptoms:

Symptom Why it is dangerous Practical fix
Episode is unusually long BC learns waiting, hesitation, and correction motions Re-record with a shorter path
Wrist pose jumps sharply Pink IK may lose tracking and create action spikes Slow down the operator motion, reset the XR anchor
Robot contacts the object too late The pick subtask becomes too long and hard to stitch Start closer to the object
Recording stops exactly at success Replay may fail to re-trigger the success condition Keep a short buffer after success
The wrong hand moves during a subtask Multi-EEF annotation can become ambiguous Re-record a cleaner demonstration

These checks match the common Isaac Lab Mimic pitfalls: overly long demos, non-smooth motion, pauses, too many subtasks, lack of action noise, tight recording endings, and non-deterministic replay.

Step 2: Annotate demos for Mimic

Annotation converts continuous trajectories into meaningful subtask segments. For nut pouring, use the task with the Mimic suffix and enable cameras because this is a visuomotor task:

./isaaclab.sh -p scripts/imitation_learning/isaaclab_mimic/annotate_demos.py \
  --device cpu \
  --enable_cameras \
  --rendering_mode balanced \
  --task Isaac-NutPour-GR1T2-Pink-IK-Abs-Mimic-v0 \
  --input_file ./datasets/dataset_gr1_nut_pouring.hdf5 \
  --output_file ./datasets/dataset_annotated_gr1_nut_pouring.hdf5 \
  --enable_pinocchio

Two mistakes are common:

  1. Forgetting --enable_cameras for a visuomotor task. The dataset may still be useful for state-based debugging, but it will not contain the visual inputs expected by a camera policy.
  2. Reusing the same action index for multiple right end-effector annotations. Isaac Lab explicitly warns that the nut pouring task has multiple right EEF annotations, so subtasks for the same EEF must use distinct action indices.

After annotation, inspect the file as a dataset, not just as a path that exists:

python - <<'PY'
import h5py

path = "./datasets/dataset_annotated_gr1_nut_pouring.hdf5"
with h5py.File(path, "r") as f:
    for demo in sorted(f["data"].keys()):
        grp = f["data"][demo]
        print("\n", demo)
        def walk(name, obj):
            if "subtask" in name.lower() or "datagen" in name.lower():
                print("  ", name, type(obj).__name__)
        grp.visititems(walk)
PY

The exact HDF5 group names can change across Isaac Lab versions, but the principle is stable: every accepted source demo needs enough subtask metadata for Mimic to generate from it. If a demo failed annotation, remove it or annotate it again. Do not leave "probably fine" demos in the source set. They reduce generation success and make debugging much harder.

Step 3: Understand DataGenConfig and SubTaskConfig

DataGenConfig controls the overall data generation run. In the Isaac Lab source, the important fields include:

Field Practical meaning
generation_guarantee If enabled, generation retries until the requested number of successful trials is reached, within failure limits
generation_keep_failed Keeps failed trials for debugging; useful for diagnosis, not for the main training set
max_num_failures Stops generation after too many failures
seed Makes a generation run easier to reproduce
source_dataset_path Path to the annotated source dataset
generation_path Path to the generated HDF5 output
generation_num_trials Number of trials to generate, for example 1000
generation_select_src_per_subtask Selects source data per subtask; useful only when temporal constraints are well defined
generation_interpolate_from_last_target_pose Interpolates from the previous target pose to reduce waypoint jumps
use_skillgen Enables the advanced skill-generation path
use_navigation_controller Enables navigation control for locomanipulation generation

SubTaskConfig is where generation quality is usually won or lost. A simplified skeleton:

from isaaclab.envs.mimic_env_cfg import DataGenConfig, SubTaskConfig, MimicEnvCfg

cfg = MimicEnvCfg()
cfg.datagen_config = DataGenConfig(
    name="nut_pour_gr1t2_qa",
    generation_guarantee=True,
    generation_keep_failed=False,
    max_num_failures=80,
    seed=42,
    generation_num_trials=1000,
)

cfg.subtask_configs = {
    "left_hand": [
        SubTaskConfig(
            object_ref="red_beaker",
            subtask_term_signal="left_grasp_beaker",
            selection_strategy="nearest_neighbor_object",
            action_noise=0.02,
            num_interpolation_steps=6,
        ),
    ],
    "right_hand": [
        SubTaskConfig(
            object_ref="yellow_bowl",
            subtask_term_signal="right_support_bowl",
            selection_strategy="nearest_neighbor_object",
            action_noise=0.015,
            num_interpolation_steps=8,
        ),
    ],
}

This is a reading skeleton, not a promise that your exact nut pouring task uses object keys named red_beaker or yellow_bowl. In a real environment, open the task config and use the actual object names. The stable idea is that subtasks should refer to the objects they depend on.

Parameter When to tune it Risk if wrong
object_ref The subtask depends on a specific object pose If it is None, object-based nearest-neighbor selection cannot work
selection_strategy='nearest_neighbor_object' Object poses are randomized and source segments should be selected by object similarity If the object reference is wrong, the selected source segment is irrelevant
action_noise You want robustness to small action perturbations Too much noise breaks grasping and pouring
num_interpolation_steps Motion jumps between subtask segments Too many steps can make motion slow and lengthen the horizon
subtask_term_offset_range Annotation ends too close to the success trigger Large offsets can bleed into the next subtask

Isaac Lab's source notes that nearest_neighbor_object and nearest_neighbor_robot_distance require object_ref to be set, and that these strategies often outperform random selection when the object reference is valid. For beginners, this is the reason not to copy a Mimic config blindly and only change the task name.

Step 4: Generate 1000 trials

After source demos pass QA and annotation is clean, generate the dataset:

./isaaclab.sh -p scripts/imitation_learning/isaaclab_mimic/generate_dataset.py \
  --device cpu \
  --headless \
  --enable_pinocchio \
  --enable_cameras \
  --rendering_mode balanced \
  --task Isaac-NutPour-GR1T2-Pink-IK-Abs-Mimic-v0 \
  --generation_num_trials 1000 \
  --num_envs 5 \
  --input_file ./datasets/dataset_annotated_gr1_nut_pouring.hdf5 \
  --output_file ./datasets/generated_dataset_gr1_nut_pouring.hdf5

At this stage, "the file exists" is not enough. Record the generation run in a manifest:

task: Isaac-NutPour-GR1T2-Pink-IK-Abs-Mimic-v0
source_dataset: dataset_annotated_gr1_nut_pouring.hdf5
generated_dataset: generated_dataset_gr1_nut_pouring.hdf5
source_demos: 5
target_trials: 1000
num_envs: 5
seed: 42
enable_cameras: true
enable_pinocchio: true
qa_owner: data_supervisor_01
qa_status: pending

Then produce a minimal QA report:

python - <<'PY'
import h5py
import numpy as np

path = "./datasets/generated_dataset_gr1_nut_pouring.hdf5"
lengths = []
with h5py.File(path, "r") as f:
    demos = sorted(f["data"].keys())
    for key in demos:
        n = int(f["data"][key].attrs.get("num_samples", 0))
        lengths.append(n)

print("demos:", len(lengths))
print("min/mean/max length:", min(lengths), np.mean(lengths), max(lengths))
print("p95 length:", np.percentile(lengths, 95))
PY

If the 95th percentile episode length is roughly twice the median, the dataset may contain too many wandering, paused, or over-corrected trials. If you requested 1000 trials but received far fewer, inspect the generation logs before training.

Step 5: Train BC with robomimic

Isaac Lab uses robomimic/train.py to train the visuomotor BC policy for nut pouring:

./isaaclab.sh -p scripts/imitation_learning/robomimic/train.py \
  --task Isaac-NutPour-GR1T2-Pink-IK-Abs-v0 \
  --algo bc \
  --normalize_training_actions \
  --dataset ./datasets/generated_dataset_gr1_nut_pouring.hdf5

The --normalize_training_actions flag matters. Isaac Lab saves the normalization parameters in the model log directory, and the same values must be passed back to play.py. Without this, the policy may train but produce actions at the wrong scale during rollout.

Robomimic treats an HDF5 dataset as the minibatch source for imitation learning and offline RL. BC is the supervised baseline: learn a mapping from observations to actions. For nut pouring, the observations can include camera images, hand poses, joint positions, object poses, and other fields exposed by the Isaac Lab task; actions correspond to target base, hand, and finger commands for the environment.

Evaluate multiple checkpoints, not just the final one:

./isaaclab.sh -p scripts/imitation_learning/robomimic/play.py \
  --task Isaac-NutPour-GR1T2-Pink-IK-Abs-v0 \
  --visualizer kit \
  --device cpu \
  --enable_cameras \
  --num_rollouts 50 \
  --horizon 350 \
  --norm_factor_min <NORM_FACTOR_MIN> \
  --norm_factor_max <NORM_FACTOR_MAX> \
  --checkpoint /PATH/TO/desired_model_checkpoint.pth

For nut pouring, Isaac Lab recommends 600 epochs on 1000 generated demonstrations and evaluating checkpoints between epochs 300 and 600. For locomanipulation, the recommendation is 2000 epochs and checkpoint selection between epochs 1000 and 2000.

Success-rate reference table

Use this as a QA anchor, not a product guarantee:

Workflow Dataset Expected metric How to interpret it
GR1T2 nut pouring visuomotor 1000 generated demos BC success 50-60% over 50 rollouts If much lower, inspect cameras, annotation, and action normalization
G1 locomanipulation pick-and-place 1000 generated demos Generation success 65-82% If lower, inspect source demos, subtask splits, object_ref, and interpolation
G1 locomanipulation BC 1000 generated demos, 2000 epochs BC success 75-85% over 50 rollouts A useful reference, not a universal SLA

If nut pouring reaches only 20%, the first explanation should not be "the model is too small." More often, the dataset is wrong: jerky demos, incorrect action indices, missing cameras, long horizons, or missing normalization parameters during rollout.

QA workflow for a humanoid data center

For a production data center, I would gate the dataset like this:

Gate Owner Pass condition
Raw demo QA Operator + data supervisor Successful, smooth demos with no long pauses and a short end buffer
Annotation QA Data supervisor Complete subtasks, no duplicate right-EEF action indices, no severe replay errors
Generation QA Simulation engineer Enough trials, success rate in the expected range, sane length distribution
Training QA ML engineer No NaNs, improving rollout metrics, multiple checkpoints compared
Release QA Tech lead Manifest, config, seed, source dataset, and checkpoint are versioned

A generated dataset should be promoted to the training data lake only when it has a manifest like this:

qa_status: approved
approved_by: simulation_lead
date: 2026-06-10
source_demo_count: 5
generated_demo_count: 1000
generation_success_rate: 0.78
train_algo: robomimic_bc
rollout_count: 50
best_checkpoint_epoch: 520
rollout_success_rate: 0.56
known_limitations:
  - "Fails when red beaker starts near workspace edge"
  - "Sensitive to camera exposure in reflective bowl scenes"

The benefit shows up in part 6 on evaluation and scaling: you do not have to guess which dataset trained which policy. When a policy fails, you can trace the model checkpoint back to the generated HDF5, the annotated demonstrations, the raw MCAP logs, and the operator notes.

Quick troubleshooting

Symptom Common cause Check
Generation keeps failing Source demos do not replay cleanly, or subtasks are too long Replay demos, reduce subtask count, recollect smoother demos
Low generation success despite good-looking demos Wrong object_ref, or random selection under heavy pose randomization Open the task config and use nearest_neighbor_object for object-dependent subtasks
Robot jumps between segments num_interpolation_steps is too low Increase from 5 to 8-10 for large pose transitions
Grasp degrades after adding noise action_noise is too high Lower noise for grasp and pour subtasks
BC trains but rollout is poor Missing action normalization parameters, or final checkpoint is not best Use normalization_params.txt and evaluate several checkpoints
Camera observations are missing --enable_cameras was omitted during annotation or generation Inspect HDF5 keys and run a dataloader smoke test

References

NT

Nguyễn Anh Tuấn

Robotics & AI Engineer. Building VnRobo — sharing knowledge about robot learning, VLA models, and automation.

Khám phá VnRobo

Related Posts

Chọn teleoperation stack cho humanoid
wholebody-vla

Chọn teleoperation stack cho humanoid

6/10/202616 min read
NT
Pilot 2 người cho dữ liệu humanoid VLA
wholebody-vla

Pilot 2 người cho dữ liệu humanoid VLA

6/10/202615 min read
NT
ROS 2 MCAP làm chuẩn raw log
wholebody-vla

ROS 2 MCAP làm chuẩn raw log

6/10/202616 min read
NT