Synthetic Data and QA with Isaac Lab

What this article gives you

In part 2, we focused on teleoperation that produces actions a policy can actually learn from. In part 3, we kept raw evidence in MCAP for replay and audit. In part 4, we organized training datasets into a LeRobot and Robo-DM data lake.

Part 5 adds the practical middle layer: synthetic data and QA with Isaac Lab Mimic. The goal is not to create random simulation data. The goal is to take a small number of successful teleoperation demonstrations, expand them into a much larger generated dataset, and reject bad data before it reaches behavior cloning or VLA fine-tuning.

The workflow is:

record_demos.py
  -> annotate_demos.py
  -> generate_dataset.py
  -> robomimic/train.py
  -> play.py + QA report

We will use the humanoid nut pouring example:

Isaac-NutPour-GR1T2-Pink-IK-Abs-v0
Isaac-NutPour-GR1T2-Pink-IK-Abs-Mimic-v0

Isaac Lab documents this as a GR1T2 task where the robot picks up the red beaker, pours the contents into the yellow bowl, drops the beaker into the blue bin, then places the bowl onto the white scale. The same documentation provides useful expectations: for the nut pouring visuomotor task, BC trained on 1000 generated demonstrations is typically around 50-60% success over 50 rollouts; for locomanipulation pick-and-place, data generation success is typically 65-82% over 1000 demonstrations, depending on demo quality, annotation quality, and hardware.

If you are entering the series here, the humanoid robot software stack and LeRobot ecosystem guide provide the broader context for why generated datasets still need raw logs, manifests, and versioning.

What "synthetic data" means here

In Isaac Lab Mimic, synthetic data does not mean diffusion-generated videos or random rendered images. It means data generation from real demonstrations. You collect a few successful demonstrations, split each trajectory into reusable subtasks, and let Mimic transform, stitch, and replay those subtask segments under new scene conditions.

For nut pouring, the task can be thought of as four phases:

Phase	Operational meaning	Why QA matters
Pick red beaker	Approach and grasp the red beaker	Jerky grasp demos produce many failed generated trials
Pour into yellow bowl	Tilt the beaker so the object falls into the bowl	Object pose and timing matter, not just end-effector pose
Drop beaker into blue bin	Move the beaker to the bin and release	Ending the annotation too early can make replay miss success
Place bowl on scale	Put the yellow bowl on the white scale	A small late-stage error can fail the full task

Mimic does not rescue bad demonstrations. It amplifies good demonstrations, and it also amplifies defects. The correct workflow is therefore conservative: collect more demos than you need, replay them, keep only smooth successes, annotate them carefully, generate with metrics, and train only after the generated dataset passes QA.

Step 1: Record 5 source demos

Isaac Lab uses record_demos.py to collect five source demonstrations for Isaac-NutPour-GR1T2-Pink-IK-Abs-v0. For the hand-tracking and Pink IK path, the reference command looks like this:

./isaaclab.sh -p scripts/tools/record_demos.py \
  --device cpu \
  --task Isaac-NutPour-GR1T2-Pink-IK-Abs-v0 \
  --teleop_device handtracking \
  --dataset_file ./datasets/dataset_gr1_nut_pouring.hdf5 \
  --num_demos 5 \
  --enable_pinocchio

For beginners, the important detail is that the recording task does not use the Mimic suffix. This is the original teleoperation environment. The output is an HDF5 file containing trajectory data, actions, observations, and environment metadata that the annotation script can read later.

Before annotation, run a small QA pass. Do not wait until after generating 1000 trials to discover that one source demo has a long pause or a broken grasp.

python - <<'PY'
import h5py
from pathlib import Path

path = Path("./datasets/dataset_gr1_nut_pouring.hdf5")
with h5py.File(path, "r") as f:
    demos = sorted(f["data"].keys())
    print("num demos:", len(demos))
    for key in demos:
        grp = f["data"][key]
        n = grp.attrs.get("num_samples", None)
        model = grp.attrs.get("model_file", None)
        print(key, "samples=", n, "has_model=", model is not None)
PY

Reject or recollect a source demo if you see these symptoms:

Symptom	Why it is dangerous	Practical fix
Episode is unusually long	BC learns waiting, hesitation, and correction motions	Re-record with a shorter path
Wrist pose jumps sharply	Pink IK may lose tracking and create action spikes	Slow down the operator motion, reset the XR anchor
Robot contacts the object too late	The pick subtask becomes too long and hard to stitch	Start closer to the object
Recording stops exactly at success	Replay may fail to re-trigger the success condition	Keep a short buffer after success
The wrong hand moves during a subtask	Multi-EEF annotation can become ambiguous	Re-record a cleaner demonstration

These checks match the common Isaac Lab Mimic pitfalls: overly long demos, non-smooth motion, pauses, too many subtasks, lack of action noise, tight recording endings, and non-deterministic replay.

Step 2: Annotate demos for Mimic

Annotation converts continuous trajectories into meaningful subtask segments. For nut pouring, use the task with the Mimic suffix and enable cameras because this is a visuomotor task:

./isaaclab.sh -p scripts/imitation_learning/isaaclab_mimic/annotate_demos.py \
  --device cpu \
  --enable_cameras \
  --rendering_mode balanced \
  --task Isaac-NutPour-GR1T2-Pink-IK-Abs-Mimic-v0 \
  --input_file ./datasets/dataset_gr1_nut_pouring.hdf5 \
  --output_file ./datasets/dataset_annotated_gr1_nut_pouring.hdf5 \
  --enable_pinocchio

Two mistakes are common:

Forgetting --enable_cameras for a visuomotor task. The dataset may still be useful for state-based debugging, but it will not contain the visual inputs expected by a camera policy.
Reusing the same action index for multiple right end-effector annotations. Isaac Lab explicitly warns that the nut pouring task has multiple right EEF annotations, so subtasks for the same EEF must use distinct action indices.

After annotation, inspect the file as a dataset, not just as a path that exists:

python - <<'PY'
import h5py

path = "./datasets/dataset_annotated_gr1_nut_pouring.hdf5"
with h5py.File(path, "r") as f:
    for demo in sorted(f["data"].keys()):
        grp = f["data"][demo]
        print("\n", demo)
        def walk(name, obj):
            if "subtask" in name.lower() or "datagen" in name.lower():
                print("  ", name, type(obj).__name__)
        grp.visititems(walk)
PY

The exact HDF5 group names can change across Isaac Lab versions, but the principle is stable: every accepted source demo needs enough subtask metadata for Mimic to generate from it. If a demo failed annotation, remove it or annotate it again. Do not leave "probably fine" demos in the source set. They reduce generation success and make debugging much harder.

Step 3: Understand `DataGenConfig` and `SubTaskConfig`

DataGenConfig controls the overall data generation run. In the Isaac Lab source, the important fields include:

Field	Practical meaning
`generation_guarantee`	If enabled, generation retries until the requested number of successful trials is reached, within failure limits
`generation_keep_failed`	Keeps failed trials for debugging; useful for diagnosis, not for the main training set
`max_num_failures`	Stops generation after too many failures
`seed`	Makes a generation run easier to reproduce
`source_dataset_path`	Path to the annotated source dataset
`generation_path`	Path to the generated HDF5 output
`generation_num_trials`	Number of trials to generate, for example 1000
`generation_select_src_per_subtask`	Selects source data per subtask; useful only when temporal constraints are well defined
`generation_interpolate_from_last_target_pose`	Interpolates from the previous target pose to reduce waypoint jumps
`use_skillgen`	Enables the advanced skill-generation path
`use_navigation_controller`	Enables navigation control for locomanipulation generation

SubTaskConfig is where generation quality is usually won or lost. A simplified skeleton:

from isaaclab.envs.mimic_env_cfg import DataGenConfig, SubTaskConfig, MimicEnvCfg

cfg = MimicEnvCfg()
cfg.datagen_config = DataGenConfig(
    name="nut_pour_gr1t2_qa",
    generation_guarantee=True,
    generation_keep_failed=False,
    max_num_failures=80,
    seed=42,
    generation_num_trials=1000,
)

cfg.subtask_configs = {
    "left_hand": [
        SubTaskConfig(
            object_ref="red_beaker",
            subtask_term_signal="left_grasp_beaker",
            selection_strategy="nearest_neighbor_object",
            action_noise=0.02,
            num_interpolation_steps=6,
        ),
    ],
    "right_hand": [
        SubTaskConfig(
            object_ref="yellow_bowl",
            subtask_term_signal="right_support_bowl",
            selection_strategy="nearest_neighbor_object",
            action_noise=0.015,
            num_interpolation_steps=8,
        ),
    ],
}

This is a reading skeleton, not a promise that your exact nut pouring task uses object keys named red_beaker or yellow_bowl. In a real environment, open the task config and use the actual object names. The stable idea is that subtasks should refer to the objects they depend on.

Parameter	When to tune it	Risk if wrong
`object_ref`	The subtask depends on a specific object pose	If it is `None`, object-based nearest-neighbor selection cannot work
`selection_strategy='nearest_neighbor_object'`	Object poses are randomized and source segments should be selected by object similarity	If the object reference is wrong, the selected source segment is irrelevant
`action_noise`	You want robustness to small action perturbations	Too much noise breaks grasping and pouring
`num_interpolation_steps`	Motion jumps between subtask segments	Too many steps can make motion slow and lengthen the horizon
`subtask_term_offset_range`	Annotation ends too close to the success trigger	Large offsets can bleed into the next subtask

Isaac Lab's source notes that nearest_neighbor_object and nearest_neighbor_robot_distance require object_ref to be set, and that these strategies often outperform random selection when the object reference is valid. For beginners, this is the reason not to copy a Mimic config blindly and only change the task name.

Step 4: Generate 1000 trials

After source demos pass QA and annotation is clean, generate the dataset:

./isaaclab.sh -p scripts/imitation_learning/isaaclab_mimic/generate_dataset.py \
  --device cpu \
  --headless \
  --enable_pinocchio \
  --enable_cameras \
  --rendering_mode balanced \
  --task Isaac-NutPour-GR1T2-Pink-IK-Abs-Mimic-v0 \
  --generation_num_trials 1000 \
  --num_envs 5 \
  --input_file ./datasets/dataset_annotated_gr1_nut_pouring.hdf5 \
  --output_file ./datasets/generated_dataset_gr1_nut_pouring.hdf5

At this stage, "the file exists" is not enough. Record the generation run in a manifest:

task: Isaac-NutPour-GR1T2-Pink-IK-Abs-Mimic-v0
source_dataset: dataset_annotated_gr1_nut_pouring.hdf5
generated_dataset: generated_dataset_gr1_nut_pouring.hdf5
source_demos: 5
target_trials: 1000
num_envs: 5
seed: 42
enable_cameras: true
enable_pinocchio: true
qa_owner: data_supervisor_01
qa_status: pending

Then produce a minimal QA report:

python - <<'PY'
import h5py
import numpy as np

path = "./datasets/generated_dataset_gr1_nut_pouring.hdf5"
lengths = []
with h5py.File(path, "r") as f:
    demos = sorted(f["data"].keys())
    for key in demos:
        n = int(f["data"][key].attrs.get("num_samples", 0))
        lengths.append(n)

print("demos:", len(lengths))
print("min/mean/max length:", min(lengths), np.mean(lengths), max(lengths))
print("p95 length:", np.percentile(lengths, 95))
PY

If the 95th percentile episode length is roughly twice the median, the dataset may contain too many wandering, paused, or over-corrected trials. If you requested 1000 trials but received far fewer, inspect the generation logs before training.

Step 5: Train BC with robomimic

Isaac Lab uses robomimic/train.py to train the visuomotor BC policy for nut pouring:

./isaaclab.sh -p scripts/imitation_learning/robomimic/train.py \
  --task Isaac-NutPour-GR1T2-Pink-IK-Abs-v0 \
  --algo bc \
  --normalize_training_actions \
  --dataset ./datasets/generated_dataset_gr1_nut_pouring.hdf5

The --normalize_training_actions flag matters. Isaac Lab saves the normalization parameters in the model log directory, and the same values must be passed back to play.py. Without this, the policy may train but produce actions at the wrong scale during rollout.

Robomimic treats an HDF5 dataset as the minibatch source for imitation learning and offline RL. BC is the supervised baseline: learn a mapping from observations to actions. For nut pouring, the observations can include camera images, hand poses, joint positions, object poses, and other fields exposed by the Isaac Lab task; actions correspond to target base, hand, and finger commands for the environment.

Evaluate multiple checkpoints, not just the final one:

./isaaclab.sh -p scripts/imitation_learning/robomimic/play.py \
  --task Isaac-NutPour-GR1T2-Pink-IK-Abs-v0 \
  --visualizer kit \
  --device cpu \
  --enable_cameras \
  --num_rollouts 50 \
  --horizon 350 \
  --norm_factor_min <NORM_FACTOR_MIN> \
  --norm_factor_max <NORM_FACTOR_MAX> \
  --checkpoint /PATH/TO/desired_model_checkpoint.pth

For nut pouring, Isaac Lab recommends 600 epochs on 1000 generated demonstrations and evaluating checkpoints between epochs 300 and 600. For locomanipulation, the recommendation is 2000 epochs and checkpoint selection between epochs 1000 and 2000.

Success-rate reference table

Use this as a QA anchor, not a product guarantee:

Workflow	Dataset	Expected metric	How to interpret it
GR1T2 nut pouring visuomotor	1000 generated demos	BC success 50-60% over 50 rollouts	If much lower, inspect cameras, annotation, and action normalization
G1 locomanipulation pick-and-place	1000 generated demos	Generation success 65-82%	If lower, inspect source demos, subtask splits, `object_ref`, and interpolation
G1 locomanipulation BC	1000 generated demos, 2000 epochs	BC success 75-85% over 50 rollouts	A useful reference, not a universal SLA

If nut pouring reaches only 20%, the first explanation should not be "the model is too small." More often, the dataset is wrong: jerky demos, incorrect action indices, missing cameras, long horizons, or missing normalization parameters during rollout.

QA workflow for a humanoid data center

For a production data center, I would gate the dataset like this:

Gate	Owner	Pass condition
Raw demo QA	Operator + data supervisor	Successful, smooth demos with no long pauses and a short end buffer
Annotation QA	Data supervisor	Complete subtasks, no duplicate right-EEF action indices, no severe replay errors
Generation QA	Simulation engineer	Enough trials, success rate in the expected range, sane length distribution
Training QA	ML engineer	No NaNs, improving rollout metrics, multiple checkpoints compared
Release QA	Tech lead	Manifest, config, seed, source dataset, and checkpoint are versioned

A generated dataset should be promoted to the training data lake only when it has a manifest like this:

qa_status: approved
approved_by: simulation_lead
date: 2026-06-10
source_demo_count: 5
generated_demo_count: 1000
generation_success_rate: 0.78
train_algo: robomimic_bc
rollout_count: 50
best_checkpoint_epoch: 520
rollout_success_rate: 0.56
known_limitations:
  - "Fails when red beaker starts near workspace edge"
  - "Sensitive to camera exposure in reflective bowl scenes"

The benefit shows up in part 6 on evaluation and scaling: you do not have to guess which dataset trained which policy. When a policy fails, you can trace the model checkpoint back to the generated HDF5, the annotated demonstrations, the raw MCAP logs, and the operator notes.

Quick troubleshooting

Symptom	Common cause	Check
Generation keeps failing	Source demos do not replay cleanly, or subtasks are too long	Replay demos, reduce subtask count, recollect smoother demos
Low generation success despite good-looking demos	Wrong `object_ref`, or `random` selection under heavy pose randomization	Open the task config and use `nearest_neighbor_object` for object-dependent subtasks
Robot jumps between segments	`num_interpolation_steps` is too low	Increase from 5 to 8-10 for large pose transitions
Grasp degrades after adding noise	`action_noise` is too high	Lower noise for grasp and pour subtasks
BC trains but rollout is poor	Missing action normalization parameters, or final checkpoint is not best	Use `normalization_params.txt` and evaluate several checkpoints
Camera observations are missing	`--enable_cameras` was omitted during annotation or generation	Inspect HDF5 keys and run a dataloader smoke test

References

What this article gives you

The workflow is:

record_demos.py
  -> annotate_demos.py
  -> generate_dataset.py
  -> robomimic/train.py
  -> play.py + QA report

We will use the humanoid nut pouring example:

Isaac-NutPour-GR1T2-Pink-IK-Abs-v0
Isaac-NutPour-GR1T2-Pink-IK-Abs-Mimic-v0

What "synthetic data" means here

For nut pouring, the task can be thought of as four phases:

Phase	Operational meaning	Why QA matters
Pick red beaker	Approach and grasp the red beaker	Jerky grasp demos produce many failed generated trials
Pour into yellow bowl	Tilt the beaker so the object falls into the bowl	Object pose and timing matter, not just end-effector pose
Drop beaker into blue bin	Move the beaker to the bin and release	Ending the annotation too early can make replay miss success
Place bowl on scale	Put the yellow bowl on the white scale	A small late-stage error can fail the full task

Step 1: Record 5 source demos

Isaac Lab uses record_demos.py to collect five source demonstrations for Isaac-NutPour-GR1T2-Pink-IK-Abs-v0. For the hand-tracking and Pink IK path, the reference command looks like this:

./isaaclab.sh -p scripts/tools/record_demos.py \
  --device cpu \
  --task Isaac-NutPour-GR1T2-Pink-IK-Abs-v0 \
  --teleop_device handtracking \
  --dataset_file ./datasets/dataset_gr1_nut_pouring.hdf5 \
  --num_demos 5 \
  --enable_pinocchio

Before annotation, run a small QA pass. Do not wait until after generating 1000 trials to discover that one source demo has a long pause or a broken grasp.

python - <<'PY'
import h5py
from pathlib import Path

path = Path("./datasets/dataset_gr1_nut_pouring.hdf5")
with h5py.File(path, "r") as f:
    demos = sorted(f["data"].keys())
    print("num demos:", len(demos))
    for key in demos:
        grp = f["data"][key]
        n = grp.attrs.get("num_samples", None)
        model = grp.attrs.get("model_file", None)
        print(key, "samples=", n, "has_model=", model is not None)
PY

Reject or recollect a source demo if you see these symptoms:

Symptom	Why it is dangerous	Practical fix
Episode is unusually long	BC learns waiting, hesitation, and correction motions	Re-record with a shorter path
Wrist pose jumps sharply	Pink IK may lose tracking and create action spikes	Slow down the operator motion, reset the XR anchor
Robot contacts the object too late	The pick subtask becomes too long and hard to stitch	Start closer to the object
Recording stops exactly at success	Replay may fail to re-trigger the success condition	Keep a short buffer after success
The wrong hand moves during a subtask	Multi-EEF annotation can become ambiguous	Re-record a cleaner demonstration

These checks match the common Isaac Lab Mimic pitfalls: overly long demos, non-smooth motion, pauses, too many subtasks, lack of action noise, tight recording endings, and non-deterministic replay.

Step 2: Annotate demos for Mimic

Annotation converts continuous trajectories into meaningful subtask segments. For nut pouring, use the task with the Mimic suffix and enable cameras because this is a visuomotor task:

./isaaclab.sh -p scripts/imitation_learning/isaaclab_mimic/annotate_demos.py \
  --device cpu \
  --enable_cameras \
  --rendering_mode balanced \
  --task Isaac-NutPour-GR1T2-Pink-IK-Abs-Mimic-v0 \
  --input_file ./datasets/dataset_gr1_nut_pouring.hdf5 \
  --output_file ./datasets/dataset_annotated_gr1_nut_pouring.hdf5 \
  --enable_pinocchio

Two mistakes are common:

Forgetting --enable_cameras for a visuomotor task. The dataset may still be useful for state-based debugging, but it will not contain the visual inputs expected by a camera policy.
Reusing the same action index for multiple right end-effector annotations. Isaac Lab explicitly warns that the nut pouring task has multiple right EEF annotations, so subtasks for the same EEF must use distinct action indices.

After annotation, inspect the file as a dataset, not just as a path that exists:

python - <<'PY'
import h5py

path = "./datasets/dataset_annotated_gr1_nut_pouring.hdf5"
with h5py.File(path, "r") as f:
    for demo in sorted(f["data"].keys()):
        grp = f["data"][demo]
        print("\n", demo)
        def walk(name, obj):
            if "subtask" in name.lower() or "datagen" in name.lower():
                print("  ", name, type(obj).__name__)
        grp.visititems(walk)
PY

Step 3: Understand `DataGenConfig` and `SubTaskConfig`

DataGenConfig controls the overall data generation run. In the Isaac Lab source, the important fields include:

Field	Practical meaning
`generation_guarantee`	If enabled, generation retries until the requested number of successful trials is reached, within failure limits
`generation_keep_failed`	Keeps failed trials for debugging; useful for diagnosis, not for the main training set
`max_num_failures`	Stops generation after too many failures
`seed`	Makes a generation run easier to reproduce
`source_dataset_path`	Path to the annotated source dataset
`generation_path`	Path to the generated HDF5 output
`generation_num_trials`	Number of trials to generate, for example 1000
`generation_select_src_per_subtask`	Selects source data per subtask; useful only when temporal constraints are well defined
`generation_interpolate_from_last_target_pose`	Interpolates from the previous target pose to reduce waypoint jumps
`use_skillgen`	Enables the advanced skill-generation path
`use_navigation_controller`	Enables navigation control for locomanipulation generation

SubTaskConfig is where generation quality is usually won or lost. A simplified skeleton:

from isaaclab.envs.mimic_env_cfg import DataGenConfig, SubTaskConfig, MimicEnvCfg

cfg = MimicEnvCfg()
cfg.datagen_config = DataGenConfig(
    name="nut_pour_gr1t2_qa",
    generation_guarantee=True,
    generation_keep_failed=False,
    max_num_failures=80,
    seed=42,
    generation_num_trials=1000,
)

cfg.subtask_configs = {
    "left_hand": [
        SubTaskConfig(
            object_ref="red_beaker",
            subtask_term_signal="left_grasp_beaker",
            selection_strategy="nearest_neighbor_object",
            action_noise=0.02,
            num_interpolation_steps=6,
        ),
    ],
    "right_hand": [
        SubTaskConfig(
            object_ref="yellow_bowl",
            subtask_term_signal="right_support_bowl",
            selection_strategy="nearest_neighbor_object",
            action_noise=0.015,
            num_interpolation_steps=8,
        ),
    ],
}

Parameter	When to tune it	Risk if wrong
`object_ref`	The subtask depends on a specific object pose	If it is `None`, object-based nearest-neighbor selection cannot work
`selection_strategy='nearest_neighbor_object'`	Object poses are randomized and source segments should be selected by object similarity	If the object reference is wrong, the selected source segment is irrelevant
`action_noise`	You want robustness to small action perturbations	Too much noise breaks grasping and pouring
`num_interpolation_steps`	Motion jumps between subtask segments	Too many steps can make motion slow and lengthen the horizon
`subtask_term_offset_range`	Annotation ends too close to the success trigger	Large offsets can bleed into the next subtask

Step 4: Generate 1000 trials

After source demos pass QA and annotation is clean, generate the dataset:

./isaaclab.sh -p scripts/imitation_learning/isaaclab_mimic/generate_dataset.py \
  --device cpu \
  --headless \
  --enable_pinocchio \
  --enable_cameras \
  --rendering_mode balanced \
  --task Isaac-NutPour-GR1T2-Pink-IK-Abs-Mimic-v0 \
  --generation_num_trials 1000 \
  --num_envs 5 \
  --input_file ./datasets/dataset_annotated_gr1_nut_pouring.hdf5 \
  --output_file ./datasets/generated_dataset_gr1_nut_pouring.hdf5

At this stage, "the file exists" is not enough. Record the generation run in a manifest:

task: Isaac-NutPour-GR1T2-Pink-IK-Abs-Mimic-v0
source_dataset: dataset_annotated_gr1_nut_pouring.hdf5
generated_dataset: generated_dataset_gr1_nut_pouring.hdf5
source_demos: 5
target_trials: 1000
num_envs: 5
seed: 42
enable_cameras: true
enable_pinocchio: true
qa_owner: data_supervisor_01
qa_status: pending

Then produce a minimal QA report:

python - <<'PY'
import h5py
import numpy as np

path = "./datasets/generated_dataset_gr1_nut_pouring.hdf5"
lengths = []
with h5py.File(path, "r") as f:
    demos = sorted(f["data"].keys())
    for key in demos:
        n = int(f["data"][key].attrs.get("num_samples", 0))
        lengths.append(n)

print("demos:", len(lengths))
print("min/mean/max length:", min(lengths), np.mean(lengths), max(lengths))
print("p95 length:", np.percentile(lengths, 95))
PY

Step 5: Train BC with robomimic

Isaac Lab uses robomimic/train.py to train the visuomotor BC policy for nut pouring:

./isaaclab.sh -p scripts/imitation_learning/robomimic/train.py \
  --task Isaac-NutPour-GR1T2-Pink-IK-Abs-v0 \
  --algo bc \
  --normalize_training_actions \
  --dataset ./datasets/generated_dataset_gr1_nut_pouring.hdf5

Evaluate multiple checkpoints, not just the final one:

./isaaclab.sh -p scripts/imitation_learning/robomimic/play.py \
  --task Isaac-NutPour-GR1T2-Pink-IK-Abs-v0 \
  --visualizer kit \
  --device cpu \
  --enable_cameras \
  --num_rollouts 50 \
  --horizon 350 \
  --norm_factor_min <NORM_FACTOR_MIN> \
  --norm_factor_max <NORM_FACTOR_MAX> \
  --checkpoint /PATH/TO/desired_model_checkpoint.pth

Success-rate reference table

Use this as a QA anchor, not a product guarantee:

Workflow	Dataset	Expected metric	How to interpret it
GR1T2 nut pouring visuomotor	1000 generated demos	BC success 50-60% over 50 rollouts	If much lower, inspect cameras, annotation, and action normalization
G1 locomanipulation pick-and-place	1000 generated demos	Generation success 65-82%	If lower, inspect source demos, subtask splits, `object_ref`, and interpolation
G1 locomanipulation BC	1000 generated demos, 2000 epochs	BC success 75-85% over 50 rollouts	A useful reference, not a universal SLA

QA workflow for a humanoid data center

For a production data center, I would gate the dataset like this:

Gate	Owner	Pass condition
Raw demo QA	Operator + data supervisor	Successful, smooth demos with no long pauses and a short end buffer
Annotation QA	Data supervisor	Complete subtasks, no duplicate right-EEF action indices, no severe replay errors
Generation QA	Simulation engineer	Enough trials, success rate in the expected range, sane length distribution
Training QA	ML engineer	No NaNs, improving rollout metrics, multiple checkpoints compared
Release QA	Tech lead	Manifest, config, seed, source dataset, and checkpoint are versioned

A generated dataset should be promoted to the training data lake only when it has a manifest like this:

qa_status: approved
approved_by: simulation_lead
date: 2026-06-10
source_demo_count: 5
generated_demo_count: 1000
generation_success_rate: 0.78
train_algo: robomimic_bc
rollout_count: 50
best_checkpoint_epoch: 520
rollout_success_rate: 0.56
known_limitations:
  - "Fails when red beaker starts near workspace edge"
  - "Sensitive to camera exposure in reflective bowl scenes"

Quick troubleshooting

Symptom	Common cause	Check
Generation keeps failing	Source demos do not replay cleanly, or subtasks are too long	Replay demos, reduce subtask count, recollect smoother demos
Low generation success despite good-looking demos	Wrong `object_ref`, or `random` selection under heavy pose randomization	Open the task config and use `nearest_neighbor_object` for object-dependent subtasks
Robot jumps between segments	`num_interpolation_steps` is too low	Increase from 5 to 8-10 for large pose transitions
Grasp degrades after adding noise	`action_noise` is too high	Lower noise for grasp and pour subtasks
BC trains but rollout is poor	Missing action normalization parameters, or final checkpoint is not best	Use `normalization_params.txt` and evaluate several checkpoints
Camera observations are missing	`--enable_cameras` was omitted during annotation or generation	Inspect HDF5 keys and run a dataloader smoke test

Synthetic Data and QA with Isaac Lab

What this article gives you

What "synthetic data" means here

Step 1: Record 5 source demos

Step 2: Annotate demos for Mimic

Step 3: Understand `DataGenConfig` and `SubTaskConfig`

Step 4: Generate 1000 trials

Step 5: Train BC with robomimic

Success-rate reference table

QA workflow for a humanoid data center

Quick troubleshooting

References

Nguyễn Anh Tuấn

Related Posts

Chọn teleoperation stack cho humanoid

ROS 2 MCAP làm chuẩn raw log

LeRobotDataset và Robo-DM cho data lake

Synthetic Data and QA with Isaac Lab

What this article gives you

What "synthetic data" means here

Step 1: Record 5 source demos

Step 2: Annotate demos for Mimic

Step 3: Understand `DataGenConfig` and `SubTaskConfig`

Step 4: Generate 1000 trials

Step 5: Train BC with robomimic

Success-rate reference table

QA workflow for a humanoid data center

Quick troubleshooting

References

Nguyễn Anh Tuấn

Related Posts

Chọn teleoperation stack cho humanoid

ROS 2 MCAP làm chuẩn raw log

LeRobotDataset và Robo-DM cho data lake

What this article gives you

What "synthetic data" means here

Step 1: Record 5 source demos

Step 2: Annotate demos for Mimic

Step 3: Understand DataGenConfig and SubTaskConfig

Step 4: Generate 1000 trials

Step 5: Train BC with robomimic

Success-rate reference table

QA workflow for a humanoid data center

Quick troubleshooting

References

Related Posts

Nguyễn Anh Tuấn

Related Posts

Chọn teleoperation stack cho humanoid

ROS 2 MCAP làm chuẩn raw log

LeRobotDataset và Robo-DM cho data lake

What this article gives you

What "synthetic data" means here

Step 1: Record 5 source demos

Step 2: Annotate demos for Mimic

Step 3: Understand DataGenConfig and SubTaskConfig

Step 4: Generate 1000 trials

Step 5: Train BC with robomimic

Success-rate reference table

QA workflow for a humanoid data center

Quick troubleshooting

References

Related Posts

Nguyễn Anh Tuấn

Related Posts

Chọn teleoperation stack cho humanoid

ROS 2 MCAP làm chuẩn raw log

LeRobotDataset và Robo-DM cho data lake

Step 3: Understand `DataGenConfig` and `SubTaskConfig`

Step 3: Understand `DataGenConfig` and `SubTaskConfig`