What this article gives you
In part 2, we focused on teleoperation that produces actions a policy can actually learn from. In part 3, we kept raw evidence in MCAP for replay and audit. In part 4, we organized training datasets into a LeRobot and Robo-DM data lake.
Part 5 adds the practical middle layer: synthetic data and QA with Isaac Lab Mimic. The goal is not to create random simulation data. The goal is to take a small number of successful teleoperation demonstrations, expand them into a much larger generated dataset, and reject bad data before it reaches behavior cloning or VLA fine-tuning.
The workflow is:
record_demos.py
-> annotate_demos.py
-> generate_dataset.py
-> robomimic/train.py
-> play.py + QA report
We will use the humanoid nut pouring example:
Isaac-NutPour-GR1T2-Pink-IK-Abs-v0
Isaac-NutPour-GR1T2-Pink-IK-Abs-Mimic-v0
Isaac Lab documents this as a GR1T2 task where the robot picks up the red beaker, pours the contents into the yellow bowl, drops the beaker into the blue bin, then places the bowl onto the white scale. The same documentation provides useful expectations: for the nut pouring visuomotor task, BC trained on 1000 generated demonstrations is typically around 50-60% success over 50 rollouts; for locomanipulation pick-and-place, data generation success is typically 65-82% over 1000 demonstrations, depending on demo quality, annotation quality, and hardware.
If you are entering the series here, the humanoid robot software stack and LeRobot ecosystem guide provide the broader context for why generated datasets still need raw logs, manifests, and versioning.
What "synthetic data" means here
In Isaac Lab Mimic, synthetic data does not mean diffusion-generated videos or random rendered images. It means data generation from real demonstrations. You collect a few successful demonstrations, split each trajectory into reusable subtasks, and let Mimic transform, stitch, and replay those subtask segments under new scene conditions.
For nut pouring, the task can be thought of as four phases:
| Phase | Operational meaning | Why QA matters |
|---|---|---|
| Pick red beaker | Approach and grasp the red beaker | Jerky grasp demos produce many failed generated trials |
| Pour into yellow bowl | Tilt the beaker so the object falls into the bowl | Object pose and timing matter, not just end-effector pose |
| Drop beaker into blue bin | Move the beaker to the bin and release | Ending the annotation too early can make replay miss success |
| Place bowl on scale | Put the yellow bowl on the white scale | A small late-stage error can fail the full task |
Mimic does not rescue bad demonstrations. It amplifies good demonstrations, and it also amplifies defects. The correct workflow is therefore conservative: collect more demos than you need, replay them, keep only smooth successes, annotate them carefully, generate with metrics, and train only after the generated dataset passes QA.
Step 1: Record 5 source demos
Isaac Lab uses record_demos.py to collect five source demonstrations for Isaac-NutPour-GR1T2-Pink-IK-Abs-v0. For the hand-tracking and Pink IK path, the reference command looks like this:
./isaaclab.sh -p scripts/tools/record_demos.py \
--device cpu \
--task Isaac-NutPour-GR1T2-Pink-IK-Abs-v0 \
--teleop_device handtracking \
--dataset_file ./datasets/dataset_gr1_nut_pouring.hdf5 \
--num_demos 5 \
--enable_pinocchio
For beginners, the important detail is that the recording task does not use the Mimic suffix. This is the original teleoperation environment. The output is an HDF5 file containing trajectory data, actions, observations, and environment metadata that the annotation script can read later.
Before annotation, run a small QA pass. Do not wait until after generating 1000 trials to discover that one source demo has a long pause or a broken grasp.
python - <<'PY'
import h5py
from pathlib import Path
path = Path("./datasets/dataset_gr1_nut_pouring.hdf5")
with h5py.File(path, "r") as f:
demos = sorted(f["data"].keys())
print("num demos:", len(demos))
for key in demos:
grp = f["data"][key]
n = grp.attrs.get("num_samples", None)
model = grp.attrs.get("model_file", None)
print(key, "samples=", n, "has_model=", model is not None)
PY
Reject or recollect a source demo if you see these symptoms:
| Symptom | Why it is dangerous | Practical fix |
|---|---|---|
| Episode is unusually long | BC learns waiting, hesitation, and correction motions | Re-record with a shorter path |
| Wrist pose jumps sharply | Pink IK may lose tracking and create action spikes | Slow down the operator motion, reset the XR anchor |
| Robot contacts the object too late | The pick subtask becomes too long and hard to stitch | Start closer to the object |
| Recording stops exactly at success | Replay may fail to re-trigger the success condition | Keep a short buffer after success |
| The wrong hand moves during a subtask | Multi-EEF annotation can become ambiguous | Re-record a cleaner demonstration |
These checks match the common Isaac Lab Mimic pitfalls: overly long demos, non-smooth motion, pauses, too many subtasks, lack of action noise, tight recording endings, and non-deterministic replay.
Step 2: Annotate demos for Mimic
Annotation converts continuous trajectories into meaningful subtask segments. For nut pouring, use the task with the Mimic suffix and enable cameras because this is a visuomotor task:
./isaaclab.sh -p scripts/imitation_learning/isaaclab_mimic/annotate_demos.py \
--device cpu \
--enable_cameras \
--rendering_mode balanced \
--task Isaac-NutPour-GR1T2-Pink-IK-Abs-Mimic-v0 \
--input_file ./datasets/dataset_gr1_nut_pouring.hdf5 \
--output_file ./datasets/dataset_annotated_gr1_nut_pouring.hdf5 \
--enable_pinocchio
Two mistakes are common:
- Forgetting
--enable_camerasfor a visuomotor task. The dataset may still be useful for state-based debugging, but it will not contain the visual inputs expected by a camera policy. - Reusing the same action index for multiple right end-effector annotations. Isaac Lab explicitly warns that the nut pouring task has multiple right EEF annotations, so subtasks for the same EEF must use distinct action indices.
After annotation, inspect the file as a dataset, not just as a path that exists:
python - <<'PY'
import h5py
path = "./datasets/dataset_annotated_gr1_nut_pouring.hdf5"
with h5py.File(path, "r") as f:
for demo in sorted(f["data"].keys()):
grp = f["data"][demo]
print("\n", demo)
def walk(name, obj):
if "subtask" in name.lower() or "datagen" in name.lower():
print(" ", name, type(obj).__name__)
grp.visititems(walk)
PY
The exact HDF5 group names can change across Isaac Lab versions, but the principle is stable: every accepted source demo needs enough subtask metadata for Mimic to generate from it. If a demo failed annotation, remove it or annotate it again. Do not leave "probably fine" demos in the source set. They reduce generation success and make debugging much harder.
Step 3: Understand DataGenConfig and SubTaskConfig
DataGenConfig controls the overall data generation run. In the Isaac Lab source, the important fields include:
| Field | Practical meaning |
|---|---|
generation_guarantee |
If enabled, generation retries until the requested number of successful trials is reached, within failure limits |
generation_keep_failed |
Keeps failed trials for debugging; useful for diagnosis, not for the main training set |
max_num_failures |
Stops generation after too many failures |
seed |
Makes a generation run easier to reproduce |
source_dataset_path |
Path to the annotated source dataset |
generation_path |
Path to the generated HDF5 output |
generation_num_trials |
Number of trials to generate, for example 1000 |
generation_select_src_per_subtask |
Selects source data per subtask; useful only when temporal constraints are well defined |
generation_interpolate_from_last_target_pose |
Interpolates from the previous target pose to reduce waypoint jumps |
use_skillgen |
Enables the advanced skill-generation path |
use_navigation_controller |
Enables navigation control for locomanipulation generation |
SubTaskConfig is where generation quality is usually won or lost. A simplified skeleton:
from isaaclab.envs.mimic_env_cfg import DataGenConfig, SubTaskConfig, MimicEnvCfg
cfg = MimicEnvCfg()
cfg.datagen_config = DataGenConfig(
name="nut_pour_gr1t2_qa",
generation_guarantee=True,
generation_keep_failed=False,
max_num_failures=80,
seed=42,
generation_num_trials=1000,
)
cfg.subtask_configs = {
"left_hand": [
SubTaskConfig(
object_ref="red_beaker",
subtask_term_signal="left_grasp_beaker",
selection_strategy="nearest_neighbor_object",
action_noise=0.02,
num_interpolation_steps=6,
),
],
"right_hand": [
SubTaskConfig(
object_ref="yellow_bowl",
subtask_term_signal="right_support_bowl",
selection_strategy="nearest_neighbor_object",
action_noise=0.015,
num_interpolation_steps=8,
),
],
}
This is a reading skeleton, not a promise that your exact nut pouring task uses object keys named red_beaker or yellow_bowl. In a real environment, open the task config and use the actual object names. The stable idea is that subtasks should refer to the objects they depend on.
| Parameter | When to tune it | Risk if wrong |
|---|---|---|
object_ref |
The subtask depends on a specific object pose | If it is None, object-based nearest-neighbor selection cannot work |
selection_strategy='nearest_neighbor_object' |
Object poses are randomized and source segments should be selected by object similarity | If the object reference is wrong, the selected source segment is irrelevant |
action_noise |
You want robustness to small action perturbations | Too much noise breaks grasping and pouring |
num_interpolation_steps |
Motion jumps between subtask segments | Too many steps can make motion slow and lengthen the horizon |
subtask_term_offset_range |
Annotation ends too close to the success trigger | Large offsets can bleed into the next subtask |
Isaac Lab's source notes that nearest_neighbor_object and nearest_neighbor_robot_distance require object_ref to be set, and that these strategies often outperform random selection when the object reference is valid. For beginners, this is the reason not to copy a Mimic config blindly and only change the task name.
Step 4: Generate 1000 trials
After source demos pass QA and annotation is clean, generate the dataset:
./isaaclab.sh -p scripts/imitation_learning/isaaclab_mimic/generate_dataset.py \
--device cpu \
--headless \
--enable_pinocchio \
--enable_cameras \
--rendering_mode balanced \
--task Isaac-NutPour-GR1T2-Pink-IK-Abs-Mimic-v0 \
--generation_num_trials 1000 \
--num_envs 5 \
--input_file ./datasets/dataset_annotated_gr1_nut_pouring.hdf5 \
--output_file ./datasets/generated_dataset_gr1_nut_pouring.hdf5
At this stage, "the file exists" is not enough. Record the generation run in a manifest:
task: Isaac-NutPour-GR1T2-Pink-IK-Abs-Mimic-v0
source_dataset: dataset_annotated_gr1_nut_pouring.hdf5
generated_dataset: generated_dataset_gr1_nut_pouring.hdf5
source_demos: 5
target_trials: 1000
num_envs: 5
seed: 42
enable_cameras: true
enable_pinocchio: true
qa_owner: data_supervisor_01
qa_status: pending
Then produce a minimal QA report:
python - <<'PY'
import h5py
import numpy as np
path = "./datasets/generated_dataset_gr1_nut_pouring.hdf5"
lengths = []
with h5py.File(path, "r") as f:
demos = sorted(f["data"].keys())
for key in demos:
n = int(f["data"][key].attrs.get("num_samples", 0))
lengths.append(n)
print("demos:", len(lengths))
print("min/mean/max length:", min(lengths), np.mean(lengths), max(lengths))
print("p95 length:", np.percentile(lengths, 95))
PY
If the 95th percentile episode length is roughly twice the median, the dataset may contain too many wandering, paused, or over-corrected trials. If you requested 1000 trials but received far fewer, inspect the generation logs before training.
Step 5: Train BC with robomimic
Isaac Lab uses robomimic/train.py to train the visuomotor BC policy for nut pouring:
./isaaclab.sh -p scripts/imitation_learning/robomimic/train.py \
--task Isaac-NutPour-GR1T2-Pink-IK-Abs-v0 \
--algo bc \
--normalize_training_actions \
--dataset ./datasets/generated_dataset_gr1_nut_pouring.hdf5
The --normalize_training_actions flag matters. Isaac Lab saves the normalization parameters in the model log directory, and the same values must be passed back to play.py. Without this, the policy may train but produce actions at the wrong scale during rollout.
Robomimic treats an HDF5 dataset as the minibatch source for imitation learning and offline RL. BC is the supervised baseline: learn a mapping from observations to actions. For nut pouring, the observations can include camera images, hand poses, joint positions, object poses, and other fields exposed by the Isaac Lab task; actions correspond to target base, hand, and finger commands for the environment.
Evaluate multiple checkpoints, not just the final one:
./isaaclab.sh -p scripts/imitation_learning/robomimic/play.py \
--task Isaac-NutPour-GR1T2-Pink-IK-Abs-v0 \
--visualizer kit \
--device cpu \
--enable_cameras \
--num_rollouts 50 \
--horizon 350 \
--norm_factor_min <NORM_FACTOR_MIN> \
--norm_factor_max <NORM_FACTOR_MAX> \
--checkpoint /PATH/TO/desired_model_checkpoint.pth
For nut pouring, Isaac Lab recommends 600 epochs on 1000 generated demonstrations and evaluating checkpoints between epochs 300 and 600. For locomanipulation, the recommendation is 2000 epochs and checkpoint selection between epochs 1000 and 2000.
Success-rate reference table
Use this as a QA anchor, not a product guarantee:
| Workflow | Dataset | Expected metric | How to interpret it |
|---|---|---|---|
| GR1T2 nut pouring visuomotor | 1000 generated demos | BC success 50-60% over 50 rollouts | If much lower, inspect cameras, annotation, and action normalization |
| G1 locomanipulation pick-and-place | 1000 generated demos | Generation success 65-82% | If lower, inspect source demos, subtask splits, object_ref, and interpolation |
| G1 locomanipulation BC | 1000 generated demos, 2000 epochs | BC success 75-85% over 50 rollouts | A useful reference, not a universal SLA |
If nut pouring reaches only 20%, the first explanation should not be "the model is too small." More often, the dataset is wrong: jerky demos, incorrect action indices, missing cameras, long horizons, or missing normalization parameters during rollout.
QA workflow for a humanoid data center
For a production data center, I would gate the dataset like this:
| Gate | Owner | Pass condition |
|---|---|---|
| Raw demo QA | Operator + data supervisor | Successful, smooth demos with no long pauses and a short end buffer |
| Annotation QA | Data supervisor | Complete subtasks, no duplicate right-EEF action indices, no severe replay errors |
| Generation QA | Simulation engineer | Enough trials, success rate in the expected range, sane length distribution |
| Training QA | ML engineer | No NaNs, improving rollout metrics, multiple checkpoints compared |
| Release QA | Tech lead | Manifest, config, seed, source dataset, and checkpoint are versioned |
A generated dataset should be promoted to the training data lake only when it has a manifest like this:
qa_status: approved
approved_by: simulation_lead
date: 2026-06-10
source_demo_count: 5
generated_demo_count: 1000
generation_success_rate: 0.78
train_algo: robomimic_bc
rollout_count: 50
best_checkpoint_epoch: 520
rollout_success_rate: 0.56
known_limitations:
- "Fails when red beaker starts near workspace edge"
- "Sensitive to camera exposure in reflective bowl scenes"
The benefit shows up in part 6 on evaluation and scaling: you do not have to guess which dataset trained which policy. When a policy fails, you can trace the model checkpoint back to the generated HDF5, the annotated demonstrations, the raw MCAP logs, and the operator notes.
Quick troubleshooting
| Symptom | Common cause | Check |
|---|---|---|
| Generation keeps failing | Source demos do not replay cleanly, or subtasks are too long | Replay demos, reduce subtask count, recollect smoother demos |
| Low generation success despite good-looking demos | Wrong object_ref, or random selection under heavy pose randomization |
Open the task config and use nearest_neighbor_object for object-dependent subtasks |
| Robot jumps between segments | num_interpolation_steps is too low |
Increase from 5 to 8-10 for large pose transitions |
| Grasp degrades after adding noise | action_noise is too high |
Lower noise for grasp and pour subtasks |
| BC trains but rollout is poor | Missing action normalization parameters, or final checkpoint is not best | Use normalization_params.txt and evaluate several checkpoints |
| Camera observations are missing | --enable_cameras was omitted during annotation or generation |
Inspect HDF5 keys and run a dataloader smoke test |
References
- Isaac Lab Mimic Teleoperation and Imitation Learning
- Isaac Lab
mimic_env_cfg.py - robomimic Getting Started
- robomimic Implemented Algorithms