
Humanoid Data Map 2026

A beginner-friendly map of humanoid data: raw files, LeRobot, Isaac Lab, Humanoid Everyday, and who controls value at each layer.

Nguyễn Anh Tuấn · June 10, 2026 · 15 min read

Why start from concrete dataset surfaces?

When people ask "who owns humanoid robot data?", the discussion often jumps straight to legal ownership: the robot owner, the teleoperator, the company hosting the files, or the team training the model. That framing is too abstract for beginners because humanoid data is not one object. It is a chain of data surfaces: raw recordings from robots or humans, synchronized camera files, training formats such as LeRobot, simulated demonstrations from Isaac Lab, model checkpoints, evaluation logs, and sometimes videos of real robots running in a cloud benchmark.

This first article in the series takes a more practical route: look at the files first, then discuss control over value. If you open a dataset folder and see episode_0.hdf5, episode_0.svo2, data/chunk-000/file-000.parquet, videos/observation.images.front/chunk-000/file-000.mp4, or generated_dataset_gr1_nut_pouring.hdf5, you should be able to identify where that file sits in the value chain.
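As a rough illustration of that file-spotting habit, the sketch below maps file names to coarse positions in the value chain. The patterns and labels are assumptions drawn from the example file names in this article, not a standard; a real dataset will need its own rules.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# (pattern on the file name, label) pairs, checked in order.
# Patterns and labels are illustrative assumptions based on this article's examples.
RULES = [
    ("generated_dataset_*.hdf5", "synthetic demonstrations"),
    ("episode_*.svo2", "raw capture"),
    ("episode_*.hdf5", "raw capture"),
    ("*.parquet", "processed training format"),
    ("*.mp4", "processed training format"),
]


def classify(path: str) -> str:
    """Return a coarse value-chain label for a dataset file path."""
    name = PurePosixPath(path).name  # match on the file name only
    for pattern, label in RULES:
        if fnmatchcase(name, pattern):
            return label
    return "unknown"


for example in [
    "episode_0.svo2",
    "data/chunk-000/file-000.parquet",
    "generated_dataset_gr1_nut_pouring.hdf5",
]:
    print(f"{example} -> {classify(example)}")
```

Rule order matters: the generated_dataset_* pattern is checked before the generic episode patterns so that a synthetic HDF5 file is not mistaken for a raw recording just because it shares the extension.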

This is not legal advice. The goal is technical classification. By the end, you should be able to reason about which layer is controlled by the hardware owner, which layer depends on the teleoperator, which layer is governed by the dataset host, which layer creates leverage for the model trainer, and which layer belongs to the cloud evaluation platform.

If you have already read LeRobot v0.5 with G1 whole-body control or GR00T-VisualSim2Real for G1 in Isaac Lab, treat this article as the data map underneath those tutorials.

Series roadmap

  1. Humanoid Data Map 2026 — this article, starting from files and mapping value control by layer.
  2. VR teleoperation and operator data — why PICO, Apple Vision Pro, and hand tracking turn human work into robot data assets.
  3. View alignment and action alignment — the conversion layer that makes human video usable for humanoids.
  4. Simulation and synthetic demonstrations — Isaac Lab, Mimic, domain randomization, and ownership of generated demos.
  5. Human video and robot-free data — when videos of people become pretraining data for VLA policies.
  6. The VLA stack and downstream control — from datasets to checkpoints, inference APIs, benchmarks, and product moats.

Four concrete examples that define the map

Instead of speaking in generalities, start with four dataset surfaces that matter in humanoid research in 2026.

| Data surface | Example file or scale | What it contains | Who usually controls it directly? |
|---|---|---|---|
| Raw egocentric humanoid data | episode_*.hdf5 + episode_*.svo2 in EgoHumanoid | Skeletons, hand pose, timestamps, raw ZED video, later merged into image-bearing HDF5 | The collecting lab, hardware owner, and data pipeline team |
| LeRobotDataset | Parquet + MP4, sometimes image files for debugging/export | State, action, and timestamp in Parquet; camera data in MP4; metadata, tasks, stats | Dataset host, schema owner, Hub uploader |
| Isaac Lab synthetic dataset | generated_dataset_gr1_nut_pouring.hdf5 | 1000 demonstrations of a GR1 humanoid for nut pouring/placing, generated with Isaac Lab Mimic | Scene owner, asset owner, generation script owner, compute owner |
| Humanoid Everyday | 10.3k trajectories, over 3M frames, 260 tasks | RGB, depth, LiDAR, tactile, state/action, language annotations, Apple Vision Pro teleop streams | Dataset authors, G1/H1 robot owners, cloud evaluation operator |

The key point is that all four rows are "humanoid data", but their economics are different. episode_0.svo2 is a raw sensor recording. A LeRobot Parquet file is a normalized training table. An Isaac Lab HDF5 file is a generated demonstration dataset in simulation. Humanoid Everyday is not just a folder; it is an ecosystem of dataset, loader, benchmark, and cloud evaluation. Each layer has a different gatekeeper.

Layer 1: Raw files from humans or robots

EgoHumanoid is a useful starting point because it makes the problem explicit: use egocentric human demonstrations to improve humanoid loco-manipulation, then co-train a VLA policy with a smaller amount of robot data. The official repository describes a pipeline that collects data from Unitree G1, PICO VR, and ZED Mini, processes view/action alignment, fine-tunes a π₀.₅-based VLA model, and deploys policies back to a real robot. Sources: OpenDriveLab/EgoHumanoid and arXiv 2602.10106.

At the file level, EgoHumanoid's human data pipeline expects raw data to be organized in dated batch folders:

raw_data/
  2025-01-15_batch1/
    episode_0.hdf5
    episode_0.svo2
    episode_1.hdf5
    episode_1.svo2
  2025-01-15_batch2/
    ...

The episode_*.hdf5 files contain structured streams such as body_pose, left_hand_pose, right_hand_pose, and local_timestamps_ns. The episode_*.svo2 files are ZED camera recordings. The pipeline then reorders episodes, computes navigation commands from body pose, downsamples streams, reads frames from SVO2, aligns timestamps, compresses left/right camera images as JPEG inside the final HDF5, and computes binary hand_status for left and right hands. Technical source: EgoHumanoid Human Data Processing Pipeline.
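The timestamp-alignment step can be sketched as a nearest-neighbor lookup: for each camera frame, pick the pose sample whose timestamp is closest. This is a minimal sketch under that assumption, not the EgoHumanoid implementation; the function name and the toy timestamps are hypothetical.

```python
from bisect import bisect_left


def nearest_indices(query_ts_ns: list[int], reference_ts_ns: list[int]) -> list[int]:
    """For each query timestamp, return the index of the closest reference timestamp.

    Both lists must be sorted in ascending order, and the reference list
    must not be empty. Ties go to the earlier reference sample.
    """
    if not reference_ts_ns:
        raise ValueError("reference_ts_ns must not be empty")
    last = len(reference_ts_ns) - 1
    result = []
    for t in query_ts_ns:
        pos = bisect_left(reference_ts_ns, t)
        if pos == 0:
            result.append(0)  # query is before the first reference sample
        elif pos > last:
            result.append(last)  # query is after the last reference sample
        else:
            before = reference_ts_ns[pos - 1]
            after = reference_ts_ns[pos]
            result.append(pos if after - t < t - before else pos - 1)
    return result


# Toy example: pose samples every 10 ns, camera frames at irregular times.
pose_ts = [0, 10, 20, 30, 40]
frame_ts = [1, 14, 26, 39, 55]
print(nearest_indices(frame_ts, pose_ts))
```

A real pipeline would usually also reject matches whose time gap exceeds a threshold, since a "nearest" sample can still be far away when one stream drops data.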

For beginners, the rule is simple: raw files are the layer closest to the real world. If a company owns the robot, camera, VR headset, workstation, and workspace, it usually controls whether those raw files can be produced at all. If the teleoperator is an employee or contractor, labor contracts and data policies determine their rights. If the teleoperator is an end user, privacy and consent become much harder to handle.

A minimal chain looks like this:

Human/robot activity
  -> sensor recording
  -> episode_0.hdf5      # pose, hand, timestamp, command
  -> episode_0.svo2      # camera stream from ZED
  -> processed_episode.hdf5
  -> LeRobot format
  -> model training

The value at this layer comes from rarity: egocentric viewpoint, real objects, real lighting, real failures, human hand behavior, and annoying edge cases that simulation has not captured. But raw data is also the riskiest layer. It may contain faces, private rooms, factory layouts, computer screens, license plates, voices, or personally identifying work habits.

Layer 2: Training formats such as LeRobotDataset

Raw files are not necessarily good training files. Model trainers need fast sampling, clear metadata, stable episode segmentation, consistent task labels, and portable loaders. That is why LeRobotDataset has become an important value layer.

According to Hugging Face's LeRobotDataset v3.0 documentation, the format decouples efficient storage from the user-facing API. Low-dimensional, high-frequency streams such as states, actions, and timestamps are stored in Apache Parquet. Camera frames are concatenated and encoded into MP4 files, sharded by camera to keep file counts manageable. Metadata under meta/ describes schema, FPS, statistics, tasks, and episode offsets. Sources: LeRobotDataset v3.0 documentation and the LeRobot datasets v3 blog.

A typical LeRobotDataset v3 layout looks like this:

my-humanoid-dataset/
  meta/
    info.json
    stats.json
    tasks.jsonl
    episodes/
      chunk-000/file-000.parquet
  data/
    chunk-000/file-000.parquet
  videos/
    observation.images.front/
      chunk-000/file-000.mp4
    observation.images.wrist/
      chunk-000/file-000.mp4

This creates a new control layer. The party that owns the raw files may not be the party that controls the standardized dataset. Once data is uploaded to the Hugging Face Hub or an internal object store, the dataset host controls access tokens, licenses, versions, visibility, takedowns, mirrors, and metadata. A model trainer may never see episode_0.svo2, yet still train a policy if the Parquet, MP4, and metadata are good enough.
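Before relying on a standardized copy, it can help to confirm that a downloaded folder actually has the layout shown above. The sketch below checks a few entries from that layout; the list of expected entries is an assumption taken from this article and should be adjusted to the dataset version you actually use.

```python
import tempfile
from pathlib import Path

# Entries taken from the layout shown in this article; treat the list as an
# assumption rather than an official specification.
EXPECTED_ENTRIES = ["meta/info.json", "meta/stats.json", "data", "videos"]


def missing_entries(root: Path) -> list[str]:
    """Return the expected entries that do not exist under root."""
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


# Demo on a throwaway folder that only contains meta/info.json and data/.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "meta").mkdir()
    (demo_root / "meta" / "info.json").write_text("{}")
    (demo_root / "data").mkdir()
    demo_missing = missing_entries(demo_root)

print(demo_missing)
```

An empty result only says the folder has the right shape. It says nothing about whether the contents are valid or whether you are licensed to use them.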

If you are building a humanoid startup, do not treat "convert to LeRobot" as a minor cleanup step. This is where you decide:

| Schema decision | Technical effect | Control effect |
|---|---|---|
| Camera name: observation.images.front or head_rgb | Whether model configs can read it | Whoever defines the convention lowers their own integration cost |
| State/action dimension | Whether G1, H1, GR1, or another robot can reuse it | The action-space owner controls downstream portability |
| FPS and downsampling | I/O, latency, action smoothness | Fine teleoperator behavior may be erased |
| Task text | Whether VLA models understand instructions | Annotators add semantic value |
| Normalization stats | More stable training | Stats can reveal internal data distributions |
| License and access | Sharing or closed training | Dataset hosts control downstream use |
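The "FPS and downsampling" row is easy to underestimate. The toy example below, with made-up values, shows how naive stride-based downsampling from 30 Hz to 10 Hz can drop a short operator correction entirely. It is a sketch of the failure mode, not a description of how any particular converter downsamples.

```python
def downsample(signal: list[float], stride: int) -> list[float]:
    """Keep every stride-th sample, with no filtering or averaging."""
    return signal[::stride]


# Hypothetical 30 Hz command stream: one second of zeros with a short
# two-frame correction at frames 10 and 11 (values are made up).
signal_30hz = [0.0] * 30
signal_30hz[10] = 0.4
signal_30hz[11] = 0.2

signal_10hz = downsample(signal_30hz, stride=3)  # 30 Hz -> 10 Hz

print(len(signal_10hz))
print(max(signal_30hz))
print(max(signal_10hz))
```

Frames 10 and 11 fall between the kept indices (9 and 12), so the 10 Hz stream is all zeros: the correction the teleoperator made is no longer in the dataset.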

A quick inspection script:

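# Assumes the lerobot package is installed and that "org/humanoid-demo" is
# replaced with a real dataset repo id. Feature keys depend on the dataset schema.
from lerobot.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset("org/humanoid-demo")  # placeholder repo id
sample = dataset[100]  # a single frame, returned as a dict

print(sample.keys())  # which features this dataset exposes
print(sample["observation.state"].shape)  # state dimension
print(sample["action"].shape)  # action dimension
print(sample["observation.images.front"].shape)  # decoded frame from the "front" camera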

Once this works, the value is no longer just in the raw recording. The value is in turning messy episodes into a stable training API. That is why organizations without a large robot fleet can still create leverage by standardizing, validating, indexing, and distributing robotics datasets.
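A stable training API also means the metadata can be checked before any training run starts. The sketch below compares a metadata excerpt against what a training config expects. It assumes info.json exposes an fps field and a features mapping with a shape per key, which matches my understanding of the format but should be verified against your dataset version; the 29-dimensional shapes and the expected values are hypothetical.

```python
import json

# Hypothetical excerpt of meta/info.json; field names are an assumption
# and the dimensions are made up for illustration.
INFO_JSON = """
{
  "fps": 30,
  "features": {
    "observation.state": {"dtype": "float32", "shape": [29]},
    "action": {"dtype": "float32", "shape": [29]}
  }
}
"""


def schema_problems(info: dict, expected_fps: int, expected_action_dim: int) -> list[str]:
    """Compare dataset metadata with what the training config expects."""
    problems = []
    if info.get("fps") != expected_fps:
        problems.append(f"fps is {info.get('fps')}, expected {expected_fps}")
    action = info.get("features", {}).get("action")
    if action is None:
        problems.append("no 'action' feature declared")
    elif list(action["shape"]) != [expected_action_dim]:
        problems.append(
            f"action shape is {action['shape']}, expected [{expected_action_dim}]"
        )
    return problems


info = json.loads(INFO_JSON)
print(schema_problems(info, expected_fps=30, expected_action_dim=29))
print(schema_problems(info, expected_fps=50, expected_action_dim=43))
```

The first call reports no problems; the second reports both an FPS mismatch and an action-dimension mismatch, which is the kind of error worth catching before a policy reaches a real robot.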

Layer 3: Synthetic demonstrations in Isaac Lab

Humanoid data does not only come from real robots. Isaac Lab Mimic can generate demonstrations in simulation, which are then used to train policies with Robomimic or to post-train VLA models. The Isaac Lab documentation gives a concrete example: download generated_dataset_gr1_nut_pouring.hdf5 (about 12 GB) and place it under IsaacLab/datasets/. The file holds 1000 demonstrations of a GR1 humanoid performing a pouring/placing task, generated with Isaac Lab Mimic for the Isaac-NutPour-GR1T2-Pink-IK-Abs-Mimic-v0 task. Source: Isaac Lab teleoperation and imitation learning documentation.

The task is not just "pick up an object." The robot first picks up the red beaker, pours its contents into the yellow bowl, drops the beaker into the blue bin, and places the yellow bowl on the white scale. The success criteria are also multi-condition: the beaker must be in the bin, the nut must be in the bowl, and the bowl must be on the scale. That makes the dataset valuable for visuomotor learning because it combines perception, manipulation, and long-horizon sequencing.
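The multi-condition success rule can be written as a conjunction of three checks. The sketch below is an illustration only: the class and field names are hypothetical and this is not Isaac Lab's own termination code.

```python
from dataclasses import dataclass


@dataclass
class NutPourState:
    """Hypothetical end-of-episode flags; not an Isaac Lab data structure."""

    beaker_in_bin: bool
    nut_in_bowl: bool
    bowl_on_scale: bool


def unmet_conditions(state: NutPourState) -> list[str]:
    """Return the names of the success conditions that are not satisfied."""
    return [name for name, satisfied in vars(state).items() if not satisfied]


def is_success(state: NutPourState) -> bool:
    """The episode succeeds only if every condition holds."""
    return not unmet_conditions(state)


print(is_success(NutPourState(True, True, True)))
print(unmet_conditions(NutPourState(True, False, True)))
```

Reporting which condition failed, rather than a single boolean, is what makes such episodes useful for filtering and debugging generated demonstrations.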

A shortened Isaac Lab command:

./isaaclab.sh -p scripts/imitation_learning/isaaclab_mimic/generate_dataset.py \
  --device cpu \
  --headless \
  --enable_pinocchio \
  --enable_cameras \
  --rendering_mode balanced \
  --task Isaac-NutPour-GR1T2-Pink-IK-Abs-Mimic-v0 \
  --generation_num_trials 1000 \
  --num_envs 5 \
  --input_file ./datasets/dataset_annotated_gr1_nut_pouring.hdf5 \
  --output_file ./datasets/generated_dataset_gr1_nut_pouring.hdf5

Synthetic data complicates ownership. There is no direct teleoperator in every final frame, but many contributors still shape the dataset: the person who built the USD scene, the team that modeled the GR1 robot, the person who wrote the task, the annotator who marked subtasks, the engineer who ran generation, the owner of the object assets, the group paying for compute, and the simulator/toolchain license.

A practical classification:

| Synthetic data component | Value created | Common controller |
|---|---|---|
| Robot asset and controller | Valid action limits, dynamics, joint behavior | Robot vendor, lab, simulator vendor |
| Scene/object asset | Environment and object diversity | Asset owner, simulation team |
| Task definition | Rewards, success criteria, subtask graph | Research team or benchmark owner |
| Demonstration generator | Rollout volume, quality filtering | Isaac Lab Mimic operator |
| Output HDF5 | Direct training dataset | Storage and access owner |
| Conversion to LeRobot | Compatibility with VLA stacks | Dataset engineer or model team |

In short: synthetic datasets are not "ownership-free" just because they do not record real people. They trade privacy risk for license risk, asset provenance risk, and benchmark leakage risk.
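One lightweight way to keep provenance from getting lost is to store a small manifest next to each generated file. The sketch below is a hypothetical record format, not part of Isaac Lab; the field names and the "example-lab" value are made up for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SyntheticProvenance:
    """Hypothetical provenance record for one generated dataset file."""

    output_file: str
    task_id: str
    generator: str
    num_demos: int
    robot_asset_owner: str = "UNKNOWN"
    scene_asset_owner: str = "UNKNOWN"
    compute_owner: str = "UNKNOWN"
    license_notes: list[str] = field(default_factory=list)

    def unresolved(self) -> list[str]:
        """Return the ownership fields that nobody has filled in yet."""
        return [key for key, value in asdict(self).items() if value == "UNKNOWN"]


record = SyntheticProvenance(
    output_file="generated_dataset_gr1_nut_pouring.hdf5",
    task_id="Isaac-NutPour-GR1T2-Pink-IK-Abs-Mimic-v0",
    generator="Isaac Lab Mimic",
    num_demos=1000,
    compute_owner="example-lab",  # hypothetical value
)

print(json.dumps(asdict(record), indent=2))
print(record.unresolved())
```

Defaulting the ownership fields to "UNKNOWN" makes gaps visible: here the robot and scene asset owners are still unresolved, which is exactly the asset provenance risk described above.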

Layer 4: Ecosystem datasets such as Humanoid Everyday

Humanoid Everyday shows a different direction: the dataset is not merely a folder, but an ecosystem of data, code, policy analysis, and cloud evaluation. The arXiv abstract describes 10.3k trajectories, over 3 million frames, and 260 tasks across 7 categories. The data spans RGB, depth, LiDAR, and tactile inputs with language annotations, and was collected through a human-supervised teleoperation pipeline. The paper also introduces a cloud-based evaluation platform that lets researchers deploy policies in a controlled setting and receive performance feedback. Source: Humanoid Everyday arXiv 2510.08807.

The Humanoid Everyday repository adds practical details: demonstrations are recorded on Unitree G1 and H1 at 30Hz, across loco-manipulation, basic manipulation, tool use, deformable objects, articulated objects, and human-robot interaction. Low-dimensional modalities include joint states, IMU, odometry/kinematics, G1 hand pressure sensors, teleoperator hands/head actions from Apple Vision Pro, and inverse-kinematics data. High-dimensional modalities include 480x640 RGB, 480x640 depth, and LiDAR point clouds of roughly 6k points per step. The repository also provides a dataloader and a conversion script to LeRobot. Source: physical-superintelligence-lab/Humanoid-Everyday.
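To get a feel for why hosting such a dataset is a real responsibility, here is a back-of-the-envelope estimate of the uncompressed high-dimensional data. The resolution, rate, point count, and frame count come from the figures above; the byte-per-value encodings are my assumptions (8-bit RGB, 16-bit depth, 32-bit float xyz points) and are not stated in the source, as is the simplification that each frame carries one RGB image, one depth image, and one LiDAR scan.

```python
WIDTH, HEIGHT = 640, 480
FPS = 30
LIDAR_POINTS = 6_000      # "roughly 6k points per step"
TOTAL_FRAMES = 3_000_000  # "over 3 million frames", used as a lower bound

# Assumed encodings (not stated in the source):
rgb_bytes = WIDTH * HEIGHT * 3      # 3 channels, 1 byte each
depth_bytes = WIDTH * HEIGHT * 2    # 16-bit depth values
lidar_bytes = LIDAR_POINTS * 3 * 4  # x, y, z as 32-bit floats

bytes_per_step = rgb_bytes + depth_bytes + lidar_bytes
bytes_per_second = bytes_per_step * FPS
total_bytes = bytes_per_step * TOTAL_FRAMES

print(f"{bytes_per_step / 1e6:.2f} MB per step")
print(f"{bytes_per_second / 1e6:.1f} MB/s at {FPS} Hz")
print(f"{total_bytes / 1e12:.2f} TB for {TOTAL_FRAMES:,} frames")
```

Under these assumptions the raw streams come to about 1.61 MB per step, 48.2 MB/s, and roughly 4.82 TB in total before compression. The actual files will differ once compression and extra cameras are taken into account, but the order of magnitude explains why storage, bandwidth, and access policy sit with whoever hosts the data.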

The important shift is cloud evaluation. When a group owns real robots and lets the community send policies to a benchmark, it is no longer just a dataset host. It becomes the evaluation platform owner. It can control:

| Evaluation layer | Generated data | Value |
|---|---|---|
| Input stream from the real robot | Current RGB/depth/state | Lets policies run without owning the robot |
| Policy action stream | Commands, latency, failure modes | Reveals behavior style and control quality |
| Rollout video | Evidence of success or failure | Powers leaderboards, papers, demos |
| Metrics and protocol | Success rate, time, safety reset | Determines who is considered best |
| Operations logs | Crashes, timeouts, interventions | Valuable debugging and productization data |

In humanoids, evaluation data can be nearly as valuable as training data. Offline training loss is not enough. A real humanoid has balance, contact, latency, motor heat, safety intervention, and objects that are never perfectly placed. Whoever owns the real benchmark can see the failure distribution of many models without necessarily publishing all logs.
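The "failure distribution" is just an aggregation over rollout logs, which is why the party holding the logs holds the insight. The sketch below uses made-up records; the field names, policy names, and failure labels are hypothetical and do not reflect any real benchmark's log format.

```python
from collections import Counter

# Hypothetical rollout records; all names and values are made up.
rollouts = [
    {"policy": "policy_a", "success": True, "failure": None},
    {"policy": "policy_a", "success": False, "failure": "lost balance"},
    {"policy": "policy_a", "success": False, "failure": "timeout"},
    {"policy": "policy_b", "success": True, "failure": None},
    {"policy": "policy_b", "success": False, "failure": "lost balance"},
    {"policy": "policy_b", "success": True, "failure": None},
]


def success_rate(records: list[dict], policy: str) -> float:
    """Fraction of successful rollouts for one policy."""
    runs = [r for r in records if r["policy"] == policy]
    return sum(r["success"] for r in runs) / len(runs)


def failure_distribution(records: list[dict]) -> Counter:
    """Count failure reasons across all policies."""
    return Counter(r["failure"] for r in records if not r["success"])


print(f"policy_a: {success_rate(rollouts, 'policy_a'):.2f}")
print(f"policy_b: {success_rate(rollouts, 'policy_b'):.2f}")
print(failure_distribution(rollouts).most_common())
```

A submitter typically sees only their own success rate. The platform operator can run the second function across every submission and learn which failure modes are common to the whole field.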

A five-layer map of value control

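From the four examples above, we can define a general map. The layers here are numbered by position in the value chain rather than by the four example sections above: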

Layer 0: Physical world
  people, robot, room, object, lighting, safety rig

Layer 1: Raw capture
  HDF5, SVO2, ROS bag, camera stream, VR hand/head tracking

Layer 2: Processed dataset
  synchronized HDF5, LeRobot Parquet + MP4/images, metadata, stats

Layer 3: Model training
  sampling code, normalization, checkpoints, VLA adapters, finetune recipes

Layer 4: Evaluation and deployment
  cloud benchmark, rollout videos, success metrics, inference API, logs

And here is the "who controls value?" table:

| Actor | What they control | Value they hold | Risk if ignored |
|---|---|---|---|
| Hardware owner | Robot, camera, VR headset, workspace, safety rig | Ability to create exclusive raw data | Dataset cannot be reproduced if robot access disappears |
| Teleoperator | Skill, correction strategy, speed, operating style | Demonstration quality and long-tail behavior | Ambiguous contracts/consent, untracked operator bias |
| Dataset host | Storage, schema, license, version, access | Distribution and standardization power | Downstream users may not know provenance or license constraints |
| Model trainer | Recipe, compute, checkpoint, normalization, model card | Turning data into capability | Model may absorb sensitive data or violate data terms |
| Cloud evaluation platform | Real robot benchmark, protocol, logs, leaderboard | Real-world measurement and failure visibility | Benchmark can become an opaque gatekeeper |

This map helps you read dataset announcements more carefully. When a paper says "we release data", ask: raw or processed? Does it include source videos or only state/action? Is commercial use allowed? Can you download everything or must you use a cloud API? Are evaluation logs included? Is there a task spreadsheet but no asset provenance? Is there a conversion script? Are normalization statistics published?

Beginner checklist for humanoid datasets

Use this checklist before training any policy:

1. Is the dataset from real robots, humans, simulation, or a mixture?
2. What are the source files: HDF5, SVO2, ROS bag, Parquet, MP4, PNG, PCD?
3. Are camera streams and state/action streams timestamp-aligned?
4. Is task text consistent with the episode?
5. Is the action space joint targets, end-effector poses, base velocity, or mixed action?
6. Are FPS, robot type, camera intrinsics, and normalization stats documented?
7. Does the license permit research, commercial use, redistribution, or closed-model fine-tuning?
8. Does the data include people, faces, voices, private spaces, or factory information?
9. Is there a separate benchmark or cloud evaluation platform?
10. Can the model trainer keep checkpoints and rollout logs, or must they share them back?

If questions 2, 3, 5, and 7 are unclear, you do not really know what you are using. With a fixed robot arm, a mismatch may only reduce performance. With a humanoid, a wrong action space or timestamp alignment can cause falls, collisions, or invalid evaluation.
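The blocking questions can be turned into a simple gate that runs before training. The sketch below is one hypothetical way to encode it; the short labels and the example answers are made up.

```python
# Blocking questions from the checklist: 2, 3, 5, and 7.
# The short labels are paraphrases chosen for this sketch.
BLOCKING = {
    2: "source file types",
    3: "timestamp alignment",
    5: "action space",
    7: "license",
}


def unanswered_blocking(answers: dict[int, str]) -> list[str]:
    """Return labels of blocking questions with a missing or empty answer."""
    return [
        label
        for number, label in BLOCKING.items()
        if not answers.get(number, "").strip()
    ]


# Hypothetical answers for a dataset under review.
answers = {
    1: "real robot plus human video",
    2: "HDF5, SVO2",
    3: "",
    5: "joint targets",
}

print(unanswered_blocking(answers))
```

Here timestamp alignment and license are still open, so by the rule above this dataset is not ready to train on, however good the rest of the documentation looks.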

Conclusion: humanoid data is a chain of control

In 2026, the advantage in humanoid robotics is not only the number of robots you own. It is the ability to move from trustworthy raw files, through standard training formats, into strong checkpoints, and finally prove capability through real-robot evaluation. Each step adds value and shifts control.

EgoHumanoid reminds us that human egocentric data can scale environmental diversity beyond lab teleoperation, but only with alignment. LeRobotDataset reminds us that schema and metadata can turn scattered recordings into trainable assets. Isaac Lab reminds us that synthetic demonstrations have their own provenance. Humanoid Everyday reminds us that a large dataset plus cloud evaluation can become benchmark infrastructure, not just a download link.

The next article will focus on VR teleoperation and operator data: when a person wears a headset to control G1/H1, what counts as robot data, what counts as labor data, and why that distinction directly affects the value of a VLA stack.

Technical sources

| Topic | Source |
|---|---|
| EgoHumanoid framework | OpenDriveLab/EgoHumanoid, arXiv 2602.10106 |
| EgoHumanoid raw episode_*.hdf5 + episode_*.svo2 | Human Data Processing Pipeline |
| LeRobotDataset v3 | Hugging Face docs, LeRobot v3 blog |
| Isaac Lab GR1 nut-pouring dataset | Isaac Lab imitation learning docs |
| Humanoid Everyday | arXiv 2510.08807, GitHub repo |

Nguyễn Anh Tuấn

Robotics & AI Engineer. Building VnRobo — sharing knowledge about robot learning, VLA models, and automation.
