unifolm-vla + Unitree G1 (Post 3): data pipeline — JSON → LeRobot → HDF5 → RLDS

This is post 3 of the unifolm-vla + Unitree G1 series. The previous post collected 50+ demos as JSON files. This post converts them, in three steps, into a format that is ready for training.

That sounds like a lot of steps, but each one is a single command. The pipeline is split this way so that it can be reused across different robots and training frameworks.

Pipeline overview

[xr_teleoperate output]        [unitree_IL_lerobot]
  JSON + video files    ──→    LeRobot V2.1 format
        ↓
[unifolm-vla prepare_data]
  LeRobot → HDF5               (more compact)
        ↓
[unifolm-vla prepare_data]
  HDF5 → RLDS                  (standard training format)
        ↓
[unifolm-vla training]
  RLDS → train_unifolm_vla.py

Why 3 steps?

LeRobot V2.1 is HuggingFace's standard format for robot datasets: easy to share, and it comes with verification tooling.
HDF5 is a compact binary format with efficient I/O for training.
RLDS (Reinforcement Learning Datasets) is built on TensorFlow Datasets and is used by many VLA training pipelines.

Each format has its own verification tools, so when something goes wrong you can check the data after every stage and see which conversion introduced the problem.

Step 1: JSON → LeRobot V2.1

Using unitree_IL_lerobot:

conda activate xr_teleop   # same env as xr_teleoperate
cd ~/unifolm_ws/unitree_IL_lerobot

# Basic command
python unitree_lerobot/utils/convert_unitree_json_to_lerobot.py \
    --raw-dir $HOME/unifolm_ws/xr_teleoperate/teleop/utils/data \
    --repo-id your_name/g1_pickplace_demo \
    --robot_type Unitree_G1_Dex3 \
    --push_to_hub

Flag breakdown:

Flag	Example value	Meaning
`--raw-dir`	path to JSON data	Output from xr_teleoperate
`--repo-id`	`your_name/g1_task`	HuggingFace repo ID (for push or local save)
`--robot_type`	`Unitree_G1_Dex3`	G1 with Dex3 hands joint config
`--push_to_hub`	(flag)	Push to HuggingFace Hub (needs HF token)

To run offline (no HuggingFace push):

python unitree_lerobot/utils/convert_unitree_json_to_lerobot.py \
    --raw-dir $HOME/unifolm_ws/xr_teleoperate/teleop/utils/data \
    --repo-id local/g1_pickplace_demo \
    --robot_type Unitree_G1_Dex3 \
    --local-dir $HOME/datasets/g1_pickplace_lerobot
    # drop --push_to_hub

Expected output:

Processing demo_0001... OK (87 frames)
Processing demo_0002... OK (92 frames)
...
Processing demo_0050... OK (81 frames)
Skipped: 2 demos (too short)
Converted: 48/50 demos
LeRobot dataset saved to: /home/user/datasets/g1_pickplace_lerobot/

Verify LeRobot dataset:

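# Only needed if lerobot is missing: unitree_IL_lerobot normally installs a pinned version.
# The import below uses the older lerobot.common.* layout; newer lerobot releases
# moved this module to lerobot.datasets.lerobot_dataset.
pip install lerobot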

python -c "
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset('local/g1_pickplace_demo',
                         local_files_only=True,
                         root='/home/user/datasets/g1_pickplace_lerobot')
print('Episodes:', dataset.num_episodes)
print('Frames:', len(dataset))
print('Features:', list(dataset.features.keys()))
"

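# Expected output (exact feature names depend on the robot config):
# Episodes: 48
# Frames: 4128
# Features: ['observation.images.left_wrist', 'observation.images.right_wrist',
#             'observation.state', 'action', 'timestamp', 'episode_index', ...]
#   (LeRobot V2.x also adds bookkeeping features such as frame_index, index, task_index)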

Step 2: LeRobot → HDF5

Using the scripts in unifolm-vla:

conda activate unifolm
cd ~/unifolm_ws/unifolm-vla

python prepare_data/convert_lerobot_to_hdf5.py \
    --lerobot_dir $HOME/datasets/g1_pickplace_lerobot \
    --output_dir $HOME/datasets/g1_pickplace_hdf5 \
    --robot_type g1_dex3

Expected output:

Converting episode 0/48...
Converting episode 1/48...
...
Conversion complete!
Output: /home/user/datasets/g1_pickplace_hdf5/
  ├── train/
  │   ├── episode_000.hdf5
  │   └── ... (43 files)
  └── val/
      └── ... (5 files)
Total: 43 train + 5 val episodes
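The 43/5 split above is roughly 90/10 of the 48 converted episodes. As a minimal sketch of how such a split can be made reproducible, assuming a 10% validation fraction, rounding to the nearest whole episode, and a fixed shuffle seed (`split_episodes` is a hypothetical helper, not the code inside convert_lerobot_to_hdf5.py):

```python
import random


def split_episodes(num_episodes: int, val_fraction: float = 0.1, seed: int = 0):
    """Return (train_ids, val_ids) for episode indices 0..num_episodes-1.

    Hypothetical helper: shuffles the indices with a fixed seed so the split
    is the same on every run, then reserves round(num_episodes * val_fraction)
    episodes (at least one) for validation.
    """
    if num_episodes < 2:
        raise ValueError("need at least 2 episodes to make a train/val split")
    ids = list(range(num_episodes))
    random.Random(seed).shuffle(ids)
    n_val = max(1, round(num_episodes * val_fraction))
    val_ids = sorted(ids[:n_val])
    train_ids = sorted(ids[n_val:])
    return train_ids, val_ids


train_ids, val_ids = split_episodes(48)
print(len(train_ids), "train +", len(val_ids), "val")  # 43 train + 5 val
```

Splitting by whole episode rather than by frame matters here: consecutive frames of one demo are nearly identical, so a frame-level split would leak training data into validation.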

Verify HDF5:

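python -c "
import glob
import h5py

# sorted() so files[0] is the same episode on every run (glob order is arbitrary)
files = sorted(glob.glob('/home/user/datasets/g1_pickplace_hdf5/train/*.hdf5'))
assert files, 'no .hdf5 files found under train/'
with h5py.File(files[0], 'r') as f:
    print('Keys:', list(f.keys()))
    print('Actions shape:', f['action'].shape)
    print('Frames:', f['action'].shape[0])
"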

# Expected:
# Keys: ['action', 'obs', 'language_instruction']
# Actions shape: (87, 28)   ← 28 = 14 arm joints + 14 Dex3 hand joints
# Frames: 87

Step 3: HDF5 → RLDS

The final step converts HDF5 into the training format:

cd ~/unifolm_ws/unifolm-vla/prepare_data/hdf5_to_rlds

# Install RLDS dependencies
pip install tensorflow tensorflow_datasets

# Run conversion
python hdf5_to_rlds.py \
    --hdf5_dir $HOME/datasets/g1_pickplace_hdf5 \
    --output_dir $HOME/datasets/g1_pickplace_rlds \
    --dataset_name g1_pickplace

Expected output:

Converting HDF5 → RLDS...
Writing train split: 43 episodes
Writing val split: 5 episodes
Shuffling train data...
Done! RLDS dataset at: /home/user/datasets/g1_pickplace_rlds/
  ├── g1_pickplace/
  │   └── 1.0.0/
  │       ├── g1_pickplace-train.tfrecord-00000-of-00001
  │       ├── g1_pickplace-val.tfrecord-00000-of-00001
  │       └── dataset_info.json

Verify RLDS:

python -c "
import tensorflow_datasets as tfds

dataset = tfds.load(
    'g1_pickplace',
    data_dir='/home/user/datasets/g1_pickplace_rlds',
    split='train'
)

# In RLDS, each episode's steps field is a nested tf.data.Dataset, not a tensor:
# it has no len() and cannot be indexed, so materialize it first.
for episode in dataset.take(1):
    steps = list(episode['steps'].as_numpy_iterator())
    print('Steps per episode:', len(steps))
    step = steps[0]
    print('Action shape:', step['action'].shape)
    print('Language instruction:', step['language_instruction'].decode())
"

# Expected:
# Steps per episode: 87
# Action shape: (28,)
# Language instruction: pick up the red cup
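RLDS stores each episode as a sequence of steps rather than as one array per key, which is why the verification snippet reads each episode's `steps` field as its own sequence. A minimal pure-Python sketch of that reshaping, assuming per-episode arrays like the HDF5 `action` dataset plus one instruction string per episode. The boundary flags (`is_first`, `is_last`, `is_terminal`) follow common RLDS conventions, but the exact feature set written by hdf5_to_rlds.py may differ, and `episode_to_rlds_steps` is hypothetical:

```python
from typing import Any


def episode_to_rlds_steps(
    actions: list[list[float]],
    observations: list[dict[str, Any]],
    instruction: str,
) -> list[dict[str, Any]]:
    """Turn one episode's per-frame arrays into a list of RLDS-style steps.

    Hypothetical helper: each step carries its observation, its action, the
    episode's language instruction, and the RLDS episode-boundary flags.
    """
    if len(actions) != len(observations):
        raise ValueError(
            f"length mismatch: {len(actions)} actions vs {len(observations)} observations"
        )
    n = len(actions)
    steps = []
    for t in range(n):
        steps.append({
            "observation": observations[t],
            "action": actions[t],
            "language_instruction": instruction,
            "is_first": t == 0,
            "is_last": t == n - 1,
            # Assumption: a recorded demo ends in a terminal (task done) state.
            "is_terminal": t == n - 1,
        })
    return steps


steps = episode_to_rlds_steps(
    actions=[[0.0] * 28 for _ in range(3)],
    observations=[{"state": [0.0] * 28} for _ in range(3)],
    instruction="pick up the red cup",
)
print(len(steps), steps[0]["is_first"], steps[-1]["is_last"])  # 3 True True
```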

Checklist before training

[ ] Step 1 (LeRobot):
    - Episodes = number of valid demos
    - Features include: images + state + action + timestamp
    
[ ] Step 2 (HDF5):
    - Action shape = (N_frames, 28) for G1 Dex3
    - Both train/ and val/ splits present
    - Train/val ratio ≈ 90/10
    
[ ] Step 3 (RLDS):
    - TFRecord files exist and are readable
    - Language instruction correct in each episode
    - No abruptly truncated episodes
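The "no abruptly truncated episodes" item is easiest to check by comparing per-episode frame counts across stages. A minimal sketch, assuming you have already collected one list of per-episode lengths for each stage (for example by looping the verification snippets above over every episode). The comparison is order-insensitive because the RLDS train split is shuffled; `compare_stage_lengths` is a hypothetical helper and the lengths in the example are made up:

```python
from collections import Counter


def compare_stage_lengths(stages: dict[str, list[int]], reference: str) -> list[str]:
    """Compare per-episode frame counts of every stage against a reference stage.

    Order-insensitive (multiset comparison). Returns human-readable problems;
    an empty list means every stage matches the reference.
    """
    ref = Counter(stages[reference])
    n_ref = sum(ref.values())
    problems = []
    for name, lengths in stages.items():
        if name == reference:
            continue
        got = Counter(lengths)
        if len(lengths) != n_ref:
            problems.append(f"{name}: {len(lengths)} episodes, expected {n_ref}")
        missing = ref - got  # lengths in the reference but not in this stage
        extra = got - ref    # lengths in this stage but not in the reference
        if missing or extra:
            problems.append(
                f"{name}: missing lengths {sorted(missing.elements())}, "
                f"unexpected lengths {sorted(extra.elements())}"
            )
    return problems


lengths = {
    "lerobot": [87, 92, 81],
    "hdf5": [87, 92, 81],   # train + val combined
    "rlds": [92, 87, 40],   # made-up example: one episode truncated from 81 to 40
}
for problem in compare_stage_lengths(lengths, reference="lerobot"):
    print(problem)
# rlds: missing lengths [81], unexpected lengths [40]
```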

Common errors

"Too few demos after filtering"

Cause: many demos were filtered out for being too short (under 2 seconds).
Fix: collect more demos, or lower the minimum episode length:

python unitree_lerobot/utils/convert_unitree_json_to_lerobot.py \
    ... \
    --min_episode_length 30   # frames, default is 60
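The default of 60 frames matches the 2-second cutoff if demos are recorded at 30 fps (60 / 30 = 2 s); that rate is an assumption here, so check what your xr_teleoperate run used. A minimal sketch of the filtering rule, handy for checking how many demos a given threshold would keep before re-running the conversion. `filter_by_length` and `frames_to_seconds` are hypothetical helpers, not the converter's own code, and the frame counts are made up:

```python
def filter_by_length(
    frame_counts: dict[str, int], min_frames: int = 60
) -> tuple[list[str], list[str]]:
    """Split demo names into (kept, skipped) by a minimum frame count.

    Assumption: a demo with exactly min_frames frames is kept.
    """
    kept = [name for name, n in frame_counts.items() if n >= min_frames]
    skipped = [name for name, n in frame_counts.items() if n < min_frames]
    return kept, skipped


def frames_to_seconds(frames: int, fps: float = 30.0) -> float:
    """Convert a frame count to seconds at an assumed recording rate."""
    return frames / fps


# Made-up frame counts for illustration.
counts = {"demo_0001": 87, "demo_0002": 92, "demo_0003": 45, "demo_0004": 28}
for threshold in (60, 30):
    kept, skipped = filter_by_length(counts, min_frames=threshold)
    print(f"min {threshold} frames ({frames_to_seconds(threshold):.1f} s): "
          f"kept {len(kept)}, skipped {skipped}")
# min 60 frames (2.0 s): kept 2, skipped ['demo_0003', 'demo_0004']
# min 30 frames (1.0 s): kept 3, skipped ['demo_0004']
```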

"Action shape mismatch (expected 28, got 14)"

Cause: the --robot_type does not match your hardware. A G1 without Dex3 hands has only its 14 arm joints.
Fix: check which end effectors your G1 has and pass the matching --robot_type:

# G1 with Dex3 (3-finger dex hands): --robot_type Unitree_G1_Dex3  → 28 joints
# G1 with standard gripper:          --robot_type Unitree_G1        → 14 joints
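When the dimension is off, it helps to know how the 28 values break down: 14 arm joints (7 per arm) plus 14 hand joints (7 per Dex3 hand). A minimal sketch that labels an action vector by joint group and fails loudly on a wrong length. The group order below (left arm, right arm, left hand, right hand) is an assumption for illustration, so confirm it against the converter's robot config before relying on it; `ACTION_LAYOUT_G1_DEX3` and `split_action` are hypothetical names:

```python
# Hypothetical layout: joint group -> number of values, in vector order.
ACTION_LAYOUT_G1_DEX3 = {
    "left_arm": 7,
    "right_arm": 7,
    "left_hand": 7,
    "right_hand": 7,
}


def split_action(action: list[float], layout: dict[str, int]) -> dict[str, list[float]]:
    """Slice a flat action vector into named joint groups, checking its length."""
    expected = sum(layout.values())
    if len(action) != expected:
        raise ValueError(
            f"action has {len(action)} values, layout expects {expected}; "
            "check --robot_type against your hardware"
        )
    groups, start = {}, 0
    for name, size in layout.items():
        groups[name] = action[start:start + size]
        start += size
    return groups


groups = split_action([0.1] * 28, ACTION_LAYOUT_G1_DEX3)
print({name: len(values) for name, values in groups.items()})
# {'left_arm': 7, 'right_arm': 7, 'left_hand': 7, 'right_hand': 7}
```

Passing a 14-value action (arms only) raises a ValueError that names both lengths, which is exactly the mismatch described above.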

"CUDA not found" when running hdf5_to_rlds

Cause: TensorFlow uses a GPU if one is available, but the conversion does not require one.
Fix: force CPU execution (about 3x slower, but it works):

CUDA_VISIBLE_DEVICES="" python hdf5_to_rlds.py ...
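If you drive the conversion from a Python script instead of the shell, the equivalent is to launch hdf5_to_rlds.py as a child process with a copy of the environment in which CUDA_VISIBLE_DEVICES is an empty string. A minimal standard-library sketch; the flags are the ones from Step 3, and it assumes you run it from prepare_data/hdf5_to_rlds. `cpu_only_env` and `run_hdf5_to_rlds` are hypothetical helpers:

```python
import os
import subprocess
import sys
from pathlib import Path


def cpu_only_env() -> dict[str, str]:
    """Copy of the current environment with every GPU hidden from CUDA."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def run_hdf5_to_rlds(hdf5_dir: Path, output_dir: Path, dataset_name: str) -> None:
    """Run hdf5_to_rlds.py on CPU only; raises CalledProcessError on failure."""
    cmd = [
        sys.executable, "hdf5_to_rlds.py",
        "--hdf5_dir", str(hdf5_dir),
        "--output_dir", str(output_dir),
        "--dataset_name", dataset_name,
    ]
    subprocess.run(cmd, env=cpu_only_env(), check=True)


# Usage (from prepare_data/hdf5_to_rlds):
# run_hdf5_to_rlds(Path.home() / "datasets/g1_pickplace_hdf5",
#                  Path.home() / "datasets/g1_pickplace_rlds",
#                  "g1_pickplace")
```

Using a child process sidesteps a common pitfall: setting CUDA_VISIBLE_DEVICES inside a Python process has no effect if TensorFlow was already imported there.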

Next: fine-tuning unifolm-vla from Qwen2.5-VL-7B, including a single-GPU LoRA workaround.

Nguyễn Anh Tuấn

Related Posts

unifolm-vla + Unitree G1 (Post 5): deploy inference server, SSH tunnel, and locomotion in parallel

unifolm-vla + Unitree G1 (Post 4): fine-tune from Qwen2.5-VL-7B (8-GPU and single-GPU LoRA)

unifolm-vla + Unitree G1 (Post 2): data collection with xr_teleoperate + Meta Quest 3
