unifolm-vla + Unitree G1 (Post 3): data pipeline — JSON → LeRobot → HDF5 → RLDS
This is post 3 of the unifolm-vla + Unitree G1 series. The previous post collected 50+ demos as JSON files. This post converts that data through three formats so it is ready for training.
It sounds like a lot of steps, but each one is a single command. The pipeline is split this way so that it can be reused across different robots and training frameworks.
Pipeline overview
[xr_teleoperate output]         [unitree_IL_lerobot]
JSON + video files      ──→     LeRobot V2.1 format
                                        ↓
                                [unifolm-vla prepare_data]
                                LeRobot → HDF5 (more compact)
                                        ↓
                                [unifolm-vla prepare_data]
                                HDF5 → RLDS (standard training format)
                                        ↓
                                [unifolm-vla training]
                                RLDS → train_unifolm_vla.py
Why 3 steps?
- LeRobot V2.1 is HuggingFace's standard format for robot datasets — easy to share, comes with verification tooling
- HDF5 is a compact format with efficient I/O for training
- RLDS (Reinforcement Learning Datasets) is a TensorFlow-based episode format used by many VLA training pipelines
Each format has separate verification tools — useful for debugging when something goes wrong.
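To see how the three commands chain together, here is a minimal dry-run sketch that only prints them. The helper name build_pipeline and the /home/user paths are illustrative assumptions; the script names and flags are the ones used in the steps of this post. Each step still needs its own working directory and conda env.

```python
import shlex

def build_pipeline(raw_dir, lerobot_dir, hdf5_dir, rlds_dir,
                   repo_id="local/g1_pickplace_demo",
                   dataset_name="g1_pickplace"):
    """Return (working_dir, argv) pairs for the three steps, in order."""
    ws = "~/unifolm_ws"
    return [
        (f"{ws}/unitree_IL_lerobot",
         ["python", "unitree_lerobot/utils/convert_unitree_json_to_lerobot.py",
          "--raw-dir", raw_dir, "--repo-id", repo_id,
          "--robot_type", "Unitree_G1_Dex3", "--local-dir", lerobot_dir]),
        (f"{ws}/unifolm-vla",
         ["python", "prepare_data/convert_lerobot_to_hdf5.py",
          "--lerobot_dir", lerobot_dir, "--output_dir", hdf5_dir,
          "--robot_type", "g1_dex3"]),
        (f"{ws}/unifolm-vla/prepare_data/hdf5_to_rlds",
         ["python", "hdf5_to_rlds.py",
          "--hdf5_dir", hdf5_dir, "--output_dir", rlds_dir,
          "--dataset_name", dataset_name]),
    ]

steps = build_pipeline(
    raw_dir="/home/user/unifolm_ws/xr_teleoperate/teleop/utils/data",
    lerobot_dir="/home/user/datasets/g1_pickplace_lerobot",
    hdf5_dir="/home/user/datasets/g1_pickplace_hdf5",
    rlds_dir="/home/user/datasets/g1_pickplace_rlds",
)

# Dry run: print each command instead of executing it.
for cwd, argv in steps:
    print(f"cd {cwd} && {shlex.join(argv)}")
```

The output directory of each step is the input directory of the next, which is the only coupling between the three stages.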
Step 1: JSON → LeRobot V2.1
Using unitree_IL_lerobot:
conda activate xr_teleop # same env as xr_teleoperate
cd ~/unifolm_ws/unitree_IL_lerobot
# Basic command
python unitree_lerobot/utils/convert_unitree_json_to_lerobot.py \
--raw-dir $HOME/unifolm_ws/xr_teleoperate/teleop/utils/data \
--repo-id your_name/g1_pickplace_demo \
--robot_type Unitree_G1_Dex3 \
--push_to_hub
Flag breakdown:
| Flag | Example value | Meaning |
|---|---|---|
| --raw-dir | path to JSON data | Output from xr_teleoperate |
| --repo-id | your_name/g1_task | HuggingFace repo ID (for push or local save) |
| --robot_type | Unitree_G1_Dex3 | G1 with Dex3 hands joint config |
| --push_to_hub | (flag) | Push to HuggingFace Hub (needs HF token) |
To run offline (no HuggingFace push):
python unitree_lerobot/utils/convert_unitree_json_to_lerobot.py \
--raw-dir $HOME/unifolm_ws/xr_teleoperate/teleop/utils/data \
--repo-id local/g1_pickplace_demo \
--robot_type Unitree_G1_Dex3 \
--local-dir $HOME/datasets/g1_pickplace_lerobot
# drop --push_to_hub
Expected output:
Processing demo_0001... OK (87 frames)
Processing demo_0002... OK (92 frames)
...
Processing demo_0050... OK (81 frames)
Skipped: 2 demos (too short)
Converted: 48/50 demos
LeRobot dataset saved to: /home/user/datasets/g1_pickplace_lerobot/
Verify LeRobot dataset:
pip install lerobot
python -c "
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
dataset = LeRobotDataset('local/g1_pickplace_demo',
                         local_files_only=True,
                         root='/home/user/datasets/g1_pickplace_lerobot')
print('Episodes:', dataset.num_episodes)
print('Frames:', len(dataset))
print('Features:', list(dataset.features.keys()))
"
# Expected output:
# Episodes: 48
# Frames: 4128
# Features: ['observation.images.left_wrist', 'observation.images.right_wrist',
# 'observation.state', 'action', 'timestamp', 'episode_index']
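Rather than eyeballing the feature list, the check can be automated. The sketch below is plain Python and works on the key list printed above; the helper name check_features and the choice of required keys are assumptions based on the expected output in this post, not part of lerobot.

```python
REQUIRED = ("observation.state", "action", "timestamp", "episode_index")

def check_features(feature_names, required=REQUIRED,
                   image_prefix="observation.images."):
    """Return (missing required keys, sorted camera names found)."""
    names = set(feature_names)
    missing = [key for key in required if key not in names]
    cameras = sorted(n[len(image_prefix):] for n in names
                     if n.startswith(image_prefix))
    return missing, cameras

# Key list copied from the expected output above.
features = [
    "observation.images.left_wrist", "observation.images.right_wrist",
    "observation.state", "action", "timestamp", "episode_index",
]
missing, cameras = check_features(features)
print("Missing:", missing)
print("Cameras:", cameras)
```

In practice you would pass list(dataset.features.keys()) instead of the hard-coded list.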
Step 2: LeRobot → HDF5
Using scripts in unifolm-vla:
conda activate unifolm
cd ~/unifolm_ws/unifolm-vla
python prepare_data/convert_lerobot_to_hdf5.py \
--lerobot_dir $HOME/datasets/g1_pickplace_lerobot \
--output_dir $HOME/datasets/g1_pickplace_hdf5 \
--robot_type g1_dex3
Expected output:
Converting episode 0/48...
Converting episode 1/48...
...
Conversion complete!
Output: /home/user/datasets/g1_pickplace_hdf5/
├── train/
│ ├── episode_000.hdf5
│ └── ... (43 files)
└── val/
└── ... (5 files)
Total: 43 train + 5 val episodes
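The 43/5 split is consistent with holding out roughly 10% of the 48 episodes for validation. The conversion script's actual split rule is not shown here, so the following is only one simple rule (an assumption) that reproduces those numbers.

```python
def split_counts(n_episodes, val_fraction=0.1):
    """Return (n_train, n_val), holding out about val_fraction for validation."""
    if n_episodes < 2:
        raise ValueError("need at least 2 episodes to form both splits")
    # Round to the nearest whole episode, but always keep at least one for val.
    n_val = max(1, round(n_episodes * val_fraction))
    return n_episodes - n_val, n_val

print(split_counts(48))  # (43, 5)
```

If your counts differ from what this rule predicts, that alone is not an error; check the script's own split options before assuming something went wrong.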
Verify HDF5:
python -c "
import h5py, glob
files = glob.glob('/home/user/datasets/g1_pickplace_hdf5/train/*.hdf5')
with h5py.File(files[0], 'r') as f:
    print('Keys:', list(f.keys()))
    print('Actions shape:', f['action'].shape)
    print('Frames:', f['action'].shape[0])
"
# Expected:
# Keys: ['action', 'obs', 'language_instruction']
# Actions shape: (87, 28) ← 28 joints for G1 Dex3
# Frames: 87
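The one-liner above inspects a single file. To check every episode in both splits, something like the sketch below may help. The function names are hypothetical, and it assumes each file stores its actions under the 'action' key as in the expected output. h5py is imported lazily so that the pure shape check has no dependencies.

```python
from pathlib import Path

def bad_action_shapes(shapes, expected_dim=28):
    """Return the entries whose action array is empty or not (N, expected_dim)."""
    return {name: shape for name, shape in shapes.items()
            if len(shape) != 2 or shape[0] == 0 or shape[1] != expected_dim}

def collect_action_shapes(hdf5_dir):
    """Read the 'action' shape from every episode file under train/ and val/."""
    import h5py  # lazy import: only needed when actually reading files
    shapes = {}
    for path in sorted(Path(hdf5_dir).glob("*/*.hdf5")):
        with h5py.File(path, "r") as f:
            shapes[str(path)] = f["action"].shape
    return shapes

# Example with hand-written shapes; the second and third entries are flagged.
demo = {"episode_000": (87, 28), "episode_001": (92, 14), "episode_002": (0, 28)}
print(bad_action_shapes(demo))
```

On real data, call bad_action_shapes(collect_action_shapes('/home/user/datasets/g1_pickplace_hdf5')) and expect an empty dict.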
Step 3: HDF5 → RLDS
Final conversion step to training format:
cd ~/unifolm_ws/unifolm-vla/prepare_data/hdf5_to_rlds
# Install RLDS dependencies
pip install tensorflow tensorflow_datasets
# Run conversion
python hdf5_to_rlds.py \
--hdf5_dir $HOME/datasets/g1_pickplace_hdf5 \
--output_dir $HOME/datasets/g1_pickplace_rlds \
--dataset_name g1_pickplace
Expected output:
Converting HDF5 → RLDS...
Writing train split: 43 episodes
Writing val split: 5 episodes
Shuffling train data...
Done! RLDS dataset at: /home/user/datasets/g1_pickplace_rlds/
├── g1_pickplace/
│ └── 1.0.0/
│ ├── g1_pickplace-train.tfrecord-00000-of-00001
│ ├── g1_pickplace-val.tfrecord-00000-of-00001
│ └── dataset_info.json
Verify RLDS:
python -c "
import tensorflow_datasets as tfds
dataset = tfds.load(
    'g1_pickplace',
    data_dir='/home/user/datasets/g1_pickplace_rlds',
    split='train'
)
# Each element is one episode. In RLDS, episode['steps'] is a nested
# tf.data.Dataset, so materialize it before counting or indexing.
for episode in dataset.take(1):
    steps = list(episode['steps'])
    print('Steps per episode:', len(steps))
    step = steps[0]
    print('Action shape:', step['action'].shape)
    print('Language instruction:', step['language_instruction'].numpy().decode())
"
# Expected:
# Steps per episode: 87
# Action shape: (28,)
# Language instruction: pick up the red cup
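Before loading anything with TensorFlow, it can be worth confirming that the expected files are on disk. This sketch follows the TFDS shard naming visible in the output tree above (name-split.tfrecord-XXXXX-of-XXXXX) and assumes one shard per split and version 1.0.0, as in that tree. The helper names are made up for illustration.

```python
from pathlib import Path

def shard_names(dataset_name, split, num_shards=1):
    """Build TFDS-style shard file names for one split."""
    return [f"{dataset_name}-{split}.tfrecord-{i:05d}-of-{num_shards:05d}"
            for i in range(num_shards)]

def missing_rlds_files(rlds_dir, dataset_name, version="1.0.0",
                       splits=("train", "val")):
    """Return the expected file names that are absent from the dataset directory."""
    base = Path(rlds_dir) / dataset_name / version
    wanted = ["dataset_info.json"]
    for split in splits:
        wanted += shard_names(dataset_name, split)
    return [name for name in wanted if not (base / name).is_file()]

print(shard_names("g1_pickplace", "train"))
# ['g1_pickplace-train.tfrecord-00000-of-00001']
```

An empty list from missing_rlds_files('/home/user/datasets/g1_pickplace_rlds', 'g1_pickplace') means the layout matches; larger datasets may be written with more than one shard, in which case adjust num_shards.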
Checklist before training
[ ] Step 1 (LeRobot):
- Episodes = number of valid demos
- Features include: images + state + action + timestamp
[ ] Step 2 (HDF5):
- Action shape = (N_frames, 28) for G1 Dex3
- Both train/ and val/ splits present
- Train/val ratio ≈ 90/10
[ ] Step 3 (RLDS):
- TFRecord files exist and are readable
- Language instruction correct in each episode
- No abruptly truncated episodes
Common errors
"Too few demos after filtering"
Cause: many demos were filtered out for being too short (under 2 seconds)
Fix: collect more demos, or reduce min episode length:
python convert_unitree_json_to_lerobot.py \
... \
--min_episode_length 30 # frames, default is 60
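The default of 60 frames and the 2-second cutoff together imply a 30 fps recording rate; that rate is inferred, not stated by the tool. Under that assumption, the sketch below converts seconds to frames and previews which demos a given threshold would drop. Whether the real filter treats the threshold as inclusive is also an assumption.

```python
def min_frames(seconds, fps=30):
    """Convert a minimum duration in seconds to a frame count."""
    return int(round(seconds * fps))

def filter_short(frame_counts, min_episode_length=60):
    """Split demos into kept (>= threshold) and skipped (< threshold)."""
    kept = {name: n for name, n in frame_counts.items()
            if n >= min_episode_length}
    skipped = sorted(set(frame_counts) - set(kept))
    return kept, skipped

demos = {"demo_0001": 87, "demo_0002": 92, "demo_0003": 45}

print(min_frames(2))                    # 60
print(filter_short(demos)[1])           # ['demo_0003']
print(filter_short(demos, min_episode_length=min_frames(1))[1])  # []
```

Lowering the threshold to 30 frames keeps demos as short as one second, so check that such short episodes actually contain a complete task before relying on them.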
"Action shape mismatch (expected 28, got 14)"
Cause: wrong --robot_type. A G1 without Dex3 hands has only the 14 arm joints.
Fix: check your hardware
# G1 with Dex3 (3-finger dex hands): --robot_type Unitree_G1_Dex3 → 28 joints
# G1 with standard gripper: --robot_type Unitree_G1 → 14 joints
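The two configurations above can be kept in a small lookup table to turn a shape mismatch into a readable hint. This is only an illustrative sketch; the helper name diagnose is made up, and the table contains just the two robot types mentioned in this post.

```python
ACTION_DIMS = {
    "Unitree_G1_Dex3": 28,  # arms plus Dex3 hands
    "Unitree_G1": 14,       # arm joints only
}

def diagnose(robot_type, observed_dim):
    """Compare an observed action dimension against the expected one."""
    expected = ACTION_DIMS[robot_type]
    if observed_dim == expected:
        return f"ok: {robot_type} matches action dim {observed_dim}"
    # Suggest the robot type whose dimension matches the data, if any.
    candidates = [name for name, dim in ACTION_DIMS.items()
                  if dim == observed_dim]
    hint = f"; data looks like {candidates[0]}" if candidates else ""
    return f"mismatch: {robot_type} expects {expected}, got {observed_dim}{hint}"

print(diagnose("Unitree_G1_Dex3", 14))
# mismatch: Unitree_G1_Dex3 expects 28, got 14; data looks like Unitree_G1
```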
"CUDA not found" when running hdf5_to_rlds
Cause: TensorFlow looks for a GPU and uses one if available, but the conversion does not require it
Fix: force CPU (about 3x slower, but it works)
CUDA_VISIBLE_DEVICES="" python hdf5_to_rlds.py ...
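If you call the conversion from your own Python wrapper rather than the shell, the same effect can be had by setting the variable in-process. The key detail is ordering: it has to be set before TensorFlow is imported. The helper name force_cpu is just for illustration.

```python
import os

def force_cpu():
    """Hide all CUDA devices from libraries imported after this call."""
    os.environ["CUDA_VISIBLE_DEVICES"] = ""

force_cpu()

# Import TensorFlow only after force_cpu(); once TensorFlow has initialized
# CUDA, changing the variable has no effect on the running process.
# import tensorflow as tf
```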
Next: Fine-tune unifolm-vla from Qwen2.5-VL-7B — including single-GPU LoRA workaround.



