PICO Teleop and LeRobot Data for VLA

In part 4, we followed the C++ deployment layer: ONNX, TensorRT, ZMQ output, and the real-time control loop. Part 5 connects that runtime to the layer that makes VLA training possible: PICO VR teleoperation and LeRobot data collection. If part 4 answered "how does the policy run on the robot?", this article answers "how do we collect human demonstrations that a VLA can learn from?".

The useful idea in SONIC is that the VLA does not need to learn the full low-level humanoid controller from scratch. The NVlabs VLA workflow documentation describes a latent-action interface where the VLA predicts a 64-dimensional SONIC motion token plus 7 left-hand joints and 7 right-hand joints. SONIC then decodes that compact action into full-body control at 50 Hz. Because of that, teleop data is more than video and joint angles. It is a synchronized record containing language, ego-view camera, robot proprioception, motion tokens, SMPL pose, planner commands, and hand actions.
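
As a rough illustration of that interface, the sketch below splits one flat 78-value action into its three parts. The dimension counts come from the workflow description above; the concatenation order (token, left hand, right hand) and the names `LatentAction` and `split_latent_action` are assumptions made for this example, not the layout used by the NVlabs code.

```python
from typing import NamedTuple, Sequence

MOTION_TOKEN_DIM = 64  # SONIC motion token size, from the workflow description
HAND_DIM = 7           # joints per hand, from the workflow description


class LatentAction(NamedTuple):
    motion_token: list
    left_hand: list
    right_hand: list


def split_latent_action(action: Sequence[float]) -> LatentAction:
    """Split a flat action vector (assumed order: token, left hand, right hand)."""
    expected = MOTION_TOKEN_DIM + 2 * HAND_DIM
    if len(action) != expected:
        raise ValueError(f"expected {expected} values, got {len(action)}")
    token_end = MOTION_TOKEN_DIM
    left_end = token_end + HAND_DIM
    return LatentAction(
        motion_token=list(action[:token_end]),
        left_hand=list(action[token_end:left_end]),
        right_hand=list(action[left_end:]),
    )


parts = split_latent_action([0.0] * 78)
print(len(parts.motion_token), len(parts.left_hand), len(parts.right_hand))
```

The point of the sketch is the size of the action space: the VLA predicts 78 numbers per step, while SONIC expands them into full-body control.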

Keep these technical references open while reading:

| Source | What to verify |
| --- | --- |
| Data Collection for VLA | tmux/manual commands, ports 5555/5556/5557, camera viewer, LeRobot output |
| VLA Workflow | How teleop data flows into Isaac-GR00T N1.7 fine-tuning and latent actions |
| `install_pico.sh` | `.venv_teleop`, Python 3.10, `gear_sonic[teleop]`, XRoboToolkit |
| `pico_manager_thread_server.py` | POSE/PLANNER/VR_3PT modes, PICO combos, ZMQ ports 5556/5557 |
| `launch_data_collection.py` | tmux launcher using `--input-type zmq_manager` |
| `run_data_exporter.py` | Frame writing, camera, SMPL pose, robot state, dataset creation |
| GR00T LeRobot v2 data format | `data/`, `videos/`, `meta/info.json`, `modality.json`, `tasks.jsonl` layout |

If you are new to the series, read part 1 on SONIC architecture first to understand encoders, decoders, and tokens, then part 3 on SONIC data and training to see why SMPL, motion libraries, and token state appear in this pipeline. Outside the series, LeRobot for G1 humanoids and dual-arm VLA fine-tuning provide useful dataset context.

PICO VR teleoperation walking demo on a humanoid - source: NVlabs/GR00T-WholeBodyControl repo

1. The big picture: teleop is not just remote control

In traditional industrial robotics, teleoperation often means sending velocity or pose targets directly from a human operator. In SONIC, teleop has an additional purpose: creating learnable demonstrations. A good recording session must synchronize at least four data streams:

| Data stream | Source | Why it matters in the dataset |
| --- | --- | --- |
| Ego-view camera | Camera server or MuJoCo image publisher, port `5555` | Visual observation for the VLA |
| PICO/SMPL/VR pose | `pico_manager_thread_server.py`, port `5556` | Action intent, SMPL pose, wrist/hand targets, stream mode |
| Robot state | C++ deploy `g1_debug`, port `5557` | Proprioception, WBC action, root orientation, token state |
| Task prompt | CLI `--task-prompt` | Language annotation in `tasks.jsonl` and parquet |

PICO does not directly command motors. It publishes pose, planner command, or VR 3-point targets over ZMQ. The C++ deployment process runs with --input-type zmq_manager, receives those commands, routes them through the SONIC policy or planner, and sends motor commands to the robot. In parallel, the data exporter subscribes to the same ZMQ streams so it can record what the robot saw, what state the robot was in, what the human requested, and what SONIC produced.

Here is the simplified flow:

PICO headset + controllers
        |
        v
pico_manager_thread_server.py  -- PUB tcp://*:5556
        |                         topics: pose, planner, manager_state
        +---------------------> C++ deploy --input-type zmq_manager
        |                         publishes g1_debug + robot_config on 5557
        v
run_data_exporter.py <--------- camera server on 5555
        |
        v
LeRobot v2.1 dataset: parquet + MP4 + meta/*.json
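
Several processes can listen to the same publisher because ZMQ SUB sockets filter incoming messages by byte prefix. The following pure-Python sketch imitates that rule without opening any sockets. The topic names come from the diagram above; the wire format (topic name followed directly by payload bytes) and the helper `matches_subscription` are assumptions for illustration only.

```python
from typing import Iterable


def matches_subscription(message: bytes, prefixes: Iterable[bytes]) -> bool:
    """Mimic ZMQ SUB filtering: keep a message if it starts with any subscribed prefix."""
    return any(message.startswith(prefix) for prefix in prefixes)


# Hypothetical wire format: topic name followed directly by the payload bytes.
published = [b"pose" + b"\x01\x02", b"planner" + b"\x03", b"manager_state" + b"\x04"]

# A subscriber interested only in pose, and one that subscribes to everything.
pose_only = [m for m in published if matches_subscription(m, [b"pose"])]
everything = [m for m in published if matches_subscription(m, [b""])]
print(len(pose_only), len(everything))
```

Because the match is a prefix match, a subscription to `pose` would also accept a topic called `pose_extra`, which is worth remembering when naming new topics.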

For VLA collection, --input-type zmq_manager is the right deployment input type because it tells the C++ runtime that commands come from a ZMQ manager, not from local keyboard or gamepad input. If this flag is wrong, the PICO streamer may still publish data and the exporter may still record some signals, but the policy will not receive the operator's command through the intended path.

2. `install_pico.sh`: a dedicated teleop environment

install_scripts/install_pico.sh is more than a package install script. It builds a dedicated .venv_teleop environment with the correct Python version, native SDK, and teleop dependencies. Its main flow is:

Detect the machine architecture with uname -m, such as x86_64 on a workstation or aarch64 on a Jetson Orin.
Install uv if it is missing.
Install a uv-managed Python 3.10 with development headers.
Remove the old .venv_teleop, then create a new virtual environment with the prompt gear_sonic_teleop.
Install gear_sonic[teleop].
Install cmake, pybind11, and setuptools, then install the XRoboToolkit SDK.
On aarch64, if the native library is missing, build PXREARobotSDK from the orin branch.
On desktop or non-onboard machines, also install gear_sonic[sim] and unitree_sdk2_python for sim bridge support.

This is a common beginner trap: PICO teleop needs the XRoboToolkit native SDK, not only Python modules. If you see an xrobotoolkit_sdk import error, or if body data never arrives from PICO, check this install path before blaming the policy. The standard command from the repository root is:

bash install_scripts/install_pico.sh
source .venv_teleop/bin/activate
python gear_sonic/scripts/pico_manager_thread_server.py --manager
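
Before launching the manager, a quick availability check can separate an install problem from a policy problem. This is a minimal sketch using only the standard library; `module_available` is a hypothetical helper, and finding the module on the import path does not prove that the native library behind it loads correctly.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module `name` can be located on the import path."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("xrobotoolkit_sdk"):
        print("xrobotoolkit_sdk found")
    else:
        print("xrobotoolkit_sdk missing: re-run install_scripts/install_pico.sh")
```

Run it inside the activated `.venv_teleop` so the check uses the same interpreter as the PICO manager.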

On a real robot setup, the PICO headset, workstation, and G1 should be on a stable network. The official data collection docs mention 192.168.123.164 as the default G1 robot IP when a workstation connects to the robot camera server. For teleop, network quality affects more than video. It affects pose command timing and dataset synchronization.

3. The three ZMQ ports: 5555, 5556, 5557

This stack has three default ports that are worth memorizing:

| Port | Producer | Consumer | Content |
| --- | --- | --- | --- |
| `5555` | Camera server or `run_sim_loop.py --enable-image-publish` | `run_data_exporter.py`, `run_camera_viewer.py` | Camera frames, usually `ego_view`, optionally wrist cameras |
| `5556` | `pico_manager_thread_server.py` | C++ deploy and data exporter | Topics `pose`, `planner`, `manager_state` |
| `5557` | C++ deploy `zmq_output_handler` | Data exporter and PICO planner feedback | Topics `g1_debug`, `robot_config` |

run_data_exporter.py explicitly avoids a ROS 2 dependency. Robot state comes from the g1_debug topic on port 5557, SMPL pose comes from the pose topic on port 5556, and camera data comes through ComposedCameraClientSensor. Robot configuration is read from the robot_config topic on the same state socket. If the exporter cannot receive robot_config, it cannot confidently write correct robot metadata.
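
Since ZMQ here runs over TCP, a plain TCP connect is a cheap first check that something is bound to each port. The sketch below assumes all three services are reachable on `localhost`, which fits the simulation setup; on a real robot the camera port lives on the G1 host instead. `tcp_port_open` and `DEFAULT_PORTS` are names invented for this example, and an open port only shows that a process is listening, not that it publishes the expected topics.

```python
import socket


def tcp_port_open(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


# Default ports from the table above.
DEFAULT_PORTS = {"camera": 5555, "pico_manager": 5556, "robot_state": 5557}

if __name__ == "__main__":
    for name, port in DEFAULT_PORTS.items():
        state = "open" if tcp_port_open("localhost", port, 0.2) else "closed"
        print(f"{name:>12} :{port} {state}")
```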

For simulation, the docs use:

python gear_sonic/scripts/run_sim_loop.py \
  --enable-image-publish --enable-offscreen --camera-port 5555

The C++ deployment pane usually runs:

cd gear_sonic_deploy
./deploy.sh --input-type zmq_manager sim

For a real robot with the camera server on the G1:

python gear_sonic/scripts/run_data_exporter.py \
  --task-prompt "pick up the cup" \
  --camera-host 192.168.123.164 \
  --camera-port 5555

To inspect camera feeds before recording:

python gear_sonic/scripts/run_camera_viewer.py \
  --camera-host localhost \
  --camera-port 5555

The camera viewer does not write a LeRobot dataset. It displays all detected camera streams in an OpenCV window, uses R to start or stop raw MP4 recording, and uses Q to quit. This is the step you should run before collecting real demonstrations: check exposure, ego-view angle, vibration, frame drops, and whether wrist cameras are actually present.

4. PICO manager: POSE, PLANNER, and VR_3PT

In pico_manager_thread_server.py, StreamMode has these values:

| Mode | Value | Practical meaning |
| --- | --- | --- |
| `OFF` | 0 | No command stream for policy control |
| `POSE` | 1 | Stream full SMPL/body pose from PICO for SONIC tracking |
| `PLANNER` | 2 | Use joystick/controller input to drive the locomotion planner |
| `PLANNER_FROZEN_UPPER_BODY` | 3 | Move the lower body with the planner while holding upper-body targets |
| `POSE_PAUSE` | 4 | Pause pose while the left menu button is held, then return to POSE |
| `PLANNER_VR_3PT` | 5 | Use planner locomotion plus VR 3-point upper-body targets |

What users often call "VR_3PT" is implemented as PLANNER_VR_3PT: the robot still needs the planner for walking, while the upper body follows three VR keypoints. The _process_3pt_pose() helper extracts Root/Pelvis, Left Wrist, Right Wrist, and Neck from SMPL joints, converts Unity coordinates into the robot frame, applies rotation offsets, and returns the three non-root keypoints. The code uses Neck instead of Head because Neck is more stable for upper-body orientation.
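
To make the 3-point idea concrete, here is a simplified sketch of the position part only. It is not the repository's `_process_3pt_pose()`: the axis mapping from Unity to the robot frame, the choice to express targets relative to the pelvis, and the helper names are all assumptions, and the rotation offsets mentioned above are omitted. The joint indices follow the standard 24-joint SMPL ordering, which the PICO stream is assumed to share.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Standard SMPL body-joint indices (24-joint layout).
PELVIS, NECK, LEFT_WRIST, RIGHT_WRIST = 0, 12, 20, 21


def unity_to_robot(p: Vec3) -> Vec3:
    """Assumed mapping: Unity (x right, y up, z forward) to robot (x forward, y left, z up)."""
    x, y, z = p
    return (z, -x, y)


def three_point_targets(joints: Sequence[Vec3]) -> List[Vec3]:
    """Return left wrist, right wrist, and neck positions relative to the pelvis."""
    if len(joints) != 24:
        raise ValueError(f"expected 24 SMPL joints, got {len(joints)}")
    root = unity_to_robot(joints[PELVIS])
    targets = []
    for idx in (LEFT_WRIST, RIGHT_WRIST, NECK):
        px, py, pz = unity_to_robot(joints[idx])
        targets.append((px - root[0], py - root[1], pz - root[2]))
    return targets
```

The root joint is used as the reference and is not returned, which matches the description of three non-root keypoints.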

Important PICO combinations:

| Combo | Effect |
| --- | --- |
| `A+B+X+Y` | From `OFF`, start policy in `PLANNER`; while running, emergency stop to `OFF` |
| `A+X` | Toggle between `POSE` and `PLANNER` in the main chain |
| `B+Y` | Toggle between `POSE` and `PLANNER_FROZEN_UPPER_BODY` |
| Left axis click | Enter or leave `PLANNER_VR_3PT` from the current planner mode |
| Hold left menu | In `POSE`, switch to `POSE_PAUSE`; release to return to `POSE` |
| Left Grip + A | Toggle episode recording |
| Left Grip + B | Mark the current episode as aborted/discarded |

Inside the planner loop, A+B increments the locomotion mode and X+Y decrements it. The available modes include IDLE, SLOW_WALK, WALK, RUN, kneeling and lying states, crawling, boxing, hooks, jump, stealth walk, and injured walk. Joysticks control movement, facing direction, speed, and height. From the operator's perspective, the PICO controllers act as a mode switch, a pose source, and a locomotion remote at the same time.
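
The mode logic can be pictured as a small state machine. The sketch below mirrors the `StreamMode` values from the table and implements only three of the combinations; the function `next_mode` is hypothetical, and the real manager keeps additional state (for example the VR 3-point toggle, the menu-hold pause, and the locomotion mode) that is left out here.

```python
from enum import IntEnum


class StreamMode(IntEnum):
    OFF = 0
    POSE = 1
    PLANNER = 2
    PLANNER_FROZEN_UPPER_BODY = 3
    POSE_PAUSE = 4
    PLANNER_VR_3PT = 5


def next_mode(mode: StreamMode, combo: str) -> StreamMode:
    """Simplified transition table; unhandled combinations leave the mode unchanged."""
    if combo == "A+B+X+Y":
        # Start from OFF into PLANNER; from any running mode, act as emergency stop.
        return StreamMode.PLANNER if mode == StreamMode.OFF else StreamMode.OFF
    if combo == "A+X":
        if mode == StreamMode.POSE:
            return StreamMode.PLANNER
        if mode == StreamMode.PLANNER:
            return StreamMode.POSE
    if combo == "B+Y":
        if mode == StreamMode.POSE:
            return StreamMode.PLANNER_FROZEN_UPPER_BODY
        if mode == StreamMode.PLANNER_FROZEN_UPPER_BODY:
            return StreamMode.POSE
    return mode


mode = StreamMode.OFF
for combo in ["A+B+X+Y", "A+X", "B+Y", "A+B+X+Y"]:
    mode = next_mode(mode, combo)
    print(combo, "->", mode.name)
```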

SONIC overview: multiple motion sources and VLA commands enter a universal token interface - source: NVlabs GEAR-SONIC project page

5. Collecting data with the tmux launcher

launch_data_collection.py is the easiest entry point when you want the full stack in one tmux session. Its defaults are important: deploy_input_type is zmq_manager, pico_manager is enabled, camera_viewer is enabled, and camera_port is 5555. For simulation:

python gear_sonic/scripts/launch_data_collection.py --sim

The launcher creates a sonic_data_collection session with panes for C++ deployment, PICO manager, data exporter, and camera viewer. If --sim is passed, it also opens a MuJoCo simulator window. The official docs note that the launcher automatically uses .venv_data_collection when the current Python environment does not provide the required dependencies, so you do not always need to activate that environment manually.

For a real robot:

python gear_sonic/scripts/launch_data_collection.py \
  --camera-host 192.168.123.164 \
  --task-prompt "pick up the cup"

With wrist cameras:

python gear_sonic/scripts/launch_data_collection.py \
  --camera-host 192.168.123.164 \
  --task-prompt "pick up the cup" \
  --record-wrist-cameras

One practical detail: the C++ deployment pane may wait for confirmation before the robot starts control. Do not begin recording just because tmux opened. Wait until C++ initialization is complete, verify that camera viewer has frames, confirm that the PICO manager sees body data, and only then use Left Grip + A to start an episode. To detach the session, use Ctrl+b then d; to reattach, run tmux attach -t sonic_data_collection.

VR 3-point teleoperation for object manipulation - source: NVlabs GEAR-SONIC project page

6. `run_data_exporter.py`: what does one LeRobot frame contain?

The exporter runs at 50 Hz by default, configurable through --data-collection-frequency. On each tick, it polls the robot-state ZMQ socket, polls the PICO ZMQ socket, checks recording commands, reads camera frames, and adds one frame to Gr00tDataExporter.
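
A fixed-rate loop of this kind can be sketched as follows. This is a generic pattern, not the exporter's actual loop: `run_fixed_rate` is a hypothetical helper, and the per-tick work (polling sockets, reading cameras, writing a frame) is reduced to a callback.

```python
import time
from typing import Callable


def run_fixed_rate(tick: Callable[[int], None], hz: float, n_ticks: int) -> float:
    """Call tick(i) n_ticks times at roughly `hz`; return the elapsed seconds."""
    period = 1.0 / hz
    start = time.monotonic()
    for i in range(n_ticks):
        tick(i)
        # Sleep until the next scheduled deadline so timing errors do not accumulate.
        deadline = start + (i + 1) * period
        remaining = deadline - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
    return time.monotonic() - start


frames = []
elapsed = run_fixed_rate(frames.append, hz=50.0, n_ticks=10)
print(f"{len(frames)} frames in {elapsed:.2f} s")
```

Scheduling against absolute deadlines rather than sleeping a fixed period keeps the average rate close to the target even when individual ticks take a variable amount of time.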

Important fields include:

| Field | Source | Meaning |
| --- | --- | --- |
| `observation.images.ego_view` | Camera | Main visual observation for the VLA |
| `observation.state` | Robot model + `g1_debug` | Full robot configuration, including body and hands |
| `observation.eef_state` | Forward kinematics | Left and right wrist pose, position plus quaternion per side |
| `action.wbc` | Last action from SONIC/C++ | Whole-body action after WBC |
| `action.motion_token` | `token_state`, if present | 64-dimensional latent motion token |
| `teleop.smpl_joints` | PICO pose mode | 24 joints x 3, flattened to 72 |
| `teleop.smpl_pose` | PICO pose mode | 63-dimensional SMPL body pose |
| `teleop.stream_mode` | `manager_state` | Whether the frame came from POSE, PLANNER, or VR_3PT |
| `teleop.left_hand_joints`, `teleop.right_hand_joints` | PICO trigger/grip or planner message | 7-dimensional hand actions per side |
| `teleop.vr_3pt_position`, `teleop.vr_3pt_orientation` | VR_3PT | Three keypoints and orientations for upper-body control |

The exporter uses SMPL pose only when the stream mode is POSE or POSE_PAUSE and the message is not stale. The code uses a roughly 100 ms age threshold for SMPL pose; if the message is too old, it writes zeros instead of trusting stale pose data. For PLANNER_VR_3PT, it uses planner messages and a roughly 200 ms threshold. This is why process_dataset.py matters after collection: stale frames and aborted episodes can still exist on disk.
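
The staleness rule can be expressed in a few lines. This is a sketch of the behavior described above, not the exporter's code: the constants use the approximate thresholds from the text, and the helper names, string mode labels, and timestamp arguments are assumptions.

```python
from typing import List, Optional, Sequence

SMPL_MAX_AGE_S = 0.100     # approximate SMPL threshold described above
PLANNER_MAX_AGE_S = 0.200  # approximate planner threshold described above
SMPL_POSE_DIM = 63
POSE_MODES = {"POSE", "POSE_PAUSE"}


def is_fresh(msg_time_s: float, now_s: float, max_age_s: float) -> bool:
    """A message is usable if its age does not exceed the threshold."""
    return (now_s - msg_time_s) <= max_age_s


def smpl_pose_for_frame(
    mode: str, pose: Optional[Sequence[float]], msg_time_s: float, now_s: float
) -> List[float]:
    """Return the pose when the mode and age allow it, otherwise a zero vector."""
    usable = pose is not None and is_fresh(msg_time_s, now_s, SMPL_MAX_AGE_S)
    if mode in POSE_MODES and usable:
        return list(pose)
    return [0.0] * SMPL_POSE_DIM


pose = [0.1] * SMPL_POSE_DIM
print(any(smpl_pose_for_frame("POSE", pose, msg_time_s=1.00, now_s=1.05)))
print(any(smpl_pose_for_frame("POSE", pose, msg_time_s=1.00, now_s=1.25)))
```

The second call returns all zeros because the message is 250 ms old, which is exactly the kind of frame the post-processing step later looks for.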

Recording has two control paths. From PICO, Left Grip + A toggles recording, and Left Grip + B marks the episode for discard. Over the keyboard-over-ZMQ path on port 5580, key c toggles recording and key x discards the episode. During real teleop, the PICO combinations are usually better because the operator does not have to leave the controllers.

7. LeRobot v2.1 output: parquet, MP4, and metadata

The data collection docs say datasets are saved under <root-output-dir>/<dataset-name>/, with outputs/<timestamp>/ as the common default. The LeRobot v2.1 / GR00T LeRobot structure contains tabular data, videos, and metadata:

outputs/my_dataset/
├── data/
│   └── chunk-000/
│       ├── episode_000000.parquet
│       └── episode_000001.parquet
├── videos/
│   └── chunk-000/
│       └── observation.images.ego_view/
│           ├── episode_000000.mp4
│           └── episode_000001.mp4
└── meta/
    ├── info.json
    ├── modality.json
    ├── episodes.jsonl
    └── tasks.jsonl

Some older exporter documentation may show a simpler data/train-00000.parquet layout, but the GR00T LeRobot v2 data-format docs and process_dataset.py both handle chunked patterns such as data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet and video episode_*.mp4 paths. When in doubt, trust meta/info.json because it stores the path templates and feature schema.
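
The chunked templates are ordinary Python format strings, so resolving an episode path is a one-liner. In this sketch the data template is the one quoted above, the video template is inferred from the directory tree, and the chunk size of 1000 episodes is an assumption; a real loader should read all three from `meta/info.json` rather than hardcode them.

```python
DATA_PATH = "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet"
# Inferred from the directory tree above, not copied from a real info.json.
VIDEO_PATH = "videos/chunk-{episode_chunk:03d}/{video_key}/episode_{episode_index:06d}.mp4"
CHUNKS_SIZE = 1000  # assumed number of episodes per chunk


def episode_paths(episode_index: int, video_key: str = "observation.images.ego_view"):
    """Return (parquet_path, video_path) for one episode."""
    chunk = episode_index // CHUNKS_SIZE
    data = DATA_PATH.format(episode_chunk=chunk, episode_index=episode_index)
    video = VIDEO_PATH.format(
        episode_chunk=chunk, video_key=video_key, episode_index=episode_index
    )
    return data, video


print(episode_paths(1))
```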

Each component has a specific job:

| File | Role |
| --- | --- |
| `episode_*.parquet` | One row per frame: state, action, timestamp, episode index, task index, annotation |
| `episode_*.mp4` | Encoded camera video, usually ego view and optionally wrist cameras |
| `meta/info.json` | FPS, feature schema, total frames/episodes, path templates, `script_config`, possibly `discarded_episode_indices` |
| `meta/modality.json` | GR00T-specific metadata that splits concatenated state/action arrays into semantic fields |
| `meta/tasks.jsonl` | Mapping from `task_index` to natural-language task prompt, such as `"pick up the cup"` |
| `meta/episodes.jsonl` | Per-episode metadata: length, tasks, and index |
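
As a small example of how the metadata is consumed, the sketch below builds the `task_index` to prompt mapping from JSONL text. The key names `task_index` and `task` are assumed from the common LeRobot v2 convention and should be checked against an actual `meta/tasks.jsonl`; `load_tasks` is a hypothetical helper.

```python
import json
from typing import Dict


def load_tasks(jsonl_text: str) -> Dict[int, str]:
    """Parse one JSON object per line into a task_index -> prompt mapping."""
    tasks = {}
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        tasks[record["task_index"]] = record["task"]
    return tasks


sample = '{"task_index": 0, "task": "pick up the cup"}\n'
print(load_tasks(sample))
```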

modality.json is especially important for GR00T. In standard LeRobot, state and action can be concatenated arrays. GR00T needs to know which slice means left leg, right leg, wrist pose, root orientation, motion token, SMPL pose, planner movement, or VR 3-point orientation. features_sonic_vla.py builds this modality config from the RobotModel rather than manually hardcoding every index. If you copy parquet files but forget meta/modality.json, training may load the files while interpreting dimensions incorrectly.
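
The slicing idea looks roughly like this. The field names and index ranges below are invented for the example and do not reflect the real SONIC layout, and the `start`/`end` keys are assumed to follow the GR00T modality convention with half-open ranges.

```python
from typing import Dict, List

# Illustrative spec only: field names and index ranges are made up for this example.
MODALITY = {
    "state": {
        "left_hand": {"start": 0, "end": 7},
        "right_hand": {"start": 7, "end": 14},
        "root_orientation": {"start": 14, "end": 18},
    }
}


def split_fields(
    vector: List[float], spec: Dict[str, Dict[str, int]]
) -> Dict[str, List[float]]:
    """Cut a concatenated vector into named slices using half-open [start, end) ranges."""
    return {name: vector[rng["start"]:rng["end"]] for name, rng in spec.items()}


state = [float(i) for i in range(18)]
fields = split_fields(state, MODALITY["state"])
print({name: len(values) for name, values in fields.items()})
```

If the spec is missing or belongs to a different robot configuration, the same code still returns slices of the right length but with the wrong meaning, which is why the file has to travel with the parquet data.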

8. `process_dataset.py`: clean before fine-tuning

After collecting demonstrations, do not immediately start fine-tuning. process_dataset.py does three main jobs:

Remove episodes flagged as discarded in meta/info.json.
Remove frames where teleop.smpl_pose is all zeros, plus frozen lead-in frames before them.
Merge multiple sessions into one dataset when script_config matches.
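
The first two jobs can be sketched on in-memory frames as shown below. This is a simplified illustration, not the script's implementation: frames are plain dictionaries, `clean_frames` is a hypothetical helper, and the removal of frozen lead-in frames and the merge step are omitted.

```python
from typing import Dict, List, Set


def clean_frames(
    frames: List[Dict], discarded_episodes: Set[int], remove_stale_smpl: bool = True
) -> List[Dict]:
    """Drop frames from discarded episodes and, optionally, frames with all-zero SMPL pose."""
    kept = []
    for frame in frames:
        if frame["episode_index"] in discarded_episodes:
            continue
        if remove_stale_smpl and not any(frame["teleop.smpl_pose"]):
            continue  # every value is zero, so the pose was stale or absent
        kept.append(frame)
    return kept


frames = [
    {"episode_index": 0, "teleop.smpl_pose": [0.1, 0.0]},
    {"episode_index": 0, "teleop.smpl_pose": [0.0, 0.0]},
    {"episode_index": 1, "teleop.smpl_pose": [0.2, 0.3]},
]
print(len(clean_frames(frames, discarded_episodes={1})))
```

Note that a session whose SMPL pose is zero in every frame would be emptied entirely unless `remove_stale_smpl` is turned off.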

Clean one dataset into a new directory:

python gear_sonic/scripts/process_dataset.py \
  --dataset-path outputs/my_dataset \
  --output-path outputs/my_dataset_cleaned

Process in place:

python gear_sonic/scripts/process_dataset.py \
  --dataset-path outputs/my_dataset

If the session was collected with VR_3PT, pay attention to the official warning: teleop.smpl_pose will be all zeros because VR_3PT uses raw VR positions and orientations instead of SMPL body parameters. If stale-SMPL cleaning remains enabled, the script can remove every frame. Use:

python gear_sonic/scripts/process_dataset.py \
  --dataset-path outputs/my_dataset \
  --output-path outputs/my_dataset_cleaned \
  --no-remove-stale-smpl

To merge multiple sessions:

python gear_sonic/scripts/process_dataset.py \
  --dataset-path outputs/session1 outputs/session2 outputs/session3 \
  --output-path outputs/merged_dataset

The script checks script_config before merging. That guard matters: a VLA can silently learn the wrong mapping if the same index in observation.state, or the same camera key, means something different across sessions because the robot, camera, or wrist-camera setup changed.
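
A compatibility guard of this kind might look like the sketch below. Whether the real script requires exact equality is not stated here, so treat the comparison as an assumption; the config keys in the example are hypothetical, as are the helper names.

```python
from typing import Dict, List


def configs_match(configs: List[Dict]) -> bool:
    """Return True if every session config equals the first one."""
    first = configs[0]
    return all(cfg == first for cfg in configs[1:])


def mismatched_keys(a: Dict, b: Dict) -> List[str]:
    """List the keys whose values differ, or that exist on one side only."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


# Hypothetical script_config contents for two sessions.
session1 = {"data_collection_frequency": 50, "record_wrist_cameras": False}
session2 = {"data_collection_frequency": 50, "record_wrist_cameras": True}

print(configs_match([session1, session2]))
print(mismatched_keys(session1, session2))
```

Reporting which keys differ, rather than only refusing to merge, makes it easier to decide whether a session should be re-recorded or kept as a separate dataset.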

9. Checklist for a clean recording session

Before recording, walk through this checklist:

| Step | Check |
| --- | --- |
| Teleop install | `install_pico.sh` completed and `.venv_teleop` can import `xrobotoolkit_sdk` |
| Camera | `run_camera_viewer.py` sees frames on port `5555`; exposure and viewpoint are usable |
| Deployment | `deploy.sh --input-type zmq_manager` initialized and C++ publishes `robot_config` |
| PICO | `pico_manager_thread_server.py --manager` receives body data and mode combos work |
| Prompt | `--task-prompt` describes the task; avoid `"demo"` for real training data |
| Recording | Left Grip + A starts/stops; Left Grip + B discards failed attempts |
| Post-processing | Run `process_dataset.py`; use `--no-remove-stale-smpl` for VR_3PT sessions |

For beginners, the most common failure is not a bad model. It is a misaligned dataset: camera recording from the wrong host, PICO publishing while C++ is not using zmq_manager, exporter missing robot_config, or a single task prompt reused across different tasks. Treat every episode as a supervised learning example: visual input, robot state, action, and language annotation must describe the same event.

Conclusion

Part 5 turns SONIC from a controller into a complete VLA data pipeline. install_pico.sh builds the teleop environment; pico_manager_thread_server.py turns PICO into a multi-mode manager; launch_data_collection.py combines C++ deploy, PICO, exporter, and camera viewer in tmux; run_data_exporter.py synchronizes camera, robot state, and teleop signals; process_dataset.py cleans the dataset before fine-tuning.

If you remember one sentence, remember this: --input-type zmq_manager is the input contract for the PICO manager, and LeRobot v2.1 is the output contract for VLA training. When both ends are correct, you can collect whole-body manipulation demonstrations where the VLA learns what to do while SONIC continues to handle how the humanoid body should move.

In the final part, we will move from teleop data to MotionBricks, the latent generative motion layer that complements the SONIC ecosystem.

Nguyễn Anh Tuấn
