Teleoperation: Real-World Robot Data Collection

In part 1 of this series, we mapped the humanoid robot data war: who has data, who opens data, and who keeps data as a moat. Part 2 moves down to the practical layer: how is that data actually collected?

The short answer is teleoperation. A human controls a real robot while the system records camera streams, joint states, action commands, and force or tactile signals if available. Each session is then converted into trajectories for imitation learning, whether that means ACT, Diffusion Policy, or VLA training. This is how AgiBot built AgiBot World, how the ALOHA community created bimanual manipulation datasets, and why repositories like Unitree xr_teleoperate matter: they turn expensive robots into data collection machines.

This article assumes no prior teleoperation experience. The goal is simple: by the end, you should know which stack fits your budget.

If you want a real humanoid: look at Unitree G1/H1 plus XR teleoperation.
If you want high-quality bimanual manipulation in a lab: use ALOHA 2 or Mobile ALOHA.
If you need the cheapest credible starting point: use GELLO or a LeRobot-compatible arm.
If you want to scale like a large company: study AgiBot, but understand that the real difficulty is operator operations, QA, and data standardization.

Series roadmap

Part	Title	Status
1	The Data War: Who Owns Humanoid Robot Data?	Published
2	Teleoperation: Real-World Robot Data Collection	This article
3	Human Video Mining: Learning from Humans	Next
4	Synthetic Data Pipelines: From Sim to Real	Coming
5	VLA Data Scaling Laws	Coming
6	Data Strategy: What Should You Collect?	Coming
7	Open vs Closed: Licenses, Data Moats & What's Next	Coming

What teleoperation means in robot learning

Teleoperation means controlling a robot with a human input device: leader arms, XR headsets, control cabins, joysticks, haptic devices, or a mechanical copy of the target arm. In robot learning, teleoperation has a specific purpose: record demonstrations clean enough for a policy to learn from.

A minimal trajectory usually contains:

episode_000123
├── observations
│   ├── images/cam_high        # RGB or RGB-D
│   ├── images/cam_left_wrist  # wrist camera
│   ├── state                  # joint position, velocity, gripper state
│   └── optional: force, tactile, IMU, base pose
├── actions
│   ├── joint targets          # next-step action
│   └── optional: base action, hand target, force target
├── metadata
│   ├── task: "fold towel"
│   ├── success: true
│   ├── operator_id
│   └── reset notes
└── language
    └── "fold the towel and place it on the tray"

For beginners, the key rule is: do not record only video. Video shows what happened. A robot policy also needs to know what command the robot received at each time step. Real robot datasets must synchronize three streams: vision, robot state, and action.
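
One way to picture the synchronization requirement is nearest-timestamp alignment: every camera frame is paired with the closest state and action samples, and frames without a close match are dropped. The sketch below is only an illustration of that idea, not the recorder of any stack discussed here. The function names, the 20 ms tolerance, and the toy timestamps are all assumptions.

```python
from bisect import bisect_left


def nearest_index(timestamps, t):
    """Return the index of the value in sorted, non-empty `timestamps` closest to t."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbor is closer; ties go to the earlier sample.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def align_to_frames(frame_ts, state_ts, action_ts, tolerance_s=0.02):
    """Pair every camera frame with the nearest state and action sample.

    Frames with no state or action sample within `tolerance_s` are dropped
    rather than filled in, so gaps stay visible during QA.
    Returns a list of (frame_idx, state_idx, action_idx) tuples.
    """
    if not state_ts or not action_ts:
        raise ValueError("state and action streams must not be empty")
    aligned = []
    for f_idx, t in enumerate(frame_ts):
        s_idx = nearest_index(state_ts, t)
        a_idx = nearest_index(action_ts, t)
        if (abs(state_ts[s_idx] - t) <= tolerance_s
                and abs(action_ts[a_idx] - t) <= tolerance_s):
            aligned.append((f_idx, s_idx, a_idx))
    return aligned


# Toy timestamps in seconds (assumed values). The last frame arrives long
# after the robot streams stopped, so it has no usable partner.
frame_ts = [0.000, 0.033, 0.066, 0.500]
state_ts = [0.000, 0.010, 0.030, 0.040, 0.060, 0.070]
action_ts = [0.002, 0.012, 0.032, 0.042, 0.062, 0.072]
steps = align_to_frames(frame_ts, state_ts, action_ts)
print(steps)
```

Dropping unmatched frames instead of interpolating is a deliberate choice in this sketch: a visible gap is easier to catch in QA than a silently invented action.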

AgiBot World: scaling through robots, operators, and QA

AgiBot World Colosseo is the clearest public example of industrial-scale teleoperation. The technical report describes more than 1 million trajectories across 217 tasks, collected by over 100 homogeneous robots in real-world scenarios. The OpenDriveLab/AgiBot-World repository states that the Beta release contains 1,003,672 trajectories and is built on LeRobot v2.1. The paper also emphasizes human-in-the-loop verification: humans do not only operate robots; they also check data quality before the data enters training.

The impressive part is not only volume. AgiBot turns teleoperation into a production line:

task design
  -> operator teleoperates AgiBot G1
  -> robot records multimodal streams
  -> success/failure is checked
  -> bad trajectories are filtered
  -> task, skill, and keyframe annotations are added
  -> data is converted into training format
  -> GO-1 or internal policies are trained

One public-source detail needs careful reading. The original AgiBot World Colosseo material identifies AgiBot G1 as the collection platform: a dual-arm humanoid with dexterous hands and visuo-tactile sensing. Some newer AGIBOT WORLD 2026 pages mention G2 or expansion to G2. For this article, treat G1 as the Colosseo collection stack and G2 as a later industrial platform within the same ecosystem.

Who should use this pattern? Companies with robot fleets, a data collection facility, trained operators, and a need to train generalist policies. For a small lab, the lesson is not "buy 100 robots." The lesson is to design the pipeline for scale from day one: one schema, one success checklist, one reset protocol, one format, and one QA process.
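
The "success/failure is checked" and "bad trajectories are filtered" stages can be sketched as a small QA gate. This is a minimal illustration under assumptions, not AgiBot's pipeline: the `Episode` fields, the 2% frame-drop threshold, and the 30-frame minimum are hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """Hypothetical per-episode QA record; field names are illustrative."""
    episode_id: str
    task: str
    success: bool
    num_frames: int
    dropped_frames: int


def qa_filter(episodes, max_drop_ratio=0.02, min_frames=30):
    """Split episodes into accepted ones and (episode_id, reason) rejections."""
    accepted, rejected = [], []
    for ep in episodes:
        # max(..., 1) also protects the division below from empty episodes.
        if ep.num_frames < max(min_frames, 1):
            rejected.append((ep.episode_id, "too short"))
        elif ep.dropped_frames / ep.num_frames > max_drop_ratio:
            rejected.append((ep.episode_id, "frame drops"))
        elif not ep.success:
            # Kept out of the success set but logged with a reason, so it
            # can be reviewed later as a candidate for recovery data.
            rejected.append((ep.episode_id, "task failed"))
        else:
            accepted.append(ep)
    return accepted, rejected


batch = [
    Episode("ep_0001", "fold towel", True, 600, 3),
    Episode("ep_0002", "fold towel", True, 600, 40),
    Episode("ep_0003", "fold towel", False, 450, 0),
    Episode("ep_0004", "fold towel", True, 12, 0),
]
accepted, rejected = qa_filter(batch)
print([ep.episode_id for ep in accepted])
print(rejected)
```

Recording a reason for every rejection is what makes "one QA process" auditable: the same thresholds apply to every operator and every robot.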

Unitree `xr_teleoperate`: humanoid teleop with Quest 3/PICO/AVP

If AgiBot is the fleet-scale reference, Unitree is the more accessible path for teams that want to work with a real humanoid. The unitreerobotics/xr_teleoperate repository supports Unitree G1, H1, H1_2, H2, Dex1-1 grippers, Dex3-1 hands, Inspire hands, and BrainCo hands. The supported XR devices include Apple Vision Pro, PICO 4 Ultra Enterprise, and Meta Quest 3.

A typical launch command looks like this:

python teleop_hand_and_arm.py \
  --xr-mode=hand \
  --arm=G1_29 \
  --ee=dex3 \
  --record

The operator wears a headset, opens the Vuer/WebRTC interface, sees the robot's first-person camera view, presses r to begin teleoperation, and presses s to start or stop recording an episode. The repository documents a default --frequency of 30 Hz and stores recorded data under xr_teleoperate/teleop/utils/data, with follow-up instructions for Unitree imitation learning and LeRobot usage.
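
At a nominal 30 Hz, consecutive samples should be about 33 ms apart. The sketch below checks recorded timestamps against that period and flags stalls. It is not part of xr_teleoperate; the function name, the 1.5x stall threshold, and the synthetic timestamps are assumptions for illustration.

```python
def timing_report(timestamps, frequency_hz=30.0, max_gap_factor=1.5):
    """Summarize inter-sample gaps against a nominal recording frequency.

    A gap longer than `max_gap_factor` nominal periods counts as a stall.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    period = 1.0 / frequency_hz
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    # Report the index of the sample that arrived late.
    stalls = [i + 1 for i, g in enumerate(gaps) if g > max_gap_factor * period]
    return {
        "nominal_period_s": period,
        "mean_gap_s": sum(gaps) / len(gaps),
        "max_gap_s": max(gaps),
        "stall_indices": stalls,
    }


# Synthetic data: six samples at a steady 30 Hz, then a 120 ms stall
# (for example a Wi-Fi hiccup), then four more steady samples.
ts = [i / 30.0 for i in range(6)]
ts += [ts[-1] + 0.120 + i / 30.0 for i in range(4)]
report = timing_report(ts)
print(report["stall_indices"], round(report["max_gap_s"], 3))
```

Running a check like this on every episode is a cheap way to catch network-induced jitter before it reaches training.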

Strengths:

It feels natural for humanoid work: the operator uses head and hands to control view, arms, and hands.
There is a path from simulation to physical deployment.
It uses commercial hardware. Unitree lists G1 from around 13.5K USD publicly, while H1 is much more expensive.
It fits tasks that need body context: opening doors, picking objects from shelves, and manipulating in rooms.

Weaknesses:

Latency depends on Wi-Fi, WebRTC, headset browser behavior, image service, IK, and DDS. Unstable networking produces bad trajectories.
XR hand tracking is not always precise enough for small contact-rich manipulation.
Safety is a real concern. Humanoids have mass, inertia, and fall risk.

Unitree XR is practical if you want humanoid data at a much lower budget than building a custom humanoid. But it does not automatically make you AgiBot. You still need task protocols, success/failure labels, reset discipline, camera standardization, and conversion into a training-ready format.

ALOHA 2: the lab standard for bimanual manipulation

ALOHA and ALOHA 2 are the best-known stacks for bimanual manipulation data. Instead of XR, ALOHA uses leader-follower arms. The operator moves two leader arms, and the follower arms reproduce the motion in the workspace. The mapping is mechanical and intuitive: left hand controls the left arm, right hand controls the right arm, and the leader grippers control the follower grippers.

ALOHA 2 improves the original ALOHA design: lower-friction grippers, better ergonomics, a simpler frame, smaller Intel RealSense D405 cameras, and a more accurate MuJoCo model. The ALOHA 2 page explicitly frames these changes around scaling data collection: more robots, more hours per robot, more task diversity, and less downtime.

The traditional ALOHA data format is HDF5. According to the Trossen ALOHA documentation, a stationary episode looks like this:

episode_000001.hdf5
├── observations/images/cam_high         uint8  (480, 640, 3)
├── observations/images/cam_low          uint8  (480, 640, 3)
├── observations/images/cam_left_wrist   uint8  (480, 640, 3)
├── observations/images/cam_right_wrist  uint8  (480, 640, 3)
├── observations/qpos                    float64 (14,)
├── observations/qvel                    float64 (14,)
└── action                               float64 (14,)

For Mobile ALOHA, the schema adds base_action:

action       float64 (14,)  # two arms plus grippers
base_action  float64 (2,)   # linear/angular or equivalent base command
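
A schema like this is easy to check automatically before training. The sketch below validates dataset names, dtypes, and shapes against the two listings above. It assumes each dataset stacks T timesteps along a leading axis, so a full shape is (T, 480, 640, 3) or (T, 14). The function name and the `found` dictionary format are hypothetical.

```python
# Per-timestep layout copied from the listings above.
STATIONARY_SCHEMA = {
    "observations/images/cam_high": ("uint8", (480, 640, 3)),
    "observations/images/cam_low": ("uint8", (480, 640, 3)),
    "observations/images/cam_left_wrist": ("uint8", (480, 640, 3)),
    "observations/images/cam_right_wrist": ("uint8", (480, 640, 3)),
    "observations/qpos": ("float64", (14,)),
    "observations/qvel": ("float64", (14,)),
    "action": ("float64", (14,)),
}
MOBILE_SCHEMA = {**STATIONARY_SCHEMA, "base_action": ("float64", (2,))}


def check_episode(found, schema):
    """Compare {name: (dtype, full_shape)} against a per-timestep schema.

    Returns a list of human-readable problems; an empty list means the
    episode matches the schema and all datasets share one length T.
    """
    problems = []
    lengths = set()
    for name, (dtype, step_shape) in schema.items():
        if name not in found:
            problems.append(f"missing: {name}")
            continue
        got_dtype, got_shape = found[name]
        if got_dtype != dtype:
            problems.append(f"{name}: dtype {got_dtype}, expected {dtype}")
        if tuple(got_shape[1:]) != step_shape:
            problems.append(f"{name}: shape {got_shape}, expected (T, *{step_shape})")
        else:
            lengths.add(got_shape[0])
    if len(lengths) > 1:
        problems.append(f"inconsistent episode lengths: {sorted(lengths)}")
    return problems


# A well-formed 400-step stationary episode, described without opening a file.
found = {k: (d, (400,) + s) for k, (d, s) in STATIONARY_SCHEMA.items()}
print(check_episode(found, STATIONARY_SCHEMA))
print(check_episode(found, MOBILE_SCHEMA))
```

With h5py, the `found` dictionary could be built from each dataset's `dtype` and `shape` attributes; the check itself stays independent of the file library.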

ALOHA 2 is a strong fit when you need precise two-handed manipulation: opening boxes, folding towels, plugging cables, picking small objects, or assembly. It is not a full-body humanoid, but its bimanual data is extremely valuable because contact-rich manipulation is one of the hardest parts of robotics.

Mobile ALOHA: adding a base for whole-body manipulation

Mobile ALOHA extends ALOHA with a mobile base and whole-body teleoperation. The paper reports that with about 50 demonstrations per task, co-training with existing static ALOHA datasets can substantially improve success rates on mobile manipulation tasks such as opening cabinets, cooking, using an elevator, or rinsing a pan.

From a data collection perspective, Mobile ALOHA addresses a major gap: many real tasks do not fit on a tabletop. The robot must move to the right location, rotate its base, put both arms into a useful workspace, and only then manipulate. Stationary ALOHA teaches "what the hands do." Mobile ALOHA also teaches "where the body brings the hands."

Current commercial prices for ALOHA-style systems are not identical to the original research bill of materials. Trossen's 2026 pages list Stationary AI around 23,995.95 USD, Mobile AI around 33,695.95 USD without a laptop, and 37,845.95 USD with a laptop. These are not mandatory prices if you self-build open-source hardware, but they are realistic planning numbers for teams that want a supported system.

GELLO: the cheapest serious way to start

GELLO is a practical idea: if the target robot arm is Franka, UR5, or xArm, build a controller with the same kinematic structure using 3D-printed parts and inexpensive motors. The OpenReview summary describes GELLO as a teleoperation device under 300 USD that is easy to build and intuitive to use.

GELLO is not a humanoid. It is not a complete bimanual system if you only build one controller. But for beginners, it is a very good way to learn the entire data loop:

build controller
  -> map joint positions to the robot arm
  -> record image + qpos + action
  -> train ACT or Diffusion Policy
  -> deploy policy
  -> collect failure recovery data

GELLO's strength is low latency and direct joint mapping. Because the controller shares the target arm's kinematic structure, the operator does not need to think about which joystick axis maps to which robot axis. The human moves the controller, and the robot follows. Compared with VR controllers or 3D mice, demonstrations are often smoother and more consistent.
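
Joint-space mapping of this kind reduces to a per-joint sign flip, an offset, and a safety clamp. The sketch below is a minimal illustration under assumptions and is not taken from the GELLO codebase: the function name, the three-joint arm, and all calibration values are hypothetical.

```python
import math


def map_leader_to_follower(leader_q, signs, offsets, limits):
    """Map leader joint angles (rad) to follower joint targets (rad).

    target_i = sign_i * leader_i + offset_i, clamped to the follower's
    joint limits so a miscalibrated leader cannot command an unsafe pose.
    """
    if not (len(leader_q) == len(signs) == len(offsets) == len(limits)):
        raise ValueError("all inputs must have one entry per joint")
    targets = []
    for q, sign, offset, (lo, hi) in zip(leader_q, signs, offsets, limits):
        target = sign * q + offset
        targets.append(min(max(target, lo), hi))
    return targets


# Hypothetical calibration for a 3-joint arm.
signs = [1, -1, 1]                      # joint 2 is mounted mirrored
offsets = [0.0, math.pi / 2, -0.1]      # zero-pose difference per joint
limits = [(-2.0, 2.0), (-1.5, 1.5), (-1.0, 1.0)]

targets = map_leader_to_follower([0.5, 0.3, 1.4], signs, offsets, limits)
print(targets)  # the third joint is clamped to its upper limit of 1.0
```

Because the mapping is a fixed per-joint transform, the value logged as the action is exactly the target sent to the follower, which keeps action logging simple.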

The limitation is scope. If you need locomotion, dual-arm full-body control, or complex dexterous hands, GELLO is only one component. But for low budgets, it is easy to underestimate. A small clean GELLO dataset can be more useful than a thousand shaky XR episodes with delay, dropped frames, and weak annotation.

LeRobotDataset or HDF5?

The two formats you will see most often are ALOHA/ACT-style HDF5 and Hugging Face LeRobotDataset.

HDF5 is strong for local research. Each episode is a file, easy to open with h5py, and easy to train with the original ACT codebase. The weakness appears when you scale to many cameras and thousands of episodes: file management, metadata, and sharing become painful.

LeRobotDataset solves the standardization problem. The LeRobotDataset v3.0 documentation describes a format for multimodal time-series data, sensorimotor signals, multi-camera video, and metadata. The huggingface/lerobot repository stores vision as MP4 or images and state/action as Parquet. The advantages are PyTorch loading, Hugging Face Hub integration, streaming large datasets, and mixing data from different robots.

A practical rule:

If you are...	Choose
Reproducing the original ACT/ALOHA papers	HDF5
Prototyping in a lab with fewer than a few hundred episodes	HDF5 is fine
Sharing data, training with LeRobot, or mixing robots	LeRobotDataset
Scaling to thousands or millions of trajectories	LeRobotDataset or an internal equivalent with serious metadata

A minimal LeRobot-style layout, following the v2.x file conventions (v3.0 reorganizes some of these files), looks like:

my_robot_dataset/
├── meta/
│   ├── info.json
│   ├── episodes.jsonl
│   └── tasks.jsonl
├── data/
│   └── chunk-000/*.parquet
└── videos/
    └── chunk-000/observation.images.cam_high/*.mp4
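
To make the `meta/` folder concrete, the sketch below writes the three metadata files with the standard library only. The field names (`episode_index`, `tasks`, `length`, `task_index`, `fps`, and the totals) are assumptions modeled on v2.x-style metadata, and a real `info.json` carries more than this, such as feature definitions. For an actual dataset, the LeRobot library's own writer is the safer route.

```python
import json
import tempfile
from pathlib import Path


def write_minimal_meta(root, fps, episodes):
    """Write meta/info.json, episodes.jsonl and tasks.jsonl under `root`.

    `episodes` is a list of (task_string, num_frames) pairs.
    Field names are illustrative assumptions, not a verified specification.
    """
    meta = Path(root) / "meta"
    meta.mkdir(parents=True, exist_ok=True)

    # Give each distinct task string a stable integer index.
    task_index = {}
    for task, _ in episodes:
        task_index.setdefault(task, len(task_index))

    with open(meta / "tasks.jsonl", "w", encoding="utf-8") as f:
        for task, idx in task_index.items():
            f.write(json.dumps({"task_index": idx, "task": task}) + "\n")

    with open(meta / "episodes.jsonl", "w", encoding="utf-8") as f:
        for i, (task, length) in enumerate(episodes):
            row = {"episode_index": i, "tasks": [task], "length": length}
            f.write(json.dumps(row) + "\n")

    info = {
        "fps": fps,
        "total_episodes": len(episodes),
        "total_frames": sum(length for _, length in episodes),
        "total_tasks": len(task_index),
    }
    (meta / "info.json").write_text(json.dumps(info, indent=2), encoding="utf-8")
    return info


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "my_robot_dataset"
    info = write_minimal_meta(
        root,
        fps=30,
        episodes=[("fold towel", 600), ("fold towel", 540), ("open box", 300)],
    )
    lines = (root / "meta" / "episodes.jsonl").read_text(encoding="utf-8").splitlines()
    episode_rows = [json.loads(line) for line in lines]

print(info)
```

JSON Lines keeps one record per line, so episodes can be appended during a collection shift without rewriting the whole file.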

Cost, throughput, and latency comparison

No public source reports throughput and latency under the same standard for every stack. The table below separates public facts from operational estimates. For short trajectories of 20-60 seconds and resets of 30-90 seconds, a skilled operator rarely records a full 60 minutes of usable data per hour. Labeling, quality checks, reset time, and fatigue all matter.

Stack	Practical hardware cost	Estimated throughput/operator	Latency target	Natural output format	Best fit
AgiBot-style fleet	Not public; requires robot fleet, facility, and QA	20-60 trajectories/hour/operator, scaled by robots/operators	Low enough for contact; stability matters more than demo appeal	LeRobot v2.1/Parquet + video, rich metadata	Funded company training generalist humanoid policies
Unitree G1/H1 + XR	G1 from about 13.5K USD, H1 much higher; plus Quest/PICO/AVP, PC, cameras	10-35 trajectories/hour/operator depending on task and reset	Repository default is 30 Hz; keep network delay and jitter low	`xr_teleoperate` recording, then Unitree IL/LeRobot conversion	Teams wanting real humanoid data at moderate budget
ALOHA 2 / Stationary AI	Self-build can be cheaper; supported kit around 24K USD	20-50 trajectories/hour/operator for tabletop tasks	Low due to leader-follower control; ALOHA 2 improves gripper feel	HDF5, convertible to LeRobot	Lab bimanual manipulation
Mobile ALOHA / Mobile AI	Supported kit around 34K-38K USD; self-build varies	8-25 trajectories/hour/operator because environment resets take longer	Base and arm must be smooth; high latency makes path control hard	HDF5 with `base_action`, or LeRobot after conversion	Mobile manipulation tasks
GELLO + arm	Controller under 300 USD, excluding robot arm and cameras	20-60 trajectories/hour/operator for single-arm tasks	Very good if joint mapping is stable	Whatever you record: HDF5 or LeRobot	Low-budget pipeline learning and single-arm manipulation
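
The throughput estimates follow from simple cycle-time arithmetic. The sketch below uses the 20-60 second trajectories and 30-90 second resets mentioned earlier in this section. The `usable_fraction` discount of 0.7 for breaks, labeling, and rejected episodes is an assumed value, not a measured one.

```python
def trajectories_per_hour(trajectory_s, reset_s, usable_fraction=1.0):
    """Episodes per operator-hour on one robot.

    One cycle is a demonstration plus a reset. `usable_fraction` discounts
    for breaks, labeling, QA, and rejected episodes.
    """
    cycle_s = trajectory_s + reset_s
    return 3600.0 / cycle_s * usable_fraction


def recorded_minutes_per_hour(trajectory_s, reset_s, usable_fraction=1.0):
    """Minutes of demonstration data captured per hour of operator time."""
    rate = trajectories_per_hour(trajectory_s, reset_s, usable_fraction)
    return rate * trajectory_s / 60.0


best = trajectories_per_hour(20, 30)    # short task, fast reset: 72.0
worst = trajectories_per_hour(60, 90)   # long task, slow reset: 24.0
discounted = trajectories_per_hour(40, 60, usable_fraction=0.7)

print(best, worst, round(discounted, 1))
print(recorded_minutes_per_hour(20, 30), recorded_minutes_per_hour(60, 90))
```

Under these assumptions, both extremes record 24 minutes of demonstration per hour before any discount, which is why reset time, not robot speed, usually dominates throughput.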

Choosing a stack by budget

If your budget is under 5,000 USD, do not start with a humanoid. Use a small arm, a fixed camera, and a GELLO-style controller or low-cost leader arm. Your goal is to learn the pipeline: timestamp synchronization, correct action logging, first policy training, and failure recovery.

If your budget is 10,000-25,000 USD, there are two paths. Choose Unitree G1 if your primary goal is humanoid embodiment, locomotion context, and XR teleoperation. Choose stationary ALOHA/Trossen AI if your goal is high-quality two-handed manipulation. For the same budget, ALOHA usually gives cleaner manipulation data; Unitree gives a closer humanoid embodiment.

If your budget is 30,000-50,000 USD, Mobile ALOHA or Mobile AI becomes compelling. You pay more for a base, frame, cameras, and a more stable workflow for real-world tasks. This is a reasonable range for labs that want mobile manipulation without operating a humanoid fleet.

If your budget is above 100,000 USD, the question is no longer which robot to buy. The question is how to run a data factory: how many operators, how many tasks per day, QA process, format, versioning, privacy, license, and how data enters the training loop. That is the topic of part 6 on data strategy.

Data collection shift checklist

Before recording the first episode, write a checklist. A good teleoperation stack does not start with a model; it starts with operational discipline.

1. Define the task
   - Short, clear task name
   - Initial condition
   - Success condition
   - Failure condition

2. Lock the setup
   - Camera poses
   - Lighting
   - Object positions
   - Firmware/control-code version

3. Record data
   - Synchronized timestamps
   - No severe image frame drops
   - Correct qpos/qvel/action dimensions
   - Metadata includes task, operator, success

4. QA after each batch
   - Open 10 random episodes
   - Visualize action/state
   - Remove unsafe collisions, bad resets, and shifted cameras
   - Log failures for recovery-data collection
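
Steps 3 and 4 of the checklist can be partly automated. The sketch below picks random episodes for manual review and checks vector dimensions and required metadata. The helper names, the 14-dimensional vectors (borrowed from the ALOHA schema), and the metadata keys are assumptions for illustration.

```python
import random


def sample_for_review(episode_ids, k=10, seed=None):
    """Pick up to k distinct episodes for manual review."""
    ids = list(episode_ids)
    rng = random.Random(seed)
    return sorted(rng.sample(ids, min(k, len(ids))))


def check_dimensions(step, expected):
    """Return names that are missing or whose length differs from expected."""
    return [name for name, dim in expected.items()
            if name not in step or len(step[name]) != dim]


def check_metadata(meta, required=("task", "operator_id", "success")):
    """Return the required metadata keys that are absent."""
    return [key for key in required if key not in meta]


episode_ids = [f"episode_{i:06d}" for i in range(200)]
picked = sample_for_review(episode_ids, k=10, seed=0)

# One recorded step with a truncated action vector (13 instead of 14).
step = {"qpos": [0.0] * 14, "qvel": [0.0] * 14, "action": [0.0] * 13}
bad_dims = check_dimensions(step, {"qpos": 14, "qvel": 14, "action": 14})

missing = check_metadata({"task": "fold towel", "success": True})
print(len(picked), bad_dims, missing)
```

Automated checks do not replace opening the sampled episodes and watching them; they only make sure the reviewer's time goes to problems a script cannot see, such as a shifted camera.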

A common beginner mistake is ignoring failure data. If you only save successful demonstrations, the policy learns the clean path but not how to recover when it drifts. Large pipelines treat verification, cleaning, and recovery as part of the data engine, not a side task.

Technical sources checked

Conclusion: teleoperation is the first data moat

In humanoid robotics, model architectures change quickly, but real-world data remains expensive and slow. Teleoperation turns money, operator time, and robot uptime into a training asset. AgiBot wins on scale. Unitree opens a path for smaller teams to collect real humanoid data. ALOHA 2 and Mobile ALOHA win on bimanual data quality. GELLO wins on starting cost.

The practical advice: choose the stack by task, not by how impressive the robot looks. If your task is plugging cables, opening boxes, or folding towels on a table, ALOHA or GELLO may beat a humanoid. If the task requires moving through space, reaching shelves, and coordinating body and arms, Unitree or Mobile ALOHA makes more sense. Once you have real data, part 3 asks the next question: can we reduce teleoperation cost by mining human video from the internet?

Nguyễn Anh Tuấn
