VLA + WBC repos from China: Unitree, THU RDT-1B, and the open community

This is post 3 of the VLA + WBC repo landscape series. It takes a deep dive into repos from China (Unitree Robotics, Tsinghua University, RobotEra) and into open community benchmarks.

The biggest difference from the US group is an immediate focus on hardware deployment, because these companies build both the robots and the software. The Unitree G1/H1 is the real-world platform that many labs worldwide use, so Unitree's repos have large user bases of people running them on physical robots.

Unitree Robotics: full stack from sim to deploy

Unitree maintains three key repos, each solving a different stage of the pipeline.

unitree_rl_gym (~3.3k stars)

Repo: unitreerobotics/unitree_rl_gym

What it is: An RL training environment in Isaac Gym for Unitree robots (Go1, Go2, B2, H1, G1), with baseline locomotion policies for each platform.

This is where to start when you want to:

Train a locomotion policy from scratch for a Unitree robot
Customize the gait (speed, terrain, style)
Transfer a policy from sim to real (sim2real)

Pipeline:

Isaac Gym (sim) → PPO training → policy checkpoint
    ↓
Export to ONNX or TorchScript
    ↓
Deploy on robot (onboard computer)
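The "PPO training" stage optimizes PPO's clipped surrogate objective. legged_gym-based repos usually delegate this to an RL library (rsl_rl) working on PyTorch tensors. The dependency-free sketch below only shows the math for one batch; the function name, signature, and the 0.2 clip range are illustrative assumptions, not code taken from unitree_rl_gym.

```python
import math


def ppo_clipped_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss (to minimize) for one batch.

    All three inputs are equal-length, non-empty sequences of floats.
    clip_eps=0.2 is a common default, assumed here.
    """
    if not advantages or not (len(log_probs_new) == len(log_probs_old) == len(advantages)):
        raise ValueError("inputs must be non-empty and have the same length")
    total = 0.0
    for lp_new, lp_old, adv in zip(log_probs_new, log_probs_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # PPO maximizes the surrogate, so the loss is its negated mean.
    return -total / len(advantages)


# Identical old/new policies give ratio 1, so the loss is just -mean(advantage).
print(ppo_clipped_loss([0.0, 0.0], [0.0, 0.0], [1.0, 3.0]))  # -2.0
```

The clipping is what makes PPO stable enough for thousands of parallel simulated robots: once the probability ratio leaves [0.8, 1.2] in the direction that would increase the objective, the gradient through that sample vanishes.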

Terrain curriculum in the repo:

Flat ground (basic)
Slope, stairs, rough terrain
Discrete obstacles
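Curricula like this are usually driven by a simple per-robot promote/demote rule. The sketch below follows the game-inspired rule popularized by legged_gym, which unitree_rl_gym builds on: a robot that walks far enough moves to a harder terrain level, and a robot that covers less than half its commanded distance moves down. The function name, arguments, and thresholds are assumptions; the exact values in unitree_rl_gym may differ.

```python
import random


def update_terrain_level(level, distance_walked, commanded_speed, episode_length_s,
                         terrain_length, max_level, rng=random):
    """Next terrain difficulty level for one robot after an episode (sketch).

    Promote if the robot crossed more than half of its terrain tile; demote if it
    covered less than half the distance its velocity command asked for.
    """
    move_up = distance_walked > terrain_length / 2
    move_down = (not move_up) and distance_walked < commanded_speed * episode_length_s * 0.5
    level += int(move_up) - int(move_down)
    if level >= max_level:
        # Robots that clear the hardest level are sent to a random level,
        # so the population stays spread across all terrains.
        return rng.randrange(max_level)
    return max(level, 0)


# Terrain rows are ordered easy -> hard (flat, slopes/stairs/rough, obstacles).
print(update_terrain_level(3, distance_walked=5.0, commanded_speed=1.0,
                           episode_length_s=20.0, terrain_length=8.0, max_level=10))  # 4
```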

Getting started:

git clone https://github.com/unitreerobotics/unitree_rl_gym.git
cd unitree_rl_gym
pip install -r requirements.txt

# Train H1 locomotion
python legged_gym/scripts/train.py \
  --task=h1 --run_name=baseline_h1

# Play (visualize in sim)
python legged_gym/scripts/play.py \
  --task=h1 --run_name=baseline_h1

Note: Requires NVIDIA Isaac Gym, which you can download after registering for an NVIDIA developer account.

xr_teleoperate (~1.5k stars)

Repo: unitreerobotics/xr_teleoperate

What it is: Data collection and teleoperation for Unitree G1/H1 using Apple Vision Pro, Meta Quest 3, or dexterous gloves. It fills the same role as TeleVision (Open-TeleVision, from UC San Diego and MIT) but is optimized for Unitree hardware.
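Any XR teleop stack has to re-express headset hand and wrist poses in the robot's frame before it can command the arms. Here is a minimal sketch, assuming OpenXR's world convention (+X right, +Y up, +Z toward the user) and a ROS-style robot frame (+X forward, +Y left, +Z up). A stack like xr_teleoperate needs a conversion of this kind, but the matrix and function names below are hypothetical and are not its API.

```python
# Basis change from OpenXR world axes (+X right, +Y up, +Z toward the user)
# to a ROS-style robot frame (+X forward, +Y left, +Z up). Hypothetical name.
T_ROBOT_FROM_XR = [
    [0.0, 0.0, -1.0],  # robot x (forward) = -xr z
    [-1.0, 0.0, 0.0],  # robot y (left)    = -xr x
    [0.0, 1.0, 0.0],   # robot z (up)      =  xr y
]


def matmul(a, b):
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def xr_pose_to_robot(position, rotation):
    """Re-express a pose (3-vector, 3x3 rotation matrix) from OpenXR axes in robot axes."""
    p = matvec(T_ROBOT_FROM_XR, position)
    # Change of basis for a rotation: R_robot = T R_xr T^T.
    r = matmul(matmul(T_ROBOT_FROM_XR, rotation), transpose(T_ROBOT_FROM_XR))
    return p, r


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# A hand 0.5 m in front of the user (-Z in OpenXR) ends up 0.5 m along robot +X.
print(xr_pose_to_robot([0.0, 0.0, -0.5], identity)[0])
```

A yaw of the hand about the headset's up axis (+Y) comes out as a yaw about the robot's up axis (+Z), which is a quick sanity check for any such matrix.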

Differences from TeleVision:

Native SDK integration with Unitree G1/H1 (no ROS2 bridge needed)
Supports dexterous gloves to capture finger motion when the robot has dexterous hands
Lower latency, since commands go directly through the SDK

Supported devices:

Device	Tracking coverage	Approx. cost (USD)
Apple Vision Pro	Head + hands, including finger joints (vision-based)	~$3,500
Meta Quest 3	Hands + body (limited)	~$500
Dexterous Gloves (e.g., Manus)	Full finger data	~$5,000+

Data collection workflow:

1. Connect the HMD or gloves to the xr_teleoperate daemon
2. Teleoperate the robot through the task while watching its camera stream
3. Data is recorded automatically in the format:
   {joint_positions, end_effector_poses, camera_frames, timestamps}
4. Convert the recordings to LeRobot format for training
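Step 4 mostly means reshaping each recorded timestep into per-frame columns. LeRobot datasets use keys such as observation.state, action, observation.images.<camera>, timestamp, frame_index, and episode_index. The converter below is a hypothetical sketch: TeleopFrame and to_lerobot_rows are made-up names, and taking the next frame's joint positions as the action is an assumption. A real conversion should write through LeRobot's own dataset tooling.

```python
from dataclasses import dataclass, field


@dataclass
class TeleopFrame:
    """One recorded timestep with the fields listed above (hypothetical class)."""
    timestamp: float
    joint_positions: list
    end_effector_poses: list
    camera_frames: dict = field(default_factory=dict)  # camera name -> image or path


def to_lerobot_rows(frames, episode_index, task_index=0):
    """Flatten one episode into per-frame dicts with LeRobot-style keys.

    Assumption: the action at step i is the joint position reached at step i+1,
    so the last frame (which has no successor) is dropped. End-effector poses
    are ignored here; they could be added as an extra observation key.
    """
    if not frames:
        return []
    t0 = frames[0].timestamp
    rows = []
    for i in range(len(frames) - 1):
        cur, nxt = frames[i], frames[i + 1]
        row = {
            "observation.state": list(cur.joint_positions),
            "action": list(nxt.joint_positions),
            "timestamp": cur.timestamp - t0,  # seconds since episode start
            "frame_index": i,
            "episode_index": episode_index,
            "task_index": task_index,
        }
        for cam, image in cur.camera_frames.items():
            row[f"observation.images.{cam}"] = image
        rows.append(row)
    return rows
```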

unifolm-vla (~477 stars)

Repo: unitreerobotics/unifolm-vla

What it is: A VLA model for Unitree humanoids (G1, H1) with a pretrain + fine-tune pipeline. "Unifolm" stands for Unified Foundation Model for Loco-Manipulation.

Architecture:

Backbone: InternVL2 (CASIA + Shanghai AI Lab)
Action head: diffusion policy (continuous)
Frequency: ~10Hz (upper body), 200Hz (locomotion)
Control: upper body end-effector + locomotion velocity command

Important design point: unifolm-vla outputs only high-level commands, namely desired wrist poses and a locomotion velocity command. A low-level controller running at 200Hz turns these into joint commands and keeps the robot balanced. The architecture mirrors NVIDIA's GR00T-WBC but is built for Unitree hardware.
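The split between a ~10Hz VLA and a 200Hz controller can be pictured as one clock: the slow policy refreshes a command every 20 ticks, and the fast controller runs on every tick using the latest command. This sequential sketch only illustrates the timing. On the robot the two run concurrently (separate threads or processes), and the function names here are hypothetical.

```python
def run_two_rate_loop(vla_policy, low_level_controller, duration_s=1.0,
                      vla_hz=10, control_hz=200):
    """Simulated timing of a slow VLA feeding a fast whole-body controller.

    vla_policy(t) returns a high-level command (e.g. wrist poses + base velocity).
    low_level_controller(command, t) returns joint commands; it runs every tick
    and always uses the most recent high-level command (zero-order hold).
    """
    if control_hz % vla_hz != 0:
        raise ValueError("control_hz must be an integer multiple of vla_hz")
    ticks_per_vla = control_hz // vla_hz  # 200 / 10 = 20 controller ticks per VLA step
    command = None
    joint_commands = []
    for tick in range(int(duration_s * control_hz)):
        t = tick / control_hz
        if tick % ticks_per_vla == 0:
            command = vla_policy(t)  # slow path: refresh the high-level target
        joint_commands.append(low_level_controller(command, t))  # fast path
    return joint_commands
```

Over one simulated second the VLA is queried 10 times and the controller 200 times. Because the controller never waits for the VLA, balance is maintained even when the VLA's inference is slow.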

How to use:

git clone https://github.com/unitreerobotics/unifolm-vla.git
cd unifolm-vla

# Fine-tune with data from xr_teleoperate
python train.py \
  --model_name unifolm_base \
  --dataset_path /path/to/lerobot_dataset \
  --robot_type g1 \
  --output_dir ./finetuned

# Deploy (requires Unitree G1 or H1 + onboard compute)
python deploy.py \
  --checkpoint ./finetuned/checkpoint_best.pth \
  --robot_type g1

Note: The pretrained checkpoint was not yet public at the time of writing. The repo includes code for training from scratch or fine-tuning from an InternVL2 checkpoint.

The complete Unitree pipeline

Together, the three Unitree repos form an end-to-end pipeline:

[unitree_rl_gym]          → locomotion baseline (sim)
        ↓
[xr_teleoperate]           → collect manipulation data (real)
        ↓
[unifolm-vla]              → VLA fine-tune + deploy

If you have a Unitree G1 or H1, this is the shortest path from zero to a working whole-body VLA policy. No cross-embodiment adaptation is needed, unlike openpi or GR00T, which are designed to support many robot types.

Tsinghua University (THU): RDT-1B (~1.7k stars)

Repo: thu-ml/RoboticsDiffusionTransformer

What it is: RDT-1B, a foundation model for bimanual manipulation built on a diffusion transformer. It is currently one of the strongest bimanual models (LIBERO, RLBench benchmarks).

Architecture:

Backbone: DiT (Diffusion Transformer, 1B params)
Inputs: exterior camera + wrist cameras + language instruction
Action: diffusion denoising → 7-DoF joint actions (×2 arms)
Frequency: ~25Hz
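To make "diffusion denoising → actions" concrete: the model starts from Gaussian noise and repeatedly removes predicted noise until a clean action vector remains. The sketch below is a generic DDPM ancestral sampler over a 14-dim action (7 DoF × 2 arms). An oracle noise predictor stands in for the 1B-parameter transformer, which would also condition on images and language. The linear schedule, step count, and names are assumptions; RDT-1B's actual scheduler and sampler may differ.

```python
import math
import random


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.2):
    """Linear beta schedule; beta_end is chosen so alpha_bar is near 0 at the last step."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1) for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def ddpm_sample(predict_noise, action_dim, num_steps=50, seed=0):
    """DDPM ancestral sampling from pure noise to an action vector.

    predict_noise(x_t, t) stands in for the diffusion transformer.
    """
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]  # x_T ~ N(0, I)
    for t in reversed(range(num_steps)):
        eps = predict_noise(x, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # add fresh noise except at the final step
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x


# Oracle noise predictor that "knows" the clean action, standing in for the model.
target = [0.1 * i for i in range(14)]  # 7 DoF x 2 arms
_, _, ALPHA_BARS = make_schedule()


def oracle(x_t, t):
    return [(xi - math.sqrt(ALPHA_BARS[t]) * ai) / math.sqrt(1.0 - ALPHA_BARS[t])
            for xi, ai in zip(x_t, target)]


action = ddpm_sample(oracle, action_dim=14)
print(max(abs(a - b) for a, b in zip(action, target)))  # close to 0
```

With a perfect noise predictor the last step recovers the clean action exactly. In practice, sample quality depends entirely on how well the trained network approximates that predictor.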

How it differs from openpi/GR00T: RDT-1B is more narrowly focused, covering bimanual manipulation only, with no locomotion. Within that domain, however, it outperforms many foundation models, especially on tasks that require precise two-arm coordination.

Training datasets:

Bridge V2, DROID, Open X-Embodiment
LIBERO benchmark suite
Custom bimanual teleoperation data (ALOHA platform)

Getting started:

git clone https://github.com/thu-ml/RoboticsDiffusionTransformer.git
cd RoboticsDiffusionTransformer
pip install -r requirements.txt

# Download pretrained checkpoint
python scripts/download_model.py --model rdt-1b

# Inference
python inference.py \
  --model_path checkpoints/rdt-1b \
  --image path/to/obs.jpg \
  --instruction "fold the cloth in half"

Fine-tuning: the repo includes a fine-tuning script for custom data in HDF5 or RLDS format.

Paper: RDT-1B: A Diffusion Foundation Model for Bimanual Manipulation (2024)

RobotEra: humanoid-gym (~2k stars)

Repo: roboterax/humanoid-gym

What it is: An RL training framework for humanoids, based on legged_gym like unitree_rl_gym, but more focused on sim2real transfer for RobotEra's XBot-L robot.

Key differences:

XBot-L specific (supports fewer robots than unitree_rl_gym)
Heavier focus on closing the sim2real gap: domain randomization and actuator modeling
Detailed tutorials on tuning for the real robot
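Domain randomization and actuator modeling come down to two pieces. First, resample physics and actuator parameters at every episode reset. Second, push policy outputs through a realistic motor model instead of setting joint positions directly. Here is a minimal sketch; the parameter names and ranges are hypothetical placeholders, not humanoid-gym's configuration values.

```python
import random

# Hypothetical per-episode randomization ranges (placeholders, not humanoid-gym's values).
RANDOMIZATION_RANGES = {
    "friction": (0.2, 1.3),
    "added_base_mass_kg": (-1.0, 3.0),
    "motor_strength_scale": (0.9, 1.1),
    "kp_scale": (0.9, 1.1),
    "kd_scale": (0.9, 1.1),
}


def sample_episode_params(rng):
    """Draw one set of physics/actuator parameters at episode reset."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in RANDOMIZATION_RANGES.items()}


def pd_torque(q_target, q, dq, kp, kd, params, torque_limit):
    """Actuator model for one joint: randomized PD gains and motor strength, then saturation."""
    tau = params["motor_strength_scale"] * (
        params["kp_scale"] * kp * (q_target - q) - params["kd_scale"] * kd * dq
    )
    return max(-torque_limit, min(torque_limit, tau))


rng = random.Random(0)
params = sample_episode_params(rng)  # resample at every episode reset
print(pd_torque(q_target=0.3, q=0.0, dq=0.5, kp=100.0, kd=2.0, params=params, torque_limit=60.0))
```

Friction and added mass would be written into the simulator's rigid-body properties at reset, while the gain and strength scales act inside the torque computation on every physics step. A policy that works across the whole range is less likely to overfit to one exact simulator.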

Use when: you want to learn sim2real transfer techniques or are using XBot-L.

Benchmark: humanoid-bench (~772 stars)

Repo: carlosferrazza/humanoid-bench

What it is: A benchmark suite for humanoid robot tasks, providing a standard way to evaluate performance and compare methods.

Tasks in benchmark:

Stand, walk, run (locomotion only)
Reach, grasp, place (manipulation only)
Walk-then-grasp (loco-manipulation)
Door open, drawer open (long-horizon)

Available humanoid models: Unitree H1, G1, Agility Digit, custom models.

How to use:

git clone https://github.com/carlosferrazza/humanoid-bench.git
cd humanoid-bench
pip install -e .

# Evaluate policy
python evaluate.py \
  --task "walk_and_grasp" \
  --policy path/to/your/policy \
  --num_episodes 100

Important: if you train a new policy, run humanoid-bench to get numbers you can compare directly against published papers. It is very hard to publish without benchmark results.
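Whatever the exact benchmark CLI, reporting comes down to rolling out the policy for many episodes and aggregating returns and success. Here is a minimal sketch, assuming the tasks are exposed through the Gymnasium API (reset() returns (obs, info), step() returns a 5-tuple). Reading success from info["success"] is an assumption, and ConstantRewardEnv is a toy stand-in for a real humanoid-bench task.

```python
import statistics


def evaluate(env, policy, num_episodes=100, max_steps=1000):
    """Roll out `policy` and report mean/std episode return and success rate.

    `env` follows the Gymnasium API: reset(seed=...) -> (obs, info) and
    step(action) -> (obs, reward, terminated, truncated, info).
    Reading success from info["success"] is an assumption; benchmarks differ.
    """
    returns, successes = [], 0
    for ep in range(num_episodes):
        obs, info = env.reset(seed=ep)  # fixed seeds make runs reproducible
        ep_return, success = 0.0, False
        for _ in range(max_steps):
            obs, reward, terminated, truncated, info = env.step(policy(obs))
            ep_return += reward
            success = success or bool(info.get("success", False))
            if terminated or truncated:
                break
        returns.append(ep_return)
        successes += int(success)
    std = statistics.stdev(returns) if len(returns) > 1 else 0.0
    return {"mean_return": statistics.mean(returns), "std_return": std,
            "success_rate": successes / num_episodes}


class ConstantRewardEnv:
    """Toy stand-in: reward 1 per step, ends (successfully) after 10 steps."""

    def reset(self, seed=None):
        self.t = 0
        return 0.0, {}

    def step(self, action):
        self.t += 1
        done = self.t >= 10
        return float(self.t), 1.0, done, False, {"success": done}


print(evaluate(ConstantRewardEnv(), policy=lambda obs: 0.0, num_episodes=5))
```

Report the standard deviation and the number of episodes alongside the mean, so that differences between methods can be judged against run-to-run variance.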

Papers without repos: LeVERB and WoCoCo

Some important works currently have no public repo:

LeVERB

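What: LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction
When: Paper April 2026
Approach: A vision-language policy emits latent action "verbs" that an RL-trained whole-body controller executes, trained on synthetic demonstrations rendered in simulation (no real robot demos needed)
Status: No public code yet at the time of writing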

WoCoCo

What: Learning Whole-Body Humanoid Control with Sequential Contacts (CoRL 2024)
Where: CMU and ETH Zurich
Approach: RL with contact sequence planning
Status: No public repo

ExBody2 (UC San Diego)

What: Expressive humanoid whole-body control (imitating full-body human motion)
Status: Paper + code in preparation for release

Summary comparison — Chinese group + open benchmarks

Repo	Problem solved	Entry barrier	Notes
unitree_rl_gym	Locomotion sim	Low	Needs Isaac Gym
xr_teleoperate	Data collection	Medium	Needs HMD or gloves
unifolm-vla	VLA for Unitree	High	Needs Unitree G1/H1
RDT-1B	Bimanual VLA	Low	No humanoid required
humanoid-gym	Sim training	Low	XBot-L focused
humanoid-bench	Evaluation	Low	No real robot needed

US vs China comparison

US group (NVIDIA, Physical Intelligence, Berkeley) excels at:

Foundation model quality and generalization
Research novelty (EgoHumanoid, HumanPlus)
Open culture (paper + code + dataset released together)

Chinese group (Unitree, THU) excels at:

Hardware-software co-design (Unitree builds both robot and software)
Cost-effective hardware (the G1 is roughly 5-10x cheaper than the Fourier GR-1 or Figure's humanoids)
Large-scale deployment (Unitree ships more legged robots than almost any other company)

In practice, many US labs use the Unitree G1/H1 as their platform and fine-tune openpi or GR00T on top of it, combining Chinese hardware with US software.

Series wrap-up

Across these three posts, you now have the complete picture:

Post 1: Taxonomy, and how to choose a starting point based on your hardware
Post 2: US group — strong foundation models, research-first
This post: Chinese group — deployment-first, hardware-integrated stack

Final recommendation, if you want to start immediately on a realistic budget:

Hardware: Unitree G1 (one of the most affordable full humanoids)
Data collection: xr_teleoperate with Meta Quest 3
VLA: fine-tune openpi (best generalization) or unifolm-vla (Unitree-native)
WBC: GR00T-WholeBodyControl or unitree_rl_gym locomotion + custom arm control
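The "unitree_rl_gym locomotion + custom arm control" option means merging two command streams into a single joint-target vector. Here is a minimal sketch, assuming a hypothetical 29-joint layout (12 leg joints first, then 17 waist/arm joints). The arm targets are rate-limited so a new arm command cannot make the upper body jump. Check your robot's actual joint ordering before reusing anything like this.

```python
# Hypothetical 29-joint layout: 12 leg joints first, then 17 waist + arm joints.
NUM_LEG_JOINTS = 12
NUM_UPPER_JOINTS = 17


def compose_joint_targets(leg_targets, arm_targets, prev_arm_targets, max_arm_step=0.02):
    """Merge RL locomotion (legs) and a separate arm controller into one target vector.

    Arm targets are rate-limited to max_arm_step radians per control tick so a new
    arm command cannot jerk the upper body; leg targets pass through unchanged.
    """
    if (len(leg_targets) != NUM_LEG_JOINTS or len(arm_targets) != NUM_UPPER_JOINTS
            or len(prev_arm_targets) != NUM_UPPER_JOINTS):
        raise ValueError("unexpected joint count")
    limited = [prev + max(-max_arm_step, min(max_arm_step, target - prev))
               for target, prev in zip(arm_targets, prev_arm_targets)]
    return list(leg_targets) + limited
```

Call it once per control tick, feeding the arm slice of the previous output back in as prev_arm_targets. The legs stay under the RL policy's full authority, since the locomotion policy was trained to react to them directly.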

Nguyễn Anh Tuấn

Related Posts

VLA + WBC repo map 2025-2026: an overview of humanoid GitHub repos

VLA + WBC repos from the US: NVIDIA GR00T, openpi, HumanPlus, TeleVision

unifolm-vla + Unitree G1 (Part 5): deploying the inference server, SSH tunnel, and parallel locomotion
