Tags: humanoid, vla, whole-body, unitree, rdt-1b, research, china, open-source

VLA + WBC repos from China: Unitree, THU RDT-1B, and the open community

Deep dive into VLA and Whole-Body Control GitHub repos from China and the open community — Unitree (unifolm-vla, xr_teleoperate, unitree_rl_gym), THU RDT-1B, RobotEra humanoid-gym, and benchmark tools.

Nguyễn Anh Tuấn · June 6, 2026 · 8 min read

This is post 3 of the VLA + WBC repos landscape series. It takes a deep dive into repos from China (Unitree Robotics, Tsinghua University, RobotEra) and into open benchmarks.

The biggest difference from the US group is an immediate focus on hardware deployment, because companies such as Unitree and RobotEra build both the hardware and the software. The Unitree G1/H1 is the real-world platform many labs worldwide use, so these repos have large user bases running them on real robots.

Unitree Robotics: full stack from sim to deploy

Unitree has 3 important repos solving 3 different problems in the pipeline.

unitree_rl_gym (~3.3k stars)

Repo: unitreerobotics/unitree_rl_gym

What it is: an RL training environment in Isaac Gym for Unitree robots (including Go2, H1 and G1), with baseline locomotion policies for each supported platform.

This is where to start when you want to:

  • Train a locomotion policy from scratch for a Unitree robot
  • Customize gait (speed, terrain, style)
  • Transfer from sim to real (sim2real)

Pipeline:

Isaac Gym (sim) → PPO training → policy checkpoint
    ↓
Export to ONNX or TorchScript
    ↓
Deploy on robot (onboard computer)
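The "PPO training" stage optimizes PPO's clipped surrogate objective. The sketch below computes that loss for a batch of logged samples so you can see what the clip does. It is a minimal plain-Python illustration, not the rsl_rl PPO code that legged_gym-style repos actually train with, and it omits the value-loss and entropy terms that a full PPO update also includes. `clip_eps=0.2` is simply the common default.

```python
import math
from typing import Sequence


def ppo_clip_loss(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Negative PPO clipped surrogate objective, averaged over a batch.

    ratio     = pi_new(a|s) / pi_old(a|s) = exp(logp_new - logp_old)
    objective = min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)
    Gradient descent minimizes the negative of the objective.
    """
    n = len(advantages)
    if n == 0 or len(logp_new) != n or len(logp_old) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return -total / n


# Usage: the action's probability doubled (ratio 2.0) and its advantage is +1,
# but the clip caps the credited improvement at 1 + eps = 1.2, so the loss is -1.2.
print(ppo_clip_loss([math.log(0.5)], [math.log(0.25)], [1.0]))
```

The clip stops one batch from pushing the policy too far from the policy that collected the data. This matters when thousands of parallel simulated robots produce large, correlated batches.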

Terrain curriculum in the repo:

  • Flat ground (basic)
  • Slope, stairs, rough terrain
  • Discrete obstacles
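A terrain curriculum decides, per robot and at the end of each episode, whether to move it to harder or easier terrain. The sketch below is modeled on the rule in legged_gym (the framework unitree_rl_gym builds on), simplified. The level count and tile length are hypothetical values.

```python
from dataclasses import dataclass


@dataclass
class TerrainCurriculum:
    """Per-robot terrain difficulty level, updated at the end of each episode.

    Simplified legged_gym-style rule: move up if the robot crossed more than
    half a terrain tile; move down if it covered less than half the distance
    its velocity command asked for. (legged_gym re-assigns robots that clear
    the top level to a random level; this sketch just clamps.)
    """

    num_levels: int = 10        # hypothetical: level 0 = flat, higher = rougher/steeper
    tile_length_m: float = 8.0  # hypothetical terrain tile length in metres
    level: int = 0

    def update(self, distance_m: float, cmd_speed_mps: float, episode_s: float) -> int:
        move_up = distance_m > self.tile_length_m / 2
        move_down = (not move_up) and distance_m < 0.5 * cmd_speed_mps * episode_s
        if move_up:
            self.level = min(self.level + 1, self.num_levels - 1)
        elif move_down:
            self.level = max(self.level - 1, 0)
        return self.level


robot = TerrainCurriculum()
robot.update(distance_m=5.0, cmd_speed_mps=1.0, episode_s=20.0)  # 5 m > 4 m: promoted to level 1
robot.update(distance_m=3.0, cmd_speed_mps=1.0, episode_s=20.0)  # 3 m < 10 m: demoted to level 0
```

Tying promotion to distance walked means each robot keeps training at the edge of its ability. No single global schedule is needed.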

Getting started:

git clone https://github.com/unitreerobotics/unitree_rl_gym.git
cd unitree_rl_gym
# Isaac Gym and rsl_rl must be installed first (see the repo's setup docs)
pip install -e .

# Train H1 locomotion
python legged_gym/scripts/train.py \
  --task=h1 --run_name=baseline_h1

# Play (visualize in sim); by default this loads the most recent run for the task
# and exports the policy as TorchScript (JIT)
python legged_gym/scripts/play.py --task=h1

Note: Requires Isaac Gym from NVIDIA (register an NVIDIA developer account to download it).


xr_teleoperate (~1.5k stars)

Repo: unitreerobotics/xr_teleoperate

What it is: Data collection and teleoperation for Unitree G1/H1 using Apple Vision Pro, Meta Quest 3, or dexterous gloves. Comparable to TeleVision (UC San Diego and MIT) but optimized for Unitree hardware.

Differences from TeleVision:

  • Native SDK integration with Unitree G1/H1 (no ROS2 bridge needed)
  • Supports dexterous gloves → captures finger motion if the robot has dexterous hands
  • Lower latency thanks to the direct SDK path

Supported devices:

Device | Coverage | Approx. cost
Apple Vision Pro | Head + hands (camera-based finger tracking) | ~$3,500
Meta Quest 3 | Hands + body (limited) | ~$500
Dexterous gloves (e.g., Manus) | Full finger data | ~$5,000+

Data collection workflow:

1. Connect the HMD/gloves → xr_teleoperate daemon
2. Teleoperate the robot through the task (watching the video stream while controlling)
3. Data is recorded automatically with the fields:
   {joint_positions, end_effector_poses, camera_frames, timestamps}
4. Convert to LeRobot format for training
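Step 4 is mostly a matter of mapping fields. The sketch below makes three assumptions:

- Each recorded frame is a dict with the four fields listed above. This layout is hypothetical, not xr_teleoperate's actual file format.
- It adopts one common convention, action[t] = joint_positions[t+1]. Your setup may record commanded targets directly instead.
- Output keys follow LeRobot's column naming (`observation.state`, `action`, `observation.images.*`, `timestamp`, `frame_index`, `episode_index`).

Writing the actual dataset files is left to LeRobot's own tooling.

```python
from typing import Any, Dict, List


def to_lerobot_frames(
    episode: List[Dict[str, Any]], episode_index: int, camera_name: str = "head"
) -> List[Dict[str, Any]]:
    """Map raw teleop frames to LeRobot-style per-frame records (field mapping only).

    Assumed input per frame (hypothetical layout):
      {"joint_positions": [...], "end_effector_poses": [...],
       "camera_frames": "<image path>", "timestamps": float}
    Assumed convention: action[t] = joint_positions[t + 1], so the last frame
    (which has no successor) is dropped.
    """
    if len(episode) < 2:
        return []
    t0 = episode[0]["timestamps"]
    records = []
    for i in range(len(episode) - 1):
        cur, nxt = episode[i], episode[i + 1]
        records.append({
            "observation.state": list(cur["joint_positions"]) + list(cur["end_effector_poses"]),
            "observation.images." + camera_name: cur["camera_frames"],
            "action": list(nxt["joint_positions"]),
            "timestamp": cur["timestamps"] - t0,  # seconds since episode start
            "frame_index": i,
            "episode_index": episode_index,
        })
    return records


# Usage with three synthetic frames recorded at 20 Hz.
frames = [
    {"joint_positions": [0.1 * i, -0.1 * i], "end_effector_poses": [0.3, 0.0, 0.9],
     "camera_frames": f"ep0/cam_{i:04d}.jpg", "timestamps": 10.0 + 0.05 * i}
    for i in range(3)
]
records = to_lerobot_frames(frames, episode_index=0)
```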

unifolm-vla (~477 stars)

Repo: unitreerobotics/unifolm-vla

What it is: VLA model for Unitree humanoids (G1, H1) — pretrain + fine-tune pipeline. "Unifolm" = Unified Foundation Model for Loco-Manipulation.

Architecture:

Backbone: InternVL2 (OpenGVLab, Shanghai AI Lab)
Action head: diffusion policy (continuous)
Frequency: ~10Hz for VLA outputs; the low-level controller runs at 200Hz
Control: upper body end-effector + locomotion velocity command

Important design point: unifolm-vla only outputs high-level commands (desired wrist poses and a velocity command for locomotion). A low-level controller running at 200Hz converts these into joint commands and keeps the robot balanced. The architecture mirrors NVIDIA's GR00T-WBC but is built for Unitree hardware.
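The split between a slow VLA and a fast controller comes down to a two-rate loop. Each high-level command is held for several controller ticks: at 10Hz and 200Hz that is 200 / 10 = 20 ticks. The sketch below assumes a zero-order hold between VLA outputs. `vla_policy`, `low_level_step` and `HighLevelCommand` are hypothetical stand-ins, not unifolm-vla's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class HighLevelCommand:
    wrist_pose: Tuple[float, ...]              # placeholder pose encoding, e.g. xyz + quaternion
    base_velocity: Tuple[float, float, float]  # (vx, vy, yaw_rate) for the locomotion controller


def run_hierarchy(
    vla_policy: Callable[[], HighLevelCommand],
    low_level_step: Callable[[HighLevelCommand], None],
    duration_s: float,
    vla_hz: int = 10,
    control_hz: int = 200,
) -> List[int]:
    """Two-rate control loop; returns the controller ticks at which the VLA was queried.

    The slow VLA's latest command is held constant (zero-order hold) for
    control_hz // vla_hz fast controller ticks.
    """
    if vla_hz <= 0 or control_hz % vla_hz != 0:
        raise ValueError("control_hz must be a positive multiple of vla_hz")
    ticks_per_cmd = control_hz // vla_hz  # 200 // 10 = 20 with the defaults
    queried: List[int] = []
    cmd: Optional[HighLevelCommand] = None
    for tick in range(round(duration_s * control_hz)):
        if tick % ticks_per_cmd == 0:  # always true at tick 0, so cmd is set before use
            cmd = vla_policy()         # slow path: cameras + language -> wrist poses + velocity
            queried.append(tick)
        low_level_step(cmd)            # fast path: joint targets + balance at control_hz
    return queried


# Usage with stand-in callables (a real system would read sensors and drive motors).
applied: List[HighLevelCommand] = []
walk_forward = HighLevelCommand(wrist_pose=(0.3, 0.2, 1.0, 1.0, 0.0, 0.0, 0.0),
                                base_velocity=(0.5, 0.0, 0.0))
ticks = run_hierarchy(lambda: walk_forward, applied.append, duration_s=1.0)
```

With this decoupling, a slow VLA call never stalls balance control. The 200Hz loop keeps stabilizing the robot with the last command it received.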

How to use:

git clone https://github.com/unitreerobotics/unifolm-vla.git
cd unifolm-vla

# Fine-tune with data from xr_teleoperate
python train.py \
  --model_name unifolm_base \
  --dataset_path /path/to/lerobot_dataset \
  --robot_type g1 \
  --output_dir ./finetuned

# Deploy (requires Unitree G1 or H1 + onboard compute)
python deploy.py \
  --checkpoint ./finetuned/checkpoint_best.pth \
  --robot_type g1

Note: A pretrained checkpoint was not yet public at the time of writing. The repo has code for training from scratch or fine-tuning from an InternVL2 checkpoint.


The complete Unitree pipeline

The three Unitree repos form an end-to-end pipeline:

[unitree_rl_gym]          → locomotion baseline (sim)
        ↓
[xr_teleoperate]           → collect manipulation data (real)
        ↓
[unifolm-vla]              → VLA fine-tune + deploy

If you have a Unitree G1 or H1, this is the shortest path from zero to a working whole-body VLA policy. Unlike openpi or GR00T, which are designed for many robot embodiments, no cross-embodiment fine-tuning is needed.


Tsinghua University (THU): RDT-1B (~1.7k stars)

Repo: thu-ml/RoboticsDiffusionTransformer

What it is: RDT-1B, a foundation model for bimanual manipulation built on a diffusion transformer. It is currently one of the strongest open bimanual models; the paper's headline results come from real-robot bimanual tasks.

Architecture:

Backbone: DiT (Diffusion Transformer, 1B params)
Inputs: exterior camera + wrist cameras, language instruction, proprioception
Action: diffusion denoising → 7-DoF joint actions (×2 arms)
Frequency: ~25Hz
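Both RDT-1B and unifolm-vla's action head generate actions by diffusion. They start from Gaussian noise and iteratively denoise it into an action, conditioned on the observation. The sketch below shows the textbook DDPM reverse loop over one 14-D action vector (7 per arm × 2 arms). It is not RDT-1B's scheduler or network. The trained denoiser is replaced by a hypothetical oracle (`make_oracle`) that knows the target action, so the loop lands on that target and you can check the mechanics.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def linear_betas(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    if num_steps < 2:
        raise ValueError("num_steps must be >= 2")
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def ddpm_sample(
    eps_model: Callable[[Vector, int, List[float]], Vector],
    dim: int,
    num_steps: int = 50,
    seed: int = 0,
) -> Vector:
    """Standard DDPM reverse process: x_T ~ N(0, I) -> x_0."""
    rng = random.Random(seed)
    betas = linear_betas(num_steps)
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(num_steps)):
        eps = eps_model(x, t, alpha_bars)  # predicted noise at step t
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # simple variance choice: sigma_t^2 = beta_t
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added on the final step
    return x


def make_oracle(target: Vector) -> Callable[[Vector, int, List[float]], Vector]:
    """Hypothetical stand-in for a trained network: returns the exact noise
    separating x_t from a known target action."""
    def eps_model(x: Vector, t: int, alpha_bars: List[float]) -> Vector:
        ab = alpha_bars[t]
        return [(xi - math.sqrt(ab) * ai) / math.sqrt(1.0 - ab) for xi, ai in zip(x, target)]
    return eps_model


# Usage: a 14-D bimanual action (7 values per arm); the oracle pulls noise back to it.
target_action = [0.1 * i - 0.6 for i in range(14)]
action = ddpm_sample(make_oracle(target_action), dim=14)
```

Real action heads typically denoise a whole chunk of future actions at once. Flattening the chunk into one vector works with the same loop.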

Why it differs from openpi/GR00T: RDT-1B is more narrowly focused (bimanual manipulation only, no locomotion). Within that domain, the paper reports it outperforming baselines such as OpenVLA and Octo, especially on tasks that require precise two-arm coordination.

Training datasets:

  • Bridge V2, DROID, Open X-Embodiment
  • LIBERO benchmark suite
  • Custom bimanual teleoperation data (ALOHA platform)

Getting started:

git clone https://github.com/thu-ml/RoboticsDiffusionTransformer.git
cd RoboticsDiffusionTransformer
pip install -r requirements.txt

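# Download the pretrained checkpoint from Hugging Face (the README also asks for the T5-XXL and SigLIP encoders)
huggingface-cli download robotics-diffusion-transformer/rdt-1b --local-dir checkpoints/rdt-1b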

# Inference
python inference.py \
  --model_path checkpoints/rdt-1b \
  --image path/to/obs.jpg \
  --instruction "fold the cloth in half"

Fine-tuning: the repo includes a fine-tuning script for custom data, in HDF5 or RLDS format.

Paper: RDT-1B: A Diffusion Foundation Model for Bimanual Manipulation (2024)


RobotEra: humanoid-gym (~2k stars)

Repo: roboterax/humanoid-gym

What it is: RL training framework for humanoids (based on legged_gym, similar to unitree_rl_gym) but more focused on sim2real transfer with RobotEra's XBot-L robot.

Key differences:

  • XBot-L specific (doesn't support as many robots as unitree_rl_gym)
  • Heavier focus on the sim2real gap: domain randomization, actuator modeling, and a sim2sim check (Isaac Gym → MuJoCo) before moving to hardware
  • Detailed tutorials on tuning for real robot

Use when: you want to learn sim2real transfer techniques or are using XBot-L.
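Domain randomization re-samples physical parameters every episode so the policy cannot overfit to one simulator setting. Actuator modeling makes simulated joint torques behave more like real motors. The sketch below illustrates both ideas under one assumption: a PD position-control interface, the usual setup in legged_gym-derived repos. The parameter names and ranges are hypothetical, not humanoid-gym's config.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeDynamics:
    friction: float
    added_base_mass_kg: float
    motor_strength: float    # multiplies the commanded torque
    kp_scale: float
    kd_scale: float
    action_delay_steps: int  # latency a full env would apply by queuing actions


# Hypothetical ranges for illustration only; real values are robot-specific.
RANGES = {
    "friction": (0.2, 1.3),
    "added_base_mass_kg": (-1.0, 3.0),
    "motor_strength": (0.9, 1.1),
    "kp_scale": (0.9, 1.1),
    "kd_scale": (0.9, 1.1),
}
MAX_DELAY_STEPS = 2


def sample_dynamics(rng: random.Random) -> EpisodeDynamics:
    """Draw one set of physics parameters; call at every episode reset."""
    values = {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
    return EpisodeDynamics(action_delay_steps=rng.randint(0, MAX_DELAY_STEPS), **values)


def pd_torque(q_target: float, q: float, qd: float, kp: float, kd: float,
              dyn: EpisodeDynamics, torque_limit: float) -> float:
    """Joint PD law with randomized gains and motor strength, then torque clipping."""
    tau = dyn.motor_strength * (dyn.kp_scale * kp * (q_target - q) - dyn.kd_scale * kd * qd)
    return max(-torque_limit, min(torque_limit, tau))


# Usage: new dynamics per episode, then torques computed under those dynamics.
rng = random.Random(0)
episode_dyn = sample_dynamics(rng)
tau = pd_torque(q_target=0.5, q=0.4, qd=0.0, kp=100.0, kd=2.0, dyn=episode_dyn, torque_limit=88.0)
```

The torque clip and the randomized motor strength matter as much as friction or mass. Real actuators saturate and vary unit to unit, and a policy trained only on ideal torques tends to exploit them.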


Benchmark: humanoid-bench (~772 stars)

Repo: carlosferrazza/humanoid-bench

What it is: A simulated (MuJoCo) benchmark suite for humanoid robot tasks, used as a standard way to evaluate performance and compare methods.

Tasks in benchmark:

  • Stand, walk, run (locomotion only)
  • Reach, grasp, place (manipulation only)
  • Walk-then-grasp (loco-manipulation)
  • Door open, drawer open (long-horizon)

Available humanoid models: Unitree H1, G1, Agility Digit, custom models.

How to use:

git clone https://github.com/carlosferrazza/humanoid-bench.git
cd humanoid-bench
pip install -e .

# Sanity-check that an environment loads and runs
python -m humanoid_bench.test_env --env h1hand-walk-v0

Important: if you train a new policy, run humanoid-bench to get numbers you can compare against published papers. Without benchmark numbers, a paper is very hard to publish.
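Comparable numbers require a fixed evaluation protocol: many episodes, fixed seeds, and the same summary statistics. The sketch below assumes the environment exposes the Gymnasium-style API: `reset(seed=...)` returns `(obs, info)` and `step(action)` returns `(obs, reward, terminated, truncated, info)`. The `info["success"]` key is a hypothetical convention; check what each task actually reports. `ThreeStepEnv` is a toy stand-in used only to show usage.

```python
import statistics
from typing import Any, Callable, Dict, List, Tuple


def evaluate(env: Any, policy: Callable[[Any], Any], num_episodes: int = 100,
             seed: int = 0, max_steps: int = 1000) -> Dict[str, float]:
    """Roll out `policy` for `num_episodes` and summarize episode returns."""
    if num_episodes < 1:
        raise ValueError("num_episodes must be >= 1")
    returns: List[float] = []
    successes = 0
    for ep in range(num_episodes):
        obs, info = env.reset(seed=seed + ep)  # distinct, reproducible seed per episode
        total = 0.0
        for _ in range(max_steps):
            obs, reward, terminated, truncated, info = env.step(policy(obs))
            total += float(reward)
            if terminated or truncated:
                break
        returns.append(total)
        successes += int(bool(info.get("success", False)))  # hypothetical info key
    return {
        "mean_return": statistics.fmean(returns),
        "std_return": statistics.pstdev(returns),
        "success_rate": successes / num_episodes,
    }


class ThreeStepEnv:
    """Toy stand-in env: reward 1 per step, reports success after 3 steps."""

    def reset(self, seed=None) -> Tuple[int, dict]:
        self.t = 0
        return self.t, {}

    def step(self, action) -> Tuple[int, float, bool, bool, dict]:
        self.t += 1
        done = self.t >= 3
        return self.t, 1.0, done, False, {"success": done}


print(evaluate(ThreeStepEnv(), policy=lambda obs: 0, num_episodes=5))
```

Report the mean together with the spread and the number of episodes. A single lucky rollout is not a benchmark number.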


Papers without repos: LeVERB, WoCoCo, and ExBody2

Some important works currently have no public repo:

LeVERB

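  • What: Humanoid Whole-Body Control with Latent Vision-Language Instruction (UC Berkeley and collaborators)
  • When: arXiv, June 2025
  • Approach: a vision-language policy outputs latent "verbs" that a low-level RL whole-body controller executes; trained on synthetically rendered demonstrations in simulation
  • Status: No public code at the time of writing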

WoCoCo

  • What: Learning Whole-Body Humanoid Control with Sequential Contacts (CoRL 2024)
  • Where: CMU and ETH Zurich
  • Approach: RL with each task decomposed into a sequence of contact stages
  • Status: No public repo

ExBody2 (UC San Diego)

  • What: Expressive whole-body control for humanoids, tracking diverse human motions (follow-up to ExBody)
  • Status: Paper + code in preparation for release

Summary comparison — Chinese group + open benchmarks

Repo | Problem | Entry barrier | Notes
unitree_rl_gym | Locomotion (sim) | Low | Needs Isaac Gym
xr_teleoperate | Data collection | Medium | Needs HMD or gloves
unifolm-vla | VLA for Unitree | High | Needs Unitree G1/H1
RDT-1B | Bimanual VLA | Low | No humanoid required
humanoid-gym | Sim training | Low | XBot-L focused
humanoid-bench | Evaluation | Low | No real robot needed

US vs China comparison

US group (NVIDIA, Physical Intelligence, Berkeley) excels at:

  • Foundation model quality and generalization
  • Research novelty (EgoHumanoid, HumanPlus)
  • Open culture (paper + code + dataset released together)

Chinese group (Unitree, THU) excels at:

  • Hardware-software co-design (Unitree builds both robot and software)
  • Cost-effective hardware (the G1 starts at around $16k, reportedly 5-10x cheaper than humanoids such as Fourier GR-1 or Figure's robots)
  • Large-scale deployment (Unitree is among the highest-volume makers of legged robots)

Reality: many US labs use the Unitree G1/H1 as their platform and fine-tune openpi or GR00T on top of it, combining Chinese hardware with US software.


Series wrap-up

Across these 3 posts, you have a complete picture:

  1. Post 1: Taxonomy, how to choose starting point by hardware
  2. Post 2: US group — strong foundation models, research-first
  3. This post: Chinese group — deployment-first, hardware-integrated stack

Final recommendation if you want to start immediately on a realistic budget:

  • Hardware: Unitree G1 (one of the cheapest complete humanoids available)
  • Data collection: xr_teleoperate with Meta Quest 3
  • VLA: fine-tune openpi (best generalization) or unifolm-vla (Unitree-native)
  • WBC: GR00T-WholeBodyControl or unitree_rl_gym locomotion + custom arm control




Nguyễn Anh Tuấn

Robotics & AI Engineer. Building VnRobo — sharing knowledge about robot learning, VLA models, and automation.

