VLA + WBC repos from China: Unitree, THU RDT-1B, and the open community
This is post 3 of the VLA + WBC repos landscape series. This post takes a deep dive into repos from China (Unitree Robotics, Tsinghua University, RobotEra) and open benchmarks.
The biggest difference from the US group is an immediate focus on hardware deployment, because companies like Unitree and RobotEra build both the hardware and the software. Unitree G1/H1 is the real-world platform many labs worldwide are using, so Unitree's repos have large, active user bases.
Unitree Robotics: full stack from sim to deploy
Unitree has 3 important repos solving 3 different problems in the pipeline.
unitree_rl_gym (~3.3k stars)
Repo: unitreerobotics/unitree_rl_gym
What it is: An RL training environment in Isaac Gym for Unitree robots (Go2, G1, H1, H1_2), with baseline locomotion policies for each platform.
This is where to start when you want to:
- Train a locomotion policy from scratch for a Unitree robot
- Customize the gait (speed, terrain, style)
- Transfer a policy from sim to real (sim2real)
Pipeline:
```text
Isaac Gym (sim) → PPO training → policy checkpoint
        ↓
Export to ONNX or TorchScript
        ↓
Deploy on robot (onboard computer)
```
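The "PPO training" step optimizes PPO's clipped surrogate objective. A minimal NumPy sketch of that loss for one batch of samples is shown below; it is a standalone illustration of the objective, not code taken from unitree_rl_gym or its training library, and `clip_eps=0.2` is just the commonly used default.

```python
import numpy as np


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss (to be minimized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current policy and the policy that collected the data.
    advantages: per-sample advantage estimates.
    """
    ratio = np.exp(logp_new - logp_old)                       # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the pessimistic (elementwise minimum) objective,
    # so the loss is its negative mean.
    return -np.mean(np.minimum(unclipped, clipped))


# Two samples: ratio 1.1 stays inside the clip range, ratio 2.0 is clipped to 1.2.
logp_old = np.log(np.array([0.5, 0.5]))
logp_new = np.log(np.array([0.55, 1.0]))
adv = np.array([1.0, 1.0])
print(ppo_clip_loss(logp_new, logp_old, adv))  # approximately -1.15
```

The clipping removes any incentive to push the probability ratio far from 1 in a single update, which keeps policy updates stable across thousands of parallel simulated robots.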
Terrain curriculum in the repo:
- Flat ground (basic)
- Slope, stairs, rough terrain
- Discrete obstacles
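A terrain curriculum promotes robots that handle their current terrain and demotes those that struggle. The sketch below uses a simplified rule modeled loosely on legged_gym-style curricula; the thresholds, the `terrain_length` and `max_level` values, and the function name are assumptions for illustration, not the repo's exact code.

```python
import numpy as np


def update_terrain_levels(levels, distance, cmd_speed, episode_s,
                          terrain_length=8.0, max_level=9, rng=None):
    """Update each env's terrain difficulty level after an episode.

    Simplified rule (an assumption, not unitree_rl_gym's exact logic):
    - walked more than half a terrain tile        -> harder terrain (+1)
    - covered < half of the commanded distance    -> easier terrain (-1)
    Robots pushed past max_level are resampled to a random level.
    """
    rng = np.random.default_rng() if rng is None else rng
    move_up = distance > terrain_length / 2
    move_down = (distance < cmd_speed * episode_s * 0.5) & ~move_up
    levels = levels + move_up.astype(int) - move_down.astype(int)
    graduated = levels > max_level
    levels[graduated] = rng.integers(0, max_level + 1, size=graduated.sum())
    return np.clip(levels, 0, max_level)


levels = np.array([0, 3, 9])
distance = np.array([5.0, 1.0, 6.0])      # meters walked this episode
cmd_speed = np.array([1.0, 1.0, 1.0])     # commanded speed, m/s
new_levels = update_terrain_levels(levels, distance, cmd_speed, episode_s=20.0,
                                   rng=np.random.default_rng(0))
print(new_levels)  # env 0 -> 1 (promoted), env 1 -> 2 (demoted), env 2 resampled
```

Resampling "graduated" robots instead of capping them keeps a mix of difficulties in the batch, so the policy does not forget easier terrain.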
Getting started:
```bash
git clone https://github.com/unitreerobotics/unitree_rl_gym.git
cd unitree_rl_gym
# Install Isaac Gym and rsl_rl first, then install this package
pip install -e .

# Train H1 locomotion
python legged_gym/scripts/train.py --task=h1 --run_name=baseline_h1

# Play (visualize in sim); loads the most recent run by default,
# pass --load_run=<run_dir> to pick a specific one
python legged_gym/scripts/play.py --task=h1
```
Note: Requires NVIDIA's Isaac Gym, which you can download after registering for an NVIDIA developer account.
xr_teleoperate (~1.5k stars)
Repo: unitreerobotics/xr_teleoperate
What it is: Data collection and teleoperation for Unitree G1/H1 using Apple Vision Pro, Meta Quest 3, or dexterous gloves. Similar in purpose to Open-TeleVision (UC San Diego / MIT), but optimized for Unitree hardware.
Differences from TeleVision:
- Native SDK integration with Unitree G1/H1 (no ROS2 bridge needed)
- Supports Dexterous Gloves → capture finger motion if robot has dexterous hands
- Lower latency, since commands go directly through the SDK
Supported devices:
| Device | Tracked data | Approx. cost (USD) |
|---|---|---|
| Apple Vision Pro | Head + hands, including finger joints | ~$3,500 |
| Meta Quest 3 | Head + hands, including finger joints; limited body tracking | ~$500 |
| Dexterous gloves (e.g., Manus) | Full finger data | ~$5,000+ |
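Hand tracking delivers 3D fingertip keypoints. Driving a dexterous hand needs full retargeting, but for a simple parallel gripper the common shortcut is to map thumb-index pinch distance to an opening fraction. A minimal sketch, assuming keypoints in meters; the `closed_dist` / `open_dist` thresholds are hypothetical calibration values, and this is not xr_teleoperate's retargeting code.

```python
import numpy as np


def pinch_to_gripper(thumb_tip, index_tip, closed_dist=0.02, open_dist=0.10):
    """Map thumb-index fingertip distance (meters) to a gripper command.

    Returns 0.0 (fully closed) .. 1.0 (fully open). Thresholds are
    hypothetical; calibrate them per operator and per headset.
    """
    d = np.linalg.norm(np.asarray(thumb_tip, dtype=float) -
                       np.asarray(index_tip, dtype=float))
    frac = (d - closed_dist) / (open_dist - closed_dist)
    return float(np.clip(frac, 0.0, 1.0))


print(pinch_to_gripper([0.0, 0.0, 0.0], [0.06, 0.0, 0.0]))  # about 0.5 (half open)
```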
Data collection workflow:
1. Connect the HMD or gloves to the xr_teleoperate daemon.
2. Teleoperate the robot through the task while watching its video stream.
3. Data is recorded automatically with the fields `{joint_positions, end_effector_poses, camera_frames, timestamps}`.
4. Convert the recordings to LeRobot format for training.
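Cameras (~30Hz) and joint states (hundreds of Hz) arrive at different rates, so each camera frame needs a matching robot state before conversion. A minimal sketch of nearest-timestamp alignment: the field names follow the record format listed above, while the `Frame` type, the `max_skew` tolerance, and the function name are hypothetical, and xr_teleoperate's own recorder may synchronize differently.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Frame:
    t: float        # capture timestamp in seconds
    image_id: str   # hypothetical handle to the stored image


def align_to_frames(frames, state_times, states, max_skew=0.02):
    """Pair each camera frame with the robot state closest in time.

    state_times must be sorted ascending; states[i] was recorded at
    state_times[i]. Frames with no state within max_skew seconds are dropped.
    """
    samples = []
    if not state_times:
        return samples
    for f in frames:
        i = bisect.bisect_left(state_times, f.t)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(state_times)]
        best = min(candidates, key=lambda j: abs(state_times[j] - f.t))
        if abs(state_times[best] - f.t) <= max_skew:
            samples.append({"timestamp": f.t,
                            "camera_frame": f.image_id,
                            "joint_positions": states[best]})
    return samples


state_times = [0.0, 0.01, 0.02, 0.03]
states = [[0.0], [0.1], [0.2], [0.3]]
frames = [Frame(0.012, "img0"), Frame(0.029, "img1"), Frame(0.5, "img2")]
out = align_to_frames(frames, state_times, states)
print([s["joint_positions"] for s in out])  # [[0.1], [0.3]]; img2 has no nearby state
```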
unifolm-vla (~477 stars)
Repo: unitreerobotics/unifolm-vla
What it is: A VLA model for Unitree humanoids (G1, H1) with a pretrain + fine-tune pipeline. "Unifolm" = Unified Foundation Model for Loco-Manipulation.
Architecture:
```text
Backbone:     InternVL2 (CASIA + Shanghai AI Lab)
Action head:  diffusion policy (continuous actions)
Frequency:    ~10Hz VLA output; the low-level locomotion controller runs at 200Hz
Control:      upper-body end-effector targets + locomotion velocity command
```
Important design point: unifolm-vla outputs only high-level commands, namely desired wrist poses and a velocity command for locomotion. A low-level controller running at 200Hz turns these into joint commands and keeps the robot balanced. The architecture mirrors NVIDIA's GR00T-WBC but is built for Unitree hardware.
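The 10Hz/200Hz split can be sketched as a two-rate loop in which the latest high-level command is held (zero-order hold) across 20 low-level ticks. This is a single-threaded illustration under simulated time; `vla_policy`, `low_level_step`, and `run_hierarchy` are hypothetical names, and a real deployment typically runs VLA inference asynchronously because it can take longer than one 5 ms control tick.

```python
def run_hierarchy(duration_s, vla_policy, low_level_step, vla_hz=10, low_hz=200):
    """Run a two-rate control loop in simulated time.

    vla_policy(t) -> high-level command (e.g., wrist poses + base velocity).
    low_level_step(t, cmd) -> joint targets; called on every low-level tick.
    The most recent command is held between VLA updates.
    Returns the number of VLA queries made.
    """
    assert low_hz % vla_hz == 0, "low-level rate must be a multiple of the VLA rate"
    ticks_per_cmd = low_hz // vla_hz
    cmd = None
    vla_calls = 0
    for k in range(int(duration_s * low_hz)):
        t = k / low_hz
        if k % ticks_per_cmd == 0:      # 10 Hz: query the VLA for a new command
            cmd = vla_policy(t)
            vla_calls += 1
        low_level_step(t, cmd)          # 200 Hz: balance + joint control
    return vla_calls


log = []
calls = run_hierarchy(1.0,
                      vla_policy=lambda t: {"base_vel": (0.3, 0.0, 0.0)},
                      low_level_step=lambda t, cmd: log.append(cmd))
print(calls, len(log))  # 10 200
```

Keeping balance in the fast inner loop is what lets a slow, large VLA drive a humanoid without it falling over between model queries.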
How to use:
```bash
git clone https://github.com/unitreerobotics/unifolm-vla.git
cd unifolm-vla

# Fine-tune with data from xr_teleoperate
python train.py \
  --model_name unifolm_base \
  --dataset_path /path/to/lerobot_dataset \
  --robot_type g1 \
  --output_dir ./finetuned

# Deploy (requires Unitree G1 or H1 + onboard compute)
python deploy.py \
  --checkpoint ./finetuned/checkpoint_best.pth \
  --robot_type g1
```
Note: Pretrained checkpoint not yet public at time of writing. Repo has code for training from scratch or fine-tuning from InternVL2 checkpoint.
The complete Unitree pipeline
The three Unitree repos form a closed pipeline:
```text
[unitree_rl_gym] → locomotion baseline (sim)
        ↓
[xr_teleoperate] → collect manipulation data (real)
        ↓
[unifolm-vla]    → VLA fine-tune + deploy
```
If you have a Unitree G1 or H1, this is the shortest path from zero to a working whole-body VLA policy. Unlike openpi or GR00T, which are designed for many robots, it needs no cross-embodiment fine-tuning.
Tsinghua University (THU): RDT-1B (~1.7k stars)
Repo: thu-ml/RoboticsDiffusionTransformer
What it is: RDT-1B, a foundation model for bimanual manipulation built on a diffusion transformer, and one of the strongest open bimanual models at the time of its release.
Architecture:
```text
Backbone:   DiT (Diffusion Transformer, 1.2B params)
Encoders:   SigLIP (vision), T5-XXL (language)
Inputs:     up to 3 RGB views (exterior + 2 wrist cameras), proprioception, language
Action:     diffusion denoising → 7-DoF joint actions (×2 arms)
Frequency:  ~25Hz
```
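"Diffusion denoising" means the action chunk starts as Gaussian noise and is refined step by step by a network that predicts the noise. A minimal sketch of DDPM ancestral sampling for a chunk of 14-D bimanual actions (2 arms × 7-DoF); the placeholder `eps_model` ignores the image/language conditioning that RDT-1B uses, and `horizon=64`, `T=50`, and the linear noise schedule are illustrative assumptions, not values from the RDT-1B config.

```python
import numpy as np


def ddpm_sample_actions(eps_model, horizon=64, action_dim=14, T=50, seed=0):
    """Generate an action chunk by DDPM ancestral sampling.

    eps_model(x_t, t) predicts the noise contained in x_t at step t.
    action_dim=14 covers two 7-DoF arms; horizon is the chunk length.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, T)          # linear noise schedule (assumed)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal((horizon, action_dim))  # start from pure noise
    for t in reversed(range(T)):
        eps = eps_model(x, t)
        # Posterior mean: remove the predicted noise, then rescale.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x


# Placeholder noise predictor; a real model conditions on images, language, state.
chunk = ddpm_sample_actions(lambda x, t: np.zeros_like(x))
print(chunk.shape)  # (64, 14)
```

Predicting a whole chunk per query is what lets a slow 1B-parameter model feed a faster control loop.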
Why it differs from openpi/GR00T: RDT-1B is more narrowly focused: bimanual manipulation only, no locomotion. Within that domain it outperforms many broader foundation models, especially on tasks requiring precise two-arm coordination.
Training datasets:
- Bridge V2, DROID, Open X-Embodiment
- LIBERO benchmark suite
- Custom bimanual teleoperation data (ALOHA platform)
Getting started:
```bash
git clone https://github.com/thu-ml/RoboticsDiffusionTransformer.git
cd RoboticsDiffusionTransformer
pip install -r requirements.txt

# Download pretrained checkpoint
python scripts/download_model.py --model rdt-1b

# Inference
python inference.py \
  --model_path checkpoints/rdt-1b \
  --image path/to/obs.jpg \
  --instruction "fold the cloth in half"
```
Fine-tuning: the repo includes a fine-tuning script for custom data in HDF5 or RLDS format.
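To train one model across many robots, RDT-1B writes every robot's actions into a fixed-size unified action vector (the paper describes 128 dimensions), so custom data has to be placed into the right slots with the unused slots masked out. A minimal sketch for a 14-D bimanual action; the slot indices in `SLOTS` are hypothetical placeholders, and the real layout is defined in the repo's configuration.

```python
import numpy as np

UNIFIED_DIM = 128

# Hypothetical slot layout for a two-arm robot; check the repo for the real one.
SLOTS = {
    "left_arm_joints": range(0, 6),
    "left_gripper": range(6, 7),
    "right_arm_joints": range(50, 56),
    "right_gripper": range(56, 57),
}
ORDER = ("left_arm_joints", "left_gripper", "right_arm_joints", "right_gripper")


def to_unified(action_14):
    """Scatter a 14-D bimanual action (6 joints + gripper per arm) into a
    fixed-size vector, plus a mask marking which slots carry real data."""
    action_14 = np.asarray(action_14, dtype=np.float32)
    assert action_14.shape == (14,)
    vec = np.zeros(UNIFIED_DIM, dtype=np.float32)
    mask = np.zeros(UNIFIED_DIM, dtype=bool)
    src = 0
    for name in ORDER:
        idx = list(SLOTS[name])
        vec[idx] = action_14[src:src + len(idx)]
        mask[idx] = True
        src += len(idx)
    return vec, mask


vec, mask = to_unified(np.arange(14))
print(vec.shape, int(mask.sum()))  # (128,) 14
```

The mask lets the training loss ignore slots a given robot does not have, which is how one model can learn from arms with different joint counts.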
Paper: RDT-1B: A Diffusion Foundation Model for Bimanual Manipulation (2024)
RobotEra: humanoid-gym (~2k stars)
Repo: roboterax/humanoid-gym
What it is: An RL training framework for humanoids, based on legged_gym like unitree_rl_gym, but more focused on sim2real transfer with RobotEra's XBot-L robot.
Key differences:
- XBot-L specific (doesn't support as many robots as unitree_rl_gym)
- Heavier focus on the sim2real gap: domain randomization, actuator modeling, and sim2sim validation in MuJoCo
- Detailed tutorials on tuning for real robot
Use when: you want to learn sim2real transfer techniques or are using XBot-L.
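Domain randomization resamples physics parameters every episode so the policy cannot overfit to one simulator setting, and actuation latency is one of the gaps it most often has to cover. A minimal sketch of per-episode sampling plus a delay buffer; the parameter set, the ranges, and the class names are hypothetical placeholders, not humanoid-gym's defaults.

```python
from collections import deque
from dataclasses import dataclass

import numpy as np


@dataclass
class EpisodePhysics:
    friction: float           # ground friction coefficient
    added_base_mass: float    # kg added to the torso
    motor_strength: float     # multiplier on commanded torque
    action_delay_steps: int   # control steps of actuation latency


def sample_physics(rng):
    """Draw one randomized physics setting per episode (hypothetical ranges)."""
    return EpisodePhysics(
        friction=rng.uniform(0.2, 1.3),
        added_base_mass=rng.uniform(-1.0, 3.0),
        motor_strength=rng.uniform(0.8, 1.2),
        action_delay_steps=int(rng.integers(0, 3)),  # 0, 1 or 2 steps
    )


class DelayedActions:
    """Emulate actuation latency by releasing each action N steps late."""

    def __init__(self, delay_steps, action_dim):
        self.buf = deque([np.zeros(action_dim)] * delay_steps)

    def step(self, action):
        self.buf.append(np.asarray(action, dtype=float))
        return self.buf.popleft()


physics = sample_physics(np.random.default_rng(0))
delay = DelayedActions(physics.action_delay_steps, action_dim=12)
applied = delay.step(np.ones(12))  # what the simulated motors actually receive
```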
Benchmark: humanoid-bench (~772 stars)
Repo: carlosferrazza/humanoid-bench
What it is: A simulated benchmark suite for humanoid tasks, used as a standard way to evaluate performance and compare methods.
Tasks in benchmark:
- Stand, walk, run (locomotion only)
- Reach, grasp, place (manipulation only)
- Walk-then-grasp (loco-manipulation)
- Door open, drawer open (long-horizon)
Available humanoid models: Unitree H1, G1, Agility Digit, custom models.
How to use:
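```bash
git clone https://github.com/carlosferrazza/humanoid-bench.git
cd humanoid-bench
pip install -e .

# Sanity-check an environment with random actions
python -m humanoid_bench.test_env --env h1hand-walk-v0
```
Environments are registered with Gymnasium under IDs such as `h1hand-walk-v0`; to benchmark your own policy, roll it out in those environments for a fixed number of episodes (e.g., 100).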
Important: if you train a new policy, run humanoid-bench to get numbers you can compare against published papers. Without benchmark numbers, a new method is very hard to publish.
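Benchmark numbers are typically mean and standard deviation of episode return over a fixed set of seeded episodes. A minimal, environment-agnostic sketch of that loop, assuming the Gymnasium API (`reset(seed=...)` returning `(obs, info)` and `step` returning a five-tuple); `evaluate` and `my_policy` are hypothetical names, not part of humanoid-bench.

```python
import numpy as np


def evaluate(env, policy, num_episodes=100, seed=0):
    """Roll out `policy` for num_episodes; return (mean, std) of episode return.

    Assumes the Gymnasium API: reset(seed=...) -> (obs, info) and
    step(action) -> (obs, reward, terminated, truncated, info).
    """
    returns = []
    for ep in range(num_episodes):
        obs, _ = env.reset(seed=seed + ep)   # fixed seeds make runs comparable
        done, ep_return = False, 0.0
        while not done:
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            ep_return += float(reward)
            done = terminated or truncated
        returns.append(ep_return)
    return float(np.mean(returns)), float(np.std(returns))


# Hypothetical usage with humanoid-bench installed:
#   import gymnasium as gym
#   import humanoid_bench  # registers the environments
#   env = gym.make("h1hand-walk-v0")
#   mean_ret, std_ret = evaluate(env, my_policy, num_episodes=100)
```

Report the same episode count and seeding scheme as the papers you compare against; otherwise the numbers are not directly comparable.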
Papers without repos: LeVERB, WoCoCo, and ExBody2
Some important works currently have no public repo:
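LeVERB
- What: LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction
- When: arXiv, June 2025
- Approach: Hierarchical; a vision-language policy produces a latent "verb" that conditions a low-level RL whole-body controller, trained on synthetic demonstrations rendered in simulation
- Status: No public code yet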
WoCoCo
- What: Learning Whole-Body Humanoid Control with Sequential Contacts (CoRL 2024)
- Where: Carnegie Mellon University and ETH Zurich
- Approach: RL that decomposes each task into a sequence of contact stages, without motion priors
- Status: No public repo
ExBody2
- What: Expressive whole-body control for humanoids (imitating diverse human motions with the full body)
- Where: UC San Diego
- Status: Paper + code in preparation for release
Summary comparison — Chinese group + open benchmarks
| Repo | Problem | Barrier to entry | Notes |
|---|---|---|---|
| unitree_rl_gym | Locomotion sim | Low | Needs Isaac Gym |
| xr_teleoperate | Data collection | Medium | Needs HMD or gloves |
| unifolm-vla | VLA for Unitree | High | Needs Unitree G1/H1 |
| RDT-1B | Bimanual VLA | Low | No humanoid required |
| humanoid-gym | Sim training | Low | XBot-L focused |
| humanoid-bench | Evaluation | Low | No real robot needed |
US vs China comparison
US group (NVIDIA, Physical Intelligence, Berkeley) excels at:
- Foundation model quality and generalization
- Research novelty (EgoHumanoid, HumanPlus)
- Open culture (paper + code + dataset released together)
Chinese group (Unitree, THU) excels at:
- Hardware-software co-design (Unitree builds both robot and software)
- Cost-effective hardware (the G1 is roughly 5-10x cheaper than GR-1 or Figure)
- Large-scale deployment (Unitree ships legged robots in higher volume than almost any competitor)
Reality: many US labs are using Unitree G1/H1 as their platform and fine-tuning openpi or GR00T on top of it — combining Chinese hardware + US software.
Series wrap-up
Across these 3 posts, you have a complete picture:
- Post 1: Taxonomy, and how to choose a starting point based on your hardware
- Post 2: US group, with strong foundation models and a research-first approach
- This post: Chinese group, with a deployment-first, hardware-integrated stack
Final recommendation if you want to start immediately on a realistic budget:
- Hardware: Unitree G1 (cheapest full humanoid)
- Data collection: xr_teleoperate with Meta Quest 3
- VLA: fine-tune openpi (best generalization) or unifolm-vla (Unitree-native)
- WBC: GR00T-WholeBodyControl, or unitree_rl_gym locomotion + custom arm control
References
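- RDT-1B: A Diffusion Foundation Model for Bimanual Manipulation (arXiv:2410.07864)
- unifolm-vla (GitHub: unitreerobotics/unifolm-vla)
- unitree_rl_gym (GitHub: unitreerobotics/unitree_rl_gym)
- HumanoidBench (arXiv:2403.10506)
- WoCoCo: Learning Whole-Body Humanoid Control with Sequential Contacts (CoRL 2024)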