LeRobot v0.5: The Biggest Update Yet
After over 6 months of development with 200+ pull requests from 50+ contributors, LeRobot v0.5 has officially launched — and this is no minor update. It represents the most significant leap since Hugging Face first released LeRobot, bringing fundamental changes to the CLI architecture, new policies, dataset handling, and hardware support.
If you've been following our VLA-LeRobot series, you'll notice that many limitations we had to work around in v0.3/v0.4 have now been thoroughly addressed. Let's walk through each major feature.
Overview: Impressive Numbers
Before diving into details, here's the big picture:
- 200+ PRs merged from the community
- 50+ contributors worldwide
- 6 new policies including Pi0-FAST, SmolVLA, Wall-X, X-VLA, SARM, and HIL-SERL
- 5 new robots supported: Unitree G1 (first humanoid!), OpenArm, Earth Rover, SO-100, SO-101
- 10x training speed thanks to streaming video encoding
- Paper accepted at ICLR 2026 — confirming the research quality behind the framework
LeRobot v0.5 isn't just a software update — it's a clear signal that open-source robotics AI is entering its maturity phase.
New CLI: lerobot-train and lerobot-eval
Goodbye python -m lerobot.scripts.*
One of the first changes you'll notice is the completely new CLI. Instead of calling:
# Old way (deprecated)
python -m lerobot.scripts.train --policy.type=act --dataset.repo_id=lerobot/aloha_sim_insertion_human
You now use:
# New way in v0.5
lerobot-train --policy.type=act --dataset.repo_id=lerobot/aloha_sim_insertion_human
Similarly for evaluation:
# New way
lerobot-eval --policy.path=outputs/train/act_aloha_sim/checkpoints/last/pretrained_model
And data collection:
# Record dataset
lerobot-record --robot.type=so100 --repo_id=USER/my_dataset --num_episodes=50
Why the change? The primary reason is developer experience. The python -m commands were verbose, error-prone, and hard to remember. The new CLI is automatically installed when you pip install lerobot, working as a proper command-line tool with auto-completion and clear help messages.
New Argument Structure
Arguments are now organized by logical groups:
# Arguments group by prefix: policy.*, dataset.*, training.*
lerobot-train \
--policy.type=act \
--policy.chunk_size=100 \
--dataset.repo_id=lerobot/aloha_sim \
--dataset.episodes="[0,1,2,3,4]" \
--training.batch_size=64 \
--training.steps=100000 \
--training.lr=1e-5
This structure is much clearer than the flat arguments in v0.3/v0.4.
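To make the grouping concrete, here is a minimal sketch of how dotted --group.key=value arguments can map onto nested config objects. The dataclass names and the tiny parser are illustrative only, not LeRobot's actual config system (which handles many more cases).

```python
# Hypothetical sketch of how dotted CLI arguments can map onto nested
# config groups. Dataclass names and the parser are illustrative only,
# not LeRobot's actual config system.
from dataclasses import dataclass, field

@dataclass
class PolicyConfig:
    type: str = "act"
    chunk_size: int = 100

@dataclass
class TrainingConfig:
    batch_size: int = 8
    steps: int = 100_000

@dataclass
class TrainConfig:
    policy: PolicyConfig = field(default_factory=PolicyConfig)
    training: TrainingConfig = field(default_factory=TrainingConfig)

def apply_dotted_args(cfg: TrainConfig, args: list[str]) -> TrainConfig:
    """Apply '--group.key=value' overrides to a nested config."""
    for arg in args:
        key, _, raw = arg.lstrip("-").partition("=")
        group_name, _, field_name = key.partition(".")
        group = getattr(cfg, group_name)
        current = getattr(group, field_name)
        setattr(group, field_name, type(current)(raw))  # cast to the field's type
    return cfg

cfg = apply_dotted_args(TrainConfig(), ["--policy.chunk_size=50", "--training.batch_size=64"])
```

The payoff of this structure is that each group can be validated and documented independently, and unrelated settings can't collide on a flat namespace.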
New Policies: 6 Breakthrough Models
1. Pi0-FAST: Autoregressive VLA, 5x Faster
Pi0-FAST combines the PaliGemma architecture with the FAST action tokenizer — an entirely new approach to encoding robot actions. Instead of using flow matching (iterative denoising) like the original Pi0, Pi0-FAST tokenizes actions into discrete tokens and uses autoregressive decoding with KV-caching.
Result: 5x faster inference than the flow-matching Pi0, with equivalent performance on the LIBERO benchmark (82.5% average success rate).
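To illustrate the idea of turning actions into a discrete vocabulary, here is a toy discretizer that maps continuous actions to integer tokens and back. The real FAST tokenizer is more elaborate (it compresses whole action chunks before tokenizing); this uniform-binning sketch only shows the principle.

```python
# Toy illustration of action tokenization: map continuous actions into a
# discrete vocabulary via uniform binning. The real FAST tokenizer
# compresses whole action chunks first; only the discretize-then-decode
# principle is shown here.
import numpy as np

def tokenize(actions: np.ndarray, low: float, high: float, n_bins: int = 256) -> np.ndarray:
    """Map continuous actions in [low, high] to integer token ids."""
    norm = (actions - low) / (high - low)                 # scale to [0, 1]
    return np.clip((norm * n_bins).astype(int), 0, n_bins - 1)

def detokenize(tokens: np.ndarray, low: float, high: float, n_bins: int = 256) -> np.ndarray:
    """Map token ids back to the center of their bin."""
    return low + (tokens + 0.5) / n_bins * (high - low)

chunk = np.array([[0.1, -0.4], [0.2, -0.3]])              # (time, action_dim)
tokens = tokenize(chunk, low=-1.0, high=1.0)
recovered = detokenize(tokens, low=-1.0, high=1.0)        # error below one bin width
```

Once actions are tokens, the policy can emit them one at a time like a language model, which is what makes KV-caching applicable.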
# Fine-tune Pi0-FAST
lerobot-train \
--policy.type=pi0_fast \
--policy.pretrained_path=lerobot/pi0_fast_base \
--dataset.repo_id=USER/my_dataset \
--training.steps=20000
2. SmolVLA: VLA for Consumer GPUs
SmolVLA solves the biggest problem with VLA models: they're too heavy for commodity hardware. With only 450M parameters (SmolVLM2-500M backbone + 100M Flow Matching action expert), SmolVLA runs on RTX 3060/4060 — GPUs anyone can afford.
The standout feature: SmolVLA is trained entirely on open-source data, achieving 78% real-world success rate on single-arm manipulation tasks.
# Fine-tune SmolVLA
lerobot-train \
--policy.path=lerobot/smolvla_base \
--dataset.repo_id=USER/my_dataset \
--training.batch_size=64 \
--training.steps=20000
3. Wall-X: Cross-embodiment Policy
Wall-X extends the concept of "train once, deploy everywhere" — a cross-embodiment policy that can transfer between SO-100, ALOHA, and Franka Panda without retraining from scratch.
4. X-VLA: Extended VLA Architecture
X-VLA extends the traditional VLA architecture with multi-modal reasoning, enabling the model to simultaneously process images, text instructions, and proprioceptive feedback.
5. SARM: Self-Adaptive Robot Manipulation
SARM focuses on adaptation — the ability to self-adjust behavior based on environment feedback without requiring human intervention.
6. HIL-SERL: Human-in-the-Loop RL
HIL-SERL combines reinforcement learning with human feedback, allowing robots to continuously improve through interaction with operators. This is a significant step forward for real-world deployment.
Real-Time Chunking: Faster, Smoother Responses
The Problem with Traditional Action Chunking
In our article about ACT, we discussed action chunking — predicting a chunk of multiple actions simultaneously rather than one at a time. However, there's a limitation: the robot must wait until the entire chunk is executed before receiving a new chunk. This creates latency and makes the robot slow to react to unexpected changes.
How Real-Time Chunking Solves This
Real-Time Chunking (RTC) allows the robot to continuously update its plan without waiting for the current chunk to finish. Specifically:
- The model still predicts a full action chunk (e.g., 10 actions)
- But only executes the first few actions before querying the model again
- New actions are blended with unexecuted old actions, creating smoother trajectories
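The blending step above can be sketched as a cross-fade between the two plans. The linear weighting schedule here is an assumption for illustration, not LeRobot's exact rule.

```python
# Sketch of real-time chunking's blending step: cross-fade from the
# unexecuted tail of the previous chunk into the freshly predicted one.
# The linear weighting schedule is an assumption, not LeRobot's exact rule.
import numpy as np

def blend_chunks(old_tail: np.ndarray, new_chunk: np.ndarray) -> np.ndarray:
    """Blend overlapping actions so the executed trajectory stays smooth."""
    overlap = min(len(old_tail), len(new_chunk))
    out = new_chunk.copy()
    # Weight on the old plan decays from 1 toward 0 across the overlap.
    w = np.linspace(1.0, 0.0, overlap, endpoint=False)[:, None]
    out[:overlap] = w * old_tail[:overlap] + (1.0 - w) * new_chunk[:overlap]
    return out

old_tail = np.ones((4, 2))      # 4 unexecuted actions from the previous chunk
new_chunk = np.zeros((10, 2))   # freshly predicted 10-action chunk
blended = blend_chunks(old_tail, new_chunk)
```

The first blended actions stay close to the old plan (no sudden jerk), while later ones follow the new prediction.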
Enabling RTC is simple:
lerobot-train \
--policy.type=pi0_fast \
--policy.rtc_config.enabled=true \
--policy.rtc_config.n_steps_warmup=5 \
--dataset.repo_id=USER/my_dataset
RTC is especially effective when combined with Pi0-FAST's fast inference: the model can replan multiple times within the window where a flow-matching model manages only a single prediction.
PEFT/LoRA: Fine-tune Large VLAs Cheaply
Why PEFT?
Large VLA models like Pi0 (3B params) or OpenVLA (7B params) require multiple GPUs and lots of RAM to fully fine-tune. PEFT (Parameter-Efficient Fine-Tuning) lets you train only a very small subset of parameters, significantly reducing hardware requirements.
LoRA in LeRobot v0.5
LeRobot v0.5 integrates LoRA (Low-Rank Adaptation) directly into the training pipeline:
lerobot-train \
--policy.type=pi0_fast \
--policy.pretrained_path=lerobot/pi0_fast_base \
--policy.peft_config.use_peft=true \
--policy.peft_config.lora_r=16 \
--policy.peft_config.lora_alpha=32 \
--dataset.repo_id=USER/my_dataset \
--training.steps=10000
With LoRA rank 16, you only train about 0.5% of total parameters, reducing VRAM from 40GB down to 8-12GB — fitting comfortably on an RTX 4090 or even RTX 3090.
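The arithmetic behind that fraction is easy to check: adapting a d_out x d_in weight matrix with LoRA adds r * (d_out + d_in) trainable parameters. A back-of-the-envelope sketch with illustrative layer sizes (not Pi0-FAST's real shapes):

```python
# Back-of-the-envelope check on the trainable fraction: adapting a
# d_out x d_in weight matrix with LoRA adds r * (d_out + d_in) parameters.
# Layer count and sizes below are illustrative, not Pi0-FAST's real shapes.
def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Parameters in the low-rank factors B (d_out x r) and A (r x d_in)."""
    return r * (d_out + d_in)

d, r, n_layers = 2048, 16, 24           # hidden size, LoRA rank, layer count
base = n_layers * 4 * d * d             # four d x d attention projections per layer
lora = n_layers * 4 * lora_params(d, d, r)
fraction = lora / base                  # = 2 * r / d for square matrices
```

For square matrices the fraction reduces to 2r/d, about 1.6% here; measured against a full model, which also contains unadapted embeddings and MLP blocks, the trainable share drops further, consistent with the roughly 0.5% figure above.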
If you've read our article on VLA models, you'll see that PEFT/LoRA is the most practical solution for democratizing VLA fine-tuning across the community.
New Robots: From Single-Arm to Humanoid
Unitree G1: First Humanoid in LeRobot
This is perhaps the most anticipated feature: LeRobot now supports Unitree G1 — the first humanoid robot in the framework. You can:
- Collect teleop data for whole-body control
- Train VLA policies for manipulation tasks
- Run sim-to-real transfer with MJPC environments
# Record on Unitree G1
lerobot-record \
--robot.type=unitree_g1 \
--repo_id=USER/g1_dataset \
--num_episodes=20
OpenArm: Low-cost Open-Source Arm
OpenArm is a new affordable robot arm designed for education and research. It is fully 3D-printable, uses inexpensive servos, and works with the entire LeRobot pipeline.
Earth Rover: Mobile Robotics
Earth Rover extends LeRobot beyond manipulation — you can now use the same framework for mobile robotics, navigation, and exploration.
SO-100 and SO-101
The SO-101 is the evolution of the SO-100 arm, with improvements in accuracy, build quality, and camera integration; both are supported out of the box.
Dataset Improvements: 10x Training Speed
Streaming Video Encoding
This is a quiet but high-impact change for daily use. Before v0.5, LeRobot stored each camera frame as an individual PNG/JPEG file. With a dataset of 1,000 episodes at 30 FPS, that could mean millions of small files, causing:
- Slow disk loading (I/O overhead)
- Wasted storage (PNG compresses poorly)
- Slow upload/download from Hugging Face Hub
v0.5 switches to streaming video encoding: all visual observations are encoded as video files (MP4/WebM) and decoded on the fly during training. Results:
- 10x data loading speed thanks to reduced I/O
- 5-8x storage reduction thanks to video compression
- Seamless streaming from Hugging Face Hub without downloading everything
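Decoding on the fly hinges on a cheap lookup from a global frame index to the right video file and timestamp, so only that frame's segment needs decoding. A minimal sketch of that lookup (field names are illustrative, not LeRobot's actual dataset schema):

```python
# Sketch of the index lookup that on-the-fly decoding relies on: map a
# global frame index to (episode video, timestamp). Field names are
# illustrative, not LeRobot's actual dataset schema.
import bisect

def locate_frame(frame_idx: int, episode_lengths: list[int], fps: int) -> tuple[int, float]:
    """Return (episode index, timestamp in seconds) for a global frame index."""
    starts = [0]
    for n in episode_lengths:               # cumulative frame offsets per episode
        starts.append(starts[-1] + n)
    ep = bisect.bisect_right(starts, frame_idx) - 1
    local = frame_idx - starts[ep]
    return ep, local / fps

ep, ts = locate_frame(frame_idx=95, episode_lengths=[60, 60, 60], fps=30)
# frame 95 is local frame 35 of episode 1, i.e. t = 35/30 s into its video
```

Because the lookup is just arithmetic plus a binary search, random access into terabytes of video stays cheap; the expensive part (decoding) touches only the frames a batch actually needs.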
Subtask Annotation
v0.5 allows annotating subtasks within each episode, helping models learn hierarchical behavior:
# Example annotation
{
"episode_0": {
"subtasks": [
{"start": 0, "end": 45, "label": "reach_object"},
{"start": 45, "end": 90, "label": "grasp_object"},
{"start": 90, "end": 150, "label": "place_object"}
]
}
}
This is especially useful for long-horizon tasks and multi-step manipulation.
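As a sketch of how such annotations can be consumed, here is a small helper that slices an episode into labeled segments. The annotation schema mirrors the JSON example above; the helper itself is hypothetical.

```python
# Hypothetical helper that consumes subtask annotations like the JSON
# example above, slicing an episode into labeled segments for training.
annotation = {
    "episode_0": {
        "subtasks": [
            {"start": 0, "end": 45, "label": "reach_object"},
            {"start": 45, "end": 90, "label": "grasp_object"},
            {"start": 90, "end": 150, "label": "place_object"},
        ]
    }
}

def split_by_subtask(frames: list, subtasks: list[dict]) -> dict[str, list]:
    """Group an episode's frames into segments keyed by subtask label."""
    return {st["label"]: frames[st["start"]:st["end"]] for st in subtasks}

frames = list(range(150))  # stand-in for 150 frames of observations
segments = split_by_subtask(frames, annotation["episode_0"]["subtasks"])
```

Each segment can then be paired with its label as a short-horizon training example, which is exactly what hierarchical or language-conditioned policies benefit from.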
EnvHub: Sim Environments from Hugging Face
EnvHub lets you load simulation environments directly from Hugging Face Hub, just like loading models and datasets:
# Load environment from Hub
lerobot-eval \
--policy.path=USER/my_policy \
--env.type=hub \
--env.repo_id=lerobot/libero_object
This solves a major problem: previously, setting up simulation environments was the biggest barrier for newcomers. You had to install MuJoCo, Isaac Gym, or PyBullet separately, configure paths, install dependencies — all before running your first line of code.
With EnvHub, you just need pip install lerobot and everything works.
New System Requirements
Python 3.12+ Required
v0.5 requires Python 3.12 or higher. If you're on Python 3.10 or 3.11, upgrade Python before updating LeRobot.
# Check version
python --version
# If needed, install Python 3.12
# Ubuntu/Debian
sudo apt install python3.12 python3.12-venv
Transformers v5
LeRobot v0.5 requires Hugging Face Transformers v5, which comes with numerous performance improvements and support for new models.
Quick Start: From Installation to Training
Step 1: Install LeRobot v0.5
# Create virtual environment
python3.12 -m venv lerobot-env
source lerobot-env/bin/activate
# Install LeRobot
pip install lerobot
# Or from source (recommended for development)
git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e ".[dev]"
Step 2: Train ACT on Simulation
lerobot-train \
--policy.type=act \
--dataset.repo_id=lerobot/aloha_sim_insertion_human \
--training.batch_size=8 \
--training.steps=100000
Step 3: Train Pi0-FAST
pip install -e ".[pi]"
lerobot-train \
--policy.type=pi0_fast \
--policy.pretrained_path=lerobot/pi0_fast_base \
--dataset.repo_id=lerobot/aloha_sim_insertion_human \
--training.steps=20000 \
--training.dtype=bfloat16
Step 4: Train SmolVLA
pip install -e ".[smolvla]"
lerobot-train \
--policy.path=lerobot/smolvla_base \
--dataset.repo_id=USER/my_dataset \
--training.batch_size=64 \
--training.steps=20000
Step 5: Evaluate
lerobot-eval \
--policy.path=outputs/train/checkpoints/last/pretrained_model \
--env.type=libero \
--env.task=libero_object
Migration Guide: From v0.3/v0.4 to v0.5
Key Breaking Changes
- Completely new CLI: python -m lerobot.scripts.train is replaced by lerobot-train
- Python 3.12+ required: Python 3.10/3.11 are no longer supported
- New dataset format: Video encoding replaces image files
- New config system: Arguments organized by groups (policy., dataset., training.)
- Policy API changes: Some class names and method signatures have changed
Converting Old Datasets
# Convert dataset from v0.3/v0.4 format to v0.5
lerobot-convert-dataset \
--repo_id=USER/old_dataset \
--output_repo_id=USER/new_dataset \
--encode_videos=true
Converting Training Scripts
If you have custom training scripts, update your imports:
# Old (v0.3/v0.4)
from lerobot.common.policies.act.configuration_act import ACTConfig
from lerobot.common.policies.act.modeling_act import ACTPolicy
# New (v0.5): the old imports still work, but the new API is recommended
from lerobot.policies import ACTPolicy, ACTConfig
Roadmap Ahead
LeRobot v0.5 lays the foundation for future developments:
- v0.6 (planned): Multi-task learning, shared representations across tasks
- v0.7+: Multi-robot coordination, fleet learning
- Long-term: Foundation models for robotics, zero-shot transfer
If you want to dive deep into each new policy, follow the next two posts in this series: SmolVLA training guide and Pi0-FAST training guide.
Conclusion
LeRobot v0.5 marks the transition from "research framework" to "production-ready platform." With a new user-friendly CLI, powerful policies that run on consumer hardware, and a dataset pipeline that's 10x faster, the barrier to getting started with robot learning has never been lower.
In particular, the introduction of SmolVLA (450M params, runs on RTX 3060) and PEFT/LoRA support means that anyone with a mid-range GPU can train VLA models for real robots. This is true democratization of robot AI.
If you're interested in the theoretical foundations behind these models, read our articles on Diffusion Policy and VLA models overview for deeper understanding.
Related Posts
- LeRobot Framework: Introduction and Architecture — Start here if you're new to LeRobot
- VLA Models: From Theory to Practice — Understand VLA foundations before using Pi0-FAST or SmolVLA
- LeRobot Ecosystem: Comprehensive Guide — Overview of hardware, software, and community