
LeRobot v0.5: What's New

Complete overview of LeRobot v0.5 — Pi0-FAST, SmolVLA, Real-Time Chunking, HIL-SERL, Unitree G1, PEFT/LoRA, and 10x training speed.

Nguyễn Anh Tuấn · April 8, 2026 · 10 min read

LeRobot v0.5: The Biggest Update Yet

After over 6 months of development with 200+ pull requests from 50+ contributors, LeRobot v0.5 has officially launched — and this is no minor update. It represents the most significant leap since Hugging Face first released LeRobot, bringing fundamental changes to the CLI architecture, new policies, dataset handling, and hardware support.

If you've been following our VLA-LeRobot series, you'll notice that many limitations we had to work around in v0.3/v0.4 have now been thoroughly addressed. Let's walk through each major feature.


Overview: Impressive Numbers

Before diving into the details, here's the big picture: six months of development, 200+ merged pull requests, 50+ contributors, six new policies, and hardware support that now ranges from single arms to a humanoid.

LeRobot v0.5 isn't just a software update. It's a clear signal that open-source robotics AI is entering its maturity phase.

New CLI: lerobot-train and lerobot-eval

Goodbye python -m lerobot.scripts.*

One of the first changes you'll notice is the completely new CLI. Instead of calling:

# Old way (deprecated)
python -m lerobot.scripts.train --policy.type=act --dataset.repo_id=lerobot/aloha_sim_insertion_human

You now use:

# New way in v0.5
lerobot-train --policy.type=act --dataset.repo_id=lerobot/aloha_sim_insertion_human

Similarly for evaluation:

# New way
lerobot-eval --policy.path=outputs/train/act_aloha_sim/checkpoints/last/pretrained_model

And data collection:

# Record dataset
lerobot-record --robot.type=so100 --repo_id=USER/my_dataset --num_episodes=50

Why the change? The primary reason is developer experience. The python -m commands were verbose, error-prone, and hard to remember. The new CLI is automatically installed when you pip install lerobot, working as a proper command-line tool with auto-completion and clear help messages.

New Argument Structure

Arguments are now organized into logical groups, with policy.* configuring the model, dataset.* the data, and training.* the optimization loop:

lerobot-train \
  --policy.type=act \
  --policy.chunk_size=100 \
  --dataset.repo_id=lerobot/aloha_sim \
  --dataset.episodes="[0,1,2,3,4]" \
  --training.batch_size=64 \
  --training.steps=100000 \
  --training.lr=1e-5

This structure is much clearer than the flat arguments in v0.3/v0.4.

New Policies: 6 Breakthrough Models

1. Pi0-FAST: Autoregressive VLA, 5x Faster

Pi0-FAST combines the PaliGemma architecture with the FAST action tokenizer — an entirely new approach to encoding robot actions. Instead of using flow matching (iterative denoising) like the original Pi0, Pi0-FAST tokenizes actions into discrete tokens and uses autoregressive decoding with KV-caching.

Result: 5x faster inference than the diffusion-based Pi0, with equivalent performance on the LIBERO benchmark (82.5% average success rate).
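FAST itself compresses action chunks with a discrete cosine transform followed by byte-pair encoding, but the core idea (map continuous actions to discrete tokens that an autoregressive decoder can emit) can be sketched with plain uniform binning. The bin count and action range below are illustrative, not the real tokenizer's values:

```python
import numpy as np

def tokenize_actions(actions, low=-1.0, high=1.0, n_bins=256):
    """Map continuous actions in [low, high] to integer token ids in [0, n_bins-1]."""
    clipped = np.clip(actions, low, high)
    return np.rint((clipped - low) / (high - low) * (n_bins - 1)).astype(int)

def detokenize_actions(ids, low=-1.0, high=1.0, n_bins=256):
    """Invert the binning: token ids back to (quantized) continuous actions."""
    return low + ids / (n_bins - 1) * (high - low)

chunk = np.array([[0.12, -0.50], [0.13, -0.48]])  # 2 timesteps x 2 DoF
tokens = tokenize_actions(chunk)
recovered = detokenize_actions(tokens)
# Quantization error is at most half a bin width: (high - low) / (2 * (n_bins - 1))
assert np.abs(recovered - chunk).max() <= 1.0 / 255 + 1e-12
```

Once actions are tokens, the policy can generate them one at a time with KV-caching, exactly like a language model, which is where the speedup over iterative denoising comes from.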

# Fine-tune Pi0-FAST
lerobot-train \
  --policy.type=pi0_fast \
  --policy.pretrained_path=lerobot/pi0_fast_base \
  --dataset.repo_id=USER/my_dataset \
  --training.steps=20000

2. SmolVLA: VLA for Consumer GPUs

SmolVLA solves the biggest problem with VLA models: they're too heavy for commodity hardware. At roughly 450M parameters (a trimmed SmolVLM2-500M backbone plus a ~100M flow-matching action expert), SmolVLA runs on an RTX 3060/4060, GPUs anyone can afford.

The standout feature: SmolVLA is trained entirely on open-source data, achieving 78% real-world success rate on single-arm manipulation tasks.

# Fine-tune SmolVLA
lerobot-train \
  --policy.path=lerobot/smolvla_base \
  --dataset.repo_id=USER/my_dataset \
  --training.batch_size=64 \
  --training.steps=20000

3. Wall-X: Cross-embodiment Policy

Wall-X extends the concept of "train once, deploy everywhere" — a cross-embodiment policy that can transfer between SO-100, ALOHA, and Franka Panda without retraining from scratch.

4. X-VLA: Extended VLA Architecture

X-VLA extends the traditional VLA architecture with multi-modal reasoning, enabling the model to simultaneously process images, text instructions, and proprioceptive feedback.

5. SARM: Self-Adaptive Robot Manipulation

SARM focuses on adaptation — the ability to self-adjust behavior based on environment feedback without requiring human intervention.

6. HIL-SERL: Human-in-the-Loop RL

HIL-SERL combines reinforcement learning with human feedback, allowing robots to continuously improve through interaction with operators. This is a significant step forward for real-world deployment.


Real-Time Chunking: Faster, Smoother Responses

The Problem with Traditional Action Chunking

In our article about ACT, we discussed action chunking — predicting a chunk of multiple actions simultaneously rather than one at a time. However, there's a limitation: the robot must wait until the entire chunk is executed before receiving a new chunk. This creates latency and makes the robot slow to react to unexpected changes.

How Real-Time Chunking Solves This

Real-Time Chunking (RTC) allows the robot to continuously update its plan without waiting for the current chunk to finish. Specifically:

  1. The model still predicts a full action chunk (e.g., 10 actions)
  2. But only executes the first few actions before querying the model again
  3. New actions are blended with unexecuted old actions, creating smoother trajectories
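The blending in step 3 can be pictured as a weighted crossfade between the unexecuted tail of the previous chunk and the fresh prediction. This is an illustration of the idea only, not the actual RTC implementation, which merges plans during inference rather than averaging them afterwards:

```python
import numpy as np

def blend_chunks(old_tail, new_chunk, n_blend=4):
    """Crossfade the first n_blend steps of new_chunk with the unexecuted
    tail of the previous chunk, then follow new_chunk alone."""
    n = min(n_blend, len(old_tail), len(new_chunk))
    w = np.linspace(1.0, 0.0, n)[:, None]           # weight on the old plan
    blended = w * old_tail[:n] + (1 - w) * new_chunk[:n]
    return np.concatenate([blended, new_chunk[n:]])

old_tail = np.ones((6, 2))     # unexecuted actions from the previous chunk
new_chunk = np.zeros((10, 2))  # freshly predicted chunk
plan = blend_chunks(old_tail, new_chunk)
assert plan.shape == (10, 2)
assert np.allclose(plan[0], old_tail[0])     # starts on the old plan...
assert np.allclose(plan[-1], new_chunk[-1])  # ...and ends on the new one
```

Because the first executed actions agree with the previous plan, the robot's trajectory stays smooth even though the plan is replaced mid-chunk.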

Enabling RTC is simple:

lerobot-train \
  --policy.type=pi0_fast \
  --policy.rtc_config.enabled=true \
  --policy.rtc_config.n_steps_warmup=5 \
  --dataset.repo_id=USER/my_dataset

RTC is especially effective when combined with Pi0-FAST thanks to its fast inference: the model can replan several times within the window where a diffusion-based model can only predict once.

PEFT/LoRA: Fine-tune Large VLAs Cheaply

Why PEFT?

Large VLA models like Pi0 (3B params) or OpenVLA (7B params) require multiple GPUs and lots of RAM to fully fine-tune. PEFT (Parameter-Efficient Fine-Tuning) lets you train only a very small subset of parameters, significantly reducing hardware requirements.

LoRA in LeRobot v0.5

LeRobot v0.5 integrates LoRA (Low-Rank Adaptation) directly into the training pipeline:

lerobot-train \
  --policy.type=pi0_fast \
  --policy.pretrained_path=lerobot/pi0_fast_base \
  --policy.peft_config.use_peft=true \
  --policy.peft_config.lora_r=16 \
  --policy.peft_config.lora_alpha=32 \
  --dataset.repo_id=USER/my_dataset \
  --training.steps=10000

With LoRA rank 16, you only train about 0.5% of total parameters, reducing VRAM from 40GB down to 8-12GB — fitting comfortably on an RTX 4090 or even RTX 3090.
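The "~0.5% of parameters" figure follows directly from LoRA's construction: each adapted d_out × d_in weight matrix is frozen, and two small trainable factors of rank r are added, contributing r * (d_in + d_out) extra parameters. A back-of-envelope check, using hypothetical 4096-dimensional projection matrices:

```python
def lora_params(d_in, d_out, r):
    """Extra trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return r * (d_in + d_out)

# Hypothetical transformer with square 4096-dim projections, rank 16
d = 4096
per_matrix = lora_params(d, d, 16)   # 16 * (4096 + 4096) = 131072
full_matrix = d * d                  # 16777216 frozen parameters
fraction = per_matrix / full_matrix
assert abs(fraction - 2 * 16 / d) < 1e-12  # for square matrices: 2r/d
print(f"{fraction:.2%} of each adapted matrix is trainable")  # prints 0.78%
```

Since only a subset of matrices is typically adapted and the rest of the model is frozen entirely, the overall trainable fraction lands well below 1%, which is what drives the VRAM savings.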

If you've read our article on VLA models, you'll see that PEFT/LoRA is the most practical solution for democratizing VLA fine-tuning across the community.

New Robots: From Single-Arm to Humanoid

Unitree G1: First Humanoid in LeRobot

This is perhaps the most anticipated feature: LeRobot now supports Unitree G1 — the first humanoid robot in the framework. You can:

# Record on Unitree G1
lerobot-record \
  --robot.type=unitree_g1 \
  --repo_id=USER/g1_dataset \
  --num_episodes=20

OpenArm: Low-cost Open-Source Arm

OpenArm is a new affordable robot arm designed for education and research. It is fully 3D-printable, uses inexpensive servos, and is fully compatible with the LeRobot pipeline.

Earth Rover: Mobile Robotics

Earth Rover extends LeRobot beyond manipulation — you can now use the same framework for mobile robotics, navigation, and exploration.

SO-100 and SO-101

The SO-101 is the evolution of the SO-100 arm, with improvements in accuracy, build quality, and camera integration.

Dataset Improvements: 10x Training Speed

Streaming Video Encoding

This is a quiet but high-impact change for daily use. Before v0.5, LeRobot stored images as individual PNG/JPEG files, one file per frame. With a dataset of 1000 episodes at 30 FPS, that means millions of small files, which slows filesystem operations, dataloading, and dataset uploads and downloads.

v0.5 switches to streaming video encoding: all visual observations are encoded as video files (MP4/WebM) and decoded on the fly during training. The result is a far smaller on-disk footprint and up to 10x faster training.
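To put "millions of small files" in concrete terms, a quick count (episode length and camera count below are assumptions for illustration):

```python
# Illustrative assumptions: 60-second episodes, two camera views per robot
episodes, fps, seconds, cameras = 1000, 30, 60, 2
frames = episodes * fps * seconds * cameras
print(frames)  # 3600000 individual image files in the pre-v0.5 format
```

After the switch, those 3.6 million files collapse into a few thousand video files, one per camera per episode.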

Subtask Annotation

V0.5 allows annotating subtasks within each episode, helping models learn hierarchical behavior:

# Example annotation
{
    "episode_0": {
        "subtasks": [
            {"start": 0, "end": 45, "label": "reach_object"},
            {"start": 45, "end": 90, "label": "grasp_object"},
            {"start": 90, "end": 150, "label": "place_object"}
        ]
    }
}

This is especially useful for long-horizon tasks and multi-step manipulation.
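The annotation schema makes it straightforward to pull out per-subtask frame ranges. The helper below is hypothetical, not a LeRobot API; it only shows how the start/end indices above can be consumed:

```python
annotation = {
    "episode_0": {
        "subtasks": [
            {"start": 0, "end": 45, "label": "reach_object"},
            {"start": 45, "end": 90, "label": "grasp_object"},
            {"start": 90, "end": 150, "label": "place_object"},
        ]
    }
}

def frames_for_label(annotation, episode, label):
    """Return the frame index ranges annotated with a given subtask label."""
    return [range(s["start"], s["end"])
            for s in annotation[episode]["subtasks"]
            if s["label"] == label]

grasp = frames_for_label(annotation, "episode_0", "grasp_object")
assert [(r.start, r.stop) for r in grasp] == [(45, 90)]
```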

EnvHub: Sim Environments from Hugging Face

EnvHub lets you load simulation environments directly from Hugging Face Hub, just like loading models and datasets:

# Load environment from Hub
lerobot-eval \
  --policy.path=USER/my_policy \
  --env.type=hub \
  --env.repo_id=lerobot/libero_object

This solves a major problem: previously, setting up simulation environments was the biggest barrier for newcomers. You had to install MuJoCo, Isaac Gym, or PyBullet separately, configure paths, install dependencies — all before running your first line of code.

With EnvHub, you just need pip install lerobot and everything works.


New System Requirements

Python 3.12+ Required

V0.5 requires Python 3.12 or higher. If you're using Python 3.10 or 3.11, you'll need to upgrade before updating.

# Check version
python --version

# If needed, install Python 3.12
# Ubuntu/Debian
sudo apt install python3.12 python3.12-venv

Transformers v5

LeRobot v0.5 requires Hugging Face Transformers v5, which comes with numerous performance improvements and support for new models.

Quick Start: From Installation to Training

Step 1: Install LeRobot v0.5

# Create virtual environment
python3.12 -m venv lerobot-env
source lerobot-env/bin/activate

# Install LeRobot
pip install lerobot

# Or from source (recommended for development)
git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e ".[dev]"

Step 2: Train ACT on Simulation

lerobot-train \
  --policy.type=act \
  --dataset.repo_id=lerobot/aloha_sim_insertion_human \
  --training.batch_size=8 \
  --training.steps=100000

Step 3: Train Pi0-FAST

pip install -e ".[pi]"

lerobot-train \
  --policy.type=pi0_fast \
  --policy.pretrained_path=lerobot/pi0_fast_base \
  --dataset.repo_id=lerobot/aloha_sim_insertion_human \
  --training.steps=20000 \
  --training.dtype=bfloat16

Step 4: Train SmolVLA

pip install -e ".[smolvla]"

lerobot-train \
  --policy.path=lerobot/smolvla_base \
  --dataset.repo_id=USER/my_dataset \
  --training.batch_size=64 \
  --training.steps=20000

Step 5: Evaluate

lerobot-eval \
  --policy.path=outputs/train/checkpoints/last/pretrained_model \
  --env.type=libero \
  --env.task=libero_object

Migration Guide: From v0.3/v0.4 to v0.5

Key Breaking Changes

  1. Completely new CLI: python -m lerobot.scripts.train replaced by lerobot-train
  2. Python 3.12+ required: Python 3.10/3.11 no longer supported
  3. New dataset format: Video encoding replaces image files
  4. New config system: Arguments organized by groups (policy., dataset., training.)
  5. Policy API changes: Some class names and method signatures have changed

Converting Old Datasets

# Convert dataset from v0.3/v0.4 format to v0.5
lerobot-convert-dataset \
  --repo_id=USER/old_dataset \
  --output_repo_id=USER/new_dataset \
  --encode_videos=true

Converting Training Scripts

If you have custom training scripts, update your imports:

# Old (v0.3/v0.4)
from lerobot.common.policies.act.configuration_act import ACTConfig
from lerobot.common.policies.act.modeling_act import ACTPolicy

# New (v0.5): the old imports still work, but the new flat API is recommended
from lerobot.policies import ACTPolicy, ACTConfig

Roadmap Ahead

LeRobot v0.5 lays the foundation for what comes next. If you want to dive deep into each new policy, follow the next two posts in this series: the SmolVLA training guide and the Pi0-FAST training guide.

Conclusion

LeRobot v0.5 marks the transition from "research framework" to "production-ready platform." With a new user-friendly CLI, powerful policies that run on consumer hardware, and a dataset pipeline that's 10x faster, the barrier to getting started with robot learning has never been lower.

In particular, the introduction of SmolVLA (450M params, runs on RTX 3060) and PEFT/LoRA support means that anyone with a mid-range GPU can train VLA models for real robots. This is true democratization of robot AI.

If you're interested in the theoretical foundations behind these models, read our articles on Diffusion Policy and VLA models overview for deeper understanding.

