
SimpleVLA-RL (5): Comparison with LeRobot

In-depth comparison of SimpleVLA-RL and LeRobot: RL approach, VLA models, sim vs real, data efficiency — two complementary frameworks.

Nguyễn Anh Tuấn · April 11, 2026 · 11 min read

SimpleVLA-RL vs LeRobot: Two Paths to Teaching Robots Manipulation

If you have been following robot manipulation research in 2026, two names keep coming up: SimpleVLA-RL — a framework that applies Reinforcement Learning directly to VLA models in simulation, and LeRobot — HuggingFace's open-source ecosystem supporting both Imitation Learning and RL on real robots. Both frameworks tackle the same problem — teaching robots to manipulate objects — but with fundamentally different philosophies.

In this post, we will analyze the strengths and weaknesses of each framework across seven criteria, helping you choose the right tool for your project. Spoiler: they are not competing — they are complementary.

1. RL Philosophy: Simulation vs Real Robot

This is the most fundamental difference between the two frameworks.

SimpleVLA-RL: Pure RL in Simulation

SimpleVLA-RL uses GRPO (Group Relative Policy Optimization) — a PPO variant designed for language models. The reward function is remarkably simple: binary 0/1 (success or failure). No reward shaping, no reward classifier — just a simulator returning whether the task was completed.

The entire RL process happens in simulation (LIBERO, RoboTwin). The robot tries thousands of episodes in virtual environments with zero risk of hardware damage and no supervision required. Once the policy converges, it transfers to real robots via sim-to-real transfer.

GRPO features an interesting design: asymmetric clipping, where the policy-update ratio is clipped to [1 - 0.2, 1 + 0.28] = [0.8, 1.28] rather than PPO's symmetric range. The looser upper bound encourages the model to explore novel behaviors rather than sticking to safe strategies — which is precisely how SimpleVLA-RL discovered the pushcut phenomenon (pushing to cut instead of using scissors conventionally).
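As a rough sketch (not the actual SimpleVLA-RL implementation), the group-relative advantage and the asymmetrically clipped surrogate can be written as follows, assuming eps_low = 0.2 and eps_high = 0.28 so the ratio clip range is [0.8, 1.28]; the example rewards and ratios are invented:

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages: normalize the binary 0/1 rewards
    of a group of rollouts for the same task prompt."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def clipped_objective(ratio, adv, eps_low=0.2, eps_high=0.28):
    """PPO-style surrogate with asymmetric clipping: the upper bound
    1 + eps_high = 1.28 is looser than the lower bound 1 - eps_low = 0.8,
    so updates that raise the probability of good actions are clipped
    less aggressively, encouraging exploration."""
    clipped = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
    return np.minimum(ratio * adv, clipped * adv).mean()

# A group of 4 rollouts: two succeed (reward 1), two fail (reward 0).
adv = grpo_advantages([1, 1, 0, 0])
# Hypothetical new/old policy probability ratios for those rollouts.
objective = clipped_objective(np.array([1.5, 1.1, 0.6, 0.9]), adv)
```

Note that the first ratio (1.5) is clipped down to 1.28, while a ratio of 1.1 would have been clipped under symmetric PPO bounds tighter than 0.28 only if eps were smaller — the asymmetry only relaxes the upside.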

LeRobot HIL-SERL: RL Directly on Real Hardware

LeRobot takes the opposite approach with HIL-SERL (Human-in-the-Loop Sample Efficient RL). The underlying algorithm is SAC (Soft Actor-Critic) — better suited for continuous control on real hardware because it is significantly more sample efficient than PPO/GRPO.

Instead of binary rewards from a simulator, LeRobot trains a reward classifier — a CNN/ResNet network that predicts success probability from camera images. This classifier is trained from approximately 15-20 demonstrations before RL begins.
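A minimal sketch of the idea: in HIL-SERL the classifier is a CNN/ResNet over camera frames, but the same training recipe can be shown with logistic regression on small synthetic feature vectors. The feature dimensions, noise level, and learning rate here are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for camera frames: HIL-SERL trains a CNN/ResNet on real
# images; here each "frame" is an 8-dim feature vector whose mean
# shifts with task success (purely synthetic data).
def make_frames(n, success):
    base = 1.0 if success else -1.0
    return base + 0.3 * rng.standard_normal((n, 8))

# ~15 success and ~15 failure examples, roughly the demo budget
# described for HIL-SERL.
X = np.vstack([make_frames(15, True), make_frames(15, False)])
y = np.array([1.0] * 15 + [0.0] * 15)

# Logistic regression trained by plain gradient descent on the
# binary cross-entropy loss.
w, b = np.zeros(8), 0.0
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(success)
    grad = p - y                             # dBCE/dlogit
    w -= 0.1 * (X.T @ grad) / len(y)
    b -= 0.1 * grad.mean()

# The trained classifier scores a fresh frame with a success
# probability, which serves as the reward signal during RL.
logit = (make_frames(1, True) @ w + b)[0]
p_success = 1.0 / (1.0 + np.exp(-logit))
```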

The most distinctive aspect: humans intervene directly during training via gamepad or keyboard. When the robot is about to collide or go off-track, the operator presses a button to "correct" the behavior. This is both safer and helps the robot learn faster — but requires constant human presence.

Comparing two RL approaches for robot manipulation

2. Scale and Cost

This is where the two frameworks differ most dramatically in practical terms.

| Criterion | SimpleVLA-RL | LeRobot HIL-SERL |
| --- | --- | --- |
| VLA model | OpenVLA-OFT (7B params) | SmolVLA (450M), ACT, Pi0-FAST |
| GPU required | 8x A800 80GB (~$100K+) | 1 consumer-grade GPU |
| Robot required | None (train in sim) | SO100, Koch, etc. ($200-500) |
| Training time | Several hours on GPU cluster | Several hours on 1 GPU + robot |
| Estimated cost | $500-2000/run (cloud GPU) | $500-1000 (robot + GPU purchase) |

SimpleVLA-RL demands massive compute — 8 A800 80GB GPUs to train a 7B parameter model with online RL. This level of investment is only feasible for research labs or large companies.

LeRobot takes the opposite approach — its philosophy is to democratize robotics. SmolVLA has only 450M parameters and runs on a single GPU. The SO100 robot arm costs around $200-300. Total setup cost under $1000, making it accessible to students, hobbyists, and small startups.

However, there is an important nuance: once SimpleVLA-RL finishes training, the policy can be deployed to many robots without additional GPU cost. LeRobot HIL-SERL must train separately on each robot (since each has different kinematics and camera configurations).

3. VLA Model Ecosystem

SimpleVLA-RL: Deep Focus on One Model

SimpleVLA-RL centers on OpenVLA-OFT — an architecture based on LLaMA2-7B combined with vision encoders. It is a powerful model, but the framework only supports this single architecture. If you want to experiment with other policies (ACT, Diffusion Policy), you need to implement them yourself.

LeRobot: The VLA Supermarket

LeRobot v0.5.1 (April 2026) supports an impressive roster of policies:

Imitation Learning: ACT, Diffusion Policy

VLA Models: SmolVLA (450M), Pi0, Pi0-FAST

Reinforcement Learning: SAC with HIL-SERL (human-in-the-loop)

This diversity enables rapid experimentation — train an ACT baseline in 30 minutes, compare it with SmolVLA, then fine-tune Pi0-FAST if you need higher performance. This is the major advantage of an open-source ecosystem with 23K+ GitHub stars and 236 contributors.

4. Sim-to-Real vs Train-on-Real

SimpleVLA-RL: Train in Sim, Deploy on Real

The biggest advantage: no real robot needed during training. The robot attempts thousands of episodes in LIBERO or RoboTwin — failures cost nothing. But the tradeoff is facing the sim-to-real gap — differences between simulation and the real world (physics, lighting, friction, object geometry).

SimpleVLA-RL's sim-to-real results are encouraging: from 17.5% to 38.5% on the Piper dual-arm robot with zero demonstrations on the real robot. However, 38.5% is still far from production-ready. The sim-to-real gap remains the biggest challenge.

LeRobot HIL-SERL: Train Directly on Real

LeRobot bypasses the sim-to-real gap entirely by training directly on real hardware. Just ~15 demonstrations plus a few hours of RL on a SO100 or Koch arm achieves near-perfect performance.

The downside: it is slower (must wait for the robot to execute each action), requires continuous supervision, and the robot can be damaged if exploration is too aggressive. Actions are constrained to end-effector space (not joint space) for safety.

Robot arm performing manipulation in a real environment

5. Data Efficiency: 1 Demo vs 15 Demos

This is the most surprising result from SimpleVLA-RL.

SimpleVLA-RL: Cold-Start with 1 Demo

In the cold-start experiment, SimpleVLA-RL needs only 1 demonstration for SFT (Supervised Fine-Tuning), then uses RL in simulation to improve. Result: 91.7% success rate on the LIBERO benchmark. From 1 demo to >90% — this is an unprecedented level of data efficiency.

The secret: the 7B VLA model already possesses foundational knowledge (language understanding, visual grounding) from pre-training. RL only needs to "unlock" the manipulation capability, not learn from scratch.
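A toy illustration of the two-stage recipe, not SimpleVLA-RL itself: a bandit-style task with a binary reward, where a single demonstration warm-starts the policy and a group-relative REINFORCE update then improves it. The 4-action task, success probabilities, and learning rate are all invented for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
N_ACTIONS = 4

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def env_step(action):
    """Toy binary task: action 2 succeeds 90% of the time,
    everything else only 10% (invented numbers)."""
    p = 0.9 if action == 2 else 0.1
    return float(rng.random() < p)

# Stage 1: SFT cold-start from a single demonstration of action 2,
# nudging the policy toward the demonstrated behavior.
logits = np.zeros(N_ACTIONS)
logits[2] += 1.0

# Stage 2: RL with binary rewards and group-relative advantages.
for _ in range(300):
    probs = softmax(logits)
    group = rng.choice(N_ACTIONS, size=8, p=probs)  # group of rollouts
    rewards = np.array([env_step(a) for a in group])
    adv = rewards - rewards.mean()                  # group baseline
    for a, a_adv in zip(group, adv):                # REINFORCE ascent
        grad = -probs.copy()
        grad[a] += 1.0                              # d log pi(a) / d logits
        logits += 0.1 * a_adv * grad

best_action = int(np.argmax(logits))
```

The point of the toy is the structure, not the numbers: one demo biases the policy toward a reasonable region, and sparse 0/1 rewards are enough for RL to lock in the successful behavior.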

LeRobot HIL-SERL: ~15 Demos + Few Hours of RL

LeRobot requires more data — approximately 15-20 demonstrations to train the reward classifier and warm-start the policy. Then, a few hours of RL on the real robot (with human interventions) achieves near-perfect performance.

While more data-hungry, data collection is straightforward: teleoperate the robot by hand, record the trajectory, upload to HuggingFace Hub. The LeRobotDataset format (Parquet + MP4) makes sharing data across the community effortless.

| Metric | SimpleVLA-RL | LeRobot HIL-SERL |
| --- | --- | --- |
| Demos required | 1 (cold-start) | ~15-20 |
| RL training time | Several hours (sim) | Several hours (real) |
| Success rate | 91.7-99.1% (sim) | Near-perfect (real) |
| Supervision needed? | No | Yes (human-in-loop) |

6. Exploration: Freedom vs Safety

SimpleVLA-RL: Free Exploration in a Sandbox

One of the most fascinating discoveries from SimpleVLA-RL is the pushcut phenomenon. When training a vegetable cutting task with scissors, instead of learning conventional scissor usage, the robot discovered that pushing the blade downward (using it like a knife) was more effective for certain vegetables.

This emerged thanks to temperature sampling at τ=1.6 — a high value that encourages the model to try novel behaviors. In simulation, bold experimentation has zero consequences: worst case, the task fails, the environment resets, and the robot tries again.
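To make the effect concrete, here is a small self-contained sketch of softmax sampling with temperature. The logits and the low-temperature comparison value are illustrative; only τ=1.6 comes from SimpleVLA-RL:

```python
import math

def sample_probs(logits, tau):
    """Softmax with temperature: tau > 1 flattens the distribution,
    so low-probability (novel) actions get sampled more often."""
    scaled = [z / tau for z in logits]
    m = max(scaled)                        # subtract max for stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [3.0, 1.0, 0.0]             # illustrative action logits
greedy = sample_probs(logits, 0.1)   # low temperature: near one-hot
explore = sample_probs(logits, 1.6)  # SimpleVLA-RL's training value
```

At τ=1.6 the least-likely action keeps a meaningful chance of being tried, which is exactly the regime where behaviors like pushcut can be discovered.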

LeRobot HIL-SERL: Controlled Exploration

LeRobot takes a more cautious approach. Human interventions serve as safety guardrails — when the robot begins dangerous exploration (collisions, dropping objects), the operator intervenes immediately. This is safer but also limits the potential for discovering novel strategies.

SAC includes entropy regularization to encourage exploration, but at a moderate level — nowhere near as bold as SimpleVLA-RL's GRPO with asymmetric clipping and high-temperature sampling.
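For intuition, SAC's entropy bonus can be sketched as follows. The 1-D Gaussian policy head, α=0.2, and the example rewards are illustrative assumptions, not values taken from LeRobot:

```python
import math

def gaussian_entropy(sigma):
    """Differential entropy of a 1-D Gaussian policy head."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

def soft_objective(reward, sigma, alpha=0.2):
    """SAC's entropy-regularized objective: the environment reward
    plus a bonus for keeping the policy stochastic. alpha trades off
    reward against exploration."""
    return reward + alpha * gaussian_entropy(sigma)

# Same reward, but the wider (more exploratory) policy scores higher.
wide = soft_objective(1.0, sigma=0.5)
narrow = soft_objective(1.0, sigma=0.05)
```

The bonus rewards stochasticity rather than any particular novel behavior, which is why SAC's exploration stays moderate compared with GRPO's.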

7. Community and Ecosystem

SimpleVLA-RL: Academic Paper, Small Team

SimpleVLA-RL is a research product from ICLR 2026, developed by a small team. The code is public but the ecosystem is nascent. Documentation consists mainly of the paper and a few reproduction scripts. If you encounter bugs or want to extend functionality, you are largely on your own.

The underlying framework is veRL (Volcano Engine RL) — an RL library for LLMs from ByteDance, relatively new with a small community.

LeRobot: A Massive Ecosystem

LeRobot has the backing of HuggingFace — the company behind Transformers, Datasets, and Diffusers. The numbers speak for themselves: 23K+ GitHub stars, 236 contributors, and deep integration with the HuggingFace Hub for sharing datasets and pre-trained checkpoints.

This ecosystem means you never train alone. Pre-trained checkpoints, ready-made datasets, tutorials, and people to help debug are all available. This is an advantage that cannot be underestimated — especially for newcomers.

LeRobot v0.5 (April 2026) also introduces many new features: Real-Time Chunking, Pi0-FAST support, 10x faster training, PEFT/LoRA fine-tuning, and EnvHub for simulation.

8. When to Use Which?

After analyzing the 7 criteria above, here are practical recommendations:

Choose SimpleVLA-RL when: you have access to large-scale GPU compute, no robot hardware is required or available, and your goal is research on RL for VLA models, large-scale simulation training, or sim-to-real transfer.

Choose LeRobot when: you have a real robot (or can buy a $200-500 arm such as the SO100), a single GPU, and want the fastest path to a policy deployed on actual hardware, with human supervision during training.

The convergence of AI and robotics ecosystems

The Future: Combining Both?

The most interesting question is not "which one to choose" but "how to combine them." Imagine the following pipeline:

  1. Pre-train VLA with SFT on large-scale data (OpenVLA, Pi0)
  2. RL in simulation (SimpleVLA-RL style) — discover novel strategies, achieve high data efficiency, train hundreds of tasks in parallel
  3. Fine-tune on real robot (LeRobot HIL-SERL style) — bridge the sim-to-real gap, apply human corrections for edge cases
  4. Deploy with high confidence — having passed both sim training and real-world validation

This pipeline captures the best of both worlds: SimpleVLA-RL's free exploration in simulation, and LeRobot HIL-SERL's precision on real hardware. The sim-to-real gap — SimpleVLA-RL's biggest challenge — gets solved by the real-robot fine-tuning stage.

Several promising research directions along these lines are worth watching.

Summary Table

| Criterion | SimpleVLA-RL | LeRobot (HIL-SERL) |
| --- | --- | --- |
| RL algorithm | GRPO (no KL, asymmetric clip) | SAC + human interventions |
| Training env | Simulation (LIBERO, RoboTwin) | Real robot (SO100, Koch) |
| Reward | Binary 0/1 from simulator | Learned reward classifier |
| VLA model | OpenVLA-OFT (7B) | SmolVLA (450M), Pi0-FAST, ACT, etc. |
| Hardware | 8x A800 80GB | 1 GPU + robot arm ($200-500) |
| Data efficiency | 1 demo to 91.7% | ~15 demos + a few hours to near-perfect |
| Exploration | Unconstrained (τ=1.6, pushcut) | Controlled (human corrections) |
| Sim-to-real | 17.5% to 38.5% (zero-shot) | Not needed (train on real) |
| Community | Academic paper, small team | 23K+ stars, 236 contributors |
| Best for | Research, scaling, sim-to-real | Production, accessibility, real deployment |

Conclusion

SimpleVLA-RL and LeRobot are not competitors — they are two pieces of the robot learning puzzle. SimpleVLA-RL unlocks the ability to train VLA models with RL in simulation with remarkable data efficiency. LeRobot provides a complete ecosystem to turn research into real products deployed on actual robots.

If you are a researcher pushing the boundaries of VLA models, SimpleVLA-RL is the ideal playground. If you are an engineer looking to deploy robot manipulation in the real world, LeRobot is the fastest path. And if you are ambitious — combine both.

The future of robot learning is not sim or real. It is sim then real — and both frameworks play essential roles in that pipeline.

