Introduction: From Research to Product
Throughout the VLA & LeRobot Mastery series, we have covered the entire journey: understanding the framework, collecting data, training models like SmolVLA and Pi0-FAST, and running real-robot RL with HIL-SERL. So far, though, everything has stayed at the "running on a dev machine, checking results in the terminal" stage.
This final post will close the series with a production-ready workflow — techniques for taking VLAs from lab to reality:
- PEFT/LoRA fine-tuning — training large models on consumer GPUs
- Real-Time Chunking — smooth deployment on real robots
- Async inference, streaming encoding — production performance optimization
- Plugin system & EnvHub — extending LeRobot for custom projects
Part 1: PEFT/LoRA — Train Smarter, Not Harder
The Problem: VLAs Are Too Large to Fine-tune
Modern VLA models are substantial in size:
| Model | Parameters | Full Fine-tune VRAM | Notes |
|---|---|---|---|
| Pi0 | ~3B | ~24GB | Requires RTX 4090 or A100 |
| Pi0-FAST | ~3B | ~24GB | Similar to Pi0 |
| SmolVLA | ~500M | ~8GB | Smaller but still significant |
Full fine-tuning — updating all parameters — requires expensive GPUs and long training times. For most practical applications, you only need to adapt the model to a specific task, not rewrite everything it has already learned.
LoRA: A Brilliantly Simple Idea
LoRA (Low-Rank Adaptation) solves this problem with an elegant idea: instead of updating the large weight matrix W (size d x d), we add two small matrices A (d x r) and B (r x d), where r << d.
```
Output = W * x + (A * B) * x
         ^       ^
       Frozen   Trainable
                (very small)
```
With rank r = 8 and d = 4096, the trainable parameters per layer are:
- Full: 4096 x 4096 ≈ 16.7M
- LoRA: (4096 x 8) + (8 x 4096) = 65,536 ≈ 65.5K
- A ~256x reduction in trainable parameters!
LoRA is applied to attention layers (Q, K, V, O projections) — where most of the model's "knowledge" resides. Everything else is completely frozen.
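To make the idea concrete, here is a minimal NumPy sketch of a LoRA-augmented linear layer. This is purely illustrative — in practice LeRobot delegates adapter injection to the HuggingFace PEFT library — but it shows the frozen/trainable split and the alpha/rank scaling that the config parameters below control.

```python
import numpy as np

class LoRALinear:
    """Linear layer with a frozen weight W plus a trainable low-rank update A @ B.

    Illustrative sketch only -- LeRobot uses the HuggingFace PEFT library
    internally, not this class.
    """

    def __init__(self, d: int, rank: int, alpha: float, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d, d))            # frozen pretrained weight
        self.A = np.zeros((d, rank))                    # trainable; zero-init so the
                                                        # update starts at zero
        self.B = rng.standard_normal((rank, d)) * 0.01  # trainable
        self.scale = alpha / rank                       # lora_alpha / lora_rank

    def forward(self, x: np.ndarray) -> np.ndarray:
        # Frozen path plus the scaled low-rank path
        return self.W @ x + self.scale * (self.A @ (self.B @ x))

    def trainable_params(self) -> int:
        return self.A.size + self.B.size

layer = LoRALinear(d=4096, rank=8, alpha=16)
print(layer.trainable_params())  # -> 65536 trainable vs ~16.7M frozen
```

Because A is zero-initialized, the layer reproduces the pretrained output exactly at the start of training; only the low-rank path moves during fine-tuning.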
Enabling PEFT in LeRobot v0.5
LeRobot v0.5 integrates PEFT natively — just add a flag:
```bash
lerobot-train \
  --policy.type=pi0 \
  --policy.peft_config.use_peft=true \
  --dataset.repo_id=your-username/pickup-dataset \
  --policy.device=cuda
```
That is all. LeRobot automatically:
- Loads pretrained weights for Pi0
- Freezes the entire model
- Adds LoRA adapters to attention layers
- Trains only the LoRA parameters
Customizing the LoRA Config
For more fine-grained control:
```bash
lerobot-train \
  --policy.type=pi0 \
  --policy.peft_config.use_peft=true \
  --policy.peft_config.lora_rank=16 \
  --policy.peft_config.lora_alpha=32 \
  --policy.peft_config.target_modules="q_proj,v_proj,k_proj,o_proj" \
  --policy.peft_config.lora_dropout=0.05 \
  --dataset.repo_id=your-username/pickup-dataset \
  --training.batch_size=16 \
  --training.num_epochs=100 \
  --policy.device=cuda
```
Parameter explanations:
| Parameter | Default | Meaning |
|---|---|---|
| lora_rank | 8 | Rank of the LoRA matrices. Higher = more expressive but uses more VRAM |
| lora_alpha | 16 | Scaling factor. Typically set to 2x the rank |
| target_modules | q,v projections | Which layers get LoRA. Adding k,o improves results but costs more |
| lora_dropout | 0.0 | Regularization. 0.05 helps prevent overfitting on small datasets |
Comparison: PEFT vs Full Fine-tuning
Here are benchmark results on a pick-and-place task with Pi0:
| Metric | Full Fine-tune | LoRA (r=8) | LoRA (r=16) |
|---|---|---|---|
| VRAM usage | 24GB | 6GB | 8GB |
| Trainable params | 3B (100%) | 15M (0.5%) | 30M (1%) |
| Time per epoch | 45 min | 12 min | 15 min |
| Success rate | 92% | 89% | 91% |
| Minimum GPU | RTX 4090 | RTX 3060 | RTX 3070 |
The key takeaway: LoRA r=16 achieves a 91% success rate — only 1% below full fine-tuning while requiring 3x less VRAM and being 3x faster. This is the sweet spot for most applications.
PEFT for SmolVLA
SmolVLA is already small (~500M params), but PEFT is still useful when:
- Your GPU has only 4GB VRAM (Jetson Nano/Orin)
- You need fast training across multiple different tasks
- You want to keep the base model and swap LoRA adapters
```bash
lerobot-train \
  --policy.type=smolvla \
  --policy.peft_config.use_peft=true \
  --policy.peft_config.lora_rank=8 \
  --dataset.repo_id=your-username/dataset \
  --policy.device=cuda
```
SmolVLA + LoRA requires only ~3GB VRAM — it runs on a Jetson Orin Nano!
Part 2: Deploying to Real Robots — Real-Time Chunking
The Problem with Naive Deployment
When deploying a policy to a real robot in the simplest way:
```python
while True:
    obs = robot.get_observation()
    action = policy.predict(obs)  # Inference takes 100-200ms
    robot.execute(action)
```
You encounter two problems:
- Latency: Each prediction takes 100-200ms, making the robot react slowly
- Jerky motion: Each action chunk (sequence of actions) starts "from scratch", causing rough transitions
Real-Time Chunking Solves Both
Real-Time Chunking (RTC) is a technique that continuously blends old predictions that are being executed with new predictions that were just computed.
Instead of:
```
Predict -> Execute all -> Predict -> Execute all -> ...
```
RTC does:
```
Predict chunk 1 -> Start executing
  -> Predict chunk 2 (while executing chunk 1)
  -> Blend chunk 1's remaining actions with chunk 2's start
  -> Continue executing blended actions
  -> Predict chunk 3 ...
```
Result: the robot moves continuously and smoothly, with no "jerks" between chunks.
Enabling RTC in LeRobot
```bash
lerobot-record \
  --policy.path=your-username/pi0-pickup-lora \
  --policy.rtc_config.enabled=true \
  --robot.type=so100_follower \
  --robot.port=/dev/ttyACM0 \
  --cameras.top.port=/dev/video0
```
Just --policy.rtc_config.enabled=true — LeRobot handles the rest.
RTC is compatible with:
- Pi0
- Pi0-FAST
- SmolVLA
- Diffusion Policy
- Any policy that outputs action chunks
How RTC Works Under the Hood
```
Time ->     t0    t1    t2    t3    t4    t5    t6
Chunk 1:   [a1    a2    a3    a4    a5]
Chunk 2:         [b1    b2    b3    b4    b5]
Chunk 3:               [c1    c2    c3    c4    c5]

Executed:   a1  blend  blend  blend  blend  ...
               (a2,b1)(a3,b2,c1)
```
At each timestep, RTC takes a weighted average of all available predictions for that timestep. Newer predictions receive higher weights (because they are based on the most recent observation).
Blending formula:
```
action(t) = sum_i( w_i * chunk_i(t) ) / sum_i( w_i )
```
Where w_i decreases for older chunks. This is essentially an exponential moving average over the action space.
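A minimal sketch of this blending rule, with exponentially decaying weights for older chunks (illustrative only — LeRobot's actual RTC weighting schedule may differ):

```python
import numpy as np

def blend_chunks(predictions, decay: float = 0.5):
    """Weighted average of overlapping action predictions for one timestep.

    `predictions` is ordered oldest-first; newer chunks get exponentially
    larger weights (weight = decay**age, where age 0 is the newest chunk).
    Illustrative sketch -- not LeRobot's exact implementation.
    """
    n = len(predictions)
    weights = np.array([decay ** (n - 1 - i) for i in range(n)])
    actions = np.stack(predictions)
    return (weights[:, None] * actions).sum(axis=0) / weights.sum()

# Two overlapping predictions for the same timestep: the newer one dominates.
old, new = np.array([0.0]), np.array([1.0])
print(blend_chunks([old, new], decay=0.5))  # -> [0.6667], 2/3 weight on new
```

With decay = 0.5 the newest chunk carries twice the weight of the previous one, which is what smooths the hand-off between chunks.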
Part 3: Production Performance Optimization
Async Inference for SmolVLA
SmolVLA supports asynchronous inference — separating image processing (vision encoder) and generation (action decoder) into a parallel pipeline.
In synchronous mode:
```
[Vision encode: 50ms] -> [Action decode: 100ms] -> Total: 150ms
```
In async mode:
```
Frame 1: [Vision encode: 50ms] -> [Action decode: 100ms]
Frame 2:                          [Vision encode: 50ms] -> [Action decode: 100ms]
                                  ^ runs in parallel with frame 1 decode
```
Result: 2x throughput, ~30% latency reduction.
Enable it:
```bash
lerobot-record \
  --policy.path=your-username/smolvla-model \
  --policy.async_inference=true \
  --policy.rtc_config.enabled=true \
  --robot.type=so100_follower
```
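The overlap behind async inference can be sketched as a generic two-stage thread pipeline — encode frame N+1 while decoding frame N. This is an illustration of the pattern, not LeRobot's actual code; `encode` and `decode` are stand-ins for the vision encoder and action decoder.

```python
import queue
import threading

def run_pipeline(frames, encode, decode):
    """Two-stage pipeline: encode the next frame while decoding the current one.

    Generic illustration of async inference -- not LeRobot's implementation.
    """
    encoded = queue.Queue(maxsize=1)
    results = []

    def encoder():
        for frame in frames:
            encoded.put(encode(frame))  # stage 1: vision encoder
        encoded.put(None)               # sentinel: no more frames

    t = threading.Thread(target=encoder)
    t.start()
    while (features := encoded.get()) is not None:
        results.append(decode(features))  # stage 2: action decoder
    t.join()
    return results

actions = run_pipeline(
    frames=[1, 2, 3],
    encode=lambda f: f * 10,  # stand-in for the vision encoder
    decode=lambda z: z + 1,   # stand-in for the action decoder
)
print(actions)  # -> [11, 21, 31]
```

Because the queue has capacity 1, the encoder stays exactly one frame ahead of the decoder, which is where the latency overlap comes from.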
Streaming Video Encoding
LeRobot v0.5 adds streaming video encoding — encoding video continuously during data collection instead of waiting until the end of each episode.
Before v0.5:
```
Record episode (30s) -> Wait for encoding (10-15s) -> Next episode
```
With streaming encoding:
```
Record episode (30s) -> Immediately start next episode
                        ^ encoding runs in the background, zero wait
```
This is especially important for HIL-SERL — where the robot needs continuous, uninterrupted data collection.
Streaming encoding is enabled by default in v0.5. No additional configuration needed.
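The pattern behind streaming encoding is a background worker draining a frame queue while the recording loop keeps going. A generic sketch of that pattern (not LeRobot's actual encoder, and the string "encoding" here is a stand-in for real video encoding):

```python
import queue
import threading

class StreamingEncoder:
    """Encode frames in a background thread while recording continues.

    Generic sketch of the streaming-encoding pattern -- not LeRobot's code.
    """

    def __init__(self):
        self._frames = queue.Queue()
        self._encoded = []
        self._worker = threading.Thread(target=self._encode_loop, daemon=True)
        self._worker.start()

    def _encode_loop(self):
        # Drain the queue until the sentinel arrives.
        while (frame := self._frames.get()) is not None:
            self._encoded.append(f"encoded<{frame}>")  # stand-in for video encoding

    def submit(self, frame):
        # Called from the recording loop; returns immediately, never blocks.
        self._frames.put(frame)

    def finish(self):
        self._frames.put(None)  # sentinel stops the worker
        self._worker.join()
        return self._encoded

enc = StreamingEncoder()
for frame in ["f0", "f1", "f2"]:
    enc.submit(frame)  # the recording loop never waits on encoding
print(enc.finish())    # -> ['encoded<f0>', 'encoded<f1>', 'encoded<f2>']
```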
Part 4: Complete Production Workflow
The 5-Step Workflow
Here is a production-ready process you can apply to any manipulation task:
Step 1: Collect Data
```bash
# Teleop with leader arm or gamepad
lerobot-record \
  --robot.type=so100_follower \
  --teleop.type=so100_leader \
  --dataset.repo_id=your-username/task-v1 \
  --fps=30 \
  --num_episodes=50
```
Step 2: Fine-tune with LoRA
```bash
# LoRA fine-tune Pi0 — only needs RTX 3060
lerobot-train \
  --policy.type=pi0 \
  --policy.peft_config.use_peft=true \
  --policy.peft_config.lora_rank=16 \
  --dataset.repo_id=your-username/task-v1 \
  --training.batch_size=8 \
  --training.num_epochs=100 \
  --training.save_freq=10 \
  --wandb.enable=true \
  --wandb.project=lerobot-production \
  --output_dir=./checkpoints/task-v1 \
  --policy.device=cuda
```
Step 3: Evaluate Offline
```bash
# Run evaluation on test episodes
lerobot-eval \
  --policy.path=./checkpoints/task-v1/best \
  --dataset.repo_id=your-username/task-v1-test \
  --output_dir=./eval_results
```
Check metrics:
- Success rate > 80%: proceed with deployment
- Success rate 60-80%: collect more data or increase LoRA rank
- Success rate < 60%: check data quality, review config
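These thresholds can be captured in a small triage helper — the cutoffs mirror the guidelines above, and the function itself is purely illustrative:

```python
def triage(success_rate: float) -> str:
    """Map an offline evaluation success rate to the next workflow step.

    Thresholds follow the rule of thumb above; tune them for your task.
    """
    if success_rate > 0.80:
        return "deploy"
    if success_rate >= 0.60:
        return "collect more data or increase LoRA rank"
    return "check data quality, review config"

print(triage(0.91))  # -> deploy
```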
Step 4: Deploy with RTC
```bash
# Deploy to real robot
lerobot-record \
  --policy.path=./checkpoints/task-v1/best \
  --policy.rtc_config.enabled=true \
  --robot.type=so100_follower \
  --robot.port=/dev/ttyACM0 \
  --cameras.top.port=/dev/video0
```
Step 5: Iterate
If the policy is not good enough on the real robot:
Option A: Collect more data on difficult scenarios and continue fine-tuning
```bash
# Collect 20 more episodes on difficult cases
lerobot-record \
  --dataset.repo_id=your-username/task-v1-hard-cases \
  --num_episodes=20

# Continue fine-tuning from previous checkpoint
lerobot-train \
  --policy.path=./checkpoints/task-v1/best \
  --policy.peft_config.use_peft=true \
  --dataset.repo_id=your-username/task-v1-hard-cases \
  --training.num_epochs=50
```
Option B: Use HIL-SERL for RL-based improvement
```bash
# Switch to RL fine-tuning
python -m lerobot.rl.learner --config_path rl_config.json
python -m lerobot.rl.actor --config_path rl_config.json
```
Versioning Models on HuggingFace Hub
LeRobot integrates deeply with HuggingFace Hub. After training, push your model:
```bash
# Push model to HuggingFace Hub
huggingface-cli upload your-username/pi0-pickup-v1 ./checkpoints/task-v1/best

# Anyone can use it immediately
lerobot-record \
  --policy.path=your-username/pi0-pickup-v1 \
  --policy.rtc_config.enabled=true
```
Each version is a repository on the Hub — you can track history, compare versions, and roll back when needed.
Monitoring with Weights & Biases
Track training in real-time:
```bash
lerobot-train \
  --wandb.enable=true \
  --wandb.project=lerobot-production \
  --wandb.name=pi0-pickup-lora-r16-v2
```
W&B will log:
- Loss curves (actor loss, critic loss if using SAC)
- Learning rate schedule
- Gradient norms
- Evaluation metrics (success rate, average reward)
- GPU utilization and memory
Part 5: Extending LeRobot — Plugins & EnvHub
3rd-Party Policy Plugins
LeRobot v0.5 introduces a plugin system — allowing you to register custom policies as pip packages.
For example, you develop a new policy called MyCustomPolicy. Instead of forking LeRobot, you create a separate pip package:
```python
# my_policy_package/policy.py
from lerobot.common.policies.base import BasePolicy

class MyCustomPolicy(BasePolicy):
    def __init__(self, config):
        super().__init__(config)
        # Custom architecture here

    def forward(self, batch):
        # Custom forward pass
        pass

    def predict_action(self, observation):
        # Custom inference
        pass
```

```python
# my_policy_package/setup.py (or pyproject.toml)
# Register entry point
entry_points={
    "lerobot.policies": [
        "my_custom=my_policy_package.policy:MyCustomPolicy"
    ]
}
```
After installing the package:
```bash
pip install my-policy-package

# Use it directly in LeRobot
lerobot-train --policy.type=my_custom --dataset.repo_id=...
```
The plugin system is extremely powerful because:
- No forking LeRobot — upstream updates do not cause conflicts
- Easy sharing — publish to PyPI, anyone can install
- Separation of concerns — policy code is separate, training infrastructure uses LeRobot
EnvHub: Load Simulation Environments from HuggingFace
EnvHub allows you to load gym environments directly from HuggingFace Hub — no separate installation needed:
```bash
# Load environment from Hub
lerobot-train \
  --env.type=hub \
  --env.repo_id=lerobot/simxarm \
  --policy.type=diffusion
```
LeRobot will:
- Download the environment package from the Hub
- Automatically install dependencies
- Initialize the environment
- Start training
This is an important step toward democratizing robot learning — anyone can create and share simulation environments, just like sharing datasets on HuggingFace.
Environments available on the Hub:
- lerobot/simxarm — XArm manipulation tasks
- lerobot/aloha-sim — ALOHA bimanual manipulation
- lerobot/pusht — Push-T benchmark
- Community environments are growing rapidly
Series Recap: From Zero to Production
Looking back at the entire VLA & LeRobot Mastery series, we have covered a complete journey:
| Phase | Post | Key Content |
|---|---|---|
| Foundation | Post 1: Framework | LeRobot architecture, dataset format, policy zoo |
| New features | Post 11: v0.5 Overview | SmolVLA, HIL-SERL, PEFT, RTC |
| Training | Post 12: SmolVLA | Training a compact VLA |
| Real-world RL | Post 14: HIL-SERL | RL on real robots with human interventions |
| Production | Post 15 (this post) | LoRA, deployment, optimization, plugins |
Roadmap Ahead
LeRobot is evolving rapidly. Here is what to look forward to:
- Multi-task training — one policy for multiple tasks, switching via language commands
- Humanoid support — expanding from single arm to whole-body control
- Better sim-to-real — stronger transfer learning from simulation to real robots
- Larger pretrained models — foundation models for robotics, similar to GPT for NLP
- Edge deployment — running on Jetson, Raspberry Pi, FPGA
Final Advice
If you are just starting with LeRobot, here is the path I recommend:
- Week 1: Read post 1, set up LeRobot, run basic tutorials
- Week 2: Collect your first dataset with SO-100, train SmolVLA
- Week 3: Fine-tune Pi0 with LoRA, deploy with RTC
- Week 4: Try HIL-SERL if the policy is not good enough
- Week 5+: Optimize, add tasks, share on the Hub
Robot learning is in its "ChatGPT moment" — powerful tools are becoming accessible to everyone. LeRobot v0.5 is a major step forward in that journey.
Start building. Your robot is waiting.
Related Posts
- HIL-SERL: Real Robot RL — The previous step in the workflow: RL directly on real robots
- SmolVLA Training Guide — Training a compact VLA as a base for LoRA fine-tuning
- LeRobot Ecosystem Guide — A comprehensive overview of the LeRobot ecosystem