ICRA 2026 Vienna: Dominance of Vision-Language-Action Models
The IEEE International Conference on Robotics and Automation (ICRA 2026) takes place in Vienna (June 1-5), with more than 1,200 papers accepted. The trends are clear: Vision-Language-Action (VLA) models, mature sim-to-real pipelines, and rapid growth in 3D perception dominate the conference.
Trend 1: VLA Models Everywhere
Papers on RT-2, Octo, OpenVLA and successors account for 25% of accepted papers. Key finding: cross-embodiment transfer works. Models trained on multi-robot datasets achieve zero-shot generalization to unseen platforms.
Trend 2: Sim-to-Real Matures
Sim-to-real is no longer experimental. Papers show:
- Domain randomization → 80%+ success on unseen tasks
- Automated real-to-sim tuning → reduces manual calibration
- Contact-implicit models (MuJoCo) → better accuracy than impulse-based methods
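To make the domain randomization item above concrete, here is a minimal sketch of per-episode parameter sampling. The parameter names and ranges are hypothetical and chosen only for illustration; in practice they would come from system identification of the target robot, and the sampled values would be written into whichever simulator you use.

```python
import random
from dataclasses import dataclass


@dataclass
class PhysicsParams:
    friction: float       # contact friction coefficient
    mass_scale: float     # multiplier applied to nominal link masses
    motor_delay_s: float  # actuation latency in seconds


# Hypothetical randomization ranges (assumptions, not values from any paper).
RANGES = {
    "friction": (0.5, 1.5),
    "mass_scale": (0.8, 1.2),
    "motor_delay_s": (0.0, 0.03),
}


def sample_params(rng: random.Random) -> PhysicsParams:
    """Draw one set of simulator parameters uniformly from RANGES."""
    return PhysicsParams(**{k: rng.uniform(lo, hi) for k, (lo, hi) in RANGES.items()})


# One fresh parameter set per training episode.
rng = random.Random(0)
episodes = [sample_params(rng) for _ in range(1000)]
```

A policy trained across all of these episodes never sees the same physics twice, which is the mechanism that is meant to make it robust to the unknown parameters of the real robot.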
Trend 3: 3D Perception Explosion
LiDAR combined with transformer-based neural networks enables:
- Semantic segmentation in cluttered scenes
- 6D object pose estimation
- Real-time 3D reconstruction
Best Paper Contenders
pi-0.5: Open-World Generalization
pi-0.5: a Vision-Language-Action Model with Open-World Generalization — Physical Intelligence, 2025
This is the standout paper. pi-0.5 builds on pi-0 to enable long-horizon, dexterous manipulation in completely unseen homes. The secret lies in co-training on heterogeneous tasks: data from multiple robots, high-level semantic prediction, web data, and object detections are combined into hybrid multi-modal training examples.
Result: robots can clean kitchens and bedrooms in entirely new environments. This is the first time an end-to-end learning system has achieved this level of generalization.
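The co-training idea described above can be pictured as weighted sampling across data sources when building each training batch. The sketch below is only an illustration: the source names mirror the list in the text, but the weights are hypothetical, since the actual pi-0.5 mixture is not given here.

```python
import random
from collections import Counter

# Hypothetical mixing weights (assumptions for illustration only).
MIXTURE = {
    "multi_robot_demos": 0.4,
    "high_level_semantic_prediction": 0.2,
    "web_data": 0.3,
    "object_detection": 0.1,
}


def sample_batch_sources(rng: random.Random, batch_size: int) -> list[str]:
    """Pick a data source for each example in a batch, proportional to MIXTURE."""
    names = list(MIXTURE)
    weights = [MIXTURE[n] for n in names]
    return rng.choices(names, weights=weights, k=batch_size)


rng = random.Random(0)
batch = sample_batch_sources(rng, 256)
counts = Counter(batch)  # how many examples each source contributed
```

Every batch then mixes robot trajectories with non-robot data, so a single model is optimized on all of them together rather than in separate stages.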
Practical takeaway: open-world generalization is no longer a distant goal. If you are building service robots for the Vietnamese market (restaurants, hotels, warehouses), VLA models are mature enough for pilot projects now.
GR00T N1: Foundation Model for Manipulation
NVIDIA's GR00T N1 (Generalist Robot 00 Technology) is a dual-system foundation model:
- Fast reactive module for low-level control
- Slow reasoning planner for high-level decisions
It was demonstrated across multiple humanoid platforms with strong zero-shot transfer.
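The fast/slow split can be sketched as a two-rate loop in which the planner runs once for every several controller steps. This is a toy illustration, not GR00T N1's architecture or API: `slow_planner`, `fast_controller`, and the 10:1 ratio are hypothetical stand-ins.

```python
from dataclasses import dataclass

PLAN_EVERY = 10  # hypothetical ratio: one planner call per 10 control steps


@dataclass
class Plan:
    subgoal: str


def slow_planner(instruction: str, step: int) -> Plan:
    """Stand-in for the slow reasoning module; returns a subgoal label."""
    return Plan(subgoal=f"{instruction}:phase{step // PLAN_EVERY}")


def fast_controller(plan: Plan, observation: float) -> float:
    """Stand-in for the fast reactive module; a proportional action toward zero."""
    return -0.5 * observation


def run(instruction: str, steps: int) -> tuple[int, int]:
    """Run the two-rate loop and return (planner calls, controller calls)."""
    planner_calls = controller_calls = 0
    plan = Plan(subgoal="")
    obs = 1.0
    for t in range(steps):
        if t % PLAN_EVERY == 0:  # slow path: refresh the plan occasionally
            plan = slow_planner(instruction, t)
            planner_calls += 1
        action = fast_controller(plan, obs)  # fast path: runs every step
        controller_calls += 1
        obs += action  # toy dynamics
    return planner_calls, controller_calls
```

The point of the design is that the expensive reasoning step does not sit on the control loop's critical path; the controller keeps acting on the most recent plan between planner updates.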
MuJoCo MPC: Real-Time Whole-Body Control
MuJoCo MPC demonstrates real-time whole-body control at 100 Hz on humanoid robots with zero-shot sim-to-real transfer, a sign that contact-based model predictive control is maturing.
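For readers new to model predictive control, the receding-horizon idea can be shown on a toy 1D system. The sketch below is not MuJoCo MPC's solver: it swaps in a brute-force search over a handful of constant accelerations on a point mass, and the horizon, candidate set, and cost weights are all hypothetical. Only the 100 Hz control period is taken from the text.

```python
DT = 0.01       # 100 Hz control period
HORIZON = 20    # look-ahead steps (0.2 s); hypothetical
CANDIDATES = [-2.0, -1.0, 0.0, 1.0, 2.0]  # candidate accelerations; hypothetical


def step(x: float, v: float, u: float) -> tuple[float, float]:
    """Advance a 1D point mass by one control period under acceleration u."""
    v_next = v + u * DT
    return x + v_next * DT, v_next


def rollout_cost(x: float, v: float, u: float, target: float) -> float:
    """Cost of holding acceleration u for the whole horizon."""
    cost = 0.0
    for _ in range(HORIZON):
        x, v = step(x, v, u)
        cost += (x - target) ** 2 + 0.1 * v ** 2
    return cost


def mpc_action(x: float, v: float, target: float) -> float:
    """Pick the candidate with the lowest predicted cost."""
    return min(CANDIDATES, key=lambda u: rollout_cost(x, v, u, target))


def simulate(target: float = 1.0, steps: int = 300) -> float:
    """Re-plan at every step, apply only the first action, and return the final position."""
    x, v = 0.0, 0.0
    for _ in range(steps):
        u = mpc_action(x, v, target)
        x, v = step(x, v, u)
    return x
```

At 100 Hz the whole planning step must finish within 10 ms, which is why real-time whole-body MPC on a humanoid is a much harder problem than this toy suggests.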
Workshop Highlights
VLA Pipelines for Real Robots
The "From Data to Decisions: VLA Pipelines for Real Robots" workshop features:
- 10,000+ hours of real robot data
- Global competition on VLA pipelines
- Key takeaways:
  - Data quality > data quantity: diverse, well-labeled data beats raw volume
  - Evaluation standards emerging: community converging on VLA benchmarks
  - Real deployment gaps: latency, safety constraints, hardware limitations remain
Field Robotics Workshop
The workshop focuses on agricultural and construction robots, a high-potential area in Vietnam. Discussions centered on robust perception in outdoor environments, long-term autonomy, and operation in harsh weather.
Five Practical Takeaways for Vietnamese Engineers
1. VLA Models Are Production-Ready for Specific Domains
No need to wait: pi-0.5 and GR00T N1 demonstrated generalization in real environments. Start with open-source models and fine-tune them for your use case.
2. 3D Perception Is a Mandatory Investment
PointVLA and Any3D-VLA showed that 2D vision alone is insufficient for precise manipulation. Add depth sensing (Intel RealSense, Stereolabs ZED) to your pipeline immediately.
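What depth sensing adds is easiest to see in the pinhole back-projection that turns a pixel plus a depth reading into a 3D point. The sketch below assumes an undistorted pinhole camera; the intrinsics in the usage example are hypothetical, and a real pipeline would read them from the camera's calibration.

```python
def deproject(u: float, v: float, depth_m: float,
              fx: float, fy: float, cx: float, cy: float) -> tuple[float, float, float]:
    """Back-project pixel (u, v) with metric depth into camera-frame XYZ (pinhole model)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return x, y, depth_m


# Hypothetical intrinsics for a 640x480 image.
FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0

center = deproject(320, 240, 1.0, FX, FY, CX, CY)   # pixel at the principal point
offset = deproject(920, 240, 2.0, FX, FY, CX, CY)   # 600 px right of center, 2 m away
```

Without the depth value, the same pixel is consistent with every point along a ray, which is the ambiguity that limits 2D-only policies in precise grasping.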
3. Cross-Embodiment Reduces Costs
Instead of training a separate policy for each robot type, invest in cross-embodiment approaches. This matters most when you deploy multiple robot types.
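One common way to let a single policy serve robots with different numbers of actuators is to pad every action to a shared width and carry a mask of the valid entries. The sketch below illustrates that idea only; it is an assumption, not the method of any specific paper mentioned here, and `MAX_ACTION_DIM` is a hypothetical value.

```python
MAX_ACTION_DIM = 8  # hypothetical shared action width across all robots


def pad_action(action: list[float]) -> tuple[list[float], list[int]]:
    """Zero-pad an embodiment-specific action and return it with a validity mask."""
    if len(action) > MAX_ACTION_DIM:
        raise ValueError("action exceeds shared action width")
    pad = MAX_ACTION_DIM - len(action)
    return action + [0.0] * pad, [1] * len(action) + [0] * pad


def unpad_action(padded: list[float], mask: list[int]) -> list[float]:
    """Recover the robot-specific action by dropping masked-out entries."""
    return [a for a, m in zip(padded, mask) if m]


# A 6-DoF arm and a 2-DoF mobile base share one action format.
arm_padded, arm_mask = pad_action([0.1, -0.2, 0.3, 0.0, 0.5, -0.1])
base_padded, base_mask = pad_action([0.4, 0.05])
```

During training, the mask would also be used to exclude padded dimensions from the loss so that the policy is not penalized for outputs a given robot does not have.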
4. Safety-First for Fleet Deployment
Control barrier functions and safety-aware navigation are no longer nice-to-have; they are requirements for warehouse deployment. Study them before you put a fleet on the floor.
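As a starting point for studying control barrier functions, here is a minimal 1D sketch. It assumes a robot with velocity control (x' = u) that must stay at or below a position limit x_max. With barrier h(x) = x_max - x, the condition h' + alpha * h >= 0 reduces to u <= alpha * (x_max - x), so the safety filter is a simple clamp. The function names, the gain `alpha`, and the simulation settings are hypothetical.

```python
def cbf_filter(u_nominal: float, x: float, x_max: float, alpha: float = 1.0) -> float:
    """Return the command closest to u_nominal that satisfies u <= alpha * (x_max - x)."""
    return min(u_nominal, alpha * (x_max - x))


def drive_toward_limit(u_nominal: float = 1.0, x_max: float = 2.0,
                       dt: float = 0.05, steps: int = 200) -> float:
    """Apply a constant forward command through the filter and return the final position."""
    x = 0.0
    for _ in range(steps):
        x += cbf_filter(u_nominal, x, x_max) * dt
    return x
```

Far from the limit the nominal command passes through unchanged; near the limit the filter scales it down so the robot approaches x_max without crossing it. In higher dimensions the same condition is usually enforced by solving a small quadratic program at each control step.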
5. A Sim-to-Real Pipeline Is a Competitive Advantage
Invest in NVIDIA Isaac Lab and automated sim-to-real tuning. This multiplies productivity for small robotics teams.
Looking Ahead to RSS 2026
The momentum from ICRA 2026 carries into RSS 2026 (Sydney, July 13-17), which will push sim-to-real transfer and dexterous manipulation further. Watch for breakthroughs in manipulation policies built on vision-language models.