
ICRA 2026: Highlights and Key Trends

Analysis of accepted papers at ICRA 2026 Vienna — Vision-Language-Action models dominate, sim-to-real matures, 3D perception explodes.

Nguyen Anh Tuan · February 7, 2026 · 4 min read

ICRA 2026 Vienna: Dominance of Vision-Language-Action Models

The IEEE International Conference on Robotics and Automation (ICRA 2026) takes place in Vienna (June 1-5) with 1,200+ accepted papers. Three trends stand out: Vision-Language-Action models, mature sim-to-real pipelines, and explosive growth in 3D perception.

Trend 1: VLA Models Everywhere

Papers on RT-2, Octo, OpenVLA, and their successors account for 25% of accepted papers. The key finding is that cross-embodiment transfer works: models trained on multi-robot datasets achieve zero-shot generalization to unseen platforms.
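One practical obstacle to training on multi-robot datasets is that robots have action vectors of different lengths. A common workaround is to pad every action to a shared maximum dimension and keep a mask of the valid entries. The sketch below assumes this padding scheme for illustration; the function name `pad_action` and the example dimensions are hypothetical and not taken from any specific paper.

```python
from typing import List, Sequence, Tuple


def pad_action(action: Sequence[float], max_dim: int) -> Tuple[List[float], List[int]]:
    """Zero-pad an action to `max_dim` and return (padded_action, validity_mask)."""
    if len(action) > max_dim:
        raise ValueError("action is longer than the shared action dimension")
    pad = max_dim - len(action)
    padded = list(action) + [0.0] * pad
    mask = [1] * len(action) + [0] * pad  # 1 = real joint, 0 = padding
    return padded, mask


if __name__ == "__main__":
    # Hypothetical example: a 2-DoF gripper and a 4-DoF arm share one 4-D action space.
    print(pad_action([0.1, 0.2], max_dim=4))
    print(pad_action([0.1, 0.2, 0.3, 0.4], max_dim=4))
```

During training, the mask would typically be used to exclude padded entries from the loss so that they do not influence the policy.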

Trend 2: Sim-to-Real Matures

Sim-to-real is no longer experimental. Papers show:

  • Domain randomization → 80%+ success on unseen tasks
  • Automated real-to-sim tuning → reduces manual calibration
  • Contact-implicit models (MuJoCo) → better accuracy than impulse-based methods
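As a rough illustration of the first bullet, domain randomization resamples simulator parameters at the start of every training episode so the policy cannot overfit to a single physics configuration. The sketch below is simulator-agnostic; the parameter names and ranges (`friction`, `mass_scale`, `motor_delay_ms`) are assumptions chosen for illustration, not values reported at ICRA.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicsParams:
    friction: float        # sliding friction coefficient
    mass_scale: float      # multiplier applied to nominal link masses
    motor_delay_ms: float  # actuation latency in milliseconds


# Hypothetical ranges; tune them to your robot and simulator.
RANGES = {
    "friction": (0.5, 1.5),
    "mass_scale": (0.8, 1.2),
    "motor_delay_ms": (0.0, 20.0),
}


def sample_params(rng: random.Random) -> PhysicsParams:
    """Draw one physics configuration uniformly from RANGES."""
    return PhysicsParams(**{k: rng.uniform(lo, hi) for k, (lo, hi) in RANGES.items()})


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so runs are reproducible
    for episode in range(3):
        params = sample_params(rng)
        # A real pipeline would write `params` into the simulator here,
        # then roll out the policy for one episode.
        print(episode, params)
```

Wider ranges generally make the policy more robust but harder to train, so the ranges themselves are a tuning decision.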

Trend 3: 3D Perception Explosion

LiDAR + neural networks + transformers enable:

  • Semantic segmentation in cluttered scenes
  • 6D object pose estimation
  • Real-time 3D reconstruction
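For readers new to the term, a "6D" pose combines a 3D rotation with a 3D translation. The minimal sketch below shows how such a pose maps a point from the object frame into the camera frame; the helper names `rot_z` and `transform_point` are illustrative, and the rotation is restricted to the z-axis to keep the example short.

```python
import math
from typing import List, Tuple

Matrix3 = List[List[float]]
Vec3 = Tuple[float, float, float]


def rot_z(yaw: float) -> Matrix3:
    """Rotation matrix for a rotation of `yaw` radians about the z-axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def transform_point(R: Matrix3, t: Vec3, p: Vec3) -> Vec3:
    """Apply the pose (R, t) to point p, i.e. compute R @ p + t."""
    x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
    return (x, y, z)


if __name__ == "__main__":
    # Object rotated 90 degrees about z and placed 1 m along the camera z-axis.
    R = rot_z(math.pi / 2)
    t = (0.0, 0.0, 1.0)
    print(transform_point(R, t, (1.0, 0.0, 0.0)))  # approximately (0, 1, 1)
```

A pose estimator outputs `R` and `t`; a grasp planner then uses this transform to express grasp points in the robot's frame.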

Best Paper Contenders

pi-0.5: Open-World Generalization

pi-0.5: a Vision-Language-Action Model with Open-World Generalization — Physical Intelligence, 2025

This is the standout paper. pi-0.5 builds on pi-0 to enable long-horizon, dexterous manipulation in completely unseen homes. The secret lies in co-training on heterogeneous tasks: data from multiple robots, high-level semantic prediction, web data, and object detections are combined in hybrid multi-modal training examples.

Result: Robots can clean kitchens and bedrooms in entirely new environments. This is the first time an end-to-end learning system has achieved this level of generalization.

Practical takeaway: Open-world generalization is no longer a distant goal. If you're building service robots for the Vietnamese market (restaurants, hotels, warehouses), VLA models are mature enough for pilots now.

GR00T N1: Foundation Model for Manipulation

NVIDIA's GR00T N1 (Generalist Robot 00 Technology) is a dual-system foundation model:

  • Fast reactive module for low-level control
  • Slow reasoning planner for high-level decisions

It was demonstrated across multiple humanoid platforms with impressive zero-shot transfer.
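The fast/slow split can be pictured as a dual-rate loop: an expensive planner refreshes a goal occasionally, while a cheap controller acts on every tick using the latest goal. The sketch below illustrates only that scheduling pattern and is not NVIDIA's implementation; `run_dual_rate`, `plan_every`, and the toy planner and controller are hypothetical.

```python
from typing import Callable, List, Sequence


def run_dual_rate(
    observations: Sequence[float],
    slow_planner: Callable[[float], float],
    fast_controller: Callable[[float, float], float],
    plan_every: int = 10,
) -> List[float]:
    """Run the controller on every tick; refresh the goal every `plan_every` ticks."""
    if plan_every < 1:
        raise ValueError("plan_every must be at least 1")
    actions: List[float] = []
    goal = 0.0
    for tick, obs in enumerate(observations):
        if tick % plan_every == 0:
            goal = slow_planner(obs)  # expensive call, low rate
        actions.append(fast_controller(obs, goal))  # cheap call, high rate
    return actions


if __name__ == "__main__":
    # Toy stand-ins: the planner picks a target 1.0 above the observation,
    # and the controller is a proportional law toward that target.
    actions = run_dual_rate(
        [0.0] * 25,
        slow_planner=lambda obs: obs + 1.0,
        fast_controller=lambda obs, goal: 0.5 * (goal - obs),
    )
    print(len(actions))  # one action per tick
```

The design trade-off is staleness: a larger `plan_every` saves compute but means the controller tracks an older goal.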

MuJoCo MPC: Real-Time Whole-Body Control

Real-time whole-body control achieving 100 Hz on humanoid robots with zero-shot sim-to-real transfer. This represents the maturation of contact-based model predictive control.
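For intuition, a 100 Hz control rate leaves 10 ms per cycle to predict forward and pick an action. The toy sketch below is a receding-horizon (shooting) controller for a 1D point mass that scores a fixed set of candidate accelerations and returns the best one. It is not the MuJoCo MPC implementation; the dynamics, cost, candidate set, and the names `rollout_cost` and `mpc_step` are all illustrative assumptions.

```python
from typing import Sequence

DT = 0.01  # 100 Hz control rate, i.e. 10 ms per step


def rollout_cost(x: float, v: float, accel: float, target: float, horizon: int) -> float:
    """Simulate a point mass under constant acceleration; sum squared position error."""
    cost = 0.0
    for _ in range(horizon):
        v += accel * DT
        x += v * DT
        cost += (x - target) ** 2
    return cost


def mpc_step(
    x: float,
    v: float,
    target: float,
    candidates: Sequence[float] = (-1.0, -0.5, 0.0, 0.5, 1.0),
    horizon: int = 20,
) -> float:
    """Return the candidate acceleration with the lowest predicted cost."""
    return min(candidates, key=lambda a: rollout_cost(x, v, a, target, horizon))


if __name__ == "__main__":
    x, v = 0.0, 0.0
    for _ in range(5):
        a = mpc_step(x, v, target=1.0)  # replan from the current state every cycle
        v += a * DT
        x += v * DT
    print(x, v)
```

Only the first action of each plan is applied before replanning, which is what makes the scheme receding-horizon. A practical cost would also penalize velocity and control effort to avoid overshoot.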

Workshop Highlights

VLA Pipelines for Real Robots

The "From Data to Decisions: VLA Pipelines for Real Robots" workshop features:

  • 10,000+ hours of real robot data
  • Global competition on VLA pipelines
  • Key takeaways:
    • Data quality > data quantity: diverse, well-labeled data beats raw volume
    • Evaluation standards emerging: community converging on VLA benchmarks
    • Real deployment gaps: latency, safety constraints, hardware limitations remain

Field Robotics Workshop

This workshop focuses on agricultural and construction robots, a high-potential area in Vietnam. Discussions centered on robust perception in outdoor environments, long-term autonomy, and operation in harsh weather.

Five Practical Takeaways for Vietnamese Engineers

1. VLA Models Are Production-Ready for Specific Domains

There is no need to wait: pi-0.5 and GR00T N1 demonstrated generalization in real environments. Start with open-source models and fine-tune them for your use case.

2. 3D Perception Is a Mandatory Investment

PointVLA and Any3D-VLA showed that 2D vision alone is insufficient for precise manipulation. Add depth sensing (Intel RealSense, Stereolabs ZED) to your pipeline immediately.

3. Cross-Embodiment Reduces Costs

Instead of training a separate policy for each robot type, invest in cross-embodiment approaches. This is particularly important when you deploy multiple robot types.

4. Safety-First for Fleet Deployment

Control barrier functions and safety-aware navigation are no longer nice-to-have; they are requirements for warehouse deployment. Study them before you deploy.
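To make the idea concrete, a control barrier function (CBF) filter passes the nominal command through unchanged when it is safe and clips it when it would violate a safety constraint. The sketch below assumes a deliberately simple 1D setup: a robot at distance `d` from an obstacle, commanded with an approach speed `u`, so that the distance changes at rate `-u`. With barrier h(d) = d - d_min, the CBF condition dh/dt >= -alpha * h reduces to u <= alpha * (d - d_min). The function name and parameter values are illustrative.

```python
def cbf_filter(u_nominal: float, d: float, d_min: float = 0.5, alpha: float = 1.0) -> float:
    """Return the approach speed closest to `u_nominal` that satisfies the CBF condition.

    Assumes 1D dynamics where the distance to the obstacle changes at rate -u.
    """
    h = d - d_min          # barrier value: positive means safe
    u_max = alpha * h      # largest approach speed allowed by dh/dt >= -alpha * h
    return min(u_nominal, u_max)


if __name__ == "__main__":
    for d in (2.0, 0.7, 0.5, 0.4):
        # Far away the command passes through; near d_min it is clipped;
        # inside the margin the filter commands a retreat (negative speed).
        print(d, cbf_filter(1.0, d))
```

In higher dimensions the same condition becomes a constraint in a small quadratic program solved at every control step, but the pass-through-or-clip behavior is the same.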

5. A Sim-to-Real Pipeline Is a Competitive Advantage

Invest in NVIDIA Isaac Lab and automated sim-to-real tuning. This multiplies productivity for small robotics teams.

Looking Ahead to RSS 2026

The momentum from ICRA 2026 carries into RSS 2026 (Sydney, July 13-17), which will push sim-to-real transfer and dexterous manipulation further. Watch for breakthroughs in manipulation policies trained on vision-language models.


Nguyễn Anh Tuấn

Robotics & AI Engineer. Building VnRobo — sharing knowledge about robot learning, VLA models, and automation.
