ICRA 2026: Highlights and Key Trends

ICRA 2026 Vienna: Dominance of Vision-Language-Action Models

The IEEE International Conference on Robotics and Automation (ICRA 2026) takes place in Vienna, Austria (June 1-5, 2026), with more than 1,200 accepted papers. The trends are clear: Vision-Language-Action (VLA) models, mature sim-to-real pipelines, and explosive growth in 3D perception dominate the conference.

Trend 1: VLA Models Everywhere

Papers on RT-2, Octo, OpenVLA and successors account for 25% of accepted papers. Key finding: cross-embodiment transfer works. Models trained on multi-robot datasets achieve zero-shot generalization to unseen platforms.

Trend 2: Sim-to-Real Matures

Sim-to-real is no longer experimental. Papers show:

Domain randomization → 80%+ success on unseen tasks
Automated real-to-sim tuning → reduces manual calibration
Contact-implicit models (MuJoCo) → better accuracy than impulse-based methods
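Domain randomization resamples simulator parameters for every training episode, so the policy cannot overfit to a single version of the physics. The following is a minimal Python sketch. The parameter names and ranges are hypothetical; in practice, they come from measuring your robot. The sketch only draws the parameters and leaves out the simulator and the policy update:

```python
import random
from dataclasses import dataclass


@dataclass
class PhysicsParams:
    friction: float
    mass_scale: float
    motor_gain: float
    latency_s: float


# Hypothetical randomization ranges; tune them to your robot and task.
RANGES = {
    "friction": (0.5, 1.5),
    "mass_scale": (0.8, 1.2),
    "motor_gain": (0.9, 1.1),
    "latency_s": (0.0, 0.03),
}


def sample_params(rng: random.Random) -> PhysicsParams:
    """Draw one set of simulator parameters uniformly from RANGES."""
    return PhysicsParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})


def randomized_episodes(num_episodes: int, seed: int = 0) -> list[PhysicsParams]:
    """Return the parameters used for each training episode.

    A real pipeline would reconfigure the simulator with these values and
    roll out the policy before sampling the next set.
    """
    rng = random.Random(seed)
    return [sample_params(rng) for _ in range(num_episodes)]
```

Seeding the generator makes each run reproducible. That helps when comparing policies trained under different ranges.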

Trend 3: 3D Perception Explosion

LiDAR + neural networks + transformers enable:

Semantic segmentation in cluttered scenes
6D object pose estimation
Real-time 3D reconstruction
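A 6D pose is a 3D rotation plus a 3D translation. A pose estimator returns the object's pose in the camera frame. That pose is then usually chained with the camera's calibrated pose to get the object's pose in the robot base frame. The following is a minimal pure-Python sketch; the camera extrinsics and the estimator output are hypothetical values:

```python
import math

Matrix = list[list[float]]
Vector = list[float]


def rot_z(theta: float) -> Matrix:
    """Rotation matrix for a rotation of theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_pose(R: Matrix, t: Vector, p: Vector) -> Vector:
    """Map point p through the pose (R, t): R @ p + t."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def compose(R1: Matrix, t1: Vector, R2: Matrix, t2: Vector) -> tuple[Matrix, Vector]:
    """Return the pose T1 * T2, i.e. apply (R2, t2) first, then (R1, t1)."""
    R = [[sum(R1[i][k] * R2[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    return R, apply_pose(R1, t1, t2)


# Camera pose in the robot base frame (hypothetical extrinsics).
R_base_cam, t_base_cam = rot_z(math.pi / 2), [0.5, 0.0, 0.3]
# Object pose in the camera frame, as a pose estimator might report it.
R_cam_obj, t_cam_obj = rot_z(0.0), [1.0, 0.0, 0.0]
# Object pose in the base frame; its origin lands at about (0.5, 1.0, 0.3).
R_base_obj, t_base_obj = compose(R_base_cam, t_base_cam, R_cam_obj, t_cam_obj)
```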

Best Paper Contenders

pi-0.5: Open-World Generalization

pi-0.5: a Vision-Language-Action Model with Open-World Generalization — Physical Intelligence, 2025

This is the standout paper. pi-0.5 builds on pi-0 to enable long-horizon, dexterous manipulation in homes the robot has never seen. The key is co-training on heterogeneous tasks. Data from multiple robots, high-level semantic prediction, web data, and object detections are combined into hybrid multimodal training examples.
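One way to picture this co-training is as a sampler that decides which data source each training example comes from. The source names and weights below are hypothetical. They illustrate the mechanism only and are not the paper's recipe:

```python
import random
from collections import Counter

# Hypothetical source names and weights, for illustration only; they are
# not the mixture reported for pi-0.5.
MIXTURE = {
    "multi_robot_demos": 0.4,
    "high_level_subtask_labels": 0.2,
    "web_vision_language": 0.3,
    "object_detections": 0.1,
}


def sample_batch_sources(batch_size: int, rng: random.Random) -> list[str]:
    """Pick a data source for each example in a batch, proportional to MIXTURE."""
    names = list(MIXTURE)
    weights = [MIXTURE[name] for name in names]
    return rng.choices(names, weights=weights, k=batch_size)


batch = sample_batch_sources(10_000, random.Random(0))
counts = Counter(batch)  # roughly 40% / 20% / 30% / 10% across the four sources
```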

Result: robots can clean kitchens and bedrooms in entirely new homes. This is the first time an end-to-end learned system has reached this level of generalization.

Practical takeaway: open-world generalization is no longer a distant goal. If you're building service robots for the Vietnamese market (restaurants, hotels, warehouses), VLA models are mature enough for pilots now.

GR00T N1: Foundation Model for Manipulation

NVIDIA's GR00T N1 (GR00T stands for Generalist Robot 00 Technology) is a dual-system foundation model:

Fast reactive module for low-level control
Slow reasoning planner for high-level decisions

Demonstrated across multiple humanoid platforms with impressive zero-shot transfer.
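The dual-system split can be pictured as two loops running at different rates. The toy sketch below uses a 1D plant and hypothetical placeholder functions, not NVIDIA's models. The slow planner refreshes a subgoal every 10 ticks, and the fast policy tracks that subgoal on every tick:

```python
from dataclasses import dataclass, field


@dataclass
class DualSystemController:
    """Toy two-rate loop: a slow planner refreshes a subgoal, a fast policy tracks it."""

    plan_every: int = 10  # hypothetical: planner at 10 Hz if the fast loop runs at 100 Hz
    subgoal: float = 0.0
    plan_ticks: list[int] = field(default_factory=list)

    def slow_planner(self, observation: float, goal: float) -> float:
        # Placeholder for the slow reasoning model: choose a subgoal halfway to the goal.
        return observation + 0.5 * (goal - observation)

    def fast_policy(self, observation: float) -> float:
        # Placeholder for the fast reactive model: proportional step toward the subgoal.
        return 0.2 * (self.subgoal - observation)

    def step(self, tick: int, observation: float, goal: float) -> float:
        if tick % self.plan_every == 0:
            self.subgoal = self.slow_planner(observation, goal)
            self.plan_ticks.append(tick)
        return self.fast_policy(observation)


def run(ticks: int = 100, goal: float = 1.0) -> tuple[float, list[int]]:
    ctrl = DualSystemController()
    x = 0.0
    for tick in range(ticks):
        x += ctrl.step(tick, x, goal)  # toy plant: the action is a position increment
    return x, ctrl.plan_ticks
```

In this toy setup, the planner runs 10 times over 100 ticks, and the state settles close to the goal. The fast loop keeps reacting between planner updates, which is the point of the split.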

MuJoCo MPC: Real-Time Whole-Body Control

Real-time whole-body control runs at 100 Hz on humanoid robots, with zero-shot sim-to-real transfer. This represents the maturation of contact-based model predictive control.
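MuJoCo MPC (MJPC) is a C++ tool built on MuJoCo, and the sketch below does not use it. It only illustrates the receding-horizon idea behind sampling-based planners such as predictive sampling, one of the planners MJPC includes. Each step perturbs the current plan, keeps the cheapest rollout, applies the first action, and shifts the plan forward. The 1D velocity-controlled point, the costs, and the actuator limit are assumptions chosen to keep the example small:

```python
import random

DT = 0.01     # 100 Hz control period, matching the rate quoted above
U_MAX = 1.0   # hypothetical actuator limit on the velocity command


def clip(u: float) -> float:
    return max(-U_MAX, min(U_MAX, u))


def rollout_cost(x: float, plan: list[float], target: float) -> float:
    """Simulate x' = u over the horizon; sum squared tracking error plus a small effort term."""
    cost = 0.0
    for u in plan:
        x += u * DT
        cost += (x - target) ** 2 + 1e-4 * u ** 2
    return cost


def plan_step(x: float, nominal: list[float], target: float, rng: random.Random,
              num_samples: int = 32, noise: float = 0.5) -> list[float]:
    """Perturb the nominal plan with Gaussian noise and keep the cheapest candidate."""
    best, best_cost = nominal, rollout_cost(x, nominal, target)
    for _ in range(num_samples):
        candidate = [clip(u + rng.gauss(0.0, noise)) for u in nominal]
        cost = rollout_cost(x, candidate, target)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best


def run(steps: int = 500, horizon: int = 20, target: float = 1.0, seed: int = 0) -> float:
    rng = random.Random(seed)
    x, plan = 0.0, [0.0] * horizon
    for _ in range(steps):
        plan = plan_step(x, plan, target, rng)
        x += plan[0] * DT                # apply only the first action
        plan = plan[1:] + [plan[-1]]     # shift forward to warm-start the next solve
    return x
```

Warm-starting from the shifted previous plan is what makes this cheap enough to run every control tick. The optimizer refines last step's answer instead of starting from scratch.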

Workshop Highlights

VLA Pipelines for Real Robots

"From Data to Decisions: VLA Pipelines for Real Robots" workshop features:

10,000+ hours of real robot data
Global competition on VLA pipelines
Key takeaways:
- Data quality > data quantity: diverse, well-labeled data beats raw volume
- Evaluation standards emerging: community converging on VLA benchmarks
- Real deployment gaps: latency, safety constraints, hardware limitations remain
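Success rates like the "80%+" figures above, and the benchmark numbers the workshop discussed, usually come from a small number of real trials. The sketch below computes a Wilson score interval so that a rate can be reported with its uncertainty. This is a general statistics tool used as an illustration, not something the workshop prescribed:

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a success rate (z = 1.96 gives roughly 95% coverage)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials))
    return center - half_width, center + half_width


# The same 80% success rate, measured on 10 trials versus 100 trials.
small = wilson_interval(8, 10)     # roughly (0.49, 0.94)
large = wilson_interval(80, 100)   # roughly (0.71, 0.87)
```

With only 10 trials, an 80% success rate is consistent with anything from about 49% to 94%. Running 100 trials narrows that range considerably.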

Field Robotics Workshop

The workshop focused on agricultural and construction robots, a high-potential area in Vietnam. Discussions centered on robust perception in outdoor environments, long-term autonomy, and operation in harsh weather.

Five Practical Takeaways for Vietnamese Engineers

1. VLA Models Are Production-Ready for Specific Domains

There is no need to wait: pi-0.5 and GR00T N1 have demonstrated generalization in real environments. Start with open-source models and fine-tune them for your use case.

2. 3D Perception Is a Mandatory Investment

PointVLA and Any3D-VLA showed that 2D vision alone is insufficient for precise manipulation. Add depth sensing (Intel RealSense, Stereolabs ZED) to your pipeline now.
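With depth available, each pixel becomes a 3D point through the pinhole camera model: x = (u - cx) * Z / fx and y = (v - cy) * Z / fy. This sketch assumes the depth image has already been converted to meters and that the intrinsics (fx, fy, cx, cy) come from your camera's calibration. It does not use any vendor SDK:

```python
def deproject(u: float, v: float, depth_m: float,
              fx: float, fy: float, cx: float, cy: float) -> tuple[float, float, float]:
    """Back-project pixel (u, v) with metric depth into a 3D point in the camera frame."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def depth_to_points(depth: list[list[float]], fx: float, fy: float, cx: float, cy: float,
                    max_depth_m: float = 3.0) -> list[tuple[float, float, float]]:
    """Convert a depth image (rows of meters, 0 = no reading) into a point cloud."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if 0.0 < z <= max_depth_m:  # drop missing and out-of-range readings
                points.append(deproject(u, v, z, fx, fy, cx, cy))
    return points
```

The `max_depth_m` cutoff is a hypothetical default. Consumer depth cameras get noisier with distance, so the useful range depends on the sensor and the scene.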

3. Cross-Embodiment Reduces Costs

Instead of training a separate policy for each robot type, invest in cross-embodiment approaches. This matters most when you deploy several robot types.
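A common building block for cross-embodiment training is a shared action space. Each robot's actions are normalized to a fixed range and padded to one width, and a mask marks which dimensions are real. The robots, bounds, and `SHARED_DIM` below are hypothetical:

```python
from dataclasses import dataclass

SHARED_DIM = 8  # hypothetical width of the shared action vector


@dataclass(frozen=True)
class Embodiment:
    """Per-robot action bounds used to map actions into and out of a shared space."""

    name: str
    low: tuple[float, ...]
    high: tuple[float, ...]

    def to_shared(self, action: list[float]) -> tuple[list[float], list[int]]:
        """Scale each dimension to [-1, 1], zero-pad to SHARED_DIM, and return a validity mask."""
        n = len(self.low)
        if len(action) != n or n > SHARED_DIM:
            raise ValueError(f"{self.name}: bad action size {len(action)}")
        scaled = [2 * (a - lo) / (hi - lo) - 1 for a, lo, hi in zip(action, self.low, self.high)]
        pad = SHARED_DIM - n
        return scaled + [0.0] * pad, [1] * n + [0] * pad

    def from_shared(self, vector: list[float]) -> list[float]:
        """Undo the scaling for this robot, ignoring padded dimensions."""
        return [lo + (s + 1) * (hi - lo) / 2
                for s, lo, hi in zip(vector[: len(self.low)], self.low, self.high)]


# Two hypothetical robots sharing one action space.
arm = Embodiment("arm_7dof", low=(-3.0,) * 7, high=(3.0,) * 7)
base = Embodiment("mobile_base", low=(0.0, -2.0), high=(1.0, 2.0))
vec, mask = base.to_shared([0.5, 1.0])
```

A single policy then predicts `SHARED_DIM` values for every robot. The mask tells the training loss which outputs to ignore for a given embodiment.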

4. Safety-First for Fleet Deployment

Control barrier functions and safety-aware navigation are no longer nice-to-haves; they are requirements for warehouse deployment. Build them in before you deploy a fleet.
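Consider a robot modeled as a single integrator (p' = u) near a circular obstacle. A control barrier function filter keeps h(p) = |p - o|^2 - r^2 nonnegative by enforcing grad h . u >= -alpha * h. With only one constraint, the usual quadratic program (stay as close as possible to the nominal command) has a closed-form projection. The following is a minimal sketch with a hypothetical scene and gains:

```python
import math

Vec2 = tuple[float, float]


def cbf_filter(p: Vec2, u_nom: Vec2, obstacle: Vec2, radius: float, alpha: float = 1.0) -> Vec2:
    """Closest command to u_nom that satisfies the CBF condition for one circular obstacle.

    Robot model: single integrator p' = u. Barrier: h(p) = |p - o|^2 - r^2 (h >= 0 is safe).
    Condition: grad_h(p) . u >= -alpha * h(p).
    """
    dx, dy = p[0] - obstacle[0], p[1] - obstacle[1]
    h = dx * dx + dy * dy - radius * radius
    a = (2.0 * dx, 2.0 * dy)                 # gradient of h
    b = -alpha * h                           # constraint: a . u >= b
    a_dot_u = a[0] * u_nom[0] + a[1] * u_nom[1]
    if a_dot_u >= b:
        return u_nom                         # nominal command is already safe
    norm2 = a[0] * a[0] + a[1] * a[1]
    if norm2 == 0.0:
        raise ValueError("robot is at the obstacle center")
    lam = (b - a_dot_u) / norm2              # project onto the constraint boundary
    return (u_nom[0] + lam * a[0], u_nom[1] + lam * a[1])


def simulate(start: Vec2, goal: Vec2, obstacle: Vec2, radius: float,
             dt: float = 0.01, steps: int = 1000) -> tuple[Vec2, float]:
    """Drive toward goal with a proportional controller filtered by the CBF."""
    p, min_dist = start, math.inf
    for _ in range(steps):
        u_nom = (goal[0] - p[0], goal[1] - p[1])
        u = cbf_filter(p, u_nom, obstacle, radius)
        p = (p[0] + dt * u[0], p[1] + dt * u[1])
        min_dist = min(min_dist, math.dist(p, obstacle))
    return p, min_dist
```

Because h is quadratic, h(p + dt*u) = h + dt*(grad h . u) + dt^2 |u|^2 >= (1 - alpha*dt) h. With alpha*dt <= 1, the barrier therefore also stays nonnegative in discrete time. A real fleet would add one constraint per obstacle or neighboring robot and solve the QP numerically.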

5. A Sim-to-Real Pipeline Is a Competitive Advantage

Invest in NVIDIA Isaac Lab and automated sim-to-real tuning. This multiplies productivity for small robotics teams.
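Automated real-to-sim tuning is essentially system identification: choose simulator parameters so that simulated trajectories match logged real ones. The sketch below fits a single viscous-damping coefficient with golden-section search. It is plain Python rather than an Isaac Lab API. In place of real robot logs, it uses a synthetic trajectory generated with an assumed damping of 0.7:

```python
import math


def simulate_velocity(damping: float, force: float = 1.0, mass: float = 1.0,
                      dt: float = 0.01, steps: int = 200) -> list[float]:
    """Euler-integrate a damped mass pushed by a constant force; return velocities."""
    v, out = 0.0, []
    for _ in range(steps):
        v += dt * (force - damping * v) / mass
        out.append(v)
    return out


def sse(damping: float, measured: list[float]) -> float:
    """Sum of squared errors between simulated and measured velocities."""
    sim = simulate_velocity(damping, steps=len(measured))
    return sum((s - m) ** 2 for s, m in zip(sim, measured))


def fit_damping(measured: list[float], lo: float = 0.01, hi: float = 5.0,
                tol: float = 1e-6) -> float:
    """Golden-section search for the damping that best reproduces the measurements."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if sse(c, measured) < sse(d, measured):
            b, d = d, c                      # minimum lies in [a, d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d                      # minimum lies in [c, b]
            d = a + inv_phi * (b - a)
    return (a + b) / 2


measured = simulate_velocity(0.7)  # synthetic stand-in for a logged real trajectory
estimate = fit_damping(measured)   # close to 0.7
```

Real logs are noisy and models are imperfect. In practice, you would fit several parameters at once and check the fit on held-out trajectories.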

Looking Ahead to RSS 2026

The momentum from ICRA 2026 carries into RSS 2026 (Sydney, July 13-17), which is expected to push sim-to-real transfer and dexterous manipulation further. Watch for breakthroughs in manipulation policies built on vision-language models.

Nguyễn Anh Tuấn
