VnRobo
AboutPricingBlogContact
🇻🇳VISign InStart Free Trial
🇻🇳VI
VnRobo logo

AI infrastructure for next-generation industrial robots.

Product

  • Features
  • Pricing
  • Knowledge Base
  • Services

Company

  • About Us
  • Blog
  • Contact

Legal

  • Privacy Policy
  • Terms of Service

© 2026 VnRobo. All rights reserved.

Made with♥in Vietnam
VnRobo
AboutPricingBlogContact
🇻🇳VISign InStart Free Trial
🇻🇳VI
  1. Home
  2. Blog
  3. ENPiRE: Coding Agents That Teach Robots to Improve
researchnvidiagear-labcoding-agentsrobot-learningmanipulationreinforcement-learningpolicy-improvementautonomous-learningenpire

ENPiRE: Coding Agents That Teach Robots to Improve

NVIDIA GEAR Lab's ENPiRE uses coding agents (Claude Code, Codex, Kimi) to run real robot experiments autonomously, achieving 99% pass rate on dexterous manipulation.

Nguyễn Anh TuấnJune 18, 202610 min read
ENPiRE: Coding Agents That Teach Robots to Improve

Imagine handing a robot a task — "insert this GPU into the motherboard" — and then doing nothing else. No resetting objects, no reviewing logs, no tweaking code. An AI coding agent handles all of that: it watches the robot fail, figures out why, rewrites the control policy, and tries again — until it hits 99% success rate. No human supervision required.

That is exactly what ENPiRE does. Released by NVIDIA GEAR Lab in June 2026, ENPiRE is the first framework to deploy coding agents as fully autonomous robot researchers operating on real hardware — not just simulators.

The Problem: Human Bottleneck in Robot Learning

If you have trained a robot manipulation policy before, you know the cycle: run policy → robot fails → walk over and reset the scene → read the log → tweak the code → run again. Repeat hundreds of times.

The robot is not the bottleneck. The human is:

  • Someone must physically reset the scene between trials
  • Someone must evaluate which trials succeeded and which failed
  • Someone must read the logs and hypothesize what went wrong
  • Someone must rewrite training code, adjust reward functions, switch algorithms

This is why robot learning is expensive and slow: every hour of robot runtime typically costs many hours of human engineering time.

ENPiRE flips the question: "What if we handed the entire loop to AI?"

What is ENPiRE? Breaking Down the Acronym

ENPiRE = ENvironment + Policy Improvement + Rollout + Evolution

These four modules form a closed-loop system where coding agents can operate indefinitely without ever needing to ask a human.

Source: NVIDIA GEAR Lab ENPiRE project page

Architecture: Four Modules

1. EN — Environment

The Environment module is the "lab manager" of the system. It handles two critical functions:

Auto-reset: After each robot trial (success or failure), Environment automatically resets the scene to its starting state. For Push-T, this means placing the T-block back at a randomized position. For GPU insertion, it means removing the GPU and laying it flat on the table again.

Verification: Environment also confirms outcomes — using perception to check whether the task was completed. If the GPU is seated in the PCIe slot, it reports success. If it is 5mm off, it reports failure and logs the final state.

Everything here is automatic. There is no human-in-the-loop at this stage.

2. PI — Policy Improvement

This is ENPiRE's "brain." The Policy Improvement module takes in:

  • Reward signals from Environment (success/failure + distance metrics)
  • Video of recent trials (both successful and failed)
  • Failure cases automatically analyzed from logs

And outputs new code — an improved control policy. The agent uses chain-of-thought reasoning to hypothesize failure causes before rewriting code, rather than making random mutations.

Example: if the robot consistently slips when gripping a pin, the agent analyzes the video, identifies that the gripper is closing too early, and rewrites the heuristic to adjust timing.

3. R — Rollout

The Rollout module manages policy execution on real robots. It supports:

  • Single rollout: Run one robot, log full state/action/video
  • Parallel rollout: Run a fleet of 4 or 8 robots simultaneously to collect data faster

Every trial is logged in detail: joint positions, gripper force, camera feed, timestep, and outcome. These logs feed into the Evolution module.

4. E — Evolution

The Evolution module is the meta-improvement step — instead of only improving the policy, it improves how the policy is improved. Specifically:

  • Long-horizon log analysis: Find recurring failure patterns across many iterations
  • Literature consultation: The agent can read robotics papers to discover better algorithms
  • Infrastructure improvement: Adjust training structure, reward shaping, and data augmentation

This is what separates ENPiRE from a simple auto-tuner: Evolution does not just fix bugs — it rethinks the entire strategy.

Hardware Stack: Three NVIDIA Technologies

ENPiRE is not just software — it is built on NVIDIA's hardware and middleware ecosystem.

YAM Arm

The robot performing manipulation tasks in ENPiRE is the YAM arm (NVIDIA's dexterous manipulator). It is designed for high-precision manipulation — the kind needed for tasks like GPU insertion and pin placement.

SAM 3 — Perception

SAM 3 object segmentation providing perception for ENPiRE
SAM 3 object segmentation providing perception for ENPiRE
Source: NVIDIA GEAR Lab ENPiRE project page

For the robot to "see" the environment, ENPiRE uses SAM 3 (Segment Anything Model v3). SAM 3 provides real-time object segmentation — the robot knows exactly where each object is and what shape it has. This underlies both the auto-reset verification and the policy inputs.

cuRobo — Motion Planning

cuRobo is NVIDIA's GPU-accelerated motion planning library. When the agent writes policy code, cuRobo handles inverse kinematics and collision-free path planning. Running on GPU, cuRobo computes trajectories in milliseconds — fast enough for real-time control.

The combination of YAM + SAM 3 + cuRobo creates a capable hardware stack for dexterous manipulation — capable enough to tackle GPU insertion, a task that even trained human workers find delicate.

Four Demo Tasks: From Simple to Precise

ENPiRE was tested on four manipulation tasks of increasing difficulty:

1. Push-T (Standard Benchmark)

A classic robot learning benchmark: push a T-shaped block to a target position and orientation. Simple in concept, but requires the robot to learn both position control and rotation strategy.

Push-T task in ENPiRE — pushing a T-shaped block to a target location
Push-T task in ENPiRE — pushing a T-shaped block to a target location
Source: NVIDIA GEAR Lab ENPiRE project page

2. Pin Insertion (Sub-Millimeter Precision)

Insert small pins into precise slots — a task requiring sub-millimeter accuracy. Grip force, approach angle, and gripper closing speed must all be calibrated correctly.

Pin insertion task — placing pins into slots with sub-mm precision
Pin insertion task — placing pins into slots with sub-mm precision
Source: NVIDIA GEAR Lab ENPiRE project page

3. Zip-Tie Cutting (Tool Use)

The robot uses scissors to cut a zip-tie — requiring not just grasping but tool-use coordination. The agent must learn to position the scissors correctly on the zip-tie and coordinate cutting force.

Zip-tie cutting task — robot using scissors to cut a zip-tie
Zip-tie cutting task — robot using scissors to cut a zip-tie
Source: NVIDIA GEAR Lab ENPiRE project page

4. GPU Insertion (Most Impressive)

The flagship task: insert a GPU card into a PCIe slot on a real motherboard. This requires extremely precise alignment — PCIe slots have a tolerance of only a few millimeters, and the GPU must mate perfectly before being pressed down. A failed attempt risks damaging both the GPU and the board.

GPU insertion — seating a GPU into a PCIe slot, ENPiRE's hardest task
GPU insertion — seating a GPU into a PCIe slot, ENPiRE's hardest task
Source: NVIDIA GEAR Lab ENPiRE project page

Coding Agents Tested: Claude Code, Codex, Kimi

ENPiRE is agent-agnostic — the framework is the infrastructure, and any frontier coding model can be plugged in. Agents tested include:

  • Codex (GPT-5.5) — OpenAI's coding model
  • Claude Code — Anthropic's coding agent
  • Kimi Code — Moonshot AI's coding agent

This means researchers are not locked into one vendor. Swap the agent, keep the framework.

Fleet Scaling: 1, 4, 8 Robots

One of ENPiRE's most compelling experiments is fleet scaling — running multiple robots in parallel to collect data faster.

Results show clear benefits:

  • 1 robot: Slower learning, more idle time during resets
  • 4 robots: Significantly higher throughput, agent has more data to analyze
  • 8 robots: Notable speedup on complex tasks

Two metrics capture the efficiency:

MRU (Mean Robot Utilization) — percentage of time robots are actively running trials (not resetting or idle). Higher MRU means fewer wasted cycles.

MTU (Mean Token Utilization) — efficiency of the agent: how many tokens were used to produce real improvements. Higher MTU means the agent is spending its compute on useful reasoning, not rambling.

Fleet scaling works because ENPiRE's Rollout module supports parallel execution: multiple robots run the same policy, collect data simultaneously, and the Evolution module synthesizes everything before producing the next improvement.

Policy Improvement Methods Explored

A key strength of ENPiRE is that the agent is not locked into one learning algorithm. Methods tested include:

  1. Heuristic learning — hand-coded rules based on observations
  2. Behavior Cloning (BC) — learning from successful demonstrations
  3. Offline RL — training on a collected dataset without additional trials
  4. Online RL — training in real-time from live feedback during execution
  5. Tool calling — agent uses external tools (cuRobo, SAM 3) as part of the policy

The agent decides which method to use based on the task and current results — a contrast with traditional frameworks where the researcher must choose an algorithm upfront.

Results: From Zero to 99%

The headline numbers from ENPiRE:

  • Pass@8 success rate up to 99% on complex tasks (Push-T, pin insertion, zip-tie, GPU insertion)
  • Robots learn from zero (no initial demonstrations) to near-perfect reliability
  • Timeline: under 2 hours on some tasks with an 8-robot fleet

"Pass@8" means: out of 8 independent attempts, at least 1 succeeds. This metric originates from software coding benchmarks (like HumanEval for LLMs), and ENPiRE applies it to real robot hardware — an interesting bridge between the software and robotics worlds.

Simulation: RoboCasa

Beyond real robot experiments, ENPiRE was also evaluated in RoboCasa — a popular simulation environment for household manipulation tasks:

  • Coffee Setup Mug — placing a mug in a coffee machine
  • Open Cabinet — opening a cabinet door
  • Open Drawer — opening a drawer

Simulation allows rapid prototyping before moving to hardware, though ENPiRE emphasizes that its primary goal is real-world learning — not sim-to-real transfer.

Why ENPiRE Matters

ENPiRE is significant not because it improves a specific algorithm — it automates the entire research loop. The paradigm shift:

Before ENPiRE:

Human engineer designs policy → tests → debugs → tests again

With ENPiRE:

Coding agent writes policy → runs real robot → analyzes results → improves → repeats (fully autonomous)

This has major implications:

  • Scale: More robots + more agents = linear speedup in learning
  • Cost: Reduced engineering overhead — humans only need to define the task
  • Diversity: Multiple coding agents exploring different strategies in parallel

This points toward autonomous robot research — a future where robot labs are not just places where humans study robots, but where robots study themselves.

To understand the reinforcement learning foundations that underlie ENPiRE's policy improvement methods, see AI Series #1: RL Basics. For the diffusion policy approach the agents may employ, read AI Series #4: Diffusion Policy. And to see what else NVIDIA GEAR Lab has built for humanoid control, check out GEAR-SONIC.

Conclusion

ENPiRE represents a qualitative shift in how robot learning works:

  • From "robots learning with human help" → "AI autonomously running robot labs"
  • From "researcher chooses algorithm" → "agent tries multiple algorithms"
  • From "one robot, one researcher" → "robot fleet, multiple agents in parallel"

As coding agents become more capable and robot hardware becomes more reliable, frameworks like ENPiRE will form the core infrastructure for the next revolution in robotics.

Paper: ENPiRE — Google Drive
Project page: research.nvidia.com/labs/gear/enpire
Institutions: NVIDIA GEAR Lab, Carnegie Mellon University, UC Berkeley


Tool recommendations

VLA train/deploy stack

Train on cloud/workstation, then deploy optimized models to Jetson or the robot computer.

Cloud GPU for VLA / policy training Use for imitation learning, diffusion policies, RL, and robotics model fine-tuning. View cloud GPU → NVIDIA Jetson Orin NX / Orin Nano Edge deployment hardware for perception, logging, and optimized inference. View Jetson → Hugging Face / robotics dataset hosting Host datasets, checkpoints, and model cards for cleaner LeRobot/VLA workflows. View platform →

Related Posts

  • GEAR-SONIC: Whole-Body Control for Humanoid Robots
  • AI Series #1: Reinforcement Learning for Robotics
  • GigaBrain-0: World Model + RL for Robot Manipulation
NT

Nguyễn Anh Tuấn

Robotics & AI Engineer. Building VnRobo — sharing knowledge about robot learning, VLA models, and automation.

Khám phá VnRobo

Fleet MonitoringROS 2 IntegrationAMR Solutions

Related Posts

Deep Dive
RISE: VLA tự cải thiện bằng imagination
vlaworld-modelreinforcement-learning
research

RISE: VLA tự cải thiện bằng imagination

RISE dùng compositional world model để cho VLA robot học RL trong imagination, giảm tương tác thật nhưng vẫn tăng mạnh kết quả manipulation.

6/6/202613 min read
NT
NEWResearch
WholeBodyVLA: video egocentric + RL loco-manipulation
manipulationvlawhole-bodyPart 5
research

WholeBodyVLA: video egocentric + RL loco-manipulation

WholeBodyVLA học từ video egocentric action-free và nối VLA cấp cao với RL controller cấp thấp cho whole-body loco-manipulation. Phân tích paper ICLR 2026, test trên AgiBot X2.

6/14/20267 min read
NT
Research
Robot Ping-Pong DeepMind: AI Đánh Bóng Bàn Trình Độ Người
ai-perceptionreinforcement-learningmanipulation
research

Robot Ping-Pong DeepMind: AI Đánh Bóng Bàn Trình Độ Người

Cách Google DeepMind xây dựng robot đánh bóng bàn trình độ nghiệp dư bằng Reinforcement Learning, kiến trúc phân cấp LLC-HLC, và kỹ thuật sim-to-real zero-shot.

4/22/202612 min read
NT
VnRobo logo

AI infrastructure for next-generation industrial robots.

Product

  • Features
  • Pricing
  • Knowledge Base
  • Services

Company

  • About Us
  • Blog
  • Contact

Legal

  • Privacy Policy
  • Terms of Service

© 2026 VnRobo. All rights reserved.

Made with♥in Vietnam