Imagine handing a robot a task — "insert this GPU into the motherboard" — and then doing nothing else. No resetting objects, no reviewing logs, no tweaking code. An AI coding agent handles all of that: it watches the robot fail, figures out why, rewrites the control policy, and tries again — until it hits 99% success rate. No human supervision required.
That is exactly what ENPiRE does. Released by NVIDIA GEAR Lab in June 2026, ENPiRE is the first framework to deploy coding agents as fully autonomous robot researchers operating on real hardware — not just simulators.
The Problem: Human Bottleneck in Robot Learning
If you have trained a robot manipulation policy before, you know the cycle: run policy → robot fails → walk over and reset the scene → read the log → tweak the code → run again. Repeat hundreds of times.
The robot is not the bottleneck. The human is:
- Someone must physically reset the scene between trials
- Someone must evaluate which trials succeeded and which failed
- Someone must read the logs and hypothesize what went wrong
- Someone must rewrite training code, adjust reward functions, switch algorithms
This is why robot learning is expensive and slow: every hour of robot runtime typically costs many hours of human engineering time.
ENPiRE flips the question: "What if we handed the entire loop to AI?"
What is ENPiRE? Breaking Down the Acronym
ENPiRE = ENvironment + Policy Improvement + Rollout + Evolution
These four modules form a closed-loop system where coding agents can operate indefinitely without ever needing to ask a human.
Source: NVIDIA GEAR Lab ENPiRE project page
Architecture: Four Modules
1. EN — Environment
The Environment module is the "lab manager" of the system. It handles two critical functions:
Auto-reset: After each robot trial (success or failure), Environment automatically resets the scene to its starting state. For Push-T, this means placing the T-block back at a randomized position. For GPU insertion, it means removing the GPU and laying it flat on the table again.
Verification: Environment also confirms outcomes — using perception to check whether the task was completed. If the GPU is seated in the PCIe slot, it reports success. If it is 5mm off, it reports failure and logs the final state.
Everything here is automatic. There is no human-in-the-loop at this stage.
2. PI — Policy Improvement
This is ENPiRE's "brain." The Policy Improvement module takes in:
- Reward signals from Environment (success/failure + distance metrics)
- Video of recent trials (both successful and failed)
- Failure cases automatically analyzed from logs
And outputs new code — an improved control policy. The agent uses chain-of-thought reasoning to hypothesize failure causes before rewriting code, rather than making random mutations.
Example: if the robot consistently slips when gripping a pin, the agent analyzes the video, identifies that the gripper is closing too early, and rewrites the heuristic to adjust timing.
3. R — Rollout
The Rollout module manages policy execution on real robots. It supports:
- Single rollout: Run one robot, log full state/action/video
- Parallel rollout: Run a fleet of 4 or 8 robots simultaneously to collect data faster
Every trial is logged in detail: joint positions, gripper force, camera feed, timestep, and outcome. These logs feed into the Evolution module.
4. E — Evolution
The Evolution module is the meta-improvement step — instead of only improving the policy, it improves how the policy is improved. Specifically:
- Long-horizon log analysis: Find recurring failure patterns across many iterations
- Literature consultation: The agent can read robotics papers to discover better algorithms
- Infrastructure improvement: Adjust training structure, reward shaping, and data augmentation
This is what separates ENPiRE from a simple auto-tuner: Evolution does not just fix bugs — it rethinks the entire strategy.
Hardware Stack: Three NVIDIA Technologies
ENPiRE is not just software — it is built on NVIDIA's hardware and middleware ecosystem.
YAM Arm
The robot performing manipulation tasks in ENPiRE is the YAM arm (NVIDIA's dexterous manipulator). It is designed for high-precision manipulation — the kind needed for tasks like GPU insertion and pin placement.
SAM 3 — Perception

For the robot to "see" the environment, ENPiRE uses SAM 3 (Segment Anything Model v3). SAM 3 provides real-time object segmentation — the robot knows exactly where each object is and what shape it has. This underlies both the auto-reset verification and the policy inputs.
cuRobo — Motion Planning
cuRobo is NVIDIA's GPU-accelerated motion planning library. When the agent writes policy code, cuRobo handles inverse kinematics and collision-free path planning. Running on GPU, cuRobo computes trajectories in milliseconds — fast enough for real-time control.
The combination of YAM + SAM 3 + cuRobo creates a capable hardware stack for dexterous manipulation — capable enough to tackle GPU insertion, a task that even trained human workers find delicate.
Four Demo Tasks: From Simple to Precise
ENPiRE was tested on four manipulation tasks of increasing difficulty:
1. Push-T (Standard Benchmark)
A classic robot learning benchmark: push a T-shaped block to a target position and orientation. Simple in concept, but requires the robot to learn both position control and rotation strategy.

2. Pin Insertion (Sub-Millimeter Precision)
Insert small pins into precise slots — a task requiring sub-millimeter accuracy. Grip force, approach angle, and gripper closing speed must all be calibrated correctly.

3. Zip-Tie Cutting (Tool Use)
The robot uses scissors to cut a zip-tie — requiring not just grasping but tool-use coordination. The agent must learn to position the scissors correctly on the zip-tie and coordinate cutting force.

4. GPU Insertion (Most Impressive)
The flagship task: insert a GPU card into a PCIe slot on a real motherboard. This requires extremely precise alignment — PCIe slots have a tolerance of only a few millimeters, and the GPU must mate perfectly before being pressed down. A failed attempt risks damaging both the GPU and the board.

Coding Agents Tested: Claude Code, Codex, Kimi
ENPiRE is agent-agnostic — the framework is the infrastructure, and any frontier coding model can be plugged in. Agents tested include:
- Codex (GPT-5.5) — OpenAI's coding model
- Claude Code — Anthropic's coding agent
- Kimi Code — Moonshot AI's coding agent
This means researchers are not locked into one vendor. Swap the agent, keep the framework.
Fleet Scaling: 1, 4, 8 Robots
One of ENPiRE's most compelling experiments is fleet scaling — running multiple robots in parallel to collect data faster.
Results show clear benefits:
- 1 robot: Slower learning, more idle time during resets
- 4 robots: Significantly higher throughput, agent has more data to analyze
- 8 robots: Notable speedup on complex tasks
Two metrics capture the efficiency:
MRU (Mean Robot Utilization) — percentage of time robots are actively running trials (not resetting or idle). Higher MRU means fewer wasted cycles.
MTU (Mean Token Utilization) — efficiency of the agent: how many tokens were used to produce real improvements. Higher MTU means the agent is spending its compute on useful reasoning, not rambling.
Fleet scaling works because ENPiRE's Rollout module supports parallel execution: multiple robots run the same policy, collect data simultaneously, and the Evolution module synthesizes everything before producing the next improvement.
Policy Improvement Methods Explored
A key strength of ENPiRE is that the agent is not locked into one learning algorithm. Methods tested include:
- Heuristic learning — hand-coded rules based on observations
- Behavior Cloning (BC) — learning from successful demonstrations
- Offline RL — training on a collected dataset without additional trials
- Online RL — training in real-time from live feedback during execution
- Tool calling — agent uses external tools (cuRobo, SAM 3) as part of the policy
The agent decides which method to use based on the task and current results — a contrast with traditional frameworks where the researcher must choose an algorithm upfront.
Results: From Zero to 99%
The headline numbers from ENPiRE:
- Pass@8 success rate up to 99% on complex tasks (Push-T, pin insertion, zip-tie, GPU insertion)
- Robots learn from zero (no initial demonstrations) to near-perfect reliability
- Timeline: under 2 hours on some tasks with an 8-robot fleet
"Pass@8" means: out of 8 independent attempts, at least 1 succeeds. This metric originates from software coding benchmarks (like HumanEval for LLMs), and ENPiRE applies it to real robot hardware — an interesting bridge between the software and robotics worlds.
Simulation: RoboCasa
Beyond real robot experiments, ENPiRE was also evaluated in RoboCasa — a popular simulation environment for household manipulation tasks:
- Coffee Setup Mug — placing a mug in a coffee machine
- Open Cabinet — opening a cabinet door
- Open Drawer — opening a drawer
Simulation allows rapid prototyping before moving to hardware, though ENPiRE emphasizes that its primary goal is real-world learning — not sim-to-real transfer.
Why ENPiRE Matters
ENPiRE is significant not because it improves a specific algorithm — it automates the entire research loop. The paradigm shift:
Before ENPiRE:
Human engineer designs policy → tests → debugs → tests again
With ENPiRE:
Coding agent writes policy → runs real robot → analyzes results → improves → repeats (fully autonomous)
This has major implications:
- Scale: More robots + more agents = linear speedup in learning
- Cost: Reduced engineering overhead — humans only need to define the task
- Diversity: Multiple coding agents exploring different strategies in parallel
This points toward autonomous robot research — a future where robot labs are not just places where humans study robots, but where robots study themselves.
To understand the reinforcement learning foundations that underlie ENPiRE's policy improvement methods, see AI Series #1: RL Basics. For the diffusion policy approach the agents may employ, read AI Series #4: Diffusion Policy. And to see what else NVIDIA GEAR Lab has built for humanoid control, check out GEAR-SONIC.
Conclusion
ENPiRE represents a qualitative shift in how robot learning works:
- From "robots learning with human help" → "AI autonomously running robot labs"
- From "researcher chooses algorithm" → "agent tries multiple algorithms"
- From "one robot, one researcher" → "robot fleet, multiple agents in parallel"
As coding agents become more capable and robot hardware becomes more reliable, frameworks like ENPiRE will form the core infrastructure for the next revolution in robotics.
Paper: ENPiRE — Google Drive
Project page: research.nvidia.com/labs/gear/enpire
Institutions: NVIDIA GEAR Lab, Carnegie Mellon University, UC Berkeley



