simulationsimulationnvidiaphysics-enginegpusim-to-realisaac-labrobotics

NVIDIA Newton 1.0: GPU Physics 475x Faster Than MJX

Hands-on guide to NVIDIA Newton 1.0 — the fastest open-source GPU physics engine for sim-to-real robotics, 475x faster than MJX.

Nguyễn Anh Tuấn17 tháng 4, 202611 phút đọc
NVIDIA Newton 1.0: GPU Physics 475x Faster Than MJX

If you've ever sat watching a policy training run grind through hours — or even days — NVIDIA Newton 1.0 was built exactly to fix that. This open-source physics engine achieves 475x faster simulation than MJX (Google DeepMind's JAX physics engine) on NVIDIA GPUs, collapsing training time from days to minutes.

This guide takes you from zero to working code: what Newton is, how it's architected, installation, and how to use it for real sim-to-real robotics workflows.

What is Newton 1.0?

Newton is a GPU-accelerated, open-source physics engine developed jointly by NVIDIA, Google DeepMind, and Disney Research, managed under the Linux Foundation for long-term neutrality.

Key distinctions from prior tools:

  • Not a replacement for Isaac Lab or Warp — Newton is built on top of NVIDIA Warp as a standalone physics backend
  • Multi-solver architecture — different solvers for different physics domains, all in one API
  • Differentiable physics — full backpropagation through the simulation graph, enabling gradient-based policy learning
  • Sim-to-real validated — policies trained in Newton have been successfully deployed on real G1 robots

The project was announced at GTC 2026 and immediately made waves across the robotics research community.

Why 475x Faster Than MJX?

To understand the significance, you need to know what MJX is. MJX (MuJoCo eXtended) is Google DeepMind's JAX-based version of MuJoCo — excellent for accuracy and differentiation, but not optimally tuned for GPU parallelism at scale.

Newton achieves its speed advantage through three core mechanisms:

1. Warp Foundation: Newton uses NVIDIA Warp, a CUDA-X acceleration library that lets you write GPU kernels in pure Python without hand-coding CUDA. Warp automatically optimizes memory access patterns and kernel fusion.

2. Batched Simulation: Instead of running one environment at a time, Newton runs thousands of environments in parallel on the GPU. A single RTX 4090 can sustain 50,000+ simultaneous environments.

3. MuJoCo Warp Solver: The core differentiator — a GPU-native reimplementation of the MuJoCo solver, not a simple port. It was rewritten from the ground up to exploit GPU parallelism at the kernel level.

Concrete benchmarks (vs MJX):

Task RTX PRO 6000 Blackwell GeForce RTX 4090
Manipulation 475x faster 313x faster
Locomotion 252x faster 152x faster

If a locomotion policy takes 8 hours to train on MJX, Newton completes it in ~3 minutes on an RTX 4090, or ~2 minutes on the RTX PRO 6000.

GPU cluster processing robot simulation workloads at high speed

Newton Architecture

Newton uses a modular multi-solver architecture, matching the right solver to each physics domain:

Newton Physics Engine
├── MuJoCo Warp Solver    ← rigid body dynamics (primary)
├── XPBD Solver           ← deformable materials (rubber, foam)
├── VBD Solver            ← cloth, cable (vertex block descent)
├── MPM Solver            ← granular (sand, powder)
├── Featherstone Solver   ← alternative rigid body
└── Semi-Implicit Solver  ← numerical integration

Why does multi-solver matter? Real-world robotics isn't just rigid bodies. Cable routing in robot arms, fabric in laundry folding tasks, powder in food processing — each domain needs a specialized solver. Newton unifies all of them under one API.

Differentiable Physics Stack:

RL Policy (PyTorch/JAX)
       ↕ gradients
Newton Physics (Warp)
       ↕ gradients  
CUDA Kernels

Gradient propagation through simulation unlocks:

  • Direct trajectory optimization
  • System identification (learn robot parameters from real data)
  • End-to-end policy gradient computation without surrogate models

Installation

System Requirements

  • GPU: NVIDIA Maxwell or newer (GTX 900 series+)
  • Driver: 545+ (CUDA 12)
  • Python: 3.8+
  • No separate CUDA Toolkit needed — Warp handles it

Method 1: pip (quickest)

pip install "newton[examples]"

# Verify with the first built-in example
python -m newton.examples basic_pendulum
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone repo
git clone https://github.com/newton-physics/newton.git
cd newton

# Run without manually activating a venv
uv run -m newton.examples basic_pendulum --viewer null

Method 3: Development setup (for research)

# MuJoCo pre-release (required for MuJoCo Warp solver)
pip install mujoco --pre -f https://py.mujoco.org/

# Warp and MuJoCo Warp
pip install warp-lang mujoco_warp

# Newton itself
pip install newton-physics

Verify your install

import newton
import warp as wp

print(f"Newton version: {newton.__version__}")
print(f"Warp version: {wp.__version__}")
print(f"CUDA available: {wp.context.runtime.core.is_cuda_available()}")

Hello World: Pendulum Simulation

At its lowest level, Newton revolves around two concepts: Model (robot/environment structure) and Solver (the physics computation engine).

import newton
import warp as wp
import numpy as np

# Initialize Warp runtime
wp.init()

# Build model
builder = newton.ModelBuilder()

# Add a rigid body (pendulum bob)
body = builder.add_body(
    origin=wp.transform([0.0, 1.0, 0.0], wp.quat_identity()),
    armature=0.01
)

# Add shape to body
builder.add_shape_sphere(
    body=body,
    radius=0.05,
    density=1000.0
)

# Add revolute joint at world origin
builder.add_joint_revolute(
    parent=-1,   # -1 = world frame
    child=body,
    axis=wp.vec3(0.0, 0.0, 1.0),  # rotate around Z axis
)

# Finalize model on GPU
model = builder.finalize(device="cuda")

# Create solver
solver = newton.MuJoCoWarpSolver(model)

# Create double-buffered states
state_0 = model.state()
state_1 = model.state()

# Set initial conditions
state_0.joint_q  = wp.array([np.pi / 4], dtype=wp.float32, device="cuda")  # 45 degrees
state_0.joint_qd = wp.array([0.0],       dtype=wp.float32, device="cuda")  # zero velocity

# Run simulation
dt = 1.0 / 60.0
for step in range(240):   # 4 seconds at 60Hz
    solver.step(model, state_0, state_1, dt)
    state_0, state_1 = state_1, state_0  # swap buffers

print(f"Final joint angle: {state_0.joint_q.numpy()[0]:.4f} rad")

Batched Environments: Newton's Real Power

What makes Newton fast isn't running one simulation faster — it's running thousands of simulations simultaneously. This is the core pattern for RL training:

import newton
import warp as wp
import numpy as np

NUM_ENVS = 4096  # 4096 parallel environments

def create_batch_envs(num_envs: int):
    """Create num_envs environments with different initial conditions."""
    builder = newton.ModelBuilder()
    
    for i in range(num_envs):
        # Offset each env so they don't overlap visually
        x_offset = (i % 64) * 2.0
        z_offset = (i // 64) * 2.0
        
        body = builder.add_body(
            origin=wp.transform(
                [x_offset, 1.0, z_offset],
                wp.quat_identity()
            )
        )
        builder.add_shape_sphere(body=body, radius=0.05, density=1000.0)
        builder.add_joint_revolute(parent=-1, child=body, axis=wp.vec3(0, 0, 1))
    
    return builder.finalize(device="cuda")

model = create_batch_envs(NUM_ENVS)
solver = newton.MuJoCoWarpSolver(model)

state_0 = model.state()
state_1 = model.state()

# Random initial angles for all environments
random_angles = np.random.uniform(-np.pi, np.pi, NUM_ENVS).astype(np.float32)
state_0.joint_q = wp.array(random_angles, dtype=wp.float32, device="cuda")

# Step ALL 4096 environments in a single CUDA kernel call
import time
start = time.time()
for _ in range(1000):
    solver.step(model, state_0, state_1, dt=1/60)
    state_0, state_1 = state_1, state_0
elapsed = time.time() - start

print(f"1000 steps × {NUM_ENVS} envs = {NUM_ENVS * 1000:,} sim steps")
print(f"Time: {elapsed:.2f}s")
print(f"Throughput: {NUM_ENVS * 1000 / elapsed:,.0f} steps/sec")

On an RTX 4090, this achieves roughly 50 million simulation steps per second — compared to ~50,000 steps/sec on single-threaded MuJoCo CPU (1000x difference).

Sim-to-Real with Isaac Lab + Newton

Newton integrates natively with NVIDIA Isaac Lab 3.0 as a swappable physics backend. The recommended workflow:

Isaac Lab Environment
    ↕ (physics backend)
Newton Solver
    ↓ (after training)
PhysX validation  ← verify policy works on a different engine
    ↓ (if passing)
Real Robot Deploy

Configure Isaac Lab with Newton backend

# Install Isaac Lab (requires Isaac Sim 4.5+)
pip install isaacsim --pre -f https://pypi.nvidia.com/isaacsim

# Enable Newton backend
pip install "isaacsim[newton]"
from isaaclab.envs import DirectRLEnv, DirectRLEnvCfg
from isaaclab.sim import SimulationCfg

# Swap physics backend to Newton — API unchanged
sim_cfg = SimulationCfg(
    physics_engine="newton",   # instead of "physx"
    dt=1.0 / 200,              # 200Hz physics
    gravity=(0.0, 0.0, -9.81),
    device="cuda:0"
)

Training a quadruped locomotion policy

# Train a G1 humanoid to walk stably (simplified)
from isaaclab_tasks.locomotion.velocity.config.g1 import G1FlatEnvCfg_NEWTON

env_cfg = G1FlatEnvCfg_NEWTON()
env_cfg.scene.num_envs = 4096
env_cfg.sim.physics_engine = "newton"

# With Newton backend:
# - 4096 envs run in parallel on GPU
# - Training reaches ~500M steps/hour
# - Policy converges in ~30 min instead of 8+ hours

Validated sim-to-real results

NVIDIA has verified that Newton-trained policies transfer successfully to hardware:

  • G1 Robot (Unitree): Locomotion policy trained in Newton → deployed on real robot
  • Engine cross-transfer: Policy trained on Newton → tested on PhysX → equivalent performance
  • Reverse transfer: PhysX → Newton also works

This cross-engine validation is critical: it demonstrates that Newton doesn't sacrifice physical accuracy for speed. The simulated dynamics are faithful enough to cross the sim-to-real gap.

Robot arm performing precise manipulation tasks with deformable object handling

Deformable Objects: VBD Solver

A particularly strong feature of Newton is handling deformable objects — something most physics engines either skip or approximate poorly.

import newton
import warp as wp
import numpy as np

# Setup cloth simulation with VBD solver
builder = newton.ModelBuilder()
builder.set_gravity(0.0, -9.81, 0.0)

# Add a 16x16 cloth grid
builder.add_cloth_grid(
    pos=(0.0, 2.0, 0.0),
    rot=wp.quat_from_axis_angle((1.0, 0.0, 0.0), -np.pi * 0.5),
    vel=(0.0, 0.0, 0.0),
    dim_x=16,      # 16x16 grid = 256 vertices
    dim_y=16,
    cell_x=0.05,   # 5cm cell size
    cell_y=0.05,
    mass=0.1,      # 100g total mass
    fix_left=True  # Pin the left edge
)

model = builder.finalize(device="cuda")

# VBD integrator for deformables
integrator = newton.VBDIntegrator(model, iterations=10)
state_0 = model.state()
state_1 = model.state()

# Simulate cloth falling and draping
for _ in range(300):
    integrator.simulate(model, state_0, state_1, dt=1/60)
    state_0, state_1 = state_1, state_0

Real-world applications:

  • Samsung uses Newton's VBD solver to train robots for cable routing in refrigerator assembly lines
  • Skild AI uses it to generate synthetic training data for GPU rack assembly tasks (connector insertion, board placement)

Newton vs Other Physics Engines

MuJoCo MJX Isaac Gym Newton
Backend CPU JAX/GPU PyTorch/GPU Warp/GPU
Speed Baseline ~2-5x ~50-100x ~150-475x
Differentiable
Deformables Limited ✅ (VBD, MPM)
Multi-solver
Open source
Sim-to-real validated Partial
Isaac Lab native Via MJCF No Legacy

When to use Newton:

  • Large-scale RL training (1000+ parallel environments)
  • Need deformable objects (cable, cloth, food materials)
  • Want differentiable physics for trajectory optimization
  • Building sim-to-real pipelines on Isaac Lab

When to stick with CPU MuJoCo:

  • Quick debugging with interactive viewer
  • Research requiring strict reproducibility across machines
  • No GPU available
  • Simple tasks with few environments

Common Pitfalls

1. GPU memory is the real bottleneck:

# Don't over-provision num_envs
# Rule of thumb: ~1MB GPU memory per environment
# RTX 4090 has 24GB → ~24,000 envs safely
# Leave headroom — use 50-70% capacity
NUM_ENVS = 16384  # Safe on RTX 4090
NUM_ENVS = 65536  # Will OOM

2. Timestep stability:

# Newton (like MuJoCo) is sensitive to large dt
# Contact-rich tasks need smaller dt
dt = 1 / 200   # Fine for locomotion
dt = 1 / 500   # Better for contact-rich manipulation
dt = 1 / 60    # Only for simple, low-contact tasks

3. MJCF compatibility: Newton supports loading MJCF files (MuJoCo's XML format), but not 100% of MuJoCo's advanced features. Always test your specific robot model before committing to Newton.

4. Pin dependency versions:

# Newton has tight Warp version dependencies
# Check compatibility matrix at:
# https://github.com/newton-physics/newton#requirements
pip install newton-physics==1.0.0 warp-lang==1.5.0

Conclusion

NVIDIA Newton 1.0 is a genuine leap for sim-to-real robotics — not an incremental improvement but a paradigm shift. Training time collapses from days to minutes, enabling much tighter iteration loops between simulation and hardware testing.

The most significant achievement isn't raw speed — it's that Newton converges two things you previously had to trade off: MuJoCo-level accuracy and GPU-level throughput. With real-world validation on hardware robots and industrial deployments at companies like Samsung and Skild AI, this is no longer a research prototype.

If you're building a sim-to-real pipeline today, Newton + Isaac Lab is the combination worth evaluating first.

Resources

NT

Nguyễn Anh Tuấn

Robotics & AI Engineer. Building VnRobo — sharing knowledge about robot learning, VLA models, and automation.

Bài viết liên quan

NEWTutorial
Fine-Tune GR00T N1.6 với Cosmos Reason 2
grootnvidiavlacosmosfine-tuninghumanoidisaac

Fine-Tune GR00T N1.6 với Cosmos Reason 2

Hướng dẫn chi tiết fine-tune NVIDIA GR00T N1.6 — VLA model 3B tham số kết hợp Cosmos Reason 2 để điều khiển humanoid robot từ ảnh và ngôn ngữ.

15/4/202611 phút đọc
NEWDeep Dive
Task Planning cho Manipulation trên Jetson Edge
jetsonmanipulationtask-planningedge-ainvidiacuTAMPcuMotionTensorRT

Task Planning cho Manipulation trên Jetson Edge

Hướng dẫn triển khai task planning cho robot manipulation trên NVIDIA Jetson AGX Orin 64GB — từ cuTAMP, cuMotion đến VLM inference.

14/4/202615 phút đọc
NEWTutorial
GEAR-SONIC: Whole-Body Control cho Humanoid Robot
humanoidwhole-body-controlnvidiareinforcement-learningmotion-trackingvr-teleoperationisaac-lab

GEAR-SONIC: Whole-Body Control cho Humanoid Robot

Hướng dẫn chi tiết GEAR-SONIC của NVIDIA — huấn luyện whole-body controller cho humanoid robot với dataset BONES-SEED và VR teleoperation.

13/4/202612 phút đọc