Why MPC for Humanoids?
In Part 2, we covered IK, operational space control, and QP-based whole-body control. These methods are all reactive — they use only the current state to decide what to do.
Model Predictive Control (MPC) is different: it looks into the future. MPC solves an optimization problem over a horizon (e.g., 0.5-1 second), calculates the best sequence of actions, executes the first step, then repeats. This is especially important for humanoids because:
- Contact planning: MPC can plan when to place feet, when to lift feet
- Balance recovery: Look ahead to prevent balance loss
- Whole-body coordination: Optimize locomotion and manipulation simultaneously
The most important recent paper: "Whole-Body Model-Predictive Control of Legged Robots with MuJoCo" (arXiv:2503.04613) — from John Z. Zhang (CMU), Taylor Howell and Yuval Tassa (Google DeepMind), and Guanya Shi (CMU). The paper demonstrates that MPC with MuJoCo as the dynamics model can run in real time on real humanoids with zero-shot sim-to-real transfer.
MPC — Basic Theory
Optimization Problem
MPC solves this problem at each timestep:
minimize: sum_{t=0}^{T-1} [ l(x_t, u_t) ] + l_T(x_T)
subject to:
x_{t+1} = f(x_t, u_t) (dynamics)
x_0 = x_current (initial state)
u_min <= u_t <= u_max (actuator limits)
h(x_t, u_t) >= 0 (constraints: friction, joint limits, ...)
Where:
- x_t — state (joint positions, velocities)
- u_t — control (joint torques or accelerations)
- f — dynamics model
- l — cost function (tracking error, energy, ...)
- T — prediction horizon (lookahead steps)
Receding Horizon
MPC repeats every control cycle (e.g., every 1-10 ms):
1. Measure the current state x_0
2. Solve the optimization problem over horizon T
3. Execute only u_0 (the first step)
4. Return to step 1
This creates a natural feedback loop — if disturbed (e.g., pushed), MPC automatically adjusts the plan.
iLQR — Optimization Algorithm for MPC
iterative Linear Quadratic Regulator (iLQR) is the most commonly used algorithm for MPC on humanoids. It's the nonlinear version of LQR.
iLQR Algorithm — Iterative Linear Quadratic Regulator
iLQR approximates nonlinear problems by:
- Forward pass: Simulate trajectory with current control
- Backward pass: Linearize dynamics around trajectory, solve Riccati for feedback gains
- Forward pass: Update trajectory with new gains
- Repeat until convergence
The beauty of iLQR is that it naturally handles contact-implicit control — the forward simulation automatically computes contact dynamics (using MuJoCo's convex optimization solver), and the backward pass derives optimal gains that respect these constraints.
import numpy as np
import mujoco

class iLQR_MPC:
    def __init__(self, model, horizon=50, dt=0.01):
        self.model = model
        self.T = horizon
        self.dt = dt
        self.nx = model.nq + model.nv  # state dimension
        self.nu = model.nu             # control dimension

    def rollout(self, x0, U):
        """Forward-simulate the control sequence U from state x0."""
        data = mujoco.MjData(self.model)
        X = [x0.copy()]
        for t in range(self.T):
            data.qpos[:] = X[-1][:self.model.nq]
            data.qvel[:] = X[-1][self.model.nq:]
            data.ctrl[:] = U[t]
            mujoco.mj_step(self.model, data)
            x_next = np.concatenate([data.qpos.copy(), data.qvel.copy()])
            X.append(x_next)
        return X

    def solve(self, x0, x_ref, U_init, Q, R, Qf, max_iter=10):
        """Run the iLQR loop (simplified: the backward pass that
        updates U is omitted; a full implementation linearizes the
        dynamics around the rollout and solves a Riccati recursion
        for the feedback gains)."""
        self.x_ref = x_ref
        U = [u.copy() for u in U_init]
        prev_cost = np.inf
        for iteration in range(max_iter):
            # Forward rollout with the current control sequence
            X = self.rollout(x0, U)
            # Evaluate the total cost
            cost = self._compute_cost(X, U, Q, R, Qf)
            if cost >= prev_cost:
                break  # No improvement — converged
            prev_cost = cost
            # (Backward pass + line-searched forward pass would update U here)
        return U[0], X  # First control and predicted trajectory

    def _compute_cost(self, X, U, Q, R, Qf):
        """Quadratic tracking cost plus control effort."""
        cost = 0.0
        for t in range(self.T):
            x_err = X[t] - self.x_ref
            cost += x_err.T @ Q @ x_err
            cost += U[t].T @ R @ U[t]
        x_final_err = X[-1] - self.x_ref
        cost += x_final_err.T @ Qf @ x_final_err
        return cost
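The backward pass omitted in the simplified solver above can be illustrated for the linear-quadratic case. This is a minimal numpy sketch of the time-varying LQR Riccati recursion, not the full iLQR backward pass (which also carries cost gradients and a feedforward term); `lqr_backward_pass` and the double-integrator model are illustrative assumptions, not code from the paper.

```python
import numpy as np

def lqr_backward_pass(A_list, B_list, Q, R, Qf):
    """Time-varying LQR backward pass (Riccati recursion).

    Given linearized dynamics x_{t+1} = A_t x_t + B_t u_t and
    quadratic costs (Q, R, Qf), return feedback gains K_t such
    that u_t = -K_t x_t is optimal for the linearized problem.
    """
    P = Qf.copy()            # value-function Hessian at the horizon
    K_list = []
    for A, B in zip(reversed(A_list), reversed(B_list)):
        # Gain: K = (R + B'PB)^{-1} B'PA
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: P = Q + A'P(A - BK)
        P = Q + A.T @ P @ (A - B @ K)
        K_list.append(K)
    K_list.reverse()         # gains ordered t = 0 .. T-1
    return K_list

# Toy check: double integrator with dt = 0.1
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2); R = np.array([[0.1]]); Qf = 10 * np.eye(2)
T = 50
K_list = lqr_backward_pass([A] * T, [B] * T, Q, R, Qf)

# Closed loop u = -K x drives the state toward the origin
x = np.array([1.0, 0.0])
for K in K_list:
    x = A @ x - B @ (K @ x)
```

In iLQR these gains are recomputed around each new rollout, which is what makes the method a sequence of LQR problems rather than a single one.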
MuJoCo MPC — Official Implementation
Paper arXiv:2503.04613 uses MuJoCo as the dynamics model directly in the iLQR loop. Key features:
- Finite-difference derivatives: Instead of analytical derivatives (which are complicated by contact), compute finite differences through the MuJoCo simulator. This is practical because individual MuJoCo steps are very cheap.
- Contact-implicit: No explicit contact schedule needed — MuJoCo automatically handles contact in forward simulation.
- Real-time: Achieves ~100 Hz control rate on real hardware (Unitree H1, quadrupeds).
Official code: github.com/johnzhang3/mujoco_mpc_deploy
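The finite-difference idea is simple to sketch. Below, a toy pendulum step stands in for one MuJoCo `mj_step` so the example stays self-contained; in practice MuJoCo ships a built-in routine for this (`mjd_transitionFD`), and the `finite_diff_jacobians` helper here is an illustrative assumption.

```python
import numpy as np

def finite_diff_jacobians(f, x, u, eps=1e-6):
    """Finite-difference the discrete dynamics x' = f(x, u).

    Returns A = df/dx and B = df/du, the Jacobians the iLQR
    backward pass needs. Central differences for accuracy.
    """
    nx, nu = x.size, u.size
    A = np.zeros((nx, nx))
    B = np.zeros((nx, nu))
    for i in range(nx):
        dx = np.zeros(nx); dx[i] = eps
        A[:, i] = (f(x + dx, u) - f(x - dx, u)) / (2 * eps)
    for i in range(nu):
        du = np.zeros(nu); du[i] = eps
        B[:, i] = (f(x, u + du) - f(x, u - du)) / (2 * eps)
    return A, B

# Toy stand-in for one simulator step: pendulum, Euler-integrated
def pendulum_step(x, u, dt=0.01, g=9.81, l=1.0):
    theta, omega = x
    omega_next = omega + dt * (-g / l * np.sin(theta) + u[0])
    return np.array([theta + dt * omega, omega_next])

A, B = finite_diff_jacobians(pendulum_step, np.array([0.1, 0.0]), np.array([0.0]))
```

Each column of A or B costs two simulator steps, so the whole linearization along a 50-step horizon is a few thousand steps — well within budget for a fast simulator.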
Contact Dynamics — The Biggest Challenge
Contact (between the feet and the ground) is the hardest part of humanoid dynamics. Each contact can be in one of three modes:
- No contact: Foot in the air, no contact forces
- Sliding contact: Foot slides on the ground, kinetic friction applies
- Sticking contact: Foot sticks to the ground, static friction applies
Friction Cone
Contact forces are bounded by friction cone:
||f_tangential|| <= mu * f_normal
In MPC, the friction cone is typically approximated by a pyramid so that it becomes a set of linear constraints.
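The pyramid approximation is a few lines of numpy. A minimal sketch (the `friction_pyramid` helper is an illustrative assumption): the cone inequality is replaced by four linear constraints, one per tangential direction.

```python
import numpy as np

def friction_pyramid(mu):
    """Linear (pyramid) approximation of the friction cone.

    Returns A such that A @ f <= 0 enforces |f_x| <= mu * f_z and
    |f_y| <= mu * f_z for a contact force f = [f_x, f_y, f_z].
    """
    return np.array([
        [ 1.0,  0.0, -mu],   #  f_x <= mu * f_z
        [-1.0,  0.0, -mu],   # -f_x <= mu * f_z
        [ 0.0,  1.0, -mu],   #  f_y <= mu * f_z
        [ 0.0, -1.0, -mu],   # -f_y <= mu * f_z
    ])

A = friction_pyramid(mu=0.7)
f_ok  = np.array([0.5, 0.0, 1.0])   # tangential force within the pyramid
f_bad = np.array([1.0, 0.0, 1.0])   # would need mu >= 1.0
print(np.all(A @ f_ok <= 0), np.all(A @ f_bad <= 0))   # → True False
```

Note that this pyramid circumscribes the cone along the axes; a conservative inner approximation scales the tangential bound by mu / sqrt(2) so that diagonal forces also stay inside the true cone.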
Real-Time Constraints
To run MPC in real time on humanoids (100-1000 Hz control loops), several techniques are used:
1. Reduce Horizon
Instead of 100 fine steps, use 20-50 steps with larger timesteps. Trade-off: the same look-ahead time with a much smaller optimization problem, but a coarser, less accurate prediction.
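The trade-off is easy to quantify: look-ahead time equals the number of steps times the planning timestep, so a coarser discretization keeps the horizon while shrinking the problem (numbers below are illustrative).

```python
# Look-ahead (ms) = number of steps x planning timestep (ms)
fine   = 100 * 5    # 100 steps at 5 ms  -> 500 ms look-ahead, 100-step problem
coarse = 25 * 20    # 25 steps at 20 ms  -> same 500 ms, but a 4x smaller problem
print(fine, coarse)  # → 500 500
```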
2. Warm-starting
Use the previous solution as starting point for the current one. Since state doesn't change much between cycles, warm-start helps converge faster (usually 1-3 iLQR iterations instead of 10+).
def mpc_loop(model, x0, x_ref, Q, R, Qf):
    """MPC loop with warm-starting."""
    T = 30  # horizon
    mpc = iLQR_MPC(model, horizon=T)       # build the solver once, reuse every cycle
    U_prev = [np.zeros(model.nu) for _ in range(T)]
    while True:
        x_current = get_state()            # read from sensors (placeholder)
        # Warm-start: shift the previous solution by one step
        U_init = U_prev[1:] + [U_prev[-1]]
        # Solve MPC (only a few iterations thanks to the warm start)
        u_opt, _ = mpc.solve(x_current, x_ref, U_init, Q, R, Qf, max_iter=3)
        # Apply only the first control
        apply_torque(u_opt)                # send to actuators (placeholder)
        # Save for the next warm start
        U_prev = U_init
3. GPU Acceleration
MuJoCo MJX (JAX backend) and MuJoCo Warp (NVIDIA GPU) run the dynamics on the GPU. This is especially useful for sampling-based MPC (e.g., MPPI), where thousands of trajectories are rolled out in parallel.
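Sampling-based MPC maps naturally onto this parallelism. Here is a minimal MPPI sketch in numpy, where batched rollouts stand in for GPU-parallel simulator steps; the double-integrator dynamics, cost weights, and `mppi_step` helper are illustrative assumptions.

```python
import numpy as np

def mppi_step(x0, U_nom, n_samples=256, horizon=20, lam=1.0, sigma=0.5, dt=0.05, rng=None):
    """One MPPI update: sample perturbed control sequences, roll them
    out in a batch, and return the exponentially-weighted average."""
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(0.0, sigma, size=(n_samples, horizon))
    U = U_nom[None, :] + noise                       # (n_samples, horizon)
    # Batched double-integrator rollout: state [pos, vel], goal at the origin
    pos = np.full(n_samples, x0[0])
    vel = np.full(n_samples, x0[1])
    cost = np.zeros(n_samples)
    for t in range(horizon):
        vel = vel + dt * U[:, t]
        pos = pos + dt * vel
        cost += pos**2 + 0.1 * vel**2 + 0.01 * U[:, t]**2
    # Exponential weighting: lower-cost trajectories get higher weight
    w = np.exp(-(cost - cost.min()) / lam)
    w /= w.sum()
    return (w[:, None] * U).sum(axis=0)              # weighted-average sequence

# Receding-horizon use: apply the first control, then re-plan
x = np.array([1.0, 0.0])
U_nom = mppi_step(x, np.zeros(20))
```

The inner rollout loop is exactly the part that MJX or MuJoCo Warp would execute on the GPU, one simulator instance per sampled trajectory.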
MPC vs QP vs RL
| Criterion | QP (Part 2) | MPC (iLQR) | RL (Part 4) |
|---|---|---|---|
| Look-ahead | No (reactive) | Yes (0.5-1s horizon) | Yes (learned) |
| Contact planning | Needs explicit schedule | Contact-implicit | Learned |
| Computation | ~1 ms | ~5-50 ms | ~0.1 ms (inference) |
| Robustness | Medium | Good (replans) | Best |
| Domain transfer | Good (model-based) | Good (model-based) | Needs sim-to-real |
| Dexterity | Good | Good | Best |
| Setup effort | Medium | High | High (training) |
Hybrid MPC + RL
A recent trend is to combine the two:
- RL learns a rough policy (robust, fast to evaluate)
- MPC refines it in real time (precise, near-optimal)
Example: RL for locomotion, MPC for fine manipulation. Or the RL policy provides a warm start for MPC.
Learning Resources
- Code: github.com/johnzhang3/mujoco_mpc_deploy — Real-world MPC deployment
- Paper: arXiv:2503.04613 — Whole-Body MPC with MuJoCo
- MuJoCo MPC: github.com/google-deepmind/mujoco_mpc — Official framework
- Tutorial: REx Lab project page
Next in This Series
- Part 2: Humanoid Control Basics: IK to Task-Space Control
- Part 4: RL for Humanoid: From Humanoid-Gym to Sim2Real — Reinforcement learning approach
- Part 5: Loco-Manipulation: Walking and Manipulating Together
Related Articles
- Simulation for Robotics: MuJoCo vs Isaac Sim vs Gazebo — Simulator comparison
- Getting Started with MuJoCo — Hands-on MuJoCo tutorial
- RL for Bipedal Walking — RL for two-legged robots
- Sim-to-Real Transfer — Moving from simulation to real hardware