
Whole-Body MPC: Real-Time Full-Body Control

Whole-body Model Predictive Control with MuJoCo and iLQR — from contact dynamics theory to real-time deployment on humanoid robots.

Nguyễn Anh Tuấn · February 15, 2026 · 6 min read

Why MPC for Humanoids?

In Part 2, we covered IK, operational space control, and QP-based whole-body control. But all of these methods are reactive: they only look at the current state to decide what to do.

Model Predictive Control (MPC) is different: it looks into the future. MPC solves an optimization problem over a horizon (e.g., 0.5-1 second), calculates the best sequence of actions, executes the first step, then repeats. This is especially important for humanoids because:

  1. Contact planning: MPC can plan when to place feet, when to lift feet
  2. Balance recovery: Look ahead to prevent balance loss
  3. Whole-body coordination: Optimize locomotion and manipulation simultaneously

The most important recent paper is "Whole-Body Model-Predictive Control of Legged Robots with MuJoCo" (arXiv:2503.04613), from John Z. Zhang and Guanya Shi (CMU) together with Taylor Howell and Yuval Tassa (Google DeepMind). It demonstrates that MPC with MuJoCo can run in real time on real humanoids with zero-shot sim-to-real transfer.

Model Predictive Control planning horizon for humanoid locomotion

MPC — Basic Theory

Optimization Problem

MPC solves this problem at each timestep:

minimize: sum_{t=0}^{T-1} [ l(x_t, u_t) ] + l_T(x_T)
subject to:
    x_{t+1} = f(x_t, u_t)       (dynamics)
    x_0 = x_current              (initial state)
    u_min <= u_t <= u_max        (actuator limits)
    h(x_t, u_t) >= 0             (constraints: friction, joint limits, ...)

Where:

  • x_t — state (joint positions, velocities)
  • u_t — control (joint torques or accelerations)
  • f — dynamics model
  • l — cost function (tracking error, energy, ...)
  • T — prediction horizon (lookahead steps)
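To make the notation concrete, here is a minimal sketch that evaluates this cost for a 1D double integrator. All matrices, weights, and names below are illustrative choices, not values from any paper:

```python
import numpy as np

# Toy system: x = [position, velocity], u = [acceleration]
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])   # x_{t+1} = A x_t + B u_t
B = np.array([[0.0], [dt]])

Q = np.diag([1.0, 0.1])    # running state cost l(x_t, u_t)
R = np.array([[0.01]])     # running control cost
Qf = np.diag([10.0, 1.0])  # terminal cost l_T(x_T)

def mpc_cost(x0, U):
    """sum_{t=0}^{T-1} l(x_t, u_t) + l_T(x_T) for a control sequence U."""
    x, cost = x0, 0.0
    for u in U:
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return cost + x @ Qf @ x

x0 = np.array([1.0, 0.0])        # start 1 m from the target
U = [np.array([-0.5])] * 10      # constant braking input
print(mpc_cost(x0, U))
```

Braking toward the origin scores lower than applying no control at all, which is exactly the comparison the optimizer performs over many candidate sequences.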

Receding Horizon

MPC repeats every control cycle (e.g., every 1-10 ms):

  1. Measure current state x_0
  2. Solve optimization problem over horizon T
  3. Execute only u_0 (first step)
  4. Return to step 1

This creates a natural feedback loop — if disturbed (e.g., pushed), MPC automatically adjusts the plan.
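The four steps can be sketched end to end on the same kind of toy double integrator. A deliberately crude "solver" (picking the best of a few candidate constant controls) stands in for a real optimizer here; every name and constant is illustrative:

```python
import numpy as np

dt, T = 0.1, 10            # control period and horizon (illustrative)

def f(x, u):
    """Double-integrator dynamics: x = [pos, vel], u = acceleration."""
    return np.array([x[0] + dt * x[1], x[1] + dt * u])

def horizon_cost(x, u_const):
    """Cost of holding u_const for T steps (crude stand-in for a solver)."""
    cost = 0.0
    for _ in range(T):
        cost += x[0] ** 2 + 0.1 * x[1] ** 2 + 0.01 * u_const ** 2
        x = f(x, u_const)
    return cost

x = np.array([1.0, 0.0])
candidates = np.linspace(-2.0, 2.0, 21)
for _ in range(80):                              # receding-horizon loop
    # 1) "measure" the state, 2) solve over the horizon,
    u0 = min(candidates, key=lambda u: horizon_cost(x, u))
    # 3) execute only the first control, 4) repeat
    x = f(x, u0)

print(x)  # driven to a neighborhood of the origin
```

Note that the plan is recomputed from the measured state each cycle, which is what gives MPC its disturbance rejection: perturb x between cycles and the next solve simply starts from the perturbed state.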

iLQR — Optimization Algorithm for MPC

The iterative Linear Quadratic Regulator (iLQR) is the algorithm most commonly used for MPC on humanoids. It is the nonlinear extension of LQR.

iLQR Algorithm — Iterative Linear Quadratic Regulator

iLQR approximates nonlinear problems by:

  1. Rollout: Simulate a trajectory with the current control sequence
  2. Backward pass: Linearize dynamics around trajectory, solve Riccati for feedback gains
  3. Forward pass: Update trajectory with new gains
  4. Repeat until convergence
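Step 2 is the heart of the method. As a hedged sketch, here is the Riccati backward pass for fixed linear dynamics, followed by the forward pass of step 3. A real iLQR would re-linearize around the rolled-out trajectory at every iteration; the matrices below are illustrative:

```python
import numpy as np

# Linearized dynamics x_{t+1} = A x_t + B u_t (illustrative values)
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.diag([1.0, 0.1])
R = np.array([[0.01]])
T = 50

P = Q.copy()          # terminal value-function Hessian
K_gains = []
for _ in range(T):    # sweep backward in time
    # Feedback gain: K_t = (R + B' P B)^{-1} B' P A
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    # Riccati update: P_t = Q + A' P (A - B K_t)
    P = Q + A.T @ P @ (A - B @ K)
    K_gains.append(K)
K_gains.reverse()     # gains now ordered t = 0 .. T-1

# Forward pass (step 3): apply u_t = -K_t x_t
x = np.array([1.0, 0.0])
for K in K_gains:
    x = A @ x + B @ (-K @ x)
print(x)  # regulated toward the origin
```

The gains K_t are exactly what makes MPC robust between solves: even before the next replan, the time-varying feedback corrects small deviations from the predicted trajectory.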

The beauty of iLQR here is that it naturally supports contact-implicit control: the forward simulation computes contact dynamics automatically (using MuJoCo's convex contact solver), and the backward pass differentiates through this smoothed contact model to obtain feedback gains consistent with the contact forces.

import numpy as np
import mujoco

class iLQR_MPC:
    def __init__(self, model, horizon=50, dt=0.01):
        self.model = model
        self.T = horizon
        self.dt = dt
        self.nx = model.nq + model.nv  # state dimension
        self.nu = model.nu              # control dimension
        
    def rollout(self, x0, U):
        """Forward simulate with control sequence U."""
        data = mujoco.MjData(self.model)
        X = [x0.copy()]
        
        for t in range(self.T):
            data.qpos[:] = X[-1][:self.model.nq]
            data.qvel[:] = X[-1][self.model.nq:]
            data.ctrl[:] = U[t]
            mujoco.mj_step(self.model, data)
            x_next = np.concatenate([data.qpos.copy(), data.qvel.copy()])
            X.append(x_next)
        
        return X
    
    def solve(self, x0, x_ref, U_init, Q, R, Qf, max_iter=10):
        """Run iLQR iterations (backward pass omitted for brevity)."""
        self.x_ref = x_ref
        U = [u.copy() for u in U_init]
        prev_cost = np.inf
        
        for iteration in range(max_iter):
            # Forward rollout with the current control sequence
            X = self.rollout(x0, U)
            
            # Evaluate the trajectory cost
            cost = self._compute_cost(X, U, Q, R, Qf)
            
            if cost >= prev_cost:
                break  # No further improvement
            prev_cost = cost
            
            # A full iLQR would now linearize the dynamics along X,
            # run the Riccati backward pass for feedback gains, and
            # update U with a line-searched forward pass.
        
        return U[0], U  # First control to apply, plus the full sequence
    
    def _compute_cost(self, X, U, Q, R, Qf):
        """Compute total cost."""
        cost = 0
        for t in range(self.T):
            x_err = X[t] - self.x_ref
            cost += x_err.T @ Q @ x_err
            cost += U[t].T @ R @ U[t]
        x_final_err = X[-1] - self.x_ref
        cost += x_final_err.T @ Qf @ x_final_err
        return cost

MuJoCo MPC — Official Implementation

Paper arXiv:2503.04613 uses MuJoCo as the dynamics model directly in the iLQR loop. Key features:

  1. Finite-difference derivatives: Instead of analytical derivatives (which are messy with contact), the MuJoCo simulator itself is differentiated by finite differences. This is fast because individual MuJoCo steps are cheap.
  2. Contact-implicit: No explicit contact schedule needed — MuJoCo automatically handles contact in forward simulation.
  3. Real-time: Achieves ~100 Hz control rate on real hardware (Unitree H1, quadrupeds).
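To illustrate point 1, here is a generic central-difference Jacobian routine with a hand-written pendulum step standing in for mj_step (MuJoCo's Python bindings also ship a helper, mjd_transitionFD, for exactly this). Everything below is a simplified stand-in, not the paper's implementation:

```python
import numpy as np

def step(x, u, dt=0.01):
    """Pendulum step as a stand-in for mj_step: x = [theta, omega]."""
    theta, omega = x
    omega_next = omega + dt * (-9.81 * np.sin(theta) + u[0])
    return np.array([theta + dt * omega_next, omega_next])

def fd_jacobians(x, u, eps=1e-6):
    """Central finite differences: A = df/dx, B = df/du."""
    nx, nu = len(x), len(u)
    A = np.zeros((nx, nx))
    B = np.zeros((nx, nu))
    for i in range(nx):
        dx = np.zeros(nx)
        dx[i] = eps
        A[:, i] = (step(x + dx, u) - step(x - dx, u)) / (2 * eps)
    for i in range(nu):
        du = np.zeros(nu)
        du[i] = eps
        B[:, i] = (step(x, u + du) - step(x, u - du)) / (2 * eps)
    return A, B

A, B = fd_jacobians(np.array([0.3, 0.0]), np.array([0.0]))
print(A, B)
```

Central differences cost two simulator calls per state and control dimension, which is why fast MuJoCo steps make this practical inside a real-time loop.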

Official code: github.com/johnzhang3/mujoco_mpc_deploy

Contact Dynamics — The Biggest Challenge

Contact between the feet and the ground is the hardest part of humanoid dynamics. Each contact is in one of three regimes:

  1. No contact: The foot is in the air; no contact forces act
  2. Sliding contact: The foot slides on the ground under kinetic friction
  3. Sticking contact: The foot sticks to the ground under static friction

Friction Cone

Contact forces are bounded by the friction cone:

||f_tangential|| <= mu * f_normal

In MPC, the friction cone is commonly approximated by a pyramid, turning the quadratic cone constraint into a set of linear inequalities.
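A sketch of that linearization for a 4-sided pyramid (the helper names are mine, and the axis-aligned pyramid circumscribes the cone, so it is slightly permissive; a conservative inner pyramid would scale mu by 1/sqrt(2)):

```python
import numpy as np

mu = 0.7  # friction coefficient (illustrative)

def pyramid_ok(f, mu):
    """4 linear inequalities; f = [fx, fy, fz], fz = normal force."""
    fx, fy, fz = f
    return (fz >= 0
            and fx <= mu * fz and -fx <= mu * fz
            and fy <= mu * fz and -fy <= mu * fz)

def cone_ok(f, mu):
    """Exact quadratic cone: ||f_tangential|| <= mu * f_normal."""
    fx, fy, fz = f
    return fz >= 0 and np.hypot(fx, fy) <= mu * fz

f = np.array([3.0, 2.0, 10.0])
print(pyramid_ok(f, mu), cone_ok(f, mu))  # True True
```

A force like [6, 6, 10] satisfies the pyramid but violates the exact cone, which is the price paid for getting constraints a QP or iLQR solver can handle directly.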

Contact dynamics and friction cone in humanoid locomotion

Real-Time Constraints

To run MPC real-time on humanoids (100-1000 Hz control loop):

1. Reduce Horizon

Instead of 100 steps, use 20-50 steps with larger timesteps. The trade-off: the horizon covers the same wall-clock time with fewer solver variables, but the coarser timestep loses dynamic accuracy.

2. Warm-starting

Use the previous solution as starting point for the current one. Since state doesn't change much between cycles, warm-start helps converge faster (usually 1-3 iLQR iterations instead of 10+).

def mpc_loop(model, x_ref, Q, R, Qf):
    """MPC loop with warm-starting."""
    T = 30  # horizon
    mpc = iLQR_MPC(model, horizon=T)
    U_prev = [np.zeros(model.nu) for _ in range(T)]
    
    while True:
        x_current = get_state()  # Read from sensors
        
        # Warm-start: shift the previous solution forward by one step
        U_init = U_prev[1:] + [U_prev[-1]]
        
        # Solve MPC (only a few iterations, thanks to the warm start)
        u_opt, _ = mpc.solve(x_current, x_ref, U_init, Q, R, Qf, max_iter=3)
        
        # Apply the first control
        apply_torque(u_opt)
        
        # Save for the next warm-start (a full solver would store the
        # updated sequence it returns)
        U_prev = U_init
3. GPU Acceleration

MuJoCo MJX (JAX backend) and MuJoCo Warp (NVIDIA Warp backend) run dynamics on the GPU. This is especially useful for sampling-based MPC (e.g., MPPI), which evaluates thousands of trajectories in parallel.
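As a flavor of sampling-based MPC, here is a minimal MPPI-style update in plain NumPy on a toy double integrator. In practice MJX would evaluate the N rollouts in parallel on the GPU; all names and constants below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
dt, T, N, lam = 0.1, 20, 256, 1.0  # step, horizon, samples, temperature

def rollout_cost(x0, U):
    """Cost of one control sequence on a toy double integrator."""
    x, cost = x0.copy(), 0.0
    for u in U:
        x = np.array([x[0] + dt * x[1], x[1] + dt * u])
        cost += x[0] ** 2 + 0.1 * x[1] ** 2
    return cost

x0 = np.array([1.0, 0.0])
U_mean = np.zeros(T)                 # nominal control sequence
for _ in range(5):                   # a few MPPI iterations
    noise = rng.normal(0.0, 0.5, size=(N, T))
    costs = np.array([rollout_cost(x0, U_mean + n) for n in noise])
    # Softmin weights: low-cost samples dominate the update
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    U_mean = U_mean + w @ noise      # weighted average of perturbations

print(rollout_cost(x0, U_mean))
```

MPPI needs no derivatives at all, only forward rollouts, which is exactly the workload a batched GPU simulator accelerates.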

MPC vs QP vs RL

| Criterion        | QP (Part 2)             | MPC (iLQR)            | RL (Part 4)          |
|------------------|-------------------------|-----------------------|----------------------|
| Look-ahead       | No (reactive)           | Yes (0.5-1 s horizon) | Yes (learned)        |
| Contact planning | Needs explicit schedule | Contact-implicit      | Learned              |
| Computation      | ~1 ms                   | ~5-50 ms              | ~0.1 ms (inference)  |
| Robustness       | Medium                  | Good (replans)        | Best                 |
| Domain transfer  | Good (model-based)      | Good (model-based)    | Needs sim-to-real    |
| Dexterity        | Good                    | Good                  | Best                 |
| Setup effort     | Medium                  | High                  | High (training)      |

Hybrid MPC + RL

The latest trend is to combine the two:

  • RL learns rough policy (robust, fast)
  • MPC refines real-time (precise, optimal)

Example: RL for locomotion, MPC for fine manipulation. Or RL provides warm-start for MPC.
