humanoidsim2realhumanoiddeploymentrlunitree

Sim-to-Real for Humanoids: Deployment Best Practices

Complete pipeline for deploying RL locomotion policies to real humanoid robots — domain randomization, system ID, safety, and Unitree SDK.

Nguyễn Anh Tuấn9 tháng 4, 20269 phút đọc
Sim-to-Real for Humanoids: Deployment Best Practices

The Final Step: From Sim to Real Robot

Throughout the previous 9 posts, we trained humanoids to walk (G1, H1), run (H1 advanced), carry objects (loco-manip), and parkour (post 9) — all in simulation. Now comes the most important and difficult step: deploying to a real robot.

Sim-to-real for humanoids is harder than for quadrupeds (see locomotion sim2real) because: bipedal is inherently unstable, narrow support polygon, higher center of mass, and more DOF. A small mistake can cause the robot to fall and damage hardware worth tens of thousands of dollars.

Humanoid robot deployment

Domain Randomization Checklist for Humanoids

Physical Parameters

class HumanoidDomainRandomization:
    """Domain randomization config for humanoid sim-to-real."""
    
    def __init__(self):
        self.params = {
            # --- Body dynamics ---
            "base_mass_range": [0.85, 1.15],      # ±15% total mass
            "link_mass_range": [0.9, 1.1],         # ±10% per link
            "com_offset_range": [-0.02, 0.02],     # ±2cm CoM offset
            "inertia_range": [0.8, 1.2],           # ±20% inertia
            
            # --- Joint properties ---
            "joint_friction_range": [0.0, 0.05],
            "joint_damping_range": [0.5, 2.0],
            "joint_armature_range": [0.01, 0.05],
            "joint_stiffness_range": [0.0, 5.0],
            
            # --- Actuator ---
            "motor_strength_range": [0.85, 1.15],  # ±15% max torque
            "motor_offset_range": [-0.02, 0.02],   # Position offset (rad)
            
            # --- Ground ---
            "ground_friction_range": [0.4, 1.5],
            "ground_restitution_range": [0.0, 0.3],
            
            # --- Delays & noise ---
            "action_delay_range": [0, 30],         # ms
            "observation_delay_range": [0, 15],    # ms
            "action_noise_std": 0.02,
            "joint_pos_noise_std": 0.01,
            "joint_vel_noise_std": 0.1,
            "imu_noise_std": 0.05,
            "imu_bias_range": [-0.1, 0.1],
            
            # --- External perturbations ---
            "push_force_range": [0, 50],           # N
            "push_interval_range": [5, 15],        # seconds
            "push_duration": 0.1,                  # seconds
            
            # --- Terrain ---
            "terrain_roughness_range": [0.0, 0.03],
        }
    
    def randomize(self, env):
        """Apply randomization to environment."""
        for key, (low, high) in self.params.items():
            value = np.random.uniform(low, high)
            env.set_param(key, value)
        return env

Sim vs Real Gap Comparison

Parameter Sim Default Real Robot Gap Impact
Control latency 0ms 5-20ms Critical Oscillation, instability
Joint friction 0 0.01-0.05 Nm Medium Slow response
Ground friction 1.0 0.3-1.2 High Slip, fall
Motor strength 100% 85-95% Medium Weaker motions
Sensor noise 0 IMU: ±2°, encoder: ±0.5° Medium Jerky control
Link mass CAD exact +5-10% (cables, etc.) Medium Balance offset
Flexibility Rigid Harmonic drives flex High Compliance mismatch

System Identification

Motor Identification

import numpy as np
from scipy.optimize import minimize

class MotorIdentification:
    """Identify real motor parameters from data."""
    
    def __init__(self, joint_name, dt=0.002):
        self.joint_name = joint_name
        self.dt = dt
    
    def collect_data(self, robot, duration=10.0):
        """Collect torque-position-velocity data."""
        data = {'pos': [], 'vel': [], 'torque': [], 'cmd': []}
        
        t = 0
        while t < duration:
            freq = 0.5 + t / duration * 5.0  # Chirp 0.5 → 5.5 Hz
            cmd = 0.3 * np.sin(2 * np.pi * freq * t)
            
            robot.set_joint_position(self.joint_name, cmd)
            state = robot.get_joint_state(self.joint_name)
            
            data['pos'].append(state.position)
            data['vel'].append(state.velocity)
            data['torque'].append(state.effort)
            data['cmd'].append(cmd)
            
            t += self.dt
        
        return {k: np.array(v) for k, v in data.items()}
    
    def identify(self, data):
        """Fit motor model to collected data."""
        pos, vel, torque, cmd = data['pos'], data['vel'], data['torque'], data['cmd']
        
        def motor_model(params, cmd, pos, vel):
            Kp, Kd, friction, damping = params
            return Kp * (cmd - pos) - Kd * vel - friction * np.sign(vel) - damping * vel
        
        def objective(params):
            pred = motor_model(params, cmd, pos, vel)
            return np.mean((pred - torque) ** 2)
        
        x0 = [100.0, 5.0, 0.5, 0.1]
        bounds = [(10, 500), (0.1, 50), (0, 5), (0, 2)]
        result = minimize(objective, x0, bounds=bounds, method='L-BFGS-B')
        
        Kp, Kd, friction, damping = result.x
        print(f"Joint {self.joint_name}: Kp={Kp:.1f}, Kd={Kd:.2f}, "
              f"friction={friction:.3f}, damping={damping:.3f}, MSE={result.fun:.6f}")
        
        return {'Kp': Kp, 'Kd': Kd, 'friction': friction, 'damping': damping}

Real-Time Inference on Onboard Computer

Unitree Robot Compute Options

Robot Onboard Compute GPU Power Inference Speed
G1 Jetson Orin NX 16GB 1024 CUDA 25W ~3ms (MLP)
H1 Jetson Orin NX 16GB 1024 CUDA 25W ~3ms (MLP)
H1-2 Jetson AGX Orin 64GB 2048 CUDA 60W ~1.5ms (MLP)

Optimized Policy Inference

import torch
import time
import numpy as np

class PolicyInference:
    """Optimized policy inference for real-time control."""
    
    def __init__(self, policy_path, device='cuda'):
        self.device = device
        
        checkpoint = torch.load(policy_path, map_location=device)
        self.policy = checkpoint['policy']
        self.policy.eval()
        
        # JIT compile for speed
        dummy_input = torch.zeros(1, self.obs_dim).to(device)
        self.policy_jit = torch.jit.trace(self.policy, dummy_input)
        self.policy_jit = torch.jit.freeze(self.policy_jit)
        
        # Normalization stats
        self.obs_mean = checkpoint['obs_mean'].to(device)
        self.obs_std = checkpoint['obs_std'].to(device)
        
        self.action_scale = checkpoint.get('action_scale', 0.25)
        self.default_dof_pos = checkpoint['default_dof_pos']
        
        # Pre-allocate tensors
        self.obs_tensor = torch.zeros(1, self.obs_dim, device=device)
        
    @torch.no_grad()
    def get_action(self, obs_np):
        """Get action from numpy observation."""
        self.obs_tensor[0] = torch.from_numpy(obs_np).float()
        obs_norm = (self.obs_tensor - self.obs_mean) / (self.obs_std + 1e-8)
        action = self.policy_jit(obs_norm)
        target_pos = self.default_dof_pos + action.squeeze() * self.action_scale
        return target_pos.cpu().numpy()
    
    def benchmark(self, n_iterations=1000):
        """Benchmark inference speed."""
        obs = np.random.randn(self.obs_dim).astype(np.float32)
        
        for _ in range(100):
            self.get_action(obs)
        
        times = []
        for _ in range(n_iterations):
            start = time.perf_counter()
            self.get_action(obs)
            torch.cuda.synchronize()
            times.append(time.perf_counter() - start)
        
        print(f"Inference: {np.mean(times)*1000:.2f} ± {np.std(times)*1000:.2f} ms")
        print(f"Control freq: {1000/(np.mean(times)*1000):.0f} Hz")

Safety Systems

Safety Layer Architecture

class HumanoidSafetyLayer:
    """Safety layer between policy and actuators."""
    
    def __init__(self, robot_config):
        self.config = robot_config
        self.soft_limits = robot_config['soft_joint_limits']
        self.hard_limits = robot_config['hard_joint_limits']
        self.max_joint_vel = robot_config['max_joint_velocity']
        self.max_torque = robot_config['max_torque']
        self.min_base_height = 0.3  # meters
        self.max_tilt_angle = 60.0  # degrees
        self.e_stop_active = False
        
    def check_and_apply(self, target_positions, current_state):
        """Apply safety checks and return safe commands."""
        if self.e_stop_active:
            return self._damping_mode(current_state)
        
        # Fall detection
        if self._detect_fall(current_state):
            print("FALL DETECTED — activating damping mode")
            self.e_stop_active = True
            return self._damping_mode(current_state)
        
        # Joint limit clamping
        safe_positions = np.clip(
            target_positions, self.hard_limits[:, 0], self.hard_limits[:, 1]
        )
        
        # Velocity limiting
        current_pos = current_state['joint_positions']
        delta = safe_positions - current_pos
        max_delta = self.max_joint_vel * self.config['dt']
        safe_positions = current_pos + np.clip(delta, -max_delta, max_delta)
        
        return safe_positions
    
    def _detect_fall(self, state):
        base_height = state['base_position'][2]
        if base_height < self.min_base_height:
            return True
        roll = np.degrees(state['base_euler'][0])
        pitch = np.degrees(state['base_euler'][1])
        if abs(roll) > self.max_tilt_angle or abs(pitch) > self.max_tilt_angle:
            return True
        return False
    
    def _damping_mode(self, state):
        return state['joint_positions']  # Hold current position

Complete Deployment Framework

class HumanoidDeployment:
    """Deploy RL locomotion policy on Unitree humanoid."""
    
    def __init__(self, policy_path, robot_type='g1'):
        self.policy = PolicyInference(policy_path)
        self.safety = HumanoidSafetyLayer(ROBOT_CONFIGS[robot_type])
        self.config = ROBOT_CONFIGS[robot_type]
        self.dt = self.config['control_dt']
        self.action_filter = ExponentialFilter(alpha=0.8)
        self.running = False
        
    def build_observation(self, state):
        """Construct observation vector from robot state."""
        obs = np.concatenate([
            state['base_angular_velocity'],
            state['projected_gravity'],
            self.velocity_command,
            state['joint_positions'] - self.config['default_dof_pos'],
            state['joint_velocities'],
            self.previous_action,
        ])
        return np.clip(obs, -100.0, 100.0)
    
    def control_step(self, state):
        """Single control step: obs → policy → safety → command."""
        obs = self.build_observation(state)
        raw_action = self.policy.get_action(obs)
        filtered_action = self.action_filter.apply(raw_action)
        target_positions = self.config['default_dof_pos'] + filtered_action * self.config['action_scale']
        safe_positions = self.safety.check_and_apply(target_positions, state)
        self.previous_action = filtered_action.copy()
        return safe_positions
    
    def run(self, velocity_command=[0.5, 0.0, 0.0]):
        """Main control loop."""
        self.velocity_command = np.array(velocity_command)
        self.previous_action = np.zeros(self.config['num_dof'])
        self.running = True
        
        print(f"Starting deployment: vel_cmd = {velocity_command}")
        print(f"Control frequency: {1/self.dt:.0f} Hz")
        
        try:
            while self.running:
                loop_start = time.perf_counter()
                state = self.read_robot_state()
                target_pos = self.control_step(state)
                self.send_command(target_pos)
                
                elapsed = time.perf_counter() - loop_start
                sleep_time = self.dt - elapsed
                if sleep_time > 0:
                    time.sleep(sleep_time)
        except KeyboardInterrupt:
            print("\nStopping — entering damping mode")
            self.safety.e_stop_active = True

Sim-to-real deployment testing

Debugging Common Failures

Symptom Probable Cause Fix
Robot falls immediately Observation mismatch Verify joint order, IMU frame, gravity direction
Wobbling/oscillation PD gains too high or latency Reduce gains, add action filter
Drifts sideways IMU bias or asymmetric friction Calibrate IMU, check foot contact
Can't lift feet Friction too high or motor weak Reduce friction, check motor strength
Jerky motions No action smoothing Add EMA filter (alpha=0.7-0.9)
Falls on stairs No terrain adaptation Deploy teacher-student with history

Complete Deployment Checklist

Pre-Deployment

  • System identification done (motor params, friction, latency)
  • Domain randomization ranges cover identified real values
  • Policy trained with history observations
  • Safety layer tested with simulated failures
  • E-stop hardware tested and accessible
  • Soft floor/mats placed in testing area
  • Inference benchmarked on target compute (< 5ms for 50Hz)
  • Battery fully charged (>80%)

First Test Protocol

  • Start with robot on gantry/harness — no ground contact
  • Verify joint tracking with sinusoidal commands
  • Lower to ground with zero velocity command (standing only)
  • Send very slow velocity command (0.1 m/s forward)
  • Gradually increase to target velocity over 20+ minutes
  • Test push recovery with gentle pushes
  • Test on flat ground before any terrain

Monitoring During Deployment

  • Log all observations, actions, torques at full rate
  • Monitor joint temperatures (thermal protection)
  • Monitor battery voltage (shutdown if <20%)
  • Camera recording for post-analysis
  • Always have someone within arm's reach of e-stop

Series Summary

Post Topic Robot
1 Intro & Setup General
2 Reward Engineering General
3 G1 Basic Walking Unitree G1
4 G1 Terrain Unitree G1
5 H1 Basic Unitree H1
6 H1 Advanced Unitree H1
7 H1-2 Unitree H1-2
8 Loco-Manipulation General
9 Parkour General
10 Sim-to-Real (this post) Unitree G1/H1

References

  1. Learning Humanoid Locomotion with Transformers — Radosavovic et al., 2024
  2. Real-World Humanoid Locomotion with RL — Gu et al., 2024
  3. Humanoid-Gym: Reinforcement Learning for Humanoid Robot — Gu et al., 2024
  4. Sim-to-Real Transfer for Bipedal Locomotion — Li et al., 2023
NT

Nguyễn Anh Tuấn

Robotics & AI Engineer. Building VnRobo — sharing knowledge about robot learning, VLA models, and automation.

Khám phá VnRobo

Bài viết liên quan

NEWTutorial
WholeBodyVLA Tutorial: Teleop → Train → Deploy Humanoid
wholebodyvlavlahumanoidloco-manipulationiclr-2026agibot-x2teleoprl

WholeBodyVLA Tutorial: Teleop → Train → Deploy Humanoid

ICLR 2026 — pipeline thực chiến từ thu thập teleop, train unified latent VLA đến deploy whole-body loco-manipulation trên AgiBot X2.

11/5/202611 phút đọc
NEWTutorial
Ark v1.5: Python Framework cho Robot Learning sim-to-real
ark-frameworkpythonimitation-learningactdiffusion-policysim2realros

Ark v1.5: Python Framework cho Robot Learning sim-to-real

Hướng dẫn chi tiết Ark v1.5 — Python framework open-source train ACT + Diffusion Policy seamless giữa sim và robot thật, native ROS, Gym-style API.

8/5/202610 phút đọc
NEWTutorial
Booster Gym ICRA 2026: Train Humanoid T1 Sim-to-Real với Isaac Gym
humanoidisaac-gymreinforcement-learningsim2realbooster-t1icra-2026

Booster Gym ICRA 2026: Train Humanoid T1 Sim-to-Real với Isaac Gym

Hướng dẫn chi tiết Booster Gym — RL framework end-to-end open-source train humanoid Booster T1 walking từ teleop đến deploy thực tế.

6/5/202611 phút đọc