Sim-to-Real Transfer: Deploy VLA Policy lên Robot thật

Bài cuối cùng: Từ Simulation đến Thế giới thật

Xuyên suốt 9 bài trước, chúng ta đã xây dựng mọi thứ trong simulation: thu thập data bằng teleop, train single-arm, multi-object, dual-arm, mobile manipulation, và humanoid. Nhưng simulation chỉ là bước đệm — mục tiêu cuối cùng luôn là robot thật hoạt động trong thế giới thật.

Sim-to-real transfer là nơi mà rất nhiều project thất bại. Policy đạt 95% success rate trong sim nhưng 0% trên real robot. Bài này sẽ giúp bạn tránh những cạm bẫy đó với pipeline đã được kiểm chứng.

Robot arm thực tế trong lab

Tại sao Sim-to-Real khó?

Reality Gap

Yếu tố	Simulation	Real World
Physics	Perfect rigid body, known friction	Deformable, unknown parameters
Rendering	Clean synthetic images	Lighting variation, reflections, shadows
Latency	~0ms	10-50ms control loop, 30-100ms vision
Sensor noise	Optional, Gaussian	Complex, correlated, drifting
Actuator	Perfect torque/position	Backlash, friction, thermal drift
Objects	Exact CAD model	Manufacturing tolerance, wear

Nghiên cứu từ Tobin et al. (2017) đã chỉ ra rằng domain randomization là chìa khóa để bridge reality gap, nhưng nó chỉ là một phần của puzzle.

Domain Randomization cho Visual Policies

Visual Randomization

VLA policies dựa rất nhiều vào camera input. Visual domain randomization là critical:

# LeRobot visual domain randomization config
visual_randomization = {
    # Camera randomization
    "camera_position_noise": 0.02,       # ±2cm position noise
    "camera_rotation_noise": 3.0,        # ±3 degrees rotation
    "camera_fov_range": [55, 75],        # Field of view range
    
    # Lighting randomization
    "light_intensity_range": [0.3, 1.5],
    "light_color_range": [0.8, 1.2],     # Per-channel multiplier
    "light_position_noise": 0.5,         # ±50cm
    "num_random_lights": [1, 4],
    
    # Texture randomization
    "table_texture_randomize": True,
    "object_color_randomize": True,
    "background_randomize": True,
    
    # Post-processing
    "brightness_range": [0.7, 1.3],
    "contrast_range": [0.8, 1.2],
    "saturation_range": [0.8, 1.2],
    "noise_std_range": [0.0, 0.02],
}

Physics Randomization

# Physics domain randomization
physics_randomization = {
    # Object properties
    "object_mass_range": [0.8, 1.2],      # ±20% mass
    "object_friction_range": [0.3, 1.5],
    "object_size_range": [0.9, 1.1],      # ±10% size
    
    # Robot properties
    "joint_friction_range": [0.5, 2.0],
    "joint_damping_range": [0.8, 1.2],
    "actuator_strength_range": [0.9, 1.1],
    
    # Control
    "action_delay_range": [0, 3],         # 0-3 frames delay
    "action_noise_std": 0.01,
    "observation_delay_range": [0, 2],
    
    # Environment
    "gravity_noise": 0.05,                # ±5% gravity
    "table_height_range": [-0.02, 0.02],  # ±2cm
}

Paper Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World (Tobin et al., 2017) đã chứng minh rằng chỉ cần randomize đủ rộng, policy sẽ học features invariant giữa sim và real.

Camera Calibration và Real-World Setup

Camera Setup

Camera placement phải match giữa sim và real — đây là lỗi phổ biến nhất:

import numpy as np
import cv2

def calibrate_camera(images_dir, checkerboard_size=(9, 6)):
    """Camera calibration using checkerboard pattern."""
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
    
    objp = np.zeros((checkerboard_size[0] * checkerboard_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:checkerboard_size[0], 
                           0:checkerboard_size[1]].T.reshape(-1, 2)
    
    objpoints, imgpoints = [], []
    
    import glob
    images = glob.glob(f"{images_dir}/*.png")
    
    for fname in images:
        img = cv2.imread(fname)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        ret, corners = cv2.findChessboardCorners(gray, checkerboard_size, None)
        
        if ret:
            objpoints.append(objp)
            corners2 = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
            imgpoints.append(corners2)
    
    ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(
        objpoints, imgpoints, gray.shape[::-1], None, None
    )
    
    print(f"Reprojection error: {ret:.4f}")
    print(f"Camera matrix:\n{mtx}")
    print(f"Distortion coefficients: {dist}")
    
    return mtx, dist

# Calibrate
mtx, dist = calibrate_camera("calibration_images/")

Extrinsic Calibration (Camera-to-Robot)

def hand_eye_calibration(robot_poses, camera_poses):
    """
    AX = XB hand-eye calibration.
    robot_poses: list of 4x4 homogeneous transforms (base to end-effector)
    camera_poses: list of 4x4 transforms (camera to calibration target)
    """
    R_gripper2base = [pose[:3, :3] for pose in robot_poses]
    t_gripper2base = [pose[:3, 3] for pose in robot_poses]
    R_target2cam = [pose[:3, :3] for pose in camera_poses]
    t_target2cam = [pose[:3, 3] for pose in camera_poses]
    
    R_cam2gripper, t_cam2gripper = cv2.calibrateHandEye(
        R_gripper2base, t_gripper2base,
        R_target2cam, t_target2cam,
        method=cv2.CALIB_HAND_EYE_TSAI
    )
    
    T_cam2gripper = np.eye(4)
    T_cam2gripper[:3, :3] = R_cam2gripper
    T_cam2gripper[:3, 3] = t_cam2gripper.flatten()
    
    return T_cam2gripper

Camera calibration setup

Inference Optimization

Quantization cho Real-Time Inference

VLA models (7B+ params) cần optimization để chạy real-time:

from transformers import AutoModelForVision2Seq, BitsAndBytesConfig
import torch

# 8-bit quantization
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
)

model = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b-finetuned",
    quantization_config=quantization_config,
    device_map="auto",
    torch_dtype=torch.float16,
)

# Benchmark inference speed
import time

# Warm up
for _ in range(5):
    with torch.no_grad():
        _ = model.predict_action(dummy_obs)

# Measure
times = []
for _ in range(100):
    start = time.perf_counter()
    with torch.no_grad():
        action = model.predict_action(obs)
    times.append(time.perf_counter() - start)

print(f"Inference: {np.mean(times)*1000:.1f}ms ± {np.std(times)*1000:.1f}ms")
print(f"Control frequency: {1/np.mean(times):.1f} Hz")

Performance Comparison

Method	Model Size	Latency	GPU Memory	Success Rate
FP32	28 GB	450ms	32 GB	100% (baseline)
FP16	14 GB	180ms	16 GB	99.5%
INT8	7 GB	95ms	10 GB	98.2%
INT4 (GPTQ)	3.5 GB	65ms	6 GB	94.1%
TensorRT FP16	14 GB	45ms	16 GB	99.5%

Với robot arm thông thường, control frequency 5-10 Hz là đủ, nên INT8 (~95ms) hoạt động tốt. Nếu cần nhanh hơn, TensorRT là lựa chọn tốt nhất nhưng yêu cầu conversion phức tạp hơn.

ROS 2 Deployment Node

Complete ROS 2 Policy Server

#!/usr/bin/env python3
"""ROS 2 node for VLA policy deployment."""

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image, JointState
from trajectory_msgs.msg import JointTrajectoryPoint
from cv_bridge import CvBridge
import numpy as np
import torch
from lerobot.common.policies.factory import make_policy

class VLAPolicyNode(Node):
    def __init__(self):
        super().__init__('vla_policy_node')
        
        # Parameters
        self.declare_parameter('policy_path', '')
        self.declare_parameter('control_freq', 10.0)
        self.declare_parameter('device', 'cuda')
        
        policy_path = self.get_parameter('policy_path').value
        self.device = self.get_parameter('device').value
        control_freq = self.get_parameter('control_freq').value
        
        # Load policy
        self.get_logger().info(f'Loading policy from {policy_path}...')
        self.policy = make_policy(
            cfg=None,
            pretrained_policy_name_or_path=policy_path,
        )
        self.policy.to(self.device)
        self.policy.eval()
        self.get_logger().info('Policy loaded successfully!')
        
        # State
        self.bridge = CvBridge()
        self.latest_image = None
        self.latest_joints = None
        
        # Subscribers
        self.image_sub = self.create_subscription(
            Image, '/camera/color/image_raw', self.image_callback, 10
        )
        self.joint_sub = self.create_subscription(
            JointState, '/joint_states', self.joint_callback, 10
        )
        
        # Publisher
        self.action_pub = self.create_publisher(
            JointTrajectoryPoint, '/policy_action', 10
        )
        
        # Control loop timer
        period = 1.0 / control_freq
        self.timer = self.create_timer(period, self.control_loop)
        self.get_logger().info(f'Control loop at {control_freq} Hz')
    
    def image_callback(self, msg):
        self.latest_image = self.bridge.imgmsg_to_cv2(msg, 'rgb8')
    
    def joint_callback(self, msg):
        self.latest_joints = np.array(msg.position)
    
    def control_loop(self):
        if self.latest_image is None or self.latest_joints is None:
            return
        
        # Prepare observation
        obs = {
            'observation.images.top': torch.from_numpy(
                self.latest_image
            ).permute(2, 0, 1).unsqueeze(0).float().to(self.device) / 255.0,
            'observation.state': torch.from_numpy(
                self.latest_joints
            ).unsqueeze(0).float().to(self.device),
        }
        
        # Predict action
        with torch.no_grad():
            action = self.policy.select_action(obs)
        
        action_np = action.squeeze().cpu().numpy()
        
        # Publish
        msg = JointTrajectoryPoint()
        msg.positions = action_np.tolist()
        msg.time_from_start.sec = 0
        msg.time_from_start.nanosec = int(1e8)  # 100ms
        self.action_pub.publish(msg)

def main():
    rclpy.init()
    node = VLAPolicyNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()

Debugging Common Failures

Failure Taxonomy

Failure Mode	Symptom	Root Cause	Fix
Frozen robot	Robot không di chuyển	Observation mismatch (wrong camera, wrong joint order)	Verify obs matches training format
Wild oscillation	Robot rung lắc mạnh	Action scale mismatch	Check action normalization stats
Drift	Robot từ từ lệch khỏi target	Latency accumulation	Reduce inference time, add action smoothing
Collision	Robot va vào vật cản	Missing safety limits	Add workspace bounds, torque limits
Grasp fail	Không nắm được vật	Visual domain gap	More visual randomization, finetune on real images

Action Smoothing Filter

class ActionSmoother:
    """Exponential moving average for smooth actions."""
    
    def __init__(self, alpha=0.7, action_dim=7):
        self.alpha = alpha
        self.prev_action = np.zeros(action_dim)
        self.initialized = False
    
    def smooth(self, raw_action):
        if not self.initialized:
            self.prev_action = raw_action.copy()
            self.initialized = True
            return raw_action
        
        smoothed = self.alpha * raw_action + (1 - self.alpha) * self.prev_action
        self.prev_action = smoothed.copy()
        return smoothed

# Usage
smoother = ActionSmoother(alpha=0.7)
smoothed_action = smoother.smooth(raw_policy_action)

Complete Deployment Checklist

Pre-deployment

Camera intrinsic calibration (reprojection error < 0.5px)
Hand-eye calibration (position error < 5mm)
Joint order matches training (verify with manual test)
Action space bounds verified against robot limits
Emergency stop (e-stop) tested and accessible
Workspace boundaries configured in software
Inference latency measured (< 200ms for 5Hz control)
GPU memory sufficient (check with nvidia-smi)

During deployment

Start with slow speed (10% of max velocity)
Run 5 episodes with soft objects first
Monitor joint torques — spikes indicate collision
Log all observations and actions for post-analysis
Gradually increase speed over 20+ successful episodes

Post-deployment

Calculate success rate over 50+ episodes
Analyze failure modes (categorize using table above)
Collect real-world failure cases for fine-tuning
Compare sim vs real performance gap

Real-World Fine-tuning với LeRobot

Nếu sim-trained policy đạt < 80% trên real robot, fine-tune trên real data là bước tiếp theo:

from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
from lerobot.common.policies.diffusion.modeling_diffusion import DiffusionPolicy

# 1. Collect 50-100 real demonstrations
# (dùng teleop giống bài 2, nhưng trên robot thật)

# 2. Load sim-pretrained policy
policy = DiffusionPolicy.from_pretrained("my-sim-trained-policy")

# 3. Load real dataset
real_dataset = LeRobotDataset("my-real-demos")

# 4. Fine-tune with lower learning rate
from lerobot.scripts.train import train
train(
    policy=policy,
    dataset=real_dataset,
    training_steps=10000,        # Ít hơn sim training
    learning_rate=1e-5,          # 10x thấp hơn
    batch_size=32,
    eval_freq=1000,
)

Chiến lược này được gọi là sim-to-real-to-sim — train trong sim, fine-tune trên real, rồi dùng real data để improve sim.

Tổng kết Series

Qua 10 bài, chúng ta đã xây dựng complete pipeline:

Framework deep dive — hiểu LeRobot từ trong ra ngoài
Data collection — thu thập demonstration quality
Single-arm training — ACT vs Diffusion Policy
Multi-object — scaling lên nhiều vật thể
Long-horizon — multi-step task planning
Dual-arm setup — bimanual calibration
Dual-arm tasks — coordination training
Mobile manipulation — base + arms
Humanoid — whole-body control
Sim-to-Real (bài này) — deployment pipeline

LeRobot đang phát triển rất nhanh — theo dõi GitHub repo để cập nhật. Chúc bạn deploy thành công!

Tài liệu tham khảo

Domain Randomization for Transferring Deep Neural Networks — Tobin et al., 2017
Sim-to-Real Robot Learning from Pixels with Progressive Nets — Rusu et al., 2017
Closing the Sim-to-Real Loop — Chebotar et al., 2019
LeRobot: State-of-the-art Machine Learning for Real-World Robotics — Hugging Face, 2024

Bài cuối cùng: Từ Simulation đến Thế giới thật

Robot arm thực tế trong lab

Tại sao Sim-to-Real khó?

Reality Gap

Yếu tố	Simulation	Real World
Physics	Perfect rigid body, known friction	Deformable, unknown parameters
Rendering	Clean synthetic images	Lighting variation, reflections, shadows
Latency	~0ms	10-50ms control loop, 30-100ms vision
Sensor noise	Optional, Gaussian	Complex, correlated, drifting
Actuator	Perfect torque/position	Backlash, friction, thermal drift
Objects	Exact CAD model	Manufacturing tolerance, wear

Nghiên cứu từ Tobin et al. (2017) đã chỉ ra rằng domain randomization là chìa khóa để bridge reality gap, nhưng nó chỉ là một phần của puzzle.

Domain Randomization cho Visual Policies

Visual Randomization

VLA policies dựa rất nhiều vào camera input. Visual domain randomization là critical:

# LeRobot visual domain randomization config
visual_randomization = {
    # Camera randomization
    "camera_position_noise": 0.02,       # ±2cm position noise
    "camera_rotation_noise": 3.0,        # ±3 degrees rotation
    "camera_fov_range": [55, 75],        # Field of view range
    
    # Lighting randomization
    "light_intensity_range": [0.3, 1.5],
    "light_color_range": [0.8, 1.2],     # Per-channel multiplier
    "light_position_noise": 0.5,         # ±50cm
    "num_random_lights": [1, 4],
    
    # Texture randomization
    "table_texture_randomize": True,
    "object_color_randomize": True,
    "background_randomize": True,
    
    # Post-processing
    "brightness_range": [0.7, 1.3],
    "contrast_range": [0.8, 1.2],
    "saturation_range": [0.8, 1.2],
    "noise_std_range": [0.0, 0.02],
}

Physics Randomization

# Physics domain randomization
physics_randomization = {
    # Object properties
    "object_mass_range": [0.8, 1.2],      # ±20% mass
    "object_friction_range": [0.3, 1.5],
    "object_size_range": [0.9, 1.1],      # ±10% size
    
    # Robot properties
    "joint_friction_range": [0.5, 2.0],
    "joint_damping_range": [0.8, 1.2],
    "actuator_strength_range": [0.9, 1.1],
    
    # Control
    "action_delay_range": [0, 3],         # 0-3 frames delay
    "action_noise_std": 0.01,
    "observation_delay_range": [0, 2],
    
    # Environment
    "gravity_noise": 0.05,                # ±5% gravity
    "table_height_range": [-0.02, 0.02],  # ±2cm
}

Camera Calibration và Real-World Setup

Camera Setup

Camera placement phải match giữa sim và real — đây là lỗi phổ biến nhất:

import numpy as np
import cv2

def calibrate_camera(images_dir, checkerboard_size=(9, 6)):
    """Camera calibration using checkerboard pattern."""
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
    
    objp = np.zeros((checkerboard_size[0] * checkerboard_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:checkerboard_size[0], 
                           0:checkerboard_size[1]].T.reshape(-1, 2)
    
    objpoints, imgpoints = [], []
    
    import glob
    images = glob.glob(f"{images_dir}/*.png")
    
    for fname in images:
        img = cv2.imread(fname)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        ret, corners = cv2.findChessboardCorners(gray, checkerboard_size, None)
        
        if ret:
            objpoints.append(objp)
            corners2 = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
            imgpoints.append(corners2)
    
    ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(
        objpoints, imgpoints, gray.shape[::-1], None, None
    )
    
    print(f"Reprojection error: {ret:.4f}")
    print(f"Camera matrix:\n{mtx}")
    print(f"Distortion coefficients: {dist}")
    
    return mtx, dist

# Calibrate
mtx, dist = calibrate_camera("calibration_images/")

Extrinsic Calibration (Camera-to-Robot)

def hand_eye_calibration(robot_poses, camera_poses):
    """
    AX = XB hand-eye calibration.
    robot_poses: list of 4x4 homogeneous transforms (base to end-effector)
    camera_poses: list of 4x4 transforms (camera to calibration target)
    """
    R_gripper2base = [pose[:3, :3] for pose in robot_poses]
    t_gripper2base = [pose[:3, 3] for pose in robot_poses]
    R_target2cam = [pose[:3, :3] for pose in camera_poses]
    t_target2cam = [pose[:3, 3] for pose in camera_poses]
    
    R_cam2gripper, t_cam2gripper = cv2.calibrateHandEye(
        R_gripper2base, t_gripper2base,
        R_target2cam, t_target2cam,
        method=cv2.CALIB_HAND_EYE_TSAI
    )
    
    T_cam2gripper = np.eye(4)
    T_cam2gripper[:3, :3] = R_cam2gripper
    T_cam2gripper[:3, 3] = t_cam2gripper.flatten()
    
    return T_cam2gripper

Camera calibration setup

Inference Optimization

Quantization cho Real-Time Inference

VLA models (7B+ params) cần optimization để chạy real-time:

from transformers import AutoModelForVision2Seq, BitsAndBytesConfig
import torch

# 8-bit quantization
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
)

model = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b-finetuned",
    quantization_config=quantization_config,
    device_map="auto",
    torch_dtype=torch.float16,
)

# Benchmark inference speed
import time

# Warm up
for _ in range(5):
    with torch.no_grad():
        _ = model.predict_action(dummy_obs)

# Measure
times = []
for _ in range(100):
    start = time.perf_counter()
    with torch.no_grad():
        action = model.predict_action(obs)
    times.append(time.perf_counter() - start)

print(f"Inference: {np.mean(times)*1000:.1f}ms ± {np.std(times)*1000:.1f}ms")
print(f"Control frequency: {1/np.mean(times):.1f} Hz")

Performance Comparison

Method	Model Size	Latency	GPU Memory	Success Rate
FP32	28 GB	450ms	32 GB	100% (baseline)
FP16	14 GB	180ms	16 GB	99.5%
INT8	7 GB	95ms	10 GB	98.2%
INT4 (GPTQ)	3.5 GB	65ms	6 GB	94.1%
TensorRT FP16	14 GB	45ms	16 GB	99.5%

ROS 2 Deployment Node

Complete ROS 2 Policy Server

#!/usr/bin/env python3
"""ROS 2 node for VLA policy deployment."""

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image, JointState
from trajectory_msgs.msg import JointTrajectoryPoint
from cv_bridge import CvBridge
import numpy as np
import torch
from lerobot.common.policies.factory import make_policy

class VLAPolicyNode(Node):
    def __init__(self):
        super().__init__('vla_policy_node')
        
        # Parameters
        self.declare_parameter('policy_path', '')
        self.declare_parameter('control_freq', 10.0)
        self.declare_parameter('device', 'cuda')
        
        policy_path = self.get_parameter('policy_path').value
        self.device = self.get_parameter('device').value
        control_freq = self.get_parameter('control_freq').value
        
        # Load policy
        self.get_logger().info(f'Loading policy from {policy_path}...')
        self.policy = make_policy(
            cfg=None,
            pretrained_policy_name_or_path=policy_path,
        )
        self.policy.to(self.device)
        self.policy.eval()
        self.get_logger().info('Policy loaded successfully!')
        
        # State
        self.bridge = CvBridge()
        self.latest_image = None
        self.latest_joints = None
        
        # Subscribers
        self.image_sub = self.create_subscription(
            Image, '/camera/color/image_raw', self.image_callback, 10
        )
        self.joint_sub = self.create_subscription(
            JointState, '/joint_states', self.joint_callback, 10
        )
        
        # Publisher
        self.action_pub = self.create_publisher(
            JointTrajectoryPoint, '/policy_action', 10
        )
        
        # Control loop timer
        period = 1.0 / control_freq
        self.timer = self.create_timer(period, self.control_loop)
        self.get_logger().info(f'Control loop at {control_freq} Hz')
    
    def image_callback(self, msg):
        self.latest_image = self.bridge.imgmsg_to_cv2(msg, 'rgb8')
    
    def joint_callback(self, msg):
        self.latest_joints = np.array(msg.position)
    
    def control_loop(self):
        if self.latest_image is None or self.latest_joints is None:
            return
        
        # Prepare observation
        obs = {
            'observation.images.top': torch.from_numpy(
                self.latest_image
            ).permute(2, 0, 1).unsqueeze(0).float().to(self.device) / 255.0,
            'observation.state': torch.from_numpy(
                self.latest_joints
            ).unsqueeze(0).float().to(self.device),
        }
        
        # Predict action
        with torch.no_grad():
            action = self.policy.select_action(obs)
        
        action_np = action.squeeze().cpu().numpy()
        
        # Publish
        msg = JointTrajectoryPoint()
        msg.positions = action_np.tolist()
        msg.time_from_start.sec = 0
        msg.time_from_start.nanosec = int(1e8)  # 100ms
        self.action_pub.publish(msg)

def main():
    rclpy.init()
    node = VLAPolicyNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()

Debugging Common Failures

Failure Taxonomy

Failure Mode	Symptom	Root Cause	Fix
Frozen robot	Robot không di chuyển	Observation mismatch (wrong camera, wrong joint order)	Verify obs matches training format
Wild oscillation	Robot rung lắc mạnh	Action scale mismatch	Check action normalization stats
Drift	Robot từ từ lệch khỏi target	Latency accumulation	Reduce inference time, add action smoothing
Collision	Robot va vào vật cản	Missing safety limits	Add workspace bounds, torque limits
Grasp fail	Không nắm được vật	Visual domain gap	More visual randomization, finetune on real images

Action Smoothing Filter

class ActionSmoother:
    """Exponential moving average for smooth actions."""
    
    def __init__(self, alpha=0.7, action_dim=7):
        self.alpha = alpha
        self.prev_action = np.zeros(action_dim)
        self.initialized = False
    
    def smooth(self, raw_action):
        if not self.initialized:
            self.prev_action = raw_action.copy()
            self.initialized = True
            return raw_action
        
        smoothed = self.alpha * raw_action + (1 - self.alpha) * self.prev_action
        self.prev_action = smoothed.copy()
        return smoothed

# Usage
smoother = ActionSmoother(alpha=0.7)
smoothed_action = smoother.smooth(raw_policy_action)

Complete Deployment Checklist

Pre-deployment

Camera intrinsic calibration (reprojection error < 0.5px)
Hand-eye calibration (position error < 5mm)
Joint order matches training (verify with manual test)
Action space bounds verified against robot limits
Emergency stop (e-stop) tested and accessible
Workspace boundaries configured in software
Inference latency measured (< 200ms for 5Hz control)
GPU memory sufficient (check with nvidia-smi)

During deployment

Start with slow speed (10% of max velocity)
Run 5 episodes with soft objects first
Monitor joint torques — spikes indicate collision
Log all observations and actions for post-analysis
Gradually increase speed over 20+ successful episodes

Post-deployment

Calculate success rate over 50+ episodes
Analyze failure modes (categorize using table above)
Collect real-world failure cases for fine-tuning
Compare sim vs real performance gap

Real-World Fine-tuning với LeRobot

Nếu sim-trained policy đạt < 80% trên real robot, fine-tune trên real data là bước tiếp theo:

from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
from lerobot.common.policies.diffusion.modeling_diffusion import DiffusionPolicy

# 1. Collect 50-100 real demonstrations
# (dùng teleop giống bài 2, nhưng trên robot thật)

# 2. Load sim-pretrained policy
policy = DiffusionPolicy.from_pretrained("my-sim-trained-policy")

# 3. Load real dataset
real_dataset = LeRobotDataset("my-real-demos")

# 4. Fine-tune with lower learning rate
from lerobot.scripts.train import train
train(
    policy=policy,
    dataset=real_dataset,
    training_steps=10000,        # Ít hơn sim training
    learning_rate=1e-5,          # 10x thấp hơn
    batch_size=32,
    eval_freq=1000,
)

Chiến lược này được gọi là sim-to-real-to-sim — train trong sim, fine-tune trên real, rồi dùng real data để improve sim.

Tổng kết Series

Qua 10 bài, chúng ta đã xây dựng complete pipeline:

Framework deep dive — hiểu LeRobot từ trong ra ngoài
Data collection — thu thập demonstration quality
Single-arm training — ACT vs Diffusion Policy
Multi-object — scaling lên nhiều vật thể
Long-horizon — multi-step task planning
Dual-arm setup — bimanual calibration
Dual-arm tasks — coordination training
Mobile manipulation — base + arms
Humanoid — whole-body control
Sim-to-Real (bài này) — deployment pipeline

LeRobot đang phát triển rất nhanh — theo dõi GitHub repo để cập nhật. Chúc bạn deploy thành công!

Tài liệu tham khảo

Domain Randomization for Transferring Deep Neural Networks — Tobin et al., 2017
Sim-to-Real Robot Learning from Pixels with Progressive Nets — Rusu et al., 2017
Closing the Sim-to-Real Loop — Chebotar et al., 2019
LeRobot: State-of-the-art Machine Learning for Real-World Robotics — Hugging Face, 2024

Bài cuối cùng: Từ Simulation đến Thế giới thật

Tại sao Sim-to-Real khó?

Reality Gap

Domain Randomization cho Visual Policies

Visual Randomization

Physics Randomization

Camera Calibration và Real-World Setup

Camera Setup

Extrinsic Calibration (Camera-to-Robot)

Inference Optimization

Quantization cho Real-Time Inference

Performance Comparison

ROS 2 Deployment Node

Complete ROS 2 Policy Server

Debugging Common Failures

Failure Taxonomy

Action Smoothing Filter

Complete Deployment Checklist

Pre-deployment

During deployment

Post-deployment

Real-World Fine-tuning với LeRobot

Tổng kết Series

Tài liệu tham khảo

Bài viết liên quan

Nguyễn Anh Tuấn

Bài viết liên quan

PEFT/LoRA Fine-tune & Deploy VLA

SARM trong LeRobot: Reward Model cho VLA

LeRobot v0.5: Pi0-FAST + G1 Whole-Body Control

Bài cuối cùng: Từ Simulation đến Thế giới thật

Tại sao Sim-to-Real khó?

Reality Gap

Domain Randomization cho Visual Policies

Visual Randomization

Physics Randomization

Camera Calibration và Real-World Setup

Camera Setup

Extrinsic Calibration (Camera-to-Robot)

Inference Optimization

Quantization cho Real-Time Inference

Performance Comparison

ROS 2 Deployment Node

Complete ROS 2 Policy Server

Debugging Common Failures

Failure Taxonomy

Action Smoothing Filter

Complete Deployment Checklist

Pre-deployment

During deployment

Post-deployment

Real-World Fine-tuning với LeRobot

Tổng kết Series

Tài liệu tham khảo

Bài viết liên quan

Nguyễn Anh Tuấn

Bài viết liên quan

PEFT/LoRA Fine-tune & Deploy VLA

SARM trong LeRobot: Reward Model cho VLA

LeRobot v0.5: Pi0-FAST + G1 Whole-Body Control