Sim-to-Real for Locomotion: Practice and Lessons Learned

Sim-to-Real for Locomotion: More Than Domain Randomization

If you have read the general posts on sim-to-real transfer (Sim-to-Real Transfer: Train in simulation, run in the real world; Domain Randomization), you already know the basic concepts. This post goes deeper into sim-to-real specifically for locomotion: the problems and solutions that only show up when you train a robot to walk, run, and jump.

Locomotion sim-to-real differs from manipulation sim-to-real in three important ways:

More complex contact dynamics: The robot's feet hit the ground constantly, and every step produces different impact forces depending on terrain, speed, and contact angle
Actuator dynamics matter more: Locomotion pushes actuators to their limits (high torque, high speed), which is exactly where motor response is most non-linear
Errors accumulate faster: Every wrong step affects the next one. Within a few seconds, small errors can snowball into a fall

In this post, I analyze each gap and its corresponding solution, and share best practices from leading papers and teams.

Main Sim-to-Real Gaps in Locomotion

Gap 1: Actuator Dynamics

Problem: In simulation, motors are usually modeled as an ideal torque source: you command a torque, and the motor delivers exactly that torque instantly. Reality is very different:

Delayed torque response: 5-20 ms latency from command to actual torque
Velocity-dependent torque: The motor is weaker at high speed (back-EMF effect)
Temperature dependence: As the motor heats up, output torque drops
Friction and backlash: Gears add friction, and backlash causes hysteresis
Torque saturation: Non-linear clipping near maximum torque

This is the largest gap for locomotion, because the robot constantly operates near its actuator limits (jumping, running fast, recovering from perturbations).
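The velocity-dependent limit and torque saturation above are often approximated in simulation by clipping each commanded torque against a linear torque-speed curve. The sketch below assumes an idealized DC-motor envelope; `tau_peak`, `tau_stall` and `omega_no_load` are hypothetical parameters (take the real ones from your motor's datasheet), and the function names are illustrative only. It does not model delay, friction, or temperature.

```python
def torque_limit(omega, tau_peak, tau_stall, omega_no_load):
    """Maximum torque (N*m) available at joint speed omega (rad/s).

    Linear torque-speed curve: tau_stall at standstill, zero at omega_no_load.
    tau_peak caps the low-speed region, standing in for the driver's current limit.
    """
    speed_ratio = min(abs(omega) / omega_no_load, 1.0)
    return min(tau_peak, tau_stall * (1.0 - speed_ratio))


def apply_motor_limits(tau_cmd, omega, tau_peak=30.0, tau_stall=40.0, omega_no_load=20.0):
    """Saturate a commanded torque to what the motor can deliver at this speed.

    Default values are hypothetical. The same limit is used in both directions,
    which ignores the different behaviour when the motor is braking.
    """
    limit = torque_limit(omega, tau_peak, tau_stall, omega_no_load)
    return max(-limit, min(tau_cmd, limit))


# At standstill the current limit dominates; at 10 rad/s back-EMF halves the stall torque
print(apply_motor_limits(50.0, 0.0))   # 30.0
print(apply_motor_limits(25.0, 10.0))  # 20.0
```

Applying this to the policy's torque (or to the PD controller's output) before it reaches the simulator makes fast motions in sim as torque-starved as they are on hardware.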

Gap 2: Terrain Friction and Contact

Problem: Simulation uses simplified contact models (usually Coulomb friction with a fixed coefficient). The real world is more complex:

Varying friction: Wood floor (mu ≈ 0.4), carpet (mu ≈ 0.8), wet floor (mu ≈ 0.2), all rough figures
Anisotropic friction: Friction differs by direction (carpets, grooved surfaces)
Deformable terrain: Soft soil, grass, sand; the robot's feet sink in
Foot geometry: Sim uses simple collision shapes, while the real foot has complex geometry
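Under the Coulomb model, a stance foot stays put only while the tangential contact force stays inside the friction cone, |F_t| <= mu * F_n; beyond that it slips. A minimal sketch of that check, with made-up force values and the rough mu figures from the list above, shows why the same push-off that holds on wood can slip on a wet floor:

```python
import math


def foot_slips(f_x, f_y, f_z, mu):
    """True if the tangential contact force leaves the Coulomb friction cone.

    f_x, f_y: tangential force components (N); f_z: normal force (N),
    positive when the foot presses on the ground.
    """
    tangential = math.hypot(f_x, f_y)
    return tangential > mu * max(f_z, 0.0)


# Hypothetical push-off: 30 N sideways while carrying 100 N of normal load
for floor, mu in [("wet floor", 0.2), ("wood", 0.4), ("carpet", 0.8)]:
    print(floor, "slips" if foot_slips(30.0, 0.0, 100.0, mu) else "holds")
# wet floor slips; wood and carpet hold
```

A policy trained with a single high mu learns push-offs that sit near the edge of a cone the real floor does not provide, which is why friction is one of the first parameters to randomize.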

Gap 3: Sensor Noise and Latency

Locomotion-specific problems:

IMU drift: Gyroscope drift accumulates over time, which is especially problematic for balance
Joint encoder noise: +/- 0.1-0.5 degrees, with errors compounding along the kinematic chain
Foot contact detection: Binary contact sensors are unreliable, so contact is usually estimated by thresholding foot force
Communication latency: 1-5 ms bus delay between sensors and controller
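In sim, these sensor effects are usually injected into the observations the policy sees rather than into the physics. A minimal sketch for one gyroscope axis, assuming white noise plus a random-walk bias (to mimic drift) and a fixed latency of a few control steps; the class name and default values are hypothetical, not taken from any particular framework. Until the buffer fills, it returns the first reading.

```python
import random
from collections import deque


class NoisyDelayedGyro:
    """Corrupts the true angular velocity with white noise, a drifting bias,
    and a fixed latency of `delay_steps` control steps."""

    def __init__(self, noise_std=0.02, bias_walk_std=0.001, delay_steps=2, seed=None):
        self.noise_std = noise_std          # rad/s, white noise per sample
        self.bias_walk_std = bias_walk_std  # rad/s per step, random-walk drift
        self.rng = random.Random(seed)
        self.bias = 0.0
        # Holds the last delay_steps + 1 readings; the oldest one is returned
        self.buffer = deque(maxlen=delay_steps + 1)

    def read(self, true_omega):
        self.bias += self.rng.gauss(0.0, self.bias_walk_std)
        measured = true_omega + self.bias + self.rng.gauss(0.0, self.noise_std)
        self.buffer.append(measured)
        return self.buffer[0]


gyro = NoisyDelayedGyro(seed=0)
observed = [gyro.read(0.5) for _ in range(5)]  # what the policy would see
```

Because the bias keeps wandering, a policy trained this way cannot rely on integrating the gyro signal, which is the behaviour you want on real hardware.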

Gap 4: Unmodeled Dynamics

Things simulation usually ignores:

Cable routing: Cables affect joint movement
Structural flexibility: The mechanical structure is not perfectly rigid
Thermal effects: As motors heat up, performance changes over time
Battery voltage drop: As voltage drops, motor output changes

Figure: The gap between simulation and reality in robot locomotion

Solution 1: Actuator Network

The actuator network is one of the most important techniques for locomotion sim-to-real. It was introduced by Hwangbo et al. in the paper Learning Agile and Dynamic Motor Skills for Legged Robots, on ANYmal.

Concept

Instead of using an ideal motor model in sim, train a neural network to predict the actual motor response:

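Input: [position_error_history, velocity_history, command_history]
       (window of 50-100 timesteps in this example; the original ANYmal network
        used only position error and velocity at t, t-0.01 s and t-0.02 s)
Output: predicted_actual_torque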

Collecting data

Put the real robot on a stand (legs not touching the ground)
Send random joint commands (sinusoidal, step, random)
Record: commanded position, actual position, actual velocity, and actual torque (if a torque sensor is available)
Roughly 30 minutes of data is enough for one robot
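The random joint commands can be generated offline as one long sequence that mixes the three signal types. A sketch, assuming a 400 Hz command rate, a joint range of +/- 0.5 rad around the nominal pose, and 2 s segments; all of these, and the function name, are placeholders for your own robot's values.

```python
import math
import random


def excitation_sequence(duration_s, rate_hz=400, q_min=-0.5, q_max=0.5,
                        segment_s=2.0, seed=0):
    """Joint position commands (rad, relative to the nominal pose) built from
    random segments of sinusoids, steps, and piecewise-random targets."""
    rng = random.Random(seed)
    center, half_range = (q_max + q_min) / 2, (q_max - q_min) / 2
    n_total = int(duration_s * rate_hz)
    n_segment = int(segment_s * rate_hz)
    hold = max(1, int(0.05 * rate_hz))  # random targets change every 50 ms
    commands = []
    while len(commands) < n_total:
        kind = rng.choice(["sine", "step", "random"])
        if kind == "sine":
            freq = rng.uniform(0.5, 10.0)              # Hz
            amp = rng.uniform(0.1, 1.0) * half_range
            seg = [center + amp * math.sin(2 * math.pi * freq * i / rate_hz)
                   for i in range(n_segment)]
        elif kind == "step":
            seg = [rng.uniform(q_min, q_max)] * n_segment
        else:
            seg = []
            while len(seg) < n_segment:
                seg.extend([rng.uniform(q_min, q_max)] * hold)
        # Clip to the joint range and append
        commands.extend(min(q_max, max(q_min, q)) for q in seg[:n_segment])
    return commands[:n_total]


# 30 minutes at 400 Hz
cmds = excitation_sequence(30 * 60)
print(len(cmds))  # 720000
```

Mixing frequencies and abrupt steps matters more than total duration: the network can only learn the actuator behaviour that the commands actually excite.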

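Training the actuator network

import torch
import torch.nn as nn


class ActuatorNet(nn.Module):
    """MLP mapping one joint's recent history to the torque the real actuator produces."""

    def __init__(self, history_length=50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(history_length * 3, 256),  # pos_err, vel, cmd histories, concatenated
            nn.ELU(),
            nn.Linear(256, 128),
            nn.ELU(),
            nn.Linear(128, 1)  # predicted torque
        )

    def forward(self, pos_error_hist, vel_hist, cmd_hist):
        # Each input: (batch, history_length); output: (batch, 1)
        x = torch.cat([pos_error_hist, vel_hist, cmd_hist], dim=-1)
        return self.net(x)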

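Integrating into the simulation

Replace the default motor model with the actuator network:

# In the simulation loop (policy, sim, actuator_net are placeholders).
# Histories have shape (num_joints, history_length), newest sample last.
import torch

for step in range(num_steps):
    action = policy(obs)                        # RL policy output: joint position targets
    q, qd = sim.get_joint_state()               # hypothetical accessor, each (num_joints,)
    # Shift the newest sample in and drop the oldest one
    pos_error_history = torch.cat([pos_error_history[:, 1:], (action - q).unsqueeze(-1)], dim=-1)
    vel_history = torch.cat([vel_history[:, 1:], qd.unsqueeze(-1)], dim=-1)
    action_history = torch.cat([action_history[:, 1:], action.unsqueeze(-1)], dim=-1)
    # Instead of: sim.set_torque(action)
    predicted_torque = actuator_net(pos_error_history, vel_history, action_history)
    sim.set_torque(predicted_torque.squeeze(-1))  # apply predicted torque, (num_joints,)
    sim.step()
    obs = sim.get_observation()                 # hypothetical accessor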

Results

ANYmal with the actuator network:

~500K timesteps/second simulation speed (fast enough for RL)
Significantly smaller sim-to-real gap: The policy transferred successfully without heavy domain randomization
The actuator network runs inference at ~50K Hz, cheaper than the physics simulation

Limitations

Requires torque sensing on the robot (not every robot has it)
The actuator net is specific to each robot: A new robot means re-collecting data and re-training
Does not capture temperature-dependent changes (unless data is collected at several temperatures)

Solution 2: Domain Randomization for Locomotion

Domain randomization (DR) is the "brute force" approach: instead of modeling the real world precisely, randomize simulation parameters so the policy becomes robust to the range of variations it will encounter.

Recommended ranges for locomotion

Typical ranges, based on papers from ETH Zurich (ANYmal), CMU (Unitree), and Berkeley (Cassie):

Parameter	Range	Notes
Body mass	+/- 15-25%	Simulate payload variation
CoM position	+/- 3-5 cm	Manufacturing tolerance
Friction coefficient	0.3 - 2.0	Diverse floor types
Restitution	0.0 - 0.5	Bounce on contact
Motor strength	+/- 10-20%	Motor variation
Joint damping	+/- 30%	Mechanical wear
Action delay	0 - 20 ms	Communication latency
PD gains	+/- 20%	Controller variation
Terrain height noise	+/- 2-5 cm	Uneven ground
Push force	0 - 80 N	External perturbation
Gravity direction	+/- 3 degrees	IMU calibration error
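These ranges are typically sampled once per environment at every reset. A minimal sketch of that step, drawing uniformly and picking one value inside each span the table gives (e.g. +/- 15% for mass, +/- 3 cm for CoM); the parameter names are hypothetical, relative ranges are expressed as multipliers on the nominal value, and terrain noise and pushes are left out because they are applied through the terrain generator and during the episode rather than at reset.

```python
import random

# (low, high) per parameter; "_scale" entries multiply the robot's nominal value
DR_RANGES = {
    "mass_scale": (0.85, 1.15),           # body mass +/- 15%
    "com_offset_m": (-0.03, 0.03),        # CoM shift per axis (x, y, z)
    "friction": (0.3, 2.0),
    "restitution": (0.0, 0.5),
    "motor_strength_scale": (0.9, 1.1),
    "joint_damping_scale": (0.7, 1.3),    # +/- 30%
    "pd_gain_scale": (0.8, 1.2),          # +/- 20%
    "action_delay_s": (0.0, 0.02),
    "gravity_tilt_deg": (-3.0, 3.0),      # stands in for IMU calibration error
}


def sample_episode_params(rng):
    """Draw one set of randomized parameters, e.g. at every environment reset."""
    params = {}
    for name, (low, high) in DR_RANGES.items():
        if name == "com_offset_m":
            params[name] = [rng.uniform(low, high) for _ in range(3)]
        else:
            params[name] = rng.uniform(low, high)
    return params


rng = random.Random(42)
params = sample_episode_params(rng)
nominal_mass_kg = 12.0  # hypothetical robot
print(f"mass this episode: {nominal_mass_kg * params['mass_scale']:.2f} kg")
```

Sampling per environment (rather than once per training run) is what lets thousands of parallel robots cover the whole table at the same time.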

Adaptive Domain Randomization

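Instead of fixed ranges, adaptive DR adjusts the ranges based on performance:

# Adaptive DR in the spirit of Automatic Domain Randomization (ADR), simplified:
# each range is stored as (nominal, half_width) and widened or narrowed around
# its nominal value, so a range like friction 0.3-2.0 grows on both sides instead
# of shifting. evaluate() and policy are placeholders; max_half_width caps each
# range so it cannot grow into physically impossible values.
for epoch in range(num_epochs):
    success_rate = evaluate(policy, dr_ranges)
    for name, (nominal, half_width) in dr_ranges.items():
        if success_rate > 0.8:
            half_width *= 1.1   # expand range (make harder)
        elif success_rate < 0.5:
            half_width *= 0.9   # shrink range (make easier)
        dr_ranges[name] = (nominal, min(half_width, max_half_width[name]))

OpenAI used ADR for Rubik's Cube manipulation. A similar approach works for locomotion but needs careful tuning: expanding too aggressively can create impossible scenarios.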

Solution 3: Terrain Curriculum

Specific to locomotion, a terrain curriculum is a form of domain randomization applied to ground geometry.

Progressive Difficulty

Phase 1 (0-500 epochs):     Flat ground only
Phase 2 (500-1000 epochs):  + Rough terrain (2cm noise)
Phase 3 (1000-1500 epochs): + Slopes (5-15 degrees)
Phase 4 (1500-2000 epochs): + Stairs (10-20cm steps)
Phase 5 (2000+ epochs):     + Mixed (all terrains random)
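The phase table amounts to a lookup from training epoch to the terrain types that may be sampled. A minimal sketch using the same boundaries; the names are illustrative. Note that legged_gym's built-in curriculum works differently, promoting or demoting each robot between difficulty levels based on how far it walked, so this fixed schedule is the simpler alternative.

```python
# (start_epoch, terrain types added at that epoch), from the phase table above
PHASES = [
    (0, ["flat"]),
    (500, ["rough"]),      # 2 cm height noise
    (1000, ["slope"]),     # 5-15 degrees
    (1500, ["stairs"]),    # 10-20 cm steps
    (2000, ["mixed"]),     # all terrains sampled at random
]


def terrain_types_for_epoch(epoch):
    """Terrain types enabled at a given epoch; each phase keeps the earlier ones."""
    enabled = []
    for start, types in PHASES:
        if epoch >= start:
            enabled.extend(types)
    return enabled


print(terrain_types_for_epoch(1200))  # ['flat', 'rough', 'slope']
```

Keeping earlier terrains enabled in later phases helps the policy avoid forgetting flat-ground walking while it learns stairs.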

Important Terrain Types

Height noise terrain: Random height perturbations, simulate uneven ground
Discrete obstacles: Random boxes/barriers on flat ground
Gaps: Missing ground sections (the robot must step over them)
Slopes: Inclines and declines
Stairs: Regular steps, ascending and descending
Stepping stones: Discrete footholds with gaps between them
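Simulators commonly represent these terrains as a heightfield: a 2D grid of ground heights that is turned into a collision mesh. A sketch of two of them as plain nested lists, assuming a 5 cm grid resolution; function names and parameters are illustrative, and the simulator-specific step of loading the grid is omitted.

```python
import random


def rough_terrain(rows, cols, noise_m=0.02, seed=0):
    """Uniform random height perturbations of +/- noise_m (metres)."""
    rng = random.Random(seed)
    return [[rng.uniform(-noise_m, noise_m) for _ in range(cols)] for _ in range(rows)]


def stairs_terrain(rows, cols, step_height_m=0.15, step_depth_m=0.30, resolution_m=0.05):
    """Ascending stairs along the row axis: the height rises by step_height_m
    every step_depth_m of travel."""
    # round() guards against float error, e.g. 0.30 / 0.05 = 5.999...
    cells_per_step = max(1, round(step_depth_m / resolution_m))
    return [[(r // cells_per_step) * step_height_m for _ in range(cols)]
            for r in range(rows)]


stairs = stairs_terrain(rows=60, cols=40)   # 3 m x 2 m patch, 10 steps
print(stairs[-1][0])  # height of the last row (metres)
```

Difficulty levels then map naturally onto these parameters: larger `noise_m` for rough ground, larger `step_height_m` for stairs.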

Success Rates in Practice

Based on published results and numbers reported by the teams (approximate):

Platform	Task	Sim Success	Real Success	Gap (percentage points)
ANYmal-D	Rough terrain walking	98%	~93%	5%
Unitree A1	Stair climbing	95%	~91%	4%
Cassie	Flat walking	99%	~95%	4%
Digit	Warehouse walking	92%	~84%	8%
Unitree H1	Flat walking	97%	~90%	7%
ANYmal-C	Parkour (mixed)	85%	~75%	10%

Observations:

Quadruped > Bipedal: ANYmal-D and A1 transfer better than Digit and H1 (Cassie on flat ground is the exception, with a 4-point gap)
Flat > Complex terrain: The gap grows as terrain difficulty increases
Actuator quality matters: QDD actuators (Berkeley Humanoid) show a smaller gap than geared motors

Step-by-Step: Train → Deploy Locomotion Policy

Step 1: Setup Simulation

# Clone Humanoid-Gym or legged_gym
git clone https://github.com/roboterax/humanoid-gym
cd humanoid-gym
pip install -e .

# Or, for quadrupeds
git clone https://github.com/leggedrobotics/legged_gym

Configure robot URDF, reward weights, DR ranges.

Step 2: Train with a Progressive Curriculum

# Config example
train_cfg = {
    "terrain": {
        "curriculum": True,
        "terrain_types": ["flat", "rough", "slope", "stairs"],
        "difficulty_scale": [0.0, 0.25, 0.5, 0.75, 1.0],
    },
    "domain_randomization": {
        "mass_range": [-0.15, 0.15],        # +/- 15%
        "friction_range": [0.4, 1.8],
        "motor_strength_range": [0.9, 1.1],
        "push_force_range": [0, 50],          # Newtons
        "action_delay": [0, 0.02],            # seconds
    },
    "reward": {
        "forward_vel": 1.0,
        "upright": 0.5,
        "energy": -0.001,
        "action_rate": -0.01,
        "feet_air_time": 0.5,                # Encourage lifting feet
    }
}

Train for 2000-5000 epochs, roughly 2-8 hours on an RTX 4090 (4096 parallel envs).
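The `reward` block in this config is a weighted sum: each term is computed from the simulator state every step, multiplied by its weight, and added up, so negative weights act as penalties. A sketch for a single step, with simplified term definitions that are assumptions (legged_gym, for instance, defines its terms in more detail and also multiplies each weight by the time step); the weights are copied from the config above.

```python
import math


def step_reward(weights, forward_vel, tilt_rad, torques, joint_vels,
                action, prev_action, air_times_at_touchdown):
    """Weighted sum of simplified reward terms for one control step."""
    terms = {
        "forward_vel": forward_vel,                      # m/s along the commanded heading
        "upright": math.cos(tilt_rad),                   # 1.0 when the body is level
        "energy": sum(abs(t * v) for t, v in zip(torques, joint_vels)),  # mechanical power, W
        "action_rate": sum((a - p) ** 2 for a, p in zip(action, prev_action)),
        # For each foot that just touched down: air time minus a 0.5 s target
        "feet_air_time": sum(t - 0.5 for t in air_times_at_touchdown),
    }
    return sum(weights[name] * value for name, value in terms.items())


weights = {"forward_vel": 1.0, "upright": 0.5, "energy": -0.001,
           "action_rate": -0.01, "feet_air_time": 0.5}
r = step_reward(weights, forward_vel=1.0, tilt_rad=0.0,
                torques=[10.0, 10.0], joint_vels=[1.0, -2.0],
                action=[0.1, 0.2], prev_action=[0.0, 0.2],
                air_times_at_touchdown=[0.6])
print(round(r, 4))  # 1.5199
```

Seen this way, the red flags in Step 3 translate directly into which weight to change: a shuffling gait means `feet_air_time` contributes too little, high energy use means the `energy` penalty is too weak.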

Step 3: Evaluate in Sim

Before deploying, evaluate thoroughly in simulation:

# Test scenarios (names are illustrative; evaluate() and policy are placeholders)
test_scenarios = [
    "flat_ground_forward_1.0ms",
    "flat_ground_turning_0.5rads",
    "rough_terrain_forward_0.5ms",
    "slope_15deg_ascending",
    "stairs_15cm_ascending",
    "push_recovery_30N_lateral",
    "push_recovery_50N_lateral",
]

# Run every scenario before failing, so one report shows all weak spots
failures = []
for scenario in test_scenarios:
    success_rate = evaluate(policy, scenario, n_trials=100)
    print(f"{scenario}: {success_rate:.1%}")
    if success_rate <= 0.85:
        failures.append(scenario)
assert not failures, f"FAIL: {failures}"

Red flags in sim evaluation:

Success rate < 90% on flat ground → the policy has not converged
Robot falls when pushed with 30 N → balance reward too low
Shuffling gait → add a periodic gait reward
High energy consumption → energy penalty too low

Step 4: Sim-to-Sim Verification (optional but recommended)

Transfer the policy from Isaac Gym to MuJoCo (or vice versa):

# Load the policy trained in Isaac Gym (load_policy, create_mujoco_env are placeholders)
policy = load_policy("checkpoints/best_policy.pt")

# Test in MuJoCo; observation scaling, joint order, and control dt must match training
mujoco_env = create_mujoco_env(robot_urdf)
for episode in range(50):
    obs = mujoco_env.reset()
    survived = 0
    for step in range(1000):
        action = policy(obs)
        obs, reward, done, info = mujoco_env.step(action)
        survived = step + 1
        if done:
            break
    print(f"Episode {episode}: survived {survived}/1000 steps")

If the policy works well in both simulators, it is much more likely (though not guaranteed) to transfer well to the real robot.

Figure: Sim-to-real deployment workflow for robot locomotion

Step 5: Real-World Deployment

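Deployment checklist:

Safety harness for the robot (especially bipeds)
Flat ground first, complex terrain later
Start with low velocity commands (0.3 m/s)
Monitor motor temperatures
Record data for debugging

# Deploy script (simplified; robot_sdk and load_policy are placeholders for your
# robot's SDK and checkpoint loader)
import numpy as np
import robot_sdk

robot = robot_sdk.connect("192.168.1.100")
policy = load_policy("checkpoints/best_policy.pt")

# Safety limits
MAX_VEL = 0.5                          # m/s, start slow
MAX_TORQUE = 0.8 * robot.max_torque    # 80% of rated torque

command = np.clip(np.array([0.3, 0.0, 0.0]), -MAX_VEL, MAX_VEL)  # vx, vy, yaw rate

try:
    while True:
        obs = robot.get_observation(command)   # placeholder SDK call
        action = policy(obs)
        # Assumes the policy outputs joint torques; if it outputs position
        # targets, enforce MAX_TORQUE on the PD controller's output instead
        action = np.clip(action, -MAX_TORQUE, MAX_TORQUE)

        robot.set_action(action)
        robot.step()

        # Safety check
        if robot.is_fallen():
            break
finally:
    robot.stop()   # always stop the motors, even if an exception is raised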
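`robot.is_fallen()` stands in for whatever check your SDK provides. If it provides none, a basic stop condition can be built from the base orientation estimate and the motor temperatures that the checklist says to monitor. A sketch; the function name is hypothetical, and the 45 degree tilt and 70 °C thresholds are placeholders to take from your robot and motor datasheet.

```python
import math


def should_stop(roll_rad, pitch_rad, motor_temps_c,
                max_tilt_deg=45.0, max_temp_c=70.0):
    """Return a reason string if the controller should stop, else None."""
    if max(abs(roll_rad), abs(pitch_rad)) > math.radians(max_tilt_deg):
        return "fallen: base tilt too large"
    hottest = max(motor_temps_c)
    if hottest > max_temp_c:
        return f"overheat: motor at {hottest:.0f} C"
    return None


print(should_stop(0.05, -0.02, [41.0, 44.5, 39.0]))  # None
print(should_stop(0.0, 0.0, [75.0]))                 # overheat: motor at 75 C
```

Returning a reason string instead of a bare boolean makes the data log more useful when you go back to sim in Step 6.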

Step 6: Iterate

After the first deployment, observe the failure modes and go back to sim:

Robot slips on the floor → widen the friction randomization range
Motors overheat → add a torque penalty, lower the maximum velocity
Robot is unstable on real terrain → add terrain noise in sim
Latency causes instability → widen the action delay randomization

Common Failure Modes and Fixes

Failure 1: Robot falls as soon as it is set down

Cause: The initial state differs between sim and real. Sim usually starts from a perfect standing pose, while the real robot starts slightly off balance.

Fix: Randomize the initial state in sim (body orientation +/- 5 degrees, velocity +/- 0.1 m/s).
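One way to implement this is to sample small roll, pitch, and yaw angles plus a base velocity at every reset, then convert the angles to a quaternion for the simulator. A sketch using the ranges above; the function name is hypothetical. The quaternion is returned in (w, x, y, z) order, which MuJoCo uses; Isaac Gym root states use (x, y, z, w), so reorder as needed.

```python
import math
import random


def random_initial_state(rng, max_tilt_deg=5.0, max_lin_vel=0.1):
    """Sample a base orientation quaternion (w, x, y, z) with roll, pitch and
    yaw within +/- max_tilt_deg, and a base linear velocity within +/- max_lin_vel."""
    roll, pitch, yaw = (math.radians(rng.uniform(-max_tilt_deg, max_tilt_deg))
                        for _ in range(3))
    # Z-Y-X (yaw-pitch-roll) Euler angles to quaternion
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    quat = (
        cr * cp * cy + sr * sp * sy,
        sr * cp * cy - cr * sp * sy,
        cr * sp * cy + sr * cp * sy,
        cr * cp * sy - sr * sp * cy,
    )
    lin_vel = [rng.uniform(-max_lin_vel, max_lin_vel) for _ in range(3)]
    return quat, lin_vel


quat, lin_vel = random_initial_state(random.Random(0))
```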

Failure 2: Jerky, non-smooth gait

Cause: Action delay in the real system is larger than in sim. The policy output changes quickly, but the robot responds slowly.

Fix: (a) Add action delay randomization (0-20 ms), (b) add an action rate penalty to the reward, (c) lower the control frequency.
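Action delay randomization is commonly implemented as a short FIFO of past actions whose length is drawn once per episode. A sketch, assuming a 50 Hz policy (20 ms per control step), so the 0-20 ms range becomes a delay of 0 or 1 steps; the class name is hypothetical, and frameworks that step physics faster than the policy can delay at physics-step resolution for finer granularity.

```python
import random
from collections import deque


class ActionDelay:
    """Returns the action issued `delay_steps` control steps ago."""

    def __init__(self, max_delay_s=0.02, control_dt=0.02, seed=None):
        self.rng = random.Random(seed)
        self.max_steps = round(max_delay_s / control_dt)
        self.reset(initial_action=0.0)

    def reset(self, initial_action):
        # New delay for each episode, between 0 and max_steps (inclusive)
        self.delay_steps = self.rng.randint(0, self.max_steps)
        # Pre-fill so the delay holds from the first step of the episode
        self.buffer = deque([initial_action] * (self.delay_steps + 1),
                            maxlen=self.delay_steps + 1)

    def __call__(self, action):
        self.buffer.append(action)
        return self.buffer[0]


delay = ActionDelay(seed=0)
delay.reset(initial_action=[0.0] * 12)   # at the start of each episode
applied = delay([0.1] * 12)              # what the simulated motors receive this step
```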

Failure 3: Robot slips when turning

Cause: The friction coefficient in sim differs from reality; the policy learned to rely on higher friction.

Fix: Randomize the friction coefficient over a wider range (0.3-2.0). Add a lateral velocity penalty.

Failure 4: Motors overheat after a few minutes

Cause: The policy uses too much torque, and sim does not model thermal limits.

Fix: (a) Increase the energy penalty weight, (b) add a torque RMS constraint, (c) reduce the maximum action magnitude.
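Motor heating scales roughly with current squared, and torque is roughly proportional to current, so a torque RMS constraint compares the root-mean-square torque over a sliding window with the motor's continuous torque rating. A sketch for one joint; the class name, the 500-step window (10 s at 50 Hz), and the 12 N·m rating are hypothetical, and `excess()` can serve as a penalty term in training or as a warning during deployment.

```python
import math
from collections import deque


class TorqueRMSMonitor:
    """Sliding-window RMS torque for one joint, compared to a continuous rating."""

    def __init__(self, window_steps=500, continuous_torque=12.0):
        self.window = deque(maxlen=window_steps)   # e.g. 10 s at 50 Hz
        self.continuous_torque = continuous_torque

    def update(self, torque):
        """Add one torque sample (N*m) and return the RMS over the window."""
        self.window.append(torque)
        return math.sqrt(sum(t * t for t in self.window) / len(self.window))

    def excess(self, rms):
        """How far the RMS torque exceeds the continuous rating (>= 0)."""
        return max(0.0, rms - self.continuous_torque)


monitor = TorqueRMSMonitor()
rms = monitor.update(8.0)
```

Unlike a peak-torque limit, this still allows short bursts (a jump, a push recovery) as long as the average load stays within what the motor can dissipate.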

Failure 5: Policy behaves differently on every deployment

Cause: The policy is sensitive to initial conditions or sensor noise; it is not robust.

Fix: Widen the domain randomization ranges, especially sensor noise and initial state randomization.

Key Papers for Sim-to-Real Locomotion

Essential reading:

Learning Agile and Dynamic Motor Skills for Legged Robots (arXiv:1901.08652) -- Hwangbo et al., 2019. Actuator network for ANYmal; the foundational paper.
Learning Robust, Agile, Natural Legged Locomotion Skills in the Wild (arXiv:2304.10888) -- Adversarial training + teacher-student for natural locomotion on wild terrain.
Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control (arXiv:2401.16889) -- Comprehensive bipedal RL framework on Cassie.
Humanoid-Gym: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer (arXiv:2404.05695) -- Open-source framework, verified zero-shot transfer.
Berkeley Humanoid: A Research Platform for Learning-based Control (arXiv:2407.21781) -- Hardware co-design that makes sim-to-real easier.
Towards Bridging the Gap: Systematic Sim-to-Real Transfer for Diverse Legged Robots (arXiv:2509.06342) -- PACE: joint-space dynamics alignment, an alternative to actuator networks.

Consolidated Best Practices

From the experience of leading teams (ETH RSL, CMU, Berkeley, Unitree):

1. Start simple, add complexity gradually

Flat ground walking first
Add the terrain curriculum once flat walking is stable
Add vision once the proprioceptive policy is robust

2. Actuator modeling > Domain randomization

If you have torque sensors → train an actuator network (best option)
If not → aggressive motor strength randomization (+/- 20%)
QDD actuators reduce the need for both

3. Sim-to-sim before sim-to-real

Train in Isaac Gym → test in MuJoCo (or vice versa)
If transfer between the two sims works → transfer to real is likely to work too

4. Safety first when deploying

Harness for bipeds (always)
Start at low velocity, increase gradually
Monitor motor temperature in real time
Keep a kill switch within reach

5. Log everything

Record all observations, actions, and rewards during real deployment
Compare their distributions with sim data → identify gaps
Use real data to refine sim parameters
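A simple first pass at comparing sim and real distributions is to measure, for each logged channel, how far the real-world mean sits from the sim mean in units of the sim standard deviation. A sketch, assuming each log is a dict from channel name to a list of samples; the function name, channel names, and the 0.5 threshold are illustrative. Flagged channels point at which sim parameters to revisit.

```python
import statistics


def flag_gaps(sim_log, real_log, threshold=0.5):
    """Return {channel: shift} for channels whose real-world mean is far from
    the sim mean, measured in units of the sim standard deviation."""
    flagged = {}
    for channel in sim_log.keys() & real_log.keys():
        sim, real = sim_log[channel], real_log[channel]
        sim_std = statistics.pstdev(sim) or 1e-9   # avoid dividing by zero for constant signals
        shift = (statistics.fmean(real) - statistics.fmean(sim)) / sim_std
        if abs(shift) > threshold:
            flagged[channel] = round(shift, 2)
    return flagged


sim_log = {"joint_vel": [0.0, 1.0, 0.0, 1.0], "base_roll": [0.0, 0.01, -0.01, 0.0]}
real_log = {"joint_vel": [0.0, 1.0, 0.0, 1.0], "base_roll": [0.05, 0.06, 0.04, 0.05]}
print(flag_gaps(sim_log, real_log))  # base_roll is flagged, joint_vel is not
```

A consistent offset in `base_roll` like this one would suggest revisiting IMU calibration or the CoM randomization range rather than the reward.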

Conclusion

Sim-to-real for locomotion is the art of combining several techniques: an actuator network for accurate motor modeling, domain randomization for robustness, a terrain curriculum for progressive difficulty, and careful engineering for deployment. There is no silver bullet; each robot and each task needs a different combination.

Key takeaway: Hardware co-design matters. Berkeley Humanoid, with QDD actuators, needs less sim-to-real effort than robots with complex geared transmissions. If you are designing a new robot, choose actuators that are easy to simulate.

Nguyễn Anh Tuấn
