Bipedal Walking: Controlling Two-Legged Robots with RL

Why Is Bipedal Harder Than Quadruped?

If you have followed this series from Part 1 through Part 5, you have seen that quadruped locomotion has reached impressive results: ANYmal doing parkour, Unitree A1 climbing stairs. Moving to two-legged (bipedal) robots makes everything considerably harder.

Underactuated Dynamics

A quadruped has four ground contact points that form a wide support polygon. The center of mass (CoM) easily stays inside this polygon, so the robot is statically stable.

A bipedal robot has only two legs, and during the swing phase (one foot lifted) only one contact remains. The support polygon shrinks to the area of a single foot. The system is underactuated: the robot does not have enough actuators to directly control all of its degrees of freedom, so it must rely on dynamic balance, much like human walking, which is essentially "controlled falling".

Specific Challenges

Challenge	Quadruped	Bipedal
Support polygon	Wide (4 legs)	Narrow (1-2 feet)
Static stability	Easy (3 feet on the ground)	None (must balance dynamically)
Fall risk	Low	Very high
DoF to control	12 (3 per leg)	10-30+ (hips, knees, ankles, torso)
Impact forces	Spread over 4 legs	Concentrated on 2 legs
Recovery after a fall	Easy (crawl position)	Hard (must stand up from lying down)

Because of these challenges, RL-based bipedal locomotion has progressed more slowly than quadruped locomotion, but it is now accelerating thanks to more compute and better simulation tools.

Main Bipedal Platforms

Cassie (Agility Robotics)

Cassie is the most widely used bipedal research platform, developed by Agility Robotics (a spin-off from Oregon State University). Its design is distinctive:

Compliant leg design: springs in the leg absorb impact and store energy
Underactuated foot: the narrow foot has an actuated toe pitch joint but no ankle roll actuation, so the policy must balance laterally with the hip and knee
10 actuated DoF: 5 motors per leg (3 hip, 1 knee, 1 toe pitch), plus passive spring joints
Weight: ~32 kg
Famous for: running 100 m in 24.73 seconds (Guinness World Record for a bipedal robot, 2022)

Cassie is the testbed for many RL research papers because (a) labs can buy it, (b) it is complex enough to test advanced control, and (c) its sim-to-real gap has been studied thoroughly.

Digit (Agility Robotics)

Digit is the successor to Cassie, adding a torso, arms, and a head:

Full humanoid form: torso + 2 arms + 2 legs
30 DoF: considerably more than Cassie
Purpose: warehouse logistics (Agility works with Amazon)
Manipulation + locomotion: can carry goods while walking
Weight: ~65 kg

Unitree H1 / G1

Unitree H1 is a "low-cost" humanoid robot from Unitree Robotics (China):

19 DoF, 1.8 m tall, ~47 kg
Price: ~$90,000 (far cheaper than Digit)
Speed: 3.3 m/s (reported as a record for full-size humanoids)
Open-source friendly: supports Isaac Gym and MuJoCo

Unitree G1 is a smaller version at 1.27 m, 35 kg, and ~$16,000, which makes it accessible to small labs.

Atlas (Boston Dynamics)

Atlas, from Boston Dynamics, is the best-known humanoid in the world:

Hydraulic → Electric: the new version (2024) is fully electric
28 DoF, extreme agility (backflips, dancing, parkour)
Not for sale: used only for internal research
State-of-the-art hardware, but control is mostly model-based (MPC) and is gradually shifting toward RL

Tesla Optimus

Tesla Optimus (Gen 2) is Tesla's humanoid effort:

28 DoF, 1.73 m, ~57 kg
Tesla-designed actuators: 14 rotary + 14 linear
Purpose: factory automation, general-purpose work
RL training: uses Tesla's large-scale compute infrastructure

Summary Comparison Table

Platform	DoF	Weight	Height	Max Speed	Actuator	Price	RL Research
Cassie	10	32 kg	1.1 m	4.0 m/s	Electric	~$150K	Extensive
Digit	30	65 kg	1.75 m	1.5 m/s	Electric	~$250K	Substantial
Unitree H1	19	47 kg	1.8 m	3.3 m/s	Electric	~$90K	Growing
Unitree G1	23	35 kg	1.27 m	2.0 m/s	Electric	~$16K	New
Atlas (new)	28	~89 kg	1.5 m	2.5 m/s	Electric	N/A	Internal
Tesla Optimus	28	57 kg	1.73 m	1.3 m/s	Electric	N/A	Internal

RL for Bipedal: Reward Design

Reward design for bipedal locomotion is more involved than for quadrupeds. Beyond the basic rewards (forward velocity, energy efficiency), it needs additional terms for balance and naturalness.

Core Reward Components

# Reward function for bipedal walking (pseudocode: each name is a per-step scalar)
reward = (
    # Forward progress
    w_vel * forward_velocity_tracking
    # Balance (the most important part for bipedal)
    + w_balance * upright_reward          # Torso vertical
    + w_com * com_over_support_foot       # CoM over the stance foot
    # Gait quality
    + w_gait * periodic_gait_reward       # Regular stepping rhythm
    + w_sym * symmetry_reward             # Left/right symmetry
    - w_natural * joint_angle_penalty     # Avoid unnatural poses (penalty, so subtracted)
    # Energy
    - w_energy * torque_squared           # Save energy
    - w_jerk * action_jerk_penalty        # Smooth control
    # Safety
    - w_contact * body_contact_penalty    # No ground contact with the body
    - w_fall * fall_penalty               # Do not fall
)
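The weighted sum above can be sketched as runnable code. This is a minimal illustration, not a tuned reward from any paper: the `RewardWeights` class, the `total_reward` helper, and every default weight are assumptions introduced here, and the term names simply mirror the pseudocode.

```python
from dataclasses import dataclass


@dataclass
class RewardWeights:
    # Illustrative values only (assumption); real weights are tuned per robot.
    vel: float = 1.0
    balance: float = 0.5
    com: float = 0.3
    gait: float = 0.3
    sym: float = 0.1
    natural: float = 0.05
    energy: float = 1e-4
    jerk: float = 0.01
    contact: float = 1.0
    fall: float = 10.0


def total_reward(terms: dict, w: RewardWeights) -> float:
    """Combine per-step terms: bonuses are added, penalties are subtracted."""
    bonus = (
        w.vel * terms["forward_velocity_tracking"]
        + w.balance * terms["upright_reward"]
        + w.com * terms["com_over_support_foot"]
        + w.gait * terms["periodic_gait_reward"]
        + w.sym * terms["symmetry_reward"]
    )
    penalty = (
        w.natural * terms["joint_angle_penalty"]
        + w.energy * terms["torque_squared"]
        + w.jerk * terms["action_jerk_penalty"]
        + w.contact * terms["body_contact_penalty"]
        + w.fall * terms["fall_penalty"]
    )
    return bonus - penalty
```

Keeping bonuses and penalties in separate sums makes sign mistakes easier to spot when a new term is added.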

Balance Reward in Detail

Balance is critical. Common approaches:

1. Upright torso reward: keep the torso orientation close to vertical

upright = cos(torso_pitch) * cos(torso_roll)
reward_upright = max(0, upright)  # 1.0 when standing straight

2. CoM projection reward: the ground projection of the center of mass lies inside the support polygon

com_xy = get_com_projection()
support = get_support_polygon()
reward_com = is_inside(com_xy, support)

3. Angular momentum regulation: limit excessive angular momentum (to avoid spinning out)

reward_angmom = -||angular_momentum||^2
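The three balance terms can be sketched in plain Python as follows. The function names are hypothetical, the support polygon is assumed to be convex with vertices listed counter-clockwise, and angles are assumed to be in radians.

```python
import math


def upright_reward(pitch: float, roll: float) -> float:
    """1.0 when the torso is vertical; clipped to 0 when the product is negative."""
    return max(0.0, math.cos(pitch) * math.cos(roll))


def is_inside_convex(point, polygon) -> bool:
    """True if a 2D point lies inside a convex polygon with counter-clockwise vertices."""
    px, py = point
    n = len(polygon)
    for i in range(n):
        ax, ay = polygon[i]
        bx, by = polygon[(i + 1) % n]
        # A negative cross product means the point is to the right of edge a->b,
        # which is outside for a counter-clockwise polygon.
        if (bx - ax) * (py - ay) - (by - ay) * (px - ax) < 0:
            return False
    return True


def com_reward(com_xy, support_polygon) -> float:
    """1.0 if the CoM ground projection is inside the support polygon, else 0.0."""
    return 1.0 if is_inside_convex(com_xy, support_polygon) else 0.0


def angular_momentum_penalty(angular_momentum) -> float:
    """Negative squared norm of the angular momentum vector."""
    return -sum(h * h for h in angular_momentum)
```

For example, with a 20 cm by 10 cm foot as the support polygon, `com_reward((0.1, 0.05), foot)` is 1.0 and `com_reward((0.3, 0.05), foot)` is 0.0. A binary CoM term gives a sparse signal, so a distance-based variant may train more smoothly.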

Periodic Gait Reward

To make a bipedal robot walk naturally (without shuffling), use a periodic reward:

# Phase variable: 0 → 2pi over each gait cycle
phase = (time % gait_period) / gait_period * 2 * pi

# Desired foot contact pattern
left_contact_desired = sin(phase) > 0      # Left foot stance
right_contact_desired = sin(phase + pi) > 0 # Right foot stance (opposite)

# Reward matching desired contact pattern
reward_gait = (
    match(left_foot_contact, left_contact_desired)
    + match(right_foot_contact, right_contact_desired)
)

This produces a natural alternating gait instead of letting RL discover an arbitrary one (which may turn out to be shuffling or hopping).
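A minimal runnable version of the phase-based contact reward is sketched below. The function names and the 0.8 s default gait period are assumptions for illustration; contact flags are assumed to come from the simulator as booleans.

```python
import math


def desired_contacts(t: float, gait_period: float):
    """Return (left, right) desired stance flags at time t (seconds)."""
    phase = (t % gait_period) / gait_period * 2.0 * math.pi
    left = math.sin(phase) > 0.0             # left foot in stance for the first half cycle
    right = math.sin(phase + math.pi) > 0.0  # right foot in stance for the second half
    return left, right


def gait_reward(left_contact: bool, right_contact: bool, t: float,
                gait_period: float = 0.8) -> float:
    """Score 1.0 for each foot whose contact state matches the desired pattern."""
    left_des, right_des = desired_contacts(t, gait_period)
    return float(left_contact == left_des) + float(right_contact == right_des)
```

A quarter of the way through the cycle the left foot should be in stance and the right foot in swing, so `gait_reward(True, False, 0.2)` scores both feet as matching.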

Fall Recovery Policy

An important issue for bipedal robots: the robot will fall. The question is how it gets back up.

Separate Recovery Policy

A common approach is to train two separate policies:

Walking policy: controls normal walking
Recovery policy: stands up from a prone or supine position

A state machine switches between them:

Walking → [fall detected] → Recovery → [standing upright] → Walking

Fall detection is based on torso orientation: if |pitch| > 60° or |roll| > 60°, recovery is triggered.
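The switching logic can be sketched as a small state machine. The 60° fall threshold comes from the text; the 15° threshold for "standing upright" and the names `Mode` and `next_mode` are assumptions made for this example.

```python
import math
from enum import Enum


class Mode(Enum):
    WALKING = "walking"
    RECOVERY = "recovery"


FALL_THRESHOLD = math.radians(60.0)     # from the text
UPRIGHT_THRESHOLD = math.radians(15.0)  # assumed value for "standing upright"


def next_mode(mode: Mode, pitch: float, roll: float) -> Mode:
    """Choose which policy runs next, given torso pitch and roll in radians."""
    tilt = max(abs(pitch), abs(roll))
    if mode is Mode.WALKING and tilt > FALL_THRESHOLD:
        return Mode.RECOVERY
    if mode is Mode.RECOVERY and tilt < UPRIGHT_THRESHOLD:
        return Mode.WALKING
    return mode
```

Using a tighter threshold to leave recovery than to enter it adds hysteresis, so the controller does not flip between policies while the torso is still settling.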

Push Recovery

Instead of waiting for a fall and then standing up, push recovery helps the robot resist perturbations:

Train with random external forces (pushes) in simulation
The robot learns a stepping strategy: take an extra step in the direction of the push to recover
This resembles the human reflex when shoved: a foot steps out automatically to keep balance
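Random pushes during training can be sketched as below. The 50 N maximum, the 500-step interval, and the function names are assumptions for illustration; in practice the returned force would be applied to the torso through the simulator's own API.

```python
import math
import random


def sample_push(rng: random.Random, max_force: float = 50.0):
    """Sample a horizontal push (fx, fy) in newtons with a random direction."""
    magnitude = rng.uniform(0.0, max_force)
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return magnitude * math.cos(angle), magnitude * math.sin(angle)


def maybe_push(rng: random.Random, step: int, interval_steps: int = 500):
    """Return a push every interval_steps simulation steps, otherwise zero force."""
    if step > 0 and step % interval_steps == 0:
        return sample_push(rng)
    return 0.0, 0.0
```

Sampling the direction uniformly exposes the policy to pushes from the front, back, and sides, which is what encourages a general stepping response rather than a fix for one direction.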

Berkeley Humanoid (arXiv:2407.21781)

The paper Berkeley Humanoid: A Research Platform for Learning-based Control introduces a new humanoid platform from UC Berkeley, designed specifically for RL research.

Design Principles

16 kg lightweight: much lighter than Digit (65 kg) or Atlas (89 kg)
Quasi-Direct-Drive (QDD) actuators: a low gear ratio keeps actuator dynamics simple, which narrows the sim-to-real gap
Low cost: manufactured in-house, affordable for university labs

RL Results

Omnidirectional walking: forward, backward, lateral, turning
Dynamic hopping: single-leg and double-leg hops
Outdoor terrain: steep unpaved trails, traversals of hundreds of meters
Simple RL controller: PPO with light domain randomization (heavy randomization is unnecessary because the QDD actuators keep the sim-to-real gap small)

The notable point: Berkeley Humanoid shows that good hardware design can substantially reduce the complexity of RL training. QDD actuators have a nearly linear response that is easy to simulate accurately, so neither an actuator network nor heavy domain randomization is needed.

Humanoid-Gym (arXiv:2404.05695)

Humanoid-Gym is an open-source RL framework for humanoid locomotion, built on NVIDIA Isaac Gym. It is one of the most practical tools available today for getting started with bipedal RL.

Key Features

Isaac Gym backend: massively parallel training (4096+ environments)
Sim-to-sim verification: train in Isaac Gym, verify in MuJoCo before deployment
Pre-configured robots: supports RobotEra XBot-S (1.2 m) and XBot-L (1.65 m), and custom robots can be added
Zero-shot sim-to-real: verified on the real XBot-S and XBot-L

Training Pipeline

1. Define robot URDF/MJCF
2. Configure reward weights (balance, velocity, energy...)
3. Train PPO in Isaac Gym (4096 parallel envs)
4. Verify in MuJoCo (sim-to-sim check)
5. Deploy to real robot (zero-shot)

Terrain Curriculum in Humanoid-Gym

Humanoid-Gym supports diverse terrains:

Flat ground (baseline)
Rough terrain (random height perturbations)
Slopes (up to 15 degrees)
Stairs (ascending + descending)
Discrete stepping stones
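One common way to move through terrains like these is a performance-based curriculum. The sketch below illustrates the general idea and is not Humanoid-Gym's actual implementation: the level names, the 0.8 and 0.5 thresholds, and `update_level` are all assumptions.

```python
TERRAIN_LEVELS = ["flat", "rough", "slopes", "stairs", "stepping_stones"]


def update_level(level: int, distance_walked: float, target_distance: float) -> int:
    """Promote after covering most of the commanded distance, demote when under half."""
    if distance_walked > 0.8 * target_distance:
        return min(level + 1, len(TERRAIN_LEVELS) - 1)
    if distance_walked < 0.5 * target_distance:
        return max(level - 1, 0)
    return level
```

Each environment would call `update_level` at the end of an episode, so robots that walk well move on to harder terrain while struggling ones drop back to an easier level.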

Dynamics randomization includes:

Mass randomization: +/- 15%
Friction: 0.5 - 2.0
Motor strength: +/- 10%
Push perturbation: random forces up to 50N
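The ranges above can be sampled once per episode, as in this sketch. The `EpisodeDynamics` container and `sample_dynamics` function are hypothetical names and not part of Humanoid-Gym; applying the sampled values to the simulator is left out.

```python
import random
from dataclasses import dataclass


@dataclass
class EpisodeDynamics:
    mass_scale: float            # multiplier on nominal link masses
    friction: float              # ground friction coefficient
    motor_strength_scale: float  # multiplier on commanded torques
    max_push_force: float        # upper bound for random pushes, in newtons


def sample_dynamics(rng: random.Random) -> EpisodeDynamics:
    """Draw one set of randomized dynamics using the ranges listed above."""
    return EpisodeDynamics(
        mass_scale=rng.uniform(0.85, 1.15),           # +/- 15%
        friction=rng.uniform(0.5, 2.0),
        motor_strength_scale=rng.uniform(0.9, 1.1),   # +/- 10%
        max_push_force=50.0,
    )
```

Passing a seeded `random.Random` instance keeps the randomization reproducible between training runs.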

Results

RobotEra XBot-L (1.65 m humanoid) achieves:

Stable walking on flat ground and rough terrain
Stair climbing (ascending)
Push recovery (50 N lateral forces)
Zero-shot transfer from Isaac Gym to the real robot

RL for Bipedal: State of the Art

Cassie RL Milestones

Research on Cassie has reached several important milestones:

1. Robust Parameterized Locomotion (arXiv:2103.14295):

Trains a walking policy with variable speed, height, and turning
Domain randomization for sim-to-real
Among the first demonstrations of robust sim-to-real bipedal walking with RL

2. All Common Bipedal Gaits (arXiv:2011.01387):

A single policy for standing, walking, hopping, running, and skipping
Periodic reward composition: the reward function for each gait is designed around a phase variable
Smooth transitions between gaits

3. Versatile Dynamic Locomotion (arXiv:2401.16889):

A general solution for diverse bipedal skills
Walking, running, jumping, and standing in a single framework
Dual-history policy architecture that uses both long-term and short-term input/output history for temporal reasoning

Trends for 2024-2026

Whole-body control: combining locomotion + manipulation (Digit carrying goods, Optimus doing assembly)
Vision-based bipedal: adding cameras for terrain-aware walking (as in quadruped parkour)
Foundation policies: pre-train a general locomotion policy, then fine-tune for specific tasks
Faster sim-to-real: QDD actuators + better simulation narrow the gap

Practical Guide: Getting Started with Bipedal RL

If you want to try bipedal RL:

Accessible hardware

Unitree G1 (~$16K): the best price for a full humanoid
Simulation only: use Humanoid-Gym with MuJoCo humanoid models (free)

Software stack

Humanoid-Gym (recommended): Isaac Gym + PPO, pre-configured for humanoids
legged_gym (ETH Zurich): more flexible, supports both quadruped and bipedal robots
MuJoCo + Stable-Baselines3: lightweight, easy to customize

Tips for beginners

Start with standing balance before walking
The periodic gait reward matters: without it, the robot tends to shuffle
Curriculum: flat → rough → slopes → stairs
A symmetry reward makes the gait more natural
Do sim-to-sim (Isaac Gym → MuJoCo) before sim-to-real

Conclusion

RL-based bipedal locomotion is in a phase of rapid growth. From Cassie (the 100 m record) to Berkeley Humanoid (QDD simplicity) to Humanoid-Gym (open-source tools), the community is quickly closing the gap with quadruped locomotion. Cheaper hardware (Unitree G1/H1) and better simulation tools (Isaac Gym, MuJoCo) are democratizing the field.

The next post, Part 7: Sim-to-Real for Locomotion, goes deeper into transferring a policy from simulation to a real robot, covering actuator networks and best practices.