Walk These Ways: Adaptive Locomotion with a Single Policy

The Problem: One Gait, One Policy?

In earlier posts of this series (Part 1, Part 2, Part 3), we trained locomotion policies. However, each policy usually learns only one way of moving. Want the robot to trot? Train one policy. Want it to gallop? Train another one. Want it to walk slowly, walk fast, or walk sideways? Train yet more policies.

This approach has several problems:

Inefficient: N locomotion styles = N policies = N training runs
Hard gait switching: how do you transition smoothly between 2 policies?
Inflexible: every new gait means training again from scratch

The paper "Walk These Ways: Tuning Robot Control for Generalization with Multiplicity of Behavior" (arXiv:2212.03238) by Gabriel Margolis and Pulkit Agrawal (MIT CSAIL, CoRL 2022) tackles exactly this problem.

Many different locomotion styles from a single policy

Key idea: Multiplicity of Behavior (MoB)

Core insight

When you train locomotion with RL, there are many ways to solve the same task. For example, to walk straight ahead at 1 m/s, the robot could use any of these gaits:

Trot (the 2 diagonal legs move together)
Walk (one leg at a time)
Bound (the 2 front legs, then the 2 rear legs)
Pronk (all 4 legs at once)

RL usually converges to a single strategy, often trot because it is the most stable. This paper asks how a single policy can learn MANY strategies at once.

Solution: Command conditioning

Walk These Ways does not send only a velocity command (vx, vy, yaw_rate). It adds an extended command vector that controls how the robot moves:

# Standard locomotion command
standard_command = {
    "vx": 1.0,        # m/s, forward
    "vy": 0.0,        # m/s, lateral
    "yaw_rate": 0.0,   # rad/s, turning
}

# Walk These Ways EXTENDED command
wtw_command = {
    # === Velocity (same as standard) ===
    "vx": 1.0,
    "vy": 0.0,
    "yaw_rate": 0.0,

    # === Gait parameters (NEW) ===
    "body_height": 0.0,        # m, offset from nominal height (negative = lower)
    "step_frequency": 3.0,     # Hz, stepping frequency
    "gait": [1, 0, 0],         # simplified one-hot: trot/pace/bound
    "swing_height": 0.08,      # m, foot lift height during swing
    "stance_width": 0.0,       # m, offset from nominal stance (negative = narrower)
    "body_pitch": 0.0,         # rad, lean forward/backward
    "body_roll": 0.0,          # rad, lean left/right

    # Simplified view (12 numbers). The full command vector in the paper
    # has 15 dimensions (vs. 3); there the gait is encoded as continuous
    # phase offsets between the feet rather than as a one-hot.
}

Key idea: the policy receives a 15-dim command vector and learns to execute any combination of these parameters within the ranges it was trained on. One policy yields a continuum of locomotion styles.

Architecture and Training
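Before the command reaches the network, the dict has to be flattened into a vector with a fixed order. The sketch below is minimal and assumes a hypothetical key order (`COMMAND_KEYS`) plus the simplified one-hot gait used above. The official code's ordering and gait encoding may differ. These simplified entries flatten to 12 numbers, fewer than the paper's 15:

```python
from typing import Dict, List, Sequence, Union

Value = Union[float, Sequence[float]]

# Hypothetical fixed key order: the network always sees values in this order.
COMMAND_KEYS: List[str] = [
    "vx", "vy", "yaw_rate",
    "body_height", "step_frequency", "gait",
    "swing_height", "stance_width", "body_pitch", "body_roll",
]


def flatten_command(cmd: Dict[str, Value], keys: List[str] = COMMAND_KEYS) -> List[float]:
    """Concatenate command entries into one flat list of floats."""
    vec: List[float] = []
    for key in keys:
        value = cmd[key]
        if isinstance(value, (int, float)):
            vec.append(float(value))
        else:
            # Vector-valued entries (here: the one-hot gait) are expanded in place.
            vec.extend(float(v) for v in value)
    return vec


wtw_command = {
    "vx": 1.0, "vy": 0.0, "yaw_rate": 0.0,
    "body_height": 0.0, "step_frequency": 3.0, "gait": [1, 0, 0],
    "swing_height": 0.08, "stance_width": 0.0, "body_pitch": 0.0, "body_roll": 0.0,
}
vec = flatten_command(wtw_command)
print(len(vec))  # 12
```

The exact order matters less than keeping it identical between training and deployment.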

Observation space

observation = {
    # Proprioception (same as standard)
    "base_angular_velocity": 3,
    "projected_gravity": 3,
    "joint_positions": 12,
    "joint_velocities": 12,
    "previous_actions": 12,

    # Extended command (instead of 3 dims)
    "extended_command": 15,

    # TOTAL: 57 dimensions
}
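The sketch below assembles the observation in a fixed order and checks the size of each component. It uses plain Python lists for readability, where a real implementation would concatenate torch tensors. The component order is an assumption. What matters is that training and deployment use the same order:

```python
from typing import Dict, List

# Observation layout from the listing above (number of values per component).
OBS_LAYOUT: Dict[str, int] = {
    "base_angular_velocity": 3,
    "projected_gravity": 3,
    "joint_positions": 12,
    "joint_velocities": 12,
    "previous_actions": 12,
    "extended_command": 15,
}


def build_observation(parts: Dict[str, List[float]]) -> List[float]:
    """Concatenate observation parts in OBS_LAYOUT order, checking each size."""
    obs: List[float] = []
    for name, dim in OBS_LAYOUT.items():
        values = parts[name]
        if len(values) != dim:
            raise ValueError(f"{name}: expected {dim} values, got {len(values)}")
        obs.extend(values)
    return obs


# Dummy zero-filled parts, only to check the total size.
parts = {name: [0.0] * dim for name, dim in OBS_LAYOUT.items()}
obs = build_observation(parts)
print(len(obs))  # 57
```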

Reward function

The reward function is the most complex part. Besides the usual rewards (velocity tracking, energy penalty), Walk These Ways adds gait-specific rewards:

import torch

# Sketch: compute_step_frequency, get_desired_phases and compute_phase_error
# are helper functions assumed to be defined elsewhere (not shown here).
def compute_gait_reward(env):
    """
    Rewards for gait pattern matching.
    Given the commanded gait, encourage the corresponding foot-contact pattern.
    """
    rewards = {}

    # 1. Step frequency tracking
    # Compare the measured stepping frequency with the commanded frequency
    actual_freq = compute_step_frequency(env.foot_contacts)
    freq_error = (actual_freq - env.commands.step_frequency).square()
    rewards["step_freq"] = torch.exp(-freq_error / 0.25)

    # 2. Gait pattern tracking
    # Each gait has desired phase offsets between the 4 feet (order: FL, FR, RL, RR)
    gait_phases = {
        "trot":  [0.0, 0.5, 0.5, 0.0],  # FL-RR in phase, FR-RL in phase
        "pace":  [0.0, 0.5, 0.0, 0.5],  # FL-RL in phase, FR-RR in phase
        "bound": [0.0, 0.0, 0.5, 0.5],  # front pair in phase, rear pair in phase
    }
    desired_phases = get_desired_phases(env.commands.gait, gait_phases)
    phase_error = compute_phase_error(env.foot_contacts, desired_phases)
    rewards["gait_phase"] = torch.exp(-phase_error / 0.5)

    # 3. Swing height tracking
    # Highest foot at this step vs. the commanded swing height
    actual_swing = env.foot_heights.max(dim=-1).values
    swing_error = (actual_swing - env.commands.swing_height).square()
    rewards["swing_height"] = torch.exp(-swing_error / 0.01)

    # 4. Body height tracking
    # Assumed: body_height_target = nominal height + commanded body_height offset
    height_error = (env.base_height - env.commands.body_height_target).square()
    rewards["body_height"] = torch.exp(-height_error / 0.01)

    # 5. Body orientation tracking (base_euler columns: 0 = roll, 1 = pitch)
    pitch_error = (env.base_euler[:, 1] - env.commands.body_pitch).square()
    roll_error = (env.base_euler[:, 0] - env.commands.body_roll).square()
    rewards["orientation"] = torch.exp(-(pitch_error + roll_error) / 0.1)

    return rewards
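The reward code above leaves the phase helpers abstract. A common way to turn phase offsets into a desired contact pattern is a gait clock. Each foot's phase advances at the commanded step frequency, shifted by that foot's offset. The foot should be in stance during the first part of its cycle. The sketch below is a minimal pure-Python version that assumes a 50% stance fraction and the same FL, FR, RL, RR ordering. It illustrates the idea and is not the paper's exact reward:

```python
import math
from typing import List

GAIT_PHASES = {
    "trot":  [0.0, 0.5, 0.5, 0.0],
    "pace":  [0.0, 0.5, 0.0, 0.5],
    "bound": [0.0, 0.0, 0.5, 0.5],
}


def desired_contacts(gait: str, t: float, step_frequency: float,
                     stance_fraction: float = 0.5) -> List[bool]:
    """True = the foot should be on the ground at time t (seconds)."""
    contacts = []
    for offset in GAIT_PHASES[gait]:
        phase = (t * step_frequency + offset) % 1.0  # this foot's phase in [0, 1)
        contacts.append(phase < stance_fraction)
    return contacts


def contact_reward(actual: List[bool], desired: List[bool], sigma: float = 0.5) -> float:
    """exp(-error / sigma), where error = number of feet in the wrong contact state."""
    error = sum(a != d for a, d in zip(actual, desired))
    return math.exp(-error / sigma)


print(desired_contacts("trot", t=0.0, step_frequency=3.0))
# [True, False, False, True]: FL and RR on the ground, FR and RL swinging
```

Half a cycle later the pattern flips, which is exactly the alternation of the diagonal pairs in a trot.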

Training procedure

training_config = {
    "num_envs": 4096,
    "max_iterations": 3000,   # More than a standard run (1500) because the task is harder

    # Command sampling -- IMPORTANT
    "command_sampling": {
        "vx_range": [-1.0, 2.0],
        "vy_range": [-0.5, 0.5],
        "yaw_range": [-1.0, 1.0],
        "body_height_range": [-0.1, 0.1],
        "step_freq_range": [2.0, 4.0],
        "gait": "uniform_categorical",   # gait chosen uniformly at random
        "swing_height_range": [0.04, 0.12],
        "stance_width_range": [-0.05, 0.05],
        "body_pitch_range": [-0.3, 0.3],
        "body_roll_range": [-0.2, 0.2],
    },
    
    # Each environment gets a RANDOM command combination at reset and a new one
    # every command_resample_interval steps -> the policy must learn all combinations
    "command_resample_interval": 500,  # steps
}
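The sketch below is a minimal version of the command sampler described by `command_sampling`. It assumes that every continuous parameter is drawn independently and uniformly from its range, and that the gait is drawn uniformly from the three options. The official implementation may instead use a curriculum or a different distribution. The function names are hypothetical:

```python
import random
from typing import Dict, Union

Command = Dict[str, Union[float, str]]

# Ranges copied from command_sampling above.
COMMAND_RANGES = {
    "vx": (-1.0, 2.0),
    "vy": (-0.5, 0.5),
    "yaw_rate": (-1.0, 1.0),
    "body_height": (-0.1, 0.1),
    "step_frequency": (2.0, 4.0),
    "swing_height": (0.04, 0.12),
    "stance_width": (-0.05, 0.05),
    "body_pitch": (-0.3, 0.3),
    "body_roll": (-0.2, 0.2),
}
GAITS = ["trot", "pace", "bound"]


def sample_command(rng: random.Random) -> Command:
    """Draw each continuous parameter uniformly in its range, plus a random gait."""
    cmd: Command = {name: rng.uniform(low, high) for name, (low, high) in COMMAND_RANGES.items()}
    cmd["gait"] = rng.choice(GAITS)
    return cmd


def maybe_resample(cmd: Command, step: int, rng: random.Random, interval: int = 500) -> Command:
    """Keep the current command, except every `interval` steps draw a fresh one."""
    return sample_command(rng) if step % interval == 0 else cmd


rng = random.Random(0)
commands = [sample_command(rng) for _ in range(4096)]  # one command per environment
```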

The nice part is that every environment receives its own random command combination, which is redrawn periodically. With 4096 environments, 4096 different combinations run in parallel at any moment. Over 3000 iterations the policy therefore "sees" a very large number of combinations.
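The function below gives a quick lower bound on how many command combinations are drawn during training. It assumes 24 simulation steps per environment per PPO iteration, which is the legged_gym default `num_steps_per_env`; Walk These Ways may use a different value. It also ignores the extra draws at episode resets:

```python
def min_command_samples(num_envs: int, iterations: int,
                        steps_per_iteration: int, resample_interval: int) -> int:
    """Lower bound: one initial draw per env plus one per resample interval.
    Extra draws at episode resets are not counted."""
    steps_per_env = iterations * steps_per_iteration
    return num_envs * (1 + steps_per_env // resample_interval)


# Assumption: 24 simulation steps per env per iteration.
print(min_command_samples(4096, 3000, 24, 500))  # 593920
```

Under these assumptions that comes to roughly 0.6 million draws, and episode resets add more on top.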

RL training with many gait patterns in parallel

Results and Demo

What a single policy can do

With Walk These Ways, a single policy can perform all of these behaviors:

Behavior	Command
Trot 2 m/s	vx=2.0, gait=trot, freq=3.0
Slow walk	vx=0.3, gait=trot, freq=1.5
Crouch walk	vx=0.5, body_height=-0.08
High-step march	vx=0.5, swing_height=0.12
Bound gallop	vx=1.5, gait=bound
Strafe left	vy=0.5, gait=trot
Spin in place	vx=0, yaw_rate=1.5
Lean forward	vx=0, body_pitch=0.3
Dance rhythm	Oscillate swing_height and body_height
Brace against push	body_height=-0.05, stance_width=0.05

Transitions between behaviors are smooth, since switching behavior only means changing the command values fed to the same policy.

Comparison with single-task policies
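At deploy time, you can keep a behavior change smooth by ramping the continuous command values over a short window instead of jumping. The sketch below is minimal. The 50-step window (1 s at a 50 Hz control rate) and the choice of keys are assumptions. In the simplified one-hot encoding, a discrete gait switch cannot be interpolated this way:

```python
from typing import Dict


def ramp_command(start: Dict[str, float], goal: Dict[str, float],
                 alpha: float) -> Dict[str, float]:
    """Linearly interpolate continuous command values; alpha is clamped to [0, 1]."""
    alpha = min(max(alpha, 0.0), 1.0)
    return {k: (1.0 - alpha) * start[k] + alpha * goal[k] for k in goal}


trot_fast = {"vx": 2.0, "step_frequency": 3.0, "body_height": 0.0}
crouch = {"vx": 0.5, "step_frequency": 3.0, "body_height": -0.08}

# Spread the change over 50 control steps (1 s at 50 Hz).
steps = 50
for i in range(steps + 1):
    cmd = ramp_command(trot_fast, crouch, i / steps)
    # ...send cmd to the policy here...
```

After the loop, `cmd` equals the crouch command. Halfway through, for example, vx is 1.25.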

Metric	Single-task policy	Walk These Ways
Tracking accuracy	Higher (by ~5%)	Good, slightly lower
Gait diversity	1 gait	Many gaits
Transition quality	N/A	Smooth
Training time	20 min x N gaits	60 min (once)
Deploy complexity	N models	1 model
Novel behaviors	No	Yes (by tuning commands)

Hardware demo

The paper demonstrates the system on a Unitree Go1, the generation before the Go2. The policy is deployed on the onboard Jetson Xavier and runs inference at 50 Hz. On hardware, the robot can:

Walk on grass, dirt, and rocks
Climb 25-degree slopes
Withstand pushes and kicks
Switch gaits in real time via a joystick
"Hop" up 10 cm steps

How to Replicate

Step 1: Clone the repo

git clone https://github.com/Improbable-AI/walk-these-ways.git
cd walk-these-ways
pip install -e .

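Step 2: Adjust the command ranges

Main config file. The path and class name below are illustrative for a Go2 port, because the official repository targets the Unitree Go1: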

# walk_these_ways/envs/configs/go2_config.py
class Go2WTWCfg:
    class commands:
        # Adjust these ranges for your robot
        lin_vel_x_range = [-1.0, 2.0]
        lin_vel_y_range = [-0.5, 0.5]
        ang_vel_yaw_range = [-1.0, 1.0]
        body_height_range = [-0.05, 0.05]
        step_frequency_range = [2.0, 4.0]
        gait_types = ["trot", "pace", "bound"]
        swing_height_range = [0.04, 0.10]
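The policy has never seen commands outside these ranges, so its behavior there is not guaranteed. For example, the "Slow walk" row in the behavior table above uses freq=1.5, which is below `step_frequency_range`. A small guard can flag or clip such commands before they are sent. The sketch below assumes commands are plain dicts. The dict keys (`vx`, `step_frequency`, ...) are hypothetical stand-ins for the config fields (`lin_vel_x_range`, `step_frequency_range`, ...):

```python
from typing import Dict, List, Tuple

# Ranges mirrored from the config above (hypothetical dict form of the class fields).
TRAINED_RANGES: Dict[str, Tuple[float, float]] = {
    "vx": (-1.0, 2.0),
    "vy": (-0.5, 0.5),
    "yaw_rate": (-1.0, 1.0),
    "body_height": (-0.05, 0.05),
    "step_frequency": (2.0, 4.0),
    "swing_height": (0.04, 0.10),
}


def out_of_range(cmd: Dict[str, float]) -> List[str]:
    """Names of command entries that fall outside the ranges seen in training."""
    return [k for k, v in cmd.items()
            if k in TRAINED_RANGES and not (TRAINED_RANGES[k][0] <= v <= TRAINED_RANGES[k][1])]


def clamp_command(cmd: Dict[str, float]) -> Dict[str, float]:
    """Clip every known entry into its trained range; unknown keys pass through."""
    clipped = dict(cmd)
    for k, (low, high) in TRAINED_RANGES.items():
        if k in clipped:
            clipped[k] = min(max(clipped[k], low), high)
    return clipped


slow_walk = {"vx": 0.3, "step_frequency": 1.5}
print(out_of_range(slow_walk))   # ['step_frequency']
print(clamp_command(slow_walk))  # {'vx': 0.3, 'step_frequency': 2.0}
```

Clipping keeps the robot inside the trained distribution. The alternative is to widen the range and retrain, if the behavior is actually needed.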

Step 3: Train

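python train.py --task go2_wtw --num_envs 4096 --max_iterations 3000

# Illustrative command for a Go2 port; the official Go1 repo's entry point is scripts/train.py
# Training takes ~60 minutes on an RTX 4090
# Slower than standard training because the observation and reward are more complex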

Step 4: Deploy

Export to ONNX and run it on the Go2 as in Part 3. There is only one difference: the observation contains 15 command dims instead of 3, and you need a GUI or joystick to adjust the commands in real time.

# Joystick mapping for Walk These Ways
joystick_mapping = {
    "left_stick_x": "vy",
    "left_stick_y": "vx",
    "right_stick_x": "yaw_rate",
    "right_stick_y": "body_height",
    "dpad_up": "swing_height += 0.01",
    "dpad_down": "swing_height -= 0.01",
    "button_a": "gait = trot",
    "button_b": "gait = pace",
    "button_x": "gait = bound",
    "L1": "step_frequency -= 0.5",
    "R1": "step_frequency += 0.5",
}
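The string actions in `joystick_mapping` are pseudocode. The sketch below shows one way to turn them into executable handlers. It assumes the command is a plain dict with the key names used above. The clamping limits come from the training ranges. The stick scales and sign conventions are assumptions, not taken from the official deploy code:

```python
from typing import Callable, Dict, Union

Command = Dict[str, Union[float, str]]
Action = Callable[[Command], None]


def adjust(key: str, delta: float, low: float, high: float) -> Action:
    """Return a handler that nudges cmd[key] by delta, clamped to [low, high]."""
    def apply(cmd: Command) -> None:
        cmd[key] = min(max(float(cmd[key]) + delta, low), high)
    return apply


def set_gait(name: str) -> Action:
    """Return a handler that switches the commanded gait."""
    def apply(cmd: Command) -> None:
        cmd["gait"] = name
    return apply


# Button handlers for the joystick_mapping entries (limits from the training ranges).
BUTTON_ACTIONS: Dict[str, Action] = {
    "dpad_up": adjust("swing_height", +0.01, 0.04, 0.12),
    "dpad_down": adjust("swing_height", -0.01, 0.04, 0.12),
    "button_a": set_gait("trot"),
    "button_b": set_gait("pace"),
    "button_x": set_gait("bound"),
    "L1": adjust("step_frequency", -0.5, 2.0, 4.0),
    "R1": adjust("step_frequency", +0.5, 2.0, 4.0),
}


def apply_sticks(cmd: Command, lx: float, ly: float, rx: float, ry: float) -> None:
    """Map stick deflections in [-1, 1] to continuous commands (assumed scales/signs)."""
    cmd["vx"] = 2.0 * ly if ly >= 0.0 else 1.0 * ly  # forward up to 2.0, backward to -1.0
    cmd["vy"] = -0.5 * lx        # assumed: +y is left, so stick right gives negative vy
    cmd["yaw_rate"] = -1.0 * rx  # assumed: positive yaw turns left
    cmd["body_height"] = 0.05 * ry


cmd: Command = {"vx": 0.0, "vy": 0.0, "yaw_rate": 0.0, "body_height": 0.0,
                "swing_height": 0.08, "step_frequency": 3.0, "gait": "trot"}
for button in ["R1", "R1", "R1", "button_x", "dpad_up"]:
    BUTTON_ACTIONS[button](cmd)
apply_sticks(cmd, lx=0.0, ly=0.5, rx=0.0, ry=0.0)
```

After this sequence, step_frequency is clamped at 4.0, the gait is "bound", and vx is 1.0.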

Impact and Related Work

Walk These Ways is one of the most influential papers in locomotion RL. It shows that an RL policy can generalize across behaviors as well as across terrain.

Papers that build on Walk These Ways

Extreme Parkour with Legged Robots (Cheng et al., 2024) -- extends from flat terrain to parkour (jumping, climbing, ducking under obstacles), still using a command-conditioning approach
DTC: Deep Tracking Control -- pairs a model-based trajectory planner with a learned RL tracking controller; related in spirit (a learned low-level controller driven by higher-level targets) rather than built on the Walk These Ways code
Humanoid locomotion -- teams such as Agility Robotics (Digit) and Tesla (Optimus) reportedly apply similar ideas to bipedal robots

Comparison with other approaches

Approach	Paper	Strengths	Weaknesses
Walk These Ways	Margolis & Agrawal, 2022	One policy, many gaits; open-source	Command design requires a lot of experience
AMP (Adversarial Motion Priors)	Peng et al., 2021	Natural motion from mocap	Requires motion capture data
DribbleBot	Ji et al., 2023	Soccer dribbling + locomotion	Task-specific
Parkour	Cheng et al., 2024	Extreme terrain	Requires a depth camera

Lessons from the paper

1. Command space design is the core

Designing the command space is the most important decision. With too few dimensions the policy is not expressive enough, and with too many it becomes hard to train. Walk These Ways settles on 15 dimensions after extensive experimentation.

2. Reward engineering is still an art

Even with RL, the reward function still requires domain knowledge. Knowing what a gait is, what a phase is, and what swing height means all comes from classical locomotion (Part 1).

3. Open source changes everything

Walk These Ways is fully open-source, including the code, configs, and trained weights. Anyone with a GPU and a Unitree Go1 can replicate it, and other robots such as the Go2 require porting work. This openness is a big reason for the paper's impact.

4. Sim-to-real is still the bottleneck

Even when the policy performs very well in simulation, sim-to-real transfer remains the hardest step. The paper uses heavy domain randomization (friction, mass, motor strength), but moving to a new robot still requires retuning.
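In domain randomization, every environment draws its own physics parameters at reset, so the policy cannot overfit to a single simulator setting. The sketch below is minimal. The parameter names and ranges are illustrative assumptions, not the paper's values. A real setup would write the draws into the simulator's physics properties rather than into a dataclass:

```python
import random
from dataclasses import dataclass


@dataclass
class PhysicsParams:
    friction: float          # ground friction coefficient
    added_base_mass: float   # kg added to (or removed from) the base
    motor_strength: float    # multiplier on nominal motor torque


# Illustrative ranges only (assumptions, not the paper's settings).
FRICTION_RANGE = (0.1, 1.25)
ADDED_MASS_RANGE = (-1.0, 3.0)
MOTOR_STRENGTH_RANGE = (0.9, 1.1)


def sample_physics(rng: random.Random) -> PhysicsParams:
    """Draw one set of physics parameters, e.g. for one environment at reset."""
    return PhysicsParams(
        friction=rng.uniform(*FRICTION_RANGE),
        added_base_mass=rng.uniform(*ADDED_MASS_RANGE),
        motor_strength=rng.uniform(*MOTOR_STRENGTH_RANGE),
    )


rng = random.Random(0)
params = [sample_physics(rng) for _ in range(4096)]  # one draw per environment
```

Wider ranges make the policy more robust but can also make it more conservative. That is one reason porting to a new robot often needs these ranges retuned.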

Wrapping Up the Locomotion from Zero to Hero Series

Across 4 posts, we have gone from the theoretical foundations to a state-of-the-art paper:

Part 1: ZMP, CPG, IK -- classical methods and why they are being replaced
Part 2: RL formulation -- MDP, reward shaping, PPO, curriculum learning
Part 3: Hands-on -- legged_gym, Unitree Go2, sim-to-real deployment
Part 4 (this post): Walk These Ways -- multi-gait learning with a single policy

Locomotion RL is moving very fast. These are some of the hot new directions:

Whole-body control: walking and also using arms to do work (loco-manipulation)
Vision-based locomotion: using cameras to "see" the terrain ahead
Foundation models for locomotion: pre-train on many robots, then fine-tune for a specific robot
Humanoid locomotion: moving from quadrupeds to bipeds -- much harder, but with many recent breakthroughs (Agility Digit, Tesla Optimus, Fourier GR-2)

Nguyễn Anh Tuấn
