
Bipedal Walking: Controlling 2-Legged Robots with RL

Challenges of bipedal robot control -- balance, fall recovery, and terrain adaptation, from Cassie and Digit to modern humanoids.

Nguyen Anh Tuan · February 25, 2026 · 10 min read

Why is Bipedal Harder than Quadruped?

If you've followed this series from Part 1 through Part 5, you've seen impressive quadruped results -- ANYmal doing parkour, Unitree A1 climbing stairs. But when you switch to two-legged (bipedal) robots, everything becomes significantly harder.

Underactuated Dynamics

A quadruped robot has 4 contact points with the ground, creating a wide support polygon. The center of mass (CoM) easily stays within this polygon, making the robot statically stable.

In contrast, a bipedal robot has only 2 legs, and during swing phase (one leg lifted), only 1 contact point remains. The support polygon shrinks to just the foot area. This is an underactuated system -- the robot doesn't have enough actuators to directly control all degrees of freedom and must rely on dynamic balance (similar to how humans walk -- essentially "controlled falling").

Specific Challenges

| Challenge | Quadruped | Bipedal |
|---|---|---|
| Support polygon | Wide (4 legs) | Narrow (1-2 legs) |
| Static stability | Easy (3 feet on ground) | Impossible (must be dynamic) |
| Fall risk | Low | Extremely high |
| DoF to control | 12 (3 per leg) | 10-30+ (hips, knees, ankles, torso) |
| Impact forces | Distributed across 4 legs | Concentrated on 2 legs |
| Recovery after a fall | Easy (crawl to standing) | Difficult (must rise from lying down) |

Because of these challenges, bipedal locomotion with RL has developed more slowly than its quadruped counterpart, but it is now accelerating thanks to greater compute power and improved simulation tools.

[Image: Challenges of balance and locomotion control for two-legged robots]

Major Bipedal Platforms

Cassie (Agility Robotics)

Cassie is the most popular bipedal research platform, developed by Agility Robotics (spinoff from Oregon State University). The robot has distinctive design features:

  • Compliant leg design: Legs have springs at the knee, helping absorb impact and store energy
  • Underactuated ankle: No ankle actuator -- forces the policy to balance using hip and knee
  • 10 DoF: 5 joints per leg (2 hip, 1 knee, 2 ankle -- but ankle passive)
  • Weight: ~32 kg
  • Notable achievement: Ran 100m in 24.73 seconds (Guinness World Record for bipedal robot, 2022)

Cassie is a testbed for many RL research papers because (a) labs can purchase it, (b) it is complex enough to test advanced control, and (c) its sim-to-real gap is well studied.

Digit (Agility Robotics)

Digit is the next generation after Cassie, adding torso, arms, and head:

  • Full humanoid form: Torso + 2 arms + 2 legs
  • 30 DoF: Significantly more than Cassie
  • Purpose: Warehouse logistics (Agility partners with Amazon)
  • Manipulation + Locomotion: Can walk while carrying objects
  • Weight: ~65 kg

Unitree H1 / G1

Unitree H1 is an "affordable" humanoid from Unitree Robotics (China):

  • 19 DoF, 1.8m tall, ~47 kg
  • Price: ~$90,000 (much cheaper than Atlas or Digit)
  • Speed: Achieved 3.3 m/s walking speed (record for humanoid)
  • Open-source friendly: Supports Isaac Gym, MuJoCo

Unitree G1 is a smaller version, 1.27m, 35 kg, ~$16,000 -- more accessible for smaller labs.

Atlas (Boston Dynamics)

Atlas is the world's most famous humanoid from Boston Dynamics:

  • Hydraulic → Electric: New version (2024) fully electric
  • 28 DoF, extreme agility (backflips, dancing, parkour)
  • Not for sale: Internal research only
  • State-of-the-art hardware but control uses mostly model-based methods (MPC), gradually shifting to RL

Tesla Optimus

Tesla Optimus (Gen 2) is Tesla's humanoid effort:

  • 28 DoF, 1.73m, ~57 kg
  • Tesla-designed actuators: 14 rotary + 14 linear
  • Purpose: Factory automation, general-purpose tasks
  • RL training: Uses Tesla's massive compute infrastructure

Comprehensive Platform Comparison

| Platform | DoF | Weight | Height | Max Speed | Actuator | Price | RL Research |
|---|---|---|---|---|---|---|---|
| Cassie | 10 | 32 kg | 1.1 m | 4.0 m/s | Electric | ~$150K | Extensive |
| Digit | 30 | 65 kg | 1.75 m | 1.5 m/s | Electric | ~$250K | Significant |
| Unitree H1 | 19 | 47 kg | 1.8 m | 3.3 m/s | Electric | ~$90K | Growing |
| Unitree G1 | 23 | 35 kg | 1.27 m | 2.0 m/s | Electric | ~$16K | Emerging |
| Atlas (new) | 28 | ~89 kg | 1.5 m | 2.5 m/s | Electric | N/A | Internal |
| Tesla Optimus | 28 | 57 kg | 1.73 m | 1.3 m/s | Electric | N/A | Internal |

RL Reward Design for Bipedal

Reward design for bipedal locomotion is more complex than quadruped. Beyond forward velocity and energy efficiency, many components are needed for balance and natural movement.

Core Reward Components

```python
# Reward function for bipedal walking (pseudocode)
reward = (
    # Forward progress
    w_vel * forward_velocity_tracking
    # Balance (most critical for bipedal)
    + w_balance * upright_reward          # keep torso vertical
    + w_com * com_over_support_foot       # CoM over stance foot
    # Gait quality
    + w_gait * periodic_gait_reward       # regular stepping pattern
    + w_sym * symmetry_reward             # left/right legs symmetric
    + w_natural * joint_angle_penalty     # avoid unnatural poses
    # Energy
    - w_energy * torque_squared           # energy efficiency
    - w_jerk * action_jerk_penalty        # smooth control
    # Safety
    - w_contact * body_contact_penalty    # don't touch ground with body
    - w_fall * fall_penalty               # don't fall
)
```

Balance Reward in Detail

Balance is the critical factor. Common approaches:

1. Upright torso reward: Keep torso orientation near vertical

```python
upright = cos(torso_pitch) * cos(torso_roll)
reward_upright = max(0, upright)  # 1.0 when standing perfectly straight
```

2. CoM projection reward: Center of mass projection within support polygon

```python
com_xy = get_com_projection()
support = get_support_polygon()
reward_com = is_inside(com_xy, support)  # 1 if CoM is over the support polygon
```

3. Angular momentum regulation: Limit excessive angular momentum (avoid spinning out)

```python
reward_angmom = -norm(angular_momentum) ** 2  # penalize large ||L||
```
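The three balance terms above can be combined into one runnable sketch. Everything here is an assumption for illustration -- the weights, the helper `point_in_polygon`, and the input conventions -- not any specific paper's implementation:

```python
import numpy as np

def point_in_polygon(p, poly):
    """Ray-casting test: is 2D point p inside polygon [(x, y), ...]?"""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def balance_reward(torso_pitch, torso_roll, com_xy, support_poly, ang_mom,
                   w_upright=1.0, w_com=0.5, w_angmom=0.01):
    """Combine the three balance terms into one scalar (illustrative weights)."""
    # 1. Upright torso: 1.0 when pitch and roll are both zero
    upright = max(0.0, np.cos(torso_pitch) * np.cos(torso_roll))
    # 2. CoM projection: 1.0 if the CoM lands inside the support polygon
    com_ok = 1.0 if point_in_polygon(com_xy, support_poly) else 0.0
    # 3. Angular momentum regulation: penalize large ||L||^2
    angmom_pen = float(np.sum(np.asarray(ang_mom) ** 2))
    return w_upright * upright + w_com * com_ok - w_angmom * angmom_pen
```

A perfectly upright robot with its CoM over a unit-square support foot and zero angular momentum scores the full 1.5 under these weights.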

Periodic Gait Reward

For a bipedal robot to walk naturally (rather than shuffle), use a periodic reward:

```python
# Phase variable: 0 -> 2*pi over one gait cycle
phase = (time % gait_period) / gait_period * 2 * pi

# Desired foot contact pattern
left_contact_desired = sin(phase) > 0        # left foot stance
right_contact_desired = sin(phase + pi) > 0  # right foot stance (opposite)

# Reward matching the desired contact pattern
reward_gait = (
    match(left_foot_contact, left_contact_desired)
    + match(right_foot_contact, right_contact_desired)
)
```

This creates alternating gait naturally rather than letting RL discover any walking pattern (which might be shuffling or hopping).
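A runnable version of this idea, with the `match` pseudocode made explicit. The function name and the default gait period are assumptions for the sketch:

```python
import math

def periodic_gait_reward(t, left_contact, right_contact, gait_period=0.8):
    """Reward the measured foot contacts matching the desired alternating
    pattern defined by a phase variable (illustrative sketch)."""
    # Phase advances from 0 to 2*pi over one gait cycle
    phase = (t % gait_period) / gait_period * 2 * math.pi
    # Left foot should be in stance for the first half-cycle,
    # right foot for the second half (opposite phase)
    left_desired = math.sin(phase) > 0
    right_desired = math.sin(phase + math.pi) > 0
    # +1 for each foot whose contact state matches the desired one
    return int(left_contact == left_desired) + int(right_contact == right_desired)
```

At a quarter of the cycle (e.g. `t = 0.2` with the 0.8 s period), the left foot should be down and the right foot up; matching contacts earn the full reward of 2.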

Fall Recovery Policy

An unavoidable issue for bipedal robots: they will fall. The question is how to stand back up.

Separate Recovery Policy

A common approach is to train two separate policies:

  1. Walking policy: Normal walking control
  2. Recovery policy: Standing up from lying down

State machine transition:

Walking → [detect fall] → Recovery → [standing up] → Walking

Fall detection is based on torso orientation: if |pitch| > 60° or |roll| > 60°, trigger recovery.
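The walking/recovery switch can be sketched as a tiny state machine. The mode names and the `standing_up_done` flag are hypothetical conveniences for the example:

```python
import math

def select_policy(torso_pitch, torso_roll, current_mode,
                  standing_up_done=False, fall_threshold=math.radians(60)):
    """Walking -> Recovery when a fall is detected; back to Walking once
    the recovery policy reports the robot is upright again (sketch)."""
    if current_mode == "walking":
        if abs(torso_pitch) > fall_threshold or abs(torso_roll) > fall_threshold:
            return "recovery"  # fall detected: hand control to recovery policy
        return "walking"
    # In recovery: stay there until standing up has finished
    return "walking" if standing_up_done else "recovery"
```

Each control step queries this selector and runs the chosen policy's action.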

Push Recovery

Instead of waiting to fall then standing up, push recovery helps robots resist perturbations:

  • Train with random external forces (pushes) in simulation
  • Robot learns stepping strategy: take an additional step in push direction to recover
  • Similar to human reflex when pushed -- automatically step out to maintain balance
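A minimal sketch of injecting such pushes during training. The helper and its defaults are assumptions; a real setup would forward the sampled force to the simulator's external-force API each control step:

```python
import math
import random

def sample_push(max_force=50.0, push_prob=0.02):
    """With probability push_prob per control step, return a random
    horizontal push force (N) to apply to the torso; otherwise None."""
    if random.random() > push_prob:
        return None  # no perturbation this step
    angle = random.uniform(0.0, 2.0 * math.pi)
    magnitude = random.uniform(0.0, max_force)
    # Horizontal (x, y) force, no vertical component
    return (magnitude * math.cos(angle), magnitude * math.sin(angle), 0.0)
```

Exposing the policy to these pushes is what teaches the stepping reflex: a push it cannot absorb with the stance ankle forces a recovery step.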

Berkeley Humanoid (arXiv:2407.21781)

The paper Berkeley Humanoid: A Research Platform for Learning-based Control introduces a new humanoid platform from UC Berkeley, designed specifically for RL research.

Design Principles

  • Lightweight (16 kg): Much lighter than Digit (65 kg) or Atlas (~89 kg)
  • Quasi-Direct-Drive (QDD) actuators: Reduced gearing ratio, smaller sim-to-real gap because actuator dynamics are simpler
  • Low cost: In-house manufactured, affordable for university labs

RL Results

  • Omnidirectional walking: Forward, backward, lateral, turning
  • Dynamic hopping: Single-leg and double-leg hops
  • Outdoor terrain: Steep unpaved trails, hundreds of meters traversal
  • Simple RL controller: PPO with light domain randomization (minimal needed due to QDD actuators with small sim-to-real gap)

Key insight: Berkeley Humanoid demonstrates that good hardware design can significantly reduce RL training complexity. QDD actuators have nearly linear response, easy to simulate accurately, so no need for actuator networks or heavy domain randomization.

[Image: Reinforcement learning training bipedal locomotion across multiple platforms]

Humanoid-Gym (arXiv:2404.05695)

Humanoid-Gym is an open-source RL framework for humanoid locomotion, built on NVIDIA Isaac Gym. It's the most practical tool currently available for starting with bipedal RL.

Key Features

  • Isaac Gym backend: Massive parallel training (4096+ environments)
  • Sim-to-sim verification: Train in Isaac Gym, verify in MuJoCo before deploying
  • Pre-configured robots: Supports RobotEra XBot-S (1.2m), XBot-L (1.65m), can add custom robots
  • Zero-shot sim-to-real: Verified on real XBot-S and XBot-L

Training Pipeline

1. Define robot URDF/MJCF
2. Configure reward weights (balance, velocity, energy...)
3. Train PPO in Isaac Gym (4096 parallel envs)
4. Verify in MuJoCo (sim-to-sim check)
5. Deploy to real robot (zero-shot)

Terrain Curriculum in Humanoid-Gym

Humanoid-Gym supports diverse terrains:

  • Flat ground (baseline)
  • Rough terrain (random height perturbations)
  • Slopes (up to 15 degrees)
  • Stairs (ascending + descending)
  • Discrete stepping stones

Dynamics randomization includes:

  • Mass randomization: +/- 15%
  • Friction: 0.5 - 2.0
  • Motor strength: +/- 10%
  • Push perturbation: random forces up to 50N
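These ranges can be sampled once per episode (or per environment instance). The sketch below just draws from the quoted intervals; the parameter names are assumptions, not Humanoid-Gym's actual config keys:

```python
import random

def sample_dynamics():
    """Draw one set of randomized dynamics parameters from the
    ranges quoted above (illustrative)."""
    return {
        "mass_scale": random.uniform(0.85, 1.15),          # +/- 15%
        "friction": random.uniform(0.5, 2.0),
        "motor_strength_scale": random.uniform(0.9, 1.1),  # +/- 10%
        "push_force_n": random.uniform(0.0, 50.0),         # up to 50 N
    }
```

Each parallel environment gets its own draw, so the policy never sees a single "canonical" robot and must be robust to the whole range.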

Results

RobotEra XBot-L (1.65m humanoid) achieves:

  • Stable walking on flat ground and rough terrain
  • Stair climbing (ascending)
  • Push recovery (50N lateral forces)
  • Zero-shot transfer from Isaac Gym to real robot

State-of-the-Art: Bipedal RL Milestones

Cassie RL Milestones

Research on Cassie has achieved many important milestones:

1. Robust Parameterized Locomotion (arXiv:2103.14295):

  • Train walking policy with variable speed, height, turning
  • Domain randomization for sim-to-real
  • First demonstration of robust sim-to-real bipedal walking with RL

2. All Common Bipedal Gaits (arXiv:2011.01387):

  • Single policy for standing, walking, hopping, running, skipping
  • Periodic reward composition -- design reward for each gait based on phase variable
  • Smooth transitions between gaits

3. Versatile Dynamic Locomotion (arXiv:2401.16889):

  • General solution for diverse bipedal skills
  • Walking, running, jumping, standing in one framework
  • LSTM-based policy for temporal reasoning

Looking Ahead

  1. Whole-body control: Combining locomotion + manipulation (Digit carrying objects, Optimus assembling)
  2. Vision-based bipedal: Adding cameras for terrain-aware walking (like quadruped parkour)
  3. Foundation policies: Pre-training a general locomotion policy, then fine-tuning for specific tasks
  4. Faster sim-to-real: QDD actuators + better simulation are shrinking the gap

Practical Guide: Getting Started with Bipedal RL

If you want to try bipedal RL:

Accessible Hardware

  1. Unitree G1 (~$16K): Best price for full humanoid
  2. Simulation only: Use Humanoid-Gym with MuJoCo humanoid models (free)

Software Stack

  1. Humanoid-Gym (recommended): Isaac Gym + PPO, pre-configured for humanoid
  2. legged_gym (ETH Zurich): More flexible, supports quadruped and bipedal
  3. MuJoCo + Stable-Baselines3: Lightweight, easy to customize

Tips for Beginners

  • Start with standing balance before attempting walking
  • Periodic gait reward is critical -- without it, robot shuffles
  • Curriculum: Flat → rough → slopes → stairs
  • Symmetry reward helps natural gait
  • Sim-to-sim (Isaac Gym → MuJoCo) before sim-to-real
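The flat → rough → slopes → stairs curriculum from these tips can be sketched as promoting (or demoting) terrain difficulty based on recent performance. Thresholds and level names here are illustrative:

```python
TERRAIN_LEVELS = ["flat", "rough", "slopes", "stairs"]

def update_curriculum(level_idx, mean_episode_reward,
                      promote_threshold=0.8, demote_threshold=0.3):
    """Move to harder terrain when the policy does well, easier terrain
    when it struggles (thresholds are made up for the sketch)."""
    if mean_episode_reward > promote_threshold:
        level_idx = min(level_idx + 1, len(TERRAIN_LEVELS) - 1)
    elif mean_episode_reward < demote_threshold:
        level_idx = max(level_idx - 1, 0)
    return level_idx, TERRAIN_LEVELS[level_idx]
```

Calling this every few hundred episodes keeps training in the zone where the task is hard but still learnable.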

Conclusion

Bipedal locomotion with RL is entering an explosion phase. From Cassie (100m record) to Berkeley Humanoid (QDD simplicity) to Humanoid-Gym (open-source tools), the community is rapidly closing the gap with quadruped locomotion. Cheaper hardware (Unitree G1/H1) and better simulators (Isaac Gym, MuJoCo) are democratizing the field.

Read the earlier parts of this series:

Next -- Part 7: Sim-to-Real for Locomotion -- will deep dive into transferring policies from simulation to real robot, with actuator networks and best practices.



Nguyễn Anh Tuấn

Robotics & AI Engineer. Building VnRobo — sharing knowledge about robot learning, VLA models, and automation.
