
Bipedal Walking: Controlling 2-Legged Robots with RL

Challenges of bipedal robot control -- balance, fall recovery, and terrain adaptation, from Cassie and Digit to full humanoids.

Nguyen Anh Tuan · February 25, 2026 · 10 min read

Why is Bipedal Harder than Quadruped?

If you've followed this series from Part 1 through Part 5, you've seen impressive quadruped results -- ANYmal doing parkour, Unitree A1 climbing stairs. But the moment you switch to two-legged (bipedal) robots, everything becomes significantly harder.

Underactuated Dynamics

A quadruped robot has 4 contact points with the ground, creating a wide support polygon. The center of mass (CoM) easily stays within this polygon, making the robot statically stable.

In contrast, a bipedal robot has only 2 legs, and during swing phase (one leg lifted), only 1 contact point remains. The support polygon shrinks to just the foot area. This is an underactuated system -- the robot doesn't have enough actuators to directly control all degrees of freedom and must rely on dynamic balance (similar to how humans walk -- essentially "controlled falling").
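The underactuation argument above can be made concrete with the Linear Inverted Pendulum Model (LIPM), a standard simplification of bipedal balance. Here is a minimal sketch -- the 0.9 m CoM height, the time step, and all other numbers are illustrative assumptions, not parameters of any specific robot:

```python
import math

def simulate_lipm(x0, v0, z0=0.9, g=9.81, dt=0.001, t_end=0.5):
    """Integrate the Linear Inverted Pendulum Model: x'' = (g / z0) * x.

    x is the horizontal CoM offset from the stance foot. With no new
    footstep, any nonzero offset grows exponentially (time constant
    sqrt(z0 / g), about 0.3 s here) -- which is why a biped must keep
    stepping: each step re-centers the support point under the CoM.
    """
    omega2 = g / z0                 # pendulum constant (1/s^2)
    x, v = x0, v0
    for _ in range(int(t_end / dt)):
        a = omega2 * x              # gravity tips the CoM further away
        v += a * dt                 # semi-implicit Euler step
        x += v * dt
    return x

# A 1 cm initial offset grows several-fold within half a second,
# while a perfectly centered CoM (x0 = 0) stays balanced forever.
drift = simulate_lipm(x0=0.01, v0=0.0)
```

This exponential divergence is exactly the "controlled falling" mentioned above: the walking controller never eliminates the instability, it just keeps placing the next foothold before the divergence gets too large.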

Specific Challenges

| Challenge | Quadruped | Bipedal |
|---|---|---|
| Support polygon | Wide (4 legs) | Narrow (1-2 legs) |
| Static stability | Easy (3 feet on ground) | Impossible (must be dynamic) |
| Fall risk | Low | Extremely high |
| DoF to control | 12 (3 per leg) | 10-30+ (hips, knees, ankles, torso) |
| Impact forces | Distributed across 4 legs | Concentrated on 2 legs |
| Recovery if fallen | Easy (crawl to standing) | Difficult (must rise from lying down) |

Because of these challenges, bipedal locomotion with RL has developed more slowly than its quadruped counterpart -- but it is now accelerating thanks to greater compute power and improved simulation tools.

[Image: Challenges of balance and locomotion control for two-legged robots]

Major Bipedal Platforms

Cassie (Agility Robotics)

Cassie is the most popular bipedal research platform, developed by Agility Robotics (spinoff from Oregon State University). The robot has distinctive design features:

Cassie is a testbed for many RL research papers because (a) labs can actually purchase it, (b) it is complex enough to stress-test advanced controllers, and (c) its sim-to-real gap is well studied.

Digit (Agility Robotics)

Digit is the next generation after Cassie, adding torso, arms, and head:

Unitree H1 / G1

Unitree H1 is an "affordable" humanoid from Unitree Robotics (China):

Unitree G1 is a smaller version, 1.27m, 35 kg, ~$16,000 -- more accessible for smaller labs.

Atlas (Boston Dynamics)

Atlas is the world's most famous humanoid from Boston Dynamics:

Tesla Optimus

Tesla Optimus (Gen 2) is Tesla's humanoid effort:

Comprehensive Platform Comparison

| Platform | DoF | Weight | Height | Max Speed | Actuator | Price | RL Research |
|---|---|---|---|---|---|---|---|
| Cassie | 10 | 32 kg | 1.1 m | 4.0 m/s | Electric | ~$150K | Extensive |
| Digit | 30 | 65 kg | 1.75 m | 1.5 m/s | Electric | ~$250K | Significant |
| Unitree H1 | 19 | 47 kg | 1.8 m | 3.3 m/s | Electric | ~$90K | Growing |
| Unitree G1 | 23 | 35 kg | 1.27 m | 2.0 m/s | Electric | ~$16K | Emerging |
| Atlas (new) | 28 | ~89 kg | 1.5 m | 2.5 m/s | Electric | N/A | Internal |
| Tesla Optimus | 28 | 57 kg | 1.73 m | 1.3 m/s | Electric | N/A | Internal |

RL Reward Design for Bipedal

Reward design for bipedal locomotion is more complex than quadruped. Beyond forward velocity and energy efficiency, many components are needed for balance and natural movement.

Core Reward Components

# Reward function for bipedal walking
reward = (
    # Forward progress
    w_vel * forward_velocity_tracking
    # Balance (most critical for bipedal)
    + w_balance * upright_reward          # Torso vertical
    + w_com * com_over_support_foot       # CoM over stance foot
    # Gait quality
    + w_gait * periodic_gait_reward       # Regular stepping pattern
    + w_sym * symmetry_reward             # Both legs symmetric
    + w_natural * joint_angle_penalty     # Avoid unnatural poses
    # Energy
    - w_energy * torque_squared           # Energy efficiency
    - w_jerk * action_jerk_penalty        # Smooth control
    # Safety
    - w_contact * body_contact_penalty    # Don't touch ground with body
    - w_fall * fall_penalty               # Don't fall
)

Balance Reward in Detail

Balance is the critical factor. Common approaches:

1. Upright torso reward: Keep torso orientation near vertical

upright = cos(torso_pitch) * cos(torso_roll)
reward_upright = max(0, upright)  # 1.0 when standing straight

2. CoM projection reward: Center of mass projection within support polygon

com_xy = get_com_projection()
support = get_support_polygon()
reward_com = is_inside(com_xy, support)

3. Angular momentum regulation: Limit excessive angular momentum (avoid spinning out)

reward_angmom = -np.sum(angular_momentum ** 2)  # penalize ||L||^2
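A minimal sketch combining the three balance terms above into one function. The weights, the Gaussian kernel on CoM error, and all parameter values are illustrative assumptions, not values from any specific paper:

```python
import numpy as np

def balance_reward(torso_pitch, torso_roll, com_xy, foot_xy,
                   angular_momentum,
                   w_up=1.0, w_com=0.5, w_L=0.01, com_scale=0.1):
    """Combine upright, CoM-tracking, and angular-momentum terms.

    All weights and the 0.1 m CoM length scale are illustrative choices.
    """
    # 1. Upright torso: 1.0 when pitch = roll = 0, decays as the torso tilts
    upright = max(0.0, np.cos(torso_pitch) * np.cos(torso_roll))
    # 2. CoM tracking: Gaussian on distance from CoM projection to stance foot
    #    (a smooth alternative to the binary is_inside() check above)
    com_err = np.linalg.norm(np.asarray(com_xy) - np.asarray(foot_xy))
    com_term = np.exp(-(com_err / com_scale) ** 2)
    # 3. Angular momentum regulation: quadratic penalty on ||L||^2
    L_pen = np.sum(np.square(angular_momentum))
    return w_up * upright + w_com * com_term - w_L * L_pen

# Standing perfectly upright and centered scores the maximum, 1.0 + 0.5
r_max = balance_reward(0.0, 0.0, [0.0, 0.0], [0.0, 0.0], np.zeros(3))
```

Note the Gaussian on CoM error instead of the binary inside/outside check: a smooth, dense signal like this usually gives RL a much better gradient than a 0/1 reward.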

Periodic Gait Reward

For bipedal robots to walk naturally (not shuffling), use periodic reward:

# Phase variable: 0 → 2pi per gait cycle
phase = (time % gait_period) / gait_period * 2 * pi

# Desired foot contact pattern
left_contact_desired = sin(phase) > 0      # Left foot stance
right_contact_desired = sin(phase + pi) > 0 # Right foot stance (opposite)

# Reward matching desired contact pattern
reward_gait = (
    match(left_foot_contact, left_contact_desired)
    + match(right_foot_contact, right_contact_desired)
)

This creates alternating gait naturally rather than letting RL discover any walking pattern (which might be shuffling or hopping).
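The same contact-matching idea, written as a self-contained, runnable function. The 0.8 s gait period is an illustrative assumption, and the boolean contact inputs stand in for the simulator's foot contact sensors:

```python
import math

def gait_reward(t, left_contact, right_contact, gait_period=0.8):
    """Reward matching a desired alternating contact pattern.

    `left_contact` / `right_contact` are booleans from contact sensing;
    the reward is in [0, 2]: +1 per foot matching its desired state.
    """
    # Phase variable: 0 -> 2*pi over one gait cycle
    phase = (t % gait_period) / gait_period * 2 * math.pi
    left_desired = math.sin(phase) > 0             # first half-cycle: left stance
    right_desired = math.sin(phase + math.pi) > 0  # second half-cycle: right stance
    return float(left_contact == left_desired) + float(right_contact == right_desired)
```

At a quarter of the cycle the left foot should be planted and the right foot swinging; half a cycle later the roles flip, producing the alternating gait described above.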

Fall Recovery Policy

A hard truth about bipedal robots: they will fall. The question is how to stand back up.

Separate Recovery Policy

A common approach is to train two separate policies:

  1. Walking policy: Normal walking control
  2. Recovery policy: Standing up from lying down

State machine transition:

Walking → [detect fall] → Recovery → [standing up] → Walking

Fall detection is based on torso orientation: if |pitch| > 60° or |roll| > 60°, trigger recovery.
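The transition logic above can be sketched as a tiny two-state machine. Only the 60-degree threshold comes from the text; the `standing_up` signal (e.g. torso height and orientation back in their nominal ranges, reported by the recovery policy) is an assumption for illustration:

```python
import math

WALKING, RECOVERY = "walking", "recovery"

def next_state(state, pitch, roll, standing_up, fall_deg=60.0):
    """One tick of the Walking <-> Recovery state machine.

    pitch/roll are torso angles in radians; `standing_up` is a hypothetical
    flag signalling that the recovery policy has restored a nominal pose.
    """
    fallen = (abs(pitch) > math.radians(fall_deg)
              or abs(roll) > math.radians(fall_deg))
    if state == WALKING and fallen:
        return RECOVERY            # fall detected: hand control to recovery
    if state == RECOVERY and standing_up and not fallen:
        return WALKING             # upright again: resume the walking policy
    return state                   # otherwise stay in the current mode
```

In practice the fall detector is usually filtered over several control ticks to avoid spurious switches during aggressive but recoverable motions.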

Push Recovery

Instead of waiting to fall then standing up, push recovery helps robots resist perturbations:

Berkeley Humanoid (arXiv:2407.21781)

The paper "Berkeley Humanoid: A Research Platform for Learning-based Control" introduces a new humanoid platform from UC Berkeley, designed specifically for RL research.

Design Principles

RL Results

Key insight: Berkeley Humanoid demonstrates that good hardware design can significantly reduce RL training complexity. Its quasi-direct-drive (QDD) actuators have a nearly linear response that is easy to simulate accurately, eliminating the need for actuator networks or heavy domain randomization.

[Image: Reinforcement learning training bipedal locomotion across multiple platforms]

Humanoid-Gym (arXiv:2404.05695)

Humanoid-Gym is an open-source RL framework for humanoid locomotion, built on NVIDIA Isaac Gym. It's the most practical tool currently available for starting with bipedal RL.

Key Features

Training Pipeline

1. Define robot URDF/MJCF
2. Configure reward weights (balance, velocity, energy...)
3. Train PPO in Isaac Gym (4096 parallel envs)
4. Verify in MuJoCo (sim-to-sim check)
5. Deploy to real robot (zero-shot)
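Step 2 of this pipeline boils down to a table of reward weights. Here is a hypothetical sketch -- the field names mirror the reward terms earlier in this post, not Humanoid-Gym's actual config schema, and all values are illustrative:

```python
# Hypothetical reward-weight table for step 2 of the pipeline.
# Names echo the reward components discussed earlier in this post;
# the magnitudes are placeholders you would tune per robot.
REWARD_WEIGHTS = {
    "tracking_lin_vel": 1.0,    # forward velocity tracking
    "upright":          1.0,    # torso vertical
    "com_over_foot":    0.5,    # CoM over stance foot
    "periodic_gait":    0.5,    # alternating contact pattern
    "symmetry":         0.2,    # left/right symmetry
    "torques":         -1e-4,   # energy penalty (negative weight)
    "action_rate":     -0.01,   # smooth control
    "body_contact":    -1.0,    # don't touch ground with the body
    "fall":            -10.0,   # terminal fall penalty
}

def total_reward(terms, weights=REWARD_WEIGHTS):
    """Weighted sum of per-step reward terms; unknown terms are ignored."""
    return sum(weights[k] * v for k, v in terms.items() if k in weights)
```

Keeping all weights in one flat table like this makes curriculum tweaks (e.g. ramping up the energy penalty after the gait stabilizes) a one-line change between training runs.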

Terrain Curriculum in Humanoid-Gym

Humanoid-Gym supports diverse terrains:

Dynamics randomization includes:

Results

RobotEra XBot-L (1.65m humanoid) achieves:

State-of-the-Art: Bipedal RL Milestones

Cassie RL Milestones

Research on Cassie has achieved many important milestones:

1. Robust Parameterized Locomotion (arXiv:2103.14295):

2. All Common Bipedal Gaits (arXiv:2011.01387):

3. Versatile Dynamic Locomotion (arXiv:2401.16889):

Trends 2024-2026

  1. Whole-body control: Combining locomotion + manipulation (Digit carrying objects, Optimus assembling)
  2. Vision-based bipedal: Add camera for terrain-aware walking (like quadruped parkour)
  3. Foundation policies: Pre-train general locomotion policy, fine-tune for specific tasks
  4. Faster sim-to-real: QDD actuators + better simulation reducing gap

Practical Guide: Getting Started with Bipedal RL

If you want to try bipedal RL:

Accessible Hardware

  1. Unitree G1 (~$16K): Best price for full humanoid
  2. Simulation only: Use Humanoid-Gym with MuJoCo humanoid models (free)

Software Stack

  1. Humanoid-Gym (recommended): Isaac Gym + PPO, pre-configured for humanoid
  2. legged_gym (ETH Zurich): More flexible, supports quadruped and bipedal
  3. MuJoCo + Stable-Baselines3: Lightweight, easy to customize

Tips for Beginners

Conclusion

Bipedal locomotion with RL is entering an explosion phase. From Cassie (100m record) to Berkeley Humanoid (QDD simplicity) to Humanoid-Gym (open-source tools), the community is rapidly closing the gap with quadruped locomotion. Cheaper hardware (Unitree G1/H1) and better simulators (Isaac Gym, MuJoCo) are democratizing the field.

Read the earlier parts of this series:

Next -- Part 7: Sim-to-Real for Locomotion -- will dive deep into transferring policies from simulation to real robots, covering actuator networks and best practices.

