
Bipedal Walking: Controlling 2-Legged Robots with RL

Challenges of bipedal robot control -- balance, fall recovery, and terrain adaptation, from Cassie and Digit to modern humanoids.

Nguyen Anh Tuan · February 25, 2026 · 10 min read

Why is Bipedal Harder than Quadruped?

If you've followed this series from Part 1 through Part 5, you've seen impressive quadruped results -- ANYmal doing parkour, Unitree A1 climbing stairs. But when you switch to two-legged (bipedal) robots, everything becomes significantly harder.

Underactuated Dynamics

A quadruped robot has 4 contact points with the ground, creating a wide support polygon. The center of mass (CoM) easily stays within this polygon, making the robot statically stable.

In contrast, a bipedal robot has only 2 legs, and during swing phase (one leg lifted), only 1 contact point remains. The support polygon shrinks to just the foot area. This is an underactuated system -- the robot doesn't have enough actuators to directly control all degrees of freedom and must rely on dynamic balance (similar to how humans walk -- essentially "controlled falling").

Specific Challenges

| Challenge | Quadruped | Bipedal |
|---|---|---|
| Support polygon | Wide (4 legs) | Narrow (1-2 legs) |
| Static stability | Easy (3 feet on ground) | Impossible (must be dynamic) |
| Fall risk | Low | Extremely high |
| DoF to control | 12 (3 per leg) | 10-30+ (hips, knees, ankles, torso) |
| Impact forces | Distributed across 4 legs | Concentrated on 2 legs |
| Recovery after a fall | Easy (crawl to standing) | Difficult (must rise from lying down) |

Because of these challenges, bipedal locomotion with RL has developed more slowly than its quadruped counterpart, but it is now accelerating thanks to greater compute power and improved simulation tools.

[Image: Challenges of balance and locomotion control for two-legged robots]

Major Bipedal Platforms

Cassie (Agility Robotics)

Cassie is the most popular bipedal research platform, developed by Agility Robotics (spinoff from Oregon State University). The robot has distinctive design features:

  • Compliant leg design: Legs have springs at the knee, helping absorb impact and store energy
  • Underactuated ankle: No ankle actuator -- forces the policy to balance using hip and knee
  • 10 DoF: 5 joints per leg (2 hip, 1 knee, 2 ankle -- but ankle passive)
  • Weight: ~32 kg
  • Notable achievement: Ran 100m in 24.73 seconds (Guinness World Record for bipedal robot, 2022)

Cassie is a testbed for many RL research papers because (a) labs can purchase it, (b) it is complex enough to test advanced control, and (c) its sim-to-real gap is well studied.

Digit (Agility Robotics)

Digit is the next generation after Cassie, adding torso, arms, and head:

  • Full humanoid form: Torso + 2 arms + 2 legs
  • 30 DoF: Significantly more than Cassie
  • Purpose: Warehouse logistics (Agility partners with Amazon)
  • Manipulation + Locomotion: Can walk while carrying objects
  • Weight: ~65 kg

Unitree H1 / G1

Unitree H1 is an "affordable" humanoid from Unitree Robotics (China):

  • 19 DoF, 1.8m tall, ~47 kg
  • Price: ~$90,000 (much cheaper than Atlas or Digit)
  • Speed: Achieved 3.3 m/s walking speed (record for humanoid)
  • Open-source friendly: Supports Isaac Gym, MuJoCo

Unitree G1 is a smaller version, 1.27m, 35 kg, ~$16,000 -- more accessible for smaller labs.

Atlas (Boston Dynamics)

Atlas is the world's most famous humanoid from Boston Dynamics:

  • Hydraulic → Electric: New version (2024) fully electric
  • 28 DoF, extreme agility (backflips, dancing, parkour)
  • Not for sale: Internal research only
  • State-of-the-art hardware but control uses mostly model-based methods (MPC), gradually shifting to RL

Tesla Optimus

Tesla Optimus (Gen 2) is Tesla's humanoid effort:

  • 28 DoF, 1.73m, ~57 kg
  • Tesla-designed actuators: 14 rotary + 14 linear
  • Purpose: Factory automation, general-purpose tasks
  • RL training: Uses Tesla's massive compute infrastructure

Comprehensive Platform Comparison

| Platform | DoF | Weight | Height | Max Speed | Actuator | Price | RL Research |
|---|---|---|---|---|---|---|---|
| Cassie | 10 | 32 kg | 1.1 m | 4.0 m/s | Electric | ~$150K | Extensive |
| Digit | 30 | 65 kg | 1.75 m | 1.5 m/s | Electric | ~$250K | Significant |
| Unitree H1 | 19 | 47 kg | 1.8 m | 3.3 m/s | Electric | ~$90K | Growing |
| Unitree G1 | 23 | 35 kg | 1.27 m | 2.0 m/s | Electric | ~$16K | Emerging |
| Atlas (new) | 28 | ~89 kg | 1.5 m | 2.5 m/s | Electric | N/A | Internal |
| Tesla Optimus | 28 | 57 kg | 1.73 m | 1.3 m/s | Electric | N/A | Internal |

RL Reward Design for Bipedal

Reward design for bipedal locomotion is more complex than quadruped. Beyond forward velocity and energy efficiency, many components are needed for balance and natural movement.

Core Reward Components

```python
# Reward function for bipedal walking (pseudocode)
reward = (
    # Forward progress
    w_vel * forward_velocity_tracking
    # Balance (most critical for bipedal)
    + w_balance * upright_reward          # keep torso vertical
    + w_com * com_over_support_foot       # CoM over stance foot
    # Gait quality
    + w_gait * periodic_gait_reward       # regular stepping pattern
    + w_sym * symmetry_reward             # left/right legs symmetric
    + w_natural * joint_angle_penalty     # avoid unnatural poses
    # Energy
    - w_energy * torque_squared           # energy efficiency
    - w_jerk * action_jerk_penalty        # smooth control
    # Safety
    - w_contact * body_contact_penalty    # don't touch ground with body
    - w_fall * fall_penalty               # don't fall
)
```

Balance Reward in Detail

Balance is the critical factor. Common approaches:

1. Upright torso reward: Keep torso orientation near vertical

```python
upright = cos(torso_pitch) * cos(torso_roll)
reward_upright = max(0, upright)  # 1.0 when standing perfectly straight
```

2. CoM projection reward: Center of mass projection within support polygon

```python
com_xy = get_com_projection()
support = get_support_polygon()
reward_com = is_inside(com_xy, support)  # 1 if CoM is over the support polygon
```

3. Angular momentum regulation: Limit excessive angular momentum (avoid spinning out)

```python
reward_angmom = -norm(angular_momentum) ** 2  # penalize large ||L||
```
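The three balance terms above can be combined into one runnable sketch. Everything here is an assumption for illustration -- the weights, the helper `point_in_polygon`, and the input conventions -- not any specific paper's implementation:

```python
import numpy as np

def point_in_polygon(p, poly):
    """Ray-casting test: is 2D point p inside polygon [(x, y), ...]?"""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def balance_reward(torso_pitch, torso_roll, com_xy, support_poly, ang_mom,
                   w_upright=1.0, w_com=0.5, w_angmom=0.01):
    """Combine the three balance terms into one scalar (illustrative weights)."""
    # 1. Upright torso: 1.0 when pitch and roll are both zero
    upright = max(0.0, np.cos(torso_pitch) * np.cos(torso_roll))
    # 2. CoM projection: 1.0 if the CoM lands inside the support polygon
    com_ok = 1.0 if point_in_polygon(com_xy, support_poly) else 0.0
    # 3. Angular momentum regulation: penalize large ||L||^2
    angmom_pen = float(np.sum(np.asarray(ang_mom) ** 2))
    return w_upright * upright + w_com * com_ok - w_angmom * angmom_pen
```

A perfectly upright robot with its CoM over a unit-square support foot and zero angular momentum scores the full 1.5 under these weights.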

Periodic Gait Reward

For a bipedal robot to walk naturally (rather than shuffle), use a periodic reward:

```python
# Phase variable: 0 -> 2*pi over one gait cycle
phase = (time % gait_period) / gait_period * 2 * pi

# Desired foot contact pattern
left_contact_desired = sin(phase) > 0        # left foot stance
right_contact_desired = sin(phase + pi) > 0  # right foot stance (opposite)

# Reward matching the desired contact pattern
reward_gait = (
    match(left_foot_contact, left_contact_desired)
    + match(right_foot_contact, right_contact_desired)
)
```

This creates alternating gait naturally rather than letting RL discover any walking pattern (which might be shuffling or hopping).
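A runnable version of this idea, with the `match` pseudocode made explicit. The function name and the default gait period are assumptions for the sketch:

```python
import math

def periodic_gait_reward(t, left_contact, right_contact, gait_period=0.8):
    """Reward the measured foot contacts matching the desired alternating
    pattern defined by a phase variable (illustrative sketch)."""
    # Phase advances from 0 to 2*pi over one gait cycle
    phase = (t % gait_period) / gait_period * 2 * math.pi
    # Left foot should be in stance for the first half-cycle,
    # right foot for the second half (opposite phase)
    left_desired = math.sin(phase) > 0
    right_desired = math.sin(phase + math.pi) > 0
    # +1 for each foot whose contact state matches the desired one
    return int(left_contact == left_desired) + int(right_contact == right_desired)
```

At a quarter of the cycle (e.g. `t = 0.2` with the 0.8 s period), the left foot should be down and the right foot up; matching contacts earn the full reward of 2.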

Fall Recovery Policy

An unavoidable issue for bipedal robots: they will fall. The question is how to stand back up.

Separate Recovery Policy

A common approach is to train two separate policies:

  1. Walking policy: Normal walking control
  2. Recovery policy: Standing up from lying down

State machine transition:

Walking → [detect fall] → Recovery → [standing up] → Walking

Fall detection is based on torso orientation: if |pitch| > 60° or |roll| > 60°, trigger recovery.
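The walking/recovery switch can be sketched as a tiny state machine. The mode names and the `standing_up_done` flag are hypothetical conveniences for the example:

```python
import math

def select_policy(torso_pitch, torso_roll, current_mode,
                  standing_up_done=False, fall_threshold=math.radians(60)):
    """Walking -> Recovery when a fall is detected; back to Walking once
    the recovery policy reports the robot is upright again (sketch)."""
    if current_mode == "walking":
        if abs(torso_pitch) > fall_threshold or abs(torso_roll) > fall_threshold:
            return "recovery"  # fall detected: hand control to recovery policy
        return "walking"
    # In recovery: stay there until standing up has finished
    return "walking" if standing_up_done else "recovery"
```

Each control step queries this selector and runs the chosen policy's action.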

Push Recovery

Instead of waiting to fall then standing up, push recovery helps robots resist perturbations:

  • Train with random external forces (pushes) in simulation
  • Robot learns stepping strategy: take an additional step in push direction to recover
  • Similar to human reflex when pushed -- automatically step out to maintain balance
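A minimal sketch of injecting such pushes during training. The helper and its defaults are assumptions; a real setup would forward the sampled force to the simulator's external-force API each control step:

```python
import math
import random

def sample_push(max_force=50.0, push_prob=0.02):
    """With probability push_prob per control step, return a random
    horizontal push force (N) to apply to the torso; otherwise None."""
    if random.random() > push_prob:
        return None  # no perturbation this step
    angle = random.uniform(0.0, 2.0 * math.pi)
    magnitude = random.uniform(0.0, max_force)
    # Horizontal (x, y) force, no vertical component
    return (magnitude * math.cos(angle), magnitude * math.sin(angle), 0.0)
```

Exposing the policy to these pushes is what teaches the stepping reflex: a push it cannot absorb with the stance ankle forces a recovery step.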

Berkeley Humanoid (arXiv:2407.21781)

The paper Berkeley Humanoid: A Research Platform for Learning-based Control introduces a new humanoid platform from UC Berkeley, designed specifically for RL research.

Design Principles

  • Lightweight (16 kg): Much lighter than Digit (65 kg) or Atlas (~89 kg)
  • Quasi-Direct-Drive (QDD) actuators: Reduced gearing ratio, smaller sim-to-real gap because actuator dynamics are simpler
  • Low cost: In-house manufactured, affordable for university labs

RL Results

  • Omnidirectional walking: Forward, backward, lateral, turning
  • Dynamic hopping: Single-leg and double-leg hops
  • Outdoor terrain: Steep unpaved trails, hundreds of meters traversal
  • Simple RL controller: PPO with light domain randomization (minimal needed due to QDD actuators with small sim-to-real gap)

Key insight: Berkeley Humanoid demonstrates that good hardware design can significantly reduce RL training complexity. QDD actuators have nearly linear response, easy to simulate accurately, so no need for actuator networks or heavy domain randomization.

[Image: Reinforcement learning training bipedal locomotion across multiple platforms]

Humanoid-Gym (arXiv:2404.05695)

Humanoid-Gym is an open-source RL framework for humanoid locomotion, built on NVIDIA Isaac Gym. It's the most practical tool currently available for starting with bipedal RL.

Key Features

  • Isaac Gym backend: Massive parallel training (4096+ environments)
  • Sim-to-sim verification: Train in Isaac Gym, verify in MuJoCo before deploying
  • Pre-configured robots: Supports RobotEra XBot-S (1.2m), XBot-L (1.65m), can add custom robots
  • Zero-shot sim-to-real: Verified on real XBot-S and XBot-L

Training Pipeline

1. Define robot URDF/MJCF
2. Configure reward weights (balance, velocity, energy...)
3. Train PPO in Isaac Gym (4096 parallel envs)
4. Verify in MuJoCo (sim-to-sim check)
5. Deploy to real robot (zero-shot)

Terrain Curriculum in Humanoid-Gym

Humanoid-Gym supports diverse terrains:

  • Flat ground (baseline)
  • Rough terrain (random height perturbations)
  • Slopes (up to 15 degrees)
  • Stairs (ascending + descending)
  • Discrete stepping stones

Dynamics randomization includes:

  • Mass randomization: +/- 15%
  • Friction: 0.5 - 2.0
  • Motor strength: +/- 10%
  • Push perturbation: random forces up to 50N
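These ranges can be sampled once per episode (or per environment instance). The sketch below just draws from the quoted intervals; the parameter names are assumptions, not Humanoid-Gym's actual config keys:

```python
import random

def sample_dynamics():
    """Draw one set of randomized dynamics parameters from the
    ranges quoted above (illustrative)."""
    return {
        "mass_scale": random.uniform(0.85, 1.15),          # +/- 15%
        "friction": random.uniform(0.5, 2.0),
        "motor_strength_scale": random.uniform(0.9, 1.1),  # +/- 10%
        "push_force_n": random.uniform(0.0, 50.0),         # up to 50 N
    }
```

Each parallel environment gets its own draw, so the policy never sees a single "canonical" robot and must be robust to the whole range.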

Results

RobotEra XBot-L (1.65m humanoid) achieves:

  • Stable walking on flat ground and rough terrain
  • Stair climbing (ascending)
  • Push recovery (50N lateral forces)
  • Zero-shot transfer from Isaac Gym to real robot

State-of-the-Art: Bipedal RL Milestones

Cassie RL Milestones

Research on Cassie has achieved many important milestones:

1. Robust Parameterized Locomotion (arXiv:2103.14295):

  • Train walking policy with variable speed, height, turning
  • Domain randomization for sim-to-real
  • First demonstration of robust sim-to-real bipedal walking with RL

2. All Common Bipedal Gaits (arXiv:2011.01387):

  • Single policy for standing, walking, hopping, running, skipping
  • Periodic reward composition -- design reward for each gait based on phase variable
  • Smooth transitions between gaits

3. Versatile Dynamic Locomotion (arXiv:2401.16889):

  • General solution for diverse bipedal skills
  • Walking, running, jumping, standing in one framework
  • LSTM-based policy for temporal reasoning

Looking Ahead

  1. Whole-body control: Combining locomotion + manipulation (Digit carrying objects, Optimus assembling)
  2. Vision-based bipedal: Adding cameras for terrain-aware walking (like quadruped parkour)
  3. Foundation policies: Pre-training a general locomotion policy, then fine-tuning for specific tasks
  4. Faster sim-to-real: QDD actuators + better simulation are shrinking the gap

Practical Guide: Getting Started with Bipedal RL

If you want to try bipedal RL:

Accessible Hardware

  1. Unitree G1 (~$16K): Best price for full humanoid
  2. Simulation only: Use Humanoid-Gym with MuJoCo humanoid models (free)

Software Stack

  1. Humanoid-Gym (recommended): Isaac Gym + PPO, pre-configured for humanoid
  2. legged_gym (ETH Zurich): More flexible, supports quadruped and bipedal
  3. MuJoCo + Stable-Baselines3: Lightweight, easy to customize

Tips for Beginners

  • Start with standing balance before attempting walking
  • Periodic gait reward is critical -- without it, robot shuffles
  • Curriculum: Flat → rough → slopes → stairs
  • Symmetry reward helps natural gait
  • Sim-to-sim (Isaac Gym → MuJoCo) before sim-to-real
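The flat → rough → slopes → stairs curriculum from these tips can be sketched as promoting (or demoting) terrain difficulty based on recent performance. Thresholds and level names here are illustrative:

```python
TERRAIN_LEVELS = ["flat", "rough", "slopes", "stairs"]

def update_curriculum(level_idx, mean_episode_reward,
                      promote_threshold=0.8, demote_threshold=0.3):
    """Move to harder terrain when the policy does well, easier terrain
    when it struggles (thresholds are made up for the sketch)."""
    if mean_episode_reward > promote_threshold:
        level_idx = min(level_idx + 1, len(TERRAIN_LEVELS) - 1)
    elif mean_episode_reward < demote_threshold:
        level_idx = max(level_idx - 1, 0)
    return level_idx, TERRAIN_LEVELS[level_idx]
```

Calling this every few hundred episodes keeps training in the zone where the task is hard but still learnable.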

Conclusion

Bipedal locomotion with RL is entering an explosion phase. From Cassie (100m record) to Berkeley Humanoid (QDD simplicity) to Humanoid-Gym (open-source tools), the community is rapidly closing the gap with quadruped locomotion. Cheaper hardware (Unitree G1/H1) and better simulators (Isaac Gym, MuJoCo) are democratizing the field.

Read the earlier parts of this series:

Next -- Part 7: Sim-to-Real for Locomotion -- will deep dive into transferring policies from simulation to real robot, with actuator networks and best practices.



Nguyễn Anh Tuấn

Robotics & AI Engineer. Building VnRobo — sharing knowledge about robot learning, VLA models, and automation.
