Why is Bipedal Harder than Quadruped?
If you've followed this series from Part 1 through Part 5, you've seen impressive quadruped results -- ANYmal doing parkour, Unitree A1 climbing stairs. But with two-legged (bipedal) robots, everything becomes significantly harder.
Underactuated Dynamics
A quadruped robot has 4 contact points with the ground, creating a wide support polygon. The center of mass (CoM) easily stays within this polygon, making the robot statically stable.
In contrast, a bipedal robot has only 2 legs, and during swing phase (one leg lifted), only 1 contact point remains. The support polygon shrinks to just the foot area. This is an underactuated system -- the robot doesn't have enough actuators to directly control all degrees of freedom and must rely on dynamic balance (similar to how humans walk -- essentially "controlled falling").
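The "controlled falling" intuition can be made concrete with the linear inverted pendulum (LIP) model: with the CoM held at constant height z, the horizontal CoM offset x from the stance foot obeys x'' = (g/z)·x, so any offset grows exponentially unless the robot takes a step. A minimal sketch (the function name and parameters are illustrative, not from any library):

```python
import math

G = 9.81  # gravity, m/s^2

def simulate_lip(x0, v0, z=0.9, dt=0.001, t_end=0.5):
    """Forward-Euler integration of the LIP dynamics x'' = (g/z) * x.

    x is the horizontal CoM offset from the stance foot (meters)."""
    x, v = x0, v0
    for _ in range(int(t_end / dt)):
        a = (G / z) * x   # any offset from the foot accelerates the fall
        v += a * dt
        x += v * dt
    return x

# A 2 cm initial offset with zero velocity keeps growing: within half a
# second the CoM has drifted well past the foot -- the robot must step.
print(simulate_lip(0.02, 0.0))
```

This is why the swing foot's landing location, not ankle torque, is the dominant control input for bipedal balance.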
Specific Challenges
| Challenge | Quadruped | Bipedal |
|---|---|---|
| Support polygon | Wide (4 legs) | Narrow (1-2 legs) |
| Static stability | Easy (3+ feet on ground) | Only while standing still; walking must be dynamic |
| Fall risk | Low | Extremely high |
| DoF to control | 12 (3/leg) | 10-30+ (hips, knees, ankles, torso) |
| Impact forces | Distributed across 4 legs | Concentrated on 2 legs |
| Recovery if fallen | Easy (crawl to standing) | Difficult (must rise from lying down) |
Because of these challenges, bipedal locomotion with RL has developed more slowly than quadruped locomotion, but it is now accelerating thanks to greater compute power and improved simulation tools.
Major Bipedal Platforms
Cassie (Agility Robotics)
Cassie is the most popular bipedal research platform, developed by Agility Robotics (spinoff from Oregon State University). The robot has distinctive design features:
- Compliant leg design: Legs have springs at the knee, helping absorb impact and store energy
- Minimal feet: The blade-like feet provide almost no support area, forcing the policy to balance dynamically with the hips and knees
- 10 actuated DoF: 5 motors per leg (hip roll, hip yaw, hip pitch, knee, foot), plus passive spring joints in the shin and tarsus
- Weight: ~32 kg
- Notable achievement: Ran 100m in 24.73 seconds (Guinness World Record for bipedal robot, 2022)
Cassie is a testbed for many RL research papers because: (a) available for purchase by labs, (b) complex enough for advanced control testing, (c) sim-to-real gap well-studied.
Digit (Agility Robotics)
Digit is the next generation after Cassie, adding torso, arms, and head:
- Full humanoid form: Torso + 2 arms + 2 legs
- 30 DoF: Significantly more than Cassie
- Purpose: Warehouse logistics (Agility partners with Amazon)
- Manipulation + Locomotion: Can walk while carrying objects
- Weight: ~65 kg
Unitree H1 / G1
Unitree H1 is an "affordable" humanoid from Unitree Robotics (China):
- 19 DoF, 1.8m tall, ~47 kg
- Price: ~$90,000 (much cheaper than Atlas or Digit)
- Speed: Achieved 3.3 m/s walking speed (reported as a record for full-size humanoids)
- Open-source friendly: Supports Isaac Gym, MuJoCo
Unitree G1 is a smaller version, 1.27m, 35 kg, ~$16,000 -- more accessible for smaller labs.
Atlas (Boston Dynamics)
Atlas is the world's most famous humanoid from Boston Dynamics:
- Hydraulic → Electric: New version (2024) fully electric
- 28 DoF, extreme agility (backflips, dancing, parkour)
- Not for sale: Internal research only
- State-of-the-art hardware but control uses mostly model-based methods (MPC), gradually shifting to RL
Tesla Optimus
Tesla Optimus (Gen 2) is Tesla's humanoid effort:
- 28 DoF, 1.73m, ~57 kg
- Tesla-designed actuators: 14 rotary + 14 linear
- Purpose: Factory automation, general-purpose tasks
- RL training: Uses Tesla's massive compute infrastructure
Comprehensive Platform Comparison
| Platform | DoF | Weight | Height | Max Speed | Actuator | Price | RL Research |
|---|---|---|---|---|---|---|---|
| Cassie | 10 | 32 kg | 1.1m | 4.0 m/s | Electric | ~$150K | Extensive |
| Digit | 30 | 65 kg | 1.75m | 1.5 m/s | Electric | ~$250K | Significant |
| Unitree H1 | 19 | 47 kg | 1.8m | 3.3 m/s | Electric | ~$90K | Growing |
| Unitree G1 | 23 | 35 kg | 1.27m | 2.0 m/s | Electric | ~$16K | Emerging |
| Atlas (new) | 28 | ~89 kg | 1.5m | 2.5 m/s | Electric | N/A | Internal |
| Tesla Optimus | 28 | 57 kg | 1.73m | 1.3 m/s | Electric | N/A | Internal |
RL Reward Design for Bipedal
Reward design for bipedal locomotion is more complex than quadruped. Beyond forward velocity and energy efficiency, many components are needed for balance and natural movement.
Core Reward Components
```python
# Reward function for bipedal walking (pseudocode; w_* are tuned weights)
reward = (
    # Forward progress
    w_vel * forward_velocity_tracking
    # Balance (most critical for bipedal)
    + w_balance * upright_reward        # torso near vertical
    + w_com * com_over_support_foot     # CoM over the stance foot
    # Gait quality
    + w_gait * periodic_gait_reward     # regular stepping pattern
    + w_sym * symmetry_reward           # left/right legs mirror each other
    - w_natural * joint_angle_penalty   # avoid unnatural poses
    # Energy
    - w_energy * torque_squared         # energy efficiency
    - w_jerk * action_jerk_penalty      # smooth control
    # Safety
    - w_contact * body_contact_penalty  # no body-ground contact
    - w_fall * fall_penalty             # don't fall
)
```
Balance Reward in Detail
Balance is the critical factor. Common approaches:
1. Upright torso reward: Keep torso orientation near vertical
```python
upright = cos(torso_pitch) * cos(torso_roll)
reward_upright = max(0, upright)  # 1.0 when standing perfectly straight
```
2. CoM projection reward: Center of mass projection within support polygon
```python
com_xy = get_com_projection()    # CoM projected onto the ground plane
support = get_support_polygon()  # convex hull of current contact points
reward_com = float(is_inside(com_xy, support))
```
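The `is_inside` check above is left abstract; for a convex support polygon, a cross-product test per edge is enough. A minimal sketch (function and variable names are illustrative):

```python
def is_inside_convex(point, polygon):
    """True if `point` lies inside a convex polygon given as CCW (x, y) vertices."""
    px, py = point
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Cross product of the edge vector with the vector to the point:
        # negative means the point is to the right of a CCW edge -> outside.
        cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
        if cross < 0:
            return False
    return True

foot = [(0.0, 0.0), (0.2, 0.0), (0.2, 0.1), (0.0, 0.1)]  # stance-foot rectangle
print(is_inside_convex((0.1, 0.05), foot))   # True: CoM over the foot
print(is_inside_convex((0.3, 0.05), foot))   # False: CoM outside
```

In practice the binary check is often softened into a smooth distance-based reward so the gradient signal does not vanish.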
3. Angular momentum regulation: Limit excessive angular momentum (avoid spinning out)
```python
reward_angmom = -norm(angular_momentum) ** 2  # penalize whole-body angular momentum
```
Periodic Gait Reward
For bipedal robots to walk naturally (not shuffling), use periodic reward:
```python
# Phase variable: advances 0 -> 2*pi over one gait cycle
phase = (time % gait_period) / gait_period * 2 * pi

# Desired foot contact pattern
left_contact_desired = sin(phase) > 0         # left foot in stance
right_contact_desired = sin(phase + pi) > 0   # right foot in stance (opposite)

# Reward matching the desired contact pattern
reward_gait = (
    match(left_foot_contact, left_contact_desired)
    + match(right_foot_contact, right_contact_desired)
)
This shapes an alternating gait directly, rather than letting RL discover an arbitrary walking pattern (which is often shuffling or hopping).
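The scheme above can be checked numerically: with an ideal alternating gait, the measured contacts match the phase-based pattern at every timestep and the reward saturates at 2. A runnable sketch (function names and the 0.8 s gait period are illustrative):

```python
import math

def desired_contacts(t, gait_period=0.8):
    """Return (left_stance, right_stance) from the phase-based pattern."""
    phase = (t % gait_period) / gait_period * 2 * math.pi
    return math.sin(phase) > 0, math.sin(phase + math.pi) > 0

def gait_reward(left_contact, right_contact, t, gait_period=0.8):
    """+1 for each foot whose measured contact matches the desired pattern."""
    left_des, right_des = desired_contacts(t, gait_period)
    return float(left_contact == left_des) + float(right_contact == right_des)

# An ideal alternating gait earns the maximum reward at every sampled time.
rewards = [gait_reward(*desired_contacts(t), t) for t in [0.1, 0.3, 0.5, 0.7]]
print(rewards)  # [2.0, 2.0, 2.0, 2.0]
```

Many implementations replace the hard `sin(phase) > 0` threshold with a smooth contact-probability curve so the reward is less brittle near phase transitions.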
Fall Recovery Policy
An important issue for bipedal: robots will fall. The question is how to stand back up.
Separate Recovery Policy
The common approach is to train two separate policies:
- Walking policy: Normal walking control
- Recovery policy: Standing up from lying down
State machine transition:
Walking → [detect fall] → Recovery → [standing up] → Walking
Fall detection is based on torso orientation: if |pitch| > 60° or |roll| > 60°, trigger recovery.
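The transition logic above can be sketched as a small state machine, using the 60° thresholds from the text (class and variable names are illustrative):

```python
import math

FALL_LIMIT = math.radians(60)  # |pitch| or |roll| beyond this counts as a fall

class LocomotionStateMachine:
    def __init__(self):
        self.state = "WALKING"

    def update(self, pitch, roll, recovered):
        """Advance the Walking <-> Recovery state machine one step."""
        if self.state == "WALKING" and (abs(pitch) > FALL_LIMIT or abs(roll) > FALL_LIMIT):
            self.state = "RECOVERY"   # hand control to the recovery policy
        elif self.state == "RECOVERY" and recovered:
            self.state = "WALKING"    # standing again -> back to the walking policy
        return self.state

sm = LocomotionStateMachine()
print(sm.update(pitch=0.1, roll=0.0, recovered=False))  # WALKING
print(sm.update(pitch=1.2, roll=0.0, recovered=False))  # ~69 deg pitch -> RECOVERY
print(sm.update(pitch=0.1, roll=0.0, recovered=True))   # RECOVERY -> WALKING
```

In deployment, the `recovered` signal is typically derived from torso height and orientation staying within nominal bounds for some dwell time, to avoid flickering between policies.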
Push Recovery
Instead of waiting to fall then standing up, push recovery helps robots resist perturbations:
- Train with random external forces (pushes) in simulation
- Robot learns stepping strategy: take an additional step in push direction to recover
- Similar to human reflex when pushed -- automatically step out to maintain balance
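A classical way to choose where that recovery step should land is the capture point: under the linear inverted pendulum model, stepping to x + ẋ·√(z/g) ahead of the CoM brings the robot to rest. This is a model-based reference, not what the learned policies above compute explicitly, but it is a useful sanity check on learned stepping behavior (names are illustrative):

```python
import math

G = 9.81  # gravity, m/s^2

def capture_point(com_pos, com_vel, com_height):
    """Ground point to step to so the LIP-model robot comes to rest."""
    return com_pos + com_vel * math.sqrt(com_height / G)

# After a forward push leaves the CoM moving at 0.5 m/s (CoM height 0.9 m),
# the robot should step about 15 cm ahead of the CoM.
print(round(capture_point(0.0, 0.5, 0.9), 3))  # 0.151
```

RL-trained push recovery often converges to steps close to this prediction, which is one way researchers verify that a learned policy has discovered a sensible stepping strategy.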
Berkeley Humanoid (arXiv:2407.21781)
The paper "Berkeley Humanoid: A Research Platform for Learning-based Control" introduces a new humanoid platform from UC Berkeley, designed specifically for RL research.
Design Principles
- Lightweight (16 kg): Much lighter than Digit (~65 kg) or Atlas (~89 kg)
- Quasi-Direct-Drive (QDD) actuators: Reduced gearing ratio, smaller sim-to-real gap because actuator dynamics are simpler
- Low cost: In-house manufactured, affordable for university labs
RL Results
- Omnidirectional walking: Forward, backward, lateral, turning
- Dynamic hopping: Single-leg and double-leg hops
- Outdoor terrain: Traversed hundreds of meters on steep, unpaved trails
- Simple RL controller: PPO with light domain randomization (minimal needed due to QDD actuators with small sim-to-real gap)
Key insight: Berkeley Humanoid demonstrates that good hardware design can significantly reduce RL training complexity. QDD actuators have nearly linear response, easy to simulate accurately, so no need for actuator networks or heavy domain randomization.
Humanoid-Gym (arXiv:2404.05695)
Humanoid-Gym is an open-source RL framework for humanoid locomotion, built on NVIDIA Isaac Gym. It's the most practical tool currently available for starting with bipedal RL.
Key Features
- Isaac Gym backend: Massive parallel training (4096+ environments)
- Sim-to-sim verification: Train in Isaac Gym, verify in MuJoCo before deploying
- Pre-configured robots: Supports RobotEra XBot-S (1.2m), XBot-L (1.65m), can add custom robots
- Zero-shot sim-to-real: Verified on real XBot-S and XBot-L
Training Pipeline
1. Define robot URDF/MJCF
2. Configure reward weights (balance, velocity, energy...)
3. Train PPO in Isaac Gym (4096 parallel envs)
4. Verify in MuJoCo (sim-to-sim check)
5. Deploy to real robot (zero-shot)
Terrain Curriculum in Humanoid-Gym
Humanoid-Gym supports diverse terrains:
- Flat ground (baseline)
- Rough terrain (random height perturbations)
- Slopes (up to 15 degrees)
- Stairs (ascending + descending)
- Discrete stepping stones
Dynamics randomization includes:
- Mass randomization: +/- 15%
- Friction: 0.5 - 2.0
- Motor strength: +/- 10%
- Push perturbation: random forces up to 50N
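These ranges can be drawn once per environment at each reset; a sketch of such a sampler using the values above (the dictionary keys and function name are illustrative, not Humanoid-Gym's actual API):

```python
import random

def sample_dynamics(nominal_mass):
    """Draw one randomized set of dynamics parameters from the ranges above."""
    return {
        "mass": nominal_mass * random.uniform(0.85, 1.15),   # +/- 15%
        "friction": random.uniform(0.5, 2.0),
        "motor_strength_scale": random.uniform(0.90, 1.10),  # +/- 10%
        "push_force_n": random.uniform(0.0, 50.0),           # pushes up to 50 N
    }

params = sample_dynamics(nominal_mass=45.0)
print(params)
```

Sampling per environment (rather than per step) means each of the thousands of parallel robots trains under a consistent but slightly different world, which is what forces the policy to become robust to the real robot's unknown dynamics.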
Results
RobotEra XBot-L (1.65m humanoid) achieves:
- Stable walking on flat ground and rough terrain
- Stair climbing (ascending)
- Push recovery (50N lateral forces)
- Zero-shot transfer from Isaac Gym to real robot
State-of-the-Art: Bipedal RL Milestones
Cassie RL Milestones
Research on Cassie has achieved many important milestones:
1. Robust Parameterized Locomotion (arXiv:2103.14295):
- Train walking policy with variable speed, height, turning
- Domain randomization for sim-to-real
- First demonstration of robust sim-to-real bipedal walking with RL
2. All Common Bipedal Gaits (arXiv:2011.01387):
- Single policy for standing, walking, hopping, running, skipping
- Periodic reward composition -- design reward for each gait based on phase variable
- Smooth transitions between gaits
3. Versatile Dynamic Locomotion (arXiv:2401.16889):
- General solution for diverse bipedal skills
- Walking, running, jumping, standing in one framework
- LSTM-based policy for temporal reasoning
Trends 2024-2026
- Whole-body control: Combining locomotion + manipulation (Digit carrying objects, Optimus assembling)
- Vision-based bipedal: Add camera for terrain-aware walking (like quadruped parkour)
- Foundation policies: Pre-train general locomotion policy, fine-tune for specific tasks
- Faster sim-to-real: QDD actuators + better simulation reducing gap
Practical Guide: Getting Started with Bipedal RL
If you want to try bipedal RL:
Accessible Hardware
- Unitree G1 (~$16K): Best price for full humanoid
- Simulation only: Use Humanoid-Gym with MuJoCo humanoid models (free)
Software Stack
- Humanoid-Gym (recommended): Isaac Gym + PPO, pre-configured for humanoid
- legged_gym (ETH Zurich): More flexible, supports quadruped and bipedal
- MuJoCo + Stable-Baselines3: Lightweight, easy to customize
Tips for Beginners
- Start with standing balance before attempting walking
- Periodic gait reward is critical -- without it, robot shuffles
- Curriculum: Flat → rough → slopes → stairs
- Symmetry reward helps natural gait
- Sim-to-sim (Isaac Gym → MuJoCo) before sim-to-real
Conclusion
Bipedal locomotion with RL is entering a period of rapid growth. From Cassie (100 m sprint record) to Berkeley Humanoid (QDD simplicity) to Humanoid-Gym (open-source tooling), the community is quickly closing the gap with quadruped locomotion. Cheaper hardware (Unitree G1/H1) and better simulators (Isaac Gym, MuJoCo) are democratizing the field.
Read the earlier parts of this series:
- Part 1: What is Locomotion?
- Part 2: Sim-to-Real Basics
- Part 3: Reward Design for Locomotion
- Part 4: Terrain Curriculum
- Part 5: Robot Parkour
Next -- Part 7: Sim-to-Real for Locomotion -- will deep dive into transferring policies from simulation to real robot, with actuator networks and best practices.
Related Posts
- Sim-to-Real for Locomotion: Reality and Experience -- Actuator networks, domain randomization, deployment
- Robot Parkour: Jumping and Climbing with RL -- Extreme parkour and SoloParkour
- Humanoid Robotics: A Comprehensive Guide -- Overview of humanoid robotics
- RL for Bipedal Walking -- Reinforcement learning fundamentals for bipedal
- Foundation Models for Robotics: RT-2, Octo, OpenVLA -- Combining locomotion with foundation models