DeepMind Ping-Pong Robot: AI Plays Table Tennis at Human Level

How Google DeepMind built a table tennis robot that beats amateur humans using hierarchical RL, LLC-HLC architecture, and zero-shot sim-to-real transfer.

Nguyen Anh Tuan · April 22, 2026 · 10 min read
When a Robot Learns to Play Ping-Pong Like a Human

Picture a table tennis ball flying across the net at over 10 m/s, spinning at dozens of revolutions per second, and landing on the table in under half a second. The human brain handles the full reflex chain — tracking the trajectory, firing muscles, executing the swing — almost instantly. For a robot, this is an extraordinarily hard problem: extreme speed, ultra-short reaction windows, complex physics, and an opponent who constantly changes tactics.

In August 2024, Google DeepMind published "Achieving Human Level Competitive Robot Table Tennis", the culmination of years of research on robotic table tennis. The authors describe it as the first learned robot agent to play competitive table tennis at an amateur human level: it won 13 of 29 matches, beating 100% of beginner opponents and 55% of amateur players. The paper was presented at ICRA 2025 in Atlanta.

This article breaks down the entire system — hardware, algorithm architecture, training pipeline, real-world results, and engineering lessons you can apply to other robotics projects.

Why Table Tennis Is the Perfect Robotics Benchmark

Before diving into the technical details, it's worth understanding why table tennis was chosen as a benchmark rather than conventional industrial tasks.

Extreme speed and latency demands: At around 10 m/s, a ball crosses the length of the table (2.74 m) in roughly 200–300 ms. The robot must detect the trajectory, compute a contact point, plan the arm motion, and execute the swing, all within that window. This is real-time processing pushed to the limit.

Nonlinear physics: Topspin, backspin, and sidespin change the ball's trajectory after the bounce in ways that are extremely difficult to model analytically. No simple equation captures all scenarios.

Opponent adaptation: Different players have different strengths, weaknesses, and playing styles. The robot must "read" the opponent and adjust strategy in real time.

Open environment: Unlike structured pick-and-place tasks, table tennis involves changing lighting, worn balls, air currents, and accumulated mechanical drift.

These properties make ping-pong an ideal stress test for robot learning systems.
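
The 200–300 ms time budget mentioned above can be sanity-checked with constant-speed arithmetic. The sketch below assumes a regulation 2.74 m table and ignores drag, spin, and the arc of the ball, so it gives only a rough figure; the function name is illustrative.

```python
TABLE_LENGTH_M = 2.74  # regulation table length in metres


def flight_time_ms(ball_speed_mps: float, distance_m: float = TABLE_LENGTH_M) -> float:
    """Time to cover distance_m at a constant horizontal speed, in milliseconds."""
    if ball_speed_mps <= 0:
        raise ValueError("ball speed must be positive")
    return 1000.0 * distance_m / ball_speed_mps


# Print the rough time budget for a few plausible ball speeds.
for speed in (9.0, 10.0, 12.0, 14.0):
    print(f"{speed:4.1f} m/s -> {flight_time_ms(speed):4.0f} ms")
```

Speeds between roughly 9 and 14 m/s land inside the quoted window, and perception, planning, and actuation all have to share that budget.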

Hardware: An 8-DoF Robot System

A competitive table tennis match between athlete and robot system

DeepMind didn't build custom hardware from scratch. Instead, they combined two industrial systems to create a flexible, high-performance setup:

ABB IRB 1100 — a 6-DoF industrial robot arm capable of rotating joints at up to 420–600 degrees/second with 0.01mm repeatability. This arm handles the actual swing.

Festo Linear Gantry — two linear actuator rails:

  • X-axis (horizontal): 4m long, traversing the full width of the table
  • Y-axis (depth): 2m long, moving toward and away from the table

Combined, the system has 8 DoF: 6 (arm) + 2 (gantry). A 3D-printed paddle handle with standard short-pips rubber is attached to the arm's end effector.

Why a gantry instead of a mobile base? Speed and precision. Within the confined space around a ping-pong table, linear actuators can move faster and position more accurately than a typical wheeled base.

Perception: A high-speed stereo camera system tracks the ball in real time, outputting a 3D state estimate — position, velocity, and spin — at high frequency. Importantly, these are real ball states from the physical world, not simulated observations.

Architecture: Hierarchical Policy — LLC and HLC

This is the most technically significant contribution of the paper. DeepMind designed a two-level hierarchical policy to handle the full complexity of competitive table tennis.

Level 1: Low-Level Controllers (LLC)

Each LLC is a specialized policy trained for a specific hitting technique:

  • Forehand topspin drive — aggressive forward spin
  • Backhand push — controlled defensive stroke
  • Forehand serve — varied serve options
  • Backhand counter — quick counter-attack

Every LLC comes with a skill descriptor — a quantitative characterization of its capabilities and limitations. For example: "This LLC performs best when the ball arrives from the right at medium speed, but degrades significantly against strong backspin."

Skill descriptors are not static metadata — they are updated online based on real in-game performance.
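
One way to picture an online-updated skill descriptor is as a small table of per-condition statistics. The sketch below is a hypothetical illustration, not the paper's data structure: the class name, the condition labels, and the smoothed return-rate estimate are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SkillDescriptor:
    """Hypothetical per-LLC record of how often the skill returns the ball."""
    name: str
    attempts: dict = field(default_factory=dict)  # condition -> shots attempted
    returns: dict = field(default_factory=dict)   # condition -> shots returned

    def record(self, condition: str, returned: bool) -> None:
        """Log one real shot taken under the given ball condition."""
        self.attempts[condition] = self.attempts.get(condition, 0) + 1
        if returned:
            self.returns[condition] = self.returns.get(condition, 0) + 1

    def return_rate(self, condition: str, prior: float = 0.5,
                    prior_weight: float = 2.0) -> float:
        """Smoothed return rate; stays near `prior` until enough data arrives."""
        n = self.attempts.get(condition, 0)
        s = self.returns.get(condition, 0)
        return (s + prior * prior_weight) / (n + prior_weight)


forehand = SkillDescriptor("forehand_topspin")
for outcome in (True, True, True, False):
    forehand.record("medium_topspin", outcome)
forehand.record("heavy_backspin", False)

for condition in ("medium_topspin", "heavy_backspin", "never_seen"):
    print(condition, round(forehand.return_rate(condition), 3))
```

A high-level controller could then skip any skill whose estimated return rate for the incoming ball falls below a chosen threshold.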

Level 2: High-Level Controller (HLC)

The HLC is the "strategic brain" that selects which technique to use for each incoming ball. The decision process:

  1. Classify situation — Forehand or backhand? Based on the incoming ball's predicted landing point.
  2. Shortlist LLCs — Use tree-search and heuristics to filter candidate LLCs using their skill descriptors.
  3. Analyze the opponent — Use live Opponent Statistics (strengths and weaknesses observed during the match) to prioritize LLCs most likely to win the point.
  4. H-values — The HLC maintains a dynamic preference table (H-values) per LLC, continuously updated based on point-by-point outcomes.

In practice: if an opponent struggles with heavy topspin to their left, the HLC increases the H-value for forehand-topspin-cross-court and selects it more frequently.

               [Opponent hits the ball]
                         ↓
               [Perception System]
                (3D ball state update)
                         ↓
          [High-Level Controller - HLC]
        (Opponent stats + Tree search)
                         ↓
       ┌──────────────────────────────────┐
       │  Select optimal LLC from list    │
       └──────────────────────────────────┘
                         ↓
       [Selected Low-Level Controller]
       (Arm trajectory + paddle angle)
                         ↓
              [Motion execution]
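
The H-value idea can be sketched as a gradient-bandit preference update of the kind found in reinforcement learning textbooks. This is a minimal illustration under stated assumptions, not the paper's implementation: the class name, step size, reward definition (1.0 for a won point), and the made-up opponent are all hypothetical.

```python
import math
import random


def softmax(prefs):
    """Turn a {name: preference} dict into selection probabilities."""
    top = max(prefs.values())
    exps = {name: math.exp(h - top) for name, h in prefs.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


class HValueSelector:
    """Hypothetical HLC component: one learned preference (H-value) per LLC."""

    def __init__(self, llc_names, step_size=0.2, seed=0):
        self.h = {name: 0.0 for name in llc_names}
        self.step_size = step_size
        self.baseline = 0.0  # running mean of observed rewards
        self.count = 0
        self.rng = random.Random(seed)

    def select(self, candidates):
        """Sample one LLC from the shortlist in proportion to softmax(H)."""
        probs = softmax({name: self.h[name] for name in candidates})
        names = list(probs)
        return self.rng.choices(names, weights=[probs[n] for n in names])[0]

    def update(self, chosen, candidates, reward):
        """Shift preference toward skills that beat the running average reward."""
        probs = softmax({name: self.h[name] for name in candidates})
        self.count += 1
        self.baseline += (reward - self.baseline) / self.count
        advantage = reward - self.baseline
        for name in candidates:
            indicator = 1.0 if name == chosen else 0.0
            self.h[name] += self.step_size * advantage * (indicator - probs[name])


# Made-up opponent who struggles against forehand topspin.
shortlist = ["forehand_topspin", "backhand_push"]
win_prob = {"forehand_topspin": 0.7, "backhand_push": 0.3}
selector = HValueSelector(shortlist)
outcome_rng = random.Random(1)
for _ in range(500):
    llc = selector.select(shortlist)
    won = 1.0 if outcome_rng.random() < win_prob[llc] else 0.0
    selector.update(llc, shortlist, won)
print(selector.h)
```

Because selection stays stochastic, the controller keeps occasionally trying lower-ranked skills, which lets the preferences recover if the opponent changes tactics mid-match.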

Training Pipeline: From Simulation to Reality

Visualization of the sim-to-real training loop and neural network policy

The most common question about this kind of system: how can you train in simulation and have it work in the real world?

Blackbox Gradient Sensing (BGS)

DeepMind used BGS, a zeroth-order (gradient-free) optimization method, to train the LLCs in simulation. Why not standard PPO or SAC?

Table tennis presents several challenges for gradient-based RL:

  • Sparse rewards — you only know win/loss after many steps
  • Discontinuous dynamics — ball-paddle collisions are non-differentiable
  • High-dimensional action space — simultaneous control of 8 DoF

BGS works by sampling many small random perturbations of the current policy parameters, evaluating the reward of each perturbed policy, and combining the results into an estimated update direction. It converges more slowly than backpropagation-based methods but is considerably more stable for this class of problem.
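
The perturb-evaluate-update idea can be sketched with a generic antithetic evolution-strategies step. This is an illustration of the general technique only; the published BGS variant may differ in details such as how perturbations are sampled, filtered, and normalized. The function name, hyperparameters, and toy reward are assumptions.

```python
import random


def es_gradient_step(params, reward_fn, rng, num_pairs=16, sigma=0.1, step_size=0.05):
    """One antithetic ES update: estimate an ascent direction from reward values only."""
    dim = len(params)
    grad = [0.0] * dim
    for _ in range(num_pairs):
        noise = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = [p + sigma * n for p, n in zip(params, noise)]
        minus = [p - sigma * n for p, n in zip(params, noise)]
        # The reward difference along a random direction approximates the
        # directional derivative; no backpropagation through the simulator.
        diff = reward_fn(plus) - reward_fn(minus)
        for i in range(dim):
            grad[i] += diff * noise[i] / (2.0 * sigma * num_pairs)
    return [p + step_size * g for p, g in zip(params, grad)]


TARGET = [0.3, -0.2, 0.5]  # made-up optimum for the toy reward


def toy_reward(params):
    """Stand-in for an episode return: higher when params are closer to TARGET."""
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))


rng = random.Random(0)
params = [0.0, 0.0, 0.0]
for _ in range(200):
    params = es_gradient_step(params, toy_reward, rng)
print([round(p, 3) for p in params], round(toy_reward(params), 6))
```

The reward function is treated as a black box, so the same loop works when the reward comes from a full simulated rally with non-differentiable ball-paddle contacts.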

Iterative Sim-to-Real and Automatic Task Curriculum

The training pipeline is a self-improving loop:

  1. Seed — Collect a small amount of human-vs-human play data to define an initial task distribution.
  2. Simulate — Train LLCs in simulation using BGS on this distribution.
  3. Deploy zero-shot — Transfer policy directly to the real robot without fine-tuning.
  4. Play humans — The robot competes against real players and records game states.
  5. Update distribution — Add situations where the robot failed to the training distribution.
  6. Iterate — Return to step 2 with the expanded distribution.
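
The loop above can be sketched as a toy program. Everything here is a stand-in chosen for illustration: tasks are hypothetical (speed bucket, spin bucket) pairs, "training" simply memorizes the tasks it has seen, and "real play" samples random ball situations.

```python
import random


def train_in_sim(task_distribution):
    """Stand-in for simulation training: the policy only handles tasks it trained on."""
    return set(task_distribution)


def play_real_points(policy, rng, num_points=50):
    """Stand-in for real matches: sample ball situations, log the ones the policy misses."""
    failures = []
    for _ in range(num_points):
        task = (rng.randint(0, 9), rng.randint(0, 4))  # (speed bucket, spin bucket)
        if task not in policy:
            failures.append(task)
    return failures


def run_curriculum(seed_tasks, cycles=5, seed=0):
    """Repeat train -> deploy -> collect failures -> expand the task distribution."""
    rng = random.Random(seed)
    distribution = list(seed_tasks)                  # step 1: seed data
    failure_counts = []
    for _ in range(cycles):
        policy = train_in_sim(distribution)          # steps 2-3: train, deploy zero-shot
        failures = play_real_points(policy, rng)     # step 4: play and record
        failure_counts.append(len(failures))
        distribution.extend(sorted(set(failures)))   # step 5: add failed situations
    return distribution, failure_counts              # step 6 is the loop itself


distribution, failure_counts = run_curriculum([(5, 2), (6, 2)])
print("failures per cycle:", failure_counts)
print("tasks covered:", len(set(distribution)))
```

In the toy version the failure count shrinks as coverage grows; in a real system each cycle also involves retraining cost and a fresh round of human play.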

The key to zero-shot transfer is the skill descriptor framework. Rather than attempting perfect physics simulation (impossible), the team explicitly characterizes the limits of each LLC so the HLC knows when not to select a given skill. The insight: instead of eliminating the sim-to-real gap, inform the robot about the gap.

Ball State Estimation

The vision system uses a high-speed stereo camera pair to triangulate the ball's 3D position. A Kalman Filter then estimates velocity and spin from the position sequence. The full state vector includes:

  • Position (x, y, z)
  • Velocity (vx, vy, vz)
  • Spin estimate (ωx, ωy, ωz)

This state is fed continuously to the HLC for tactical decisions and to the selected LLC for arm trajectory planning.
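
To show how a Kalman filter recovers velocity from positions alone, here is a minimal single-axis sketch with a constant-acceleration motion model. It is an assumption-laden illustration rather than the system's actual estimator: the class name, noise settings, and 125 Hz frame interval are hypothetical, and spin estimation, which would need an aerodynamic model, is left out.

```python
import random


class AxisKalmanFilter:
    """Tracks position and velocity along one axis from noisy position measurements."""

    def __init__(self, position, velocity=0.0, accel=0.0, meas_var=1e-4, process_var=1e-2):
        self.p, self.v = position, velocity
        self.accel = accel              # known acceleration, e.g. gravity on the z-axis
        self.meas_var = meas_var
        self.process_var = process_var
        # Covariance entries [[pp, pv], [pv, vv]]; velocity starts highly uncertain.
        self.pp, self.pv, self.vv = 1.0, 0.0, 100.0

    def predict(self, dt):
        """Propagate state and covariance by dt seconds."""
        self.p += self.v * dt + 0.5 * self.accel * dt * dt
        self.v += self.accel * dt
        # P = F P F^T + Q with F = [[1, dt], [0, 1]]; Q acts on velocity only.
        pp = self.pp + 2.0 * dt * self.pv + dt * dt * self.vv
        pv = self.pv + dt * self.vv
        self.pp, self.pv = pp, pv
        self.vv += self.process_var * dt

    def update(self, measured_position):
        """Correct the state with one position measurement (H = [1, 0])."""
        innovation = measured_position - self.p
        s = self.pp + self.meas_var
        k_p, k_v = self.pp / s, self.pv / s
        self.p += k_p * innovation
        self.v += k_v * innovation
        pp, pv, vv = self.pp, self.pv, self.vv
        self.pp = (1.0 - k_p) * pp
        self.pv = (1.0 - k_p) * pv
        self.vv = vv - k_v * pv


FRAME_DT = 1.0 / 125.0  # hypothetical camera frame interval
GRAVITY = -9.81

rng = random.Random(0)
z0, vz0 = 0.30, 2.0     # made-up initial height (m) and vertical speed (m/s)
kf = AxisKalmanFilter(position=z0, accel=GRAVITY, meas_var=0.005 ** 2)
for i in range(1, 41):
    t = i * FRAME_DT
    true_z = z0 + vz0 * t + 0.5 * GRAVITY * t * t
    kf.predict(FRAME_DT)
    kf.update(true_z + rng.gauss(0.0, 0.005))  # 5 mm measurement noise
true_vz = vz0 + GRAVITY * 40 * FRAME_DT
print(f"estimated vz = {kf.v:.2f} m/s, true vz = {true_vz:.2f} m/s")
```

Running one such filter per axis gives a position and velocity estimate from triangulated positions only; the filter never observes velocity directly.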

Results: 100% Win Rate vs. Beginners, 55% vs. Amateurs

To evaluate the system, DeepMind organized 29 matches against human players at three skill levels:

Opponent level   Matches   Robot outcome
Beginner         ~10       100% win
Amateur          ~14       ~55% win
Advanced         ~5        0% win
Total            29        13 wins / 29

Key observations:

  • The robot developed a recognizable playing style — favoring certain court angles and playing more conservatively when leading.
  • The robot lacks a powerful forehand topspin comparable to advanced players, which is its primary weakness.
  • Amateur opponents described it as playing like "a consistent beginner" — reliable but without creative shot-making.

Official paper GitHub: google-deepmind/competitive_robot_table_tennis — includes code, model weights, and video demos.

AIMY: Open-Source Ball Launcher for Research

You don't need an ABB IRB 1100 to start researching table tennis robotics. The Max Planck Institute for Intelligent Systems developed AIMY — an open-source table tennis ball launcher.

Paper: AIMY: An Open-source Table Tennis Ball Launcher for Versatile and High-fidelity Trajectory Generation — Presented at ICRA 2023.

AIMY specifications:

  • 3 independently controllable wheels → precise spin and speed control
  • Maximum ball speed: 15.4 m/s
  • Maximum spin rate: 192 rev/s (comparable to advanced human players)
  • Interface: Ethernet or Wi-Fi
  • Fully open-source hardware and software

GitHub: intelligent-soft-robots/aimy_target_shooting

AIMY enables reproducible data collection for training and benchmarking without requiring a human sparring partner — essential for systematic robot learning experiments.

Future Directions: Humanoid Table Tennis

If a robot arm reaches amateur level, what can a humanoid robot achieve?

The paper "Towards Versatile Humanoid Table Tennis" (arXiv:2509.21690, 2025) extends the problem to a humanoid form with Prediction Augmentation — a technique for improving trajectory prediction even when the ball is partially occluded.

Another direction: SpikePingpong (arXiv:2506.06690) replaces standard cameras with event cameras (neuromorphic vision). Event cameras record pixel-level changes with microsecond temporal resolution rather than capturing discrete frames. The result: faster reaction times and no motion blur when tracking high-speed balls. SpikePingpong achieves 92% accuracy in a 30cm target zone and 70% in the more demanding 20cm zone.

The trend is clear: robotic table tennis is evolving from a proof of concept into a serious research platform.

Engineering Lessons You Can Transfer

Even if you're not building a ping-pong robot, this paper offers broadly applicable lessons:

1. Prefer hierarchical policies for complex tasks: Specialized LLCs coordinated by an HLC typically outperform a single policy that tries to learn everything.

2. Skill descriptors help bridge the sim-to-real gap: Rather than perfecting your simulator, explicitly characterize what each skill can and cannot do on the real system, and communicate those limits to the high-level policy.

3. Online adaptation matters as much as offline training: The HLC's real-time H-value updates during a match let the robot adapt to each opponent, which offline training alone cannot do because the opponent is not known in advance.

4. Build automatic task curricula from failures: The loop of collecting failure cases, adding them to the training distribution, and retraining is a pattern applicable to virtually any robotics learning task.

For deeper context on the RL algorithms powering this work, see RL Basics for Robotics. For how hierarchical architectures extend to full-body humanoid control, see Humanoid Loco-Manipulation.


Technical Summary

Component     Details
Hardware      ABB IRB 1100 (6 DoF) + Festo gantry (2 DoF) = 8 DoF
Perception    Stereo camera + Kalman Filter → 3D ball state
Policy        Hierarchical: LLC (techniques) + HLC (strategy)
Training      Blackbox Gradient Sensing (BGS) in simulation
Sim-to-Real   Zero-shot via skill descriptors
Adaptation    Online H-value update from live game stats
Results       13/29 matches, 100% vs beginner, 55% vs amateur
Paper         arXiv:2408.03906, ICRA 2025
GitHub        google-deepmind/competitive_robot_table_tennis

Nguyễn Anh Tuấn

Robotics & AI Engineer. Building VnRobo — sharing knowledge about robot learning, VLA models, and automation.
