Why is Grasping the Hardest Problem in Manipulation?
For humans, picking up a coffee cup is unconscious -- eyes see, hand reaches, fingers close. For robots, the same act demands a tight interplay of perception, planning, and control.
A robot arm must answer three questions before grasping: (1) where on the object should it make contact? (2) which gripper orientation is stable? (3) how much force holds the object without crushing it? Each is a research problem in its own right, and solution methods have evolved from analytical (geometry and physics calculations) to learning-based (learning from data) over the past two decades.
This is Part 1 of the Robot Manipulation Masterclass series -- I'll go from classical theory to the latest deep learning models, with comparison tables to help you choose the right approach for your project.
Analytical Grasping: Physical Foundations
Force Closure -- Necessary and Sufficient Condition
The most important concept in analytical grasping is force closure: a grasp has force closure when its contact forces can resist any external wrench (force + moment) applied to the object.
Simple analogy: when you hold a ball between two fingers (thumb and index), friction at the two contact points creates a wrench space large enough to hold the ball steady without slipping, even if you tilt your hand or someone nudges it.
Mathematical condition: a grasp G with n contact points has force closure if and only if the convex hull of its primitive contact wrenches contains the origin of the 6D wrench space (3 force + 3 torque components) in its interior.
Steps to compute:
- Define the contact model: frictionless point contact, point contact with friction (PCWF, "hard finger"), or soft finger
- Compute friction cones: at each contact, the friction coefficient mu defines a cone of allowable force directions around the surface normal
- Map to wrench space: each contact force f applied at position p becomes a wrench [f; p × f]
- Check force closure: does the convex hull of all primitive wrenches contain the origin?
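The four steps above can be sketched in a few lines of NumPy and SciPy. Everything here is illustrative, not from any particular library: the contact positions and inward-pointing normals would come from your object model, and the 8-edge cone discretization is a common but arbitrary choice.

```python
import numpy as np
from scipy.spatial import ConvexHull, QhullError

def friction_cone_edges(normal, mu, num_edges=8):
    """Discretize the friction cone at one contact into unit force directions."""
    n = normal / np.linalg.norm(normal)
    # Build a tangent basis orthogonal to the contact normal
    t1 = np.cross(n, [1.0, 0.0, 0.0])
    if np.linalg.norm(t1) < 1e-8:            # normal parallel to the x-axis
        t1 = np.cross(n, [0.0, 1.0, 0.0])
    t1 /= np.linalg.norm(t1)
    t2 = np.cross(n, t1)
    angles = np.linspace(0.0, 2.0 * np.pi, num_edges, endpoint=False)
    edges = n + mu * (np.cos(angles)[:, None] * t1 + np.sin(angles)[:, None] * t2)
    return edges / np.linalg.norm(edges, axis=1, keepdims=True)

def has_force_closure(positions, normals, mu=0.5):
    """Force closure test: does the convex hull of the primitive
    wrenches contain the wrench-space origin in its interior?"""
    wrenches = []
    for p, n in zip(positions, normals):
        for f in friction_cone_edges(np.asarray(n, float), mu):
            wrenches.append(np.hstack([f, np.cross(p, f)]))  # wrench = [force; torque]
    try:
        hull = ConvexHull(np.array(wrenches))
    except QhullError:
        return False  # wrenches don't span 6D, so the origin can't be interior
    # Qhull facets satisfy normal . x + offset <= 0 for interior points,
    # so the origin is strictly inside iff every offset is negative.
    return bool(np.all(hull.equations[:, -1] < -1e-12))
```

For example, six frictional contacts along the coordinate axes of a unit sphere pass the test, while two antipodal hard-finger contacts fail it (they cannot resist torque about the axis through both contacts).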
GraspIt! -- Classical Grasp Simulator
GraspIt! is a classical grasp-planning simulator developed at Columbia University since 2004 and still widely used for analytical grasping research.
GraspIt! allows:
- Load robot hand models (Barrett, Shadow, Allegro) and object meshes
- Auto-search grasp poses via eigengrasp planner or simulated annealing
- Calculate grasp quality metrics: the epsilon metric (the smallest disturbance wrench that breaks the grasp, i.e. its worst-case resistance) and the volume metric (total volume of the grasp wrench space)
```shell
# Install GraspIt! (Ubuntu)
sudo apt-get install libqt5-dev libsoqt520-dev libcoin-dev
git clone https://github.com/graspit-simulator/graspit.git
cd graspit && mkdir build && cd build
cmake .. && make -j$(nproc)
```
Limitations of the analytical approach: it needs accurate 3D object models and precise friction coefficients, and it doesn't scale to unknown objects. These weaknesses are what motivated learning-based methods.
Learning-Based Grasping: Learning from Data
PointNetGPD -- Direct Point Cloud
PointNetGPD (Liang et al., 2018) is one of the first works to use PointNet to evaluate grasp quality directly from point clouds.
Core idea: instead of complex geometric analysis, let a neural network learn from roughly 350K labeled grasps on the YCB object set. The input is the point cloud inside the gripper closing volume; the output is a grasp quality score.
```python
# PointNetGPD inference pipeline (simplified)
import torch
from pointnet_gpd import PointNetGPD

model = PointNetGPD(num_classes=2)
model.load_state_dict(torch.load("pointnetgpd_weights.pth"))
model.eval()

# 1. Sample antipodal grasp candidates from the point cloud
grasp_candidates = sample_antipodal_grasps(point_cloud, num_samples=200)

with torch.no_grad():
    for grasp in grasp_candidates:
        # 2. Crop the points inside the gripper closing volume
        local_points = crop_points_in_gripper(point_cloud, grasp)
        # 3. PointNet classifies the crop; use the positive-class
        #    probability as the grasp quality score
        logits = model(local_points)
        grasp.quality = torch.softmax(logits, dim=-1)[..., 1].item()

# 4. Select the grasp with the highest score
best_grasp = max(grasp_candidates, key=lambda g: g.quality)
```
Advantages: real-time, no object model needed, generalizes well to unseen objects.
Disadvantages: supports only parallel-jaw grippers and scores each candidate in isolation, without considering scene context (occlusion, clutter).
Contact-GraspNet -- 6-DoF in Clutter
Contact-GraspNet (Sundermeyer et al., 2021) solves PointNetGPD's limitations by generating 6-DoF grasps directly from scene point cloud, while accounting for clutter.
Key breakthroughs:
- Contact-based representation: each point in the point cloud is a potential grasp contact -- only need to predict 4-DoF (approach direction + grasp width) instead of full 6-DoF, which reduces learning complexity
- Trained on 17 million simulated grasps, generalizes well to real sensor data
- Achieves >90% success rate on unseen objects in structured clutter, roughly halving the failure rate of the prior state of the art
```python
# Contact-GraspNet inference (simplified)
from contact_graspnet import ContactGraspNet

model = ContactGraspNet.load_pretrained()

# Input: single-view depth image -> point cloud
point_cloud = depth_to_pointcloud(depth_image, camera_intrinsics)

# Output: set of 6-DoF grasps with confidence scores
grasps, scores, contact_points = model.predict(
    point_cloud,
    forward_passes=5,  # multiple passes for uncertainty estimation
)

# Filter by confidence, then pick the highest-scoring survivor
mask = scores > 0.5
valid_grasps, valid_scores = grasps[mask], scores[mask]
best_grasp = valid_grasps[valid_scores.argmax()]
```
Method Comparison
| Criterion | Analytical (GraspIt!) | PointNetGPD | Contact-GraspNet |
|---|---|---|---|
| Input | 3D mesh + friction | Point cloud (local) | Point cloud (scene) |
| Output | Grasp + quality metric | Grasp score | 6-DoF grasps + scores |
| Unknown objects | No (needs mesh) | Yes | Yes |
| Clutter handling | No | Limited | Good |
| Speed | Slow (optimization) | Real-time | ~0.5s/scene |
| Gripper type | Multi-finger | Parallel-jaw | Parallel-jaw |
| Training data | Not needed | 350K grasps | 17M grasps |
| Success rate (real) | ~70-80% (known objects) | ~85% | >90% |
| Best use case | Research, multi-finger | Quick prototype | Production clutter |
Grasp Quality Metrics
Regardless of method, you need to measure grasp quality. Here are the most common metrics:
Epsilon Metric (Force Closure Quality)
The epsilon metric is the radius of the largest ball, centered at the wrench-space origin, that fits inside the convex hull of the primitive wrenches. Epsilon > 0 means force closure; a larger epsilon means the grasp can resist larger disturbance wrenches.
```python
# Compute epsilon metric
import numpy as np
from scipy.spatial import ConvexHull

def epsilon_metric(wrenches):
    """
    wrenches: (N, 6) array of primitive wrenches
    Returns: epsilon value (> 0 means force closure)
    """
    hull = ConvexHull(wrenches)
    # Qhull facet equations satisfy normal . x + offset <= 0 for points
    # inside the hull, with unit-length normals, so -offset is the signed
    # distance from the origin to each facet plane.
    signed_distances = -hull.equations[:, -1]
    # If any distance is negative, the origin lies outside the hull and
    # the grasp has no force closure; otherwise epsilon is the distance
    # to the nearest facet (radius of the largest origin-centered ball).
    return signed_distances.min()
```
Grasp Success Rate (Empirical)
The most practical metric: run N grasp attempts and count successes. A grasp "succeeds" when the robot picks up the object, lifts it 10 cm, and holds it for 3 seconds without dropping it.
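Raw counts hide statistical uncertainty: 9/10 and 90/100 give the same rate but very different evidence. A small sketch that reports the rate with a Wilson score 95% confidence interval (the trial counts below are made up for illustration):

```python
import math

def success_rate_ci(successes, trials, z=1.96):
    """Empirical success rate with a Wilson score confidence interval
    (z = 1.96 corresponds to a 95% interval)."""
    p = successes / trials
    denom = 1.0 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return p, center - half, center + half

rate, lo, hi = success_rate_ci(successes=43, trials=50)
print(f"success rate {rate:.0%}, 95% CI [{lo:.0%}, {hi:.0%}]")
```

Reporting the interval alongside the rate makes comparisons between grasp planners with different trial counts honest.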
Diversity and Coverage
Beyond quality, diversity also matters: a good grasp planner generates many candidates from different approach directions, giving the robot fallback options when the preferred grasp is blocked by obstacles.
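One cheap way to quantify that diversity (a rough heuristic of my own, not a standard metric) is to bin the candidates' approach directions into the eight sign octants and count how many are occupied:

```python
import numpy as np

def approach_coverage(approach_dirs):
    """Fraction of the 8 direction octants covered by at least one grasp.
    approach_dirs: (N, 3) array-like of approach vectors."""
    dirs = np.asarray(approach_dirs, dtype=float)
    # Bin each direction by the signs of its components (zero counts as +)
    octants = {tuple((d >= 0).astype(int)) for d in dirs}
    return len(octants) / 8.0
```

A planner whose 200 candidates all approach from straight above scores 1/8; one that also proposes side and angled grasps scores higher and leaves more fallback options.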
Analytical vs. Learning: When to Use What?
Choose Analytical when:
- Working with known objects that have accurate 3D models (e.g., assembly lines with fixed parts)
- Need interpretability -- explain why a grasp is good/bad (important for safety-critical applications)
- Using multi-finger hands (Shadow, Allegro) -- learning methods for multi-finger are still immature
- Need grasp quality guarantees (provable force closure)
Choose Learning-Based when:
- Need to generalize to unknown objects (warehouse, home environment)
- Environment has clutter (multiple overlapping objects)
- Only have partial observations (single-view depth camera)
- Need real-time performance (<1s per grasp)
- Using parallel-jaw gripper (most common in industry)
Hybrid Approach
The 2025-2026 trend is to combine both: use a learning model to quickly generate grasp candidates, then use analytical metrics to verify and rank them -- an approach groups like Google DeepMind and UC Berkeley have been exploring in their latest systems.
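The hybrid pattern itself is a few lines of glue code. In this sketch, `learned_score` and `analytic_quality` are placeholder callables (e.g. a network's confidence and an epsilon-metric evaluation); only candidates that pass the analytical check survive, ranked by the learned score:

```python
def hybrid_rank(candidates, learned_score, analytic_quality, min_quality=0.0):
    """Filter learned grasp candidates with an analytical quality check,
    then rank the survivors by the network's confidence score."""
    verified = [g for g in candidates if analytic_quality(g) > min_quality]
    return sorted(verified, key=learned_score, reverse=True)

# Toy usage: each candidate is a (confidence, epsilon) pair
cands = [(0.9, -0.1), (0.7, 0.05), (0.8, 0.02)]
ranked = hybrid_rank(cands,
                     learned_score=lambda g: g[0],
                     analytic_quality=lambda g: g[1])
# The highest-confidence grasp is dropped because it fails force closure
```

This keeps the learning model's speed and generalization while regaining some of the analytical method's interpretability.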
Hands-on: Running Contact-GraspNet
If you want to try it right away, here's the fastest setup:
```shell
# Clone repo
git clone https://github.com/NVlabs/contact_graspnet.git
cd contact_graspnet

# Install dependencies (Python 3.8+, CUDA 11.x)
pip install -r requirements.txt

# Download pre-trained weights
bash download_weights.sh

# Run inference on sample depth image
python contact_graspnet/inference.py \
    --np_path=test_data/scene_0.npy \
    --forward_passes=5 \
    --z_range=[0.2,1.2]
```
Output is a set of 6-DoF grasps visualized on the point cloud. From there you can integrate with a robot arm via ROS 2 or directly through inverse kinematics.
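Each predicted grasp comes out as a 4x4 homogeneous pose matrix, while most IK solvers and ROS 2 pose messages expect a position plus quaternion. A small conversion sketch using SciPy (assuming the matrix contains a well-formed rotation):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def matrix_to_pos_quat(T):
    """Split a 4x4 grasp pose into (xyz position, xyzw quaternion)."""
    T = np.asarray(T, dtype=float)
    position = T[:3, 3]
    quat = Rotation.from_matrix(T[:3, :3]).as_quat()  # [x, y, z, w]
    return position, quat
```

From there, feed the position and quaternion to your IK solver or publish them as a `geometry_msgs/PoseStamped`.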
Resources
- GraspIt! docs: https://graspit-simulator.github.io/
- Contact-GraspNet paper: arXiv:2103.14127
- PointNetGPD paper: arXiv:1809.06267
- Grasp quality metrics: Ferrari & Canny, "Planning Optimal Grasps" (1992) -- the foundational paper on the epsilon and volume force-closure metrics
Next in Series
This is Part 1 of Robot Manipulation Masterclass. Coming up:
- Part 2: Imitation Learning for Manipulation: BC, DAgger, ACT -- Teaching robots manipulation from demonstrations
- Part 3: Diffusion Policy in Practice: From Theory to Code -- State-of-the-art policy learning
Related Posts
- Imitation Learning for Manipulation: BC, DAgger, ACT -- Part 2 of this series
- Tactile Sensing for Manipulation -- How tactile sensors improve grasping precision
- Foundation Models for Robots: RT-2, Octo, OpenVLA -- VLA models that can grasp zero-shot
- Inverse Kinematics for 6-DOF Robots -- IK needed to execute grasp poses