VnRobo
AboutPricingBlogContact
🇻🇳VISign InStart Free Trial
🇻🇳VI
VnRobo logo

AI infrastructure for next-generation industrial robots.

Product

  • Features
  • Pricing
  • Knowledge Base
  • Services

Company

  • About Us
  • Blog
  • Contact

Legal

  • Privacy Policy
  • Terms of Service

© 2026 VnRobo. All rights reserved.

Made with♥in Vietnam

Robot Grasping 101: Analytical to Learning-Based

From traditional force closure to deep learning grasp detection -- an overview of modern grasping methods and how to choose the right approach for your project.

Nguyen Anh Tuan | February 6, 2026 | 8 min read | Updated: Jun 14, 2026

Why is Grasping the Hardest Problem in Manipulation?

For humans, picking up a coffee cup is unconscious -- eyes see, hand reaches, fingers close. For robots, the same action is a tightly coupled combination of perception, planning, and control.

A robot arm must answer 3 questions before grasping: (1) where on the object should it make contact? (2) which gripper orientation is stable? (3) how much force is enough without breaking the object? Each question is a separate problem, and solution methods have evolved from analytical (geometry + physics calculations) to learning-based (learning from data) over the past two decades.

This is Part 1 of the Robot Manipulation Masterclass series -- I'll go from classical theory to the latest deep learning models, with comparison tables to help you choose the right approach for your project.

Robot gripper contacting object -- fundamental manipulation problem

Analytical Grasping: Physical Foundations

Force Closure -- Necessary and Sufficient Condition

The most important concept in analytical grasping is force closure: a grasp has force closure when its contact forces can resist any external wrench (force + moment) applied to the object.

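Simple analogy: when holding a ball with 2 fingers (thumb and index), friction at the 2 contact points creates a wrench space large enough to hold the ball steady without slipping, even if you tilt your hand or someone pushes it gently. Soft fingertips matter here: they also resist twisting about the line between the two contacts, which two hard point contacts cannot do.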

Mathematical condition for force closure:

A grasp G with n contact points has force closure if and only if
the convex hull of its primitive wrenches contains the origin in its interior
in 6D wrench space (3 force + 3 torque).

Steps to compute:

  1. Define contact model: frictionless point contact, point contact with friction (PCWF), or soft finger
  2. Compute friction cone: at each contact, build the cone of allowable forces around the contact normal from the friction coefficient (mu); in practice the cone is approximated by a small number of edge forces
  3. Map to wrench space: each contact force f at contact position r (measured from the object's reference point) → wrench = [f; r × f]
  4. Check force closure: does the convex hull of all wrenches contain the origin strictly inside?
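
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not GraspIt!'s implementation: the function names, the 8-edge cone discretization, and the test grasps are all assumptions made for the example. Instead of building the hull explicitly, it uses an equivalent test: the origin lies strictly inside the convex hull exactly when the wrenches span all 6 dimensions and some strictly positive combination of them sums to zero, which is a small linear program.

```python
import numpy as np
from scipy.optimize import linprog


def friction_cone_edges(normal, mu, num_edges=8):
    """Approximate the friction cone around `normal` by `num_edges` edge forces."""
    n = normal / np.linalg.norm(normal)
    # Pick any vector not parallel to n to build a tangent basis (t1, t2).
    helper = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    t1 = np.cross(n, helper)
    t1 /= np.linalg.norm(t1)
    t2 = np.cross(n, t1)
    angles = np.linspace(0.0, 2.0 * np.pi, num_edges, endpoint=False)
    return np.array([n + mu * (np.cos(a) * t1 + np.sin(a) * t2) for a in angles])


def primitive_wrenches(positions, normals, mu, num_edges=8):
    """Map every cone edge force f at position r to the wrench [f; r x f]."""
    wrenches = []
    for r, n in zip(positions, normals):
        for f in friction_cone_edges(n, mu, num_edges):
            wrenches.append(np.concatenate([f, np.cross(r, f)]))
    return np.array(wrenches)


def has_force_closure(wrenches):
    """True if the origin is strictly inside the convex hull of the wrenches."""
    W = np.asarray(wrenches, dtype=float)
    if np.linalg.matrix_rank(W) < 6:
        return False  # wrenches do not span the 6D wrench space
    # Feasibility LP: find weights >= 1 whose weighted wrench sum is zero.
    res = linprog(c=np.zeros(len(W)), A_eq=W.T, b_eq=np.zeros(6),
                  bounds=(1, None), method="highs")
    return res.status == 0


# Four contacts at the vertices of a tetrahedron on a unit sphere (inward normals).
tetra = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)
print(has_force_closure(primitive_wrenches(tetra, -tetra, mu=0.5)))

# Two hard point contacts on opposite sides: no torque about the line joining them.
pinch = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
print(has_force_closure(primitive_wrenches(pinch, -pinch, mu=0.5)))
```

The second case is the reason the two-finger analogy needs soft fingertips: with hard point contacts, no combination of contact forces produces a torque about the axis through both contacts.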

GraspIt! -- Classical Grasp Simulator

GraspIt! is a classical grasp planning simulator developed at Columbia University since 2004, still widely used for analytical grasping research.

GraspIt! allows:

  • Load robot hand models (Barrett, Shadow, Allegro) and object meshes
  • Auto-search grasp poses via eigengrasp planner or simulated annealing
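  • Calculate grasp quality metrics: epsilon metric (magnitude of the smallest external wrench that can break the grasp, i.e. its worst-case resistance), volume metric (volume of the grasp wrench space)

# Install GraspIt! (Ubuntu)
# Dependency package names differ between Ubuntu releases; check the GraspIt! docs if apt cannot find one
sudo apt-get install libqt5-dev libsoqt520-dev libcoin-dev
git clone https://github.com/graspit-simulator/graspit.git
cd graspit && mkdir build && cd build
cmake .. && make -j$(nproc)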

Limitation of the analytical approach: it needs accurate 3D object models and precise friction coefficients, and it doesn't scale well to unknown objects. This is what motivated learning-based methods.

Learning-Based Grasping: Learning from Data

PointNetGPD -- Direct Point Cloud

PointNetGPD (Liang et al., 2018) is one of the first papers to use PointNet to evaluate grasp quality directly from a point cloud.

Core idea: instead of complex geometric analysis, let a neural network learn from 350K grasp samples on the YCB object set. The input is the point cloud inside the gripper volume; the output is a grasp quality score.

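# PointNetGPD inference pipeline (simplified; the module and helper names are
# illustrative placeholders, not the actual API of the PointNetGPD repository)
import torch
from pointnet_gpd import PointNetGPD

model = PointNetGPD(num_classes=2)
model.load_state_dict(torch.load("pointnetgpd_weights.pth", map_location="cpu"))
model.eval()  # inference mode: freezes dropout and batch-norm statistics

# 1. Sample grasp candidates from point cloud
grasp_candidates = sample_antipodal_grasps(point_cloud, num_samples=200)

# 2. For each candidate, crop point cloud inside gripper volume
with torch.no_grad():  # no gradients needed at inference time
    for grasp in grasp_candidates:
        local_points = crop_points_in_gripper(point_cloud, grasp)
        # 3. PointNet predicts class logits; use P(good grasp) as the quality score
        logits = model(local_points)
        grasp.quality = torch.softmax(logits, dim=-1)[..., 1].item()

# 4. Select grasp with highest score
best_grasp = max(grasp_candidates, key=lambda g: g.quality)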

Advantages: real-time, no object model needed, generalizes well to unseen objects.

Disadvantages: only supports parallel-jaw grippers and doesn't consider scene context (occlusion, clutter).
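
To make the "point cloud inside the gripper volume" input more concrete, here is a rough sketch of the cropping step. The function name, the box dimensions, and the axis convention (x = approach, y = closing direction, z = finger thickness) are assumptions for illustration and are not taken from the PointNetGPD code.

```python
import numpy as np


def crop_points_in_gripper(points, rotation, center,
                           depth=0.06, width=0.08, height=0.02):
    """Return the points inside the gripper's closing box, in the gripper frame.

    points:   (N, 3) scene points in the world/camera frame
    rotation: (3, 3) gripper orientation, columns are the gripper axes
    center:   (3,) centre of the closing region
    """
    # Express points in the gripper frame: R^T (p - c), written for row vectors.
    local = (points - center) @ rotation
    half_extents = np.array([depth, width, height]) / 2.0
    inside = np.all(np.abs(local) <= half_extents, axis=1)
    return local[inside]


cloud = np.array([
    [0.0, 0.00, 0.0],   # at the grasp centre
    [0.0, 0.03, 0.0],   # between the fingers
    [0.0, 0.05, 0.0],   # beyond the finger opening
    [0.5, 0.00, 0.0],   # far away from the gripper
])
local_points = crop_points_in_gripper(cloud, np.eye(3), np.zeros(3))
print(len(local_points))
```

Feeding the network points in the gripper's own frame means the same local geometry looks the same regardless of where the grasp sits in the scene.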

Contact-GraspNet -- 6-DoF in Clutter

Contact-GraspNet (Sundermeyer et al., 2021) addresses PointNetGPD's limitations by generating 6-DoF grasps directly from a scene point cloud while accounting for clutter.

Deep learning for robot grasping -- from point cloud to grasp pose

Key breakthroughs:

  • Contact-based representation: each point in the point cloud is a potential grasp contact, so its 3D position is already known -- the network only needs to predict 4-DoF (3-DoF grasp rotation + grasp width) instead of a full 6-DoF pose, which reduces learning complexity
  • Trained on 17 million simulated grasps, generalizes well to real sensor data
  • Achieves >90% success rate on unseen objects in structured clutter -- cutting the failure rate in half compared to a recent state-of-the-art method
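
The contact-based representation can be illustrated with a short sketch that expands one observed contact point, two predicted directions, and a width into a full 4x4 grasp pose. The frame convention used here (x = baseline between the fingers, z = approach, approach pointing from the gripper base toward the object) and the `gripper_depth` value are assumptions for the example; check the paper and repository for the exact convention.

```python
import numpy as np


def grasp_pose_from_contact(contact, approach, baseline, width, gripper_depth=0.10):
    """Build a 4x4 gripper pose from a contact point and predicted quantities."""
    a = approach / np.linalg.norm(approach)
    # Make the baseline exactly orthogonal to the approach (Gram-Schmidt step).
    b = baseline - np.dot(baseline, a) * a
    b /= np.linalg.norm(b)
    pose = np.eye(4)
    pose[:3, :3] = np.column_stack([b, np.cross(a, b), a])  # right-handed frame
    # Move half a width along the baseline to the grasp centre, then back
    # along the approach direction to reach the gripper base.
    pose[:3, 3] = contact + 0.5 * width * b - gripper_depth * a
    return pose


pose = grasp_pose_from_contact(
    contact=np.array([0.0, 0.0, 0.5]),
    approach=np.array([0.0, 0.0, 1.0]),
    baseline=np.array([1.0, 0.0, 0.0]),
    width=0.04,
)
print(pose)
```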

# Contact-GraspNet inference (simplified; the class and method names are
# illustrative, not the actual API of the NVlabs repository)
from contact_graspnet import ContactGraspNet

model = ContactGraspNet.load_pretrained()

# Input: single-view depth image -> point cloud
point_cloud = depth_to_pointcloud(depth_image, camera_intrinsics)

# Output: set of 6-DoF grasps with confidence scores
grasps, scores, contact_points = model.predict(
    point_cloud,
    forward_passes=5  # several passes to collect more grasp candidates
)

# Filter and rank (index with the filtered scores, not the full array)
mask = scores > 0.5
valid_grasps, valid_scores = grasps[mask], scores[mask]
best_grasp = valid_grasps[valid_scores.argmax()] if len(valid_grasps) else None
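
The `depth_to_pointcloud` step above is a standard pinhole back-projection. A minimal sketch, assuming a depth image in metres and a 3x3 intrinsics matrix K; the function signature and the `z_range` filter are choices made for this example:

```python
import numpy as np


def depth_to_pointcloud(depth, K, z_range=(0.2, 1.2)):
    """Back-project an (H, W) depth image in metres to an (N, 3) point cloud."""
    h, w = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column / row indices
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    # Drop invalid (zero) depth and points outside the working range.
    valid = (points[:, 2] > z_range[0]) & (points[:, 2] < z_range[1])
    return points[valid]


K = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
depth = np.array([[1.0, 1.0],
                  [0.0, 1.0]])  # one pixel has no depth reading
cloud = depth_to_pointcloud(depth, K)
print(cloud.shape)
```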

Method Comparison

| Criterion | Analytical (GraspIt!) | PointNetGPD | Contact-GraspNet |
|---|---|---|---|
| Input | 3D mesh + friction | Point cloud (local) | Point cloud (scene) |
| Output | Grasp + quality metric | Grasp score | 6-DoF grasps + scores |
| Unknown objects | No (needs mesh) | Yes | Yes |
| Clutter handling | No | Limited | Good |
| Speed | Slow (optimization) | Real-time | ~0.5s/scene |
| Gripper type | Multi-finger | Parallel-jaw | Parallel-jaw |
| Training data | Not needed | 350K grasps | 17M grasps |
| Success rate (real) | ~70-80% (known objects) | ~85% | >90% |
| Best use case | Research, multi-finger | Quick prototype | Production clutter |

Grasp Quality Metrics

Regardless of method, you need to measure grasp quality. Here are the most common metrics:

Epsilon Metric (Force Closure Quality)

The epsilon metric is the radius of the largest ball centered at the origin that fits inside the wrench space convex hull. Epsilon > 0 means force closure; if the origin lies on or outside the hull, there is no force closure. Larger epsilon means the grasp is more robust against external disturbances.

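# Compute epsilon metric
from scipy.spatial import ConvexHull
import numpy as np

def epsilon_metric(wrenches):
    """
    wrenches: (N, 6) array of primitive wrenches (N >= 7, spanning all 6 dimensions)
    Returns: epsilon value (>0 = force closure, <=0 = origin not strictly inside the hull)
    """
    hull = ConvexHull(wrenches)
    # SciPy stores each facet as [unit normal, offset] with
    # normal . x + offset <= 0 for points inside the hull, so -offset is the
    # signed distance from the origin to that facet (negative if the origin
    # lies outside it). Taking abs() here would hide grasps without force closure.
    signed_distances = -hull.equations[:, -1]
    return float(signed_distances.min())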
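
A quick sanity check helps confirm that an epsilon implementation handles the sign correctly. The sketch below uses a synthetic wrench set (the 12 points ±e_i in 6D, not a real grasp) whose inscribed ball radius is known in closed form, 1/sqrt(6), and then shifts the set so the origin falls outside the hull. It relies on SciPy's facet convention, normal . x + offset <= 0 for interior points with unit normals.

```python
import numpy as np
from scipy.spatial import ConvexHull


def epsilon_metric(wrenches):
    """Signed distance from the origin to the nearest hull facet."""
    hull = ConvexHull(wrenches)
    # -offset is the signed origin-to-facet distance (negative = origin outside).
    return float((-hull.equations[:, -1]).min())


# Synthetic test set: +/- unit vectors along each of the 6 wrench axes.
unit_wrenches = np.vstack([np.eye(6), -np.eye(6)])
eps_inside = epsilon_metric(unit_wrenches)

# Shift everything along the first axis so the origin is no longer enclosed.
shifted = unit_wrenches + np.array([2.0, 0, 0, 0, 0, 0])
eps_outside = epsilon_metric(shifted)

print(eps_inside, 1 / np.sqrt(6))
print(eps_outside)
```

An implementation that takes the absolute value of the offset would report a positive epsilon for the shifted set as well, which is why the sign has to be kept.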

Grasp Success Rate (Empirical)

The most practical metric: run N grasp attempts and count successes. A grasp "succeeds" when the robot picks up the object, lifts it 10cm, and holds for 3 seconds without dropping.
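
Because N is usually small in real robot experiments, it helps to report an interval alongside the raw rate. The sketch below uses the Wilson score interval, which is a choice made for this example and not something the methods above prescribe; the function name and the 45/50 trial counts are made up for illustration.

```python
import math


def success_rate_with_interval(successes, attempts, z=1.96):
    """Return (rate, lower, upper) using the Wilson score interval (z=1.96 ~ 95%)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    p = successes / attempts
    denom = 1.0 + z ** 2 / attempts
    center = (p + z ** 2 / (2 * attempts)) / denom
    half = z * math.sqrt(p * (1 - p) / attempts + z ** 2 / (4 * attempts ** 2)) / denom
    return p, center - half, center + half


rate, low, high = success_rate_with_interval(successes=45, attempts=50)
print(f"success rate {rate:.0%}, 95% interval {low:.1%} to {high:.1%}")
```

With 45 successes in 50 attempts the interval spans roughly 79% to 96%, so a "90% success rate" from 50 trials cannot by itself separate two methods that differ by a few points.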

Diversity and Coverage

Beyond quality, diversity also matters: a good grasp planner generates many candidates from different approach directions, giving the robot fallback options when the preferred grasp is blocked by obstacles.
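
There is no single standard diversity measure; one simple proxy is the mean pairwise angle between the approach directions of the candidate grasps. The function below is an illustrative sketch under that assumption, with a hypothetical name.

```python
import numpy as np


def approach_diversity(approach_dirs):
    """Mean pairwise angle (degrees) between grasp approach directions."""
    d = np.asarray(approach_dirs, dtype=float)
    if len(d) < 2:
        return 0.0  # a single candidate offers no fallback direction
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    cosines = np.clip(d @ d.T, -1.0, 1.0)
    upper = np.triu_indices(len(d), k=1)  # each unordered pair once
    return float(np.degrees(np.arccos(cosines[upper])).mean())


top_down_only = [[0, 0, -1], [0, 0, -1], [0, 0, -1]]
mixed = [[0, 0, -1], [1, 0, 0], [0, 1, 0]]
print(approach_diversity(top_down_only), approach_diversity(mixed))
```

A planner that only proposes top-down grasps scores 0 here, which flags that it has no alternative when something blocks access from above.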

Analytical vs. Learning: When to Use What?

Choose Analytical when:

  • Working with known objects that have accurate 3D models (e.g., assembly lines with fixed parts)
  • Need interpretability -- explain why a grasp is good/bad (important for safety-critical applications)
  • Using multi-finger hands (Shadow, Allegro) -- learning methods for multi-finger are still immature
  • Need grasp quality guarantees (provable force closure)

Choose Learning-Based when:

  • Need to generalize to unknown objects (warehouse, home environment)
  • Environment has clutter (multiple overlapping objects)
  • Only have partial observations (single-view depth camera)
  • Need real-time performance (<1s per grasp)
  • Using parallel-jaw gripper (most common in industry)

Hybrid Approach

The 2025-2026 trend is to combine both: use a learning model to quickly generate grasp candidates, then use analytical metrics to verify and rank them. This is the approach being taken by Google DeepMind and UC Berkeley in their latest systems.
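
A minimal sketch of this generate-then-verify pattern is shown below. It is not any particular group's pipeline: the `Candidate` fields, the thresholds, and the choice to rank by epsilon first are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    learned_score: float  # confidence from the learned grasp generator
    epsilon: float        # analytical force-closure quality (> 0 = closure)


def rank_hybrid(candidates, min_score=0.5, min_epsilon=0.0):
    """Keep candidates that pass both checks, then rank by analytical quality."""
    kept = [c for c in candidates
            if c.learned_score >= min_score and c.epsilon > min_epsilon]
    # Sort by epsilon first, using the learned score to break ties.
    return sorted(kept, key=lambda c: (c.epsilon, c.learned_score), reverse=True)


candidates = [
    Candidate("A", learned_score=0.9, epsilon=0.00),  # confident but no force closure
    Candidate("B", learned_score=0.7, epsilon=0.05),
    Candidate("C", learned_score=0.6, epsilon=0.12),
    Candidate("D", learned_score=0.3, epsilon=0.20),  # stable but low confidence
]
ranked = rank_hybrid(candidates)
print([c.name for c in ranked])
```

The learned score filters out grasps the network considers unreachable or unlikely, while the analytical metric removes confident predictions that do not actually achieve force closure.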

Robot arm performing grasping in cluttered environment

Hands-on: Running Contact-GraspNet

If you want to try it right away, here's the fastest setup:

# Clone repo
git clone https://github.com/NVlabs/contact_graspnet.git
cd contact_graspnet

# Create the conda environment shipped with the repo, then activate it
# (the environment name and the supported Python/CUDA versions are listed in the .yml file and README)
conda env create -f contact_graspnet_env.yml

# Download the pre-trained checkpoints linked in the README and place them under checkpoints/

# Run inference on the sample scenes
python contact_graspnet/inference.py \
    --np_path=test_data/*.npy \
    --forward_passes=5 \
    --z_range=[0.2,1.2]

Output is a set of 6-DoF grasps visualized on the point cloud. From there you can integrate with a robot arm via ROS 2 or directly through inverse kinematics.
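
Before a grasp can be sent to the arm, it has to be moved from the camera frame into the robot base frame, and it is common to approach through a pre-grasp pose offset backwards along the approach axis. The sketch below assumes 4x4 homogeneous transforms, a known camera extrinsic calibration `T_base_cam`, and a gripper frame whose z axis is the approach direction; all names and numbers are illustrative.

```python
import numpy as np


def grasp_in_base_frame(T_base_cam, T_cam_grasp):
    """Chain the camera extrinsics with a grasp pose predicted in the camera frame."""
    return T_base_cam @ T_cam_grasp


def pregrasp_pose(T_base_grasp, standoff=0.10):
    """Back off `standoff` metres along the gripper's approach (z) axis."""
    T = T_base_grasp.copy()
    approach = T[:3, 2]
    T[:3, 3] = T[:3, 3] - standoff * approach
    return T


# Camera mounted 1 m above the base, looking straight down (180 deg about x).
T_base_cam = np.eye(4)
T_base_cam[:3, :3] = np.diag([1.0, -1.0, -1.0])
T_base_cam[:3, 3] = [0.5, 0.0, 1.0]

# Grasp predicted 0.6 m in front of the camera, approaching along the view axis.
T_cam_grasp = np.eye(4)
T_cam_grasp[:3, 3] = [0.0, 0.0, 0.6]

T_base_grasp = grasp_in_base_frame(T_base_cam, T_cam_grasp)
T_pre = pregrasp_pose(T_base_grasp)
print(T_base_grasp[:3, 3], T_pre[:3, 3])
```

The two poses can then be passed to an inverse kinematics solver or a motion planner: move to the pre-grasp pose first, then advance along the approach axis to the grasp pose and close the gripper.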

Resources

  • GraspIt! docs: https://graspit-simulator.github.io/
  • Contact-GraspNet paper: arXiv:2103.14127
  • PointNetGPD paper: arXiv:1809.06267
  • Grasp quality survey: Ferrari & Canny, "Planning Optimal Grasps" (1992) -- foundational paper on force closure metrics

Next in Series

This is Part 1 of the Robot Manipulation Masterclass series. Coming up:

  • Part 2: Imitation Learning for Manipulation: BC, DAgger, ACT -- Teaching robots manipulation from demonstrations
  • Part 3: Diffusion Policy in Practice: From Theory to Code -- State-of-the-art policy learning


Related Posts

  • Imitation Learning for Manipulation: BC, DAgger, ACT -- Part 2 of this series
  • Tactile Sensing for Manipulation -- How tactile sensors improve grasping precision
  • Foundation Models for Robots: RT-2, Octo, OpenVLA -- VLA models that can grasp zero-shot
  • Inverse Kinematics for 6-DOF Robots -- IK needed to execute grasp poses
Nguyễn Anh Tuấn

Robotics & AI Engineer. Building VnRobo — sharing knowledge about robot learning, VLA models, and automation.
