After mastering contact-rich manipulation, we enter the world of tool use — where robots do not just interact with objects, but act on the world through objects.

Tool use is one of the traits that distinguish humans from most other animals. We use hammers to drive nails, knives to cut, screwdrivers to turn screws — each tool is an extension of the hand. Teaching robots to do the same is a frontier of manipulation research, because the robot must understand both how to grasp the tool (grasp affordance) and how to use it (functional affordance).
## Tool Use: Why Is It Hard?
| Challenge | Description |
|---|---|
| Dual object | Must manage both tool and target object |
| Affordance | Know to grasp hammer handle, not the head |
| Extended kinematics | Tool changes workspace and dynamics |
| Multi-phase | Grasp tool, position, use, release |
| Force transmission | Force transmitted through tool is complex |
| Tool variety | Each tool has unique dynamics |
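The multi-phase challenge in particular shapes everything downstream: training and evaluation are usually organized around an explicit phase machine. A minimal sketch of one — the `Phase` enum and the transition predicates are illustrative assumptions, not from any specific library:

```python
from enum import Enum, auto


class Phase(Enum):
    REACH_TOOL = auto()
    GRASP = auto()
    POSITION = auto()
    USE = auto()
    RELEASE = auto()


def next_phase(phase, near_tool, grasped, aligned, done):
    """Advance the phase machine; predicates would query simulator state."""
    transitions = {
        Phase.REACH_TOOL: (near_tool, Phase.GRASP),
        Phase.GRASP: (grasped, Phase.POSITION),
        Phase.POSITION: (aligned, Phase.USE),
        Phase.USE: (done, Phase.RELEASE),
    }
    if phase in transitions:
        condition, nxt = transitions[phase]
        if condition:
            return nxt
    return phase  # condition not yet met: stay in the current phase
```

Each phase can then get its own reward terms, which is exactly the split the rest of this post follows.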
## Affordance Learning
Affordance answers the question "which part of the tool should I grasp, and which part do I use?" For example:
- Hammer: Grasp the handle, strike with the head
- Screwdriver: Grasp the handle, rotate the tip into the screw
- Spatula: Grasp the handle, slide the flat surface under food
```python
import numpy as np


class ToolAffordance:
    """Define grasp and functional affordances for different tool types."""

    def __init__(self):
        self.tools = {
            'hammer': {
                'grasp_region': {'center': [0, 0, 0.1], 'radius': 0.03},
                'functional_region': {'center': [0, 0, -0.05], 'radius': 0.02},
                'grasp_orientation': 'perpendicular',
                'use_direction': [0, 0, -1],
                'mass': 0.5,      # kg
                'length': 0.25,   # m
            },
            'screwdriver': {
                'grasp_region': {'center': [0, 0, 0.06], 'radius': 0.02},
                'functional_region': {'center': [0, 0, -0.08], 'radius': 0.005},
                'grasp_orientation': 'parallel',
                'use_direction': [0, 0, -1],
                'use_rotation': True,
                'mass': 0.15,
                'length': 0.18,
            },
            'spatula': {
                'grasp_region': {'center': [0, 0, 0.12], 'radius': 0.015},
                'functional_region': {'center': [0.04, 0, -0.01], 'radius': 0.04},
                'grasp_orientation': 'perpendicular',
                'use_direction': [1, 0, 0],
                'mass': 0.1,
                'length': 0.3,
            },
        }

    def get_grasp_reward(self, tool_name, grasp_pos, tool_frame):
        """Reward for grasping at the correct position.

        tool_frame is the 4x4 homogeneous tool-to-world transform.
        """
        tool = self.tools[tool_name]
        grasp_center = np.array(tool['grasp_region']['center'])
        grasp_world = tool_frame @ np.append(grasp_center, 1.0)
        dist = np.linalg.norm(grasp_pos - grasp_world[:3])
        radius = tool['grasp_region']['radius']
        if dist < radius:
            return 1.0
        # Linear falloff over 5 cm outside the grasp region
        return max(0.0, 1.0 - (dist - radius) / 0.05)

    def get_functional_alignment(self, tool_name, tool_tip_pos,
                                 target_pos, tool_orientation):
        """Score how well the tool is positioned and oriented for use.

        tool_orientation is the 3x3 tool-to-world rotation matrix.
        """
        tool = self.tools[tool_name]
        use_dir = np.array(tool['use_direction'], dtype=float)
        # Distance term: 1 at the target, decaying smoothly with distance
        dist = np.linalg.norm(tool_tip_pos - target_pos)
        dist_score = 1.0 - np.tanh(5.0 * dist)
        # Orientation term: the use direction, expressed in the world
        # frame, should point toward the target
        world_use_dir = tool_orientation @ use_dir
        to_target = target_pos - tool_tip_pos
        norm = np.linalg.norm(to_target)
        if norm < 1e-6:
            return dist_score
        cos_angle = np.dot(world_use_dir, to_target) / norm
        orient_score = 0.5 * (1.0 + cos_angle)
        return dist_score * orient_score
```
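The grasp reward is 1 inside the grasp region and decays linearly to zero over 5 cm outside it. A standalone sketch of that shaping curve, with the screwdriver's 2 cm radius:

```python
def grasp_shaping(dist, radius=0.02, falloff=0.05):
    """1.0 inside the grasp region, linear decay to 0 over `falloff` metres."""
    if dist < radius:
        return 1.0
    return max(0.0, 1.0 - (dist - radius) / falloff)


# Inside the region: full reward; 5 cm past the boundary: zero.
print(grasp_shaping(0.01))              # 1.0
print(grasp_shaping(0.08))              # 0.0
print(round(grasp_shaping(0.045), 2))   # 0.5
```

The linear tail matters: it gives the policy a gradient to follow from anywhere within a few centimetres, while the flat interior avoids penalizing small variations in an already-good grasp.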
## Two-Phase Learning: Grasp Tool, Then Use Tool
An effective strategy for tool use is to split the task into two phases and train each separately:
### Phase 1: Tool Grasping Policy
```python
class ToolGraspEnv:
    """Environment for learning to grasp tools correctly."""

    def __init__(self, tool_name="screwdriver"):
        self.tool_name = tool_name
        self.affordance = ToolAffordance()
        self.tool_frame = np.eye(4)    # tool-to-world transform, updated from sim state
        self.tool_init_height = 0.0    # recorded at reset

    def _is_grasped(self):
        """True when both fingers contact the tool (simulator-specific check)."""
        ...

    def compute_reward(self, gripper_pos, tool_qpos, tool_contacts):
        """Reward for grasping the tool at the correct position and orientation."""
        # 1. Reach the grasp region
        grasp_reward = self.affordance.get_grasp_reward(
            self.tool_name, gripper_pos, self.tool_frame
        )
        # 2. Stable grasp: lift without dropping
        if self._is_grasped():
            lift_height = tool_qpos[2] - self.tool_init_height
            stable_reward = min(lift_height / 0.1, 1.0)
        else:
            stable_reward = 0.0
        return grasp_reward + 2.0 * stable_reward
```
### Phase 2: Tool Use Policy
```python
class ToolUseEnv:
    """Environment for learning to use an already-grasped tool."""

    def __init__(self, tool_name="screwdriver"):
        self.tool_name = tool_name
        # The tool is welded to the gripper, so grasping is skipped

    def compute_reward(self, tool_tip_pos, target_pos,
                       force_on_target, task_progress):
        """Reward for using the tool."""
        # 1. Alignment: bring the tool tip to the target
        alignment = 1.0 - np.tanh(10.0 * np.linalg.norm(
            tool_tip_pos - target_pos
        ))
        # 2. Force application: only rewarded once the tip is on target
        if np.linalg.norm(tool_tip_pos - target_pos) < 0.01:
            force_reward = np.tanh(force_on_target / 5.0)
        else:
            force_reward = 0.0
        # 3. Task progress (0 to 1)
        progress_reward = 10.0 * task_progress
        # 4. Completion bonus
        completion = 50.0 if task_progress >= 0.95 else 0.0
        return alignment + 3.0 * force_reward + progress_reward + completion
```
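At deployment the two policies run in sequence. A hypothetical sketch of the hand-off — the gym-style `env.step` 4-tuple, the `grasp_policy`/`use_policy` callables, and the `is_grasp_stable` check are all assumptions for illustration:

```python
def run_tool_task(env, grasp_policy, use_policy, is_grasp_stable,
                  max_grasp_steps=200, max_use_steps=500):
    """Run the grasp policy until the tool is held, then switch to the use policy."""
    obs = env.reset()
    for _ in range(max_grasp_steps):
        obs, _, _, _ = env.step(grasp_policy(obs))
        if is_grasp_stable(env):
            break            # tool is held: hand off to the use policy
    else:
        return False         # never achieved a stable grasp
    for _ in range(max_use_steps):
        obs, _, done, _ = env.step(use_policy(obs))
        if done:
            return True      # task completed
    return False
```

The stability check before the hand-off is the important design choice: switching on a fixed timestep instead would feed the use policy states it never saw in training.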
## Screwdriver Task in MuJoCo
Here is a complete example — a robot uses a screwdriver to turn a screw:
```python
import mujoco
import numpy as np

SCREWDRIVER_XML = """
<mujoco model="screwdriver_task">
  <option timestep="0.002" gravity="0 0 -9.81"/>
  <worldbody>
    <light pos="0 0 2"/>
    <geom type="plane" size="1 1 0.1" rgba="0.9 0.9 0.9 1"/>
    <!-- Workbench -->
    <body name="bench" pos="0.5 0 0.4">
      <geom type="box" size="0.3 0.3 0.02" rgba="0.5 0.3 0.1 1" mass="100"/>
    </body>
    <!-- Screw: range 0 to 31.4 rad, i.e. five full turns -->
    <body name="screw" pos="0.5 0 0.42">
      <joint name="screw_rot" type="hinge" axis="0 0 1"
             range="0 31.4" damping="0.5"/>
      <geom name="screw_head" type="cylinder" size="0.008 0.003"
            rgba="0.7 0.7 0.7 1" contype="1" conaffinity="1"/>
      <geom name="screw_slot" type="box" size="0.001 0.006 0.001"
            pos="0 0 0.003" rgba="0.5 0.5 0.5 1"/>
      <site name="screw_top" pos="0 0 0.003" size="0.002"/>
    </body>
    <!-- Robot arm with attached screwdriver -->
    <body name="arm_base" pos="0 0 0.42">
      <joint name="j0" type="hinge" axis="0 0 1" range="-3.14 3.14" damping="2"/>
      <geom type="cylinder" size="0.04 0.03" rgba="0.7 0.7 0.7 1"/>
      <body name="l1" pos="0 0 0.06">
        <joint name="j1" type="hinge" axis="0 1 0" range="-1.5 1.5" damping="1.5"/>
        <geom type="capsule" fromto="0 0 0 0.25 0 0" size="0.03" rgba="0.7 0.7 0.7 1"/>
        <body name="l2" pos="0.25 0 0">
          <joint name="j2" type="hinge" axis="0 1 0" range="-2 2" damping="1"/>
          <geom type="capsule" fromto="0 0 0 0.2 0 0" size="0.025" rgba="0.7 0.7 0.7 1"/>
          <body name="wrist" pos="0.2 0 0">
            <joint name="j3" type="hinge" axis="0 0 1" range="-100 100" damping="0.3"/>
            <joint name="j4" type="hinge" axis="1 0 0" range="-1.57 1.57" damping="0.5"/>
            <site name="ee" pos="0 0 0"/>
            <!-- Screwdriver (rigidly attached) -->
            <body name="screwdriver" pos="0 0 -0.01">
              <geom name="sd_handle" type="cylinder" size="0.012 0.05"
                    rgba="1 0.8 0.1 1" mass="0.08"/>
              <geom name="sd_shaft" type="cylinder" size="0.003 0.06"
                    pos="0 0 -0.11" rgba="0.7 0.7 0.7 1" mass="0.02"/>
              <geom name="sd_tip" type="box" size="0.001 0.005 0.003"
                    pos="0 0 -0.173" rgba="0.6 0.6 0.6 1" mass="0.005"
                    contype="1" conaffinity="1" friction="1 0.1 0.01"/>
              <site name="tip" pos="0 0 -0.176" size="0.002"/>
            </body>
          </body>
        </body>
      </body>
    </body>
  </worldbody>
  <actuator>
    <position name="a0" joint="j0" kp="200"/>
    <position name="a1" joint="j1" kp="200"/>
    <position name="a2" joint="j2" kp="150"/>
    <velocity name="a3" joint="j3" kv="20"/>
    <position name="a4" joint="j4" kp="80"/>
  </actuator>
  <sensor>
    <force name="tip_force" site="tip"/>
    <jointpos name="screw_pos" joint="screw_rot"/>
  </sensor>
</mujoco>
"""


class ScrewdriverEnv:
    """Screwdriver-driving environment."""

    def __init__(self):
        self.model = mujoco.MjModel.from_xml_string(SCREWDRIVER_XML)
        self.data = mujoco.MjData(self.model)
        self.max_steps = 500
        self.target_turns = 5
        self.target_angle = self.target_turns * 2 * np.pi

    def _get_obs(self):
        # qpos[0] is the screw joint; qpos[1:6] are the five arm joints
        joint_pos = self.data.qpos[1:6]
        joint_vel = self.data.qvel[1:6]
        tip_pos = self.data.site('tip').xpos
        screw_pos = self.data.site('screw_top').xpos
        screw_angle = self.data.joint('screw_rot').qpos[0]
        rel = tip_pos - screw_pos
        tip_force = self.data.sensor('tip_force').data.copy()
        return np.concatenate([
            joint_pos, joint_vel,
            tip_pos, screw_pos, rel,
            [screw_angle / self.target_angle],
            tip_force,
        ])

    def compute_reward(self):
        tip_pos = self.data.site('tip').xpos
        screw_pos = self.data.site('screw_top').xpos
        # 1. Align the tip over the screw head
        lateral_dist = np.linalg.norm(tip_pos[:2] - screw_pos[:2])
        height_dist = abs(tip_pos[2] - screw_pos[2])
        align_reward = 2.0 * (1.0 - np.tanh(20.0 * lateral_dist))
        # 2. Bonus for seating the tip in the slot (5 mm tolerance)
        contact_reward = 2.0 if (lateral_dist < 0.005 and height_dist < 0.005) else 0.0
        # 3. Rotation progress toward the target angle
        screw_angle = self.data.joint('screw_rot').qpos[0]
        progress = screw_angle / self.target_angle
        rotation_reward = 10.0 * progress
        complete_reward = 100.0 if progress >= 0.95 else 0.0
        # 4. Penalize excessive tip force
        force = np.linalg.norm(self.data.sensor('tip_force').data)
        force_penalty = -0.1 * max(0, force - 10.0)
        return (align_reward + contact_reward +
                rotation_reward + complete_reward + force_penalty)
```
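Before training a policy on this environment, a scripted baseline helps validate the reward. A hypothetical align-then-spin controller — pure geometry with no simulator calls, using the same 5 mm seating tolerance as the reward above; the function name and return convention are assumptions:

```python
import numpy as np


def scripted_action(tip_pos, screw_pos, spin_speed=5.0):
    """Return (xy_correction, spin_velocity) for a simple align-then-spin baseline."""
    lateral = screw_pos[:2] - tip_pos[:2]
    seated = (np.linalg.norm(lateral) < 0.005
              and abs(tip_pos[2] - screw_pos[2]) < 0.005)
    # Only command wrist rotation once the tip is seated in the slot;
    # spinning early just skates the tip across the screw head.
    spin = spin_speed if seated else 0.0
    return lateral, spin
```

If a baseline like this cannot collect reward, the reward function (not the policy) is usually what needs fixing.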
## Tool Transfer: One Tool to Many
One of the most exciting goals is transferring skills between similar tools:
```python
class ToolTransferTraining:
    """Train a policy that transfers between similar tools."""

    def __init__(self, source_tool, target_tools):
        self.source_tool = source_tool
        self.target_tools = target_tools

    def train_with_tool_embedding(self):
        """Use a tool embedding to generalize across the tool family."""
        # Embedding layout: [length, tip_radius, mass, is_flat, is_phillips]
        tool_embeddings = {
            'screwdriver_flat': np.array([0.18, 0.005, 0.15, 1, 0]),
            'screwdriver_phillips': np.array([0.18, 0.005, 0.15, 0, 1]),
            'allen_key': np.array([0.12, 0.003, 0.08, 0, 0]),
            'hex_driver': np.array([0.15, 0.004, 0.12, 0, 0]),
        }
        return tool_embeddings
```
| Source Tool | Target Tool | Success (no transfer) | Success (with transfer) |
|---|---|---|---|
| Flat screwdriver | Phillips screwdriver | 15% | 72% |
| Flat screwdriver | Allen key | 8% | 58% |
| Hammer | Mallet | 22% | 81% |
| Spatula | Paint scraper | 12% | 65% |
Transfer learning reduces training time by up to 5x for tools in the same family.
## References
- DexMV: Imitation Learning for Dexterous Manipulation from Human Videos — Qin et al., 2022
- RoboTool: Creative Robot Tool Use with Large Language Models — Xu et al., 2023
- Tool Use and Understanding in Robotic Manipulation — Survey, 2022
## Next in the Series
The final post — Multi-Step Manipulation: Curriculum Learning for Long-Horizon Tasks — tackles the hardest problem: chains of multiple sequential manipulation tasks (10+ steps), using hierarchical RL and automatic curriculum generation.