
Tool Use: Teaching Robots to Use Tools with RL

Teach robots to use tools with RL — affordance learning, two-phase grasp-then-use, and screwdriver insertion in MuJoCo.

Nguyễn Anh Tuấn · April 3, 2026 · 7 min read

Tool use is what distinguishes humans from most other animals. We use hammers to drive nails, knives to cut, screwdrivers to turn screws — each tool is an "extension" of our hands. Teaching robots to do this is the frontier of manipulation research, because it requires the robot to understand both how to grasp the tool (grasp affordance) and how to use it (functional affordance).

After mastering contact-rich manipulation, we enter the world of tool use — where robots do not just interact with objects, but interact with the world through objects.

Tool Use: Why Is It Hard?

| Challenge | Description |
|---|---|
| Dual object | Must manage both the tool and the target object |
| Affordance | Know to grasp the hammer by the handle, not the head |
| Extended kinematics | The tool changes the workspace and dynamics |
| Multi-phase | Grasp the tool, position it, use it, release it |
| Force transmission | Force transmitted through the tool is complex |
| Tool variety | Each tool has unique dynamics |

Robot using tools

Affordance Learning

Affordance answers the question "which part of the tool should I grasp, and which part do I use?" For example:

import numpy as np

class ToolAffordance:
    """Define affordances for different tool types."""
    
    def __init__(self):
        self.tools = {
            'hammer': {
                'grasp_region': {'center': [0, 0, 0.1], 'radius': 0.03},
                'functional_region': {'center': [0, 0, -0.05], 'radius': 0.02},
                'grasp_orientation': 'perpendicular',
                'use_direction': [0, 0, -1],
                'mass': 0.5,
                'length': 0.25,
            },
            'screwdriver': {
                'grasp_region': {'center': [0, 0, 0.06], 'radius': 0.02},
                'functional_region': {'center': [0, 0, -0.08], 'radius': 0.005},
                'grasp_orientation': 'parallel',
                'use_direction': [0, 0, -1],
                'use_rotation': True,
                'mass': 0.15,
                'length': 0.18,
            },
            'spatula': {
                'grasp_region': {'center': [0, 0, 0.12], 'radius': 0.015},
                'functional_region': {'center': [0.04, 0, -0.01], 'radius': 0.04},
                'grasp_orientation': 'perpendicular',
                'use_direction': [1, 0, 0],
                'mass': 0.1,
                'length': 0.3,
            },
        }
    
    def get_grasp_reward(self, tool_name, grasp_pos, tool_frame):
        """Reward for grasping at the correct position."""
        tool = self.tools[tool_name]
        grasp_center = np.array(tool['grasp_region']['center'])
        
        # tool_frame is a 4x4 homogeneous transform (tool frame -> world)
        grasp_world = tool_frame @ np.append(grasp_center, 1)
        
        dist = np.linalg.norm(grasp_pos - grasp_world[:3])
        radius = tool['grasp_region']['radius']
        
        if dist < radius:
            return 1.0
        else:
            return max(0, 1.0 - (dist - radius) / 0.05)
    
    def get_functional_alignment(self, tool_name, tool_tip_pos, 
                                  target_pos, tool_orientation):
        """Score tool position and orientation for use.
        
        Assumes tool_orientation is a 3x3 rotation matrix (tool -> world)."""
        tool = self.tools[tool_name]
        use_dir = np.array(tool['use_direction'])
        
        # Position term: tool tip close to the target
        dist = np.linalg.norm(tool_tip_pos - target_pos)
        pos_score = 1.0 - np.tanh(5.0 * dist)
        
        # Orientation term: tool axis (z-column of the rotation matrix)
        # aligned with the tool's use direction
        tool_axis = tool_orientation[:, 2]
        ori_score = max(0.0, float(np.dot(tool_axis, use_dir)))
        
        return 0.5 * (pos_score + ori_score)

Two-Phase Learning: Grasp Tool, Then Use Tool

The most effective strategy for tool use is to split the task into two phases and train each separately:

Phase 1: Tool Grasping Policy

class ToolGraspEnv:
    """Environment for learning to grasp tools correctly."""
    
    def __init__(self, tool_name="screwdriver"):
        self.tool_name = tool_name
        self.affordance = ToolAffordance()
    
    def compute_reward(self, gripper_pos, tool_qpos, tool_contacts):
        """Reward for grasping tool at correct position and orientation."""
        
        # 1. Reach grasp region
        grasp_reward = self.affordance.get_grasp_reward(
            self.tool_name, gripper_pos, self.tool_frame
        )
        
        # 2. Stable grasp — lift without dropping
        if self._is_grasped():
            lift_height = tool_qpos[2] - self.tool_init_height
            stable_reward = min(lift_height / 0.1, 1.0)
        else:
            stable_reward = 0.0
        
        return grasp_reward + 2.0 * stable_reward

Phase 2: Tool Use Policy

class ToolUseEnv:
    """Environment for learning to use a grasped tool."""
    
    def __init__(self, tool_name="screwdriver"):
        self.tool_name = tool_name
        # Tool is welded to gripper (skip grasping)
    
    def compute_reward(self, tool_tip_pos, target_pos,
                       force_on_target, task_progress):
        """Reward for using the tool."""
        
        # 1. Alignment — bring tool to target
        alignment = 1.0 - np.tanh(10.0 * np.linalg.norm(
            tool_tip_pos - target_pos
        ))
        
        # 2. Force application — apply force in correct direction
        if np.linalg.norm(tool_tip_pos - target_pos) < 0.01:
            force_reward = np.tanh(force_on_target / 5.0)
        else:
            force_reward = 0.0
        
        # 3. Task progress
        progress_reward = 10.0 * task_progress
        
        # 4. Completion
        if task_progress >= 0.95:
            completion = 50.0
        else:
            completion = 0.0
        
        return alignment + 3.0 * force_reward + progress_reward + completion
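Once both phases are trained, they are chained at deployment: run the grasp policy until the tool is held stably, then hand control to the use policy. A hypothetical skeleton — the handover rule, the `ToyEnv` stand-in, and all names here are mine, not from the environments above:

```python
def run_two_phase_episode(grasp_policy, use_policy, env, hold_steps=3):
    """Run the grasp policy until the tool is held for `hold_steps`
    consecutive steps, then hand control to the use policy."""
    obs = env.reset()
    held = 0
    while held < hold_steps:
        obs, grasped, _ = env.step(grasp_policy(obs))
        held = held + 1 if grasped else 0  # reset the count if the grasp slips
    done = False
    while not done:
        obs, _, done = env.step(use_policy(obs))
    return obs


class ToyEnv:
    """Minimal stand-in: the tool counts as grasped from step 2 onward,
    and the task finishes at step 8. For illustration only."""
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        return self.t, self.t >= 2, self.t >= 8


env = ToyEnv()
final_obs = run_two_phase_episode(lambda o: 0, lambda o: 0, env)
print(final_obs)  # 8
```

Requiring several consecutive "grasped" steps before the handover is a cheap guard against switching policies on a transient contact.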

Screwdriver Task in MuJoCo

Here is a complete example — a robot uses a screwdriver to turn a screw:

import mujoco
import numpy as np

SCREWDRIVER_XML = """
<mujoco model="screwdriver_task">
  <option timestep="0.002" gravity="0 0 -9.81"/>
  
  <worldbody>
    <light pos="0 0 2"/>
    <geom type="plane" size="1 1 0.1" rgba="0.9 0.9 0.9 1"/>
    
    <!-- Workbench -->
    <body name="bench" pos="0.5 0 0.4">
      <geom type="box" size="0.3 0.3 0.02" rgba="0.5 0.3 0.1 1" mass="100"/>
    </body>
    
    <!-- Screw -->
    <body name="screw" pos="0.5 0 0.42">
      <joint name="screw_rot" type="hinge" axis="0 0 1" 
             range="0 31.4" damping="0.5"/>
      <geom name="screw_head" type="cylinder" size="0.008 0.003" 
            rgba="0.7 0.7 0.7 1" contype="1" conaffinity="1"/>
      <geom name="screw_slot" type="box" size="0.001 0.006 0.001" 
            pos="0 0 0.003" rgba="0.5 0.5 0.5 1"/>
      <site name="screw_top" pos="0 0 0.003" size="0.002"/>
    </body>
    
    <!-- Robot arm with attached screwdriver -->
    <body name="arm_base" pos="0 0 0.42">
      <joint name="j0" type="hinge" axis="0 0 1" range="-3.14 3.14" damping="2"/>
      <geom type="cylinder" size="0.04 0.03" rgba="0.7 0.7 0.7 1"/>
      
      <body name="l1" pos="0 0 0.06">
        <joint name="j1" type="hinge" axis="0 1 0" range="-1.5 1.5" damping="1.5"/>
        <geom type="capsule" fromto="0 0 0 0.25 0 0" size="0.03" rgba="0.7 0.7 0.7 1"/>
        
        <body name="l2" pos="0.25 0 0">
          <joint name="j2" type="hinge" axis="0 1 0" range="-2 2" damping="1"/>
          <geom type="capsule" fromto="0 0 0 0.2 0 0" size="0.025" rgba="0.7 0.7 0.7 1"/>
          
          <body name="wrist" pos="0.2 0 0">
            <joint name="j3" type="hinge" axis="0 0 1" range="-100 100" damping="0.3"/>
            <joint name="j4" type="hinge" axis="1 0 0" range="-1.57 1.57" damping="0.5"/>
            <site name="ee" pos="0 0 0"/>
            
            <!-- Screwdriver (rigidly attached) -->
            <body name="screwdriver" pos="0 0 -0.01">
              <geom name="sd_handle" type="cylinder" size="0.012 0.05" 
                    rgba="1 0.8 0.1 1" mass="0.08"/>
              <geom name="sd_shaft" type="cylinder" size="0.003 0.06" 
                    pos="0 0 -0.11" rgba="0.7 0.7 0.7 1" mass="0.02"/>
              <geom name="sd_tip" type="box" size="0.001 0.005 0.003" 
                    pos="0 0 -0.173" rgba="0.6 0.6 0.6 1" mass="0.005"
                    contype="1" conaffinity="1" friction="1 0.1 0.01"/>
              <site name="tip" pos="0 0 -0.176" size="0.002"/>
            </body>
          </body>
        </body>
      </body>
    </body>
  </worldbody>
  
  <actuator>
    <position name="a0" joint="j0" kp="200"/>
    <position name="a1" joint="j1" kp="200"/>
    <position name="a2" joint="j2" kp="150"/>
    <velocity name="a3" joint="j3" kv="20"/>
    <position name="a4" joint="j4" kp="80"/>
  </actuator>
  
  <sensor>
    <force name="tip_force" site="tip"/>
    <jointpos name="screw_pos" joint="screw_rot"/>
  </sensor>
</mujoco>
"""


class ScrewdriverEnv:
    """Screwdriver insertion environment."""
    
    def __init__(self):
        self.model = mujoco.MjModel.from_xml_string(SCREWDRIVER_XML)
        self.data = mujoco.MjData(self.model)
        self.max_steps = 500
        self.target_turns = 5
        self.target_angle = self.target_turns * 2 * np.pi
    
    def _get_obs(self):
        # Named access avoids fragile indexing: the screw joint is defined
        # before the arm, so qpos[0] is the screw angle and qpos[1:6] the arm.
        joint_pos = self.data.qpos[1:6]
        joint_vel = self.data.qvel[1:6]
        tip_pos = self.data.site('tip').xpos
        screw_pos = self.data.site('screw_top').xpos
        screw_angle = self.data.joint('screw_rot').qpos[0]
        rel = tip_pos - screw_pos
        tip_force = self.data.sensor('tip_force').data.copy()
        
        return np.concatenate([
            joint_pos, joint_vel,
            tip_pos, screw_pos, rel,
            [screw_angle / self.target_angle],
            tip_force,
        ])
    
    def compute_reward(self):
        tip_pos = self.data.site('tip').xpos
        screw_pos = self.data.site('screw_top').xpos
        
        lateral_dist = np.linalg.norm(tip_pos[:2] - screw_pos[:2])
        height_dist = abs(tip_pos[2] - screw_pos[2])
        align_reward = 2.0 * (1.0 - np.tanh(20.0 * lateral_dist))
        
        contact_reward = 0.0
        if lateral_dist < 0.005 and height_dist < 0.005:
            contact_reward = 2.0
        
        screw_angle = self.data.joint('screw_rot').qpos[0]
        progress = screw_angle / self.target_angle
        rotation_reward = 10.0 * progress
        
        if progress >= 0.95:
            complete_reward = 100.0
        else:
            complete_reward = 0.0
        
        force = np.linalg.norm(self.data.sensor('tip_force').data)
        force_penalty = -0.1 * max(0, force - 10.0)
        
        return (align_reward + contact_reward + 
                rotation_reward + complete_reward + force_penalty)
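Before reaching for RL, it is worth noting what a scripted baseline for this task looks like. The sketch below is a hypothetical hand-written controller (not from the post): proportional Cartesian servoing toward the screw head, spinning the wrist via the velocity actuator only once the tip is laterally aligned.

```python
import numpy as np

def screwdriver_controller(tip_pos, screw_pos, spin_speed=2.0, kp=2.0):
    """Hypothetical baseline: returns [dx, dy, dz, wrist_velocity].

    Proportional correction toward the screw head, plus a constant
    wrist spin that engages only when the tip is laterally aligned."""
    error = np.asarray(screw_pos, float) - np.asarray(tip_pos, float)
    cartesian_cmd = kp * error                          # P-control on position
    lateral = np.linalg.norm(error[:2])
    wrist_vel = spin_speed if lateral < 0.005 else 0.0  # spin only when aligned
    return np.concatenate([cartesian_cmd, [wrist_vel]])

# Aligned over the screw: press down and spin
print(screwdriver_controller([0.5, 0.0, 0.43], [0.5, 0.0, 0.42]))
# 5 cm off laterally: move over first, no spin yet
print(screwdriver_controller([0.45, 0.0, 0.43], [0.5, 0.0, 0.42]))
```

Such a controller handles the nominal case but is brittle to slot misalignment and cam-out, which is exactly the contact-rich regime where the learned policy earns its keep.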

Tool Transfer: One Tool to Many

One of the most exciting goals is transferring skills between similar tools:

class ToolTransferTraining:
    """Train a policy that transfers between tools."""
    
    def __init__(self, source_tool, target_tools):
        self.source_tool = source_tool
        self.target_tools = target_tools
    
    def train_with_tool_embedding(self):
        """Condition the policy on a tool embedding to generalize."""
        
        # Embedding layout: [length (m), tip_radius (m), mass (kg), is_flat, is_phillips]
        tool_embeddings = {
            'screwdriver_flat': np.array([0.18, 0.005, 0.15, 1, 0]),
            'screwdriver_phillips': np.array([0.18, 0.005, 0.15, 0, 1]),
            'allen_key': np.array([0.12, 0.003, 0.08, 0, 0]),
            'hex_driver': np.array([0.15, 0.004, 0.12, 0, 0]),
        }
        
        return tool_embeddings

| Source Tool | Target Tool | Success (no transfer) | Success (with transfer) |
|---|---|---|---|
| Flat screwdriver | Phillips screwdriver | 15% | 72% |
| Flat screwdriver | Allen key | 8% | 58% |
| Hammer | Mallet | 22% | 81% |
| Spatula | Paint scraper | 12% | 65% |

Tool variety and transfer

Transfer learning reduces training time by up to 5x for tools in the same family.
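The post does not specify how the embedding is consumed; one common pattern (an assumption here) is to condition the policy on the embedding and weight source-tool experience by embedding similarity. Note that the features should be standardized first so that dimensions on different scales do not dominate the comparison:

```python
import numpy as np

embeddings = {
    'screwdriver_flat':     [0.18, 0.005, 0.15, 1, 0],
    'screwdriver_phillips': [0.18, 0.005, 0.15, 0, 1],
    'allen_key':            [0.12, 0.003, 0.08, 0, 0],
    'hex_driver':           [0.15, 0.004, 0.12, 0, 0],
}

# Standardize each feature column so length (m), tip radius (m), mass (kg)
# and the binary tip-type flags contribute on comparable scales.
X = np.array(list(embeddings.values()), dtype=float)
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

def cosine(a, b):
    """Cosine similarity between two standardized feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

names = list(embeddings)
src = X[names.index('screwdriver_flat')]
for i, name in enumerate(names):
    print(f"{name:22s} similarity to flat screwdriver: {cosine(src, X[i]):+.2f}")
```

Similarity weighting of this kind is one plausible mechanism behind the table above; the actual transfer method used in the experiments is not described in the post.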

References

  1. DexMV: Imitation Learning for Dexterous Manipulation from Human Videos — Qin et al., 2022
  2. RoboTool: Creative Robot Tool Use with Large Language Models — Xu et al., 2023
  3. Tool Use and Understanding in Robotic Manipulation — Survey, 2022

Next in the Series

The final post — Multi-Step Manipulation: Curriculum Learning for Long-Horizon Tasks — tackles the hardest problem: chains of multiple sequential manipulation tasks (10+ steps), using hierarchical RL and automatic curriculum.
