
Tool Use: Teaching Robots to Use Tools with RL

Teach robots to use tools with RL — affordance learning, two-phase grasp-then-use, and screwdriver insertion in MuJoCo.

Nguyễn Anh Tuấn · April 3, 2026 · 7 min read

Tool use is what distinguishes humans from most other animals. We use hammers to drive nails, knives to cut, screwdrivers to turn screws — each tool is an "extension" of our hands. Teaching robots to do this is the frontier of manipulation research, because it requires the robot to understand both how to grasp the tool (grasp affordance) and how to use it (functional affordance).

After mastering contact-rich manipulation, we enter the world of tool use — where robots do not just interact with objects, but interact with the world through objects.

Tool Use: Why Is It Hard?

| Challenge | Description |
|---|---|
| Dual object | Must manage both the tool and the target object |
| Affordance | Know to grasp the hammer by the handle, not the head |
| Extended kinematics | The tool changes the workspace and dynamics |
| Multi-phase | Grasp the tool, position it, use it, release it |
| Force transmission | Force transmitted through the tool is complex |
| Tool variety | Each tool has unique dynamics |

Robot using tools

Affordance Learning

Affordance answers the question "which part of the tool should I grasp, and which part do I use?" For example:

import numpy as np

class ToolAffordance:
    """Define affordances for different tool types."""
    
    def __init__(self):
        self.tools = {
            'hammer': {
                'grasp_region': {'center': [0, 0, 0.1], 'radius': 0.03},
                'functional_region': {'center': [0, 0, -0.05], 'radius': 0.02},
                'grasp_orientation': 'perpendicular',
                'use_direction': [0, 0, -1],
                'mass': 0.5,
                'length': 0.25,
            },
            'screwdriver': {
                'grasp_region': {'center': [0, 0, 0.06], 'radius': 0.02},
                'functional_region': {'center': [0, 0, -0.08], 'radius': 0.005},
                'grasp_orientation': 'parallel',
                'use_direction': [0, 0, -1],
                'use_rotation': True,
                'mass': 0.15,
                'length': 0.18,
            },
            'spatula': {
                'grasp_region': {'center': [0, 0, 0.12], 'radius': 0.015},
                'functional_region': {'center': [0.04, 0, -0.01], 'radius': 0.04},
                'grasp_orientation': 'perpendicular',
                'use_direction': [1, 0, 0],
                'mass': 0.1,
                'length': 0.3,
            },
        }
    
    def get_grasp_reward(self, tool_name, grasp_pos, tool_frame):
        """Reward for grasping at the correct position."""
        tool = self.tools[tool_name]
        grasp_center = np.array(tool['grasp_region']['center'])
        
        # tool_frame is a 4x4 homogeneous transform (tool frame -> world)
        grasp_world = tool_frame @ np.append(grasp_center, 1)
        
        dist = np.linalg.norm(grasp_pos - grasp_world[:3])
        radius = tool['grasp_region']['radius']
        
        if dist < radius:
            return 1.0
        else:
            return max(0, 1.0 - (dist - radius) / 0.05)
    
    def get_functional_alignment(self, tool_name, tool_tip_pos, 
                                  target_pos, tool_orientation):
        """Score tool position and orientation for use.
        
        Assumes tool_orientation is a 3x3 rotation matrix (tool -> world)."""
        tool = self.tools[tool_name]
        use_dir = np.array(tool['use_direction'])
        
        # Position term: tool tip close to the target
        dist = np.linalg.norm(tool_tip_pos - target_pos)
        pos_score = 1.0 - np.tanh(5.0 * dist)
        
        # Orientation term: tool axis (z-column of the rotation matrix)
        # aligned with the tool's use direction
        tool_axis = tool_orientation[:, 2]
        ori_score = max(0.0, float(np.dot(tool_axis, use_dir)))
        
        return 0.5 * (pos_score + ori_score)

Two-Phase Learning: Grasp Tool, Then Use Tool

The most effective strategy for tool use is to split the task into two phases and train each separately:

Phase 1: Tool Grasping Policy

class ToolGraspEnv:
    """Environment for learning to grasp tools correctly."""
    
    def __init__(self, tool_name="screwdriver"):
        self.tool_name = tool_name
        self.affordance = ToolAffordance()
    
    def compute_reward(self, gripper_pos, tool_qpos, tool_contacts):
        """Reward for grasping tool at correct position and orientation."""
        
        # 1. Reach grasp region
        grasp_reward = self.affordance.get_grasp_reward(
            self.tool_name, gripper_pos, self.tool_frame
        )
        
        # 2. Stable grasp — lift without dropping
        if self._is_grasped():
            lift_height = tool_qpos[2] - self.tool_init_height
            stable_reward = min(lift_height / 0.1, 1.0)
        else:
            stable_reward = 0.0
        
        return grasp_reward + 2.0 * stable_reward

Phase 2: Tool Use Policy

class ToolUseEnv:
    """Environment for learning to use a grasped tool."""
    
    def __init__(self, tool_name="screwdriver"):
        self.tool_name = tool_name
        # Tool is welded to gripper (skip grasping)
    
    def compute_reward(self, tool_tip_pos, target_pos,
                       force_on_target, task_progress):
        """Reward for using the tool."""
        
        # 1. Alignment — bring tool to target
        alignment = 1.0 - np.tanh(10.0 * np.linalg.norm(
            tool_tip_pos - target_pos
        ))
        
        # 2. Force application — apply force in correct direction
        if np.linalg.norm(tool_tip_pos - target_pos) < 0.01:
            force_reward = np.tanh(force_on_target / 5.0)
        else:
            force_reward = 0.0
        
        # 3. Task progress
        progress_reward = 10.0 * task_progress
        
        # 4. Completion
        if task_progress >= 0.95:
            completion = 50.0
        else:
            completion = 0.0
        
        return alignment + 3.0 * force_reward + progress_reward + completion
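Once both phases are trained, they are chained at deployment: run the grasp policy until the tool is held stably, then hand control to the use policy. A hypothetical skeleton — the handover rule, the `ToyEnv` stand-in, and all names here are mine, not from the environments above:

```python
def run_two_phase_episode(grasp_policy, use_policy, env, hold_steps=3):
    """Run the grasp policy until the tool is held for `hold_steps`
    consecutive steps, then hand control to the use policy."""
    obs = env.reset()
    held = 0
    while held < hold_steps:
        obs, grasped, _ = env.step(grasp_policy(obs))
        held = held + 1 if grasped else 0  # reset the count if the grasp slips
    done = False
    while not done:
        obs, _, done = env.step(use_policy(obs))
    return obs


class ToyEnv:
    """Minimal stand-in: the tool counts as grasped from step 2 onward,
    and the task finishes at step 8. For illustration only."""
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        return self.t, self.t >= 2, self.t >= 8


env = ToyEnv()
final_obs = run_two_phase_episode(lambda o: 0, lambda o: 0, env)
print(final_obs)  # 8
```

Requiring several consecutive "grasped" steps before the handover is a cheap guard against switching policies on a transient contact.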

Screwdriver Task in MuJoCo

Here is a complete example — a robot uses a screwdriver to turn a screw:

import mujoco
import numpy as np

SCREWDRIVER_XML = """
<mujoco model="screwdriver_task">
  <option timestep="0.002" gravity="0 0 -9.81"/>
  
  <worldbody>
    <light pos="0 0 2"/>
    <geom type="plane" size="1 1 0.1" rgba="0.9 0.9 0.9 1"/>
    
    <!-- Workbench -->
    <body name="bench" pos="0.5 0 0.4">
      <geom type="box" size="0.3 0.3 0.02" rgba="0.5 0.3 0.1 1" mass="100"/>
    </body>
    
    <!-- Screw -->
    <body name="screw" pos="0.5 0 0.42">
      <joint name="screw_rot" type="hinge" axis="0 0 1" 
             range="0 31.4" damping="0.5"/>
      <geom name="screw_head" type="cylinder" size="0.008 0.003" 
            rgba="0.7 0.7 0.7 1" contype="1" conaffinity="1"/>
      <geom name="screw_slot" type="box" size="0.001 0.006 0.001" 
            pos="0 0 0.003" rgba="0.5 0.5 0.5 1"/>
      <site name="screw_top" pos="0 0 0.003" size="0.002"/>
    </body>
    
    <!-- Robot arm with attached screwdriver -->
    <body name="arm_base" pos="0 0 0.42">
      <joint name="j0" type="hinge" axis="0 0 1" range="-3.14 3.14" damping="2"/>
      <geom type="cylinder" size="0.04 0.03" rgba="0.7 0.7 0.7 1"/>
      
      <body name="l1" pos="0 0 0.06">
        <joint name="j1" type="hinge" axis="0 1 0" range="-1.5 1.5" damping="1.5"/>
        <geom type="capsule" fromto="0 0 0 0.25 0 0" size="0.03" rgba="0.7 0.7 0.7 1"/>
        
        <body name="l2" pos="0.25 0 0">
          <joint name="j2" type="hinge" axis="0 1 0" range="-2 2" damping="1"/>
          <geom type="capsule" fromto="0 0 0 0.2 0 0" size="0.025" rgba="0.7 0.7 0.7 1"/>
          
          <body name="wrist" pos="0.2 0 0">
            <joint name="j3" type="hinge" axis="0 0 1" range="-100 100" damping="0.3"/>
            <joint name="j4" type="hinge" axis="1 0 0" range="-1.57 1.57" damping="0.5"/>
            <site name="ee" pos="0 0 0"/>
            
            <!-- Screwdriver (rigidly attached) -->
            <body name="screwdriver" pos="0 0 -0.01">
              <geom name="sd_handle" type="cylinder" size="0.012 0.05" 
                    rgba="1 0.8 0.1 1" mass="0.08"/>
              <geom name="sd_shaft" type="cylinder" size="0.003 0.06" 
                    pos="0 0 -0.11" rgba="0.7 0.7 0.7 1" mass="0.02"/>
              <geom name="sd_tip" type="box" size="0.001 0.005 0.003" 
                    pos="0 0 -0.173" rgba="0.6 0.6 0.6 1" mass="0.005"
                    contype="1" conaffinity="1" friction="1 0.1 0.01"/>
              <site name="tip" pos="0 0 -0.176" size="0.002"/>
            </body>
          </body>
        </body>
      </body>
    </body>
  </worldbody>
  
  <actuator>
    <position name="a0" joint="j0" kp="200"/>
    <position name="a1" joint="j1" kp="200"/>
    <position name="a2" joint="j2" kp="150"/>
    <velocity name="a3" joint="j3" kv="20"/>
    <position name="a4" joint="j4" kp="80"/>
  </actuator>
  
  <sensor>
    <force name="tip_force" site="tip"/>
    <jointpos name="screw_pos" joint="screw_rot"/>
  </sensor>
</mujoco>
"""


class ScrewdriverEnv:
    """Screwdriver insertion environment."""
    
    def __init__(self):
        self.model = mujoco.MjModel.from_xml_string(SCREWDRIVER_XML)
        self.data = mujoco.MjData(self.model)
        self.max_steps = 500
        self.target_turns = 5
        self.target_angle = self.target_turns * 2 * np.pi
    
    def _get_obs(self):
        # Named access avoids fragile indexing: the screw joint is defined
        # before the arm, so qpos[0] is the screw angle and qpos[1:6] the arm.
        joint_pos = self.data.qpos[1:6]
        joint_vel = self.data.qvel[1:6]
        tip_pos = self.data.site('tip').xpos
        screw_pos = self.data.site('screw_top').xpos
        screw_angle = self.data.joint('screw_rot').qpos[0]
        rel = tip_pos - screw_pos
        tip_force = self.data.sensor('tip_force').data.copy()
        
        return np.concatenate([
            joint_pos, joint_vel,
            tip_pos, screw_pos, rel,
            [screw_angle / self.target_angle],
            tip_force,
        ])
    
    def compute_reward(self):
        tip_pos = self.data.site('tip').xpos
        screw_pos = self.data.site('screw_top').xpos
        
        lateral_dist = np.linalg.norm(tip_pos[:2] - screw_pos[:2])
        height_dist = abs(tip_pos[2] - screw_pos[2])
        align_reward = 2.0 * (1.0 - np.tanh(20.0 * lateral_dist))
        
        contact_reward = 0.0
        if lateral_dist < 0.005 and height_dist < 0.005:
            contact_reward = 2.0
        
        screw_angle = self.data.joint('screw_rot').qpos[0]
        progress = screw_angle / self.target_angle
        rotation_reward = 10.0 * progress
        
        if progress >= 0.95:
            complete_reward = 100.0
        else:
            complete_reward = 0.0
        
        force = np.linalg.norm(self.data.sensor('tip_force').data)
        force_penalty = -0.1 * max(0, force - 10.0)
        
        return (align_reward + contact_reward + 
                rotation_reward + complete_reward + force_penalty)
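Before reaching for RL, it is worth noting what a scripted baseline for this task looks like. The sketch below is a hypothetical hand-written controller (not from the post): proportional Cartesian servoing toward the screw head, spinning the wrist via the velocity actuator only once the tip is laterally aligned.

```python
import numpy as np

def screwdriver_controller(tip_pos, screw_pos, spin_speed=2.0, kp=2.0):
    """Hypothetical baseline: returns [dx, dy, dz, wrist_velocity].

    Proportional correction toward the screw head, plus a constant
    wrist spin that engages only when the tip is laterally aligned."""
    error = np.asarray(screw_pos, float) - np.asarray(tip_pos, float)
    cartesian_cmd = kp * error                          # P-control on position
    lateral = np.linalg.norm(error[:2])
    wrist_vel = spin_speed if lateral < 0.005 else 0.0  # spin only when aligned
    return np.concatenate([cartesian_cmd, [wrist_vel]])

# Aligned over the screw: press down and spin
print(screwdriver_controller([0.5, 0.0, 0.43], [0.5, 0.0, 0.42]))
# 5 cm off laterally: move over first, no spin yet
print(screwdriver_controller([0.45, 0.0, 0.43], [0.5, 0.0, 0.42]))
```

Such a controller handles the nominal case but is brittle to slot misalignment and cam-out, which is exactly the contact-rich regime where the learned policy earns its keep.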

Tool Transfer: One Tool to Many

One of the most exciting goals is transferring skills between similar tools:

class ToolTransferTraining:
    """Train a policy that transfers between tools."""
    
    def __init__(self, source_tool, target_tools):
        self.source_tool = source_tool
        self.target_tools = target_tools
    
    def train_with_tool_embedding(self):
        """Condition the policy on a tool embedding to generalize."""
        
        # Embedding layout: [length (m), tip_radius (m), mass (kg), is_flat, is_phillips]
        tool_embeddings = {
            'screwdriver_flat': np.array([0.18, 0.005, 0.15, 1, 0]),
            'screwdriver_phillips': np.array([0.18, 0.005, 0.15, 0, 1]),
            'allen_key': np.array([0.12, 0.003, 0.08, 0, 0]),
            'hex_driver': np.array([0.15, 0.004, 0.12, 0, 0]),
        }
        
        return tool_embeddings

| Source Tool | Target Tool | Success (no transfer) | Success (with transfer) |
|---|---|---|---|
| Flat screwdriver | Phillips screwdriver | 15% | 72% |
| Flat screwdriver | Allen key | 8% | 58% |
| Hammer | Mallet | 22% | 81% |
| Spatula | Paint scraper | 12% | 65% |

Tool variety and transfer

Transfer learning reduces training time by up to 5x for tools in the same family.
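The post does not specify how the embedding is consumed; one common pattern (an assumption here) is to condition the policy on the embedding and weight source-tool experience by embedding similarity. Note that the features should be standardized first so that dimensions on different scales do not dominate the comparison:

```python
import numpy as np

embeddings = {
    'screwdriver_flat':     [0.18, 0.005, 0.15, 1, 0],
    'screwdriver_phillips': [0.18, 0.005, 0.15, 0, 1],
    'allen_key':            [0.12, 0.003, 0.08, 0, 0],
    'hex_driver':           [0.15, 0.004, 0.12, 0, 0],
}

# Standardize each feature column so length (m), tip radius (m), mass (kg)
# and the binary tip-type flags contribute on comparable scales.
X = np.array(list(embeddings.values()), dtype=float)
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

def cosine(a, b):
    """Cosine similarity between two standardized feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

names = list(embeddings)
src = X[names.index('screwdriver_flat')]
for i, name in enumerate(names):
    print(f"{name:22s} similarity to flat screwdriver: {cosine(src, X[i]):+.2f}")
```

Similarity weighting of this kind is one plausible mechanism behind the table above; the actual transfer method used in the experiments is not described in the post.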

References

  1. DexMV: Imitation Learning for Dexterous Manipulation from Human Videos — Qin et al., 2022
  2. RoboTool: Creative Robot Tool Use with Large Language Models — Xu et al., 2023
  3. Tool Use and Understanding in Robotic Manipulation — Survey, 2022

Next in the Series

The final post — Multi-Step Manipulation: Curriculum Learning for Long-Horizon Tasks — tackles the hardest problem: chains of multiple sequential manipulation tasks (10+ steps), using hierarchical RL and automatic curriculum.
