Have you ever seen a robot carrying a food tray in a restaurant? It looks effortless, but behind the scenes lies an incredibly complex problem: the robot must move quickly (the customer is waiting), keep the items stable (no spilled soup), and avoid obstacles (tables, chairs, people). This is the carrying and transporting problem — and RL is a powerful tool to solve it.
After mastering grasping, force control, and precision placement, we now combine everything to solve real-world object transport.
Classifying Carrying Tasks
| Task | Description | Difficulty | Key Challenge |
|---|---|---|---|
| Carry Flat | Carry flat objects (box, book) | Low | Prevent dropping |
| Carry Upright | Carry upright objects (bottle, cup) | Medium | Orientation stability |
| Carry on Tray | Carry tray with multiple items | High | Multi-object balance |
| Carry Through Obstacles | Carry through obstacle course | High | Planning + control |
| Pour | Pour liquid | Very high | Flow rate + tilt control |
Multi-Objective Reward: Speed vs Stability
Carrying tasks are a textbook example of multi-objective optimization — you want speed but also stability, and these objectives often conflict.
import numpy as np
class CarryingReward:
"""Multi-objective reward for carrying tasks."""
def __init__(self, task_type="carry_upright",
speed_weight=1.0, stability_weight=2.0):
self.task_type = task_type
self.speed_weight = speed_weight
self.stability_weight = stability_weight
self.max_tilt = np.radians(10)
self.max_accel = 2.0
self.spill_tilt = np.radians(25)
def compute(self, state):
"""
state: dict containing all necessary information
- obj_pos, obj_quat, obj_vel, obj_angvel
- ee_pos, ee_vel, ee_accel
- goal_pos, goal_dist
- grasping: bool
- obstacles_nearby: list of distances
"""
if not state['grasping']:
return -100.0, {'dropped': True}
rewards = {}
# ---- STABILITY REWARDS ----
# 1. Tilt penalty — keep object upright
tilt = self._compute_tilt(state['obj_quat'])
if tilt > self.spill_tilt:
rewards['tilt'] = -50.0
else:
rewards['tilt'] = -3.0 * (tilt / self.max_tilt) ** 2
# 2. Angular velocity — no spinning
ang_vel = np.linalg.norm(state['obj_angvel'])
rewards['angular_vel'] = -1.5 * np.tanh(2.0 * ang_vel)
# 3. Acceleration penalty — smooth motion
accel = np.linalg.norm(state['ee_accel'])
rewards['accel'] = -0.5 * np.tanh(accel / self.max_accel)
# 4. Jerk penalty
if 'ee_jerk' in state:
jerk = np.linalg.norm(state['ee_jerk'])
rewards['jerk'] = -0.3 * np.tanh(jerk / 5.0)
# ---- PROGRESS REWARDS ----
# 5. Distance to goal
goal_dist = state['goal_dist']
rewards['progress'] = 2.0 * (1.0 - np.tanh(2.0 * goal_dist))
# 6. Speed reward (when far from goal)
speed = np.linalg.norm(state['ee_vel'])
if goal_dist > 0.3:
desired_speed = min(0.5, goal_dist)
speed_error = abs(speed - desired_speed)
rewards['speed'] = 0.5 * (1.0 - np.tanh(3.0 * speed_error))
else:
rewards['speed'] = -2.0 * speed
# 7. Obstacle avoidance
if state.get('obstacles_nearby'):
min_obstacle_dist = min(state['obstacles_nearby'])
if min_obstacle_dist < 0.1:
rewards['obstacle'] = -5.0 * (1.0 - min_obstacle_dist / 0.1)
else:
rewards['obstacle'] = 0.0
# 8. Success bonus
if goal_dist < 0.05 and tilt < self.max_tilt:
rewards['success'] = 30.0
else:
rewards['success'] = 0.0
stability = (self.stability_weight *
(rewards['tilt'] + rewards['angular_vel'] +
rewards['accel']))
progress = self.speed_weight * (rewards['progress'] + rewards['speed'])
total = stability + progress + rewards['success']
total += rewards.get('obstacle', 0) + rewards.get('jerk', 0)
rewards['total'] = total
return total, rewards
def _compute_tilt(self, quat):
"""Compute tilt angle from quaternion."""
w, x, y, z = quat
up_z = 1 - 2*(x*x + y*y)
return np.arccos(np.clip(up_z, -1, 1))
Pareto Front: Speed-Stability Trade-off
There is no single "optimal" reward function for carrying — instead, there is an entire Pareto front of trade-offs:
def train_pareto_front(n_points=5):
    """Sweep speed/stability weight ratios and evaluate the resulting policies.

    The two weights always sum to 3.5, so each point is one position on the
    speed-vs-stability trade-off axis.
    """
    pareto_points = []
    for speed_w in np.linspace(0.5, 3.0, n_points):
        stability_w = 3.5 - speed_w
        env = CarryingEnv(speed_weight=speed_w, stability_weight=stability_w)
        model = SAC("MlpPolicy", env, ...)
        model.learn(total_timesteps=2_000_000)
        metrics = evaluate_carrying(model, env)
        # Record the weight alongside the evaluation metrics for this point.
        point = {'speed_weight': speed_w}
        point.update({key: metrics[key]
                      for key in ('avg_time', 'spill_rate', 'drop_rate')})
        pareto_points.append(point)
    return pareto_points
| Speed Weight | Stability Weight | Avg Time (s) | Spill Rate | Drop Rate |
|---|---|---|---|---|
| 0.5 | 3.0 | 12.1 | 1% | 0% |
| 1.0 | 2.5 | 8.4 | 3% | 1% |
| 1.5 | 2.0 | 6.2 | 7% | 2% |
| 2.0 | 1.5 | 4.8 | 15% | 4% |
| 3.0 | 0.5 | 3.1 | 35% | 12% |
Task: Carry Cup Through Obstacle Course
import mujoco
import gymnasium as gym
from gymnasium import spaces
# MuJoCo scene: a mobile base with a simple 4-DoF arm and two-finger gripper
# carries a cup (with a sliding "liquid" proxy mass) from table_start to
# table_end past two obstacles.
# FIX: for a free joint, the motor's 6 gear entries map to the joint's DOFs in
# order [tx ty tz rx ry rz] (3 translational first, then 3 rotational). The
# base drive motors previously used "0 0 0 1 0 0" / "0 0 0 0 1 0", which
# applied roll/pitch torques instead of forward/strafe forces.
CARRY_OBSTACLE_XML = """
<mujoco model="carry_obstacles">
  <option timestep="0.002" gravity="0 0 -9.81"/>
  <worldbody>
    <light pos="0 0 3" dir="0 0 -1"/>
    <geom type="plane" size="3 3 0.1" rgba="0.85 0.85 0.85 1"/>
    <!-- Tables -->
    <body name="table_start" pos="0 0 0.35">
      <geom type="box" size="0.3 0.3 0.02" rgba="0.5 0.3 0.1 1" mass="100"/>
    </body>
    <body name="table_end" pos="1.5 0.8 0.35">
      <geom type="box" size="0.3 0.3 0.02" rgba="0.5 0.3 0.1 1" mass="100"/>
    </body>
    <!-- Obstacles -->
    <body name="obs1" pos="0.5 0.2 0.3">
      <geom type="box" size="0.05 0.3 0.3" rgba="0.8 0.2 0.2 0.7"/>
    </body>
    <body name="obs2" pos="1.0 0.5 0.3">
      <geom type="cylinder" size="0.08 0.3" rgba="0.8 0.2 0.2 0.7"/>
    </body>
    <!-- Mobile base + arm (simplified) -->
    <body name="mobile_base" pos="0 0 0.15">
      <freejoint name="base_free"/>
      <geom type="box" size="0.15 0.1 0.05" mass="10" rgba="0.3 0.3 0.8 1"/>
      <body name="arm_base" pos="0 0 0.05">
        <joint name="j0" type="hinge" axis="0 0 1" range="-3.14 3.14" damping="2"/>
        <geom type="cylinder" size="0.03 0.03" rgba="0.5 0.5 0.5 1"/>
        <body name="arm1" pos="0 0 0.06">
          <joint name="j1" type="hinge" axis="0 1 0" range="-1.5 1.5" damping="1.5"/>
          <geom type="capsule" fromto="0 0 0 0.2 0 0" size="0.025" rgba="0.6 0.6 0.6 1"/>
          <body name="arm2" pos="0.2 0 0">
            <joint name="j2" type="hinge" axis="0 1 0" range="-2 2" damping="1"/>
            <geom type="capsule" fromto="0 0 0 0.15 0 0" size="0.02" rgba="0.6 0.6 0.6 1"/>
            <body name="gripper_body" pos="0.15 0 0">
              <joint name="j3" type="hinge" axis="1 0 0" range="-1.57 1.57" damping="0.5"/>
              <site name="ee" pos="0 0 0"/>
              <body name="fl" pos="0 0.015 0">
                <joint name="jfl" type="slide" axis="0 1 0" range="0 0.03" damping="3"/>
                <geom type="box" size="0.008 0.003 0.03" rgba="0.8 0.3 0.3 1"
                      friction="2 0.5 0.01" contype="1" conaffinity="1"/>
              </body>
              <body name="fr" pos="0 -0.015 0">
                <joint name="jfr" type="slide" axis="0 -1 0" range="0 0.03" damping="3"/>
                <geom type="box" size="0.008 0.003 0.03" rgba="0.8 0.3 0.3 1"
                      friction="2 0.5 0.01" contype="1" conaffinity="1"/>
              </body>
            </body>
          </body>
        </body>
      </body>
    </body>
    <!-- Cup with liquid (liquid proxy: damped sliding mass inside the cup) -->
    <body name="cup" pos="0 0 0.4">
      <freejoint name="cup_free"/>
      <geom type="cylinder" size="0.025 0.04" rgba="0.9 0.9 1 0.8" mass="0.1"
            contype="1" conaffinity="1"/>
      <body name="liquid" pos="0 0 0.02">
        <joint name="liq_x" type="slide" axis="1 0 0" range="-0.015 0.015" damping="3"/>
        <joint name="liq_y" type="slide" axis="0 1 0" range="-0.015 0.015" damping="3"/>
        <geom type="sphere" size="0.015" rgba="0.2 0.5 1 0.5" mass="0.15"
              contype="0" conaffinity="0"/>
      </body>
    </body>
    <!-- Goal marker (visual only: no contacts) -->
    <body name="goal" pos="1.5 0.8 0.45">
      <geom type="sphere" size="0.03" rgba="0 1 0 0.3" contype="0" conaffinity="0"/>
    </body>
  </worldbody>
  <actuator>
    <!-- Free-joint gear order is [tx ty tz rx ry rz]: drive translation x/y, yaw about z. -->
    <motor name="base_fwd" joint="base_free" gear="1 0 0 0 0 0" ctrlrange="-5 5"/>
    <motor name="base_strafe" joint="base_free" gear="0 1 0 0 0 0" ctrlrange="-3 3"/>
    <motor name="base_rot" joint="base_free" gear="0 0 0 0 0 1" ctrlrange="-2 2"/>
    <position name="a0" joint="j0" kp="100"/>
    <position name="a1" joint="j1" kp="100"/>
    <position name="a2" joint="j2" kp="80"/>
    <position name="a3" joint="j3" kp="50"/>
    <position name="afl" joint="jfl" kp="50"/>
    <position name="afr" joint="jfr" kp="50"/>
  </actuator>
</mujoco>
"""
Pouring: Liquid Pouring with RL
Pouring is one of the most delicate manipulation tasks — the robot must control tilt angle, flow rate, and stop at the right moment.
class PouringReward:
    """Reward function for a liquid-pouring task.

    Latches a per-episode `pour_started` flag the first time liquid flow is
    observed, so the flow-rate shaping term only applies once pouring has
    actually begun.

    NOTE: `pour_started` is per-episode state — construct a fresh instance
    (or reset the flag) at the start of each episode.
    """

    def __init__(self, target_volume=0.8):
        # Desired fill level of the target container, in [0, 1].
        self.target_volume = target_volume
        # Latched True once a non-zero flow is first observed.
        # FIX: this flag was previously never set, so the flow-rate
        # shaping term (3) was unreachable dead code.
        self.pour_started = False

    def compute(self, source_tilt, target_fill, flow_rate,
                spill_amount, source_pos, target_pos):
        """
        Args:
            source_tilt: Source container tilt angle (rad)
            target_fill: Fill level in target container (0-1)
            flow_rate: Pouring rate (ml/s approximation)
            spill_amount: Amount spilled
            source_pos: Source container position (array-like, xyz)
            target_pos: Target container position (array-like, xyz)

        Returns:
            (total_reward, per-term breakdown dict)
        """
        rewards = {}
        # 1. Alignment — source must be above target (horizontal distance only).
        horizontal_dist = np.linalg.norm(source_pos[:2] - target_pos[:2])
        rewards['align'] = 1.0 - np.tanh(5.0 * horizontal_dist)
        # 2. Fill progress — reward approaching the target volume, penalize overfill.
        if target_fill <= self.target_volume:
            rewards['fill'] = 3.0 * target_fill / self.target_volume
        else:
            rewards['fill'] = -5.0 * (target_fill - self.target_volume)
        # 3. Flow rate — once pouring has begun, shape toward a moderate,
        #    controllable rate (not too fast, not too slow).
        if flow_rate > 1e-6:
            self.pour_started = True
        if self.pour_started:
            ideal_flow = 0.1
            flow_error = abs(flow_rate - ideal_flow)
            rewards['flow'] = 1.0 - np.tanh(5.0 * flow_error)
        # 4. Spill penalty — strongly discourage any spillage.
        rewards['spill'] = -20.0 * spill_amount
        # 5. Completion bonus when within 5% of the target volume.
        if abs(target_fill - self.target_volume) < 0.05:
            rewards['complete'] = 50.0
        else:
            rewards['complete'] = 0.0
        total = sum(rewards.values())
        return total, rewards
Training Strategy: Two-Phase Learning
For complex carrying tasks, training in two separate phases is more effective than end-to-end training:
# Phase 1: learn stable carrying in an obstacle-free environment.
base_env = CarryingEnv(obstacles=False)
base_model = SAC("MlpPolicy", base_env, ...)
base_model.learn(total_timesteps=2_000_000)

# Phase 2: warm-start from the phase-1 policy weights and fine-tune with
# obstacles, using a much smaller learning rate so the carrying skill
# learned in phase 1 is not destroyed early in fine-tuning.
finetune_env = CarryingEnv(obstacles=True)
finetune_model = SAC("MlpPolicy", finetune_env, ...)
finetune_model.policy.load_state_dict(base_model.policy.state_dict())
finetune_model.learning_rate = 1e-5
finetune_model.learn(total_timesteps=1_000_000)
| Method | Spill Rate | Collision Rate | Avg Time |
|---|---|---|---|
| Train from scratch | 22% | 18% | 9.5s |
| Two-phase | 8% | 5% | 7.2s |
| Two-phase + curriculum | 4% | 3% | 6.8s |
References
- Whole-Body Manipulation Planning for Humanoid Robots — Murooka et al., 2021
- A Survey on Dynamic Manipulation — Survey, 2022
- Learning to Pour with a Robot Arm — IROS 2022
Next in the Series
Next up — Contact-Rich Manipulation: Assembly, Insertion & Peg-in-Hole — we enter the world of contact-rich tasks: assembly, peg insertion, and gear meshing — where friction and clearance determine everything.