
Carrying & Transporting Objects: Stability During Motion

Train robots to carry objects stably with RL — from carrying flat objects to pouring liquids, with multi-objective rewards that balance speed against stability.

Nguyễn Anh Tuấn · March 26, 2026 · 8 min read

Have you ever seen a robot carrying a food tray in a restaurant? It looks effortless, but behind the scenes lies an incredibly complex problem: the robot must move quickly (the customer is waiting), keep the items stable (no spilled soup), and avoid obstacles (tables, chairs, people). This is the carrying and transporting problem — and RL is a powerful tool to solve it.

After mastering grasping, force control, and precision placement, we now combine everything to solve real-world object transport.

Classifying Carrying Tasks

| Task | Description | Difficulty | Key Challenge |
| --- | --- | --- | --- |
| Carry Flat | Carry flat objects (box, book) | Low | Prevent dropping |
| Carry Upright | Carry upright objects (bottle, cup) | Medium | Orientation stability |
| Carry on Tray | Carry a tray with multiple items | High | Multi-object balance |
| Carry Through Obstacles | Carry through an obstacle course | High | Planning + control |
| Pour | Pour liquid | Very high | Flow rate + tilt control |
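One convenient way to carry this taxonomy into code is a small config table mapping each task type to its stability limits. The thresholds below are illustrative assumptions (only the 10° carry-upright tilt limit appears later in the reward class):

```python
import numpy as np

# Hypothetical per-task limits: max allowed tilt (degrees) and
# end-effector acceleration (m/s^2) before penalties kick in.
TASK_LIMITS = {
    "carry_flat":    {"max_tilt_deg": 30.0, "max_accel": 3.0},
    "carry_upright": {"max_tilt_deg": 10.0, "max_accel": 2.0},
    "carry_tray":    {"max_tilt_deg": 5.0,  "max_accel": 1.5},
    "pour":          {"max_tilt_deg": 90.0, "max_accel": 1.0},  # pouring needs tilt
}

def limits_for(task_type):
    """Return (tilt limit in radians, accel limit) for a task type."""
    cfg = TASK_LIMITS[task_type]
    return np.radians(cfg["max_tilt_deg"]), cfg["max_accel"]
```

A reward class like the one below can then pull its `max_tilt` from this table instead of hard-coding it per task.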

Multi-Objective Reward: Speed vs Stability

Carrying tasks are a textbook example of multi-objective optimization — you want speed but also stability, and these objectives often conflict.

[Figure: Robot carrying objects in the environment]

import numpy as np

class CarryingReward:
    """Multi-objective reward for carrying tasks."""
    
    def __init__(self, task_type="carry_upright",
                 speed_weight=1.0, stability_weight=2.0):
        self.task_type = task_type
        self.speed_weight = speed_weight
        self.stability_weight = stability_weight
        
        self.max_tilt = np.radians(10)
        self.max_accel = 2.0
        self.spill_tilt = np.radians(25)
    
    def compute(self, state):
        """
        state: dict containing all necessary information
            - obj_pos, obj_quat, obj_vel, obj_angvel
            - ee_pos, ee_vel, ee_accel
            - goal_pos, goal_dist
            - grasping: bool
            - obstacles_nearby: list of distances
        """
        if not state['grasping']:
            return -100.0, {'dropped': True}
        
        rewards = {}
        
        # ---- STABILITY REWARDS ----
        
        # 1. Tilt penalty — keep object upright
        tilt = self._compute_tilt(state['obj_quat'])
        if tilt > self.spill_tilt:
            rewards['tilt'] = -50.0
        else:
            rewards['tilt'] = -3.0 * (tilt / self.max_tilt) ** 2
        
        # 2. Angular velocity — no spinning
        ang_vel = np.linalg.norm(state['obj_angvel'])
        rewards['angular_vel'] = -1.5 * np.tanh(2.0 * ang_vel)
        
        # 3. Acceleration penalty — smooth motion
        accel = np.linalg.norm(state['ee_accel'])
        rewards['accel'] = -0.5 * np.tanh(accel / self.max_accel)
        
        # 4. Jerk penalty
        if 'ee_jerk' in state:
            jerk = np.linalg.norm(state['ee_jerk'])
            rewards['jerk'] = -0.3 * np.tanh(jerk / 5.0)
        
        # ---- PROGRESS REWARDS ----
        
        # 5. Distance to goal
        goal_dist = state['goal_dist']
        rewards['progress'] = 2.0 * (1.0 - np.tanh(2.0 * goal_dist))
        
        # 6. Speed reward (when far from goal)
        speed = np.linalg.norm(state['ee_vel'])
        if goal_dist > 0.3:
            desired_speed = min(0.5, goal_dist)
            speed_error = abs(speed - desired_speed)
            rewards['speed'] = 0.5 * (1.0 - np.tanh(3.0 * speed_error))
        else:
            rewards['speed'] = -2.0 * speed
        
        # 7. Obstacle avoidance
        if state.get('obstacles_nearby'):
            min_obstacle_dist = min(state['obstacles_nearby'])
            if min_obstacle_dist < 0.1:
                rewards['obstacle'] = -5.0 * (1.0 - min_obstacle_dist / 0.1)
            else:
                rewards['obstacle'] = 0.0
        
        # 8. Success bonus
        if goal_dist < 0.05 and tilt < self.max_tilt:
            rewards['success'] = 30.0
        else:
            rewards['success'] = 0.0
        
        stability = (self.stability_weight * 
                     (rewards['tilt'] + rewards['angular_vel'] + 
                      rewards['accel']))
        progress = self.speed_weight * (rewards['progress'] + rewards['speed'])
        total = stability + progress + rewards['success']
        total += rewards.get('obstacle', 0) + rewards.get('jerk', 0)
        
        rewards['total'] = total
        return total, rewards
    
    def _compute_tilt(self, quat):
        """Compute tilt angle from quaternion."""
        w, x, y, z = quat
        up_z = 1 - 2*(x*x + y*y)
        return np.arccos(np.clip(up_z, -1, 1))
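A quick sanity check builds trust in the stability terms: rotating the object about the x-axis by θ should register a tilt of exactly θ. Here is a standalone copy of the same tilt formula (w-first quaternion convention):

```python
import numpy as np

def tilt_from_quat(quat):
    """Angle between the object's local z-axis and world vertical,
    quaternion given as (w, x, y, z) — same math as _compute_tilt."""
    w, x, y, z = quat
    up_z = 1 - 2 * (x * x + y * y)  # z-component of the rotated z-axis
    return np.arccos(np.clip(up_z, -1.0, 1.0))

# A 30-degree rotation about the x-axis should read back as a 30-degree tilt.
theta = np.radians(30)
q = [np.cos(theta / 2), np.sin(theta / 2), 0.0, 0.0]
tilt = tilt_from_quat(q)
```

With the identity quaternion the tilt is zero, so the upright pose earns the smallest tilt penalty, as intended.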

Pareto Front: Speed-Stability Trade-off

There is no single "optimal" reward function for carrying — instead, there is an entire Pareto front of trade-offs:

from stable_baselines3 import SAC  # assumed: Stable-Baselines3, as in the snippets below

def train_pareto_front(n_points=5):
    """Train multiple policies with different weight ratios."""
    results = []
    
    for speed_w in np.linspace(0.5, 3.0, n_points):
        stability_w = 3.5 - speed_w
        
        env = CarryingEnv(
            speed_weight=speed_w,
            stability_weight=stability_w
        )
        
        model = SAC("MlpPolicy", env, ...)
        model.learn(total_timesteps=2_000_000)
        
        metrics = evaluate_carrying(model, env)
        results.append({
            'speed_weight': speed_w,
            'avg_time': metrics['avg_time'],
            'spill_rate': metrics['spill_rate'],
            'drop_rate': metrics['drop_rate'],
        })
    
    return results

| Speed Weight | Stability Weight | Avg Time (s) | Spill Rate | Drop Rate |
| --- | --- | --- | --- | --- |
| 0.5 | 3.0 | 12.1 | 1% | 0% |
| 1.0 | 2.5 | 8.4 | 3% | 1% |
| 1.5 | 2.0 | 6.2 | 7% | 2% |
| 2.0 | 1.5 | 4.8 | 15% | 4% |
| 3.0 | 0.5 | 3.1 | 35% | 12% |
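Given metrics like these, you can filter the trained policies down to the non-dominated set automatically. A minimal sketch, assuming each result dict carries `avg_time` and `spill_rate` and both are to be minimized:

```python
def pareto_front(results, keys=("avg_time", "spill_rate")):
    """Return the results not dominated on the given keys (lower is better).

    A point is dominated if some other point is <= on every key and
    strictly < on at least one.
    """
    front = []
    for i, a in enumerate(results):
        dominated = any(
            all(b[k] <= a[k] for k in keys) and any(b[k] < a[k] for k in keys)
            for j, b in enumerate(results) if j != i
        )
        if not dominated:
            front.append(a)
    return front

# The five policies above all trade speed for spills,
# so every one of them is Pareto-optimal.
table = [
    {"avg_time": 12.1, "spill_rate": 0.01},
    {"avg_time": 8.4,  "spill_rate": 0.03},
    {"avg_time": 6.2,  "spill_rate": 0.07},
    {"avg_time": 4.8,  "spill_rate": 0.15},
    {"avg_time": 3.1,  "spill_rate": 0.35},
]
front = pareto_front(table)
```

A policy that is both slower and sloppier than an existing one would be dropped from the front, which is exactly the check you want before handing operators a speed-vs-stability dial.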

Task: Carry Cup Through Obstacle Course

import mujoco
import gymnasium as gym
from gymnasium import spaces

CARRY_OBSTACLE_XML = """
<mujoco model="carry_obstacles">
  <option timestep="0.002" gravity="0 0 -9.81"/>
  
  <worldbody>
    <light pos="0 0 3" dir="0 0 -1"/>
    <geom type="plane" size="3 3 0.1" rgba="0.85 0.85 0.85 1"/>
    
    <!-- Tables -->
    <body name="table_start" pos="0 0 0.35">
      <geom type="box" size="0.3 0.3 0.02" rgba="0.5 0.3 0.1 1" mass="100"/>
    </body>
    <body name="table_end" pos="1.5 0.8 0.35">
      <geom type="box" size="0.3 0.3 0.02" rgba="0.5 0.3 0.1 1" mass="100"/>
    </body>
    
    <!-- Obstacles -->
    <body name="obs1" pos="0.5 0.2 0.3">
      <geom type="box" size="0.05 0.3 0.3" rgba="0.8 0.2 0.2 0.7"/>
    </body>
    <body name="obs2" pos="1.0 0.5 0.3">
      <geom type="cylinder" size="0.08 0.3" rgba="0.8 0.2 0.2 0.7"/>
    </body>
    
    <!-- Mobile base + arm (simplified) -->
    <body name="mobile_base" pos="0 0 0.15">
      <freejoint name="base_free"/>
      <geom type="box" size="0.15 0.1 0.05" mass="10" rgba="0.3 0.3 0.8 1"/>
      
      <body name="arm_base" pos="0 0 0.05">
        <joint name="j0" type="hinge" axis="0 0 1" range="-3.14 3.14" damping="2"/>
        <geom type="cylinder" size="0.03 0.03" rgba="0.5 0.5 0.5 1"/>
        
        <body name="arm1" pos="0 0 0.06">
          <joint name="j1" type="hinge" axis="0 1 0" range="-1.5 1.5" damping="1.5"/>
          <geom type="capsule" fromto="0 0 0 0.2 0 0" size="0.025" rgba="0.6 0.6 0.6 1"/>
          
          <body name="arm2" pos="0.2 0 0">
            <joint name="j2" type="hinge" axis="0 1 0" range="-2 2" damping="1"/>
            <geom type="capsule" fromto="0 0 0 0.15 0 0" size="0.02" rgba="0.6 0.6 0.6 1"/>
            
            <body name="gripper_body" pos="0.15 0 0">
              <joint name="j3" type="hinge" axis="1 0 0" range="-1.57 1.57" damping="0.5"/>
              <site name="ee" pos="0 0 0"/>
              
              <body name="fl" pos="0 0.015 0">
                <joint name="jfl" type="slide" axis="0 1 0" range="0 0.03" damping="3"/>
                <geom type="box" size="0.008 0.003 0.03" rgba="0.8 0.3 0.3 1"
                      friction="2 0.5 0.01" contype="1" conaffinity="1"/>
              </body>
              <body name="fr" pos="0 -0.015 0">
                <joint name="jfr" type="slide" axis="0 -1 0" range="0 0.03" damping="3"/>
                <geom type="box" size="0.008 0.003 0.03" rgba="0.8 0.3 0.3 1"
                      friction="2 0.5 0.01" contype="1" conaffinity="1"/>
              </body>
            </body>
          </body>
        </body>
      </body>
    </body>
    
    <!-- Cup with liquid -->
    <body name="cup" pos="0 0 0.4">
      <freejoint name="cup_free"/>
      <geom type="cylinder" size="0.025 0.04" rgba="0.9 0.9 1 0.8" mass="0.1"
            contype="1" conaffinity="1"/>
      <body name="liquid" pos="0 0 0.02">
        <joint name="liq_x" type="slide" axis="1 0 0" range="-0.015 0.015" damping="3"/>
        <joint name="liq_y" type="slide" axis="0 1 0" range="-0.015 0.015" damping="3"/>
        <geom type="sphere" size="0.015" rgba="0.2 0.5 1 0.5" mass="0.15"
              contype="0" conaffinity="0"/>
      </body>
    </body>
    
    <!-- Goal marker -->
    <body name="goal" pos="1.5 0.8 0.45">
      <geom type="sphere" size="0.03" rgba="0 1 0 0.3" contype="0" conaffinity="0"/>
    </body>
  </worldbody>
  
  <actuator>
    <!-- For a free joint, motor gear is a 6D wrench: force xyz, then torque xyz -->
    <motor name="base_fwd" joint="base_free" gear="1 0 0 0 0 0" ctrlrange="-5 5"/>
    <motor name="base_strafe" joint="base_free" gear="0 1 0 0 0 0" ctrlrange="-3 3"/>
    <motor name="base_rot" joint="base_free" gear="0 0 0 0 0 1" ctrlrange="-2 2"/>
    <position name="a0" joint="j0" kp="100"/>
    <position name="a1" joint="j1" kp="100"/>
    <position name="a2" joint="j2" kp="80"/>
    <position name="a3" joint="j3" kp="50"/>
    <position name="afl" joint="jfl" kp="50"/>
    <position name="afr" joint="jfr" kp="50"/>
  </actuator>
</mujoco>
"""
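The "liquid" in this model is a crude proxy: a damped mass on two slide joints inside the cup. A spill check can then be pure geometry on those joint positions. The radii below come from the XML (cup cylinder 0.025, liquid sphere 0.015); the safety margin is an assumption:

```python
import numpy as np

CUP_RADIUS = 0.025      # cup cylinder radius from the XML above
LIQUID_RADIUS = 0.015   # liquid sphere radius from the XML above

def liquid_spilled(liq_x, liq_y, margin=0.002):
    """Spill proxy: the liquid mass has slid far enough sideways on its
    liq_x/liq_y slide joints that it would poke past the cup wall."""
    offset = np.hypot(liq_x, liq_y)
    return offset + LIQUID_RADIUS > CUP_RADIUS - margin
```

This is cheap enough to evaluate every control step and plugs naturally into the episode-termination and spill-penalty logic of the reward above.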

Pouring: Liquid Pouring with RL

Pouring is one of the most delicate manipulation tasks — the robot must control tilt angle, flow rate, and stop at the right moment.

class PouringReward:
    """Reward function for pouring task."""
    
    def __init__(self, target_volume=0.8):
        self.target_volume = target_volume
        self.pour_started = False
    
    def compute(self, source_tilt, target_fill, flow_rate,
                spill_amount, source_pos, target_pos):
        """
        Args:
            source_tilt: Source container tilt angle (rad)
            target_fill: Fill level in target container (0-1)
            flow_rate: Pouring rate (ml/s approximation)
            spill_amount: Amount spilled
            source_pos: Source container position
            target_pos: Target container position
        """
        rewards = {}
        
        # 1. Alignment — source must be above target
        horizontal_dist = np.linalg.norm(source_pos[:2] - target_pos[:2])
        rewards['align'] = 1.0 - np.tanh(5.0 * horizontal_dist)
        
        # 2. Fill progress — reward filling up to the target, penalize overfill
        if target_fill <= self.target_volume:
            rewards['fill'] = 3.0 * target_fill / self.target_volume
        else:
            rewards['fill'] = -5.0 * (target_fill - self.target_volume)
        
        # 3. Flow rate — not too fast, not too slow
        if flow_rate > 1e-4:
            self.pour_started = True  # latch: pouring has begun
        if self.pour_started:
            ideal_flow = 0.1
            flow_error = abs(flow_rate - ideal_flow)
            rewards['flow'] = 1.0 - np.tanh(5.0 * flow_error)
        
        # 4. Spill penalty
        rewards['spill'] = -20.0 * spill_amount
        
        # 5. Completion bonus
        if abs(target_fill - self.target_volume) < 0.05:
            rewards['complete'] = 50.0
        else:
            rewards['complete'] = 0.0
        
        total = sum(rewards.values())
        return total, rewards
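The `flow_rate` argument is rarely observable directly in simulation; a common trick is to finite-difference the target fill level over time and smooth the result. A minimal sketch — the `FlowRateEstimator` helper is hypothetical, not part of the environment above:

```python
class FlowRateEstimator:
    """Estimate pour rate as the discrete derivative of the target fill
    level, smoothed with an exponential moving average."""

    def __init__(self, dt, alpha=0.3):
        self.dt = dt          # control timestep (s)
        self.alpha = alpha    # EMA smoothing factor
        self.prev_fill = None
        self.rate = 0.0

    def update(self, target_fill):
        """Feed the latest fill reading; returns the smoothed rate."""
        if self.prev_fill is not None:
            raw = (target_fill - self.prev_fill) / self.dt
            self.rate = self.alpha * raw + (1 - self.alpha) * self.rate
        self.prev_fill = target_fill
        return self.rate
```

The smoothing matters: raw per-step differences of a discretized fill level are noisy, and a jittery flow estimate would make the flow-rate reward term jittery too.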

[Figure: Pouring task visualization]

Training Strategy: Two-Phase Learning

For complex carrying tasks, training in two separate phases is more effective than training end-to-end:

# Phase 1: Learn stable carrying (no obstacles)
carry_env = CarryingEnv(obstacles=False)
carry_model = SAC("MlpPolicy", carry_env, ...)
carry_model.learn(total_timesteps=2_000_000)

# Phase 2: Fine-tune with obstacles, warm-starting from the Phase 1 weights.
# Note: pass the lower learning rate to the constructor — assigning
# model.learning_rate after construction does not update the optimizers.
obstacle_env = CarryingEnv(obstacles=True)
obstacle_model = SAC("MlpPolicy", obstacle_env, learning_rate=1e-5)
obstacle_model.policy.load_state_dict(carry_model.policy.state_dict())
obstacle_model.learn(total_timesteps=1_000_000)

| Method | Spill Rate | Collision Rate | Avg Time |
| --- | --- | --- | --- |
| Train from scratch | 22% | 18% | 9.5s |
| Two-phase | 8% | 5% | 7.2s |
| Two-phase + curriculum | 4% | 3% | 6.8s |
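The curriculum variant in the last row gradually ramps task difficulty. A simple scheduler that adds obstacles and tightens the tilt limit as the rolling success rate improves might look like this (stage values and thresholds are illustrative assumptions):

```python
import numpy as np

class CarryCurriculum:
    """Advance to a harder stage when the rolling success rate clears a bar."""

    STAGES = [
        # (n_obstacles, max_tilt_deg) — easy to hard
        (0, 20.0),
        (1, 15.0),
        (2, 12.0),
        (3, 10.0),
    ]

    def __init__(self, promote_at=0.8, window=100):
        self.promote_at = promote_at  # success rate needed to advance
        self.window = window          # episodes in the rolling window
        self.stage = 0
        self.recent = []

    def report(self, success):
        """Record one episode outcome; returns the current stage config."""
        self.recent.append(float(success))
        self.recent = self.recent[-self.window:]
        if (len(self.recent) == self.window
                and np.mean(self.recent) >= self.promote_at
                and self.stage < len(self.STAGES) - 1):
            self.stage += 1
            self.recent = []  # reset the window for the new stage
        return self.STAGES[self.stage]
```

Between episodes, the environment would reconfigure itself from the returned `(n_obstacles, max_tilt_deg)` tuple, so the policy only sees harder scenes once it is reliable on easier ones.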


Next in the Series

Next up — Contact-Rich Manipulation: Assembly, Insertion & Peg-in-Hole — we enter the world of contact-rich tasks: assembly, peg insertion, and gear meshing — where friction and clearance determine everything.
