Bê và vận chuyển vật thể: Ổn định trong di chuyển

Bạn đã từng thấy robot bê khay đồ ăn trong nhà hàng chưa? Trông thì nhẹ nhàng, nhưng đằng sau đó là một bài toán cực kỳ phức tạp: robot phải vừa di chuyển nhanh (khách đang đợi), vừa giữ đồ ổn định (không đổ soup), vừa tránh chướng ngại (bàn ghế, người qua lại). Đây là bài toán carrying and transporting — và RL là công cụ mạnh mẽ để giải quyết nó.

Sau khi đã thành thạo grasping, force control, và precision placement, bây giờ chúng ta kết hợp tất cả để giải bài toán vận chuyển vật thể trong thế giới thực.

Phân loại Carrying Tasks

Task	Mô tả	Độ khó	Thách thức chính
Carry Flat	Bê vật phẳng (hộp, sách)	Thấp	Giữ không rơi
Carry Upright	Bê vật thẳng đứng (chai, cốc)	Trung bình	Orientation stability
Carry on Tray	Bê khay nhiều vật	Cao	Multi-object balance
Carry Through Obstacles	Bê qua chướng ngại	Cao	Planning + control
Pour	Rót nước/chất lỏng	Rất cao	Flow rate + tilt control

Multi-Objective Reward: Speed vs Stability

Carrying task là ví dụ điển hình của multi-objective optimization — bạn muốn nhanh nhưng cũng muốn ổn định, và hai mục tiêu này thường xung đột nhau.

import numpy as np

class CarryingReward:
    """Multi-objective reward cho carrying tasks."""
    
    def __init__(self, task_type="carry_upright",
                 speed_weight=1.0, stability_weight=2.0):
        self.task_type = task_type
        self.speed_weight = speed_weight
        self.stability_weight = stability_weight
        
        # Thresholds
        self.max_tilt = np.radians(10)      # Max nghiêng 10 độ
        self.max_accel = 2.0                 # Max gia tốc m/s^2
        self.spill_tilt = np.radians(25)     # Đổ nước
    
    def compute(self, state):
        """
        state: dict chứa tất cả thông tin cần thiết
            - obj_pos, obj_quat, obj_vel, obj_angvel
            - ee_pos, ee_vel, ee_accel
            - goal_pos, goal_dist
            - grasping: bool
            - obstacles_nearby: list of distances
        """
        if not state['grasping']:
            return -100.0, {'dropped': True}
        
        rewards = {}
        
        # ---- STABILITY REWARDS ----
        
        # 1. Tilt penalty — giữ vật thẳng đứng
        tilt = self._compute_tilt(state['obj_quat'])
        if tilt > self.spill_tilt:
            rewards['tilt'] = -50.0  # Catastrophic failure
        else:
            rewards['tilt'] = -3.0 * (tilt / self.max_tilt) ** 2
        
        # 2. Angular velocity — không xoay
        ang_vel = np.linalg.norm(state['obj_angvel'])
        rewards['angular_vel'] = -1.5 * np.tanh(2.0 * ang_vel)
        
        # 3. Acceleration penalty — chuyển động mượt
        accel = np.linalg.norm(state['ee_accel'])
        rewards['accel'] = -0.5 * np.tanh(accel / self.max_accel)
        
        # 4. Jerk penalty — đạo hàm gia tốc
        if 'ee_jerk' in state:
            jerk = np.linalg.norm(state['ee_jerk'])
            rewards['jerk'] = -0.3 * np.tanh(jerk / 5.0)
        
        # ---- PROGRESS REWARDS ----
        
        # 5. Distance to goal
        goal_dist = state['goal_dist']
        rewards['progress'] = 2.0 * (1.0 - np.tanh(2.0 * goal_dist))
        
        # 6. Speed reward (khi xa đích)
        speed = np.linalg.norm(state['ee_vel'])
        if goal_dist > 0.3:
            # Khuyến khích di chuyển nhanh khi xa đích
            desired_speed = min(0.5, goal_dist)
            speed_error = abs(speed - desired_speed)
            rewards['speed'] = 0.5 * (1.0 - np.tanh(3.0 * speed_error))
        else:
            # Chậm lại khi gần đích
            rewards['speed'] = -2.0 * speed
        
        # 7. Obstacle avoidance
        if state.get('obstacles_nearby'):
            min_obstacle_dist = min(state['obstacles_nearby'])
            if min_obstacle_dist < 0.1:
                rewards['obstacle'] = -5.0 * (1.0 - min_obstacle_dist / 0.1)
            else:
                rewards['obstacle'] = 0.0
        
        # 8. Success bonus
        if goal_dist < 0.05 and tilt < self.max_tilt:
            rewards['success'] = 30.0
        else:
            rewards['success'] = 0.0
        
        # Weighted sum
        stability = (self.stability_weight * 
                     (rewards['tilt'] + rewards['angular_vel'] + 
                      rewards['accel']))
        progress = self.speed_weight * (rewards['progress'] + rewards['speed'])
        total = stability + progress + rewards['success']
        total += rewards.get('obstacle', 0) + rewards.get('jerk', 0)
        
        rewards['total'] = total
        return total, rewards
    
    def _compute_tilt(self, quat):
        """Tính góc nghiêng từ quaternion."""
        # Quaternion to up-vector
        w, x, y, z = quat
        up_x = 2 * (x*z + w*y)
        up_y = 2 * (y*z - w*x)
        up_z = 1 - 2*(x*x + y*y)
        cos_tilt = up_z
        return np.arccos(np.clip(cos_tilt, -1, 1))

Pareto Front: Speed-Stability Trade-off

Không có một reward function "tối ưu" cho carrying — mà có cả một Pareto front của trade-offs:

def train_pareto_front(n_points=5):
    """Train nhiều policies với weight ratios khác nhau."""
    results = []
    
    for speed_w in np.linspace(0.5, 3.0, n_points):
        stability_w = 3.5 - speed_w  # Tổng weights = 3.5
        
        env = CarryingEnv(
            speed_weight=speed_w,
            stability_weight=stability_w
        )
        
        model = SAC("MlpPolicy", env, ...)
        model.learn(total_timesteps=2_000_000)
        
        # Evaluate
        metrics = evaluate_carrying(model, env)
        results.append({
            'speed_weight': speed_w,
            'avg_time': metrics['avg_time'],
            'spill_rate': metrics['spill_rate'],
            'drop_rate': metrics['drop_rate'],
        })
    
    return results

Speed Weight	Stability Weight	Avg Time (s)	Spill Rate	Drop Rate
0.5	3.0	12.1	1%	0%
1.0	2.5	8.4	3%	1%
1.5	2.0	6.2	7%	2%
2.0	1.5	4.8	15%	4%
3.0	0.5	3.1	35%	12%

Task: Carry Cup Through Obstacle Course

import mujoco
import gymnasium as gym
from gymnasium import spaces

CARRY_OBSTACLE_XML = """
<mujoco model="carry_obstacles">
  <option timestep="0.002" gravity="0 0 -9.81"/>
  
  <worldbody>
    <light pos="0 0 3" dir="0 0 -1"/>
    <geom type="plane" size="3 3 0.1" rgba="0.85 0.85 0.85 1"/>
    
    <!-- Tables -->
    <body name="table_start" pos="0 0 0.35">
      <geom type="box" size="0.3 0.3 0.02" rgba="0.5 0.3 0.1 1" mass="100"/>
    </body>
    <body name="table_end" pos="1.5 0.8 0.35">
      <geom type="box" size="0.3 0.3 0.02" rgba="0.5 0.3 0.1 1" mass="100"/>
    </body>
    
    <!-- Obstacles -->
    <body name="obs1" pos="0.5 0.2 0.3">
      <geom type="box" size="0.05 0.3 0.3" rgba="0.8 0.2 0.2 0.7"/>
    </body>
    <body name="obs2" pos="1.0 0.5 0.3">
      <geom type="cylinder" size="0.08 0.3" rgba="0.8 0.2 0.2 0.7"/>
    </body>
    
    <!-- Mobile base + arm (simplified) -->
    <body name="mobile_base" pos="0 0 0.15">
      <freejoint name="base_free"/>
      <geom type="box" size="0.15 0.1 0.05" mass="10" rgba="0.3 0.3 0.8 1"/>
      
      <body name="arm_base" pos="0 0 0.05">
        <joint name="j0" type="hinge" axis="0 0 1" range="-3.14 3.14" damping="2"/>
        <geom type="cylinder" size="0.03 0.03" rgba="0.5 0.5 0.5 1"/>
        
        <body name="arm1" pos="0 0 0.06">
          <joint name="j1" type="hinge" axis="0 1 0" range="-1.5 1.5" damping="1.5"/>
          <geom type="capsule" fromto="0 0 0 0.2 0 0" size="0.025" rgba="0.6 0.6 0.6 1"/>
          
          <body name="arm2" pos="0.2 0 0">
            <joint name="j2" type="hinge" axis="0 1 0" range="-2 2" damping="1"/>
            <geom type="capsule" fromto="0 0 0 0.15 0 0" size="0.02" rgba="0.6 0.6 0.6 1"/>
            
            <body name="gripper_body" pos="0.15 0 0">
              <joint name="j3" type="hinge" axis="1 0 0" range="-1.57 1.57" damping="0.5"/>
              <site name="ee" pos="0 0 0"/>
              
              <body name="fl" pos="0 0.015 0">
                <joint name="jfl" type="slide" axis="0 1 0" range="0 0.03" damping="3"/>
                <geom type="box" size="0.008 0.003 0.03" rgba="0.8 0.3 0.3 1"
                      friction="2 0.5 0.01" contype="1" conaffinity="1"/>
              </body>
              <body name="fr" pos="0 -0.015 0">
                <joint name="jfr" type="slide" axis="0 -1 0" range="0 0.03" damping="3"/>
                <geom type="box" size="0.008 0.003 0.03" rgba="0.8 0.3 0.3 1"
                      friction="2 0.5 0.01" contype="1" conaffinity="1"/>
              </body>
            </body>
          </body>
        </body>
      </body>
    </body>
    
    <!-- Cup with liquid -->
    <body name="cup" pos="0 0 0.4">
      <freejoint name="cup_free"/>
      <geom type="cylinder" size="0.025 0.04" rgba="0.9 0.9 1 0.8" mass="0.1"
            contype="1" conaffinity="1"/>
      <body name="liquid" pos="0 0 0.02">
        <joint name="liq_x" type="slide" axis="1 0 0" range="-0.015 0.015" damping="3"/>
        <joint name="liq_y" type="slide" axis="0 1 0" range="-0.015 0.015" damping="3"/>
        <geom type="sphere" size="0.015" rgba="0.2 0.5 1 0.5" mass="0.15"
              contype="0" conaffinity="0"/>
      </body>
    </body>
    
    <!-- Goal marker -->
    <body name="goal" pos="1.5 0.8 0.45">
      <geom type="sphere" size="0.03" rgba="0 1 0 0.3" contype="0" conaffinity="0"/>
    </body>
  </worldbody>
  
  <actuator>
    <!-- Mobile base: forward, strafe, rotate -->
    <motor name="base_fwd" joint="base_free" gear="0 0 0 1 0 0" ctrlrange="-5 5"/>
    <motor name="base_strafe" joint="base_free" gear="0 0 0 0 1 0" ctrlrange="-3 3"/>
    <motor name="base_rot" joint="base_free" gear="0 0 0 0 0 1" ctrlrange="-2 2"/>
    <!-- Arm joints -->
    <position name="a0" joint="j0" kp="100"/>
    <position name="a1" joint="j1" kp="100"/>
    <position name="a2" joint="j2" kp="80"/>
    <position name="a3" joint="j3" kp="50"/>
    <position name="afl" joint="jfl" kp="50"/>
    <position name="afr" joint="jfr" kp="50"/>
  </actuator>
</mujoco>
"""

Pouring: Rót nước bằng RL

Pouring là một trong những tác vụ manipulation tinh tế nhất — robot phải kiểm soát góc nghiêng, tốc độ rót, và dừng đúng lúc.

class PouringReward:
    """Reward function cho pouring task."""
    
    def __init__(self, target_volume=0.8):
        self.target_volume = target_volume  # 80% capacity
        self.pour_started = False
    
    def compute(self, source_tilt, target_fill, flow_rate,
                spill_amount, source_pos, target_pos):
        """
        Args:
            source_tilt: Góc nghiêng bình nguồn (rad)
            target_fill: Mức nước trong bình đích (0-1)
            flow_rate: Tốc độ rót (ml/s approximation)
            spill_amount: Lượng đổ ra ngoài
            source_pos: Vị trí bình nguồn
            target_pos: Vị trí bình đích
        """
        rewards = {}
        
        # 1. Alignment — bình nguồn phải ở trên bình đích
        horizontal_dist = np.linalg.norm(source_pos[:2] - target_pos[:2])
        rewards['align'] = 1.0 - np.tanh(5.0 * horizontal_dist)
        
        # 2. Fill progress — đổ dần
        fill_error = abs(target_fill - self.target_volume)
        if target_fill <= self.target_volume:
            rewards['fill'] = 3.0 * target_fill / self.target_volume
        else:
            rewards['fill'] = -5.0 * (target_fill - self.target_volume)
        
        # 3. Flow rate — không quá nhanh, không quá chậm
        if self.pour_started:
            ideal_flow = 0.1  # ml/s
            flow_error = abs(flow_rate - ideal_flow)
            rewards['flow'] = 1.0 - np.tanh(5.0 * flow_error)
        
        # 4. Spill penalty
        rewards['spill'] = -20.0 * spill_amount
        
        # 5. Completion bonus
        if abs(target_fill - self.target_volume) < 0.05:
            rewards['complete'] = 50.0
        else:
            rewards['complete'] = 0.0
        
        total = sum(rewards.values())
        return total, rewards

Training Strategy: Two-Phase Learning

Cho carrying tasks phức tạp, train 2 phases riêng biệt hiệu quả hơn end-to-end:

# Phase 1: Learn stable carrying (không có obstacles)
carry_env = CarryingEnv(obstacles=False)
carry_model = SAC("MlpPolicy", carry_env, ...)
carry_model.learn(total_timesteps=2_000_000)

# Phase 2: Fine-tune với obstacles (dùng pretrained weights)
obstacle_env = CarryingEnv(obstacles=True)
obstacle_model = SAC("MlpPolicy", obstacle_env, ...)
# Load pretrained policy
obstacle_model.policy.load_state_dict(carry_model.policy.state_dict())
# Lower learning rate cho fine-tuning
obstacle_model.learning_rate = 1e-5
obstacle_model.learn(total_timesteps=1_000_000)

Method	Spill Rate	Collision Rate	Avg Time
Train from scratch	22%	18%	9.5s
Two-phase	8%	5%	7.2s
Two-phase + curriculum	4%	3%	6.8s

Tài liệu tham khảo

Whole-Body Manipulation Planning for Humanoid Robots — Murooka et al., 2021
A Survey on Dynamic Manipulation — Survey, 2022
Learning to Pour with a Robot Arm — IROS 2022

Tiếp theo trong Series

Bài tiếp — Contact-Rich Manipulation: Assembly, Insertion và Peg-in-Hole — chúng ta bước vào thế giới contact-rich: lắp ráp, cắm chốt, và meshing gear — nơi ma sát và clearance quyết định tất cả.

Bê và vận chuyển vật thể: Ổn định trong di chuyển

Phân loại Carrying Tasks

Multi-Objective Reward: Speed vs Stability

Pareto Front: Speed-Stability Trade-off

Task: Carry Cup Through Obstacle Course

Pouring: Rót nước bằng RL

Training Strategy: Two-Phase Learning

Tài liệu tham khảo

Tiếp theo trong Series

Bài viết liên quan

Nguyễn Anh Tuấn

Bài viết liên quan

X-VLA ICLR 2026: Soft-Prompted VLA 0.9B cho beginner LeRobot

Multitask DiT Policy LeRobot v0.5: 1 model nhiều task

ABot-M0: VLA Foundation Model với Action Manifold