Dual-Arm Robot: Setup và Calibration với LeRobot

Giới thiệu: Tại sao cần hai tay?

Hãy thử gấp một cái khăn bằng một tay. Hay rót nước từ bình vào cốc bằng một tay. Khó đúng không? Có rất nhiều tác vụ trong đời thường yêu cầu hai tay phối hợp — và đó chính là lý do dual-arm robot ngày càng quan trọng.

Trong series này, chúng ta đã đi từ single-arm tasks (bài 3) đến multi-object (bài 4) và long-horizon planning (bài 5). Giờ là lúc mở rộng sang bimanual manipulation — lĩnh vực nghiên cứu hot nhất trong robotics hiện tại, được thúc đẩy bởi sự thành công của ALOHA.

Tổng quan hệ thống Dual-Arm

Các nền tảng phần cứng phổ biến

Hệ thống	Arms	DOF/arm	Giá	Đặc điểm
ALOHA	2x ViperX 300	6+1	~$20K	Gold standard cho research
2x SO-100	2x SO-100	6	~$500	Budget-friendly nhất
Mobile ALOHA	2x ViperX 300 + base	6+1+2	~$32K	Mobile + manipulation
Franka Dual	2x Franka Panda	7+1	~$100K+	Industrial-grade
ALOHA 2	2x ViperX 300 6S	6+1	~$25K	Upgraded ALOHA

Cấu trúc hệ thống trong LeRobot

from lerobot.common.robot_devices.robots.manipulator import ManipulatorRobot
from lerobot.common.robot_devices.motors.feetech import FeetechMotorsBus
from lerobot.common.robot_devices.cameras.opencv import OpenCVCamera

def setup_dual_arm_robot():
    """Thiết lập hệ thống dual-arm với LeRobot.
    
    Hệ thống gồm:
    - 2 leader arms (người điều khiển)
    - 2 follower arms (thực hiện task)
    - 3 cameras (top, left_wrist, right_wrist)
    """
    robot = ManipulatorRobot(
        robot_type="aloha",
        
        # Leader arms — người điều khiển dùng
        leader_arms={
            "left": FeetechMotorsBus(
                port="/dev/ttyACM0",
                motors={
                    "shoulder_pan": (1, "sts3215"),
                    "shoulder_lift": (2, "sts3215"),
                    "elbow_flex": (3, "sts3215"),
                    "wrist_flex": (4, "sts3215"),
                    "wrist_roll": (5, "sts3215"),
                    "gripper": (6, "sts3215"),
                },
            ),
            "right": FeetechMotorsBus(
                port="/dev/ttyACM1",
                motors={
                    "shoulder_pan": (1, "sts3215"),
                    "shoulder_lift": (2, "sts3215"),
                    "elbow_flex": (3, "sts3215"),
                    "wrist_flex": (4, "sts3215"),
                    "wrist_roll": (5, "sts3215"),
                    "gripper": (6, "sts3215"),
                },
            ),
        },
        
        # Follower arms — thực hiện task
        follower_arms={
            "left": FeetechMotorsBus(
                port="/dev/ttyACM2",
                motors={
                    "shoulder_pan": (1, "sts3215"),
                    "shoulder_lift": (2, "sts3215"),
                    "elbow_flex": (3, "sts3215"),
                    "wrist_flex": (4, "sts3215"),
                    "wrist_roll": (5, "sts3215"),
                    "gripper": (6, "sts3215"),
                },
            ),
            "right": FeetechMotorsBus(
                port="/dev/ttyACM3",
                motors={
                    "shoulder_pan": (1, "sts3215"),
                    "shoulder_lift": (2, "sts3215"),
                    "elbow_flex": (3, "sts3215"),
                    "wrist_flex": (4, "sts3215"),
                    "wrist_roll": (5, "sts3215"),
                    "gripper": (6, "sts3215"),
                },
            ),
        },
        
        # Cameras
        cameras={
            "top": OpenCVCamera(
                camera_index=0,
                fps=30,
                width=640,
                height=480,
            ),
            "left_wrist": OpenCVCamera(
                camera_index=1,
                fps=30,
                width=640,
                height=480,
            ),
            "right_wrist": OpenCVCamera(
                camera_index=2,
                fps=30,
                width=640,
                height=480,
            ),
        },
    )
    
    return robot


# Kết nối robot
robot = setup_dual_arm_robot()
robot.connect()
print("Dual-arm robot connected!")
print(f"Leader arms: {list(robot.leader_arms.keys())}")
print(f"Follower arms: {list(robot.follower_arms.keys())}")
print(f"Cameras: {list(robot.cameras.keys())}")

Calibration: Bước không thể bỏ qua

Calibration là bước quan trọng nhất khi setup dual-arm. Nếu calibration sai, leader và follower sẽ không khớp, dẫn đến dữ liệu kém chất lượng.

Calibration quy trình

def calibrate_dual_arm(robot):
    """Quy trình calibration cho dual-arm system.
    
    Bước 1: Calibrate từng arm riêng biệt
    Bước 2: Calibrate coordinate frame giữa 2 arms
    Bước 3: Calibrate cameras
    """
    
    # === Bước 1: Calibrate từng arm ===
    print("=== Step 1: Individual Arm Calibration ===")
    print("Move each arm to its home position.")
    print("Left leader arm → home position")
    input("Press Enter when ready...")
    
    # Đọc và lưu home position
    left_home = robot.leader_arms["left"].read("Present_Position")
    print(f"Left leader home: {left_home}")
    
    print("Right leader arm → home position")
    input("Press Enter when ready...")
    right_home = robot.leader_arms["right"].read("Present_Position")
    print(f"Right leader home: {right_home}")
    
    # === Bước 2: Leader-Follower mapping ===
    print("\n=== Step 2: Leader-Follower Mapping ===")
    print("Testing leader-follower sync...")
    
    for arm_name in ["left", "right"]:
        leader = robot.leader_arms[arm_name]
        follower = robot.follower_arms[arm_name]
        
        # Đọc leader position
        leader_pos = leader.read("Present_Position")
        
        # Set follower to match
        follower.write("Goal_Position", leader_pos)
        
        # Verify
        import time
        time.sleep(0.5)
        follower_pos = follower.read("Present_Position")
        
        error = np.abs(np.array(leader_pos) - np.array(follower_pos))
        print(f"  {arm_name} arm sync error: {error.mean():.2f} "
              f"(max: {error.max():.2f})")
        
        if error.max() > 5.0:  # > 5 degrees
            print(f"  WARNING: {arm_name} arm has high sync error!")
    
    # === Bước 3: Camera calibration ===
    print("\n=== Step 3: Camera Calibration ===")
    for cam_name, camera in robot.cameras.items():
        frame = camera.read()
        if frame is not None:
            print(f"  {cam_name}: {frame.shape} — OK")
        else:
            print(f"  {cam_name}: FAILED — check connection!")
    
    print("\nCalibration complete!")
    return True


calibrate_dual_arm(robot)

Kiểm tra coordinate frames

import numpy as np

def verify_coordinate_frames(robot):
    """Verify coordinate frames giữa left và right arm.
    
    Hai arms phải chia sẻ cùng coordinate frame (world frame)
    để bimanual coordination hoạt động đúng.
    """
    # Di chuyển cả hai arms đến cùng 1 điểm trong workspace
    test_point = np.array([0.3, 0.0, 0.2])  # Điểm giữa workspace
    
    print("Moving both arms to test point...")
    
    # Forward kinematics → joint angles cho mỗi arm
    left_joints = inverse_kinematics(test_point, arm="left")
    right_joints = inverse_kinematics(test_point, arm="right")
    
    # Set positions
    robot.follower_arms["left"].write("Goal_Position", left_joints)
    robot.follower_arms["right"].write("Goal_Position", right_joints)
    
    import time
    time.sleep(2.0)
    
    # Đọc actual positions
    left_actual = robot.follower_arms["left"].read("Present_Position")
    right_actual = robot.follower_arms["right"].read("Present_Position")
    
    # Forward kinematics → end-effector positions
    left_ee = forward_kinematics(left_actual, arm="left")
    right_ee = forward_kinematics(right_actual, arm="right")
    
    distance = np.linalg.norm(left_ee - right_ee)
    print(f"Left EE: {left_ee}")
    print(f"Right EE: {right_ee}")
    print(f"Distance: {distance*100:.1f} cm")
    
    if distance < 0.02:  # < 2cm
        print("Coordinate frames OK!")
    else:
        print("WARNING: Coordinate frames may be misaligned!")

Action Space Design cho Bimanual

Joint Space vs Task Space

Approach	Action dim	Ưu điểm	Nhược điểm
Independent Joint	2 × (6+1) = 14	Đơn giản, full control	Khó coordinate
Coupled Joint	14 (shared model)	Implicit coordination	Lớn action space
Task Space	2 × 7 = 14	Intuitive (xyz + rpy + grip)	Cần IK solver
Relative	14 (base + delta)	Compact	Phức tạp hơn

class BimanualActionSpace:
    """Thiết kế action space cho dual-arm manipulation.
    
    Mỗi arm: [x, y, z, rx, ry, rz, gripper] = 7D
    Tổng: 14D action space
    """
    
    def __init__(self, mode="independent"):
        self.mode = mode
        self.left_dim = 7   # 6 DOF + gripper
        self.right_dim = 7
        self.total_dim = 14
    
    def split_action(self, action: np.ndarray) -> tuple:
        """Tách action thành left và right components."""
        assert len(action) == self.total_dim
        left_action = action[:self.left_dim]
        right_action = action[self.left_dim:]
        return left_action, right_action
    
    def merge_observations(self, left_obs: dict, right_obs: dict) -> dict:
        """Ghép observations từ 2 arms thành 1."""
        merged = {}
        
        # Joint states
        merged["observation.state"] = np.concatenate([
            left_obs["joint_positions"],    # 6D
            left_obs["gripper_position"],   # 1D
            right_obs["joint_positions"],   # 6D
            right_obs["gripper_position"],  # 1D
        ])  # Total: 14D
        
        # Images từ 3 cameras
        merged["observation.images.top"] = left_obs["top_image"]
        merged["observation.images.left_wrist"] = left_obs["wrist_image"]
        merged["observation.images.right_wrist"] = right_obs["wrist_image"]
        
        return merged
    
    def apply_action(self, robot, action: np.ndarray):
        """Thực thi action lên cả 2 arms đồng thời."""
        left_action, right_action = self.split_action(action)
        
        # QUAN TRỌNG: Gửi commands đồng thời để sync
        robot.follower_arms["left"].write("Goal_Position", left_action[:6])
        robot.follower_arms["right"].write("Goal_Position", right_action[:6])
        
        # Gripper
        robot.follower_arms["left"].write("Goal_Position", 
                                           {"gripper": left_action[6]})
        robot.follower_arms["right"].write("Goal_Position", 
                                            {"gripper": right_action[6]})

Data Collection cho Bimanual

Thu thập dữ liệu cho bimanual tasks khó hơn single-arm vì người điều khiển phải dùng cả 2 tay đồng thời:

def record_bimanual_dataset(robot, repo_id, num_episodes=50, fps=30):
    """Thu thập dataset cho bimanual tasks.
    
    Người dùng điều khiển 2 leader arms đồng thời,
    2 follower arms copy theo real-time.
    """
    from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
    
    dataset = LeRobotDataset.create(
        repo_id=repo_id,
        fps=fps,
        robot_type="aloha",
        features={
            "observation.images.top": {
                "dtype": "video",
                "shape": (480, 640, 3),
                "names": ["height", "width", "channels"],
            },
            "observation.images.left_wrist": {
                "dtype": "video",
                "shape": (480, 640, 3),
                "names": ["height", "width", "channels"],
            },
            "observation.images.right_wrist": {
                "dtype": "video",
                "shape": (480, 640, 3),
                "names": ["height", "width", "channels"],
            },
            "observation.state": {
                "dtype": "float32",
                "shape": (14,),
                "names": ["bimanual_joint_positions"],
            },
            "action": {
                "dtype": "float32",
                "shape": (14,),
                "names": ["bimanual_joint_targets"],
            },
        },
    )
    
    action_space = BimanualActionSpace()
    
    for ep in range(num_episodes):
        print(f"\nEpisode {ep+1}/{num_episodes}")
        print("Move leader arms to start position, then press Enter...")
        input()
        
        step = 0
        recording = True
        
        while recording:
            # Đọc leader positions (observation)
            left_leader = robot.leader_arms["left"].read("Present_Position")
            right_leader = robot.leader_arms["right"].read("Present_Position")
            
            # Đọc follower positions (current state)
            left_follower = robot.follower_arms["left"].read("Present_Position")
            right_follower = robot.follower_arms["right"].read("Present_Position")
            
            # Đọc cameras
            top_image = robot.cameras["top"].read()
            left_wrist_image = robot.cameras["left_wrist"].read()
            right_wrist_image = robot.cameras["right_wrist"].read()
            
            # State = follower positions hiện tại
            state = np.concatenate([left_follower, right_follower])
            
            # Action = leader positions (target cho follower)
            action = np.concatenate([left_leader, right_leader])
            
            # Follower copy leader
            robot.follower_arms["left"].write("Goal_Position", left_leader)
            robot.follower_arms["right"].write("Goal_Position", right_leader)
            
            # Lưu frame
            dataset.add_frame({
                "observation.images.top": top_image,
                "observation.images.left_wrist": left_wrist_image,
                "observation.images.right_wrist": right_wrist_image,
                "observation.state": state,
                "action": action,
            })
            
            step += 1
            
            # Kiểm tra stop condition
            if step > 500:  # Max steps
                recording = False
        
        dataset.save_episode()
        print(f"  Saved episode {ep+1} ({step} frames)")
    
    dataset.push_to_hub()
    return dataset

Mẹo teleop 2 tay

# Tips cho người điều khiển bimanual teleop:
TELEOP_TIPS = {
    "practice_first": """
        Luyện tập không ghi dữ liệu trước. Cần ít nhất 30 phút 
        để quen điều khiển 2 arms đồng thời.
    """,
    "consistent_speed": """
        Giữ tốc độ đều. Không di chuyển quá nhanh — policy khó học
        từ movements quá nhanh (>0.5 rad/s per joint).
    """,
    "one_arm_lead": """
        Cho 1 arm lead (ví dụ: right giữ object), arm còn lại support.
        Tránh di chuyển cả 2 arms cùng lúc trừ khi cần thiết.
    """,
    "pause_between_phases": """
        Dừng ngắn (~0.5s) giữa các phases (approach, grasp, manipulate).
        Giúp policy nhận biết transitions.
    """,
    "filter_episodes": """
        Ghi thêm 20% episodes dự phòng. Lọc bỏ episodes kém chất lượng
        (va chạm, timing sai, etc.) sau khi ghi.
    """,
}

LeRobot Config cho Bimanual Setup

# Config đầy đủ cho training bimanual policy
from lerobot.common.policies.act.configuration_act import ACTConfig

bimanual_act_config = ACTConfig(
    input_shapes={
        "observation.images.top": [3, 480, 640],
        "observation.images.left_wrist": [3, 480, 640],
        "observation.images.right_wrist": [3, 480, 640],
        "observation.state": [14],  # 7 joints per arm x 2
    },
    output_shapes={
        "action": [14],  # 7 targets per arm x 2
    },
    input_normalization_modes={
        "observation.images.top": "mean_std",
        "observation.images.left_wrist": "mean_std",
        "observation.images.right_wrist": "mean_std",
        "observation.state": "min_max",
    },
    output_normalization_modes={
        "action": "min_max",
    },
    
    # ACT cho bimanual — larger model
    chunk_size=100,
    n_action_steps=100,
    dim_model=512,
    n_heads=8,
    n_layers=4,       # Nhiều layers hơn cho coordination
    
    # CVAE
    use_vae=True,
    latent_dim=32,
    kl_weight=10.0,
    
    # Vision
    vision_backbone="resnet18",
    pretrained_backbone=True,
)

print(f"Total input dim: 14 (state) + 3 cameras")
print(f"Total output dim: 14 (bimanual action)")

Papers tham khảo

ALOHA: Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware — Zhao et al., RSS 2023 — Paper gốc của ALOHA system
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation — Fu et al., 2024 — ALOHA + mobile base
ALOHA 2: An Enhanced Low-Cost Hardware for Bimanual Teleoperation — Aldaco et al., 2024 — Phiên bản nâng cấp ALOHA

Kết luận và bước tiếp theo

Dual-arm setup đòi hỏi sự cẩn thận trong calibration và action space design. Key takeaways:

Calibration phải thực hiện kỹ — sai 1 degree = accumulate thành sai lớn
Action space 14D tuy lớn nhưng ACT xử lý tốt nhờ chunking
Teleop 2 tay cần luyện tập — đừng vội thu thập data ngay

Bài tiếp theo — Bimanual Tasks: Folding, Pouring và Assembly — sẽ đưa hệ thống dual-arm vào hoạt động thực tế với các task folding khăn, rót nước, và lắp ráp.