Introduction: The Pinnacle of Manipulation
Throughout this series, we've progressed from single-arm (post 3), through dual-arm (posts 6-7), to mobile manipulation (post 8). Each step added complexity — more DOF, more coordination, more perception.
This post takes us to that pinnacle: humanoid robots. With 20-50+ DOF, two dexterous hands, bipedal locomotion, and a human-like form, humanoids are the most capable manipulation platform and also the greatest challenge in robotics.
Why Humanoid for Manipulation?
| Advantage | Explanation |
|---|---|
| Human environment compatible | The world is designed for humans — doors, tables, handrails |
| Dexterous hands | Up to ~20 DOF per hand (5 fingers x 4 DOF); the Inspire hand used below has 12 |
| Large workspace | Entire body is the workspace — large reach |
| Whole-body dynamics | Use momentum, gravity to assist manipulation |
| Transfer from human data | Can learn from real human video |

| Challenge | Description |
|---|---|
| Extreme DOF | 30-50+ joints to control simultaneously |
| Balance | Must maintain balance during manipulation |
| Contact-rich | Both feet and hands contact the environment |
| Compute | Inference must be real-time on edge devices |
High-DOF Action Spaces
Anatomy of a Humanoid Action Space
```python
import numpy as np
from dataclasses import dataclass


@dataclass
class HumanoidActionSpace:
    """Action space for humanoid robot manipulation.

    Example: Unitree H1 with Inspire hands
    - Base: 6 DOF (floating base, controlled indirectly through the legs)
    - Left leg: 6 DOF
    - Right leg: 6 DOF
    - Torso: 3 DOF (waist yaw, pitch, roll)
    - Left arm: 7 DOF (shoulder 3 + elbow 1 + wrist 3)
    - Right arm: 7 DOF
    - Left hand: 12 DOF (Inspire hand)
    - Right hand: 12 DOF

    Total: 53 actuated DOF (59 counting the 6-DOF floating base)
    """
    leg_dof: int = 12
    torso_dof: int = 3
    arm_dof: int = 14
    hand_dof: int = 24

    @property
    def total_dof(self):
        return self.leg_dof + self.torso_dof + self.arm_dof + self.hand_dof

    @property
    def manipulation_dof(self):
        """DOF directly involved in manipulation (everything above the legs)."""
        return self.torso_dof + self.arm_dof + self.hand_dof

    def split_action(self, action):
        """Split a flat 53-D action vector into named joint groups."""
        idx = 0
        legs = action[idx:idx + self.leg_dof]; idx += self.leg_dof
        torso = action[idx:idx + self.torso_dof]; idx += self.torso_dof
        arms = action[idx:idx + self.arm_dof]; idx += self.arm_dof
        hands = action[idx:idx + self.hand_dof]; idx += self.hand_dof
        return {
            "legs": legs,
            "torso": torso,
            "left_arm": arms[:7],
            "right_arm": arms[7:],
            "left_hand": hands[:12],
            "right_hand": hands[12:],
        }


action_space = HumanoidActionSpace()
print(f"Total DOF: {action_space.total_dof}")                # 53
print(f"Manipulation DOF: {action_space.manipulation_dof}")  # 41
```
Reducing Dimensionality for Learning
```python
class ActionSpaceReduction:
    """Reduce the humanoid action space for easier policy training.

    Strategies:
    1. Fixed legs: stand still, train only the upper body
    2. Coupled joints: group related joints (e.g. fingers) into one command
    3. Latent actions: learn a compressed action space (e.g. with a VAE)
    4. Primitive-based: compose actions from motion primitives
    """

    def __init__(self, mode="fixed_legs"):
        self.mode = mode

    def reduce(self, full_action):
        """Map a full 53-D action to a reduced action vector."""
        parts = HumanoidActionSpace().split_action(full_action)
        if self.mode == "fixed_legs":
            # Drop the legs and the last 6 DOF of each hand:
            # 3 torso + 7 + 7 arms + 6 + 6 fingers = 29 DOF (down from 53)
            return np.concatenate([
                parts["torso"],
                parts["left_arm"],
                parts["right_arm"],
                parts["left_hand"][:6],
                parts["right_hand"][:6],
            ])
        elif self.mode == "coupled_fingers":
            # Collapse each hand's 12 finger joints into 3 grouped commands:
            # 3 torso + 7 + 7 arms + 3 + 3 hands = 23 DOF
            left_hand_simple = np.array([
                parts["left_hand"][:4].mean(),
                parts["left_hand"][4:8].mean(),
                parts["left_hand"][8:12].mean(),
            ])
            right_hand_simple = np.array([
                parts["right_hand"][:4].mean(),
                parts["right_hand"][4:8].mean(),
                parts["right_hand"][8:12].mean(),
            ])
            return np.concatenate([
                parts["torso"],
                parts["left_arm"],
                parts["right_arm"],
                left_hand_simple,
                right_hand_simple,
            ])
        return full_action
```
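Strategy 3 ("latent actions") is only named above. As a rough sketch of the idea, here is a linear stand-in for the VAE: fit a PCA basis over recorded full-body actions and then command the robot in the low-dimensional latent space. The `LatentActionSpace` class and the synthetic demonstration data are illustrative assumptions, not part of any real pipeline.

```python
import numpy as np


class LatentActionSpace:
    """Linear (PCA) stand-in for a learned latent action space."""

    def __init__(self, latent_dim=10):
        self.latent_dim = latent_dim
        self.mean = None
        self.basis = None  # (latent_dim, full_dim) principal directions

    def fit(self, actions):
        """actions: (N, 53) array of demonstrated full-body actions."""
        self.mean = actions.mean(axis=0)
        # SVD of the centered data gives the principal components.
        _, _, vt = np.linalg.svd(actions - self.mean, full_matrices=False)
        self.basis = vt[: self.latent_dim]

    def encode(self, action):
        return self.basis @ (action - self.mean)

    def decode(self, latent):
        return self.mean + self.basis.T @ latent


rng = np.random.default_rng(0)
# Synthetic demos: 53-DOF actions that really live on a 10-D manifold.
latent_true = rng.normal(size=(500, 10))
mixing = rng.normal(size=(10, 53))
demos = latent_true @ mixing

space = LatentActionSpace(latent_dim=10)
space.fit(demos)
roundtrip = space.decode(space.encode(demos[0]))
print(np.allclose(roundtrip, demos[0], atol=1e-6))  # True: a 10-D latent suffices here
```

A real system would replace the SVD with a VAE so the latent space is nonlinear and regularized, but the encode/decode interface stays the same.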
Observation Design for Humanoid
Egocentric Cameras
```python
class HumanoidObservation:
    """Observation design for humanoid manipulation.

    Cameras:
    - Head camera: egocentric overview, navigation
    - Chest camera: mid-range manipulation view
    - Left/right wrist cameras: close-up manipulation

    Proprioception:
    - Joint positions (all 53 joints)
    - Joint velocities
    - IMU data (base orientation quaternion)
    - Contact forces (feet + hands)
    """

    def __init__(self):
        self.camera_configs = {
            "head": {"resolution": (480, 640), "fov": 90, "purpose": "Navigation + overview"},
            "chest": {"resolution": (480, 640), "fov": 70, "purpose": "Mid-range manipulation"},
            "left_wrist": {"resolution": (480, 640), "fov": 90, "purpose": "Left hand close-up"},
            "right_wrist": {"resolution": (480, 640), "fov": 90, "purpose": "Right hand close-up"},
        }

    def get_observation(self, robot) -> dict:
        """Collect one observation from the humanoid robot."""
        obs = {}
        for cam_name, config in self.camera_configs.items():
            obs[f"observation.images.{cam_name}"] = robot.get_camera_image(
                cam_name, config["resolution"]
            )
        obs["observation.state"] = np.concatenate([
            robot.get_joint_positions(),
            robot.get_joint_velocities(),
            robot.get_imu_data(),
            robot.get_contact_forces(),
        ])
        return obs

    @property
    def state_dim(self):
        # 53 positions + 53 velocities + 4-D IMU quaternion + 8 contact channels
        return 53 + 53 + 4 + 8  # = 118
```
Humanoid Foundation Models
GR00T-Inspired Architecture
```python
import torch


class GR00TInspired:
    """Architecture inspired by NVIDIA GR00T N1.

    Components:
    - Vision encoder: pretrained ViT
    - Language encoder: CLIP text encoder
    - Action head: diffusion-based action generation
    - Dual-system design:
      * System 1 (fast): reactive policy, low latency
      * System 2 (slow): deliberative planning

    Reference: https://arxiv.org/abs/2503.14734
    """

    def __init__(self):
        # Placeholders: plug in pretrained modules here.
        self.vision_encoder = None
        self.language_encoder = None
        self.action_head = None

    def forward(self, images, language_instruction, proprioception):
        """
        Args:
            images: dict of camera images
            language_instruction: text instruction
            proprioception: joint states + IMU
        Returns:
            action: full-body action (53-D+)
        """
        visual_tokens = []
        for cam_name, image in images.items():
            tokens = self.vision_encoder(image)
            visual_tokens.append(tokens)
        lang_tokens = self.language_encoder(language_instruction)
        # Condition the diffusion action head on all visual + language tokens.
        all_tokens = torch.cat(visual_tokens + [lang_tokens], dim=1)
        action = self.action_head.sample(
            condition=all_tokens,
            proprioception=proprioception,
        )
        return action
```
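The dual-system split can be prototyped independently of the model internals. A minimal sketch, assuming hypothetical `system1`/`system2` callables: the slow module replans every `slow_period` control ticks, while the fast module acts on every tick using the most recent goal.

```python
import numpy as np


class DualSystemController:
    """Sketch of a System 1 / System 2 scheduler (both callables are assumptions)."""

    def __init__(self, system1, system2, slow_period=25):
        self.system1 = system1        # fast: (obs, goal) -> action
        self.system2 = system2        # slow: obs -> goal
        self.slow_period = slow_period
        self.goal = None
        self.tick = 0
        self.slow_calls = 0

    def act(self, obs):
        if self.tick % self.slow_period == 0:
            self.goal = self.system2(obs)    # deliberative replanning
            self.slow_calls += 1
        self.tick += 1
        return self.system1(obs, self.goal)  # reactive action, every tick


# Toy modules: System 2 picks a target ahead of us, System 1 moves toward it.
system2 = lambda obs: obs + 1.0
system1 = lambda obs, goal: 0.1 * (goal - obs)

ctrl = DualSystemController(system1, system2, slow_period=25)
obs = np.zeros(3)
for _ in range(100):                  # 100 fast ticks
    obs = obs + ctrl.act(obs)
print(ctrl.slow_calls)  # 4 slow replans (ticks 0, 25, 50, 75)
```

At 50 Hz control with `slow_period=25`, System 2 would run at 2 Hz, which is roughly the regime where a large planner can afford to be slow without stalling the reactive loop.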
Training Humanoid Manipulation with LeRobot
```python
from lerobot.common.policies.act.configuration_act import ACTConfig
from lerobot.common.policies.act.modeling_act import ACTPolicy


def create_humanoid_manip_policy():
    """Create an ACT policy for humanoid manipulation.

    Strategy: fixed legs, train only the upper body (29 DOF).
    """
    config = ACTConfig(
        input_shapes={
            "observation.images.head": [3, 480, 640],
            "observation.images.left_wrist": [3, 480, 640],
            "observation.images.right_wrist": [3, 480, 640],
            # 29 positions + 29 velocities + 4-D IMU + 8 contacts = 70
            "observation.state": [70],
        },
        output_shapes={"action": [29]},  # upper body only
        input_normalization_modes={
            "observation.images.head": "mean_std",
            "observation.images.left_wrist": "mean_std",
            "observation.images.right_wrist": "mean_std",
            "observation.state": "min_max",
        },
        output_normalization_modes={"action": "min_max"},
        chunk_size=50,
        n_action_steps=50,
        dim_model=768,
        n_heads=12,
        n_layers=6,
        use_vae=True,
        latent_dim=64,
        kl_weight=10.0,
        vision_backbone="resnet18",
        pretrained_backbone=True,
    )
    policy = ACTPolicy(config)
    print(f"Parameters: {sum(p.numel() for p in policy.parameters()):,}")
    return policy
```
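With `chunk_size=50` and `n_action_steps=50`, the policy predicts 50 actions per query and plays the whole chunk back open-loop before re-querying. Here is a sketch of that receding-horizon loop, with `policy_fn`, `get_obs`, and `apply_action` as hypothetical stand-ins for the real policy and robot interface.

```python
import numpy as np


def run_chunked(policy_fn, get_obs, apply_action, n_steps,
                chunk_size=50, n_action_steps=50):
    """Query a chunk of `chunk_size` actions, execute the first
    `n_action_steps`, then re-query. Returns the number of policy queries."""
    queries = 0
    step = 0
    while step < n_steps:
        chunk = policy_fn(get_obs())        # (chunk_size, action_dim)
        queries += 1
        for a in chunk[:n_action_steps]:
            apply_action(a)
            step += 1
            if step >= n_steps:
                break
    return queries


# Toy setup: a dummy policy and a list that records every applied action.
applied = []
policy_fn = lambda obs: np.zeros((50, 29))   # 29-DOF upper-body actions
queries = run_chunked(policy_fn, lambda: None, applied.append, n_steps=500)
print(queries, len(applied))  # 10 500
```

Setting `n_action_steps` smaller than `chunk_size` (e.g. 25) would re-query twice as often, trading inference cost for faster reaction to disturbances, which matters on a robot that can fall over.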
Simulation with Unitree H1/G1
```python
import mujoco
import numpy as np


class UnitreeH1Sim:
    """Simulation for the Unitree H1 humanoid in MuJoCo.

    H1 specs:
    - Height: 1.8 m
    - Weight: 47 kg
    - DOF: 19 (legs 10 + waist 1 + arms 8)
    - Payload: 3 kg per arm
    """

    def __init__(self, xml_path="unitree_h1/scene.xml"):
        self.model = mujoco.MjModel.from_xml_path(xml_path)
        self.data = mujoco.MjData(self.model)
        self.leg_joints = list(range(10))
        self.waist_joint = [10]
        self.arm_joints = list(range(11, 19))

    def reset(self, standing=True):
        """Reset the robot to a standing (or custom) configuration."""
        mujoco.mj_resetData(self.model, self.data)
        if standing:
            self.data.qpos[0:3] = [0, 0, 1.0]   # base position: 1 m above the ground
            self.data.qpos[3:7] = [1, 0, 0, 0]  # base orientation: identity quaternion
        mujoco.mj_forward(self.model, self.data)
        return self._get_obs()

    def step(self, action):
        """Apply an action and advance the simulation (4 substeps per control step)."""
        self.data.ctrl[:] = action
        for _ in range(4):
            mujoco.mj_step(self.model, self.data)
        obs = self._get_obs()
        reward = self._compute_reward()
        done = self._check_done()
        return obs, reward, done, {}

    def _get_obs(self):
        # Minimal proprioceptive observation: joint positions + velocities.
        return np.concatenate([self.data.qpos, self.data.qvel])

    def _compute_reward(self):
        # Placeholder: task rewards are defined per experiment.
        return 0.0

    def _check_done(self):
        """Episode ends when the robot has fallen (head below 0.5 m)."""
        head_height = self.data.xpos[self.model.body("head").id][2]
        return head_height < 0.5


env = UnitreeH1Sim()
obs = env.reset(standing=True)
policy = create_humanoid_manip_policy()

for step in range(500):
    action = policy.predict(obs)  # assumes a predict() wrapper around the policy
    obs, reward, done, info = env.step(action)
    if done:
        print(f"Fell down at step {step}!")
        break
```
Balance During Manipulation
```python
class BalanceAwarePolicy:
    """Policy that maintains balance during manipulation.

    Uses the ZMP (Zero Moment Point): the robot is stable as long as the
    ZMP stays inside the support polygon formed by the feet.
    """

    def __init__(self, manip_policy, balance_controller):
        self.manip_policy = manip_policy
        self.balance_controller = balance_controller
        self.stability_threshold = 0.05  # margin to the polygon edge, in metres

    def act(self, observation):
        """Generate a full-body action that respects balance constraints."""
        manip_action = self.manip_policy.predict(observation)
        # Predict where the ZMP moves if we execute the manipulation action.
        predicted_zmp = self.balance_controller.predict_zmp(
            observation["qpos"],
            observation["qvel"],
            manip_action,
        )
        support_polygon = self.balance_controller.get_support_polygon(
            observation["contact_forces"]
        )
        if self.balance_controller.is_stable(predicted_zmp, support_polygon):
            leg_action = self.balance_controller.maintain_balance(observation)
        else:
            print("WARNING: Adjusting for balance!")
            manip_action *= 0.5  # scale down the arm motion
            leg_action = self.balance_controller.recover_balance(observation)
        full_action = np.concatenate([leg_action, manip_action])
        return full_action
```
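In the simplest case, an `is_stable` check like the one above reduces to testing whether the predicted ZMP lies inside the convex support polygon. A minimal 2-D point-in-convex-polygon sketch; the rectangle standing in for a double-support polygon is an assumption for illustration.

```python
import numpy as np


def in_convex_polygon(point, vertices):
    """True if `point` lies inside the convex polygon `vertices`, given in
    counter-clockwise order: the point must be on the left of every edge."""
    n = len(vertices)
    for i in range(n):
        a, b = vertices[i], vertices[(i + 1) % n]
        edge = b - a
        to_point = point - a
        # 2-D cross product; negative means the point is right of this edge.
        if edge[0] * to_point[1] - edge[1] * to_point[0] < 0:
            return False
    return True


# Double-support polygon: a rectangle spanning both feet (metres, CCW order).
support = np.array([[-0.10, -0.15], [0.10, -0.15], [0.10, 0.15], [-0.10, 0.15]])
print(in_convex_polygon(np.array([0.00, 0.0]), support))  # True: ZMP centred
print(in_convex_polygon(np.array([0.25, 0.0]), support))  # False: ZMP ahead of the feet
```

A production controller would also enforce a safety margin (like `stability_threshold` above) by shrinking the polygon before the test, rather than checking the raw boundary.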
Reference Papers
- HumanPlus: Humanoid Shadowing and Imitation from Humans — Fu et al., 2024 — Human-to-humanoid transfer learning
- GR00T N1: An Open Foundation Model for Generalist Humanoid Robots — NVIDIA, 2025 — Foundation model for humanoids
- Humanoid-Gym: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer — 2024 — Sim2real for humanoid locomotion
Conclusion and Next Steps
Humanoid manipulation is the frontier of robotics research. Key insights:
- Action space reduction is critical — don't train all 50+ DOF at once
- Balance + manipulation must be coordinated — humanoid falling = complete failure
- Foundation models (GR00T, HumanPlus) are the future — pretrain on large data, fine-tune for specific tasks
The final post in this series — Sim-to-Real Transfer: Deploying VLA Policies to Real Robots — will bring everything we've learned into the real world.
Related Posts
- Mobile Manipulation — Foundation before humanoid
- Humanoid Locomotion + Manipulation — Humanoid loco-manip overview
- VLA Models for Robots — Foundation models for robots