humanoidgrailhumanoidunitree-g1retargetingsmpl-xloco-manipulation

Retarget SMPL-X Motions to Unitree G1

Trace GRAIL retargeting: SMPL-X to G1, hand actions, table geometry, lift thresholds, and BPS embeddings.

Nguyễn Anh TuấnJune 7, 202614 min read
Retarget SMPL-X Motions to Unitree G1

In part 3, we obtained 4D HOI output: an SMPL-X human, 6-DoF object poses, contact data, and visualization. In part 4, we separated two data regimes: manipulation with moving objects, and locomotion/sitting with fixed scene geometry. Part 5 is the bridge from "human reconstruction" to "Unitree G1 motion library".

The key point is simple: the downstream policy does not train directly on SMPL-X. SMPL-X is a human body model; Unitree G1 has a different topology, joint limits, hands, legs, body proportions, and kinematic constraints. GRAIL therefore needs a retargeting step that turns human-object trajectories into robot-object trajectories. The central script is grail/retargeting/scripts/retarget_pipeline.sh. It is not one opaque command. It calls three explicit stages: grail.retargeting.retarget, grail.retargeting.process, and grail.retargeting.compute_bps.

Technical sources used for this walkthrough:

Series Roadmap

  1. 3D Assets and Terrain for GRAIL: asset generation, object prompts, sharding, and downstream file contracts.
  2. 2D HOI Videos with Blender and Kling: conditioning renders, camera/depth output, and video foundation model generation.
  3. 4D HOI Reconstruction: GEM, SAM2, MoGe: human pose, object tracking, optimization, filtering, and visualization.
  4. Static Terrain Locomotion: curbs, slopes, stairs, and sitting as static-scene 4D HOI.
  5. Retargeting Trajectories to Unitree G1: this article.
  6. Training and Data Export: packaging demonstrations, training trackers/policies, and preparing sim-to-real data.

For broader whole-body policy context, see Humanoid loco-manipulation and G1 terrain walking with reinforcement learning. Those articles explain why retargeting is not just renaming joints: the data must preserve balance, timing, contact, and hand-object relationships.

What You Will Learn

By the end of this article, you should know:

  • Where valid 4dhoi_recon_valid inputs come from and how they enter retarget_pipeline.sh.
  • Why stage 1 uses grail.retargeting.retarget to convert SMPL-X motion to unitree_g1.
  • How stage 2, grail.retargeting.process, adds hand_action_left, hand_action_right, table geometry, and contact filtering around the lift frame.
  • How --lift_threshold and --grasp_anticipation_frames directly affect hand-close timing.
  • How stage 3, grail.retargeting.compute_bps, creates object shape embeddings under bps/.
  • When to use --zero_out_wrist and when to skip process.sh.

The recommended end-to-end command in the GRAIL docs looks like this:

conda activate sonic
export DISPLAY=:1

bash grail/retargeting/scripts/retarget_pipeline.sh \
  data/genhoi/benchmark_v3/generation/4dhoi_recon_valid/Hunyuan \
  benchmark_v3_0203

The first argument is the folder containing valid reconstructions. The second argument is the motion-library folder name under data/motion_lib/. If the output name is benchmark_v3_0203, the first stage writes:

data/motion_lib/benchmark_v3_0203/
  robot/
  objects/
  object_usd/
  meta/

After processing, GRAIL creates a hand-action version:

data/motion_lib/benchmark_v3_0203_ha/
  robot/
  objects/
  meta/

If the dataset has multiple object USDs, the BPS stage creates:

data/motion_lib/benchmark_v3_0203/
  bps/
    _basis.npy
    <object_name>.npy

Input-Output Map

For beginners, this pipeline is best understood as a format conversion step with a small amount of physical bookkeeping. 4dhoi_recon_valid is the reconstruction output. It still speaks the language of human-object interaction: SMPL-X poses, object poses, object meshes, contact points, and scene metadata. data/motion_lib/<name> is more robot-ready: G1 joint trajectories, object trajectories, USD assets for Isaac Lab, and metadata that task-general trackers can read.

Stage Main input Main output Meaning
retarget 4dhoi_recon_valid/... data/motion_lib/<name>/{robot,objects,object_usd,meta} Convert SMPL-X human poses to Unitree G1 motion and convert object meshes to USD
process data/motion_lib/<name> data/motion_lib/<name>_ha/{robot,objects,meta} Add hand actions, table pose/size, filtered contacts, and per-motion files
compute_bps data/motion_lib/<name>/object_usd/*.usd data/motion_lib/<name>/bps/*.npy Encode object shape as a fixed vector for policy observations

The motion-library name matters. Avoid names like test if you run many batches. Prefer names that capture dataset, date, or config: comasset_pickup_g1_20260607, terrain_v6_zero_wrist, benchmark_v3_0203. When something fails, the folder name will tell you which run it came from.

What retarget_pipeline.sh Actually Does

retarget_pipeline.sh is short, but it encodes the default path:

bash grail/retargeting/scripts/retarget_pipeline.sh \
  <DATA_DIR> \
  <OUTPUT_FOLDER> \
  [extra_retarget_args...]

The arguments are:

  • DATA_DIR: the filtered reconstruction folder, for example data/genhoi/.../generation/4dhoi_recon_valid/Hunyuan.
  • OUTPUT_FOLDER: the folder name under data/motion_lib/, for example benchmark_v3_0203.
  • Remaining arguments are usually forwarded to the retargeting stage.

There is one important exception: --treat_hands_equally or --treat-hands-equally is separated and forwarded to the processing stage, because it controls hand-action post-processing rather than IK retargeting. Other flags, such as --zero_out_wrist, go to grail.retargeting.retarget.

A simplified flow:

# 1. SMPL-X -> Unitree G1
bash grail/retargeting/scripts/retarget.sh \
  "${DATA_DIR}" "${OUTPUT_FOLDER}" "${RETARGET_ARGS[@]}"

# 2. Hand actions + table geometry
bash grail/retargeting/scripts/process.sh \
  "${OUTPUT_FOLDER}" "${PROCESS_ARGS[@]}"

# 3. BPS if multiple object USDs exist
USD_COUNT=$(find "data/motion_lib/${OUTPUT_FOLDER}/object_usd" -name "*.usd" | wc -l)
if [[ "${USD_COUNT}" -gt 1 ]]; then
  bash grail/retargeting/scripts/compute_bps.sh "${OUTPUT_FOLDER}"
fi

The USD_COUNT > 1 condition is intentional. For a single-object dataset, BPS does not add much discriminative information because the object shape is constant across motions. For multi-object datasets such as ComAsset or RoboCasa-style data, the policy needs to know whether it is handling a bottle, box, drill, mug, or another shape. BPS provides that shape signal as a fixed-size observation.

Stage 1: grail.retargeting.retarget

The first stage is called through retarget.sh:

python -m grail.retargeting.retarget \
  --data_dir "${DATA_DIR}" \
  --all \
  --robot unitree_g1 \
  --output_dir "data/motion_lib/${OUTPUT_FOLDER}" \
  --no_viewer

It reads SMPL-X human poses and object trajectories from grail.pipelines.recon_4dhoi output, then creates a G1 motion library. According to the retarget.py docstring, this stage does three main things.

First, it can bake G1 body proportions into the SMPL-X model through G1ProportionSMPLX. This reduces mismatch between the human in the video and the target robot. If the source body is much taller, has much longer arms, or uses proportions unlike G1, the IK solver is forced to chase targets that may not be feasible. Earlier GRAIL stages are designed to keep the character closer to G1 proportions, which makes this stage more reliable than retargeting arbitrary in-the-wild human videos.

Second, it calls GMR through grail.adapters.gmr. GMR is kinematics-based retargeting: it takes motion on the source embodiment, solves inverse kinematics on the target robot, and applies temporal smoothing so the motion is not jittery. For G1, the target is not just hand position. Retargeting must keep the pelvis/root, feet, knees, ankles, torso, and arms in a plausible relationship so a tracker can follow the reference in simulation.

Third, it converts object meshes to USD and attaches object trajectories. This part is easy to underestimate. If you only have robot joint trajectories but no object USD, the policy does not know what shape it is interacting with or where that object lives in the simulator. The object_usd/ output provides Isaac Lab-ready assets, while objects/ stores the object's 6-DoF trajectory over time.

After this stage, each motion typically has matching robot, object, metadata, and asset files:

data/motion_lib/<name>/
  robot/<motion_id>.pkl      # G1 root, pose_aa, dof, root_rot, smpl_joints, fps
  objects/<motion_id>.pkl    # object root_pos, root_quat, contact points, fps
  meta/<motion_id>.pkl       # object_name, table info when available
  object_usd/<object>.usd    # object mesh converted for the simulator

Before moving on, count the files:

find data/motion_lib/benchmark_v3_0203/robot -name "*.pkl" | wc -l
find data/motion_lib/benchmark_v3_0203/objects -name "*.pkl" | wc -l
find data/motion_lib/benchmark_v3_0203/meta -name "*.pkl" | wc -l
find data/motion_lib/benchmark_v3_0203/object_usd -name "*.usd" | wc -l

If robot/ has fewer files than expected, the failure is in retargeting or the input reconstruction. If object_usd/ is empty, mesh conversion failed or the input mesh is not where the stage expects it.

--zero_out_wrist for Terrain and Sitting

Not every dataset needs hand IK. For pickup and tabletop manipulation, wrist pose and hand contact are task-critical. For stairs, curbs, slopes, and many sitting clips, the object or scene is large environment geometry; the hands are not controlling a grasped object. GRAIL's retargeting docs recommend --zero_out_wrist for terrain/sitting data to skip hand IK:

bash grail/retargeting/scripts/retarget.sh \
  data/genhoi/results_terrain_v6/generation/4dhoi_recon_valid/Hunyuan \
  terrain_v6 \
  --zero_out_wrist

For terrain, you usually skip process.sh as well, because you do not need hand_action_left/right. A locomotion policy cares about the root, legs, terrain geometry, and ground contact more than hand open/close commands. If you still run process.sh, it may search for an object lift frame. Terrain does not lift from its initial pose, so --skip_no_lift can reject motions or send you debugging in the wrong direction.

A practical rule:

Dataset Need wrist/hand actions? Recommended path
Pick up object from table Yes Full retarget_pipeline.sh
Pick up object from ground Yes Full retarget_pipeline.sh, inspect lift threshold
Carry or push a moving object Yes, but may need task-specific tuning Retarget + process, inspect contacts carefully
Stairs/curb/slope No retarget.sh --zero_out_wrist, skip process.sh
Sitting on a fixed chair Usually no retarget.sh --zero_out_wrist, consider skipping process.sh

Stage 2: grail.retargeting.process

The processing stage turns retargeted motion into a version with hand actions and cleaned table geometry:

python -m grail.retargeting.process \
  --input data/motion_lib/benchmark_v3_0203 \
  --output data/motion_lib/benchmark_v3_0203_ha \
  --meta_pkl data/g1_smplx/g1_skeleton_meta.pkl \
  --include_contact_points \
  --grasp_from_lift \
  --lift_threshold 0.02 \
  --grasp_anticipation_frames 10 \
  --skip_no_lift \
  --per_object

The _ha suffix means "hand action". In robot/<motion_id>.pkl, this stage adds:

hand_action_left   # shape (T,), usually -1.0 means open and 1.0 means closed
hand_action_right  # shape (T,), usually -1.0 means open and 1.0 means closed

The default legacy path assumes right-hand pickup. It zeroes the left arm into a default pose, keeps hand_action_left open, and closes hand_action_right around the grasp. If you add --treat_hands_equally, the pipeline preserves both arms and derives left/right hand actions symmetrically from each hand's contact timing. This is useful for two-handed interactions or for generated datasets where you cannot assume the right hand always grasps first.

Quick comparison:

Mode Left arm hand_action_left hand_action_right Use case
Default Moved to default pose Open Closes from lift/contact timing Right-hand pickup, legacy path
--treat_hands_equally Preserved Derived from left-hand contact Derived from right-hand contact Bimanual data or uncertain grasping hand

The stage also handles table geometry. If meta/<motion_id>.pkl contains table_pos and table_quat, the _ha/meta/ output keeps table_pos, table_quat, and table_size. The parser default for --table_size is [1.5, 0.7, 0.04] meters. The code also includes logic to adjust the table to reduce lower-body/table intersection in some cases: it uses forward kinematics from skeleton metadata, checks whether lower-body segments intersect the table plane, and shifts the table along the y-axis if needed. As a beginner, you usually do not need to tune this immediately. Just remember that the table is not decorative. It is geometry used by pickup-from-table tasks, so a bad table pose can teach the robot to grasp from a surface that does not match the object.

--lift_threshold: When Is the Object Lifted?

The wrapper default for --lift_threshold is 0.02, or 2 cm. When --grasp_from_lift is enabled, process.py reads object_motion[f]["root_pos"][:, 0, 2], stores the initial z value, and finds the first frame satisfying:

obj_z > initial_z + lift_threshold

That frame becomes lift_frame. If no frame crosses the threshold and --skip_no_lift is enabled, the motion is skipped. That is why logs can say "object never lifted above 0.02m".

A 2 cm threshold is reasonable for pickup because it prevents tiny z-axis noise from being treated as a real lift. But it is not universal. If the object is very small, the reconstruction is slightly low, or the task is sliding rather than lifting, understand the task before lowering the threshold.

Symptom Possible cause Action
Many motions are skipped as "no lift" Object never rises by 2 cm, or reconstructed z is too low Inspect video/reconstruction; try --lift_threshold 0.01 only if the task is truly a light pickup
Hand closes too late Lift frame happens after true contact Increase --grasp_anticipation_frames or use contact timing with --treat_hands_equally
Hand closes too early Contact or lift is detected too early from noise Keep threshold at 0.02, reduce anticipation, inspect contact points
Terrain gets skipped Terrain is not a lifted object Do not run process.sh; use --zero_out_wrist during retargeting

Do not lower --lift_threshold only to get more output files. If the object never leaves the table, the motion may not be a successful pickup. Keeping failed data can teach the policy that the hand closes while the object does not follow, or that the object moves without a reliable grasp.

--grasp_anticipation_frames: Closing Before Lift

A robot cannot wait until the object is already airborne before closing its hand. In pickup data, the hand should close before or near the lift moment. --grasp_anticipation_frames controls this timing. process.sh reads the value from GRAIL_GRASP_ANTICIPATION_FRAMES, with a default of 10.

The simplified logic is:

grasp_start_frame = lift_frame - grasp_anticipation_frames
hand_action_right[grasp_start_frame:] = 1.0

With --treat_hands_equally, each hand gets a separate action based on that hand's contact before lift. If one hand has no contact at or before lift, it can remain open.

At 50 FPS:

grasp_anticipation_frames Approximate time Effect
0 0 ms Closes at the lift frame, often too late
5 100 ms Slightly early close, useful for short motions
10 200 ms Default balanced setting for common pickup
20 400 ms Earlier close, useful for large objects but may create fake grasps

Start with the default 10. Change it only after watching replay or visualization and seeing that the hand command is clearly misaligned with contact.

Stage 3: grail.retargeting.compute_bps

BPS stands for Basis Point Set. The idea from Prokudin and colleagues is to avoid feeding an entire point cloud or mesh into a network. Instead, choose a fixed set of basis points and encode an object by the distance from each basis point to the nearest surface point. GRAIL uses a compact version: by default --num_basis 10, it samples the object surface, centers it at the centroid, normalizes it to a unit bounding sphere, and saves a 10-D distance vector.

The wrapper command is:

python -m grail.retargeting.compute_bps \
  --object_usd_dir data/motion_lib/benchmark_v3_0203/object_usd \
  --output_dir data/motion_lib/benchmark_v3_0203/bps

Output:

data/motion_lib/benchmark_v3_0203/bps/
  _basis.npy
  cordless_drill.npy
  mug.npy
  bottle.npy
  ...

_basis.npy makes the basis reproducible. Each <object>.npy is the shape embedding for that object. During training, a policy can receive object pose, robot state, and BPS embedding to distinguish small, long, round, or box-like objects without parsing USD meshes inside the RL loop.

BPS does not replace collision geometry. The USD is still needed for simulation and contact. BPS is a compact observation for the neural network. If the dataset has only one object, the wrapper skips this stage because shape does not vary across motions.

Debug by Folder

When the pipeline fails, do not read logs randomly. Debug by output folder:

Folder If missing Check
robot/ SMPL-X to G1 retargeting failed Input 4dhoi_recon_valid, GMR installation, --robot unitree_g1, viewer/display
objects/ Object trajectory was not packaged Whether reconstruction contains object pose and the category path is correct
object_usd/ Mesh conversion failed Source mesh, IsaacLab/USD dependencies, output permissions
meta/ Task metadata is missing Reconstruction metadata, object name, table data
<name>_ha/robot/ Processing skipped motions --lift_threshold, --skip_no_lift, contact points
bps/ BPS not created Whether more than one USD exists and whether pxr/usd-core imports correctly

A good smoke test is to run 3-5 motions end-to-end, then inspect pkl keys with joblib:

import joblib

motion = joblib.load("data/motion_lib/benchmark_v3_0203_ha/robot/<motion_id>.pkl")
key = list(motion.keys())[0]
print(motion[key].keys())

obj = joblib.load("data/motion_lib/benchmark_v3_0203_ha/objects/<motion_id>.pkl")
print(obj[key].keys())

For manipulation, the robot motion should contain dof, pose_aa, root_trans_offset, root_rot, fps, and hand_action_left/right. The object motion should contain root_pos, root_quat, fps, and contact points if contact export is enabled.

Conclusion

retarget_pipeline.sh is the most important conversion step before training. It takes 4dhoi_recon_valid reconstruction output and creates the motion-library layout that trackers and policies understand: robot, objects, object_usd, meta, and for multi-object datasets, bps. The retarget stage solves the morphology problem from SMPL-X to Unitree G1. The process stage turns contact/lift timing into hand_action_left/right, adds table geometry, and rejects motions without a real lift. The compute_bps stage turns object shape into a fixed observation embedding.

The key decision is to classify the dataset before running the command. For pickup/manipulation, run the full pipeline and inspect --lift_threshold, --grasp_anticipation_frames, and --treat_hands_equally. For terrain/sitting, use --zero_out_wrist during retargeting and usually skip hand-action processing. In part 6, we will use this motion library to train trackers/policies and prepare data for sim-to-real.

NT

Nguyễn Anh Tuấn

Robotics & AI Engineer. Building VnRobo — sharing knowledge about robot learning, VLA models, and automation.

Khám phá VnRobo

Related Posts

Train SONIC, export và đánh giá GRAIL
humanoid

Train SONIC, export và đánh giá GRAIL

6/7/202614 min read
NT
Tái dựng 4D HOI: GEM, SAM2, MoGe
humanoid

Tái dựng 4D HOI: GEM, SAM2, MoGe

6/7/202616 min read
NT
Tạo asset 3D và terrain cho GRAIL
humanoid

Tạo asset 3D và terrain cho GRAIL

6/7/202616 min read
NT