VnRobo
AboutPricingBlogContact
VnRobo logo

AI infrastructure for next-generation industrial robots.

Product

  • Features
  • Pricing
  • Knowledge Base
  • Services

Company

  • About Us
  • Blog
  • Contact

Legal

  • Privacy Policy
  • Terms of Service

© 2026 VnRobo. All rights reserved.

Made with♥in Vietnam

unifolm-vla + Unitree G1 (Post 5): deploy inference server, SSH tunnel, and parallel locomotion

Final series post: start the FastAPI inference server, connect G1 via SSH tunnel, send action commands, run arm VLA and locomotion simultaneously — with safety checklist and debug guide for common real-hardware failures.

Nguyễn Anh Tuấn · June 7, 2026 · 7 min read · Updated: Jun 14, 2026

This is the final post of the unifolm-vla + Unitree G1 series. The previous post produced a checkpoint. This post deploys that checkpoint on a real G1, runs inference against it, and runs locomotion in parallel for whole-body behavior.

Before reading: deploying on a real robot always carries risk. Follow the order: sim first → offline test → test with robot in safe mode → full deploy.

Deploy architecture

[Workstation GPU]                [Unitree G1]
                                 
  FastAPI Server                  Robot SDK
  run_real_eval_server.py         (arm control)
  port 8777                  ←→   192.168.123.xxx
       ↑                          
  POST /act                        
  {image, state, instruction}     (leg control)
       ↓                          unitree_rl_gym
  {action: [28 joints]}           motion.pt

Three parallel processes:

  1. VLA inference (workstation): receive image → predict action → send to G1
  2. Arm control (G1): receive joint commands from VLA, execute
  3. Locomotion (G1): run motion.pt policy independently, controls legs

Step 1: Start the inference server

conda activate unifolm
cd ~/unifolm_ws/unifolm-vla

# With Approach A checkpoint (8-GPU full fine-tune)
python run_real_eval_server.py \
  --ckpt_path /home/user/checkpoints/unifolm_g1_pickplace/best_checkpoint.pt \
  --vlm_pretrained_path /home/user/models/Qwen2.5-VL-7B-Instruct \
  --unnorm_key new_embodiment \
  --host 0.0.0.0 \
  --port 8777 \
  --use_bf16

# With Approach B checkpoint (LoRA single-GPU)
python run_real_eval_server.py \
  --ckpt_path /home/user/checkpoints/lora_g1_pickplace \
  --vlm_pretrained_path /home/user/models/Qwen2.5-VL-7B-Instruct \
  --lora_mode true \
  --unnorm_key new_embodiment \
  --host 0.0.0.0 \
  --port 8777 \
  --use_bf16

Expected terminal output when server is ready:

Loading VLM from: /home/user/models/Qwen2.5-VL-7B-Instruct
Loading checkpoint: /home/user/checkpoints/unifolm_g1_pickplace/best_checkpoint.pt
Model loaded. VRAM usage: 14.3 GB / 24 GB
Starting FastAPI server on 0.0.0.0:8777
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8777 (Press CTRL+C to quit)

Verify server is working:

curl -X POST http://localhost:8777/act \
  -H "Content-Type: application/json" \
  -d '{
    "full_image": null,
    "state": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
               0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "instruction": "pick up the red cup"
  }'

# Expected response:
# {"action": [0.12, -0.05, 0.33, ...], "timestamp": 1234567890.123}
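
The same check can be scripted from Python. The sketch below builds the request body used in the curl call and validates the reply. The helper names `build_act_payload` and `parse_act_response` are hypothetical, and the 14-value state length is taken from the curl example above, not from the server code.

```python
import json

STATE_DIM = 14  # assumption: matches the 14 zeros in the curl example


def build_act_payload(state, instruction, image_b64=None):
    """Build the JSON body for POST /act (field names from the curl example)."""
    if len(state) != STATE_DIM:
        raise ValueError(f"expected {STATE_DIM} state values, got {len(state)}")
    if not instruction.strip():
        raise ValueError("instruction must be non-empty")
    return {
        "full_image": image_b64,             # None serializes to JSON null
        "state": [float(x) for x in state],
        "instruction": instruction,
    }


def parse_act_response(body):
    """Extract the action list from a response body given as a JSON string."""
    data = json.loads(body)
    action = data["action"]
    if not all(isinstance(a, (int, float)) for a in action):
        raise ValueError("action must be a list of numbers")
    return [float(a) for a in action]


if __name__ == "__main__":
    payload = build_act_payload([0.0] * STATE_DIM, "pick up the red cup")
    print(json.dumps(payload))
    # To send it: requests.post("http://localhost:8777/act", json=payload, timeout=5)
```

Rejecting a wrong-length state on the client side gives a clearer error than whatever the server returns for a malformed request.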

Step 2: Robot client on G1

G1 has an onboard computer (Jetson Orin NX). The robot client runs there (or on the workstation if connected via LAN).

SSH connection setup

# Test connection to G1 onboard computer
# G1 default IP: 192.168.123.18 (may differ)
ping 192.168.123.18

# SSH into G1 onboard computer
ssh <user>@192.168.123.18
# Default password: "123" or check G1 docs

Robot client script

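Create robot_client.py:

"""
Robot client: capture image from G1 camera, send to inference server,
receive action, execute on G1 arm.
"""

import base64
import time

import cv2
import numpy as np
import requests
from unitree_sdk2py.core.channel import ChannelFactoryInitialize
from unitree_sdk2py.idl.default import unitree_hg_msg_dds__LowCmd_

INFERENCE_SERVER = "http://192.168.1.100:8777"  # workstation IP
TASK_INSTRUCTION = "pick up the red cup"
CONTROL_FREQ_HZ = 5   # ~5Hz, matches VLM inference latency
ARM_JOINTS = range(13, 27)   # arm joint indices used in this series; verify for your G1 variant
EMA_ALPHA = 0.7              # EMA smoothing factor to reduce jerkiness

def encode_image(frame: np.ndarray) -> str:
    """JPEG-encode a BGR frame and return it as a base64 string."""
    ok, buffer = cv2.imencode('.jpg', frame, [cv2.IMWRITE_JPEG_QUALITY, 85])
    if not ok:
        raise RuntimeError("JPEG encoding failed")
    return base64.b64encode(buffer).decode('utf-8')

def get_joint_states(sdk_interface) -> list:
    """Read the current arm joint positions from the low-level state."""
    state = sdk_interface.get_low_state()
    return [float(state.motor_state[i].q) for i in ARM_JOINTS]

def send_action(sdk_interface, actions: list):
    """Send position targets for the arm joints only."""
    cmd = unitree_hg_msg_dds__LowCmd_()   # default-initialized command message
    for i, joint_idx in enumerate(ARM_JOINTS):
        if i < len(actions):
            cmd.motor_cmd[joint_idx].q = float(actions[i])
            cmd.motor_cmd[joint_idx].kp = 60.0
            cmd.motor_cmd[joint_idx].kd = 3.0
            cmd.motor_cmd[joint_idx].dq = 0.0
            cmd.motor_cmd[joint_idx].tau = 0.0
    sdk_interface.publish_low_cmd(cmd)

def main(sdk_interface):
    # sdk_interface is a placeholder for your own wrapper around the Unitree SDK.
    # It must provide get_low_state() and publish_low_cmd(cmd); this post does
    # not include that wrapper.
    if sdk_interface is None:
        raise ValueError("pass an SDK wrapper with get_low_state() and publish_low_cmd()")

    ChannelFactoryInitialize(0, "eth0")
    cap = cv2.VideoCapture(0)   # left wrist camera
    if not cap.isOpened():
        raise RuntimeError("Cannot open camera 0")

    print(f"Starting robot client. Instruction: '{TASK_INSTRUCTION}'")

    period = 1.0 / CONTROL_FREQ_HZ
    # Start the EMA from the measured pose, not zeros, so the first command
    # does not pull the arm toward the zero position.
    prev_action = get_joint_states(sdk_interface)

    try:
        while True:
            t_start = time.time()

            ret, frame = cap.read()
            if not ret:
                time.sleep(period)   # avoid a busy loop while the camera recovers
                continue

            payload = {
                "full_image": encode_image(frame),
                "state": get_joint_states(sdk_interface),
                "instruction": TASK_INSTRUCTION,
            }

            try:
                response = requests.post(
                    f"{INFERENCE_SERVER}/act",
                    json=payload,
                    timeout=0.5
                )

                if response.status_code == 200:
                    actions = response.json()["action"]
                    # EMA smoothing; zip() keeps only as many values as there are arm joints
                    prev_action = [EMA_ALPHA * a + (1 - EMA_ALPHA) * p
                                   for a, p in zip(actions, prev_action)]

            except requests.exceptions.Timeout:
                pass   # keep previous action
            except requests.exceptions.ConnectionError:
                print("Cannot connect to inference server")

            # Publish every cycle, including after a timeout, so the arm keeps
            # receiving commands and holds its last target.
            send_action(sdk_interface, prev_action)

            elapsed = time.time() - t_start
            sleep_time = period - elapsed
            if sleep_time > 0:
                time.sleep(sleep_time)
    finally:
        cap.release()

if __name__ == "__main__":
    main(sdk_interface=None)   # replace None with your SDK wrapper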

Run robot client:

python robot_client.py

Step 3: Parallel locomotion with motion.pt

While arm VLA is running, locomotion runs in parallel.

Option A: G1 built-in locomotion (simplest)

G1 has a built-in locomotion controller. Use the Unitree App or remote controller to move the robot while arm VLA handles the arms independently.

# Send stand command via SDK so G1 stays in place during arm tasks
python -c "
from unitree_sdk2py.core.channel import ChannelFactoryInitialize

ChannelFactoryInitialize(0, 'eth0')
# send stand mode command...
print('G1 standing - arm VLA running in parallel')
"

Option B: unitree_rl_gym motion.pt

For custom locomotion while arms are working:

conda activate loco
cd ~/unifolm_ws/unitree_rl_gym

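# Run pretrained motion policy
# Note: play.py in legged_gym replays a policy in simulation. Check the
# unitree_rl_gym README for its real-robot deploy script before using this on hardware.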
python legged_gym/scripts/play.py \
  --task g1 \
  --load_run pretrained \
  --checkpoint motion

Key insight — why they don't conflict:

Arm VLA (robot_client.py):
  → Commands joints 13-26 (arm + gripper)
  → Frequency: ~5Hz
  
Locomotion (motion.pt):
  → Commands joints 0-11 (legs) + 12 (waist)
  → Frequency: 200-500Hz

Different joint indices → NO conflict.
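
The index split can be checked mechanically before either controller starts. This is a minimal sketch, assuming the joint ranges listed above; `check_no_overlap` is a hypothetical helper. It verifies index ownership only and does not show how the two command streams are merged on the robot, which this post does not describe.

```python
ARM_JOINTS = set(range(13, 27))          # joints 13-26, commanded by the VLA client
LEG_JOINTS = set(range(0, 12)) | {12}    # joints 0-11 plus waist joint 12


def check_no_overlap(*groups):
    """Raise if any joint index is claimed by more than one controller."""
    seen = set()
    for group in groups:
        clash = seen & group
        if clash:
            raise ValueError(f"joints commanded twice: {sorted(clash)}")
        seen |= group
    return sorted(seen)


if __name__ == "__main__":
    owned = check_no_overlap(ARM_JOINTS, LEG_JOINTS)
    print(f"{len(owned)} joints assigned, no overlap")
```

Running a check like this at startup catches an off-by-one in either range before it reaches the motors.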

Safety checklist before running on real G1

BEFORE STARTING:
[ ] E-stop (emergency stop) tested and within reach
[ ] G1 standing on flat floor, no obstacles within 1m
[ ] Joint position limits set in robot_client.py
[ ] Inference server running and verified with curl test
[ ] Camera connected and stable frame rate (>10 FPS)

WHILE RUNNING:
[ ] Keep eyes on G1 at all times
[ ] Second person ready to E-stop if needed
[ ] Log terminal for debugging

STOP WHEN:
[ ] G1 loses balance or joints oscillate
[ ] Joint torque exceeds threshold
[ ] Arm moves unpredictably (policy hallucination)
[ ] Inference server timeout more than 3 times in a row
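
The checklist asks for joint position limits in robot_client.py, but the client listing above does not include them. One possible sketch follows. The limit values and the per-step cap are placeholders, not real G1 limits, and `clamp_action` is a hypothetical helper name.

```python
# Placeholder limits in radians: NOT real G1 values. Replace them with the
# limits from the robot's documentation before running on hardware.
JOINT_LIMITS = [(-1.0, 1.0)] * 14
MAX_STEP_RAD = 0.1   # hypothetical cap on the change per control step


def clamp_action(target, current, limits=JOINT_LIMITS, max_step=MAX_STEP_RAD):
    """Limit each joint target to its range and to max_step away from current."""
    if not (len(target) == len(current) == len(limits)):
        raise ValueError("target, current and limits must have equal length")
    safe = []
    for q, q_now, (lo, hi) in zip(target, current, limits):
        step = max(-max_step, min(max_step, q - q_now))   # rate limit
        safe.append(max(lo, min(hi, q_now + step)))       # position limit
    return safe
```

In the client this would sit between smoothing and `send_action`, with `current` taken from the measured joint state. The rate limit bounds how far a single bad prediction can move a joint in one control step.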

Troubleshooting

Server not receiving images (full_image: null)

# Test camera capture separately
import cv2
cap = cv2.VideoCapture(0)
ret, frame = cap.read()
print("Camera OK:", ret, "Frame shape:", frame.shape if ret else None)
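
The safety checklist also asks for a stable frame rate above 10 FPS. A rough way to measure it is sketched below. `measure_fps` is a hypothetical helper that accepts any callable returning `(ok, frame)`, which is the shape of `cv2.VideoCapture.read`.

```python
import time


def measure_fps(read_frame, n_frames=30):
    """Call read_frame() n_frames times and return successful frames per second."""
    ok_count = 0
    t0 = time.perf_counter()
    for _ in range(n_frames):
        ok, _frame = read_frame()
        if ok:
            ok_count += 1
    elapsed = time.perf_counter() - t0
    return ok_count / elapsed if elapsed > 0 else float("inf")


# Usage on the robot (requires OpenCV and a connected camera):
#   cap = cv2.VideoCapture(0)
#   print("Camera FPS:", measure_fps(cap.read))
#   cap.release()
```

Failed reads count toward elapsed time but not toward the frame count, so a flaky camera shows up as a low number.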

Arm moves jerkily

Cause: inconsistent inference latency
Fix: EMA smoothing is already in the client above (alpha=0.7)
     If still jerky, reduce alpha to 0.5 (smoother, slightly less responsive)
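
To see what changing alpha does, the smoothing step can be run on a step input. This is a minimal sketch of the same EMA formula used in the client; the function name `ema_smooth` is introduced here for illustration.

```python
def ema_smooth(action, prev, alpha=0.7):
    """Blend the new action with the previous smoothed one.

    alpha=1.0 passes the new action through unchanged; lower values smooth more.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    return [alpha * a + (1.0 - alpha) * p for a, p in zip(action, prev)]


if __name__ == "__main__":
    # Step input: the target jumps from 0.0 to 1.0 and stays there.
    for alpha in (0.7, 0.5):
        value, trace = [0.0], []
        for _ in range(4):
            value = ema_smooth([1.0], value, alpha)
            trace.append(round(value[0], 4))
        print(alpha, trace)
```

After n updates the output has covered 1 - (1 - alpha)^n of the step. With alpha = 0.7 that is about 97% after three updates, and with alpha = 0.5 about 88%. At 5Hz, three updates take 0.6 seconds.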

Policy doesn't perform the task correctly

Debug checklist in order:

1. Is lighting the same as during training? (different lights = distribution shift)
2. Is object at the right position? (shifted too far = out-of-distribution)
3. Is camera still calibrated correctly? (wrist camera shifted?)
4. Is instruction text identical to training instructions?
5. Is this the best checkpoint? (use the one with lowest val loss)

G1 arm resets to home position after a few seconds

Cause: G1 SDK has a safety timeout — if no command received within N seconds,
       arm returns to home position
Fix: ensure robot_client.py sends commands continuously even when holding still
     (send action = previous_action if timeout)
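
One way to structure that fallback, and to enforce the "more than 3 timeouts in a row" stop condition from the safety checklist, is a small state holder. This is a sketch under those assumptions; the class name `ActionHold` and its methods are hypothetical.

```python
class ActionHold:
    """Track the last good action and count consecutive server timeouts."""

    def __init__(self, initial_action, max_timeouts=3):
        self.last_action = list(initial_action)
        self.max_timeouts = max_timeouts
        self.consecutive_timeouts = 0

    def on_response(self, action):
        """Record a fresh action and reset the timeout counter."""
        self.last_action = list(action)
        self.consecutive_timeouts = 0
        return self.last_action

    def on_timeout(self):
        """Return the action to resend; raise once too many timeouts pile up."""
        self.consecutive_timeouts += 1
        if self.consecutive_timeouts > self.max_timeouts:
            raise RuntimeError("inference server timed out repeatedly; stop the robot")
        return self.last_action
```

The control loop would call `on_response` after each successful request and `on_timeout` in the `requests.exceptions.Timeout` handler, publishing whichever action comes back. The `RuntimeError` is the point at which to trigger your own stop procedure.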

Full series summary

After 5 posts, you have a complete pipeline:

Post                    Input               Output                     Tools
1: Architecture         -                   Understanding the system   3 repos
2: Data collection      G1 + Meta Quest 3   50+ JSON demos             xr_teleoperate
3: Data pipeline        JSON demos          RLDS dataset               unitree_IL_lerobot + unifolm-vla
4: Fine-tune            RLDS dataset        VLA checkpoint             DeepSpeed / LoRA
5: Deploy (this post)   Checkpoint + G1     Robot performs task        FastAPI + Unitree SDK

Next steps to improve:

  1. More data diversity: 200+ demos with lighting variation and varied object positions
  2. Multi-task: collect data for multiple tasks, train a single model on all of them
  3. Deeper locomotion integration: train a policy that walks to objects before picking up
  4. When Unifolm-VLM-0 becomes public: fine-tune from that checkpoint → ~15-20% better performance

References

  • unifolm-vla GitHub
  • unitree_sdk2_python GitHub
  • unitree_rl_gym GitHub
  • FastAPI documentation

Related posts

  • Post 4: Fine-tune unifolm-vla
  • Post 1: Full system architecture
  • GR00T N1 + G1: Deploy GEAR+SONIC (alternative approach)

Nguyễn Anh Tuấn

Robotics & AI Engineer. Building VnRobo — sharing knowledge about robot learning, VLA models, and automation.
