unifolm-vla + Unitree G1 (Post 5): deploy inference server, SSH tunnel, and parallel locomotion

This is the final post of the unifolm-vla + Unitree G1 series. The previous post produced a checkpoint; this one deploys it on a real G1, runs inference against it, and runs locomotion in parallel for whole-body behavior.

Before reading: deploying on a real robot always carries risk. Work through the stages in order: simulation first → offline test → test with the robot in safe mode → full deployment.

Deploy architecture

[Workstation GPU]                [Unitree G1]
                                 
  FastAPI Server                  Robot SDK
  run_real_eval_server.py         (arm control)
  port 8777                  ←→   192.168.123.xxx
       ↑                          
  POST /act                        
  {image, state, instruction}     (leg control)
       ↓                          unitree_rl_gym
  {action: [28 joints]}           motion.pt

Three parallel processes:

VLA inference (workstation): receive image → predict action → send to G1
Arm control (G1): receive joint commands from VLA, execute
Locomotion (G1): run motion.pt policy independently, controls legs

Step 1: Start the inference server

conda activate unifolm
cd ~/unifolm_ws/unifolm-vla

# With Approach A checkpoint (8-GPU full fine-tune)
python run_real_eval_server.py \
  --ckpt_path /home/user/checkpoints/unifolm_g1_pickplace/best_checkpoint.pt \
  --vlm_pretrained_path /home/user/models/Qwen2.5-VL-7B-Instruct \
  --unnorm_key new_embodiment \
  --host 0.0.0.0 \
  --port 8777 \
  --use_bf16

# With Approach B checkpoint (LoRA single-GPU)
python run_real_eval_server.py \
  --ckpt_path /home/user/checkpoints/lora_g1_pickplace \
  --vlm_pretrained_path /home/user/models/Qwen2.5-VL-7B-Instruct \
  --lora_mode true \
  --unnorm_key new_embodiment \
  --host 0.0.0.0 \
  --port 8777 \
  --use_bf16

Expected terminal output when server is ready:

Loading VLM from: /home/user/models/Qwen2.5-VL-7B-Instruct
Loading checkpoint: /home/user/checkpoints/unifolm_g1_pickplace/best_checkpoint.pt
Model loaded. VRAM usage: 14.3 GB / 24 GB
Starting FastAPI server on 0.0.0.0:8777
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8777 (Press CTRL+C to quit)

Verify server is working:

curl -X POST http://localhost:8777/act \
  -H "Content-Type: application/json" \
  -d '{
    "full_image": null,
    "state": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
               0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "instruction": "pick up the red cup"
  }'

# Expected response:
# {"action": [0.12, -0.05, 0.33, ...], "timestamp": 1234567890.123}
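The same smoke test can be scripted so it runs before every session. The following is a minimal sketch, assuming the request and response shapes shown in the curl example above. `build_act_payload` and `check_act_response` are hypothetical helper names, not part of unifolm-vla, and the expected action length is an assumption you should set to match your checkpoint.

```python
import json
import math
from typing import Optional


def build_act_payload(state: list, instruction: str,
                      image_b64: Optional[str] = None) -> str:
    """Serialize a /act request body with the fields used in the curl test."""
    return json.dumps({
        "full_image": image_b64,              # None serializes to JSON null
        "state": [float(x) for x in state],
        "instruction": instruction,
    })


def check_act_response(body: str, expected_len: int) -> list:
    """Parse a /act response and reject malformed or non-finite actions."""
    action = json.loads(body)["action"]
    if len(action) != expected_len:
        raise ValueError(f"expected {expected_len} values, got {len(action)}")
    if not all(isinstance(a, (int, float)) and math.isfinite(a) for a in action):
        raise ValueError("action contains non-numeric or non-finite values")
    return [float(a) for a in action]
```

Send the string from `build_act_payload` as the POST body, then pass the response text through `check_act_response` before any value reaches the robot.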

Step 2: Robot client on G1

The G1 has an onboard computer (Jetson Orin NX). The robot client runs there, or on the workstation if the workstation is connected to the robot over LAN.

SSH connection setup

# Test connection to G1 onboard computer
# G1 default IP: 192.168.123.18 (may differ)
ping 192.168.123.18

# SSH into G1 onboard computer
ssh <user>@192.168.123.18   # <user> is typically "unitree"; check G1 docs
# Default password: "123" or check G1 docs
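If the G1's onboard computer cannot route to the workstation directly (the robot sits on 192.168.123.x, which may not be the subnet the inference server is on), one option is an SSH reverse tunnel opened from the workstation. The following is a minimal sketch, assuming OpenSSH is installed on the workstation. `reverse_tunnel_cmd` is a hypothetical helper and the user name is a placeholder.

```python
import shlex


def reverse_tunnel_cmd(robot_host: str, user: str, port: int = 8777) -> list:
    """Build an ssh command that exposes the workstation's port on the robot.

    -N runs no remote command; -R forwards robot:port to workstation localhost:port.
    """
    return [
        "ssh", "-N",
        "-o", "ExitOnForwardFailure=yes",   # fail fast if the port is taken
        "-R", f"{port}:localhost:{port}",
        f"{user}@{robot_host}",
    ]


if __name__ == "__main__":
    cmd = reverse_tunnel_cmd("192.168.123.18", "<user>")
    print(shlex.join(cmd))
    # To actually open the tunnel from the workstation:
    #   import subprocess; subprocess.Popen(cmd)
```

With the tunnel up, a client running on the G1 would reach the server at http://localhost:8777 instead of the workstation's LAN address.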

Robot client script

Create robot_client.py:

"""
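Robot client: capture image from G1 camera, send to inference server,
receive action, execute on G1 arm.
"""

import base64
import time

import cv2
import numpy as np
import requests
from unitree_sdk2py.core.channel import (
    ChannelFactoryInitialize,
    ChannelPublisher,
    ChannelSubscriber,
)
from unitree_sdk2py.idl.default import unitree_hg_msg_dds__LowCmd_
from unitree_sdk2py.idl.unitree_hg.msg.dds_ import LowCmd_, LowState_
from unitree_sdk2py.utils.crc import CRC

INFERENCE_SERVER = "http://192.168.1.100:8777"  # workstation IP; must be reachable from this machine
TASK_INSTRUCTION = "pick up the red cup"
CONTROL_FREQ_HZ = 5   # ~5Hz, matches VLM inference latency
NETWORK_INTERFACE = "eth0"
EMA_ALPHA = 0.7       # weight of the new action in the smoothing filter
# Arm joints as used in this series. Verify against the joint index table for
# your G1 variant and SDK version (Unitree's G1 low-level examples place the arms at 15-28).
ARM_JOINT_IDS = list(range(13, 27))

def encode_image(frame: np.ndarray) -> str:
    _, buffer = cv2.imencode('.jpg', frame, [cv2.IMWRITE_JPEG_QUALITY, 85])
    return base64.b64encode(buffer).decode('utf-8')

class G1ArmInterface:
    """Thin wrapper over the SDK's DDS channels.
    Topic names are assumptions; check them against your SDK version."""

    def __init__(self, cmd_topic="rt/lowcmd", state_topic="rt/lowstate"):
        self._state = None
        self._crc = CRC()
        self._pub = ChannelPublisher(cmd_topic, LowCmd_)
        self._pub.Init()
        self._sub = ChannelSubscriber(state_topic, LowState_)
        self._sub.Init(self._on_state, 10)

    def _on_state(self, msg):
        self._state = msg

    def get_low_state(self):
        return self._state   # None until the first state message arrives

    def publish_low_cmd(self, cmd):
        cmd.crc = self._crc.Crc(cmd)
        self._pub.Write(cmd)

def get_joint_states(sdk_interface):
    state = sdk_interface.get_low_state()
    if state is None:
        return None
    return [state.motor_state[i].q for i in ARM_JOINT_IDS]

def send_action(sdk_interface, actions: list):
    # Motor slots not set below keep their default (zero) values.
    cmd = unitree_hg_msg_dds__LowCmd_()
    state = sdk_interface.get_low_state()
    if state is not None:
        cmd.mode_machine = state.mode_machine   # echoed back, as in Unitree's G1 low-level example
    for joint_idx, target in zip(ARM_JOINT_IDS, actions):   # arm joints only
        motor = cmd.motor_cmd[joint_idx]
        motor.mode = 1   # enable
        motor.q = float(target)
        motor.kp = 60.0
        motor.kd = 3.0
        motor.dq = 0.0
        motor.tau = 0.0
    sdk_interface.publish_low_cmd(cmd)

def main():
    ChannelFactoryInitialize(0, NETWORK_INTERFACE)
    sdk = G1ArmInterface()
    cap = cv2.VideoCapture(0)   # left wrist camera

    print(f"Starting robot client. Instruction: '{TASK_INSTRUCTION}'")

    period = 1.0 / CONTROL_FREQ_HZ
    prev_action = None   # seeded from measured joint positions on the first step

    while True:
        t_start = time.time()

        ret, frame = cap.read()
        joint_states = get_joint_states(sdk)
        if not ret or joint_states is None:
            time.sleep(period)   # wait for the camera or the first state message
            continue
        if prev_action is None:
            prev_action = joint_states   # avoids blending the first action toward zero

        payload = {
            "full_image": encode_image(frame),
            "state": joint_states,
            "instruction": TASK_INSTRUCTION,
        }

        try:
            response = requests.post(
                f"{INFERENCE_SERVER}/act",
                json=payload,
                timeout=0.5
            )
            if response.status_code == 200:
                # Use the first len(ARM_JOINT_IDS) values, as the original indexing did
                actions = response.json()["action"][:len(ARM_JOINT_IDS)]
                # EMA smoothing to reduce jerkiness
                prev_action = [EMA_ALPHA * a + (1 - EMA_ALPHA) * p
                               for a, p in zip(actions, prev_action)]
        except requests.exceptions.Timeout:
            pass   # keep previous action
        except requests.exceptions.ConnectionError:
            print("Cannot connect to inference server")

        # Publish every cycle, re-sending the previous action on failure,
        # so the arm keeps receiving commands while holding still.
        send_action(sdk, prev_action)

        elapsed = time.time() - t_start
        sleep_time = period - elapsed
        if sleep_time > 0:
            time.sleep(sleep_time)

if __name__ == "__main__":
    main()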

Run robot client:

python robot_client.py

Step 3: Parallel locomotion with motion.pt

While arm VLA is running, locomotion runs in parallel.

Option A: G1 built-in locomotion (simplest)

G1 has a built-in locomotion controller. Use the Unitree App or remote controller to move the robot while arm VLA handles the arms independently.

# Send stand command via SDK so G1 stays in place during arm tasks
python -c "
from unitree_sdk2py.core.channel import ChannelFactoryInitialize

# Initialize DDS on the network interface connected to G1
ChannelFactoryInitialize(0, 'eth0')
# send stand mode command...
print('G1 standing — arm VLA running in parallel')
"

Option B: unitree_rl_gym motion.pt

For custom locomotion while arms are working:

conda activate loco
cd ~/unifolm_ws/unitree_rl_gym

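# Preview the pretrained motion policy in simulation
python legged_gym/scripts/play.py \
  --task g1 \
  --load_run pretrained \
  --checkpoint motion

# play.py only replays a policy in simulation. To drive the real G1, use the
# repo's real-robot deploy script (path and arguments as described in the
# unitree_rl_gym README; verify against your checkout):
python deploy/deploy_real/deploy_real.py eth0 g1.yaml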

Key insight: why they don't conflict

Arm VLA (robot_client.py):
  → Commands joints 13-26 (arm + gripper)
  → Frequency: ~5Hz
  
Locomotion (motion.pt):
  → Commands joints 0-11 (legs) + 12 (waist)
  → Frequency: 200-500Hz

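Different joint indices → the two controllers never target the same motor.
This only avoids a conflict if each process writes just its own motor slots: a LowCmd_ message carries a slot for every motor, so two processes that each publish a full message on the same topic can overwrite each other. Check how your setup merges the two command streams before enabling both.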
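The index split can be checked mechanically at startup rather than assumed. A minimal sketch using the joint ranges stated above; `assert_disjoint` is a hypothetical helper, and the ranges should be replaced with the ones that match your G1 variant.

```python
ARM_JOINTS = set(range(13, 27))           # joints 13-26, commanded by robot_client.py
LOCO_JOINTS = set(range(0, 12)) | {12}    # legs 0-11 plus waist 12, commanded by motion.pt


def assert_disjoint(a: set, b: set) -> None:
    """Raise if any joint index is claimed by both controllers."""
    overlap = a & b
    if overlap:
        raise ValueError(f"joints commanded by both controllers: {sorted(overlap)}")


assert_disjoint(ARM_JOINTS, LOCO_JOINTS)  # passes for the ranges above
```

Running this before either controller starts catches an accidental overlap, for example after editing one of the ranges for a different G1 configuration.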

Safety checklist before running on real G1

BEFORE STARTING:
[ ] E-stop (emergency stop) tested and within reach
[ ] G1 standing on flat floor, no obstacles within 1m
[ ] Joint position limits set in robot_client.py
[ ] Inference server running and verified with curl test
[ ] Camera connected and stable frame rate (>10 FPS)
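The checklist asks for joint position limits in robot_client.py. One way to add them is to clamp every target before it is published. This is a minimal sketch: `clamp_targets` is a hypothetical helper, and the limit values in the usage line are placeholders, not real G1 joint limits.

```python
def clamp_targets(targets, current, lower, upper, max_step):
    """Clamp each target to [lower, upper], then limit the change per control step.

    If the current position is inside the limits, the result is too, because
    it lies between the current position and the clamped target.
    """
    safe = []
    for t, c, lo, hi in zip(targets, current, lower, upper):
        t = min(max(t, lo), hi)                       # absolute position limit
        t = min(max(t, c - max_step), c + max_step)   # per-step rate limit
        safe.append(t)
    return safe


# Placeholder limits for a single joint: target 2.0 rad is clamped to 1.0,
# then rate-limited to 0.1 rad from the current position 0.0.
print(clamp_targets([2.0], [0.0], [-1.0], [1.0], max_step=0.1))
```

In the client, this would be applied to the smoothed action just before it is sent, with `current` taken from the measured joint states.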

WHILE RUNNING:
[ ] Keep eyes on G1 at all times
[ ] Second person ready to E-stop if needed
[ ] Log terminal output for debugging

STOP WHEN:
[ ] G1 loses balance or joints oscillate
[ ] Joint torque exceeds threshold
[ ] Arm moves unpredictably (policy hallucination)
[ ] Inference server times out more than 3 times in a row
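The last stop condition can be enforced in code instead of by watching the terminal. A minimal sketch of a consecutive-timeout counter; `TimeoutWatchdog` is a hypothetical class, and what "stop" means (hold position, damp, exit) is left to your setup.

```python
class TimeoutWatchdog:
    """Counts consecutive inference timeouts and signals when to stop."""

    def __init__(self, max_consecutive: int = 3):
        self.max_consecutive = max_consecutive
        self.count = 0

    def record(self, timed_out: bool) -> bool:
        """Record one request outcome; return True when the loop should stop."""
        self.count = self.count + 1 if timed_out else 0   # any success resets the streak
        return self.count > self.max_consecutive
```

In the client loop, call `record(True)` in the timeout handler and `record(False)` after a successful response, and leave the loop when it returns True.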

Troubleshooting

Server not receiving images (full_image: null)

# Test camera capture separately
import cv2
cap = cv2.VideoCapture(0)
ret, frame = cap.read()
print("Camera OK:", ret, "Frame shape:", frame.shape if ret else None)
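The safety checklist also asks for a stable frame rate above 10 FPS, which the single-frame test does not measure. A minimal sketch of a frame-rate check follows; `measure_fps` is a hypothetical helper that takes any callable returning True when a frame was read, so it does not depend on a particular camera library.

```python
import time
from typing import Callable


def measure_fps(read_frame: Callable[[], bool], n_frames: int = 30,
                clock: Callable[[], float] = time.perf_counter) -> float:
    """Read n_frames and return successfully read frames per second."""
    start = clock()
    ok = 0
    for _ in range(n_frames):
        if read_frame():
            ok += 1
    elapsed = clock() - start
    return ok / elapsed if elapsed > 0 else 0.0
```

With OpenCV, the call would look like `measure_fps(lambda: cap.read()[0])`, using the `cap` object from the test above.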

Arm moves jerkily

Cause: inconsistent inference latency
Fix: EMA smoothing is already in the client above (alpha=0.7)
     If still jerky, reduce alpha to 0.5 (smoother, slightly less responsive)
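To see what the alpha trade-off means in practice, it helps to look at how the filter responds when the target jumps from 0 to 1. A minimal sketch using the same update rule as the client (new = alpha * action + (1 - alpha) * previous); `ema_step_response` is a hypothetical helper for illustration.

```python
def ema_step_response(alpha: float, steps: int) -> list:
    """Smoothed output for a target that jumps from 0 to 1 and stays there."""
    out, prev = [], 0.0
    for _ in range(steps):
        prev = alpha * 1.0 + (1 - alpha) * prev
        out.append(round(prev, 3))
    return out


print(ema_step_response(0.7, 3))   # [0.7, 0.91, 0.973]
print(ema_step_response(0.5, 3))   # [0.5, 0.75, 0.875]
```

At 5 Hz, alpha=0.7 covers about 97% of a step change in three control cycles (0.6 s), while alpha=0.5 covers about 88% in the same time, which is the "smoother, slightly less responsive" trade-off.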

Policy doesn't perform the task correctly

Debug checklist in order:

1. Is lighting the same as during training? (different lights = distribution shift)
2. Is object at the right position? (shifted too far = out-of-distribution)
3. Is camera still calibrated correctly? (wrist camera shifted?)
4. Is instruction text identical to training instructions?
5. Is this the best checkpoint? (use the one with lowest val loss)

G1 arm resets to home position after a few seconds

Cause: G1 SDK has a safety timeout — if no command received within N seconds,
       arm returns to home position
Fix: ensure robot_client.py sends commands continuously even when holding still
     (send action = previous_action if timeout)

Full series summary

After 5 posts, you have a complete pipeline:

Post	Input	Output	Tools
1: Architecture	—	Understanding the system	3 repos
2: Data collection	G1 + Meta Quest 3	50+ JSON demos	xr_teleoperate
3: Data pipeline	JSON demos	RLDS dataset	unitree_IL_lerobot + unifolm-vla
4: Fine-tune	RLDS dataset	VLA checkpoint	DeepSpeed / LoRA
5: Deploy (this post)	Checkpoint + G1	Robot performs task	FastAPI + Unitree SDK

Next steps to improve:

More data diversity: 200+ demos with lighting variation and varied object positions
Multi-task: collect data for multiple tasks, train a single model on all of them
Deeper locomotion integration: train a policy that walks to objects before picking up
When Unifolm-VLM-0 becomes public: fine-tune from that checkpoint → ~15-20% better performance

Nguyễn Anh Tuấn
