
Ark v1.5: Python Framework for Robot Learning Sim-to-Real

A step-by-step guide to Ark v1.5 — Python-first robot learning framework with a Gym-style API, native ROS, ACT and Diffusion Policy out of the box.

Nguyễn Anh Tuấn · May 8, 2026 · 10 min read

Have you ever trained an ACT policy in simulation, only to spend a week rewriting half of your pipeline when you tried to move it to a real robot — different ROS messages, different joint encoders, different frame names? That pain point is exactly why Ark exists. In April 2026, the team at Huawei Noah's Ark Lab — together with partners from University College London, University of Oxford, and TU Darmstadt — released Ark v1.5, which they describe as "PyTorch + Gym for robotics".

This article walks through Ark end-to-end: the paper's idea, the client-server architecture, installation, training an ACT policy in simulation, and switching to a real robot by changing a single config flag. If you're already familiar with LeRobot or have read the Imitation Learning series, you'll feel right at home.


What is Ark and why does it matter in 2026?

Before Ark, building an imitation learning policy for a robot meant gluing together at least four stacks:

  • ROS 2 for hardware interfaces and messaging.
  • PyBullet/MuJoCo/Isaac for simulation (each with its own API).
  • PyTorch for training (with custom dataset formats).
  • Glue code in every direction to bridge these.

The result: students spend 70% of their time writing boilerplate, and only 30% on the research that matters. Worse, "switching from sim to real" usually means rewriting the whole data pipeline because ROS topics and simulator state behave differently.

The paper Ark: An Open-source Python-based Framework for Robot Learning — Dierking et al., 2026 — makes four core contributions:

  1. A unified Gym-style environment — the same env.reset() / env.step(action) API works for both simulation and real hardware.
  2. A lightweight client-server architecture with publisher-subscriber networking that scales across a LAN.
  3. Modular components for control, SLAM, motion planning, system identification, and visualization, all plug-and-play.
  4. Native ROS interoperability — built-in bridges to publish and subscribe to ROS 2 topics without rosbridge.

Version 1.5 (April 2026) also adds optional C/C++ bindings to deliver real-time performance when you need it (for example, a 1kHz control loop on a legged robot).
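The unified API in contribution 1 is easiest to see as code. Below is a minimal sketch with a stand-in environment — `DummyEnv` and `rollout` are my names, not Ark's — showing the one loop that stays identical whether the backend is a simulator or a real robot:

```python
class DummyEnv:
    """Stand-in for an Ark environment (hypothetical, for illustration).

    The point: application code only ever touches reset()/step(),
    regardless of which backend sits underneath."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return {"joint_pos": [0.0] * 7}           # observation dict

    def step(self, action):
        self.t += 1
        obs = {"joint_pos": action}
        done = self.t >= self.max_steps
        return obs, 0.0, done, {}                  # obs, reward, done, info


def rollout(env, policy):
    """One episode: the only loop application code ever sees."""
    obs = env.reset()
    steps, done = 0, False
    while not done:
        action = policy(obs)
        obs, reward, done, info = env.step(action)
        steps += 1
    return steps


steps = rollout(DummyEnv(), policy=lambda obs: obs["joint_pos"])
print(steps)  # 5
```

Swapping `DummyEnv` for a PyBullet env or a ROS-backed env leaves `rollout` untouched — that is the whole contract.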

High-level architecture

Ark has three layers:

┌─────────────────────────────────────────────┐
│  Application Layer                          │
│  (your training script, Gym env, RL agent)  │
└─────────────────────────────────────────────┘
                    ↕ Python API
┌─────────────────────────────────────────────┐
│  Ark Core                                   │
│  - Environment abstraction (sim ↔ real)     │
│  - Pub/Sub messaging                        │
│  - Modules: control, SLAM, planning, ID     │
└─────────────────────────────────────────────┘
                    ↕ Drivers
┌─────────────────────────────────────────────┐
│  Backend Layer                              │
│  - Sim: PyBullet (default), MuJoCo (opt)    │
│  - Real: ROS 2 bridge, custom drivers       │
└─────────────────────────────────────────────┘

The key idea: in your application code you only see the Environment interface. Whether the environment is running in PyBullet or controlling a real ViperX300s, the code is identical. Ark calls this an embodiment-agnostic API — analogous to torch.device('cuda') vs torch.device('cpu') in PyTorch.

The pub/sub messaging design is intentional. Instead of tight coupling like a ROS service call, Ark uses asynchronous topics. Each sensor (camera, IMU, joint encoder) is a publisher; each controller is a subscriber. This buys you:

  • Distributed training: data collection runs on a Jetson Orin near the robot, training runs on a workstation with an RTX 4090 — just configure the IP.
  • Hot-swappable hardware: replace a RealSense with a Zed without changing your training code.
  • Replay: record the pub/sub stream, replay it bit-for-bit during debugging.
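The decoupling these bullets describe can be made concrete with a stdlib-only sketch. This is not Ark's actual transport — `Topic` and its `depth` parameter are illustrative, mirroring a ROS-style QoS depth — but it shows why a bounded per-subscriber queue lets a slow controller drop stale frames instead of growing memory:

```python
from collections import deque

class Topic:
    """Minimal pub/sub topic: one publisher, many subscribers,
    each with its own bounded queue (akin to a QoS history depth)."""

    def __init__(self):
        self.queues = []

    def subscribe(self, depth=5):
        q = deque(maxlen=depth)   # oldest messages fall off when full
        self.queues.append(q)
        return q

    def publish(self, msg):
        for q in self.queues:
            q.append(msg)


camera = Topic()
controller_q = camera.subscribe(depth=5)

# Sensor publishes 100 frames while the controller is busy:
for frame_id in range(100):
    camera.publish(frame_id)

# The queue never grew past its depth; only the 5 freshest frames remain.
print(list(controller_q))  # [95, 96, 97, 98, 99]
```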

Quick comparison with other frameworks

| Framework       | Sim/Real unified | Imitation algos       | ROS bridge | Hardware swap |
|-----------------|------------------|-----------------------|------------|---------------|
| Ark v1.5        | ✅ Native        | ACT, Diffusion Policy | Native     | Easy          |
| LeRobot         | Partial          | ACT, Diffusion, π0    | External   | Medium        |
| Isaac Lab       | Sim only         | RL focus              | Limited    | Sim-only      |
| ROS 2 + PyTorch | Manual glue      | DIY                   | Native     | Hard          |

Ark doesn't replace LeRobot — they complement each other. LeRobot is strong on the model zoo and Hugging Face Hub integration; Ark is stronger at sim/real switching and ROS-native deployment. Plenty of teams run both.

Installing Ark v1.5

Hardware requirements: a GPU with 8 GB+ VRAM (RTX 3060 or better), Ubuntu 22.04 (recommended), and Python 3.10. Apple Silicon Macs are supported via Python 3.11 plus conda-installed PyBullet.

# Workspace
mkdir Ark && cd Ark

# Conda env
conda create -n ark_env python=3.10 -y
conda activate ark_env

# Clone and install ark_framework
git clone https://github.com/Robotics-Ark/ark_framework.git
cd ark_framework
pip install -e .
cd ..

# Install ark_types (message definitions)
git clone https://github.com/Robotics-Ark/ark_types.git
cd ark_types
pip install -e .
cd ..

# Verify
ark --help

For macOS:

conda create -n ark_env python=3.11 -y
conda activate ark_env
git clone https://github.com/Robotics-Ark/ark_framework.git && cd ark_framework
pip install -e .
conda install -c conda-forge pybullet -y

Optional: install MuJoCo if you want higher-fidelity physics for contact-rich tasks:

pip install mujoco==3.2.0

For a deeper comparison of the two simulators, see MuJoCo vs Isaac Lab Deep Dive.

Example 1: Push task with ViperX300s

This is the original demo from the Ark paper. You will:

  1. Run a ViperX300s in PyBullet with a block on the table.
  2. Collect 50 demonstrations via keyboard teleop.
  3. Train an ACT policy.
  4. Run inference in sim, then switch to the real robot by changing one config line.

Step 1 — Create the environment

from ark.envs import make_env

# Sim mode
env = make_env(
    "viperx_push-v0",
    backend="pybullet",     # change to "real" when deploying
    render=True,
    obs_modalities=["joint_pos", "rgb"],
)

obs = env.reset(seed=42)
print(obs.keys())  # dict_keys(['joint_pos', 'rgb'])

Step 2 — Collect data via teleop

from ark.teleop import KeyboardTeleop

teleop = KeyboardTeleop(env)
recorder = env.recorder("data/push_demos.zarr")

for episode in range(50):
    obs = env.reset()
    recorder.start_episode()
    done = False
    while not done:
        action = teleop.get_action()      # (joint_pos_delta,)
        obs, reward, done, info = env.step(action)
        recorder.add_step(obs, action, reward)
    recorder.end_episode()

The output is a .zarr folder containing observations, actions, and rewards — the same format as lerobot-dataset, so you can reuse the data across frameworks.
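The article doesn't spell out the on-disk layout, but assuming the flat lerobot-dataset convention it references — all transitions concatenated into flat arrays, plus an episode-end index — slicing one episode back out looks like this (`get_episode` and the array names are illustrative; real data would be zarr/numpy arrays rather than lists):

```python
# Hypothetical flat layout: 12 transitions across 3 episodes.
actions = list(range(12))
episode_ends = [4, 9, 12]   # exclusive end index of each episode

def get_episode(i):
    """Slice episode i back out of the flat transition arrays."""
    start = 0 if i == 0 else episode_ends[i - 1]
    return actions[start:episode_ends[i]]

print(get_episode(1))  # [4, 5, 6, 7, 8] — the 5 transitions of episode 1
```

Storing one boundary index instead of per-episode files is what lets both Ark and LeRobot mmap the whole dataset as a handful of contiguous arrays.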

Step 3 — Train an ACT policy

Ark ships with YAML configs for ACT and Diffusion Policy. Read the detailed ACT walkthrough or the Diffusion Policy tutorial if these architectures are new to you.

ark train \
    --config configs/act_viperx_push.yaml \
    --dataset data/push_demos.zarr \
    --output checkpoints/act_push

act_viperx_push.yaml (trimmed):

algo: act
horizon: 16            # action chunk length
hidden_dim: 512
nheads: 8
enc_layers: 4
dec_layers: 6
batch_size: 64
epochs: 200
lr: 1e-4
optimizer: adamw
loss: l1
backbone: resnet18

On an RTX 4090, training 200 epochs on 50 episodes (~5,000 transitions) takes about 2 hours. On an RTX 3060, expect 6–8 hours.

Step 4 — Inference in simulation

from ark.policies import load_policy

policy = load_policy("checkpoints/act_push/best.ckpt")

obs = env.reset()
done = False
while not done:
    action = policy.predict(obs)
    obs, _, done, _ = env.step(action)
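Note that ACT predicts a chunk of `horizon: 16` actions per forward pass, not one. Whether Ark's `policy.predict` handles the chunking internally I can't confirm, but the standard trick from the ACT paper — temporal ensembling of overlapping chunks with weights w_i = exp(-m·i), where i = 0 is the oldest prediction — is a few lines (`temporal_ensemble` is my sketch, not Ark's API):

```python
import math

def temporal_ensemble(chunks, t, m=0.01):
    """Blend every action predicted for timestep t across overlapping
    chunks, using ACT-style weights w_i = exp(-m * i) with i = 0 for
    the oldest prediction.

    chunks: {start_t: [action_at_start_t, action_at_start_t+1, ...]}"""
    preds = sorted(                       # oldest chunk first
        (start, chunk[t - start])
        for start, chunk in chunks.items()
        if 0 <= t - start < len(chunk)
    )
    weights = [math.exp(-m * i) for i in range(len(preds))]
    return sum(w * a for w, (_, a) in zip(weights, preds)) / sum(weights)

# Two overlapping 4-step chunks that disagree slightly about timestep 2:
chunks = {0: [0.0, 0.1, 0.2, 0.3], 2: [0.25, 0.35, 0.45, 0.55]}
print(round(temporal_ensemble(chunks, t=2), 3))  # 0.225
```

With small m the blend is close to a plain average; larger m trusts older predictions more, smoothing out jitter between chunks at the cost of reaction speed.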

Step 5 — Switch to the real robot

This is the magic moment. Create configs/viperx_real.yaml:

backend: ros2
ros_topics:
  joint_pos: /viperx/joint_states
  command:   /viperx/joint_command
  rgb:       /camera/color/image_raw
gripper_topic: /viperx/gripper_command
control_freq: 30

Then:

env = make_env("viperx_push-v0", config="configs/viperx_real.yaml")
# Same observation space, same action space
action = policy.predict(env.reset())

No retraining, no rewriting the observation parser — Ark handles the conversion between simulator state and ROS messages for you. That's 200–500 lines of glue code you no longer have to maintain.
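One detail worth respecting on real hardware is the `control_freq: 30` line. I haven't verified whether Ark's env throttles the loop for you, so here is a minimal rate limiter you could wrap around the inference loop yourself (`run_at` is a hypothetical helper, not part of Ark):

```python
import time

def run_at(control_freq, step_fn, n_steps):
    """Call step_fn at most `control_freq` times per second."""
    period = 1.0 / control_freq
    for _ in range(n_steps):
        t0 = time.monotonic()
        step_fn()                                  # predict + env.step here
        leftover = period - (time.monotonic() - t0)
        if leftover > 0:                           # sleep off the remaining
            time.sleep(leftover)                   # budget of this period

start = time.monotonic()
run_at(30, step_fn=lambda: None, n_steps=6)
print(time.monotonic() - start >= 5 / 30)  # True: ~0.2 s for 6 ticks
```

Without this, a fast GPU will happily spam joint commands far above 30 Hz and the real controller's input buffer becomes your new problem.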


Example 2: Cloth manipulation with the OpenPyro-A1 humanoid

The second demo in the paper is cloth folding with the OpenPyro-A1 humanoid — a harder task because it needs bimanual coordination and tactile feedback. The pipeline is similar to Example 1 but:

  • The action space grows to 14-D (7 joints per arm).
  • Observations include tactile sensors and stereo RGB.
  • We use Diffusion Policy instead of ACT because the task is multimodal (fold horizontally, fold vertically, fold-in-half).

ark train --config configs/dp_openpyro_cloth.yaml \
          --dataset data/cloth_demos.zarr

The paper reports a 78% success rate on cloth folding after 100 demos and 86% on object handover after 80 demos — solid numbers for imitation learning at this level of task difficulty.

Going deeper on ROS 2 integration

Ark doesn't replace ROS — it wraps ROS in idiomatic Python. You can still:

  • Publish topics from outside Ark (Gazebo, MoveIt2, Nav2).
  • Treat an Ark policy as a node in your ROS launch file.
  • Combine Ark with existing ROS controllers (ros2_control hardware interfaces).

from ark.ros import RosBridge

bridge = RosBridge(node_name="ark_act_policy")
bridge.subscribe("/camera/color/image_raw", topic_type="sensor_msgs/Image")
bridge.publish("/arm_controller/follow_joint_trajectory", ...)

If you're new to ROS 2, read ROS 2 Introduction before this section.

Pitfalls and best practices

Bugs I've personally hit while reproducing the demos:

  1. Conda env conflicting with system ROS — always unset LD_LIBRARY_PATH before conda activate ark_env. The ROS 2 source script overrides the environment and forces PyBullet to load the wrong libs.
  2. Camera latency mismatch between sim and real — sim renders are instantaneous, real RealSense adds ~30ms. Train with obs_delay_ms: [0, 50] for robustness.
  3. Joint encoder offset — real-robot encoder zero won't match the URDF. Calibrate with ark calibrate --robot viperx before deployment.
  4. Gripper noise — ACT predicts continuous gripper positions, but the real gripper is binary. Apply a 0.5 threshold and a low-pass filter.
  5. Pub/sub buffer overflow — if you publish at 30Hz but the policy runs at 10Hz, the queue grows unbounded. Set qos_depth=5 on the subscriber.
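Pitfall 4 takes only a few lines to handle. A sketch of the threshold-plus-low-pass approach — the function name and the EMA coefficient `alpha` are mine, not Ark's:

```python
def binarize_gripper(raw_cmds, alpha=0.3, threshold=0.5):
    """Low-pass filter (EMA) the policy's continuous gripper output,
    then threshold it to the binary open/close the hardware accepts."""
    filtered, out = None, []
    for x in raw_cmds:
        # Exponential moving average: new value blended with history
        filtered = x if filtered is None else alpha * x + (1 - alpha) * filtered
        out.append(1 if filtered > threshold else 0)
    return out

# A one-step noise spike no longer chatters the gripper:
print(binarize_gripper([0.1, 0.2, 0.9, 0.2, 0.1]))  # [0, 0, 0, 0, 0]
# A sustained close command still gets through:
print(binarize_gripper([0.9, 0.9, 0.9]))            # [1, 1, 1]
```

Tune `alpha` against your demos: too low and the gripper lags real close commands, too high and the spikes come back.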

One important note: Ark v1.5 is still being actively refactored, so APIs may shift slightly between minor releases. Pin the version in your requirements.txt:

git+https://github.com/Robotics-Ark/[email protected]
git+https://github.com/Robotics-Ark/[email protected]

When to use Ark, when not to

Use Ark when:

  • You need to switch sim ↔ real many times during development.
  • Your lab runs many different robots (ViperX, UR5, Franka, humanoid) and needs a shared abstraction.
  • You want to integrate an ML policy into an existing ROS-based stack.
  • You prefer a Python-first developer experience over peak performance.

Skip Ark when:

  • You need a real-time control loop above 1kHz (use C++ ros2_control).
  • You're training RL at massive scale (10k+ parallel envs) — Isaac Lab is the better tool.
  • You already have a working LeRobot pipeline and don't need sim/real switching.
  • Your robot is exotic enough that no Ark driver exists yet — writing the driver may cost more than using raw ROS.

Wrap-up

Ark v1.5 tackles an old, persistent problem: the gap between the machine learning workflow (Python, Gym, datasets) and the robotics workflow (C++, ROS, hardware). It doesn't break ROS or directly compete with LeRobot — it bridges them in a way Vietnamese students can pick up in days rather than months.

If you're writing an imitation learning thesis, especially with a UR5e or ViperX already in the lab and you're drowning in glue code, Ark drastically cuts time-to-result. Start with the push task tutorial, benchmark on your own hardware, and then expand to your custom task.

