wholebody-vla, humanoid-vla, ros2, mcap, foxglove, lerobot, robodm, dataset, data-engineering

ROS 2 MCAP as the Raw Log Standard

Design humanoid VLA raw logs with MCAP: record, writer YAML, replay, Foxglove inspection, and LeRobot/Robo-DM export.

Nguyễn Anh Tuấn · June 10, 2026 · 15 min read

What this article gives you

In part 1, we split the pilot session into operator and data supervisor roles. In part 2, we chose the teleoperation stack that creates learnable actions. Part 3 handles the layer between the robot and the training dataset: the raw log.

A raw log is the original recording, before you force anything into a training schema. It must answer three questions: what did the robot see, what did the robot do, and when did every signal happen? If the raw log is weak, every later step has to guess. LeRobot can store observation.state, action, camera streams, and timestamp. Robo-DM targets scalable storage and loading for large multimodal robot trajectories. But before you convert into either format, you need a raw recording standard that can be replayed, inspected, and audited. For ROS 2 systems, the pragmatic answer is MCAP through rosbag2_storage_mcap.

If you are still building your mental model of ROS graphs, start with our ROS 2 introduction for robotics. If your next step is operating logs across a deployed fleet, the guide on monitoring ROS 2 robots remotely is a useful companion after the raw-log layer is stable.

This article starts with the basic command:

ros2 bag record -s mcap --all

Then it moves into McapWriterOptions YAML, and finally maps humanoid topics into a dataset schema. By the end, you should have a raw-log design that supports:

| Task | Tool |
| --- | --- |
| Record ROS 2 topics into .mcap | ros2 bag record -s mcap |
| Replay into a ROS graph | ros2 bag play -s mcap |
| Inspect topic, type, and timestamp coverage | ros2 bag info -s mcap, mcap info, Foxglove |
| View cameras, joint state, TF, and actions on one timeline | Foxglove |
| Export into training data | A reader that preserves original timestamps and writes LeRobot/Robo-DM |

The most important rule is: do not treat MCAP itself as your final training dataset. Treat it as the episode black box. The training schema may change, models may change, and you may add cameras later. The raw log should remain faithful enough that you can always return to it.

Why MCAP fits humanoid VLA data

MCAP is a container for timestamped pub/sub messages with arbitrary serialization. In the MCAP specification, a message record includes channel_id, sequence, log_time, publish_time, and data. For ROS 2, data is usually the CDR payload of the message, while the channel records the topic and message encoding. This is a good match for humanoids because a humanoid episode is not one video stream. It may include a head camera, left and right wrist cameras, joint state, IMU, force/torque, TF, teleop command, retargeted action, safety state, and a language prompt.
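To make those record fields concrete, here is a minimal Python sketch that summarizes per-topic coverage from (topic, log_time, publish_time) tuples. The `iter_mcap` helper assumes the `mcap` Python package is installed and relies on its `make_reader(...).iter_messages()` reader; `TopicStats` and `summarize` are hypothetical names chosen for this illustration, not part of any library.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple

Record = Tuple[str, int, int]  # (topic, log_time_ns, publish_time_ns)


@dataclass
class TopicStats:
    count: int = 0
    first_log_ns: int = 0
    last_log_ns: int = 0
    # Largest observed (log_time - publish_time); never goes below 0 here.
    max_log_minus_publish_ns: int = 0


def iter_mcap(path: str) -> Iterator[Record]:
    """Yield (topic, log_time, publish_time) for every message in an MCAP file.

    Assumes the `mcap` package is available. It is imported lazily so the
    summary logic below can run without it.
    """
    from mcap.reader import make_reader

    with open(path, "rb") as f:
        for _schema, channel, message in make_reader(f).iter_messages():
            yield channel.topic, message.log_time, message.publish_time


def summarize(records: Iterable[Record]) -> Dict[str, TopicStats]:
    """Aggregate message count, time coverage, and recorder lag per topic."""
    stats: Dict[str, TopicStats] = {}
    for topic, log_ns, pub_ns in records:
        s = stats.get(topic)
        if s is None:
            s = stats[topic] = TopicStats(first_log_ns=log_ns, last_log_ns=log_ns)
        s.count += 1
        s.first_log_ns = min(s.first_log_ns, log_ns)
        s.last_log_ns = max(s.last_log_ns, log_ns)
        s.max_log_minus_publish_ns = max(s.max_log_minus_publish_ns, log_ns - pub_ns)
    return stats
```

Calling `summarize(iter_mcap("episode_000123_raw_0.mcap"))` would give one `TopicStats` per topic, which is usually enough to spot a camera that stopped publishing halfway through an episode.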

rosbag2_storage_mcap is the MCAP storage plugin for rosbag2. The ROS 2 documentation shows that regular rosbag commands work with MCAP by adding --storage mcap or -s mcap: record, play, and info all use the same storage ID. The MCAP docs also note that Foxglove can open MCAP files containing ROS 2 data, while older SQLite .db3 bags are less self-contained for many external tooling workflows.

For a VLA data center, MCAP gives four practical advantages:

| Advantage | What it means for humanoid collection |
| --- | --- |
| More self-contained than .db3 | Tools such as Foxglove have more information for decoding messages, especially custom messages |
| Schema, channel, and message records | Exporters can separate sensors, actions, metadata, and quality signals cleanly |
| log_time and publish_time | You can preserve both recorder time and publisher time when available |
| Chunks, indexes, and compression | You can choose between robot-side throughput and post-session query speed |

A beginner does not need to learn the binary format first. You can start with ros2 bag record -s mcap, then use ros2 bag play, ros2 bag info, Foxglove, or an MCAP reader library. But you should understand the main options so you do not accidentally record files that are hard to seek, hard to validate, or hard to export.

Start with ros2 bag record -s mcap

Install the plugin first. $ROS_DISTRO is set once you have sourced your ROS 2 environment; otherwise, substitute your distribution name:

sudo apt-get install ros-$ROS_DISTRO-rosbag2-storage-mcap

Check that storage options are visible:

ros2 bag record --help | rg "storage|mcap"

If rg is not installed, use grep:

ros2 bag record --help | grep -E "storage|mcap"

The simplest recording command is:

mkdir -p ~/humanoid_logs

ros2 bag record -s mcap \
  -o ~/humanoid_logs/episode_000123_raw \
  --all

This is fine for a smoke test, but production recording should not blindly use --all. On a humanoid, --all may capture debug images, heavy point clouds, internal TF topics, high-rate log streams, and temporary test topics. Start with --all during the pilot, then lock the topic list:

ros2 bag record -s mcap \
  -o ~/humanoid_logs/episode_000123_raw \
  /clock \
  /tf \
  /tf_static \
  /robot_description \
  /humanoid/head_camera/image_raw \
  /humanoid/left_wrist_camera/image_raw \
  /humanoid/right_wrist_camera/image_raw \
  /humanoid/joint_states \
  /humanoid/imu \
  /humanoid/teleop/raw_command \
  /humanoid/control/action \
  /humanoid/safety/state \
  /humanoid/task/prompt
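One way to keep that topic list locked is to generate the command from a single reviewed list instead of typing it per session. The sketch below is a hypothetical helper (`build_record_cmd` is not part of rosbag2); it only assembles the flags already used in this article and refuses an empty or malformed topic list.

```python
import shlex
from typing import List, Optional, Sequence

# Topic list copied from the recording command above.
LOCKED_TOPICS = [
    "/clock", "/tf", "/tf_static", "/robot_description",
    "/humanoid/head_camera/image_raw",
    "/humanoid/left_wrist_camera/image_raw",
    "/humanoid/right_wrist_camera/image_raw",
    "/humanoid/joint_states", "/humanoid/imu",
    "/humanoid/teleop/raw_command", "/humanoid/control/action",
    "/humanoid/safety/state", "/humanoid/task/prompt",
]


def build_record_cmd(
    output_uri: str,
    topics: Sequence[str],
    storage_config: Optional[str] = None,
) -> List[str]:
    """Build a `ros2 bag record` argv for an explicit topic list (never --all)."""
    if not topics:
        raise ValueError("refusing to record with an empty topic list")
    bad = [t for t in topics if not t.startswith("/")]
    if bad:
        raise ValueError(f"topic names must be absolute: {bad}")
    if len(set(topics)) != len(topics):
        raise ValueError("duplicate topics in the locked list")
    cmd = ["ros2", "bag", "record", "-s", "mcap", "-o", output_uri]
    if storage_config:
        cmd += ["--storage-config-file", storage_config]
    return cmd + list(topics)


if __name__ == "__main__":
    # Print the command so it can be reviewed or pasted into a launch script.
    print(shlex.join(build_record_cmd("episode_000123_raw", LOCKED_TOPICS)))
```

Keeping the list in version control gives the data supervisor a diff to review whenever a topic is added or renamed.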

Use a stable naming layout from day one:

logs/
  2026-06-10/
    robot_g1_001/
      episode_000123_raw/
        episode_000123_raw_0.mcap
        metadata.yaml
        operator_notes.md
        calibration/

The metadata.yaml inside a rosbag2 folder is rosbag metadata. Do not overload it with all episode annotations. Put annotations beside the bag, or publish them as timestamped topics such as /humanoid/task/prompt, /humanoid/task/result, and /humanoid/operator/event. If annotations live on the timeline, the data supervisor can inspect them in Foxglove together with video and actions.

Replay and inspect before exporting

After recording, run three checks before writing any exporter.

First, inspect the bag:

ros2 bag info -s mcap ~/humanoid_logs/episode_000123_raw

You should see duration, message counts, and the topic list. If the camera topic has no messages or the action topic has zero count, do not export. Fix recording first.
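The same check can be automated once you have the per-topic message counts and the bag duration (for example, copied from the `ros2 bag info` output). This is a minimal sketch: `check_topic_rates` is a hypothetical helper, and the minimum rates you pass in are your own assumptions about each sensor, not values defined by rosbag2.

```python
from typing import Dict, List


def check_topic_rates(
    counts: Dict[str, int],
    duration_s: float,
    min_hz: Dict[str, float],
) -> List[str]:
    """Return human-readable problems; an empty list means the bag passes."""
    problems: List[str] = []
    for topic, required_hz in min_hz.items():
        n = counts.get(topic, 0)
        if n == 0:
            problems.append(f"{topic}: no messages")
        elif duration_s > 0 and n / duration_s < required_hz:
            problems.append(f"{topic}: {n / duration_s:.1f} Hz < {required_hz:.1f} Hz")
    return problems


if __name__ == "__main__":
    # Illustrative numbers only: a 20 s bag with a slow camera and no actions.
    issues = check_topic_rates(
        counts={"/humanoid/head_camera/image_raw": 300},
        duration_s=20.0,
        min_hz={
            "/humanoid/head_camera/image_raw": 25.0,
            "/humanoid/control/action": 15.0,
        },
    )
    for line in issues:
        print(line)
```

If the list is non-empty, stop and fix the recording before writing any exporter code against that bag.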

Replay into ROS 2:

ros2 bag play -s mcap ~/humanoid_logs/episode_000123_raw

You can also point ros2 bag play at a single .mcap file instead of the rosbag2 folder. Keep -s mcap so the storage plugin is stated explicitly:

ros2 bag play -s mcap ./episode_000123_raw_0.mcap

One practical caveat: replaying custom messages through ROS 2 still requires the matching message package and type support in the local workspace. MCAP stores schemas that help external tools such as Foxglove, but the ROS 2 player still publishes serialized messages through ROS 2 type support. A data center should therefore version the Docker image or workspace used for collection and replay.

Inspect in Foxglove:

  1. Open Foxglove.
  2. Open a local file or drag the MCAP/folder into the app.
  3. Open Topics and confirm the expected topics exist.
  4. Add Raw Messages for /humanoid/control/action.
  5. Add Image panels for camera topics.
  6. Add Plot panels for joint positions, velocities, action norm, and safety state.
  7. Add a 3D panel if TF and robot model data are present.

Foxglove's documentation describes opening MCAP files locally and exploring topics on a timeline. This is exactly why MCAP should be the raw-log standard: the data supervisor can reject broken episodes before any training export is run.

How to use McapWriterOptions YAML

rosbag2_storage_mcap lets you pass --storage-config-file to configure mcap::McapWriterOptions. A reasonable production starting point is:

# mcap_writer_options.yaml
noChunkCRC: false
noAttachmentCRC: false
enableDataCRC: false
noSummaryCRC: false
noChunking: false
noMessageIndex: false
noSummary: false
chunkSize: 4194304
compression: "Zstd"
compressionLevel: "Fast"
forceCompression: false

Record with the config:

ros2 bag record -s mcap \
  -o ~/humanoid_logs/episode_000123_raw \
  --storage-config-file mcap_writer_options.yaml \
  /tf /tf_static /humanoid/joint_states /humanoid/control/action

The important options:

| Option | Beginner explanation | Suggested default |
| --- | --- | --- |
| noChunking | If true, the writer does not put records into chunks. This may reduce write overhead, but you lose much of the indexing/compression behavior | false for long-term raw logs |
| chunkSize | Target uncompressed chunk size | 4 MB is a good starting point |
| compression | None, Lz4, or Zstd | Zstd for storage, Lz4 if CPU is tight |
| compressionLevel | Faster writing or smaller files | Fast or Fastest on the robot |
| noChunkCRC | Disables chunk CRC checks | false when you care about corruption checks |
| enableDataCRC | Enables CRC for the whole data section, useful when not chunking | false when chunking is enabled |
| noMessageIndex | Disables message indexes | false, because time/topic export needs indexes |
| noSummary | Disables the summary section | false, because tooling and seeking benefit from it |
| forceCompression | Compresses every chunk even if not beneficial | false |
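To get a feel for what a 4 MB chunkSize means in practice, a rough back-of-the-envelope calculation helps. The sketch below assumes three uncompressed 640x480 RGB cameras at 30 FPS; those resolutions and rates are illustrative assumptions, not values from this pipeline, and `chunks_per_second` is a hypothetical helper.

```python
def chunks_per_second(bytes_per_second: float, chunk_size: int = 4194304) -> float:
    """Approximate how many chunks the writer fills per second of recording."""
    return bytes_per_second / chunk_size


# Assumed camera setup: 3 cameras, 640x480 pixels, 3 bytes per pixel, 30 FPS.
BYTES_PER_IMAGE = 640 * 480 * 3
CAMERA_BYTES_PER_SECOND = 3 * BYTES_PER_IMAGE * 30

if __name__ == "__main__":
    rate = chunks_per_second(CAMERA_BYTES_PER_SECOND)
    print(f"{CAMERA_BYTES_PER_SECOND / 1e6:.1f} MB/s -> {rate:.1f} chunks/s")
```

Under these assumptions the cameras alone fill roughly twenty chunks per second, so each chunk covers only a few dozen milliseconds of data. A much larger chunkSize would mean fewer, coarser chunks; measure on your own robot before changing it.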

The ROS 2 MCAP docs also provide presets such as fastwrite and zstd_small. fastwrite is optimized for high write throughput by using settings like no chunking and no summary CRC, but the docs warn that it is not ideal as a long-term storage format unless you post-process it, because seeking and reading subsets of topics can be limited. For humanoid data collection, a practical policy is:

| Situation | Configuration |
| --- | --- |
| CPU-constrained robot during safety pilot | --storage-preset-profile fastwrite, then convert/compress immediately after collection |
| Normal production collection | YAML with Zstd/Fast, chunking on, indexes on |
| Archive after QA | Convert to zstd_small or compress offline |

Example post-session conversion:

# convert_to_archive.yaml
output_bags:
  - uri: episode_000123_archive
    storage_id: mcap
    storage_preset_profile: zstd_small

Run the conversion with that file:

ros2 bag convert \
  -i ~/humanoid_logs/episode_000123_raw \
  -o convert_to_archive.yaml

Map humanoid topics into dataset schema

Do not start by coding the exporter. Start with a topic table. A good schema says which topics are observations, actions, instructions, metadata, and quality signals.

| ROS 2 topic | Example message type | Role in raw log | Suggested dataset field |
| --- | --- | --- | --- |
| /humanoid/head_camera/image_raw | sensor_msgs/msg/Image | Ego/head camera | observation.images.head |
| /humanoid/left_wrist_camera/image_raw | sensor_msgs/msg/Image | Left wrist camera | observation.images.left_wrist |
| /humanoid/right_wrist_camera/image_raw | sensor_msgs/msg/Image | Right wrist camera | observation.images.right_wrist |
| /humanoid/joint_states | sensor_msgs/msg/JointState | Proprioception | observation.state |
| /humanoid/imu | sensor_msgs/msg/Imu | Base orientation and acceleration | observation.imu or part of state |
| /tf, /tf_static | tf2_msgs/msg/TFMessage | Frame tree | Calibration/metadata, not always direct training input |
| /humanoid/teleop/raw_command | custom/msg | Human operator input | teleop.raw or debug-only field |
| /humanoid/control/action | custom/msg or Float32MultiArray | Actual command sent to the controller | action |
| /humanoid/safety/state | custom/msg | E-stop, mode, fault | episode.quality, filters |
| /humanoid/task/prompt | std_msgs/msg/String | Language instruction | task, language_instruction |
| /humanoid/task/event | custom/msg | Start, success, fail, discard | Episode boundaries and labels |

The LeRobot v3 documentation describes robot learning datasets with multimodal time-series data, sensorimotor signals, multi-camera video, and metadata. Its API examples expose keys such as observation.state, action, observation.images.front_left, and timestamp. If your goal is to train with LeRobot, you do not need unusual field names. Map your MCAP topics into the common conventions first.
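The topic table can live in code as a plain mapping that the exporter validates before it touches any data. The following is a sketch under that assumption: it covers only the topics that map one-to-one to a training field, and `missing_topics` and `duplicate_fields` are hypothetical helper names.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Subset of the topic table that maps directly to a training field.
TOPIC_TO_FIELD: Dict[str, str] = {
    "/humanoid/head_camera/image_raw": "observation.images.head",
    "/humanoid/left_wrist_camera/image_raw": "observation.images.left_wrist",
    "/humanoid/right_wrist_camera/image_raw": "observation.images.right_wrist",
    "/humanoid/joint_states": "observation.state",
    "/humanoid/control/action": "action",
    "/humanoid/task/prompt": "task",
}


def missing_topics(
    recorded: Iterable[str], mapping: Dict[str, str] = TOPIC_TO_FIELD
) -> List[str]:
    """Topics the exporter needs that are absent from the recorded bag."""
    return sorted(set(mapping) - set(recorded))


def duplicate_fields(mapping: Dict[str, str] = TOPIC_TO_FIELD) -> List[str]:
    """Dataset fields claimed by more than one topic (a schema bug)."""
    counts = Counter(mapping.values())
    return sorted(field for field, n in counts.items() if n > 1)
```

Running both checks at the start of an export turns a silent schema drift (a renamed topic, a copied line in the mapping) into an immediate, explicit failure.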

A training frame after export may look like:

sample = {
    "episode_index": 123,
    "frame_index": 42,
    "timestamp": 8.400,  # seconds from episode start
    "observation.state": joint_state_vector,
    "observation.images.head": head_rgb,
    "observation.images.left_wrist": left_wrist_rgb,
    "observation.images.right_wrist": right_wrist_rgb,
    "action": action_vector,
    "task": "Pick up the blue bin and place it on the shelf.",
}

Timestamp policy is where many exporters quietly break. An MCAP message has log_time and publish_time in nanoseconds. A ROS message may also have header.stamp. You need an explicit policy:

| Timestamp | Use when | Risk |
| --- | --- | --- |
| header.stamp | The sensor driver timestamps capture time correctly | Some custom actions do not have a header |
| MCAP publish_time | You want the time the node published the message | It may lag capture time after processing |
| MCAP log_time | You want the time the recorder received the message | It reflects network and recorder load |

Recommended beginner policy:

  1. If a message has a trustworthy header.stamp, use it as semantic time.
  2. Always keep MCAP log_time for debugging recorder delay.
  3. Write timestamp = (t - episode_start_t) / 1e9 when exporting to LeRobot.
  4. Do not modify or resample the raw log. Interpolation belongs in the exporter.
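The policy above can be sketched in a few lines of Python. The names `StampedSample`, `make_sample`, and `to_episode_seconds` are hypothetical, and falling back to log_time when no trustworthy header exists is an assumption of this sketch rather than a rule from the MCAP or LeRobot documentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StampedSample:
    semantic_ns: int  # time used for alignment
    log_time_ns: int  # MCAP log_time, always kept for debugging

    def recorder_delay_ms(self) -> float:
        """How much later the recorder logged the message than its semantic time."""
        return (self.log_time_ns - self.semantic_ns) / 1e6


def make_sample(
    log_time_ns: int,
    header_stamp_ns: Optional[int] = None,
    trust_header: bool = False,
) -> StampedSample:
    """Rule 1: prefer a trusted header.stamp. Rule 2: always keep log_time."""
    if trust_header and header_stamp_ns is not None:
        return StampedSample(header_stamp_ns, log_time_ns)
    # Assumption: use log_time as semantic time when no trusted header exists.
    return StampedSample(log_time_ns, log_time_ns)


def to_episode_seconds(t_ns: int, episode_start_ns: int) -> float:
    """Rule 3: seconds from episode start, as written to the exported dataset."""
    return (t_ns - episode_start_ns) / 1e9
```

Keeping both values on every sample means a later audit can plot `recorder_delay_ms()` per topic and see where recorder load or network delay crept in.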

Episode boundaries should be timestamped

A common mistake is to treat one bag folder as one good demonstration. On a humanoid, the operator may spend 10 seconds preparing, testing the gripper, resetting stance, and only then starting the task. You need episode boundaries.

The simplest approach is a timestamped event topic:

/humanoid/task/event
  stamp: 2026-06-10T10:01:02.123Z
  episode_id: "episode_000123"
  event_type: "START" | "SUCCESS" | "FAIL" | "DISCARD" | "RESET"
  note: "left wrist camera bumped at 00:13"

The exporter can cut from START to SUCCESS, ignore preparation and reset time, and mark failed episodes if it sees FAIL or DISCARD. During the pilot, you can publish JSON in std_msgs/msg/String. In production, define a real message type so Foxglove and the exporter can validate fields.
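As a sketch of that cutting logic, the Python below turns a list of decoded events into episode windows. `Event`, `Window`, and `cut_windows` are hypothetical names, and the handling of edge cases (a repeated START restarts the window, RESET is ignored) is one possible choice, not a prescribed behavior.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

TERMINAL = {"SUCCESS", "FAIL", "DISCARD"}


@dataclass(frozen=True)
class Event:
    t_ns: int
    episode_id: str
    event_type: str  # START | SUCCESS | FAIL | DISCARD | RESET


@dataclass(frozen=True)
class Window:
    episode_id: str
    start_ns: int
    end_ns: int
    outcome: str  # the terminal event type that closed the window


def cut_windows(events: Iterable[Event]) -> List[Window]:
    """Pair each START with the next terminal event, in time order."""
    windows: List[Window] = []
    open_start: Optional[Event] = None
    for ev in sorted(events, key=lambda e: e.t_ns):
        if ev.event_type == "START":
            open_start = ev  # a repeated START restarts the window
        elif ev.event_type in TERMINAL and open_start is not None:
            windows.append(
                Window(open_start.episode_id, open_start.t_ns, ev.t_ns, ev.event_type)
            )
            open_start = None
        # RESET and terminal events outside a window are ignored here.
    return windows


def training_windows(windows: Iterable[Window]) -> List[Window]:
    """Keep only successful demonstrations for the default train split."""
    return [w for w in windows if w.outcome == "SUCCESS"]
```

Failed and discarded windows are still returned by `cut_windows`, so they can be labeled and kept for analysis instead of silently disappearing.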

Export to LeRobot without losing time

The minimal pipeline is:

MCAP raw log
  -> reader iterates by topic/time
  -> choose episode window
  -> decode cameras/state/action
  -> align on action or camera clock
  -> write LeRobot Parquet/MP4/metadata

Pseudo-code:

episode_start_ns = find_event("START").timestamp_ns
episode_end_ns = find_event("SUCCESS").timestamp_ns

frames = []
for action_msg in iter_topic("/humanoid/control/action", episode_start_ns, episode_end_ns):
    t_ns = choose_timestamp(action_msg)
    frame = {
        "timestamp": (t_ns - episode_start_ns) / 1e9,
        "observation.state": nearest("/humanoid/joint_states", t_ns),
        "observation.images.head": nearest_frame("/humanoid/head_camera/image_raw", t_ns),
        "observation.images.left_wrist": nearest_frame("/humanoid/left_wrist_camera/image_raw", t_ns),
        "observation.images.right_wrist": nearest_frame("/humanoid/right_wrist_camera/image_raw", t_ns),
        "action": decode_action(action_msg),
        "task": current_prompt(t_ns),
    }
    frames.append(frame)

Here, action is the clock because imitation learning usually learns "observation at time t -> action at time t". If cameras run at 30 FPS and control runs at 20 Hz, choose the nearest camera frame for each action. If the policy trains at 10 Hz, downsample after alignment. Never downsample the MCAP raw log.
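The nearest-frame lookup from the pseudo-code can be written with a binary search over sorted timestamps. This is a minimal sketch: `nearest_index` and `align` are hypothetical helpers, timestamps are integer nanoseconds, and the 50 ms default gap is only an example threshold.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple


def nearest_index(sorted_ts: Sequence[int], t_ns: int) -> Optional[int]:
    """Index of the timestamp closest to t_ns (ties go to the earlier one)."""
    if not sorted_ts:
        return None
    i = bisect_left(sorted_ts, t_ns)
    if i == 0:
        return 0
    if i == len(sorted_ts):
        return len(sorted_ts) - 1
    before, after = sorted_ts[i - 1], sorted_ts[i]
    return i if (after - t_ns) < (t_ns - before) else i - 1


def align(
    action_ts: Sequence[int],
    camera_ts: Sequence[int],
    max_gap_ns: int = 50_000_000,
) -> List[Tuple[int, int]]:
    """Pair each action index with its nearest camera index.

    Actions whose nearest frame is further away than max_gap_ns are dropped
    instead of being paired with a stale image.
    """
    pairs: List[Tuple[int, int]] = []
    for a_i, t in enumerate(action_ts):
        c_i = nearest_index(camera_ts, t)
        if c_i is not None and abs(camera_ts[c_i] - t) <= max_gap_ns:
            pairs.append((a_i, c_i))
    return pairs
```

Downsampling then happens on the aligned pairs, for example `pairs[::2]` to go from 20 Hz to 10 Hz, which leaves the raw log untouched.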

Timestamp checklist:

| Check | Pass condition |
| --- | --- |
| Monotonic timestamps | timestamp increases within the episode |
| Camera-action gap | Nearest image is within a threshold, for example 50 ms |
| Header vs log delta | Sensor header.stamp is not unexpectedly far from MCAP log_time |
| Dropped frames | Frame indices do not skip excessively |
| Safety filter | Episodes with E-stop or fault are excluded from the default train split |
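Several of these checks reduce to simple functions over timestamp lists. The sketch below shows three of them; the function names are hypothetical, and the 1.5x tolerance for detecting dropped frames is an assumed starting value to tune per sensor.

```python
from typing import Sequence, Tuple


def is_strictly_increasing(ts: Sequence[float]) -> bool:
    """Monotonic timestamps: every value is larger than the previous one."""
    return all(b > a for a, b in zip(ts, ts[1:]))


def count_long_gaps(
    ts_ns: Sequence[int], expected_period_ns: int, tolerance: float = 1.5
) -> int:
    """Dropped frames: count gaps longer than tolerance * expected period."""
    limit = tolerance * expected_period_ns
    return sum(1 for a, b in zip(ts_ns, ts_ns[1:]) if b - a > limit)


def max_header_log_delta_ms(pairs: Sequence[Tuple[int, int]]) -> float:
    """Header vs log delta: largest |log_time - header.stamp| in milliseconds.

    Each pair is (header_stamp_ns, log_time_ns).
    """
    return max((abs(log - hdr) for hdr, log in pairs), default=0) / 1e6
```

An exporter could run these per episode and write the results into a small QA report next to the derived dataset, so a reviewer sees why an episode was rejected.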

Export to Robo-DM while keeping MCAP as source of truth

The Robo-DM paper frames the larger problem: robot datasets include video, text, numerical modalities, and multiple camera streams, which makes curation, distribution, and loading difficult. Robo-DM proposes an efficient open-source data management toolkit using self-contained EBML-based storage, strong compression, and faster retrieval. The paper also emphasizes preserving original timestamps so alignment does not rely on fragile heuristics.

That does not mean your first pilot must record directly into Robo-DM. For a ROS 2 humanoid stack, a clean architecture is:

Robot ROS 2 graph
  -> immutable MCAP raw log
  -> QA report
  -> Robo-DM trajectory store for large-scale training/retrieval

MCAP keeps the original record for ROS 2 replay and Foxglove inspection. Robo-DM or LeRobot becomes a derived training artifact. When an evaluation run fails, you return to the MCAP file to inspect original action, original camera frames, original TF, and original safety events.

If you are starting as a beginner, use this config:

# pilot_mcap_writer_options.yaml
noChunkCRC: false
noAttachmentCRC: false
enableDataCRC: false
noSummaryCRC: false
noChunking: false
noMessageIndex: false
noSummary: false
chunkSize: 4194304
compression: "Zstd"
compressionLevel: "Fast"
forceCompression: false

Record:

ros2 bag record -s mcap \
  -o ~/humanoid_logs/$(date +%Y%m%d_%H%M%S)_pilot \
  --storage-config-file pilot_mcap_writer_options.yaml \
  /clock /tf /tf_static /robot_description \
  /humanoid/head_camera/image_raw \
  /humanoid/left_wrist_camera/image_raw \
  /humanoid/right_wrist_camera/image_raw \
  /humanoid/joint_states \
  /humanoid/imu \
  /humanoid/teleop/raw_command \
  /humanoid/control/action \
  /humanoid/safety/state \
  /humanoid/task/prompt \
  /humanoid/task/event

After each episode:

ros2 bag info -s mcap ~/humanoid_logs/20260610_100102_pilot
ros2 bag play -s mcap ~/humanoid_logs/20260610_100102_pilot --rate 0.5

Then open the bag in Foxglove and have the data supervisor mark:

| Item | Pass/Fail |
| --- | --- |
| All three cameras are visible throughout the task window | |
| Joint state and action frequency are stable | |
| Prompt matches the task | |
| No E-stop occurred in the training segment | |
| Replay does not fail because of missing message types | |
| Exporter writes monotonically increasing timestamps | |

Common mistakes

Recording only MP4 video and no action. Video is useful for review, but a policy needs actions. Record the actual action topic sent to the controller.

Recording only normalized actions. Raw logs should preserve physical meaning where possible: radians, meters, Newtons, gripper range, and metadata. Normalization is an exporter step.

Changing topic names between pilot days. If today is /humanoid/control/action and tomorrow is /action, the exporter accumulates special cases. Stabilize naming early.

Disabling indexes to save a small amount of space and forgetting post-processing. A data center needs seeking by time and topic. noMessageIndex: false and noSummary: false are sane defaults.

Skipping start/success/fail events. Without boundaries, you may train on preparation, operator discussion, and reset behavior.

Trusting one timestamp without audit. Keep semantic time from header.stamp when available and recorder time from MCAP. When alignment fails, the difference between those clocks is the clue.

Conclusion

MCAP does not replace LeRobot or Robo-DM. It sits before them. Its job is to preserve raw truth: which topics were published, which message types were used, which timestamps were recorded, which actions were sent, which cameras were active, and which safety events occurred. If you design this layer correctly, a humanoid episode can be replayed with ros2 bag play -s mcap, inspected in Foxglove, exported to LeRobot for fast training, or moved into Robo-DM for large-scale management.

In part 4, we will move one step downstream: designing the LeRobot/Robo-DM exporter, train/validation splits, video storage, metadata, and statistics without turning a clean raw log into an opaque training artifact.
