Tags: ai, ai-perception, nvidia, tutorial

Deploy YOLOv8 to Jetson Orin Nano in 30 Minutes

Step-by-step guide to deploying YOLOv8 with TensorRT on the NVIDIA Jetson Orin Nano — from model export to real-time 60 FPS inference.

Nguyen Anh Tuan · February 15, 2026 · 8 min read

Deploy YOLOv8 to Jetson Orin Nano — Why Does It Matter?

Deploying YOLOv8 to the Jetson Orin Nano is the critical step for bringing AI perception from your laptop into real robots. If you are building mobile robots, drones, or quality inspection systems, you need real-time inference directly on the edge device — you cannot afford a 200 ms cloud round trip per frame.
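A quick latency-budget calculation makes the point concrete. The 200 ms figure is the cloud round trip mentioned above; the 30 FPS control target is an assumption for illustration:

```python
# Illustrative latency budget: edge vs cloud inference.
# Assumptions: 200 ms cloud round trip, 30 FPS minimum for smooth control.
CLOUD_RTT_MS = 200
TARGET_FPS = 30

frame_budget_ms = 1000 / TARGET_FPS      # time available to process one frame
cloud_fps_ceiling = 1000 / CLOUD_RTT_MS  # best case if every frame goes to the cloud

print(f"Per-frame budget at {TARGET_FPS} FPS: {frame_budget_ms:.1f} ms")
# Per-frame budget at 30 FPS: 33.3 ms
print(f"FPS ceiling with a {CLOUD_RTT_MS} ms round trip: {cloud_fps_ceiling:.0f} FPS")
# FPS ceiling with a 200 ms round trip: 5 FPS
```

Even ignoring inference time entirely, the network alone caps you at 5 FPS — an order of magnitude short of the budget.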

In this tutorial, I will walk you through everything from A to Z: hardware setup, model export, TensorRT optimization, building a camera pipeline with GStreamer, and finally wrapping it as a ROS 2 node ready to run on your robot.

[Image: NVIDIA Jetson Orin Nano developer kit]

Prerequisites — What Do You Need?

Hardware

| Device | Notes |
| --- | --- |
| NVIDIA Jetson Orin Nano 8GB | Developer Kit or module + carrier board |
| MicroSD 64GB+ (UHS-I) | Or NVMe SSD for better performance |
| USB camera or CSI camera | Logitech C920 or IMX219 CSI |
| 5V/4A USB-C power supply | Included with Dev Kit |

Software

# Install JetPack (if not flashed)
sudo apt update && sudo apt install -y nvidia-jetpack

# Verify CUDA
nvcc --version
# cuda_12.2.r12.2/compiler.33191640_0

# Install Ultralytics
pip3 install ultralytics

# Verify PyTorch + CUDA
python3 -c "import torch; print(f'CUDA: {torch.cuda.is_available()}, Device: {torch.cuda.get_device_name(0)}')"
# CUDA: True, Device: Orin (nvgpu)

Step 1: Export YOLOv8 to TensorRT Engine

This is the most critical step. TensorRT will optimize the model for the Ampere GPU on Orin Nano, including layer fusion, kernel auto-tuning, and precision calibration.

Export FP16 (Recommended)

from ultralytics import YOLO

# Load pretrained model (or your custom trained model)
model = YOLO("yolov8n.pt")  # nano version, optimized for edge

# Export to TensorRT FP16
# This process takes 5-10 minutes on Orin Nano
model.export(
    format="engine",
    device=0,
    half=True,        # FP16 precision
    imgsz=640,        # Input resolution
    workspace=4,      # GB workspace for TensorRT optimizer
    verbose=True
)
print("Export done! File: yolov8n.engine")

Export INT8 (Maximum FPS)

INT8 requires a calibration dataset — a set of 200-500 images representative of real-world data:

from ultralytics import YOLO

model = YOLO("yolov8n.pt")

# Export INT8 with calibration
model.export(
    format="engine",
    device=0,
    int8=True,
    data="coco128.yaml",  # Dataset for calibration
    imgsz=640,
    workspace=4
)

Important Note: The .engine file is specific to the GPU architecture it was built on. An engine built on an Orin Nano will not run on an RTX 4090, and vice versa. Always export directly on your target device.
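One way to guard against stale or missing engines is a small helper that re-exports on the current device when needed. This is a sketch under the assumption that Ultralytics writes the engine next to the checkpoint with the same stem; engine_path_for and ensure_engine are hypothetical names, not part of the library:

```python
from pathlib import Path

def engine_path_for(pt_path: str) -> Path:
    # Assumption: the exported engine lands next to the checkpoint, same stem.
    return Path(pt_path).with_suffix(".engine")

def ensure_engine(pt_path: str = "yolov8n.pt") -> Path:
    """Re-export on this device if no engine exists yet (engines are not portable)."""
    engine = engine_path_for(pt_path)
    if not engine.exists():
        from ultralytics import YOLO  # lazy import: only needed when exporting
        YOLO(pt_path).export(format="engine", device=0, half=True, imgsz=640)
    return engine
```

Call ensure_engine() once at startup; the first boot on a new device pays the 5-10 minute export cost, and every boot after that loads the cached engine.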

Step 2: Benchmark — FP32 vs FP16 vs INT8

I benchmarked on Jetson Orin Nano 8GB, power mode MAXN (15W), with jetson_clocks enabled:

| Format | Model | FPS (640x640) | Latency | mAP@50 | GPU Memory |
| --- | --- | --- | --- | --- | --- |
| PyTorch FP32 | yolov8n | 15 FPS | 67 ms | 37.3 | 1.2 GB |
| TensorRT FP32 | yolov8n | 32 FPS | 31 ms | 37.3 | 680 MB |
| TensorRT FP16 | yolov8n | 58 FPS | 17 ms | 37.2 | 420 MB |
| TensorRT INT8 | yolov8n | 72 FPS | 14 ms | 36.8 | 310 MB |
| PyTorch FP32 | yolov8s | 8 FPS | 125 ms | 44.9 | 2.1 GB |
| TensorRT FP16 | yolov8s | 35 FPS | 29 ms | 44.8 | 780 MB |

Conclusion: FP16 is the sweet spot — nearly a 4x speedup over PyTorch with almost no accuracy loss. Use INT8 only when you need more than 60 FPS or are running multiple models simultaneously.

# Quick benchmark on Jetson
yolo benchmark model=yolov8n.engine imgsz=640 half=True device=0

Step 3: Complete Python Inference Script

"""
YOLOv8 TensorRT Inference on Jetson Orin Nano
Optimized for real-time robot perception
"""
import cv2
import time
import numpy as np
from ultralytics import YOLO
from collections import deque

class JetsonDetector:
    def __init__(self, model_path: str = "yolov8n.engine", conf: float = 0.5):
        self.model = YOLO(model_path)
        self.conf = conf
        self.fps_buffer = deque(maxlen=30)

    def detect(self, frame: np.ndarray) -> list[dict]:
        """Detect objects and return list of detections."""
        start = time.perf_counter()

        results = self.model.predict(
            frame,
            conf=self.conf,
            verbose=False,
            half=True
        )

        elapsed = time.perf_counter() - start
        self.fps_buffer.append(1.0 / elapsed)

        detections = []
        for box in results[0].boxes:
            det = {
                "class_id": int(box.cls[0]),
                "class_name": results[0].names[int(box.cls[0])],
                "confidence": float(box.conf[0]),
                "bbox": box.xyxy[0].cpu().numpy().tolist(),  # [x1, y1, x2, y2]
            }
            detections.append(det)

        return detections

    @property
    def fps(self) -> float:
        return np.mean(self.fps_buffer) if self.fps_buffer else 0.0


# --- Main ---
if __name__ == "__main__":
    detector = JetsonDetector("yolov8n.engine", conf=0.45)
    cap = cv2.VideoCapture(0)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        detections = detector.detect(frame)

        # Draw bounding boxes
        for det in detections:
            x1, y1, x2, y2 = map(int, det["bbox"])
            label = f'{det["class_name"]} {det["confidence"]:.2f}'
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(frame, label, (x1, y1 - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

        # FPS overlay
        cv2.putText(frame, f"FPS: {detector.fps:.1f}", (10, 30),
                     cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)

        cv2.imshow("YOLOv8 Jetson", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()

Step 4: Camera Pipeline with GStreamer

USB cameras work through OpenCV directly, but CSI cameras need a GStreamer pipeline for hardware-accelerated capture on Jetson. An explicit GStreamer pipeline is also worth using for USB cameras, since it gives you control over caps negotiation and frame dropping.

CSI Camera (IMX219/IMX477)

def gstreamer_pipeline(
    sensor_id=0,
    capture_width=1280,
    capture_height=720,
    display_width=1280,
    display_height=720,
    framerate=60,
    flip_method=0,
):
    """Build GStreamer pipeline string for CSI camera on Jetson."""
    return (
        f"nvarguscamerasrc sensor-id={sensor_id} ! "
        f"video/x-raw(memory:NVMM), width=(int){capture_width}, "
        f"height=(int){capture_height}, framerate=(fraction){framerate}/1 ! "
        f"nvvidconv flip-method={flip_method} ! "
        f"video/x-raw, width=(int){display_width}, height=(int){display_height}, "
        f"format=(string)BGRx ! videoconvert ! "
        f"video/x-raw, format=(string)BGR ! appsink drop=1"
    )

# Usage
cap = cv2.VideoCapture(gstreamer_pipeline(), cv2.CAP_GSTREAMER)

USB Camera (Optimized with V4L2)

def usb_gstreamer_pipeline(device="/dev/video0", width=1280, height=720, fps=30):
    """GStreamer pipeline for USB camera, decode on GPU."""
    return (
        f"v4l2src device={device} ! "
        f"video/x-raw, width={width}, height={height}, framerate={fps}/1 ! "
        f"videoconvert ! video/x-raw, format=BGR ! appsink drop=1"
    )

cap = cv2.VideoCapture(usb_gstreamer_pipeline(), cv2.CAP_GSTREAMER)

Multi-camera Setup

# Run 2 cameras in parallel
cap_left = cv2.VideoCapture(gstreamer_pipeline(sensor_id=0), cv2.CAP_GSTREAMER)
cap_right = cv2.VideoCapture(gstreamer_pipeline(sensor_id=1), cv2.CAP_GSTREAMER)

# Or use threading to avoid blocking
import threading

class CameraThread:
    def __init__(self, pipeline):
        self.cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
        self.frame = None
        self.running = True
        self.thread = threading.Thread(target=self._update, daemon=True)
        self.thread.start()

    def _update(self):
        while self.running:
            ret, frame = self.cap.read()
            if ret:  # keep the last good frame instead of overwriting with None
                self.frame = frame

    def read(self):
        return self.frame

    def stop(self):
        self.running = False
        self.cap.release()

[Image: Camera pipeline for AI perception on mobile robots]

Step 5: Deploy as ROS 2 Node

This is the final step — wrap the detector as a ROS 2 node to integrate into your robot stack.

"""
ROS 2 YOLOv8 Detection Node for Jetson Orin Nano
Publish detections to /yolo/detections topic
"""
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from vision_msgs.msg import Detection2DArray, Detection2D, ObjectHypothesisWithPose
from cv_bridge import CvBridge
import cv2
import numpy as np
from ultralytics import YOLO


class YoloDetectionNode(Node):
    def __init__(self):
        super().__init__("yolo_detection_node")

        # Parameters
        self.declare_parameter("model_path", "yolov8n.engine")
        self.declare_parameter("confidence", 0.5)
        self.declare_parameter("input_topic", "/camera/image_raw")

        model_path = self.get_parameter("model_path").value
        self.conf = self.get_parameter("confidence").value
        input_topic = self.get_parameter("input_topic").value

        # Load TensorRT model
        self.model = YOLO(model_path)
        self.bridge = CvBridge()

        # Subscriber & Publisher
        self.sub = self.create_subscription(
            Image, input_topic, self.image_callback, 10
        )
        self.pub = self.create_publisher(Detection2DArray, "/yolo/detections", 10)
        self.img_pub = self.create_publisher(Image, "/yolo/annotated", 10)

        self.get_logger().info(
            f"YOLOv8 node started — model: {model_path}, conf: {self.conf}"
        )

    def image_callback(self, msg: Image):
        # Convert ROS Image -> OpenCV
        frame = self.bridge.imgmsg_to_cv2(msg, "bgr8")

        # Inference
        results = self.model.predict(frame, conf=self.conf, verbose=False, half=True)

        # Build Detection2DArray message
        det_array = Detection2DArray()
        det_array.header = msg.header

        for box in results[0].boxes:
            det = Detection2D()
            x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()

            # Bounding box (center + size format)
            det.bbox.center.position.x = float((x1 + x2) / 2)
            det.bbox.center.position.y = float((y1 + y2) / 2)
            det.bbox.size_x = float(x2 - x1)
            det.bbox.size_y = float(y2 - y1)

            # Classification result
            hyp = ObjectHypothesisWithPose()
            hyp.hypothesis.class_id = str(int(box.cls[0]))
            hyp.hypothesis.score = float(box.conf[0])
            det.results.append(hyp)

            det_array.detections.append(det)

        self.pub.publish(det_array)

        # Publish annotated image (optional, for visualization)
        annotated = results[0].plot()
        self.img_pub.publish(self.bridge.cv2_to_imgmsg(annotated, "bgr8"))


def main(args=None):
    rclpy.init(args=args)
    node = YoloDetectionNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == "__main__":
    main()

Launch File

# launch/yolo_detection.launch.py
from launch import LaunchDescription
from launch_ros.actions import Node

def generate_launch_description():
    return LaunchDescription([
        Node(
            package="yolo_perception",
            executable="yolo_node",
            name="yolo_detection",
            parameters=[{
                "model_path": "/opt/models/yolov8n.engine",
                "confidence": 0.5,
                "input_topic": "/camera/image_raw",
            }],
            output="screen",
        ),
    ])

# Build and run
cd ~/ros2_ws
colcon build --packages-select yolo_perception
source install/setup.bash
ros2 launch yolo_perception yolo_detection.launch.py

Additional Optimization Tips

Power Management

# MAXN mode (15W) — maximum performance
sudo nvpmodel -m 0
sudo jetson_clocks

# Check temperature (throttle at 80°C)
cat /sys/devices/virtual/thermal/thermal_zone*/temp
# Divide by 1000 to get °C

GPU Pre-processing

# Use cv2.cuda instead of CPU resize
# (requires OpenCV built with CUDA support; the default pip wheel is CPU-only)
gpu_frame = cv2.cuda_GpuMat()
gpu_frame.upload(frame)
gpu_resized = cv2.cuda.resize(gpu_frame, (640, 640))
frame_resized = gpu_resized.download()

Monitor Resource Usage

# Jetson stats — similar to htop but for Jetson
pip3 install jetson-stats
jtop

Summary

Complete pipeline: Camera → GStreamer → YOLOv8 TensorRT → ROS 2 topic — all running at 58+ FPS on Jetson Orin Nano 8GB with FP16. Fast enough for mobile robots, drones, and industrial quality inspection systems.

Key takeaways:

- Always export the TensorRT engine on the target device; engines are tied to the GPU architecture they were built on.
- FP16 is the sweet spot: roughly 4x faster than PyTorch with negligible mAP loss. Reserve INT8 for when you need more than 60 FPS or run multiple models.
- Use GStreamer for hardware-accelerated CSI capture, and threaded readers for multi-camera setups.
- Lock clocks with nvpmodel -m 0 and jetson_clocks, and monitor thermals and utilization with jtop.