
Edge AI with NVIDIA Jetson: Deploy AI on Embedded Devices

Guide to deploying Edge AI with NVIDIA Jetson and TensorRT — optimize real-time inference for robots and embedded applications.

Nguyễn Anh Tuấn · August 15, 2025 · 3 min read

What is Edge AI and Why Jetson?

Edge AI means running AI models directly on a device instead of sending data to the cloud. The advantages: ultra-low latency (<30 ms), offline operation, data privacy, and bandwidth savings. In robotics, edge AI is mandatory — a robot can't wait for a 200 ms cloud round trip to avoid an obstacle.
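To put those latency numbers in perspective, a quick back-of-the-envelope calculation (the robot speed here is illustrative, not from a specific platform):

```python
speed_m_s = 2.0  # illustrative mobile robot speed

def blind_distance_cm(latency_s, speed=speed_m_s):
    """Distance travelled (cm) before the robot can react to what it sees."""
    return speed * latency_s * 100

print(f"cloud round trip (200 ms): {blind_distance_cm(0.200):.0f} cm blind")
print(f"edge inference (30 ms):    {blind_distance_cm(0.030):.0f} cm blind")
```

At 2 m/s, a 200 ms cloud round trip means the robot covers roughly 40 cm before it can react — more than enough to hit an obstacle — while 30 ms edge inference cuts that to about 6 cm.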

NVIDIA Jetson is a line of single-board computers (SBCs) designed for AI inference, with integrated CUDA GPUs.

NVIDIA Jetson board and AI embedded devices for robotics

Jetson Product Line

| Module | GPU Cores | RAM | AI Performance | Price (~) |
|---|---|---|---|---|
| Jetson Nano | 128 CUDA | 4 GB | 472 GFLOPS | $149 |
| Jetson Orin Nano | 1024 CUDA | 8 GB | 40 TOPS | $249 |
| Jetson Orin NX | 1024 CUDA | 16 GB | 100 TOPS | $599 |
| Jetson AGX Orin | 2048 CUDA | 64 GB | 275 TOPS | $1999 |

For mobile robots, the Jetson Orin Nano is the sweet spot: the best balance of performance, price, and power draw (15 W).

Environment Setup

JetPack SDK

# Install JetPack (includes CUDA, cuDNN, TensorRT)
sudo apt update && sudo apt install -y nvidia-jetpack

# Verify CUDA
nvcc --version
python3 -c "import torch; print(torch.cuda.is_available())"

Development Container

NVIDIA provides ready-made containers:

# Pull PyTorch container for Jetson
sudo docker pull nvcr.io/nvidia/l4t-pytorch:r36.2.0-pth2.1-py3

# Run with GPU access
sudo docker run -it --runtime nvidia --network host \
  nvcr.io/nvidia/l4t-pytorch:r36.2.0-pth2.1-py3

TensorRT Optimization

TensorRT typically accelerates inference 2-5x compared to native PyTorch.

YOLOv8 to TensorRT Conversion

from ultralytics import YOLO

model = YOLO("yolov8n.pt")

# Export to TensorRT FP16
model.export(format="engine", device=0, half=True, imgsz=640)

Performance on Jetson Orin Nano

Format FPS (640x640) Latency
PyTorch FP32 12 FPS 83ms
TensorRT FP32 28 FPS 36ms
TensorRT FP16 45 FPS 22ms
TensorRT INT8 62 FPS 16ms
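Numbers like the table above can be reproduced with a simple timing harness. This is a generic sketch — on a Jetson you would pass in a lambda wrapping `model.predict` on a fixed frame:

```python
import time

def benchmark(infer_fn, warmup=10, iters=100):
    """Return (mean latency in ms, FPS) for a zero-argument callable."""
    for _ in range(warmup):          # let clocks, caches, and CUDA contexts settle
        infer_fn()
    start = time.perf_counter()
    for _ in range(iters):
        infer_fn()
    latency_ms = (time.perf_counter() - start) / iters * 1000
    return latency_ms, 1000.0 / latency_ms

# On a Jetson: benchmark(lambda: model.predict(frame, verbose=False))
```

The warmup loop matters on Jetson: the first few inferences include CUDA context creation and clock ramp-up, which would otherwise skew the average.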

Quantization: FP16 and INT8

FP16 halves memory use and speeds up inference with minimal accuracy loss — the default choice on Jetson.
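The memory saving is easy to verify with NumPy — this illustrates the storage cost of FP16 weights, not TensorRT's kernels:

```python
import numpy as np

weights = np.random.randn(1_000_000).astype(np.float32)  # ~3.8 MB of fp32 weights
half = weights.astype(np.float16)                        # same values, half the bytes

assert half.nbytes == weights.nbytes // 2
max_err = np.abs(weights - half.astype(np.float32)).max()
print(f"fp32: {weights.nbytes} B, fp16: {half.nbytes} B, max error: {max_err:.5f}")
```

For weights of typical magnitude the rounding error is on the order of 1e-3, which is why FP16 rarely costs measurable accuracy.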

INT8 requires a calibration step but delivers maximum speed:

import tensorrt as trt

# Assumes `config` was created via builder.create_builder_config();
# EntropyCalibrator is a user-defined class implementing
# trt.IInt8EntropyCalibrator2 that feeds calibration batches to TensorRT
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = EntropyCalibrator(
    calibration_data="/path/to/calib_images/",
    cache_file="calibration.cache"
)
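Inside the calibrator's get_batch method, preprocessed NCHW batches are handed to TensorRT. A minimal batching helper might look like this — a sketch that assumes the images are already resized to the network input (e.g. 640x640) and that the model expects [0, 1] normalization:

```python
import numpy as np

def calibration_batches(images, batch_size=8):
    """Yield NCHW float32 batches in [0, 1] from HxWx3 uint8 images."""
    batch = []
    for img in images:
        x = img.astype(np.float32) / 255.0   # normalize to [0, 1]
        batch.append(x.transpose(2, 0, 1))   # HWC -> CHW
        if len(batch) == batch_size:
            yield np.stack(batch)
            batch = []
```

Use a few hundred images that are representative of your deployment scene — calibration on unrepresentative data is the usual cause of INT8 accuracy drops.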

Real-time AI inference on edge computing device

Complete Real-time Pipeline

import cv2
from ultralytics import YOLO

# Load the TensorRT engine exported earlier
model = YOLO("yolov8n.engine")
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Inference
    results = model.predict(frame, conf=0.5, verbose=False)

    # Process detections
    for box in results[0].boxes:
        cls = int(box.cls[0])
        conf = float(box.conf[0])
        x1, y1, x2, y2 = map(int, box.xyxy[0])

        # Send a command to the robot based on the detection
        # (estimate_distance and send_stop_command are application-specific helpers)
        if cls == 0:  # class 0 = person in COCO
            distance = estimate_distance(y2 - y1)
            if distance < 1.0:  # closer than 1 m
                send_stop_command()

cap.release()
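The estimate_distance helper in the pipeline is application-specific. A common approach is the pinhole-camera approximation, distance ≈ focal_length × real_height / pixel_height; the focal length in pixels and the assumed person height below are illustrative defaults that must be calibrated for your camera:

```python
def estimate_distance(pixel_height, real_height_m=1.7, focal_px=800.0):
    """Rough distance (m) to an object of known real-world height.

    pixel_height: bounding-box height in pixels (y2 - y1).
    real_height_m, focal_px: illustrative values — calibrate for your setup.
    """
    if pixel_height <= 0:
        return float("inf")
    return focal_px * real_height_m / pixel_height
```

For anything safety-critical, prefer a depth camera or lidar over this monocular estimate — it breaks down for partially occluded or seated people.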

Power Management

# View power mode
sudo nvpmodel -q

# MAXN mode (15W, high performance)
sudo nvpmodel -m 0

# Maximize performance
sudo jetson_clocks

Deployment Tips

  1. Start with TensorRT FP16 as your baseline — switch to INT8 only when you need the extra FPS
  2. Pre-process on the GPU: use cv2.cuda instead of CPU OpenCV
  3. Batch inference: group multiple frames if your latency budget allows
  4. Monitor temperature: Jetson throttles at 80°C
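For tip 4, a tiny helper can read the SoC temperature from sysfs (the thermal zone index varies by JetPack version and module — check /sys/class/thermal/*/type on your device):

```python
from pathlib import Path

def read_temp_c(zone_file="/sys/class/thermal/thermal_zone0/temp"):
    """Read a thermal zone; the kernel reports millidegrees Celsius."""
    return int(Path(zone_file).read_text().strip()) / 1000.0

def should_throttle_warn(temp_c, limit_c=80.0):
    """Jetson begins thermal throttling around 80 degrees C."""
    return temp_c >= limit_c
```

Polling this once per second alongside the inference loop is usually enough to catch a heatsink or fan problem before throttling tanks your FPS.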

Edge AI with Jetson lets you deploy powerful AI directly on robots without cloud dependency, meeting the strictest real-time requirements. Combined with Python robot programming, you can build a complete AI-to-action pipeline.


Nguyễn Anh Tuấn

Robotics & AI Engineer. Building VnRobo — sharing knowledge about robot learning, VLA models, and automation.
