
Kubernetes for Robot Fleets: Orchestration at Scale

Using Kubernetes and K3s to manage, update, and monitor hundreds of robots with GitOps principles.

Nguyen Anh Tuan · November 1, 2025 · 4 min read

The Robot Fleet Problem

Scaling from 5 to 50 to 500 robots makes manual SSH updates untenable. You need automated deployment, rollback, monitoring, and scaling: exactly the problems Kubernetes solves.

But standard Kubernetes is too heavy for edge devices. K3s is a lightweight, fully conformant Kubernetes distribution that runs in as little as 512 MB of RAM, making it a fit for boards like the Raspberry Pi and NVIDIA Jetson.

K3s Architecture for a Robot Fleet

┌──────────────────────────────────────┐
│   Cloud Control Plane (K3s server)   │
│   - Deployment manifests             │
│   - ConfigMaps (robot config)        │
│   - Secrets (API keys)               │
└──────────────────────────────────────┘
                  │
      ┌───────────┼───────────┐
      v           v           v
┌───────────┐ ┌───────────┐ ┌───────────┐
│  Robot 1  │ │  Robot 2  │ │  Robot N  │
│(K3s agent)│ │(K3s agent)│ │(K3s agent)│
└───────────┘ └───────────┘ └───────────┘

Setting Up K3s on a Robot

# Install K3s agent on robot
curl -sfL https://get.k3s.io | K3S_URL=https://control-plane:6443 \
  K3S_TOKEN=mytoken sh -

# Verify registration (run from the control plane)
kubectl get nodes
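
Once a robot's node appears, it needs a robot-id label so per-robot Deployments can target it with a nodeSelector (used in the next section). A minimal Python sketch that wraps kubectl for this; `label_robot_node` and `label_cmd` are illustrative helpers, not part of K3s:

```python
import subprocess

def label_cmd(node_name: str, robot_id: str) -> list[str]:
    """Build the kubectl command that tags a node with its robot-id."""
    return ['kubectl', 'label', 'node', node_name,
            f'robot-id={robot_id}', '--overwrite']

def label_robot_node(node_name: str, robot_id: str) -> None:
    """Run the labeling command against the cluster (requires kubectl access)."""
    subprocess.run(label_cmd(node_name, robot_id), check=True)

# Show the command that would be run for robot-001
print(' '.join(label_cmd('robot-001', 'robot-001')))
```

Splitting command construction from execution keeps the helper easy to test without a live cluster.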

Deploy Robot Software via GitOps

Create a Deployment manifest for the robot's services (these can later be packaged into a Helm chart):

# robot-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: robot-nav
  namespace: fleet
spec:
  replicas: 1
  selector:
    matchLabels:
      app: robot-navigation
  template:
    metadata:
      labels:
        app: robot-navigation
    spec:
      nodeSelector:
        robot-id: robot-001
      containers:
      - name: nav
        image: ghcr.io/myorg/robot-nav:v1.2.3
        resources:
          limits:
            memory: "256Mi"
            cpu: "500m"
        volumeMounts:
        - name: device-lidar
          mountPath: /dev/ttyUSB0
      - name: monitoring
        image: ghcr.io/myorg/robot-monitor:v1.0
        resources:
          limits:
            memory: "128Mi"
            cpu: "200m"
      volumes:
      - name: device-lidar
        hostPath:
          path: /dev/ttyUSB0

Apply the manifest from the control plane:

# K3s schedules the pods onto the node matching the nodeSelector
kubectl apply -f robot-deployment.yaml
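
Because each Deployment pins to a single robot via nodeSelector, a fleet needs one manifest per robot. These can be stamped out from a template with the standard library alone; the template below mirrors a trimmed version of the Deployment above, and the names are illustrative:

```python
from string import Template

DEPLOYMENT_TMPL = Template("""\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: robot-nav-$robot_id
  namespace: fleet
spec:
  replicas: 1
  selector:
    matchLabels:
      app: robot-navigation
      robot: $robot_id
  template:
    metadata:
      labels:
        app: robot-navigation
        robot: $robot_id
    spec:
      nodeSelector:
        robot-id: $robot_id
""")

def render_manifests(robot_ids):
    """Render one Deployment manifest per robot in the fleet."""
    return [DEPLOYMENT_TMPL.substitute(robot_id=r) for r in robot_ids]

# Write all manifests as one multi-document YAML stream
for manifest in render_manifests(['robot-001', 'robot-002']):
    print(manifest)
    print('---')
```

In a GitOps setup you would commit the rendered files to the fleet repo rather than applying them by hand.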

Rolling Update Without Downtime

# Update image
kubectl set image deployment/robot-nav \
  nav=ghcr.io/myorg/robot-nav:v1.2.4 \
  -n fleet

# Monitor rollout
kubectl rollout status deployment/robot-nav -n fleet

# Rollback if needed
kubectl rollout undo deployment/robot-nav -n fleet
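
For larger fleets you may want to update a canary robot first, verify it, then roll to the rest. A sketch that builds the per-stage kubectl commands, assuming one Deployment per robot named `robot-nav-<id>` (that naming scheme and the canary split are assumptions for illustration, not a kubectl feature):

```python
def rollout_cmds(image: str, robot_ids: list[str], canary: int = 1):
    """Build kubectl commands: update canary robots first, then the rest."""
    def set_image(robot: str) -> list[str]:
        return ['kubectl', 'set', 'image', f'deployment/robot-nav-{robot}',
                f'nav={image}', '-n', 'fleet']
    canary_cmds = [set_image(r) for r in robot_ids[:canary]]
    rest_cmds = [set_image(r) for r in robot_ids[canary:]]
    return canary_cmds, rest_cmds

canary, rest = rollout_cmds('ghcr.io/myorg/robot-nav:v1.2.4',
                            ['robot-001', 'robot-002', 'robot-003'])
print(' '.join(canary[0]))  # update robot-001 first, verify, then run `rest`
```

Between the two stages you would run `kubectl rollout status` on the canary Deployment and abort (or `rollout undo`) if it fails.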

Monitoring Fleet Health

# Monitor all robot pods across the fleet
import json
import subprocess

result = subprocess.run(
    ['kubectl', 'get', 'pods', '-A', '-o', 'json'],
    capture_output=True, text=True, check=True
)

pods = json.loads(result.stdout)
for pod in pods['items']:
    robot = pod['spec'].get('nodeName', 'unscheduled')  # node name = robot
    status = pod['status']['phase']
    print(f"Robot {robot}: {pod['metadata']['name']} is {status}")
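
Building on the script above, a small helper can aggregate pod phases into a fleet-wide health summary; `fleet_summary` is an illustrative name, and the field paths follow the Kubernetes pod schema:

```python
from collections import Counter

def fleet_summary(pods: dict):
    """Count pods by phase and list any that are not healthy."""
    phases = Counter(p['status']['phase'] for p in pods['items'])
    unhealthy = [
        p['metadata']['name']
        for p in pods['items']
        if p['status']['phase'] not in ('Running', 'Succeeded')
    ]
    return phases, unhealthy

# Example with mock `kubectl get pods -o json` output
sample = {'items': [
    {'metadata': {'name': 'nav-1'}, 'status': {'phase': 'Running'}},
    {'metadata': {'name': 'nav-2'}, 'status': {'phase': 'Failed'}},
]}
phases, bad = fleet_summary(sample)
print(bad)  # ['nav-2']
```

The unhealthy list is a natural input for an alerting hook (Slack, PagerDuty, or an MQTT topic).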

Prometheus + Grafana Monitoring

# Install monitoring stack
helm repo add prometheus-community \
  https://prometheus-community.github.io/helm-charts

helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set prometheus.prometheusSpec.retention=7d

Custom Robot Metrics

from prometheus_client import Gauge, start_http_server

battery_level = Gauge('robot_battery_percent', 'Battery level', ['robot_id'])
mission_count = Gauge('robot_missions_completed', 'Missions done', ['robot_id'])

start_http_server(9090)

# In main loop
battery_level.labels(robot_id="amr-001").set(85.5)
mission_count.labels(robot_id="amr-001").inc()

GitOps with FluxCD

GitOps makes Git the single source of truth:

# Install FluxCD
flux install

# Connect to Git repo
flux create source git robot-fleet \
  --url=ssh://[email protected]/vnrobo/fleet-config \
  --branch=main

# Auto-deploy when repo changes
flux create kustomization robot-apps \
  --source=robot-fleet \
  --path="./apps" \
  --prune=true \
  --interval=5m

Workflow: Commit → GitHub → FluxCD → K3s applies → Robots updated

Handling Unstable Networks

Robots connect over WiFi, and the link can drop at any time. Add a toleration to the pod spec so Kubernetes does not immediately evict pods from a briefly unreachable robot:

tolerations:
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 300  # Wait 5 minutes before rescheduling

Comparison: Fleet Management Solutions

Criterion         K3s + GitOps   Ansible   Balena
Auto-healing      Yes            No        Yes
Rolling update    Yes            Manual    Yes
Offline support   Good           No        Good
Learning curve    High           Medium    Low
Flexibility       Very high      High      Medium
Cost              Free           Free      Paid

Best Practices

  1. Start small: pilot with 3-5 robots, then scale once the workflow is proven
  2. WireGuard VPN: encrypt traffic between the control plane and robots
  3. Private registry: run your own container registry to avoid Docker Hub rate limits and outages
  4. Test rollbacks: rehearse the rollback procedure with every deploy
  5. Resource limits: set CPU/memory limits so a runaway container cannot lock up the robot
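
Practice 5 can be enforced before anything reaches the fleet with a small lint pass over your manifests. A sketch that checks every container declares both CPU and memory limits; it is pure dict inspection, with field paths following the Kubernetes Deployment schema:

```python
def missing_limits(deployment: dict) -> list[str]:
    """Return names of containers without both CPU and memory limits."""
    containers = (deployment.get('spec', {})
                            .get('template', {})
                            .get('spec', {})
                            .get('containers', []))
    flagged = []
    for c in containers:
        limits = c.get('resources', {}).get('limits', {})
        if 'cpu' not in limits or 'memory' not in limits:
            flagged.append(c['name'])
    return flagged

# Example: one compliant container, one with no limits at all
deploy = {'spec': {'template': {'spec': {'containers': [
    {'name': 'nav',
     'resources': {'limits': {'cpu': '500m', 'memory': '256Mi'}}},
    {'name': 'debug'},  # no resources block -> flagged
]}}}}
print(missing_limits(deploy))  # ['debug']
```

Wired into CI, this rejects a commit before FluxCD ever syncs it to the robots.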

K3s plus GitOps is a production-grade way to manage robot fleets at scale. Combined with MQTT for telemetry, it gives you a complete fleet infrastructure.
