The Robot Fleet Problem
Scaling from 5 to 50 to 500 robots makes manual SSH updates impossible. You need automated deployment, rollback, monitoring, and scaling — exactly what Kubernetes does.
But standard Kubernetes is too heavy for edge devices. K3s is a lightweight Kubernetes distribution: the server runs in roughly 512 MB of RAM and agents need even less, small enough for a Raspberry Pi or NVIDIA Jetson.
K3s Architecture for Robot Fleet
┌──────────────────────────────────────┐
│  Cloud Control Plane (K3s server)    │
│  - Deployment manifests              │
│  - ConfigMaps (robot config)         │
│  - Secrets (API keys)                │
└──────────────────────────────────────┘
                    │
      ┌─────────────┼─────────────┐
      v             v             v
┌───────────┐ ┌───────────┐ ┌───────────┐
│  Robot 1  │ │  Robot 2  │ │  Robot N  │
│(K3s agent)│ │(K3s agent)│ │(K3s agent)│
└───────────┘ └───────────┘ └───────────┘
Set Up K3s on a Robot
# Install K3s agent on robot
curl -sfL https://get.k3s.io | K3S_URL=https://control-plane:6443 \
K3S_TOKEN=mytoken sh -
# Verify installation (run kubectl on the control-plane node)
kubectl get nodes
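The manifests later in this article select robots by a `robot-id` node label, so each agent's node needs that label. One way to set it (a sketch; the label key, node name, and token are our conventions, not K3s requirements) is at install time via the K3s installer's `INSTALL_K3S_EXEC` variable and the agent's `--node-label` flag:

```shell
# Install the agent and label the node in one step, so Deployments
# can target this specific robot via nodeSelector
curl -sfL https://get.k3s.io | \
  K3S_URL=https://control-plane:6443 \
  K3S_TOKEN=mytoken \
  INSTALL_K3S_EXEC="agent --node-label robot-id=robot-001" \
  sh -

# Or label an already-joined node from the control plane
kubectl label node robot-001 robot-id=robot-001
```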
Deploy Robot Software via GitOps
Create a Deployment manifest for the robot services (in practice you would template this per robot with Helm or Kustomize):
# robot-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: robot-nav
  namespace: fleet
spec:
  replicas: 1
  selector:
    matchLabels:
      app: robot-navigation
  template:
    metadata:
      labels:
        app: robot-navigation
    spec:
      nodeSelector:
        robot-id: robot-001   # pin this Deployment to one robot's node
      containers:
      - name: nav
        image: ghcr.io/myorg/robot-nav:v1.2.3
        resources:
          limits:
            memory: "256Mi"
            cpu: "500m"
        securityContext:
          privileged: true    # needed for raw access to the serial device
        volumeMounts:
        - name: device-lidar
          mountPath: /dev/ttyUSB0
      - name: monitoring
        image: ghcr.io/myorg/robot-monitor:v1.0
        resources:
          limits:
            memory: "128Mi"
            cpu: "200m"
      volumes:
      - name: device-lidar
        hostPath:
          path: /dev/ttyUSB0
          type: CharDevice    # fail fast if the LiDAR is not plugged in
Apply the manifest; the control plane schedules the pod onto the matching node:
# The agent on the selected robot pulls the image and starts the pod
kubectl apply -f robot-deployment.yaml
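The manifest above pins one Deployment to one robot via `nodeSelector`, so covering a whole fleet means one manifest per robot. A minimal sketch of rendering those from a template (the robot IDs, image name, and template fields are illustrative; Helm or Kustomize overlays do the same job with less custom code):

```python
# Render one Deployment per robot from a string template, each pinned
# to its robot's node via the robot-id nodeSelector.

TEMPLATE = """\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: robot-nav-{robot_id}
  namespace: fleet
spec:
  replicas: 1
  selector:
    matchLabels:
      app: robot-navigation
      robot: {robot_id}
  template:
    metadata:
      labels:
        app: robot-navigation
        robot: {robot_id}
    spec:
      nodeSelector:
        robot-id: {robot_id}
      containers:
      - name: nav
        image: ghcr.io/myorg/robot-nav:{version}
"""

def render_manifests(robot_ids, version):
    """Return (robot_id, manifest) pairs, ready to write out and kubectl apply."""
    return [(rid, TEMPLATE.format(robot_id=rid, version=version))
            for rid in robot_ids]

if __name__ == "__main__":
    for rid, manifest in render_manifests(["robot-001", "robot-002"], "v1.2.3"):
        print(f"# --- {rid} ---")
        print(manifest)
```

Write each rendered manifest to a file under the GitOps repo, or pipe it straight to `kubectl apply -f -`.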
Rolling Update Without Downtime
# Update image
kubectl set image deployment/robot-nav \
nav=ghcr.io/myorg/robot-nav:v1.2.4 \
-n fleet
# Monitor rollout
kubectl rollout status deployment/robot-nav -n fleet
# Rollback if needed
kubectl rollout undo deployment/robot-nav -n fleet
Monitoring Fleet Health
# Monitor all robots: list every pod and report which robot (node) runs it
import json
import subprocess

result = subprocess.run(
    ['kubectl', 'get', 'pods', '-A', '-o', 'json'],
    capture_output=True, text=True, check=True,
)
pods = json.loads(result.stdout)
for pod in pods['items']:
    # The node name is the robot's hostname, so it identifies the robot
    robot_id = pod['spec'].get('nodeName', '<unscheduled>')
    status = pod['status']['phase']
    print(f"Robot {robot_id}: {pod['metadata']['name']} is {status}")
Prometheus + Grafana Monitoring
# Install monitoring stack
helm repo add prometheus-community \
https://prometheus-community.github.io/helm-charts
helm install monitoring prometheus-community/kube-prometheus-stack \
--namespace monitoring --create-namespace \
--set prometheus.prometheusSpec.retention=7d
Custom Robot Metrics
from prometheus_client import Counter, Gauge, start_http_server

battery_level = Gauge('robot_battery_percent', 'Battery level', ['robot_id'])
mission_count = Counter('robot_missions_completed', 'Missions done', ['robot_id'])

# Expose /metrics for Prometheus to scrape (avoid 9090, Prometheus's own port)
start_http_server(8000)

# In the main loop
battery_level.labels(robot_id="amr-001").set(85.5)
mission_count.labels(robot_id="amr-001").inc()
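Once robots expose these metrics, the monitoring stack can alert on them. A sketch of a PrometheusRule for the kube-prometheus-stack installed above (the threshold, rule name, and severity label are assumptions):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: robot-alerts
  namespace: monitoring
  labels:
    release: monitoring   # so the stack's default rule selector picks this up
spec:
  groups:
  - name: robot-fleet
    rules:
    - alert: RobotBatteryLow
      expr: robot_battery_percent < 20
      for: 5m               # ignore brief dips during charging handoff
      labels:
        severity: warning
      annotations:
        summary: "Battery below 20% on {{ $labels.robot_id }}"
```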
GitOps with FluxCD
GitOps makes Git the single source of truth:
# Install FluxCD
flux install
# Connect to Git repo
flux create source git robot-fleet \
--url=ssh://[email protected]/vnrobo/fleet-config \
--branch=main
# Auto-deploy when repo changes
flux create kustomization robot-apps \
--source=robot-fleet \
--path="./apps" \
--prune=true \
--interval=5m
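For reference, the `flux create kustomization` command above produces a Kustomization object roughly equivalent to the following (a sketch; the field values mirror the flags):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: robot-apps
  namespace: flux-system
spec:
  interval: 5m          # how often to reconcile cluster state with Git
  path: ./apps
  prune: true           # delete resources removed from the repo
  sourceRef:
    kind: GitRepository
    name: robot-fleet
```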
Workflow: Commit → GitHub → FluxCD → K3s applies → Robots updated
Handling Unstable Networks
Robots connect via WiFi, so the network can drop at any time:
- K3s agent auto-reconnect: the agent rejoins the control plane when the network recovers
- Tolerations: keep pods bound to a temporarily unreachable node instead of evicting them immediately
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300  # wait 5 minutes before evicting the pod
Comparison: Fleet Management Solutions
| Criterion | K3s + GitOps | Ansible | Balena |
|---|---|---|---|
| Auto-healing | Yes | No | Yes |
| Rolling update | Yes | Manual | Yes |
| Offline support | Good | No | Good |
| Learning curve | High | Medium | Low |
| Flexibility | Very high | High | Medium |
| Cost | Free | Free | Paid |
Best Practices
- Start small: 3-5 robots first, scale after comfortable
- WireGuard VPN: Between server and robots for security
- Private registry: Own container registry, avoid Docker Hub dependency
- Test rollbacks: Rehearse the rollback procedure as part of every deploy
- Resource limits: Set CPU/memory limits to prevent robot lockup
K3s + GitOps is the production way to manage robot fleets at scale. Combined with MQTT for telemetry, you have complete fleet infrastructure.