Production Deployment Checklist

Introduction

Tested with: Python 3.12.3, GCC 13.3.0, Pyvorin Edge SDK 1.0.5-edge, Ubuntu 24.04 LTS (x86_64 & ARM64). Run python3 --version and gcc --version to verify your environment.

Moving a Pyvorin Edge device from the lab bench to the field requires more than copying a config file. This checklist covers every step an operator must complete before a device is considered production-ready: hardware validation, OS hardening, service configuration, data protection, and observability. Follow it in order. Skip nothing.

Pre-Flight Hardware Checks

Before you flash an SD card or power on for the first time, verify the physical layer. Field failures are expensive — a £45 site visit to replace a £12 SD card is not uncommon.

SD Card Health

Consumer SD cards fail silently. Check for bad blocks and wear indicators before imaging:

# Install f3 if not present
sudo apt update && sudo apt install -y f3

# Test the card (replace /media/pi/TEST with the card's mount point; f3 works on a mounted filesystem, not on the raw /dev/sdX device)
sudo f3write /media/pi/TEST && sudo f3read /media/pi/TEST

# Check SMART-like wear data (if available)
sudo smartctl -a /dev/sdX 2>/dev/null || echo "SMART not available for this reader"

Power Supply Validation

Raspberry Pi 4 boards require a stable 5.1V/3A supply. Undervoltage causes random corruption and silent reboots.

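# Check for under-voltage events
vcgencmd get_throttled
# Bit 0 (0x1) set: under-voltage is happening right now.
# Bit 16 (0x10000) set: under-voltage has occurred at some point since boot.

# Sample the voltage once per second for 60 seconds.
# Note: measure_volts reports the SoC core voltage, not the 5 V input rail.
for i in {1..60}; do
    vcgencmd measure_volts
    sleep 1
done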
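The raw hex value from get_throttled is easy to misread in the field. The sketch below decodes it into readable flags. The function name decode_throttled is hypothetical, and the bit meanings are taken from the Raspberry Pi vcgencmd documentation rather than from the Pyvorin SDK.

```shell
#!/bin/bash
# Decode a get_throttled value such as 0x50005 into readable flags.
# decode_throttled is an illustrative helper, not part of any SDK.
decode_throttled() {
    local value=$(( $1 ))          # arithmetic expansion accepts 0x-prefixed hex
    local flags=(
        "0:under-voltage detected now"
        "1:ARM frequency capped now"
        "2:currently throttled"
        "3:soft temperature limit active now"
        "16:under-voltage has occurred since boot"
        "17:ARM frequency capping has occurred since boot"
        "18:throttling has occurred since boot"
        "19:soft temperature limit has occurred since boot"
    )
    local entry bit
    for entry in "${flags[@]}"; do
        bit=${entry%%:*}                       # text before the first colon
        if (( value & (1 << bit) )); then
            echo "${entry#*:}"                 # text after the first colon
        fi
    done
}
```

On a device, call it as decode_throttled "$(vcgencmd get_throttled | cut -d= -f2)". No output means no flags are set.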

Network Connectivity

Verify that the EdgeAgent can reach its cloud endpoint and MQTT broker before sealing the enclosure:

# Basic reachability
ping -c 4 api.pyvorin.com

# Verify HTTPS egress (cloud sync)
curl -I https://api.pyvorin.com/v1/health

# Verify MQTT egress (if using MQTT ingest)
nc -vz mqtt.broker.local 8883

# DNS resolution speed
dig api.pyvorin.com +stats | grep "Query time"
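A single failed probe does not always mean the link is down; a freshly booted device may need a few seconds before its network is usable. The sketch below wraps any of the checks above in a bounded retry loop. The helper name retry and its argument order are assumptions made for illustration.

```shell
#!/bin/bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# retry is an illustrative helper, not part of any SDK.
retry() {
    local attempts=$1 delay=$2
    shift 2
    local n
    for (( n = 1; n <= attempts; n++ )); do
        if "$@"; then
            return 0
        fi
        # Sleep between attempts, but not after the last one
        if (( n < attempts )); then
            sleep "$delay"
        fi
    done
    echo "gave up after $attempts attempts: $*" >&2
    return 1
}
```

For example, retry 5 3 curl -sfI https://api.pyvorin.com/v1/health tries the health URL up to five times, three seconds apart, and returns non-zero only if every attempt fails.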

Security Hardening

The default Raspberry Pi OS image is designed for convenience, not security. Every unused service is an attack surface.

Disable Unused Services

# Stop and mask services that have no role on an edge device
sudo systemctl stop avahi-daemon bluetooth
sudo systemctl disable avahi-daemon bluetooth
sudo systemctl mask avahi-daemon bluetooth

# Disable wireless if using Ethernet only
sudo rfkill block wifi
sudo rfkill block bluetooth

# Verify what is still running
sudo systemctl list-units --type=service --state=running

Configure UFW Firewall

Only open ports that the EdgeAgent actually needs. The health endpoint defaults to 8080; restrict it to localhost unless you have an explicit remote monitoring requirement.

sudo apt install -y ufw
sudo ufw default deny incoming
sudo ufw default allow outgoing

# SSH (restrict to your management subnet if possible)
sudo ufw allow from 10.0.0.0/8 to any port 22 proto tcp

# MQTT broker (if broker runs on this device)
sudo ufw allow 8883/tcp

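# Health endpoint: localhost only. UFW accepts loopback traffic by default, so this rule
# only documents intent; "default deny incoming" is what keeps port 8080 closed to remote hosts.
sudo ufw allow from 127.0.0.1 to any port 8080 proto tcp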

sudo ufw enable
sudo ufw status verbose

SSH Key-Only Authentication

# On your workstation, copy your public key
ssh-copy-id pi@edge-device.local

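# On the edge device, harden sshd (matches each directive whether or not it is commented out)
sudo sed -i -E 's/^#?[[:space:]]*PasswordAuthentication[[:space:]].*/PasswordAuthentication no/' /etc/ssh/sshd_config
sudo sed -i -E 's/^#?[[:space:]]*PermitRootLogin[[:space:]].*/PermitRootLogin no/' /etc/ssh/sshd_config

# Drop-in files under /etc/ssh/sshd_config.d/ can override these values, so check the effective settings
sudo sshd -T | grep -Ei '^(passwordauthentication|permitrootlogin) '

# Validate the configuration before restarting so a typo cannot lock you out
sudo sshd -t && sudo systemctl restart ssh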
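Editing sshd_config does not by itself prove the setting took effect, because files included from /etc/ssh/sshd_config.d/ may set the same directive on some images. The sketch below checks the effective configuration as printed by sshd -T. The function name sshd_setting_is is hypothetical; it reads the sshd -T output on stdin so it can be exercised without a running SSH server.

```shell
#!/bin/bash
# sshd_setting_is KEY VALUE
# Reads `sshd -T` output on stdin and succeeds only if the effective value
# of KEY equals VALUE (case-insensitive). Illustrative helper only.
sshd_setting_is() {
    local key=${1,,} want=${2,,}      # lowercase both arguments (bash 4+)
    awk -v key="$key" -v want="$want" '
        tolower($1) == key { found = 1; ok = (tolower($2) == want) }
        END { exit !(found && ok) }   # exit 0 only when the key exists and matches
    '
}
```

On the device: sudo sshd -T | sshd_setting_is passwordauthentication no && echo "password logins are off".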

Systemd Service Setup for EdgeAgent

The EdgeAgent must survive reboots, crashes, and power glitches. systemd is the correct tool for this job on Linux.

sudo tee /etc/systemd/system/pyvorin-edge.service > /dev/null <<'EOF'
[Unit]
Description=Pyvorin Edge Agent
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=pi
Group=pi
WorkingDirectory=/home/pi/pyvorin-edge
ExecStart=/home/pi/.local/bin/pyv-edge-agent --config /home/pi/pyvorin-edge/config.toml
Restart=always
RestartSec=5
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable pyvorin-edge
sudo systemctl start pyvorin-edge

After starting, confirm the service is healthy:

sudo systemctl status pyvorin-edge
journalctl -u pyvorin-edge --since "5 minutes ago" --no-pager

Log Rotation with logrotate

Edge devices write to SD cards with limited write endurance. Unbounded logs will fill the filesystem and accelerate wear. The EdgeAgent outputs structured JSON logs to stdout, which journald captures. We configure journald and logrotate together.

Journald Disk Limits

# The drop-in directory does not exist on a fresh image
sudo mkdir -p /etc/systemd/journald.conf.d
sudo tee /etc/systemd/journald.conf.d/00-edge-limits.conf > /dev/null <<'EOF'
[Journal]
SystemMaxUse=256M
SystemMaxFileSize=32M
MaxFileSec=1week
EOF

sudo systemctl restart systemd-journald

Logrotate for Custom Log Files

If you redirect EdgeAgent output to a file (not recommended; use journald), add a logrotate rule:

sudo tee /etc/logrotate.d/pyvorin-edge > /dev/null <<'EOF'
/home/pi/pyvorin-edge/logs/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    # The agent keeps its redirected log file open, so truncate in place instead of renaming
    copytruncate
}
EOF

Backup Strategy for the SQLite Database

The EdgeAgent stores sensor readings, events, and queue state in SQLite. The default path is edge_store.db, and the cloud sync queue defaults to sync_queue.db. Both live in the working directory unless configured otherwise in config.toml.

Automated Nightly Backup

sudo tee /usr/local/bin/edge-db-backup.sh > /dev/null <<'EOF'
#!/bin/bash
set -euo pipefail

BACKUP_DIR="/home/pi/pyvorin-edge/backups"
DB_DIR="/home/pi/pyvorin-edge"
DATE=$(date +%Y%m%d_%H%M%S)

mkdir -p "$BACKUP_DIR"

# SQLite online backup using the built-in backup API
sqlite3 "$DB_DIR/edge_store.db" ".backup '$BACKUP_DIR/edge_store_$DATE.db'"
sqlite3 "$DB_DIR/sync_queue.db" ".backup '$BACKUP_DIR/sync_queue_$DATE.db'"

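# Verify the new backups before pruning old ones. integrity_check prints "ok" on
# success but exits 0 even when it reports corruption, so compare the output.
for f in "$BACKUP_DIR/edge_store_$DATE.db" "$BACKUP_DIR/sync_queue_$DATE.db"; do
    RESULT=$(sqlite3 "$f" "PRAGMA integrity_check;")
    if [ "$RESULT" != "ok" ]; then
        echo "Integrity check failed for $f: $RESULT" >&2
        exit 1
    fi
done

# Keep only the 14 most recent backups of each database
ls -t "$BACKUP_DIR"/edge_store_*.db | tail -n +15 | xargs -r rm -f
ls -t "$BACKUP_DIR"/sync_queue_*.db | tail -n +15 | xargs -r rm -f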
EOF

sudo apt install -y sqlite3
sudo chmod +x /usr/local/bin/edge-db-backup.sh

# Schedule via root's crontab at 02:17 every night (root is needed to write /var/log/edge-backup.log)
(sudo crontab -l 2>/dev/null; echo "17 2 * * * /usr/local/bin/edge-db-backup.sh >> /var/log/edge-backup.log 2>&1") | sudo crontab -
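A retention rule that deletes files deserves a dry run against a scratch directory before it touches real backups. The sketch below factors the "keep the newest N" rule into a function that can be tested in isolation. The name prune_backups and its parameters are hypothetical, and it assumes backup file names contain no newlines.

```shell
#!/bin/bash
# prune_backups DIR PREFIX KEEP
# Deletes all but the KEEP newest files matching DIR/PREFIX_*.db.
# Illustrative helper; assumes file names contain no newlines.
prune_backups() {
    local dir=$1 prefix=$2 keep=$3
    local f
    # ls -t lists newest first; tail -n +N starts output at line N,
    # so everything after the first KEEP entries is removed.
    ls -1t "$dir"/"$prefix"_*.db 2>/dev/null | tail -n +"$(( keep + 1 ))" |
        while IFS= read -r f; do
            rm -f -- "$f"
        done
}
```

For example, prune_backups /home/pi/pyvorin-edge/backups edge_store 14 applies the same 14-file retention used by the nightly script.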

Monitoring Alerts

The EdgeAgent exposes system metrics via GET /health and GET /metrics on port 8080. The underlying collector is SystemMetrics in /var/www/pyvorin/edge_runtime/pyv_edge_agent/health_monitor/metrics.py, which reads /proc/stat, /proc/meminfo, /sys/class/thermal/thermal_zone0/temp, and shutil.disk_usage.
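When the agent is down, the same kernel sources can be read by hand. The sketch below is not the SystemMetrics implementation; it only illustrates how two of the files named above are typically interpreted. The function names are hypothetical, and the temperature is truncated (not rounded) to one decimal place.

```shell
#!/bin/bash
# millideg_to_celsius RAW
# Converts a thermal_zone reading in millidegrees Celsius to one decimal place.
# Illustrative helper; not taken from the EdgeAgent source.
millideg_to_celsius() {
    local raw=$1
    printf '%d.%d\n' "$(( raw / 1000 ))" "$(( (raw % 1000) / 100 ))"
}

# mem_used_percent [FILE]
# Percentage of memory in use, derived from MemTotal and MemAvailable.
# FILE defaults to /proc/meminfo; pass another path to test with sample data.
mem_used_percent() {
    local file=${1:-/proc/meminfo}
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }' "$file"
}
```

On a device: millideg_to_celsius "$(cat /sys/class/thermal/thermal_zone0/temp)" and mem_used_percent. Comparing these with the values from GET /metrics is a quick sanity check on the agent's own reporting.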

Simple Shell Health Monitor

For small fleets, a cron-based monitor is sufficient before investing in Prometheus or Datadog:

sudo tee /usr/local/bin/edge-health-check.sh > /dev/null <<'EOF'
#!/bin/bash
# Post-deployment verification and alerting script

HEALTH_URL="http://127.0.0.1:8080/health"
ALERT_WEBHOOK="${ALERT_WEBHOOK:-}"
LOG_FILE="/var/log/edge-health.log"

log() {
    echo "$(date -Iseconds) $1" | tee -a "$LOG_FILE"
}

# Fetch health JSON (-f makes curl fail on HTTP error responses instead of returning the error body)
HEALTH_JSON=$(curl -sf -m 5 "$HEALTH_URL" 2>/dev/null) || {
    log "CRITICAL: Health endpoint unreachable"
    [ -n "$ALERT_WEBHOOK" ] && curl -s -X POST -H "Content-Type: application/json" \
        -d '{"text":"EdgeAgent health endpoint unreachable"}' "$ALERT_WEBHOOK" > /dev/null
    exit 1
}

# Extract metrics via jq (install with: sudo apt install jq)
CPU=$(echo "$HEALTH_JSON" | jq -r '.metrics.cpu_percent // 0')
DISK=$(echo "$HEALTH_JSON" | jq -r '.metrics.disk_percent // 0')
THERMAL=$(echo "$HEALTH_JSON" | jq -r '.metrics.thermal_celsius // 0')
QUEUE=$(echo "$HEALTH_JSON" | jq -r '.cloud.queue_depth // 0')

# Thresholds
CPU_LIMIT=85
DISK_LIMIT=90
THERMAL_LIMIT=75
QUEUE_LIMIT=5000

CRIT=0

if (( $(echo "$CPU > $CPU_LIMIT" | bc -l) )); then
    log "WARNING: CPU at ${CPU}%"
    CRIT=1
fi

if (( $(echo "$DISK > $DISK_LIMIT" | bc -l) )); then
    log "CRITICAL: Disk at ${DISK}%"
    CRIT=1
fi

if (( $(echo "$THERMAL > $THERMAL_LIMIT" | bc -l) )); then
    log "WARNING: Thermal at ${THERMAL}°C"
    CRIT=1
fi

if (( QUEUE > QUEUE_LIMIT )); then
    log "WARNING: Cloud queue depth ${QUEUE}"
    CRIT=1
fi

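if [ "$CRIT" -eq 0 ]; then
    log "OK: CPU=${CPU}% DISK=${DISK}% THERMAL=${THERMAL}°C QUEUE=${QUEUE}"
    exit 0
fi

# A threshold was breached: notify the webhook (if configured) and exit non-zero
[ -n "$ALERT_WEBHOOK" ] && curl -s -X POST -H "Content-Type: application/json" \
    -d '{"text":"EdgeAgent threshold breached, see /var/log/edge-health.log"}' "$ALERT_WEBHOOK" > /dev/null
exit 1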
EOF

sudo chmod +x /usr/local/bin/edge-health-check.sh
sudo apt install -y jq bc

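# Run every 5 minutes from root's crontab (root is needed to append to /var/log/edge-health.log).
# cron does not inherit your shell environment: set ALERT_WEBHOOK in the crontab or in the script.
(sudo crontab -l 2>/dev/null; echo "*/5 * * * * /usr/local/bin/edge-health-check.sh") | sudo crontab -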
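The threshold comparisons in the health script shell out to bc because bash arithmetic cannot compare floating-point values. A minimal sketch of an alternative that uses awk instead is shown below. The function name exceeds is hypothetical, and non-numeric input such as "null" is treated as 0.

```shell
#!/bin/bash
# exceeds VALUE LIMIT
# Succeeds when VALUE is numerically greater than LIMIT (floats allowed).
# Illustrative helper; adding 0 forces awk to compare as numbers, not strings.
exceeds() {
    awk -v v="$1" -v limit="$2" 'BEGIN { exit !(v + 0 > limit + 0) }'
}
```

Inside the script it would read as: if exceeds "$CPU" "$CPU_LIMIT"; then log "WARNING: CPU at ${CPU}%"; fi.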

Post-Deployment Verification Script

The following script is a single, copy-pasteable bash checklist. Run it immediately after every deployment. It returns exit code 0 only if all checks pass.

#!/bin/bash
# Pyvorin Edge — Production Deployment Verification
# Run as: sudo bash verify-deployment.sh

set -uo pipefail
ERRORS=0

pass() { echo "  [PASS] $1"; }
fail() { echo "  [FAIL] $1"; ERRORS=$((ERRORS + 1)); }

echo "=== Pyvorin Edge Deployment Verification ==="

# 1. Service status
if systemctl is-active --quiet pyvorin-edge; then
    pass "pyvorin-edge service is running"
else
    fail "pyvorin-edge service is not running"
fi

# 2. Health endpoint
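HEALTH=$(curl -sf -m 5 http://127.0.0.1:8080/health 2>/dev/null)
if [ -n "$HEALTH" ]; then
    pass "Health endpoint responds"
    # Fall back to a readable value if the body is not JSON or has no status field
    STATUS=$(echo "$HEALTH" | python3 -c "import sys,json; print(json.load(sys.stdin).get('status', 'unknown'))" 2>/dev/null) || STATUS="unparseable"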
    if [ "$STATUS" = "healthy" ]; then
        pass "Agent status is healthy"
    else
        fail "Agent status is $STATUS"
    fi
else
    fail "Health endpoint unreachable"
fi

# 3. Config file exists and is readable
CONFIG="/home/pi/pyvorin-edge/config.toml"
if [ -r "$CONFIG" ]; then
    pass "Config file readable: $CONFIG"
else
    fail "Config file missing or unreadable: $CONFIG"
fi

# 4. SQLite databases are writable
DB_DIR="/home/pi/pyvorin-edge"
for db in edge_store.db sync_queue.db; do
    if [ -w "$DB_DIR/$db" ]; then
        pass "Database writable: $db"
    else
        fail "Database not writable: $db"
    fi
done

# 5. Backup directory exists
if [ -d "$DB_DIR/backups" ]; then
    pass "Backup directory exists"
else
    fail "Backup directory missing"
fi

# 6. Log rotation configured
if [ -f /etc/logrotate.d/pyvorin-edge ] || [ -f /etc/systemd/journald.conf.d/00-edge-limits.conf ]; then
    pass "Log rotation configured"
else
    fail "Log rotation not configured"
fi

# 7. Firewall active
if sudo ufw status | grep -q "Status: active"; then
    pass "UFW firewall is active"
else
    fail "UFW firewall is not active"
fi

# 8. SSH password auth disabled
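# Check the effective setting (sshd -T), which accounts for drop-ins under /etc/ssh/sshd_config.d/
if sshd -T 2>/dev/null | grep -i "^passwordauthentication no" > /dev/null; then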
    pass "SSH password authentication disabled"
else
    fail "SSH password authentication still enabled"
fi

# 9. Thermal within range
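# Prefer vcgencmd; fall back to the kernel thermal zone on boards without it
THERMAL=$(vcgencmd measure_temp 2>/dev/null | sed "s/temp=//;s/'C//") \
    || THERMAL=$(awk '{printf "%.1f", $1/1000}' /sys/class/thermal/thermal_zone0/temp 2>/dev/null) \
    || THERMAL="0"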
if (( $(echo "$THERMAL < 80" | bc -l) )); then
    pass "SoC thermal: ${THERMAL}°C"
else
    fail "SoC thermal too high: ${THERMAL}°C"
fi

# 10. Disk usage
DISK_USAGE=$(df / | tail -1 | awk '{print $5}' | sed 's/%//')
if [ "$DISK_USAGE" -lt 90 ]; then
    pass "Root disk usage: ${DISK_USAGE}%"
else
    fail "Root disk usage critical: ${DISK_USAGE}%"
fi

# Summary
echo ""
if [ "$ERRORS" -eq 0 ]; then
    echo "=== ALL CHECKS PASSED ==="
    exit 0
else
    echo "=== $ERRORS CHECK(S) FAILED ==="
    exit 1
fi

Summary

A production Pyvorin Edge deployment is not complete until hardware is validated, the OS is hardened, the agent runs under systemd, logs rotate automatically, SQLite backups run nightly, and health monitoring is in place. Use the verification script after every install. Automate what you can. Document what you cannot.