
Over-the-Air Update System

Design and implement secure OTA updates for Pyvorin Edge using Ed25519 bundle signing, atomic symlink swaps, automatic rollback, and channel-based release management.

Published Jun 2, 2026

Introduction

Deploying bug fixes, new sensor adapters, or policy updates to a fleet of edge devices without physical access requires a robust over-the-air (OTA) update mechanism. This article designs a complete OTA pipeline for Pyvorin Edge: bundle signing with Ed25519, version manifests, atomic symlink swaps, automatic rollback on health check failure, and stable/beta/canary release channels. The signing and verification primitives come from the SDK's packaging modules, /var/www/pyvorin/edge_sdk/pyvorin_edge/packaging/signer.py and /var/www/pyvorin/edge_sdk/pyvorin_edge/packaging/verifier.py.

Bundle Signing with Ed25519

The SDK provides BundleSigner, which generates Ed25519 key pairs, hashes every file in a bundle, and writes a signed manifest. The private key never leaves the build server. The public key is baked into each device image as a trust anchor.

Signing a Release Bundle

# On the build server
pyv-edge-sign \
    --bundle-dir ./dist/edge-agent-v1.4.2 \
    --private-key /secure/signing.key \
    --output-manifest ./dist/edge-agent-v1.4.2/manifest.json
  

Under the hood, pyv-edge-sign relies on BundleSigner.sign_bundle() from signer.py, which you can also call directly from Python:

from pathlib import Path
from pyvorin_edge.packaging.signer import BundleSigner

# Load your securely stored private key
private_key = Path("/secure/signing.key").read_bytes()

# Sign the bundle
BundleSigner.sign_bundle(
    bundle_dir="./dist/edge-agent-v1.4.2",
    private_key=private_key,
    output_manifest="./dist/edge-agent-v1.4.2/manifest.json",
)
  

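The signed manifest file contains the following fields. The first four live inside a manifest block; signature sits beside that block at the top level: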

  • bundle_name — human-readable identifier
  • version — semantic version string
  • timestamp — Unix epoch seconds
  • files — mapping of relative paths to SHA-256 hex digests
  • signature — base64-encoded Ed25519 signature of the canonical manifest JSON
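The unsigned manifest block can be pictured with a short stdlib-only sketch. It is an illustration, not the code in signer.py: sha256_file and build_manifest are hypothetical names, and skipping manifest.json itself is an assumption (the real signer's handling of that file, symlinks, or hidden files is not shown here). Paths are stored with forward slashes so the same manifest verifies identically on every device.

```python
import hashlib
import time
from pathlib import Path
from typing import Any, Dict


def sha256_file(path: Path, chunk_size: int = 65536) -> str:
    """Hex SHA-256 of one file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(bundle_dir: Path, bundle_name: str, version: str) -> Dict[str, Any]:
    """Assemble the unsigned manifest block: metadata plus a path -> digest map."""
    manifest_file = bundle_dir / "manifest.json"  # assumption: never hash the manifest itself
    files = {
        path.relative_to(bundle_dir).as_posix(): sha256_file(path)
        for path in sorted(bundle_dir.rglob("*"))
        if path.is_file() and path != manifest_file
    }
    return {
        "bundle_name": bundle_name,
        "version": version,
        "timestamp": int(time.time()),  # Unix epoch seconds
        "files": files,
    }
```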

Version Manifest Format

A well-formed manifest is required by BundleVerifier.verify_at_runtime() on the device. Below is an example generated by sign_bundle():

{
  "manifest": {
    "bundle_name": "edge-agent-v1.4.2",
    "version": "1.4.2",
    "timestamp": 1716979200,
    "files": {
      "main.py": "a3f5c8...",
      "config.toml": "e7b2d1...",
      "adapters/mqtt.py": "9c4a11..."
    }
  },
  "signature": "base64Ed25519Signature..."
}
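A device can reject a manifest that lacks this shape before hashing anything. The following is a minimal sketch, assuming exactly the layout shown above; check_manifest_shape is a hypothetical helper, not part of BundleVerifier. It also rejects absolute paths and ".." segments, so a manifest cannot direct the verifier outside the bundle directory.

```python
import re
from pathlib import PurePosixPath
from typing import Any, Dict, List

_SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def check_manifest_shape(signed: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the structure looks valid."""
    manifest = signed.get("manifest")
    if not isinstance(manifest, dict):
        return ["missing 'manifest' block"]
    problems: List[str] = []
    if not isinstance(signed.get("signature"), str):
        problems.append("missing or non-string 'signature'")
    for key, kind in (("bundle_name", str), ("version", str), ("timestamp", int)):
        if not isinstance(manifest.get(key), kind):
            problems.append(f"'{key}' missing or not {kind.__name__}")
    files = manifest.get("files")
    if not isinstance(files, dict) or not files:
        problems.append("'files' must be a non-empty mapping")
        return problems
    for rel_path, digest in files.items():
        # Paths must stay inside the bundle directory
        if rel_path.startswith("/") or ".." in PurePosixPath(rel_path).parts:
            problems.append(f"unsafe path: {rel_path}")
        if not isinstance(digest, str) or not _SHA256_HEX.fullmatch(digest):
            problems.append(f"bad SHA-256 digest for {rel_path}")
    return problems
```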
  

Atomic Swap Strategy

An update must not corrupt the running agent. On a POSIX filesystem, a robust approach is to unpack each release into its own versioned staging directory and activate it by swapping a symlink.

Directory Layout on Device

/opt/pyvorin-edge/
├── current -> versions/v1.4.1/        # Symlink to active version
├── previous -> versions/v1.4.0/       # Symlink for rollback
├── versions/
│   ├── v1.4.0/
│   ├── v1.4.1/
│   └── v1.4.2/                        # Staging directory
└── trust_anchor.json
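Reading this layout back is useful, for example to log which versions are in place before an update starts. The following is a small sketch; describe_install and link_target_version are hypothetical helpers, not SDK functions. A missing or dangling link is reported as None rather than raising an error.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def link_target_version(link: Path) -> Optional[str]:
    """Return the version directory name a link points to, or None if absent or dangling."""
    if not link.is_symlink():
        return None
    target = Path(os.readlink(link))
    # Relative targets such as versions/v1.4.1/ resolve against the link's own directory
    resolved = target if target.is_absolute() else link.parent / target
    return resolved.name if resolved.is_dir() else None


def describe_install(base_dir: Path) -> Dict[str, Optional[str]]:
    """Summarize which version is active and which one is kept for rollback."""
    return {
        "current": link_target_version(base_dir / "current"),
        "previous": link_target_version(base_dir / "previous"),
    }
```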
  

The Swap Procedure

  1. Download the new bundle into versions/v1.4.2/.
  2. Verify the bundle signature and file hashes using BundleVerifier.verify_bundle().
  3. Atomically update previous to point to the current version.
  4. Atomically update current to point to versions/v1.4.2/.
  5. Restart the EdgeAgent systemd service.

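Steps 3 and 4 each create a temporary symlink and then rename it over the live link with os.replace(). On Linux, os.replace() is a rename(2) call, which replaces the destination atomically: any process resolving current sees either the old target or the new one, never a missing link.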

import os
from pathlib import Path

def atomic_swap(base_dir: Path, new_version: str) -> None:
    versions_dir = base_dir / "versions"
    current_link = base_dir / "current"
    previous_link = base_dir / "previous"
    new_target = versions_dir / new_version

    if not new_target.is_dir():
        raise RuntimeError(f"Staging directory missing: {new_target}")

    # 1. Point 'previous' to whatever 'current' points to now
    temp_previous = base_dir / ".previous.tmp"
    if current_link.is_symlink():
        temp_previous.unlink(missing_ok=True)  # clear a stale temp link from an interrupted run
        temp_previous.symlink_to(os.readlink(current_link))
        os.replace(temp_previous, previous_link)  # atomic rename(2)

    # 2. Point 'current' to the new version. The target is stored relative
    #    (versions/<new_version>), matching the layout above and staying valid
    #    even when base_dir itself was given as a relative path.
    temp_current = base_dir / ".current.tmp"
    temp_current.unlink(missing_ok=True)
    temp_current.symlink_to(new_target.relative_to(base_dir))
    os.replace(temp_current, current_link)
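os.replace() makes the swap atomic, but on a device that can lose power at any moment, the renamed directory entry is not guaranteed to be on disk until the parent directory itself is flushed. The following is a minimal sketch, assuming a Linux filesystem that accepts fsync() on a directory file descriptor; fsync_dir and replace_link_durably are hypothetical names, not SDK functions.

```python
import os
from pathlib import Path


def fsync_dir(directory: Path) -> None:
    """Flush a directory's entries (e.g. a just-renamed symlink) to stable storage."""
    # Opening a directory O_RDONLY is allowed on Linux and yields an fd fsync() accepts
    fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def replace_link_durably(base_dir: Path, link_name: str, target: str) -> None:
    """Atomically point base_dir/link_name at target, then persist the rename."""
    tmp = base_dir / f".{link_name}.tmp"
    tmp.unlink(missing_ok=True)       # stale temp link from an earlier crash
    tmp.symlink_to(target)
    os.replace(tmp, base_dir / link_name)  # atomic rename(2)
    fsync_dir(base_dir)                    # make the new entry survive power loss
```

Calling fsync_dir(base_dir) once after both links have been replaced is enough; each extra fsync costs a flush on slow flash storage.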
  

Rollback on Failure

After swapping, the device must confirm the new version is healthy before committing to it. If the health check fails, the device reverts the symlink and restarts the agent.

Health Check After Update

The EdgeAgent exposes GET /health, which returns a JSON payload built in main.py. A successful update must satisfy:

  • status == "healthy"
  • metrics.cpu_percent < 95
  • metrics.disk_percent < 95
  • cloud.queue_depth < 10000 (the cloud-sync backlog has not blown up right after the update)
  • agent.running == true
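Keeping these criteria in a pure function makes them unit-testable without a running agent. The sketch below assumes the payload nests fields exactly as the dotted names above suggest; is_healthy and the threshold constants are hypothetical names. A missing section or a non-numeric value counts as unhealthy instead of raising an error.

```python
from typing import Any, Mapping

# Thresholds taken from the criteria listed above
MAX_CPU_PERCENT = 95
MAX_DISK_PERCENT = 95
MAX_QUEUE_DEPTH = 10_000


def is_healthy(payload: Mapping[str, Any]) -> bool:
    """Apply the post-update criteria to a parsed GET /health response."""
    try:
        return (
            payload.get("status") == "healthy"
            and payload["agent"]["running"] is True
            and payload["metrics"]["cpu_percent"] < MAX_CPU_PERCENT
            and payload["metrics"]["disk_percent"] < MAX_DISK_PERCENT
            and payload["cloud"]["queue_depth"] < MAX_QUEUE_DEPTH
        )
    except (KeyError, TypeError):
        # Incomplete or malformed payload: treat as not (yet) healthy
        return False
```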

Rollback Procedure

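import os
import time
from pathlib import Path

import requests

def rollback(base_dir: Path) -> None:
    current_link = base_dir / "current"
    previous_link = base_dir / "previous"

    if not previous_link.is_symlink():
        raise RuntimeError("No previous version to roll back to")

    # Same temp-link + os.replace() pattern as the forward swap
    temp_current = base_dir / ".current.tmp"
    temp_current.unlink(missing_ok=True)  # clear a stale link left by a crash
    temp_current.symlink_to(os.readlink(previous_link))
    os.replace(temp_current, current_link)
    # The caller must then restart the agent (e.g. via systemd) to load the old code

def verify_health(endpoint: str = "http://127.0.0.1:8080/health", timeout: float = 30.0) -> bool:
    """Poll /health until every post-update criterion holds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            resp = requests.get(endpoint, timeout=5)
            data = resp.json()
            if (
                data.get("status") == "healthy"
                and data["agent"]["running"]
                and data["metrics"]["cpu_percent"] < 95
                and data["metrics"]["disk_percent"] < 95
                and data["cloud"]["queue_depth"] < 10000
            ):
                return True
        except (requests.RequestException, ValueError, KeyError, TypeError):
            pass  # agent not up yet, or payload incomplete: retry until the deadline
        time.sleep(2)
    return False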
  

Update Channels

Not every device should receive bleeding-edge builds. Three channels let you stage risk:

Channel | Purpose | Fleet share
--------|---------|------------
stable | Battle-tested releases. Receives only patch and minor updates, after a 48-hour canary bake period. | 90%
beta | Pre-release validation on representative hardware in real environments. | 9%
canary | Immediate deployment of every build merged to the main branch, used to catch regressions before they reach beta. | 1%

Each device stores its channel in /opt/pyvorin-edge/channel. The OTA poller reads this file and queries the update server with a ?channel= parameter.
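The following is a hedged sketch of the device side, assuming the file holds a single channel name; read_channel, check_url, and the updates.example.com host are hypothetical. Unknown or misspelled values fall back to stable, which matches how the update server example treats an unrecognized channel, so a typo never silently moves a device onto a riskier channel.

```python
from pathlib import Path
from urllib.parse import urlencode

VALID_CHANNELS = ("stable", "beta", "canary")


def read_channel(channel_file: Path) -> str:
    """Return the device's channel, falling back to 'stable' if missing or invalid."""
    try:
        channel = channel_file.read_text(encoding="utf-8").strip().lower()
    except FileNotFoundError:
        return "stable"
    return channel if channel in VALID_CHANNELS else "stable"


def check_url(update_server: str, current_version: str, channel: str) -> str:
    """Build the /check query the OTA poller sends."""
    query = urlencode({"version": current_version, "channel": channel})
    return f"{update_server.rstrip('/')}/check?{query}"
```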

Complete OTA Update Flow

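The following Python script is a self-contained OTA updater that runs on the device. It checks for an update, downloads the bundle into a staging directory, verifies it, swaps the symlinks, restarts the agent, runs the health check, and rolls back on failure. Verification uses the SDK's BundleVerifier; the rest is plain Python plus requests.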

#!/usr/bin/env python3
"""OTA updater for Pyvorin Edge devices.

Uses BundleVerifier from the SDK and implements an atomic symlink
swap with automatic rollback on health check failure.
"""

from __future__ import annotations

import argparse
import hashlib
import json
import os
import shutil
import subprocess
import sys
import tempfile
import time
from pathlib import Path
from typing import Any, Dict

import requests

from pyvorin_edge.packaging.signer import BundleVerificationError
from pyvorin_edge.packaging.verifier import BundleVerifier

# systemd unit that runs the EdgeAgent
AGENT_SERVICE = "pyvorin-edge"


class OTAUpdater:
    """Device-side OTA update orchestrator."""

    def __init__(self, base_dir: str, update_server: str) -> None:
        self.base_dir = Path(base_dir).resolve()
        self.update_server = update_server.rstrip("/")
        self.versions_dir = self.base_dir / "versions"
        self.current_link = self.base_dir / "current"
        self.previous_link = self.base_dir / "previous"
        self.verifier = BundleVerifier()

    def _local_version(self) -> str:
        manifest_path = self.current_link / "manifest.json"
        if not manifest_path.is_file():
            return "0.0.0"
        with open(manifest_path, "r", encoding="utf-8") as f:
            data = json.load(f)
        return data.get("manifest", {}).get("version", "0.0.0")

    def _channel(self) -> str:
        channel_file = self.base_dir / "channel"
        if channel_file.is_file():
            return channel_file.read_text().strip()
        return "stable"

    def _download_bundle(self, version: str, dest: Path) -> None:
        url = f"{self.update_server}/bundles/{version}.tar.gz"
        resp = requests.get(url, stream=True, timeout=120)
        resp.raise_for_status()

        dest.parent.mkdir(parents=True, exist_ok=True)
        fd, tmp_path = tempfile.mkstemp(dir=dest.parent)
        try:
            # Stream to disk and hash in the same pass (no second full read)
            digest = hashlib.sha256()
            with os.fdopen(fd, "wb") as tmp:
                for chunk in resp.iter_content(chunk_size=8192):
                    tmp.write(chunk)
                    digest.update(chunk)

            # Verify checksum if server provides one
            expected_hash = resp.headers.get("X-Bundle-Hash")
            if expected_hash and digest.hexdigest() != expected_hash.strip().lower():
                raise BundleVerificationError("Download hash mismatch")

            # The temp file has no .tar.gz suffix, so the format must be explicit.
            # filter="data" rejects absolute paths and ".." members; it needs
            # Python 3.12+ or a release carrying the tarfile security backport.
            shutil.unpack_archive(tmp_path, dest, format="gztar", filter="data")
        finally:
            os.unlink(tmp_path)  # never leave partial downloads behind

    def _verify_staging(self, staging_dir: Path) -> None:
        """Check staged files against the hashes in the bundle's manifest.

        verify_at_runtime() does not check the manifest signature; verify it
        against the trust anchor as well before trusting the staged bundle.
        """
        self.verifier.verify_at_runtime(staging_dir)

    def _atomic_swap(self, new_version: str) -> None:
        staging = self.versions_dir / new_version
        if not staging.is_dir():
            raise RuntimeError(f"Staging directory missing: {staging}")

        temp_previous = self.base_dir / ".previous.tmp"
        if self.current_link.is_symlink():
            temp_previous.unlink(missing_ok=True)  # stale link from an interrupted run
            temp_previous.symlink_to(os.readlink(self.current_link))
            os.replace(temp_previous, self.previous_link)

        temp_current = self.base_dir / ".current.tmp"
        temp_current.unlink(missing_ok=True)
        temp_current.symlink_to(str(staging))
        os.replace(temp_current, self.current_link)

    def _rollback(self) -> None:
        if not self.previous_link.is_symlink():
            raise RuntimeError("No previous version available for rollback")
        temp_current = self.base_dir / ".current.tmp"
        temp_current.unlink(missing_ok=True)
        temp_current.symlink_to(os.readlink(self.previous_link))
        os.replace(temp_current, self.current_link)

    def _restart_agent(self) -> None:
        # Restart the agent so it loads code through the updated 'current' link
        subprocess.run(["systemctl", "restart", AGENT_SERVICE], check=True)

    def _health_check(self, timeout: float = 60.0) -> bool:
        endpoint = "http://127.0.0.1:8080/health"
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            try:
                resp = requests.get(endpoint, timeout=5)
                data = resp.json()
                if (
                    data.get("status") == "healthy"
                    and data["agent"]["running"]
                    and data["metrics"]["cpu_percent"] < 95
                    and data["metrics"]["disk_percent"] < 95
                    and data["cloud"]["queue_depth"] < 10000
                ):
                    return True
            except (requests.RequestException, ValueError, KeyError, TypeError):
                pass  # agent still starting or payload incomplete; retry
            time.sleep(3)
        return False

    def run(self) -> int:
        current_version = self._local_version()
        channel = self._channel()
        print(f"Current version: {current_version}  Channel: {channel}")

        # 1. Check for update
        resp = requests.get(
            f"{self.update_server}/check",
            params={"version": current_version, "channel": channel},
            timeout=30,
        )
        resp.raise_for_status()
        update_info: Dict[str, Any] = resp.json()

        if not update_info.get("available"):
            print("No update available.")
            return 0

        new_version = update_info["version"]
        print(f"Update available: {new_version}")

        # 2-3. Download to staging, then verify
        staging_dir = self.versions_dir / new_version
        if staging_dir.exists():
            shutil.rmtree(staging_dir)
        try:
            self._download_bundle(new_version, staging_dir)
            self._verify_staging(staging_dir)
            print("Bundle verification passed.")
        except BundleVerificationError as exc:
            print(f"Bundle verification failed: {exc}")
            shutil.rmtree(staging_dir, ignore_errors=True)
            return 1

        # 4. Atomic swap
        self._atomic_swap(new_version)
        print(f"Swapped to {new_version}. Restarting agent...")

        # 5. Restart the agent *before* the health check. Deferring the restart
        # to an ExecStopPost= hook would let the check below probe the old code.
        self._restart_agent()
        print("Agent restarted. Waiting for health check...")

        # 6. Health check
        if self._health_check():
            print("Health check passed. Update committed.")
            return 0

        print("Health check FAILED. Rolling back...")
        self._rollback()
        self._restart_agent()
        print("Rollback complete. Agent restarted on the previous version.")
        return 1


def main() -> int:
    parser = argparse.ArgumentParser(description="Pyvorin Edge OTA Updater")
    parser.add_argument("--base-dir", default="/opt/pyvorin-edge", help="Base install directory")
    parser.add_argument("--server", required=True, help="Update server base URL")
    args = parser.parse_args()

    updater = OTAUpdater(base_dir=args.base_dir, update_server=args.server)
    return updater.run()


if __name__ == "__main__":
    sys.exit(main())
  

Integration with BundleVerifier.verify_at_runtime()

The verifier class in /var/www/pyvorin/edge_sdk/pyvorin_edge/packaging/verifier.py is called both during the OTA staging step and on every agent startup. The runtime verification flow is:

  1. Load manifest.json from the bundle directory.
  2. Verify the manifest block exists and is well-formed.
  3. For every file listed in manifest.files, compute SHA-256 and compare against the expected hash.
  4. Log each verification result. If any file is missing or mismatched, raise BundleVerificationError.

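This is the method OTAUpdater._verify_staging() delegates to. Note what the excerpt does not do: it compares each file's hash with the manifest but never checks the manifest's Ed25519 signature, so a bundle whose files and manifest.json were replaced together still passes. The signature must also be verified against the device's trust anchor.

# Excerpt from /var/www/pyvorin/edge_sdk/pyvorin_edge/packaging/verifier.py
# (imports, the module logger, and the _hash_file() helper are omitted)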

class BundleVerifier:
    def verify_at_runtime(self, bundle_dir: str | Path) -> bool:
        bundle_path = Path(bundle_dir).resolve()
        manifest_path = bundle_path / "manifest.json"

        if not manifest_path.is_file():
            raise BundleVerificationError(f"Manifest not found: {manifest_path}")

        with open(manifest_path, "r", encoding="utf-8") as f:
            signed_manifest: dict[str, Any] = json.load(f)

        manifest = signed_manifest.get("manifest")
        if manifest is None:
            raise BundleVerificationError("Malformed manifest: missing manifest block")

        files_info: dict[str, str] = manifest.get("files", {})
        all_valid = True
        for relative_path, expected_hash in files_info.items():
            file_path = bundle_path / relative_path
            if not file_path.is_file():
                logger.error("Missing file during runtime verification: %s", relative_path)
                all_valid = False
                continue
            actual_hash = self._hash_file(file_path)
            if actual_hash != expected_hash:
                logger.error("Hash mismatch for %s", relative_path)
                all_valid = False

        if not all_valid:
            raise BundleVerificationError("Runtime bundle verification failed")
        return True
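The missing signature check can be sketched with the cryptography package's Ed25519 API. This is a hedged sketch, not the SDK's method. It makes two assumptions that you must match to signer.py. First, the signature covers the manifest block serialized with sorted keys, compact separators, and UTF-8. Second, the trust anchor's public key is available as 32 raw bytes; how trust_anchor.json stores it is not shown here. canonical_manifest_bytes and signature_is_valid are hypothetical names.

```python
import base64
import json
from typing import Any, Dict

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey


def canonical_manifest_bytes(manifest: Dict[str, Any]) -> bytes:
    # Assumed canonical form; must be byte-identical to what the signer signed
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")


def signature_is_valid(signed_manifest: Dict[str, Any], public_key_raw: bytes) -> bool:
    """Check the top-level base64 signature over the 'manifest' block."""
    try:
        public_key = Ed25519PublicKey.from_public_bytes(public_key_raw)
        signature = base64.b64decode(signed_manifest["signature"], validate=True)
        payload = canonical_manifest_bytes(signed_manifest["manifest"])
        public_key.verify(signature, payload)  # raises InvalidSignature on mismatch
    except (InvalidSignature, KeyError, TypeError, ValueError):
        # Bad signature, missing fields, bad base64, or a malformed key
        return False
    return True
```

Run this before verify_at_runtime(): once the signature over the manifest holds, the per-file hash check ties every staged file back to the build server's private key.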
  

Server-Side Manifest Endpoint

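The update server must return a JSON payload that the OTA poller can parse. A minimal Flask example follows. The updater above builds the bundle URL itself and reads only an optional X-Bundle-Hash response header, so download_url and hash_url are informational for that client: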

from flask import Flask, request, jsonify

app = Flask(__name__)

VERSIONS = {
    "stable": "1.4.1",
    "beta": "1.4.2-rc2",
    "canary": "1.5.0-dev.3",
}

@app.route("/check")
def check():
    current = request.args.get("version", "0.0.0")
    channel = request.args.get("channel", "stable")
    latest = VERSIONS.get(channel, VERSIONS["stable"])
    return jsonify({
        "available": latest != current,
        "version": latest,
        "channel": channel,
        "download_url": f"/bundles/{latest}.tar.gz",
        "hash_url": f"/bundles/{latest}.sha256",
    })
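This endpoint reports an update whenever latest != current, which also offers a downgrade. For example, a device moved from canary (1.5.0-dev.3) to stable would be sent 1.4.1. If downgrades should not be offered this way, the check could compare SemVer 2.0.0 precedence instead. The following is a minimal sketch; is_upgrade and semver_key are hypothetical helpers, not part of Flask or the SDK.

```python
import re
from typing import Tuple, Union

# SemVer 2.0.0 grammar: MAJOR.MINOR.PATCH[-prerelease][+build]
_SEMVER = re.compile(
    r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)"
    r"(?:-([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?"
    r"(?:\+[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?"
)


def _prerelease_key(ident: str) -> Tuple[int, Union[int, str]]:
    # Numeric identifiers compare numerically and sort before alphanumeric ones
    return (0, int(ident)) if ident.isdigit() else (1, ident)


def semver_key(version: str) -> tuple:
    """Sort key implementing SemVer 2.0.0 precedence (build metadata ignored)."""
    match = _SEMVER.fullmatch(version)
    if match is None:
        raise ValueError(f"not a semantic version: {version!r}")
    major, minor, patch, pre = match.groups()
    core = (int(major), int(minor), int(patch))
    if pre is None:
        # A release outranks every pre-release of the same core version
        return (core, 1, ())
    return (core, 0, tuple(_prerelease_key(part) for part in pre.split(".")))


def is_upgrade(current: str, latest: str) -> bool:
    """True only if latest has strictly higher precedence than current."""
    return semver_key(latest) > semver_key(current)
```

In check(), "available" would then become is_upgrade(current, latest); a malformed version string raises ValueError and should be turned into a 400 response. SemVer compares alphanumeric identifiers as strings, so 1.4.2-rc10 sorts before 1.4.2-rc2; dot-separated tags such as rc.2 avoid that.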
  

Summary

A secure OTA pipeline for Pyvorin Edge requires four pillars: cryptographic signing (Ed25519 via BundleSigner), runtime verification (via BundleVerifier.verify_at_runtime()), atomic filesystem swaps, and automatic rollback driven by the agent's own /health endpoint. Separate releases into stable, beta, and canary channels to control blast radius. Never deploy an update you cannot roll back in under 30 seconds.