Privacy Filtering Overview

Privacy Starts at the Edge

Tested with: Python 3.12.3, GCC 13.3.0, Pyvorin Edge SDK 1.0.5-edge, Ubuntu 24.04 LTS (x86_64 & ARM64). Run python3 --version and gcc --version to verify your environment.

In a world where sensor data crosses organisational, jurisdictional, and regulatory boundaries, the safest place to enforce privacy is before the packet leaves the device. The Pyvorin Edge Runtime treats privacy not as an afterthought in the cloud data lake, but as a first-class pipeline stage that executes locally on the gateway. By the time a batch of readings reaches your HTTPS endpoint, every field has already been evaluated against a declarative PrivacyPolicy that knows exactly which sensors are too hot to transmit.

This article explains the architectural rationale for edge-side privacy filtering, the regulatory frameworks that drive it, and the concrete Python API you use to configure rules. We cover the four action types supported by the runtime — mask, drop, hash, and allow — the evaluation order that determines which rule wins when patterns overlap, and the measured performance overhead of applying these rules to every single reading that flows through the pipeline.

Why Filter at the Edge?

The traditional approach to data privacy is "collect everything, filter later." That model is simple to implement, but it creates three categories of risk that grow with fleet size:

Regulatory exposure. Under GDPR Article 25, data controllers must implement data-protection principles "by design and by default." If personal data leaves the European Economic Area unfiltered, you are already in scope for cross-border transfer mechanisms, adequacy decisions, and potential supervisory authority audits. Filtering at the edge means the personal data never leaves the device in the first place — dramatically reducing your compliance surface area.
Network and storage cost. A factory floor with ten thousand temperature and vibration sensors can generate terabytes of telemetry per month. If even five percent of that data contains personally identifiable information (PII) that must later be deleted under a data-subject request, you have paid to egress, store, index, and back up data that ultimately has negative value. Dropping it at source eliminates that waste entirely.
Incident blast radius. If a cloud database is breached and the attacker exfiltrates a year of readings, the presence of unredacted MAC addresses, employee names, or precise GPS coordinates transforms a routine security incident into a mandatory breach notification under GDPR Article 33 or HIPAA § 164.404. Edge filtering ensures that the worst-case data set is already sanitised.

Regulatory Context: GDPR, HIPAA, and NIS2

The Pyvorin Edge privacy engine was designed with three major compliance frameworks in mind. You do not need to be in scope for all three to benefit from the architecture; the same filtering primitives satisfy overlapping requirements.

GDPR (General Data Protection Regulation)

GDPR distinguishes between controllers and processors. If your edge gateway collects readings from which room occupancy can be inferred, for example by combining temperature and motion sensors, you are processing personal data. Article 5(1)(c) requires data minimisation: you must not collect more than necessary. The drop and mask actions directly enforce this. Article 30 obliges you to keep a record of processing activities; the Edge Runtime's PrivacyAudit chain (covered in a later article) generates that record automatically.

HIPAA (Health Insurance Portability and Accountability Act)

In healthcare IoT, a wearable pulse-oximeter or a hospital-bed pressure sensor can produce Protected Health Information (PHI). The HIPAA Privacy Rule's Minimum Necessary Standard (45 CFR 164.502(b)) requires covered entities to make reasonable efforts to limit PHI to the minimum necessary. The mask action, which replaces a reading's value with a sentinel and strips its metadata while preserving the sensor name and timestamp, suits clinical streams whose presence must remain visible downstream but whose content should not traverse public networks.

NIS2 (Network and Information Security Directive)

NIS2 imposes risk-management obligations on essential and important entities. Article 21 requires "appropriate and proportionate technical, operational and organisational measures to manage the risks posed to the security of network and information systems." Transmitting raw sensor identifiers that could be used to map critical infrastructure layout is hard to square with that proportionality. The hash action replaces the raw identifier with a deterministic pseudonym, preserving correlation analytics while stripping reconnaissance value.

The PrivacyPolicy Class

The runtime exposes two complementary privacy APIs. The simpler of the two is the PrivacyPolicy class defined in pyv_edge_agent/privacy.py. It operates on individual SensorReading objects and supports wildcard matching against the sensor_name field using Python's fnmatch module.


from pyv_edge_agent.privacy import PrivacyPolicy, PrivacyRule
from pyv_edge_agent.types import SensorReading

# Build a policy that masks temperature sensors, drops everything in the
# HR (heart-rate) namespace, and hashes the name of one motion sensor.
policy = PrivacyPolicy(
    enabled=True,
    rules=[
        PrivacyRule(sensor_pattern="temp.*", action="mask"),
        PrivacyRule(sensor_pattern="hr.*",   action="drop"),
        PrivacyRule(sensor_pattern="motion.lounge", action="hash"),
    ],
    default_action="allow",
)

reading = SensorReading(
    sensor_name="temp.living_room",
    timestamp=1717000000.0,
    value=22.5,
    unit="celsius",
    metadata={"floor": "1", "tenant_id": "acme-corp"},
)

out = policy.evaluate(reading)
# out.value is now 0.0 and out.unit is "masked"
# out.metadata is empty because mask returns a stripped SensorReading

The PrivacyPolicy dataclass has three fields:

enabled: bool = True — A master switch. When False, evaluate() returns the reading unchanged.
rules: List[PrivacyRule] — Ordered list of rules evaluated sequentially.
default_action: str = "allow" — Applied only if no rule matches. Can be "allow" or "drop". A default of "drop" implements an explicit-allowlist posture.

Rule Types in Detail

`mask` — Replace with Sentinel

The mask action returns a new SensorReading whose value is replaced with 0.0, whose unit is set to the literal string "masked", and whose metadata dict is emptied. The original sensor_name and timestamp are preserved so that downstream windows and aggregators still see a datum at the correct temporal coordinate, but the sensitive payload is gone.

Use mask when you want to retain the shape of the data stream (e.g., to keep a ten-second sampling cadence visible in a time-series plot) while removing the content. It is the right choice for temperature, humidity, or light-level sensors that are not themselves sensitive, but whose values might be correlated with personal presence.
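
The transformation described above can be sketched with a stand-in dataclass. The Reading type and the mask_reading helper are illustrative assumptions, not the SDK's own definitions; only the field names and the sentinel values are taken from the description of mask.

```python
from dataclasses import dataclass, field, replace
from typing import Dict


@dataclass
class Reading:
    """Hypothetical stand-in for SensorReading (same field names as the article)."""
    sensor_name: str
    timestamp: float
    value: float
    unit: str
    metadata: Dict[str, str] = field(default_factory=dict)


def mask_reading(reading: Reading) -> Reading:
    """Return a copy whose payload is replaced by sentinels.

    sensor_name and timestamp are kept, so the sampling cadence stays visible.
    """
    return replace(reading, value=0.0, unit="masked", metadata={})


original = Reading("temp.living_room", 1717000000.0, 22.5, "celsius",
                   {"floor": "1", "tenant_id": "acme-corp"})
masked = mask_reading(original)
print(masked)
```

Returning a copy rather than mutating in place keeps the original reading available to any local consumer that is allowed to see it.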

`drop` — Eliminate Entirely

The drop action causes evaluate() to return None. The reading is removed from the stream entirely; no downstream window, rule, or cloud uploader ever sees it. This is the most aggressive action and is appropriate for:

Raw biometric streams (fingerprint, iris, voice-print sensors).
Audio or video feeds captured by edge cameras or microphones.
Diagnostic debug channels that accidentally include stack traces with file paths or usernames.
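
Because a dropped reading surfaces as None, whatever consumes the evaluator's output has to discard None before handing readings on. A minimal sketch of that pattern follows; the tuple-based Reading and the evaluate function are hypothetical stand-ins, not the runtime's types.

```python
from typing import Iterable, Iterator, Optional, Tuple

Reading = Tuple[str, float]  # (sensor_name, value): simplified stand-in


def evaluate(reading: Reading) -> Optional[Reading]:
    """Hypothetical evaluator: drop the hr.* namespace, pass everything else."""
    name, _ = reading
    return None if name.startswith("hr.") else reading


def filter_stream(readings: Iterable[Reading]) -> Iterator[Reading]:
    """Yield only the readings that survive evaluation."""
    for reading in readings:
        result = evaluate(reading)
        if result is not None:  # None means the reading was dropped
            yield result


batch = [("temp.kitchen", 21.0), ("hr.wrist", 72.0), ("temp.hall", 19.5)]
survivors = list(filter_stream(batch))
print(survivors)
```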

`hash` — Deterministic Pseudonymisation

The hash action replaces the sensor_name with a truncated SHA-256 digest prefixed by sensor_. For example, motion.lounge becomes sensor_a3f7b2d1e8c9a4b5. The value, unit, and timestamp are preserved, but metadata is stripped.

Because the hash is deterministic, two readings from the same physical sensor will share the same pseudonym. This lets you run correlation analytics ("how often does sensor X trigger within five minutes of sensor Y?") without revealing the floor plan encoded in the original name. If you need keyed hashing (HMAC) instead of raw SHA-256, see the dedicated article on Hashing Strategies.
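
A pseudonym of this shape can be approximated with the standard library. The pseudonymise helper is hypothetical, and the 16-character truncation and UTF-8 encoding are assumptions inferred from the example above; the runtime's exact parameters may differ.

```python
import hashlib


def pseudonymise(sensor_name: str, hex_chars: int = 16) -> str:
    """Map a sensor name to a stable pseudonym.

    Assumed for this sketch: UTF-8 input encoding and a 16-hex-character
    truncation of the SHA-256 digest.
    """
    digest = hashlib.sha256(sensor_name.encode("utf-8")).hexdigest()
    return "sensor_" + digest[:hex_chars]


a = pseudonymise("motion.lounge")
b = pseudonymise("motion.lounge")
c = pseudonymise("motion.kitchen")
print(a, a == b, a == c)
```

The same input always yields the same pseudonym, which is what makes correlation possible. Sensor names are often guessable, so an unkeyed digest can be reversed by hashing candidate names; that is the case a keyed variant addresses.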

`allow` — Pass Through Unchanged

allow returns the reading exactly as received. It is most often reached implicitly, when no rule matches and default_action="allow", but it can also be named in a rule to exempt a specific sensor ahead of a broader pattern, as the examples under Evaluation Order do. In high-trust environments where only a handful of known-bad sensors must be blocked, an allow default minimises CPU overhead because the majority of readings fall through without mutation.

Evaluation Order

Rules are evaluated in the order they appear in the rules list. The first matching rule wins; subsequent rules are ignored for that reading. This is a critical design decision because it gives you deterministic, auditable behaviour.


# WRONG: the broad wildcard matches first and the specific rule is never reached.
wrong_policy = PrivacyPolicy(rules=[
    PrivacyRule(sensor_pattern="*", action="drop"),      # drops everything
    PrivacyRule(sensor_pattern="temp.safe", action="allow"),  # unreachable
])

# RIGHT: specific rules first, catch-all last.
right_policy = PrivacyPolicy(rules=[
    PrivacyRule(sensor_pattern="temp.safe", action="allow"),
    PrivacyRule(sensor_pattern="*", action="drop"),
])
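
The first-match-wins behaviour can be reproduced in a few lines. This is a sketch under simplifying assumptions: rules are plain (pattern, action) tuples and resolve_action is a hypothetical helper, not part of the SDK.

```python
from fnmatch import fnmatch
from typing import List, Tuple

Rule = Tuple[str, str]  # (sensor_pattern, action): simplified stand-in


def resolve_action(rules: List[Rule], sensor_name: str,
                   default_action: str = "allow") -> str:
    """Return the action of the first rule whose pattern matches."""
    for pattern, action in rules:
        if fnmatch(sensor_name, pattern):
            return action  # first match wins; later rules are never consulted
    return default_action


wrong = [("*", "drop"), ("temp.safe", "allow")]
right = [("temp.safe", "allow"), ("*", "drop")]

print(resolve_action(wrong, "temp.safe"))  # drop
print(resolve_action(right, "temp.safe"))  # allow
print(resolve_action(right, "hr.wrist"))   # drop
```

Tracing a handful of representative sensor names through a helper like this is a quick way to spot a rule that an earlier, broader pattern has made unreachable.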

The wildcard matcher uses fnmatch.fnmatch, which supports:

* — matches any sequence of characters.
? — matches exactly one character.
[seq] — matches any character in seq.
[!seq] — matches any character not in seq.
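
The snippet below exercises each wildcard form against made-up sensor names. It also shows two points that are easy to miss: * is not stopped by a dot, and a dot in the pattern matches only a literal dot.

```python
from fnmatch import fnmatch

checks = [
    ("temp.*", "temp.living_room"),     # * matches any run of characters
    ("temp.*", "temp.floor1.kitchen"),  # ...including further dots
    ("temp.*", "temperature.hall"),     # the dot in the pattern is literal
    ("door.?", "door.1"),               # ? matches exactly one character
    ("door.?", "door.12"),              # two characters, so no match
    ("zone[ab].co2", "zonea.co2"),      # [seq] matches one listed character
    ("zone[!ab].co2", "zonea.co2"),     # [!seq] matches one unlisted character
]

results = {(pattern, name): fnmatch(name, pattern) for pattern, name in checks}
for (pattern, name), matched in results.items():
    print(f"{pattern!r:18} {name!r:24} {matched}")
```

Note that fnmatch.fnmatch normalises case according to the operating system's conventions; fnmatch.fnmatchcase is the variant that always compares case-sensitively.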

Performance Impact

Privacy evaluation happens on the hot path: every SensorReading that enters the pipeline is passed through the policy before it reaches windowing, rule evaluation, or cloud upload. The runtime is optimised to keep this overhead negligible on ARM64 gateways such as the Raspberry Pi 5.

In controlled benchmarks on a Raspberry Pi 5 running at 2.4 GHz with active cooling:

A policy with 10 rules and default_action="allow" adds approximately 0.8 µs per reading when no rule matches (the common case).
A 100-rule policy adds approximately 6.5 µs per reading. Against the typical five-second sensor poll interval, that is about 0.00013 percent (6.5 µs out of 5 s).
mask and hash actions that mutate the reading add an additional 1.2–2.0 µs because they allocate a new SensorReading dataclass instance.
drop is the cheapest action that changes the stream because it simply returns None without allocation.
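
Figures like these depend on hardware and build, so it is worth measuring on your own gateway. The sketch below times plain fnmatch matching in the interpreter for the no-match case. It does not exercise the runtime's own evaluator, so expect different absolute numbers; the no_match_cost_us helper and the pattern names are made up for illustration.

```python
import timeit
from fnmatch import fnmatch


def no_match_cost_us(rule_count: int, iterations: int = 5_000) -> float:
    """Estimate microseconds per reading when no pattern matches."""
    patterns = [f"group{i}.*" for i in range(rule_count)]
    name = "temp.living_room"  # matches none of the patterns above

    def scan() -> bool:
        # Every pattern is tried, because none of them matches.
        return any(fnmatch(name, pattern) for pattern in patterns)

    seconds = timeit.timeit(scan, number=iterations)
    return seconds / iterations * 1e6


for count in (10, 100):
    print(f"{count:>3} rules: {no_match_cost_us(count):.2f} µs per reading")
```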

If you need field-level redaction rather than whole-sensor matching — for example, redacting only the tenant_id metadata field while keeping the temperature value — you should use the more advanced PrivacyPolicyEngine discussed in the Field Redaction Patterns article. The engine is slightly heavier (≈ 3 µs per field checked) because it performs per-key wildcard matching against the metadata dictionary, but it gives you substantially finer control.

Configuring Privacy in config.toml

The Edge Runtime's configuration loader expects a top-level privacy section. Rules declared here are parsed at startup and injected into the agent's pipeline automatically.


[privacy]
enabled = true
default_action = "allow"

[[privacy.rules]]
sensor_pattern = "hr.*"
action = "drop"

[[privacy.rules]]
sensor_pattern = "camera.*"
action = "drop"

[[privacy.rules]]
sensor_pattern = "motion.*"
action = "hash"

[[privacy.rules]]
sensor_pattern = "temp.*"
action = "mask"

The Config class validates that the section exists and that enabled is a boolean. If cloud.enabled is true but no cloud.endpoint is provided, validation emits a warning because the most common reason to enable cloud sync is to send filtered data upstream — and sending unfiltered data upstream when privacy rules are configured is usually a mistake.

Best Practices

Start with default_action="drop" in green-field deployments. This forces you to explicitly allowlist every sensor that may leave the device. It is far easier to relax a deny-by-default policy than to discover a leaky sensor name in production.
Use deterministic sensor naming conventions. If your sensor names encode hierarchy (building.floor.room.device.metric), you can write precise wildcards such as *.office.*.occupancy without accidentally matching warehouse.office.supplies.count.
Audit rule changes. Every modification to the privacy policy should be logged. The PrivacyAudit chain records policy reload events with a SHA-256 record hash, making tampering detectable. See the Audit Chain and Signed Reports article for details.
Test policies offline. Load a TOML configuration into a PrivacyPolicy instance in a unit test, pass synthetic readings through evaluate(), and assert on the outputs. Because the evaluator is pure (no I/O), tests run in milliseconds.

Summary

Privacy filtering in Pyvorin Edge is not a bolt-on feature; it is a pipeline stage that executes before any data leaves the device. The PrivacyPolicy class provides a lightweight, ordered rule engine with four actions (mask, drop, hash, and allow) that map directly to GDPR data-minimisation, HIPAA minimum-necessary, and NIS2 risk-management requirements. With a measured overhead of a few microseconds per reading and declarative TOML configuration, you can enforce strong privacy guarantees without sacrificing throughput or operational simplicity.
