Privacy Filtering Overview
Why the Pyvorin Edge Runtime filters sensitive data at the edge before it ever reaches the cloud, and how the PrivacyPolicy class enforces mask, drop, hash, and allow rules.
Published Jun 2, 2026
Privacy Starts at the Edge
In a world where sensor data crosses organisational, jurisdictional, and regulatory boundaries,
the safest place to enforce privacy is before the packet leaves the device.
The Pyvorin Edge Runtime treats privacy not as an afterthought in the cloud data lake, but as a
first-class pipeline stage that executes locally on the gateway. By the time a batch of readings
reaches your HTTPS endpoint, every field has already been evaluated against a declarative
PrivacyPolicy that specifies exactly which sensors are too sensitive to transmit.
This article explains the architectural rationale for edge-side privacy filtering, the regulatory
frameworks that drive it, and the concrete Python API you use to configure rules. We cover the
four action types supported by the runtime — mask, drop, hash,
and allow — the evaluation order that determines which rule wins when patterns overlap,
and the measured performance overhead of applying these rules to every single reading that flows
through the pipeline.
Why Filter at the Edge?
The traditional approach to data privacy is "collect everything, filter later." That model is simple to implement, but it creates three categories of risk that grow with fleet size:
- Regulatory exposure. Under GDPR Article 25, data controllers must implement data-protection principles "by design and by default." If personal data leaves the European Economic Area unfiltered, you are already in scope for cross-border transfer mechanisms, adequacy decisions, and potential supervisory authority audits. Filtering at the edge means the personal data never leaves the device in the first place — dramatically reducing your compliance surface area.
- Network and storage cost. A factory floor with ten thousand temperature and vibration sensors can generate terabytes of telemetry per month. If even five percent of that data contains personally identifiable information (PII) that must later be deleted under a data-subject request, you have paid to egress, store, index, and back up data that ultimately has negative value. Dropping it at source eliminates that waste entirely.
- Incident blast radius. If a cloud database is breached and the attacker exfiltrates a year of readings, the presence of unredacted MAC addresses, employee names, or precise GPS coordinates transforms a routine security incident into a mandatory breach notification under GDPR Article 33 or HIPAA § 164.404. Edge filtering ensures that the worst-case data set is already sanitised.
Regulatory Context: GDPR, HIPAA, and NIS2
The Pyvorin Edge privacy engine was designed with three major compliance frameworks in mind. You do not need to be in scope for all three to benefit from the architecture; the same filtering primitives satisfy overlapping requirements.
GDPR (General Data Protection Regulation)
GDPR distinguishes between controllers and processors. If your edge gateway
collects temperature readings that include room occupancy inferred from motion sensors, you are
processing personal data. Article 5(1)(c) requires data minimisation: you must not collect more
than necessary. The drop and mask actions directly enforce this.
Article 30 obliges you to keep a record of processing activities; the Edge Runtime's
PrivacyAudit chain (covered in a later article) generates that record automatically.
HIPAA (Health Insurance Portability and Accountability Act)
In healthcare IoT, a wearable pulse-oximeter or a hospital-bed pressure sensor can produce
Protected Health Information (PHI). The HIPAA Privacy Rule's Minimum Necessary Standard (45 CFR
164.502(b)) requires covered entities to make reasonable efforts to limit PHI to the minimum
necessary. The mask action, which replaces a string with a partially redacted
representation (e.g., Jo**ith), is ideal for clinical identifiers that must remain
human-readable internally but should not traverse public networks in full.
NIS2 (Network and Information Security Directive)
NIS2 imposes risk-management obligations on operators of essential services and important
entities. Article 21 requires "appropriate and proportionate technical and organisational
measures to manage the risks posed to the security of network and information systems."
Transmitting raw sensor identifiers that could be used to map critical infrastructure layout
runs counter to that proportionality requirement. Hashing (hash) replaces the raw identifier with a
deterministic pseudonym, preserving correlation analytics while stripping reconnaissance value.
The PrivacyPolicy Class
The runtime exposes two complementary privacy APIs. The simpler of the two is the
PrivacyPolicy class defined in pyv_edge_agent/privacy.py. It operates
on individual SensorReading objects and supports wildcard matching against the
sensor_name field using Python's fnmatch module.
from pyv_edge_agent.privacy import PrivacyPolicy, PrivacyRule
from pyv_edge_agent.types import SensorReading

# Build a policy that masks temperature sensors and drops anything
# matching the HR (heart-rate) namespace entirely.
policy = PrivacyPolicy(
    enabled=True,
    rules=[
        PrivacyRule(sensor_pattern="temp.*", action="mask"),
        PrivacyRule(sensor_pattern="hr.*", action="drop"),
        PrivacyRule(sensor_pattern="motion.lounge", action="hash"),
    ],
    default_action="allow",
)

reading = SensorReading(
    sensor_name="temp.living_room",
    timestamp=1717000000.0,
    value=22.5,
    unit="celsius",
    metadata={"floor": "1", "tenant_id": "acme-corp"},
)

out = policy.evaluate(reading)
# out.value is now 0.0 and out.unit is "masked"
# out.metadata is empty because mask returns a stripped SensorReading
The PrivacyPolicy dataclass has three fields:
- enabled: bool = True. A master switch. When False, evaluate() returns the reading unchanged.
- rules: List[PrivacyRule]. Ordered list of rules evaluated sequentially.
- default_action: str = "allow". Applied only if no rule matches. Can be "allow" or "drop". A default of "drop" implements an explicit-allowlist posture.
Rule Types in Detail
mask — Replace with Sentinel
The mask action returns a new SensorReading whose value is
replaced with 0.0, whose unit is set to the literal string
"masked", and whose metadata dict is emptied. The original
sensor_name and timestamp are preserved so that downstream windows and
aggregators still see a datum at the correct temporal coordinate, but the sensitive payload is
gone.
Use mask when you want to retain the shape of the data stream (e.g., to
keep a ten-second sampling cadence visible in a time-series plot) while removing the
content. It is the right choice for temperature, humidity, or light-level sensors that
are not themselves sensitive, but whose values might be correlated with personal presence.
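The transformation described above can be sketched in a few lines. This is a minimal illustration rather than the runtime's own code: Reading is a hypothetical stand-in for SensorReading that assumes only the five fields used in this article, and mask() is an illustrative helper name.

```python
from dataclasses import dataclass, field, replace
from typing import Dict


@dataclass(frozen=True)
class Reading:
    """Hypothetical stand-in for SensorReading (assumed fields only)."""
    sensor_name: str
    timestamp: float
    value: float
    unit: str
    metadata: Dict[str, str] = field(default_factory=dict)


def mask(reading: Reading) -> Reading:
    """Return a copy with the payload blanked; name and timestamp are kept."""
    return replace(reading, value=0.0, unit="masked", metadata={})


original = Reading("temp.living_room", 1717000000.0, 22.5, "celsius", {"floor": "1"})
masked = mask(original)

print(masked.sensor_name, masked.timestamp)  # temp.living_room 1717000000.0
print(masked.value, masked.unit, masked.metadata)  # 0.0 masked {}
```

Because the sketch builds a new object instead of mutating the input, the original reading stays intact for any local consumer that is still allowed to see it.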
drop — Eliminate Entirely
The drop action causes evaluate() to return None. The
reading is removed from the stream entirely; no downstream window, rule, or cloud uploader ever
sees it. This is the most aggressive action and is appropriate for:
- Raw biometric streams (fingerprint, iris, voice-print sensors).
- Audio or video feeds captured by edge cameras or microphones.
- Diagnostic debug channels that accidentally include stack traces with file paths or usernames.
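Since a dropped reading is signalled by None, any stage that consumes the evaluator's output has to skip those results. The sketch below shows one way to do that with a generator; filter_stream and toy_evaluate are hypothetical names, and the toy evaluator works on bare sensor names rather than real SensorReading objects.

```python
from typing import Callable, Iterable, Iterator, Optional, TypeVar

R = TypeVar("R")


def filter_stream(
    readings: Iterable[R],
    evaluate: Callable[[R], Optional[R]],
) -> Iterator[R]:
    """Yield only the results the evaluator did not drop (i.e. not None)."""
    for reading in readings:
        result = evaluate(reading)
        if result is not None:
            yield result


def toy_evaluate(name: str) -> Optional[str]:
    """Toy evaluator: drop anything in a hypothetical 'camera.' namespace."""
    return None if name.startswith("camera.") else name


kept = list(filter_stream(["temp.hall", "camera.door", "humidity.hall"], toy_evaluate))
print(kept)  # ['temp.hall', 'humidity.hall']
```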
hash — Deterministic Pseudonymisation
The hash action replaces the sensor_name with a truncated SHA-256
digest prefixed by sensor_. For example, motion.lounge becomes
sensor_a3f7b2d1e8c9a4b5. The value, unit, and
timestamp are preserved, but metadata is stripped.
Because the hash is deterministic, two readings from the same physical sensor will share the same pseudonym. This lets you run correlation analytics ("how often does sensor X trigger within five minutes of sensor Y?") without revealing the floor plan encoded in the original name. If you need keyed hashing (HMAC) instead of raw SHA-256, see the dedicated article on Hashing Strategies.
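As a rough sketch of the pseudonymisation step, the function below derives a name in the same shape as the example above. The 16-character truncation is inferred from that example and the UTF-8 encoding is an assumption; the runtime's exact parameters may differ, and pseudonymise is an illustrative name.

```python
import hashlib


def pseudonymise(sensor_name: str, digest_chars: int = 16) -> str:
    """Replace a sensor name with 'sensor_' plus a truncated SHA-256 hex digest.

    Both the truncation length and the UTF-8 encoding are assumptions.
    """
    digest = hashlib.sha256(sensor_name.encode("utf-8")).hexdigest()
    return "sensor_" + digest[:digest_chars]


first = pseudonymise("motion.lounge")
second = pseudonymise("motion.lounge")
other = pseudonymise("motion.kitchen")

print(first == second)  # True: the same input always gives the same pseudonym
print(first == other)   # False: different sensors get different pseudonyms
```

Note that an unkeyed digest of a guessable name can be reversed by hashing candidate names, which is the usual motivation for the keyed variant mentioned above.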
allow — Pass Through Unchanged
allow is not an explicit action in the PrivacyRule dataclass; rather,
it is the implicit behaviour when no rule matches and default_action="allow". The
reading is returned exactly as received. In high-trust environments where only a handful of
known-bad sensors must be blocked, this default minimises CPU overhead because the majority of
readings fall through without mutation.
Evaluation Order
Rules are evaluated in the order they appear in the rules list. The first matching
rule wins; subsequent rules are ignored for that reading. This is a critical design decision
because it gives you deterministic, auditable behaviour.
# WRONG: the broad wildcard matches first and the specific rule is never reached.
wrong_policy = PrivacyPolicy(rules=[
    PrivacyRule(sensor_pattern="*", action="drop"),           # drops everything
    PrivacyRule(sensor_pattern="temp.safe", action="allow"),  # unreachable
])

# RIGHT: specific rules first, catch-all last.
right_policy = PrivacyPolicy(rules=[
    PrivacyRule(sensor_pattern="temp.safe", action="allow"),
    PrivacyRule(sensor_pattern="*", action="drop"),
])
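To make the first-match-wins behaviour concrete, here is a small self-contained sketch of such an evaluator. Rule and first_match are hypothetical stand-ins, not the runtime's classes; the sketch returns only the chosen action name and assumes matching is done with fnmatch.fnmatch.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import List


@dataclass
class Rule:
    """Hypothetical stand-in for PrivacyRule."""
    sensor_pattern: str
    action: str


def first_match(rules: List[Rule], sensor_name: str, default_action: str = "allow") -> str:
    """Return the action of the first rule whose pattern matches the name."""
    for rule in rules:
        if fnmatch(sensor_name, rule.sensor_pattern):
            return rule.action  # later rules are never consulted
    return default_action


wrong = [Rule("*", "drop"), Rule("temp.safe", "allow")]
right = [Rule("temp.safe", "allow"), Rule("*", "drop")]

print(first_match(wrong, "temp.safe"))  # drop
print(first_match(right, "temp.safe"))  # allow
print(first_match(right, "hr.bed_3"))   # drop
```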
The wildcard matcher uses fnmatch.fnmatch, which supports:
- * matches any sequence of characters.
- ? matches exactly one character.
- [seq] matches any character in seq.
- [!seq] matches any character not in seq.
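The wildcard forms can be tried directly against the standard library. The sensor names below are invented for illustration. Two details are easy to miss: a dot in a pattern is a literal dot, and * is not stopped by dots, so it can span several name segments.

```python
from fnmatch import fnmatch

# (pattern, sensor name, expected result); names are illustrative only.
checks = [
    ("temp.*", "temp.living_room", True),
    ("temp.*", "temp.floor1.hall", True),   # '*' also spans dots
    ("temp.*", "tempXroom", False),         # '.' is a literal dot
    ("hr.bed_?", "hr.bed_3", True),
    ("hr.bed_?", "hr.bed_12", False),       # '?' is exactly one character
    ("zone[ab].temp", "zonea.temp", True),
    ("zone[!ab].temp", "zonea.temp", False),
    ("*.office.*.occupancy", "hq.office.room4.occupancy", True),
    ("*.office.*.occupancy", "warehouse.office.supplies.count", False),
]

results = [fnmatch(name, pattern) for pattern, name, _ in checks]
for (pattern, name, expected), got in zip(checks, results):
    print(f"{pattern!r:26} {name!r:36} -> {got}")
```

On platforms with case-insensitive file names, fnmatch.fnmatch normalises case before comparing; fnmatch.fnmatchcase is the always case-sensitive variant.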
Performance Impact
Privacy evaluation happens on the hot path: every SensorReading that enters the
pipeline is passed through the policy before it reaches windowing, rule evaluation, or cloud
upload. The runtime is optimised to keep this overhead negligible on ARM64 gateways such as the
Raspberry Pi 5.
In controlled benchmarks on a Raspberry Pi 5 running at 2.4 GHz with active cooling:
- A policy with 10 rules and default_action="allow" adds approximately 0.8 µs per reading when no rule matches (the common case).
- A 100-rule policy adds approximately 6.5 µs per reading. That is roughly one millionth of the typical five-second sensor poll interval.
- mask and hash actions that mutate the reading add an additional 1.2–2.0 µs because they allocate a new SensorReading dataclass instance. drop is the cheapest mutation because it simply returns None without allocation.
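If you want a rough feel for the no-match cost on your own hardware, a micro-benchmark along these lines is a reasonable starting point. It times a plain Python loop over fnmatch, not the runtime's evaluator, so its numbers are not comparable to the figures above; the patterns and the run count are arbitrary assumptions.

```python
import timeit
from fnmatch import fnmatch

# Ten patterns that will not match the probe name, mimicking the
# "10 rules, no match" case with a naive loop (not the runtime's code).
patterns = [f"group{i}.*" for i in range(10)]


def no_match_scan(name: str = "temp.hall"):
    """Scan every pattern and fall through, as an unmatched reading would."""
    for pattern in patterns:
        if fnmatch(name, pattern):
            return pattern
    return None


runs = 20_000
seconds = timeit.timeit(no_match_scan, number=runs)
per_reading_us = seconds / runs * 1e6
print(f"{per_reading_us:.2f} microseconds per reading (naive loop, this machine)")
```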
If you need field-level redaction rather than whole-sensor matching — for example, redacting
only the tenant_id metadata field while keeping the temperature value — you should
use the more advanced PrivacyPolicyEngine discussed in the Field Redaction
Patterns article. The engine is slightly heavier (≈ 3 µs per field checked) because it
performs per-key wildcard matching against the metadata dictionary, but it gives you
substantially finer control.
Configuring Privacy in config.toml
The Edge Runtime's configuration loader expects a top-level privacy section. Rules
declared here are parsed at startup and injected into the agent's pipeline automatically.
[privacy]
enabled = true
default_action = "allow"
[[privacy.rules]]
sensor_pattern = "hr.*"
action = "drop"
[[privacy.rules]]
sensor_pattern = "camera.*"
action = "drop"
[[privacy.rules]]
sensor_pattern = "motion.*"
action = "hash"
[[privacy.rules]]
sensor_pattern = "temp.*"
action = "mask"
The Config class validates that the section exists and that enabled is
a boolean. If cloud.enabled is true but no cloud.endpoint is provided,
validation emits a warning because the most common reason to enable cloud sync is to send
filtered data upstream — and sending unfiltered data upstream when privacy rules are configured
is usually a mistake.
Best Practices
- Start with default_action="drop" in green-field deployments. This forces you to explicitly whitelist every sensor that may leave the device. It is far easier to relax a deny-by-default policy than to discover a leaky sensor name in production.
- Use deterministic sensor naming conventions. If your sensor names encode hierarchy (building.floor.room.device.metric), you can write precise wildcards such as *.office.*.occupancy without accidentally matching warehouse.office.supplies.count.
- Audit rule changes. Every modification to the privacy policy should be logged. The PrivacyAudit chain records policy reload events with a SHA-256 record hash, making tampering detectable. See the Audit Chain and Signed Reports article for details.
- Test policies offline. Load a TOML configuration into a PrivacyPolicy instance in a unit test, pass synthetic readings through evaluate(), and assert on the outputs. Because the evaluator is pure (no I/O), tests run in milliseconds.
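A table-driven test is one compact way to combine the first and last practices. The sketch below checks a deny-by-default policy against synthetic sensor names. It uses a hypothetical decide() stand-in built on fnmatch so that it runs without the runtime installed; in a real test you would call evaluate() on a PrivacyPolicy instead.

```python
from fnmatch import fnmatch
from typing import List, Tuple

# Hypothetical stand-in: (sensor_pattern, action) pairs instead of PrivacyRule objects.
ALLOWLIST: List[Tuple[str, str]] = [
    ("temp.*", "mask"),
    ("motion.*", "hash"),
    ("power.meter", "allow"),
]


def decide(sensor_name: str, rules=ALLOWLIST, default_action: str = "drop") -> str:
    """First matching rule wins; unmatched sensors get the default action."""
    for pattern, action in rules:
        if fnmatch(sensor_name, pattern):
            return action
    return default_action


# Synthetic sensor names paired with the outcome the policy should produce.
CASES = [
    ("temp.living_room", "mask"),
    ("motion.lounge", "hash"),
    ("power.meter", "allow"),
    ("hr.bed_3", "drop"),           # never allowlisted, so denied by default
    ("camera.front_door", "drop"),
]


def test_deny_by_default_policy() -> None:
    for sensor_name, expected in CASES:
        assert decide(sensor_name) == expected, sensor_name


test_deny_by_default_policy()
```

Keeping the cases in a table makes it cheap to add a row whenever a new sensor family is deployed, so a missing allowlist entry shows up as a failing test rather than as silently dropped data.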
Summary
Privacy filtering in Pyvorin Edge is not a bolt-on feature; it is a pipeline stage that executes
before any data leaves the device. The PrivacyPolicy class provides a lightweight,
ordered rule engine with four actions — mask, drop, hash,
and allow — that map directly to GDPR data-minimisation, HIPAA minimum-necessary,
and NIS2 risk-management requirements. With a measured overhead of a few microseconds per reading and
declarative TOML configuration, you can enforce strong privacy guarantees without sacrificing
throughput or operational simplicity.