Correctness Validation
Pyvorin treats correctness as non-negotiable. Every compiled result is checked against CPython ground truth before any performance claim is accepted.
The Correctness Pipeline
- CPython reference run: The script is executed with standard CPython to produce the canonical output and return value.
- Pyvorin compilation: The same script is compiled and executed via the native backend.
- Hash comparison: Outputs are serialised and hashed. If the hashes match, correctness_match=True.
- Result storage: The hash, output, and match status are written to the benchmark event for audit.
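The pipeline steps above can be sketched in plain Python. This is a minimal illustration, not Pyvorin's actual implementation: the helper names (`run_and_capture`, `output_hash`, `check_correctness`), the JSON serialisation, the SHA-256 hash, and the event field names other than `correctness_match` are all assumptions. The two runs are modelled as ordinary callables rather than a real CPython process and a native backend.

```python
import contextlib
import hashlib
import io
import json


def run_and_capture(func, *args):
    """Call func and return (return_value, captured_stdout)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        value = func(*args)
    return value, buffer.getvalue()


def output_hash(value, stdout):
    """Serialise the return value and stdout, then hash the result.

    Assumes the return value is JSON-serialisable (hypothetical choice).
    """
    payload = json.dumps({"value": value, "stdout": stdout}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def check_correctness(reference_func, candidate_func, *args):
    """Compare a reference run against a candidate run by hash.

    Returns an illustrative event record; the field names are assumptions.
    """
    ref_value, ref_out = run_and_capture(reference_func, *args)
    cand_value, cand_out = run_and_capture(candidate_func, *args)
    ref_hash = output_hash(ref_value, ref_out)
    cand_hash = output_hash(cand_value, cand_out)
    return {
        "reference_hash": ref_hash,
        "candidate_hash": cand_hash,
        "output": cand_out,
        "correctness_match": ref_hash == cand_hash,
    }
```

Hashing the return value and the captured stdout together means a candidate that returns the right number but prints something different still fails the comparison.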
Validation Methods
| Method | When used |
|---|---|
| Exact equality | Scalar return values (int, float, str) |
| Structural hash | Lists, dicts, tuples (recursive normalisation) |
| Text diff | Printed output captured from stdout |
| Mathematical validator | Known analytical results (e.g., sum of series) |
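As an illustration of the structural hash row, the sketch below shows one way recursive normalisation could work. It is an assumption about the approach, not Pyvorin's code: `normalise` and `structural_hash` are hypothetical names, and the choices to ignore dict insertion order and to keep lists distinct from tuples are made only for this example.

```python
import hashlib


def normalise(value):
    """Recursively convert a value into a canonical, type-tagged form."""
    if isinstance(value, dict):
        # Sort entries by the repr of their normalised form so that dict
        # insertion order does not affect the hash (assumed policy).
        items = [(normalise(k), normalise(v)) for k, v in value.items()]
        return ("dict", tuple(sorted(items, key=repr)))
    if isinstance(value, (list, tuple)):
        # Element order matters; list and tuple are tagged differently.
        return (type(value).__name__, tuple(normalise(v) for v in value))
    # Scalars keep their type name, so 1, 1.0 and True stay distinct.
    return (type(value).__name__, repr(value))


def structural_hash(value):
    """Hash the normalised form of a nested container."""
    canonical = repr(normalise(value))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Tagging each scalar with its type is a deliberate choice here: plain `==` treats `1` and `1.0` as equal, which could hide a change of result type between the two runs.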
What Happens on Mismatch
If the Pyvorin result differs from CPython:
- The benchmark is marked FAIL regardless of runtime.
- correctness_match=False is recorded.
- The mismatch is reported to the failure telemetry pipeline.
- No speedup claim is accepted for that workload until the bug is fixed.
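The mismatch rules can be expressed as a small gating function. The sketch below is hypothetical: `BenchmarkVerdict`, `judge`, and the `PASS` status are names invented for illustration, and telemetry reporting is left out. It shows only that a hash mismatch yields FAIL and withholds the speedup figure, however fast the compiled run was.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BenchmarkVerdict:
    status: str
    correctness_match: bool
    speedup: Optional[float]  # None means no speedup claim is accepted


def judge(reference_hash, candidate_hash, cpython_seconds, pyvorin_seconds):
    """Decide the benchmark verdict; correctness is checked before timing."""
    if reference_hash != candidate_hash:
        # A mismatch fails the benchmark regardless of runtime.
        return BenchmarkVerdict("FAIL", False, None)
    # "PASS" is an assumed label; the speedup is the ratio of the two runtimes.
    return BenchmarkVerdict("PASS", True, cpython_seconds / pyvorin_seconds)
```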
Known Correctness Gaps
A small number of constructs have accepted correctness gaps documented in the test suite. These are explicitly tracked as xfail and do not affect the primary compile() path:
- Nested generator expressions in certain edge cases
- Enumerate over float lists in tight loops
- Bare generator expressions that silently compile to zero
These gaps are fixed in priority order. None of them produces a silently wrong result on the supported compile path.
Continuous Correctness Regression Tests
The full test suite (1,400+ tests) and benchmark suite run correctness parity checks on every release. Zero failures is the expected state. Any new failure blocks the release pipeline until root-caused.
Best Practices for Users
- Always run pyvorin-thin benchmark with correctness validation enabled (default).
- Compare outputs manually when migrating critical workloads.
- Report mismatches via pyvorin-thin report-failure so they can be investigated.
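For the manual comparison step, one simple option is a line-by-line diff of the two captured outputs using the standard library. This is a sketch only: `diff_outputs` is a hypothetical helper, and how the two output strings are obtained (for example, by saving stdout from each run to a file) is left to the user.

```python
import difflib


def diff_outputs(cpython_output, pyvorin_output):
    """Return a unified diff of two captured outputs.

    The list is empty when the outputs match line for line. Note that
    splitlines() ignores a difference in the final trailing newline.
    """
    return list(
        difflib.unified_diff(
            cpython_output.splitlines(),
            pyvorin_output.splitlines(),
            fromfile="cpython",
            tofile="pyvorin",
            lineterm="",
        )
    )
```

A non-empty diff can be attached to the mismatch report, since it pinpoints the first diverging line.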