Pyvorin Docs

Correctness Validation

Pyvorin treats correctness as non-negotiable. Every compiled result is checked against CPython ground truth before any performance claim is accepted.

The Correctness Pipeline

  1. CPython reference run: The script is executed with standard CPython to produce the canonical output and return value.
  2. Pyvorin compilation: The same script is compiled and executed via the native backend.
  3. Hash comparison: Both outputs are serialised and hashed. If the hashes match, correctness_match=True is recorded.
  4. Result storage: The hash, output, and match status are written to the benchmark event for audit.
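
The four steps above can be sketched roughly as follows. This is a minimal illustration, not Pyvorin's actual implementation: the helper names (output_hash, check_correctness), the JSON-based serialisation, the SHA-256 digest, and the shape of the event record are all assumptions, and a plain Python callable stands in for the native backend.

```python
import hashlib
import json


def output_hash(value):
    """Serialise a result deterministically and return its SHA-256 hex digest."""
    # Assumption: JSON with sorted keys stands in for the real serialiser.
    payload = json.dumps(value, sort_keys=True, default=repr)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def check_correctness(reference_fn, compiled_fn, *args):
    """Run both implementations and build a (hypothetical) audit record."""
    reference = reference_fn(*args)   # step 1: CPython reference run
    candidate = compiled_fn(*args)    # step 2: stand-in for the native backend
    ref_hash = output_hash(reference)
    cand_hash = output_hash(candidate)
    # Steps 3 and 4: compare the hashes and keep everything needed for audit.
    return {
        "reference_hash": ref_hash,
        "candidate_hash": cand_hash,
        "output": candidate,
        "correctness_match": ref_hash == cand_hash,
    }


def sum_squares(n):
    """Reference: sum of i*i for i in 0..n-1, computed with a loop."""
    return sum(i * i for i in range(n))


def sum_squares_closed_form(n):
    """Candidate: the same quantity computed from the closed-form formula."""
    return (n - 1) * n * (2 * n - 1) // 6


event = check_correctness(sum_squares, sum_squares_closed_form, 100)
print(event["correctness_match"])
```

Comparing digests rather than raw outputs keeps the stored record small while still letting an auditor re-run either side and confirm the hash.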

Validation Methods

Method                   When used
Exact equality           Scalar return values (int, float, str)
Structural hash          Lists, dicts, tuples (recursive normalisation)
Text diff                Printed output captured from stdout
Mathematical validator   Known analytical results (e.g., sum of series)
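
To make "recursive normalisation" concrete, here is one way a structural hash for containers could work. It is a sketch under stated assumptions: the function names normalise and structural_hash are hypothetical, and the choices to ignore dict insertion order while keeping lists, tuples, ints, and floats distinct are illustrative rather than confirmed Pyvorin behaviour.

```python
import hashlib


def normalise(value):
    """Recursively convert a value to a canonical, order-stable tagged form."""
    if isinstance(value, dict):
        # Sort normalised items so dict insertion order does not affect the hash.
        items = [(normalise(k), normalise(v)) for k, v in value.items()]
        return ("dict", tuple(sorted(items, key=repr)))
    if isinstance(value, (list, tuple)):
        # Keep element order; tag with the container type (an assumption).
        return (type(value).__name__, tuple(normalise(v) for v in value))
    # Scalars: tag with the type name so 1, 1.0, and True stay distinct.
    return (type(value).__name__, repr(value))


def structural_hash(value):
    """Hash the canonical form of a nested structure with SHA-256."""
    canonical = repr(normalise(value))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = structural_hash({"a": 1, "b": [1, 2]})
b = structural_hash({"b": [1, 2], "a": 1})
print(a == b)
```

Tagging every node with its type means a list and a tuple holding the same elements hash differently; a validator that should treat them as equivalent would drop that tag.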

What Happens on Mismatch

If the Pyvorin result differs from CPython:

  • The benchmark is marked FAIL regardless of runtime.
  • correctness_match=False is recorded.
  • The mismatch is reported to the failure telemetry pipeline.
  • No speedup claim is accepted for that workload until the bug is fixed.
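
The gating rule above can be expressed in a few lines. This is an illustrative sketch only: the BenchmarkResult fields, the evaluate function, and the "PASS" status label are assumptions introduced for the example; the source only specifies the FAIL status and the correctness_match flag.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    # Hypothetical record shape; only correctness_match is named in the docs.
    workload: str
    cpython_seconds: float
    pyvorin_seconds: float
    correctness_match: bool


def evaluate(result):
    """Return (status, accepted_speedup); no speedup is accepted on mismatch."""
    if not result.correctness_match:
        # A mismatch fails the benchmark regardless of how fast it ran.
        return "FAIL", None
    return "PASS", result.cpython_seconds / result.pyvorin_seconds


fast_but_wrong = BenchmarkResult("matmul", 2.0, 0.5, correctness_match=False)
fast_and_right = BenchmarkResult("matmul", 2.0, 0.5, correctness_match=True)
print(evaluate(fast_but_wrong))
print(evaluate(fast_and_right))
```

Checking correctness before computing the ratio ensures a speedup figure never exists for a mismatched run, so it cannot leak into a report by accident.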

Known Correctness Gaps

A small number of constructs have accepted correctness gaps documented in the test suite. These are explicitly tracked as xfail and do not affect the primary compile() path:

  • Nested generator expressions in certain edge cases
  • enumerate() over float lists in tight loops
  • Bare generator expressions that silently compile to zero

These gaps are fixed in priority order. On the supported compile path, the affected constructs never produce silently wrong results.

Continuous Correctness Regression Tests

The full test suite (1,400+ tests) and the benchmark suite run correctness parity checks on every release. Zero failures is the expected state. Any new failure blocks the release pipeline until its root cause is identified.

Best Practices for Users

  • Always run pyvorin-thin benchmark with correctness validation enabled (default).
  • Compare outputs manually when migrating critical workloads.
  • Report mismatches via pyvorin-thin report-failure so they can be investigated.