
ML Inference Optimization

Speed up Python ML preprocessing, postprocessing, and inference code.

Published May 30, 2026

Preprocessing Acceleration

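Model preprocessing (tokenisation, feature engineering, normalisation) is often written as pure-Python loops over individual values, which makes it a good candidate for compilation. For example, z-score normalisation with outlier clipping: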

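def preprocess(features, mean, std, clip=3.0):
    # Standardise each raw value to a z-score, then cap large positive outliers at `clip`.
    normalised = []
    for f in features:
        val = (f - mean) / std
        if val > clip:
            val = clip
        normalised.append(val)
    return normalised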
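Tokenisation is usually the same kind of per-item Python loop. The sketch below is illustrative only: `VOCAB`, `tokenise`, and `tokenise_batch` are hypothetical names, and whitespace splitting stands in for a real (e.g. subword) tokeniser. It maps words to integer IDs and pads every row to a fixed length so the batch can be handed to a model.

```python
# Hypothetical toy vocabulary; production tokenisers are far more involved.
VOCAB = {"<pad>": 0, "<unk>": 1, "the": 2, "model": 3, "runs": 4, "fast": 5}


def tokenise(text, vocab, max_len):
    # Lower-case, split on whitespace, and map unknown words to <unk>.
    ids = [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]
    # Truncate to max_len, then right-pad with <pad> so every row is the same length.
    ids = ids[:max_len]
    ids.extend([vocab["<pad>"]] * (max_len - len(ids)))
    return ids


def tokenise_batch(texts, vocab, max_len):
    return [tokenise(t, vocab, max_len) for t in texts]


batch = tokenise_batch(["The model runs fast", "the GPU"], VOCAB, max_len=5)
print(batch)
```

Fixed-length rows are the design choice here: most framework inputs expect a rectangular tensor, so padding happens in Python before the model call.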

Postprocessing

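Converting raw logits into class probabilities is another per-element loop. Logits can be negative, so dividing them by their sum does not produce valid probabilities; a numerically stable softmax does:

import math

def postprocess(logits):
    # Subtract the max logit before exponentiating so math.exp cannot overflow.
    m = max(logits)
    exps = [math.exp(logit - m) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]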
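Postprocessing often continues past probabilities to pick a label and reject low-confidence predictions. A minimal sketch, assuming hypothetical class names (`LABELS`) and a hypothetical confidence `threshold`; `predict_label` is an illustrative helper, not part of any library:

```python
import math

LABELS = ["negative", "neutral", "positive"]  # hypothetical class names


def softmax(logits):
    # Subtract the max logit so math.exp cannot overflow on large inputs.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, labels, threshold=0.5):
    probs = softmax(logits)
    # Index of the most probable class (argmax) in plain Python.
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None, probs[best]  # not confident enough to commit to a label
    return labels[best], probs[best]


print(predict_label([0.2, -1.0, 2.5], LABELS))
print(predict_label([0.0, 0.0, 0.0], LABELS))
```

Returning `None` rather than raising keeps the loop simple when most inputs are expected to pass the threshold.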

Batch Inference Orchestration

Pyvorin compiles the Python loop that splits inputs into batches and calls the model. The model inference itself (TensorFlow, PyTorch) still runs in the framework's own optimised C++ runtime.
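The guide does not describe Pyvorin's own interface, so the following is a plain-Python sketch of the kind of orchestration loop meant here. `iter_batches`, `run_inference`, and `fake_model` are hypothetical names; `fake_model` stands in for a real TensorFlow or PyTorch call and is assumed to return one output per input.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    # Yield consecutive slices of at most batch_size items; the last may be shorter.
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_inference(inputs: Sequence, model_fn: Callable[[List], List], batch_size: int = 32) -> List:
    outputs = []
    for batch in iter_batches(inputs, batch_size):
        # This Python glue is the compilable part; model_fn is the framework call.
        outputs.extend(model_fn(list(batch)))
    return outputs


# Hypothetical stand-in model: records each batch size and doubles every input.
seen_batch_sizes = []


def fake_model(batch):
    seen_batch_sizes.append(len(batch))
    return [x * 2 for x in batch]


results = run_inference(list(range(10)), fake_model, batch_size=4)
print(results, seen_batch_sizes)
```

Only the slicing and accumulation live in Python; the cost of each `model_fn` call is untouched by compiling this loop.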

Limitations

  • GPU tensor operations are not compiled by Pyvorin.
  • Custom CUDA kernels remain untouched.
  • The gains come from the Python glue code around model calls.