Skip to content

Properties and verification

A loaded ONNX model under Hydronnx is an ordinary Chelis function, so it takes ordinary Chelis @property blocks. chelis prove verifies them by sampling inputs from the property's forall quantifier and checking the condition on each, turning "the model probably behaves like this" into a checked invariant.

This document walks through the worked property from the example (probability_property.ch.snippet), explains the single-file constraint on chelis prove, and shows what a failing property looks like.

The worked classifier at examples/tabular_classifier/iris_classifier.onnx ends in Softmax, so its three output scores are a genuine probability distribution and must sum to 1.0 for any input. That is the property to verify. The snippet:

@property output_is_a_probability_distribution forall(features: tensor[1, 4, f32]):
{
probs = forward(features)
total = tensor_to_scalar(sum(sum(probs, 1), 0))
_ = drop(features)
((total >= 0.999) && (total <= 1.001))
}
with samples = 64
with seed = 0

The shape is conventional Chelis: the property names a universally quantified input (features), names the function under verification (forward), and reduces to a Boolean. The with samples = 64 with seed = 0 clauses pin the sampler's budget and seed, so a clean run is deterministically reproducible.

chelis prove operates on the single file you hand it. It does not resolve cross-module imports. A @property that calls forward must live in the same .ch file as forward. Every property probe in the chelis spike repo is self-contained for the same reason.

The example does this by appending the property to the emitted module in a fresh build context:

Terminal window
cd examples/tabular_classifier
cat probability_property.ch.snippet >> src/irisclassifier.ch
chelis prove src/irisclassifier.ch --samples 64 --seed 0

Expected output:

property: output_is_a_probability_distribution -- 64/64 passed
1 passed, 0 failed, 0 unsupported, 0 errors

64 of 64 samples satisfied the invariant: the property is verified for this sampler budget.

To restore the example to its pristine starting state, re-run chelis-hydronnx-emit (which overwrites the appended snippet) or git restore src/irisclassifier.ch. The example's acceptance test does the append in a tempdir, so the committed file stays clean.

The worked classifier's three outputs come from a Softmax, so the sum-to-1 invariant should hold by construction. chelis prove checks that the emitted host-runtime evaluation actually produces a sum within the tolerance band the property declares, for 64 random inputs. If an emitter bug ever lost numerical fidelity (or the final operator was misemitted), the property would flag it as a counterexample. The property turns the emitted function into something whose behavior you can formally verify, not just informally trust.

A failing property: what chelis prove reports

Section titled “A failing property: what chelis prove reports”

To see what chelis prove does when a property does not hold, append a property that asserts the same total is >= 1.999 (a false statement for a softmax output, which sums to 1):

@property output_sums_to_two forall(features: tensor[1, 4, f32]):
{
probs = forward(features)
total = tensor_to_scalar(sum(sum(probs, 1), 0))
_ = drop(features)
(total >= 1.999)
}
with samples = 64
with seed = 0

chelis prove on the file containing both properties prints (verbatim, all on stdout):

property: output_is_a_probability_distribution -- 64/64 passed
property failure: output_sums_to_two
--> src/irisclassifier.ch
1 passed, 1 failed, 0 unsupported, 0 errors

and exits 1. The --> <path> line points at the file containing the failing property: chelis prove's diagnostic for "this is where the failure was raised". chelis does not currently print the counterexample tuple itself; the failed-property line plus the path is the evidence. The acceptance test exercises exactly this output_sums_to_two failure to confirm the verification machinery is real, not hollow.

In practice you write the true property and chelis prove either verifies it (no counterexample within the sample budget) or reports a property failure: <name> for an invariant the model does not satisfy. A failure during development tells you the model's behavior deviates from your stated invariant: a real defect, not an artifact. A clean run gives you a statistical guarantee bounded by the sampler budget; widen --samples for a stronger result.

The property machinery is chelis's, not Hydronnx's. Hydronnx adds nothing to the @property surface beyond making forward available as a typed function the property can quantify over. In particular:

  • Input quantifier types. forall(<name>: <type>) admits any chelis type. For an emitted forward, the natural choice is the function's declared input type, the example's tensor[1, 4, f32]. chelis prove samples random tensors of that type.
  • Symbolic tensor dims are not sampleable. chelis's sampler rejects a symbolic tensor dim. If your emitted forward has a symbolic batch dim, you cannot quantify directly over tensor[batch, ...]. Bind batch to a concrete value first, or use a fully-static fixture. The worked example uses a concrete [1, 4], sidestepping the issue.
  • Property body. Any chelis-expressible Boolean. The example's body uses sum, tensor_to_scalar, and comparison builtins, all standard chelis surface. The _ = drop(features) line discards the unused input (chelis tensors are affine).
  • with samples and with seed. Standard chelis property clauses. samples is the budget; seed makes a run reproducible.

A property on forward quantifies over forward's input type. A property on a composed function (classify(raw) in the example) would quantify over tensor[1, 4, f32] (the raw input) and assert something about the int64 output. The composability comes for free because the composed function is itself an ordinary Chelis function.

The single-file constraint applies to whatever the property calls into. If classify is in a different module from forward, both definitions need to land in the same file passed to chelis prove (for example by concatenation in a build script). The example sidesteps this by attaching the property directly to the emitted module.

  • Hydronnx does not emit properties. Properties are user-written. The emit CLI never produces a @property block; it produces the typed forward that the property quantifies over.
  • No cross-module chelis prove. This is a chelis-side constraint; the workaround is the single-file append shown above.
  • No symbolic-tensor-dim quantification (see above). Concrete shapes only.
  • Property verification over AD needs a matching AD gate. Hydronnx gates grad(forward) for the weight-free fixture and the weighted tabular Gemm fixture. Properties on broader weighted models should be paired with a model-specific AD or finite-difference gate first. See Limitations and scope. Properties on the raw forward are fine.