Error handling

Hydronnx surfaces errors at three layers, in order of the workflow:

  1. chelis-hydronnx-inspect errors, surfaced when reading the .onnx. The CLI exits non-zero and produces no output.
  2. chelis-hydronnx-emit errors. Emit fails for two distinct reasons: the same parse failures inspect can hit, plus emit-time failures (unsupported operator, weight conversion, a shape or signature the type system cannot express, and so on). The .ch is not produced; the CLI exits non-zero.
  3. Type-check-time and runtime errors, produced by chelis check, chelis test, and chelis prove. These are standard chelis errors; Hydronnx adds nothing to that layer.

This document covers layers 1 and 2. Every error message shown is taken verbatim from src/error.rs (HydronnxError) and src/translator/error.rs (TranslateError). The Display text is what the CLIs print after error: on stderr.
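As a rough illustration of how a wrapper script might pick up that Display text, the sketch below runs a command and returns the first stderr line that starts with error:, minus the prefix. The helper names (display_text, run_and_report) are hypothetical and not part of Hydronnx; the argv is supplied by the caller, because the exact CLI invocation is covered in the CLI reference rather than here.

```python
import subprocess
from typing import List, Optional, Tuple


def display_text(stderr: str) -> Optional[str]:
    """Return the text after the first 'error: ' prefix on stderr, or None.

    Only the first matching line is returned; multi-line messages are
    not reassembled by this sketch.
    """
    prefix = "error: "
    for line in stderr.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


def run_and_report(argv: List[str]) -> Tuple[int, Optional[str]]:
    """Run a CLI (argv chosen by the caller) and return (exit code, Display text)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, display_text(proc.stderr)
```

A non-zero exit code paired with a non-None Display text is the failure case this document describes; a zero exit code with None means nothing was reported as an error.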

HydronnxError is a single enum, but its variants are constructed at different points in the pipeline. The split is by construction site, not by Rust type. The last two columns show which CLI can surface each variant:

| HydronnxError variant | Constructed in | inspect | emit |
| --- | --- | --- | --- |
| FileNotFound | src/parser.rs | yes | yes |
| FileCorrupted | src/parser.rs | yes | yes |
| NotValidOnnx | src/parser.rs | yes | yes |
| UnsupportedOpset | src/parser.rs | yes | yes |
| UnsupportedDtype (parse-time) | src/parser.rs | yes | yes |
| Io | src/parser.rs | yes | yes |
| UnsupportedOperator | src/emit/mod.rs (from OpEmitError::Unsupported) | no | yes |
| UnsupportedDtype (emit-time) | src/emit/mod.rs (from SignatureError::UnsupportedDtype or WeightError::UnsupportedDtype) | no | yes |
| MalformedOperator | src/emit/mod.rs (from OpEmitError::Malformed) | no | yes |
| WeightConversionFailure | src/emit/mod.rs (from WeightError) | no | yes |
| InternalInconsistency | src/emit/mod.rs (emitter and signature invariants, plus the Dim::Dynamic path via SignatureError::DynamicDim) | no | yes |
| Translate(TranslateError) | src/emit/ops/* via src/emit/mod.rs | no | yes |

Two things to read from the table:

  • inspect only ever raises the six parse-time variants. It does not run the operator translator or the emitter; an unsupported operator is not an error in inspect (it merely appears in the node list).
  • UnsupportedOperator is an emit-time error. It is constructed in src/emit/mod.rs when the operator-emit fan-out has no arm for the op-type. The parser does not gate against unknown operators; it carries the op-type string through to the IR, and the emitter is where unknown-op rejection happens.
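For triage scripts, the same split can be written down as a small lookup. This is only a sketch: the variant names are copied from the table, while the set names and the surfaced_by helper are invented for illustration.

```python
# Variant names transcribed from the table. "UnsupportedDtype" appears in
# both sets because it has a parse-time and an emit-time construction site.
PARSE_TIME = frozenset({
    "FileNotFound", "FileCorrupted", "NotValidOnnx",
    "UnsupportedOpset", "UnsupportedDtype", "Io",
})
EMIT_TIME = frozenset({
    "UnsupportedOperator", "UnsupportedDtype", "MalformedOperator",
    "WeightConversionFailure", "InternalInconsistency", "Translate",
})


def surfaced_by(variant):
    """Return the set of CLIs ('inspect', 'emit') that can surface a variant."""
    clis = set()
    if variant in PARSE_TIME:
        # Parse-time variants fire before the emitter runs, so both CLIs hit them.
        clis.update({"inspect", "emit"})
    if variant in EMIT_TIME:
        clis.add("emit")
    if not clis:
        raise KeyError(f"unknown HydronnxError variant: {variant}")
    return clis
```

With this encoding, surfaced_by("UnsupportedOperator") contains only "emit", which matches the second point: inspect never rejects an unknown operator.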

These six variants are raised by parse_model (src/parser.rs). They fire before the operator translator runs, so both CLIs hit them.

error: file not found: <path>

The <path> is exactly what you passed on the CLI. Cause: the path does not exist or is unreadable. Resolution: check the path.

error: file corrupted: <path>: <reason>

The protobuf at <path> did not decode. Cause: truncated download, zero-byte file, a tool that wrote a JSON-ONNX file with an .onnx extension, and so on. The <reason> is the prost decoder's message.

Example (against tests/fixtures/truncated.onnx):

error: file corrupted: tests/fixtures/truncated.onnx: failed to decode Protobuf message: ModelProto.ir_version: invalid wire type: LengthDelimited (expected Varint)

Resolution: re-export the model or re-download.

error: not a valid ONNX file: <path>: <reason>

The protobuf decoded but the result is not a valid ONNX model (missing required fields, no graph, and so on). Resolution: re-export with a standard ONNX exporter (PyTorch torch.onnx.export, scikit-learn skl2onnx, and so on).

error: unsupported opset version <N> (supported: 11-23): <path>

Hydronnx supports ONNX opsets 11 through 23 inclusive. Older opsets are rejected because their operator semantics are too divergent; newer opsets are rejected because they have not been tested. Cause: your exporter chose an opset outside that band. Resolution: re-export naming a supported opset; most exporters accept an opset_version=17 (or similar) argument.
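To make the inclusive band concrete, the sketch below parses a message in this format and reports whether the rejected opset sits below or above the supported range. The regular expression is derived from the Display text shown here; the function name opset_direction and its return strings are assumptions for illustration.

```python
import re
from typing import Optional

# Mirrors: unsupported opset version <N> (supported: <low>-<high>): <path>
OPSET_RE = re.compile(
    r"unsupported opset version (\d+) \(supported: (\d+)-(\d+)\): (.+)"
)


def opset_direction(message: str) -> Optional[str]:
    """Return 'older', 'newer', or 'in-band' for an opset error message.

    Returns None when the message is not an unsupported-opset error.
    """
    m = OPSET_RE.fullmatch(message.strip())
    if m is None:
        return None
    version, low, high = (int(g) for g in m.group(1, 2, 3))
    if version < low:
        return "older"
    if version > high:
        return "newer"
    return "in-band"  # the bounds are inclusive, so 11 and 23 land here
```

An "older" result suggests re-exporting with a higher opset; a "newer" result suggests pinning the exporter to a version inside the band.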

Example (against tests/fixtures/wrong_opset.onnx):

error: unsupported opset version 7 (supported: 11-23): tests/fixtures/wrong_opset.onnx

error: unsupported dtype '<dtype_name>' at node index <N>: <path>

Cause: the model contains an initializer or a graph input/output with an ONNX dtype outside the supported set. The set is f32, f64, i32, i64, bool; uint8, int8, bf16, fp16, complex, and string dtypes all reject at parse time. Resolution: cast to a supported dtype at export.

The same UnsupportedDtype variant is also raised at emit-time by operator-specific dtype checks (see below). The parse-time form reports the raw initializer or value-info dtype; the emit-time form reports an operator's input dtype rejection.
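A pre-flight check for the parse-time case can be sketched against ONNX's TensorProto.DataType codes. The codes are the standard ONNX enum values; pairing them with the short names f32, f64, i32, i64, and bool is an assumption of this sketch, as is the helper name rejected_elem_types. How the codes are collected from a model is left to the caller.

```python
# ONNX TensorProto.DataType codes for the supported set. The pairing with
# Hydronnx's short dtype names is assumed, not taken from the source.
SUPPORTED_ELEM_TYPES = {
    1: "f32",   # FLOAT
    11: "f64",  # DOUBLE
    6: "i32",   # INT32
    7: "i64",   # INT64
    9: "bool",  # BOOL
}


def rejected_elem_types(elem_types):
    """Return the sorted, de-duplicated dtype codes outside the supported set."""
    return sorted({t for t in elem_types if t not in SUPPORTED_ELEM_TYPES})
```

For example, codes 2 (UINT8) and 10 (FLOAT16) would both be reported, which corresponds to the uint8 and fp16 rejections listed above.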

error: io error: <path>: <source>

Cause: an OS-level read failure (permissions, network filesystem hiccup). Resolution: check the path's permissions; retry.
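Because each parse-time variant has a distinct Display prefix, a script can map a message back to its variant with a prefix table. The prefixes below are copied from the formats in this section; the table and function names are hypothetical. Note that the unsupported dtype prefix is shared with the emit-time form, so a match on it does not by itself prove the error came from the parser.

```python
from typing import Optional

# (Display prefix, HydronnxError variant) pairs for the six parse-time variants.
PARSE_TIME_PREFIXES = [
    ("file not found: ", "FileNotFound"),
    ("file corrupted: ", "FileCorrupted"),
    ("not a valid ONNX file: ", "NotValidOnnx"),
    ("unsupported opset version ", "UnsupportedOpset"),
    ("unsupported dtype ", "UnsupportedDtype"),  # also raised at emit-time
    ("io error: ", "Io"),
]


def parse_time_variant(message: str) -> Optional[str]:
    """Return the variant whose Display prefix matches, or None.

    `message` is the text printed after 'error: ' on stderr.
    """
    for prefix, variant in PARSE_TIME_PREFIXES:
        if message.startswith(prefix):
            return variant
    return None
```

A None result means the message is not one of the six parse-time formats, so it came from the emit side or from chelis itself.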

Errors surfaced by chelis-hydronnx-emit only

These variants fire after the parse succeeds, when the emitter walks the IR. chelis-hydronnx-inspect never raises them.

error: unsupported operator '<op_type>' at node index <N>: <path> (see docs/limitations.md#deferred-operators)

Cause: the model uses an ONNX operator not in the emit set. Constructed in src/emit/mod.rs from OpEmitError::Unsupported, which fires when the operator-emit fan-out (src/emit/ops/mod.rs) has no arm for the op-type. The deferred operator spellings are catalogued in Supported ONNX surface. The <op_type> names the operator (for example 'Round') and <N> is its index in the graph's node list.

Example (against tests/fixtures/per_op/elementwise/Round.onnx):

error: unsupported operator 'Round' at node index 0: tests/fixtures/per_op/elementwise/Round.onnx (see docs/limitations.md#deferred-operators)

Resolution paths, in order of cost:

  1. If you wrote the model, re-export without the unsupported operator, or move that post-process out of ONNX into hand-written Chelis after the emit.
  2. If the deferral is one of the chelis-side blockers (Round or scatter primitives, or the MultiHeadAttention contrib-domain alias), the resolution is on chelis; see Supported ONNX surface.
  3. For now, those models are out of scope. ONNX Runtime is the right tool for them; see Migration from ONNX Runtime.

The same Display text as the parse-time form, but constructed at emit when the signature derivation or weight emission rejects a dtype:

error: unsupported dtype '<dtype_name>' at node index <N>: <path>

In the emit-time path, <N> is 0 (the emitter does not thread a node index for signature/weight rejections; the surrounding context comes from the per-operator TranslateError family below).

error: malformed operator at node index <N>: operator '<op_type>' (node '<node_name>') is malformed: <reason>: <path>

Cause: the op-type has an emitter, but this specific ONNX node is outside Hydronnx's supported surface: wrong input/output arity, an unbound value name, an unsupported attribute combination such as dilated or grouped Conv / ConvTranspose, non-equal Conv-family strides, or a shape the Surf lowering cannot express. When the failure is a documented deferral, <reason> points to the deferred operator section.

Resolution: read the <reason> first. If it points to a deferred operator, re-export the model without that attribute pattern. If it names malformed ONNX structure, fix the exporter or model file.
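Reading the <reason> programmatically takes a little care, because both the reason and the trailing path may contain colons. One approach, sketched below, is to strip the path you passed on the CLI from the end of the message. The regular expression follows the Display format above; the function name and the returned keys are assumptions for illustration.

```python
import re
from typing import Optional

# Mirrors: malformed operator at node index <N>: operator '<op_type>'
#          (node '<node_name>') is malformed: <reason>: <path>
MALFORMED_RE = re.compile(
    r"malformed operator at node index (\d+): "
    r"operator '([^']*)' \(node '([^']*)'\) is malformed: (.*)",
    re.DOTALL,
)


def parse_malformed(message: str, path: str) -> Optional[dict]:
    """Split a malformed-operator message into its fields.

    `path` is the model path passed on the CLI; it is removed from the
    tail of the message so that colons inside the reason stay intact.
    """
    m = MALFORMED_RE.fullmatch(message.strip())
    if m is None:
        return None
    rest = m.group(4)
    suffix = ": " + path
    reason = rest[: -len(suffix)] if rest.endswith(suffix) else rest
    return {
        "node_index": int(m.group(1)),
        "op_type": m.group(2),
        "node_name": m.group(3),
        "reason": reason,
    }
```

If the path does not match the tail of the message, the sketch returns the reason and path together rather than guessing where one ends.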

error: weight conversion failure for tensor '<tensor_name>': <reason>: <path>

Cause: an initializer's raw_data could not be turned into the typed Vec<f32> / Vec<i64> / and so on the emitter expects (size mismatch, non-finite values where the dtype rejects them, a weight tensor too large for the inline-literal emit). Constructed in src/emit/mod.rs from the WeightError family. Almost always a corrupted or hand-edited .onnx. Resolution: re-export.

error: internal inconsistency: <description>: <path>

Cause: an emitter or signature invariant failed after parser and per-operator validation. Multiple construction sites in src/emit/mod.rs carry a <description> that names the specific failure. The cases a user is most likely to hit:

  • Anonymous dynamic dim (an ONNX dimension with neither a dim_value nor a named dim_param). Dim::Dynamic is parsed silently by chelis-hydronnx-inspect (it renders as ? in the shape display), but the emitter cannot express it in a typed tensor[...]. The exact message, against a synthetic fixture with one anonymous dim:

    error: internal inconsistency: cannot derive signature: value 'x' has an anonymous dynamic dimension, which Chelis's type system cannot express: <path>

    Resolution: re-export naming the dim explicitly (dim_param = "batch") or pinning it to a static value.

  • Module-name derivation failure, where the model's name, filename, and graph name all fail the chelis module-pascal-components lint and --module-name was not passed. Resolution: pass --module-name <Name> (see the CLI reference).

  • Graph-shape inconsistencies the parser does not gate: a graph output produced by no node, input, or initializer, or a --module-name that does not itself sanitize. Resolution: re-export the model, or in the --module-name case pass a different name.

Operator-translator errors (TranslateError)

These fire from the operator emitters (src/emit/ops/*) and wrap into HydronnxError::Translate. They share the multi-line format hydronnx: <category> plus key/value lines plus a spec pointer.

This is the per-operator translator's reason text, wrapped by HydronnxError::UnsupportedOperator for the fan-out boundary:

hydronnx: unsupported operator
operator: <node_name> (op_type: <op_type>)
reason: no translator for op_type `<op_type>`
spec: chelis/spec/design/hydronnx.md

hydronnx: unsupported attribute
operator (op_type: <op>)
attribute: <attr> = <value>
reason: attribute value not in the supported set
spec: chelis/spec/design/hydronnx.md

Most commonly: Gelu approximate="none" (only approximate="tanh" emits, since chelis has no erf primitive), or a Pad mode the emitter does not support.

hydronnx: unsupported dtype
operator (op_type: <op>)
dtype: <dtype>
reason: dtype not in supported set (f32, f64, i32, i64)
spec: chelis/spec/design/hydronnx.md

hydronnx: unsupported shape
operator (op_type: <op>)
reason: <reason>
spec: chelis/spec/design/hydronnx.md

Common reasons: MatMul rank greater than 2, Attention multi-head or batch greater than 1, rank-1 or rank-3 pooling (rank-4 only).

hydronnx: malformed ONNX node
operator: <node_name> (op_type: <op>)
reason: <reason>
spec: chelis/spec/design/hydronnx.md

Catch-all for graph-shape violations the emitter detects: missing required attributes, wrong number of inputs, an attribute pointing at a tensor that does not exist, a Slice naming more than one axis, and so on.

hydronnx: integer matmul rejected
operator (op_type: <op>)
dtype: <dtype>
reason: chelis structurally rejects integer operands to BlasMatmul
spec: chelis/spec/design/hydronnx.md

Cause: the model has an i32 / i64 Gemm or MatMul. chelis's BlasMatmul only accepts float operands. Resolution: cast the matmul operands to a float dtype in your model.

hydronnx: missing input
operator (op_type: <op>)
input index: <index>
expected: <expected>
spec: chelis/spec/design/hydronnx.md

hydronnx: dimension mismatch
operator (op_type: <op>)
expected: <expected>
actual: <actual>
spec: chelis/spec/design/hydronnx.md

hydronnx: attribute out of range
operator (op_type: <op>)
attribute: <attr> = <value>
allowed: <allowed>
spec: chelis/spec/design/hydronnx.md
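Since every TranslateError shares the hydronnx: <category> header followed by key/value lines, one small parser can handle the whole family. The sketch below is an illustration only: it assumes each field sits on its own line in the layout shown above, and the function name and dictionary keys are invented.

```python
import re

# Matches both operator line shapes shown in this section:
#   operator: <node_name> (op_type: <op>)
#   operator (op_type: <op>)
OPERATOR_RE = re.compile(r"operator(?:: (.+?))? \(op_type: (.+)\)")


def parse_translate_error(block: str) -> dict:
    """Parse one multi-line TranslateError block into a flat dict."""
    lines = [ln.strip() for ln in block.strip().splitlines() if ln.strip()]
    header = "hydronnx: "
    if not lines or not lines[0].startswith(header):
        raise ValueError("not a TranslateError block")
    fields = {"category": lines[0][len(header):]}
    for line in lines[1:]:
        m = OPERATOR_RE.fullmatch(line)
        if m is not None:
            fields["node_name"] = m.group(1)  # None when the name is absent
            fields["op_type"] = m.group(2)
            continue
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value  # e.g. 'reason', 'dtype', 'spec', 'input index'
    return fields
```

The category field is then enough to branch on, for instance treating integer matmul rejected as a prompt to cast the operands, while the remaining keys carry the detail lines unchanged.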

Hydronnx prints warning: ... lines to stderr without failing. Two categories exist; the CLI continues and produces output as normal.

Non-differentiable-operator warning (emit only)

chelis-hydronnx-emit warns when the model contains an operator whose gradient is undefined or zero-almost-everywhere by its own ONNX semantics (ArgMax / ArgMin, Floor / Ceil / Round / Sign, comparison and logical ops, Scatter / ScatterElements, an integer-target Cast). The exact text, against tests/fixtures/per_op/reduce/ArgMax.onnx:

warning: this model contains 1 mathematically non-differentiable operator(s) - automatic differentiation (`grad`) on the emitted `forward` will not work. chelis's AD does not name the operator; Hydronnx does:
[ArgMax] ArgMax - integer-index output - zero gradient a.e., undefined on ties
note: removing these operators does not by itself make the model AD-ready; chelis AD support is operator-dependent. Hydronnx gates the weighted Gemm path, but broader models need their own AD parity gate.

The emit succeeds; the warning is purely about grad. See Limitations and scope for the AD limits story.

--precision f64 and --batch-dim N are plumbed but inert. Passing either one prints a warning: line to stderr noting that the flag has no effect, then proceeds. The emit succeeds, but the emitted .ch is exactly what you would have gotten without the flag. See CLI reference.
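In a build script, the practical consequence is that the exit code, not the presence of stderr output, decides success. A minimal sketch of that rule follows; the summarize helper and its status strings are hypothetical.

```python
from typing import Tuple


def summarize(returncode: int, stderr: str) -> Tuple[str, int]:
    """Classify a CLI run as ('ok' | 'ok-with-warnings' | 'failed', warning count).

    Only lines that start with 'warning: ' are counted; continuation lines
    such as the per-operator list and the trailing 'note:' are ignored.
    """
    warnings = [ln for ln in stderr.splitlines() if ln.startswith("warning: ")]
    if returncode != 0:
        return "failed", len(warnings)
    return ("ok-with-warnings" if warnings else "ok"), len(warnings)
```

A run that prints the non-differentiable-operator warning and exits zero is reported as ok-with-warnings, so the emitted .ch can still be used.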

Hydronnx tries to name the failing site precisely. The current behavior:

  • Parse-time errors carry the input <path> and (for node-level errors) the node_index. They do not carry the node name because the parser may fail before names are bound.
  • Translate-time errors carry the node name and op_type. They do not carry the <path>, because the translator runs on the in-memory IR.

When debugging, a useful trick: chelis-hydronnx-inspect on the same .onnx prints the full node list with names and indices, so a "node index 3" error can be cross-referenced against the inspect output's nodes (N): section.
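The cross-reference can be sketched as follows. The node index is pulled from the message with a regular expression that matches the at node index <N> wording used by the formats above. The list of node names is assumed to be transcribed by hand from the inspect output, in the same order and with the same indexing; this sketch does not parse the inspect output itself, and both helper names are hypothetical.

```python
import re
from typing import List, Optional

NODE_INDEX_RE = re.compile(r"\bat node index (\d+)")


def node_index(message: str) -> Optional[int]:
    """Return the node index named in an error message, or None."""
    m = NODE_INDEX_RE.search(message)
    return int(m.group(1)) if m else None


def node_name(message: str, names: List[str]) -> Optional[str]:
    """Look up the failing node's name in a caller-supplied list of names."""
    idx = node_index(message)
    if idx is None or idx >= len(names):
        return None
    return names[idx]
```

Keep in mind that the emit-time unsupported dtype message always reports index 0, so a lookup on that particular message is not meaningful.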

Once the .ch is in hand, all further errors come from chelis:

  • chelis check: type errors. The emitted forward's sig is the contract; a wrong shape at the call site is a DimensionMismatch here.
  • chelis test: runtime errors during the example's host-runtime execution.
  • chelis prove: counterexamples to a @property (exit 1 with property failure: <name>).

Hydronnx does not wrap or rewrite chelis's error output; the chelis diagnostic is shown verbatim. The chelis docs are the source of truth for their format.