Provenance and span tracking
Provenance is the reason Octant exists. The translation alone is useful, but the property that makes Octant part of a trust stack is that every piece of the output can be traced back to the formula it came from, and that the trace survives compilation. This chapter describes the span model, the manifest Octant writes, and how a compiled trace points back to the source formula.
The span model
Section titled “The span model”Every value that flows through Octant's pipeline carries a span. A span records where in the LaTeX source a node originated:
Span { source_id, start_byte, end_byte, start_line, start_column, end_line, end_column, label, // the \label{...} of a containing equation, if any}Byte offsets are zero-based; lines and columns are one-based to match
editor convention. The label is the equation label of a containing
\begin{equation} block when one is present.
Spans flow from the tokenizer through the parser, the normalizer, and
the emitter. When a pass produces a node that did not appear literally
in the source, the synthesized node inherits the span of its parent
LaTeX node. For example, \frac{a}{b} becomes a division node that did
not appear as a token in the source; that division node carries the span
of the whole \frac expression. The synthesized (lit 1.0) in the
x^{-1} lowering inherits the span of the power expression. A
synthesized child never has a missing span; it always points back into
the customer's LaTeX.
The spans manifest
Section titled “The spans manifest”Alongside the Deep .dp, translate writes a .spans.json provenance
manifest. It is a flat list of entries plus a header:
{ "source": "input.tex", "source_hash": "sha256:...", "spans": [ { "deep_node_id": "eq:bs_001", "deep_path": "/__top[0]/def[0]/app[1]/app[1]", "latex": { "start_byte": 142, "end_byte": 168, "start_line": 7, "start_column": 12, "end_line": 7, "end_column": 38, "label": "eq:bs" }, "latex_text": "s \\Phi(d_1) - k e^{-r t} \\Phi(d_2)" } ]}The header records the source filename and a hash of the source bytes, so a downstream tool can detect drift between the manifest and a changed source. Each entry has four fields:
deep_node_id: a stable identifier for the emitted Deep node. This is the identifieroctant explain --targettakes, and the identifier the Chelis compiler preserves through its passes.deep_path: a slash-prefixed sequence of<tag>[<index>]segments locating the node in the Deep AST. Described below.latex: the full span, byte range and line/column position, plus the equation label if present.latex_text: the literal LaTeX text the span covers. This is the exact source slice, the bytes fromstart_bytetoend_byte, verbatim.
Node identifiers
Section titled “Node identifiers”The emitter generates identifiers as <label-or-n>_<NNN>. The prefix is
the containing equation's \label{...} if one is in scope, or n if
no label is present. The suffix is a per-prefix sequential counter,
zero-padded to three digits. So a labelled equation eq:bs produces
eq:bs_001, eq:bs_002, and so on; an unlabelled formula produces
n_001, n_002. The label prefix lets explain locate any node within
a labelled equation, and the counter guarantees uniqueness even when the
same LaTeX text appears more than once.
The deep_path notation
Section titled “The deep_path notation”deep_path is a slash-prefixed sequence of <tag>[<index>] segments
locating the node in the Deep AST. It is not JSONPath.
<tag>is the Chelis Deep node tag:def,app,let,bind,if,var,lit,access,params, and so on.<index>is the zero-based position of the entry within its parent's child list, counting after the tag and the metadata map. In(app {meta} (var mul) (var a) (var b)), the(var mul)is atapp[0],(var a)atapp[1], and(var b)atapp[2]./__top[i]is the synthetic root segment for the i-th top-level expression in the program.
The notation is unambiguous and parseable: split on / and walk the
Deep AST element by element. It matches the positional list shape of
Chelis Deep, where JSONPath, which assumes named keys, would not.
How provenance survives compilation
Section titled “How provenance survives compilation”Octant emits Deep in the canonical three-tuple form
(tag {meta} ...children). Each node that originates from the source
carries its stable identifier in the metadata map as {span: "<id>"}:
(def {span: "eq1_001"} call_price (fn {span: "eq1_002"} (params {} ...) ... (app {span: "eq1_006"} (var {} sub) (app {span: "eq1_007"} (var {} mul) (var {} s) ...) (app {span: "eq1_008"} ...))))The {span: "..."} entry carries the identifier; the full span data
lives in the sidecar manifest. The Chelis compiler preserves these
identifiers through every pass. Parsing keeps them on AST nodes, type
checking copies them to typed nodes, lowering copies them to RISC IR
nodes, and code generation emits them as comments in the generated host
source.
The result is that a runtime trace of the binary references identifiers that point back into the LaTeX source. A model-validation team examining a risk number can trace the computation through the binary back to the original formula in the original notation. "Show me where this number comes from" has a mechanical answer.
Tracing with explain
Section titled “Tracing with explain”Given an identifier from the manifest, explain prints the LaTeX it
came from:
octant explain references/black_scholes_call.tex --target n_002The output reports the byte range, the line and column position, the
literal source slice, the equation label if present, and a few lines of
surrounding context with the matched line marked. Translate first to
produce the manifest, then read identifiers from it. If you pass an
identifier that is not in the manifest, explain reports the error and
lists valid identifiers.
Nautilus references
Section titled “Nautilus references”When the source uses a function that lives in a Nautilus module, for
example \Phi, mapping to Nautilus.Distributions.normal_cdf, the way
Octant emits the reference depends on --emit-nautilus.
Chelis Deep symbols are restricted to a single bare name, so a dotted
target like Nautilus.Distributions.normal_cdf cannot appear as one
symbol. There are two ways to encode it.
In the default stub mode, the emitter walks the body, collects every
distinct Nautilus reference, and prepends a top-level identity-stub
def for each at the file top:
(def {span: "__synthesized_nautilus_stub__"} normal_cdf (fn {span: "__synthesized_nautilus_stub__"} (params {span: "__synthesized_nautilus_stub__"} (x_1 {type: (t-prim {} f32)})) (var {} x_1)))The stub's parameter count matches the call-site arity. Its body returns
the first argument unchanged. Each call site is rewritten to a bare
(var {} normal_cdf). The emitted Deep is self-contained and compiles
through the Chelis Deep build path.
The stub returns its first argument unchanged, so the generated code is correct for the audit-chain claim but numerically a placeholder if executed standalone. Customers link the generated code against a real Nautilus implementation at deploy time, which shadows the stub with the actual function. Octant prints a loud warning to stderr whenever it inlines a stub.
The stub-introduced nodes carry a reserved synthesized-marker span,
__synthesized_nautilus_stub__. The manifest gains one entry per stub
with that marker as its identifier and a zero byte range; the marker
entries have no LaTeX origin and are exempt from the byte-exact rule
that applies to every LaTeX-derived entry.
The --emit-nautilus access opt-out reverts to the access-chain callee
form, (access (access (var Nautilus) Distributions) normal_cdf), with
no stubs and no warning. It suits callers that resolve the Nautilus
package through a Chelis-side build pipeline.
In both modes the call-site application node keeps its LaTeX-derived
span, so the reference is traceable back to the byte range of the
original \Phi(...) regardless of which emission mode you use.