Introduction
School is a deep-learning framework for the
Chelis programming language. It ships as
a reef package under the School module prefix and is written entirely in
Chelis.
The idea is mini-JAX on a tensor-first language. You compose an architecture out
of School layers, differentiate it with the language's grad transform, and
train it with real optimizer updates and a real training loop. Automatic
differentiation flows through tensor-op composition, so a forward function built
from differentiable primitives is differentiable end to end without any
hand-written backward pass.
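To make the "differentiable end to end" claim concrete, here is a minimal sketch in Python rather than Chelis, since this section does not show Chelis syntax. It uses forward-mode dual numbers, a technique chosen only for brevity; nothing here is School's actual mechanism or API. The names Dual, tanh, grad, and neuron_loss are hypothetical. The point is that once each primitive knows its own derivative, any function composed from those primitives gets a gradient with no hand-written backward pass.

```python
import math


class Dual:
    """A value paired with its derivative with respect to one chosen input."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        """Wrap a plain number as a constant (derivative 0)."""
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def tanh(x):
    """Differentiable primitive: d/dx tanh(x) = 1 - tanh(x)^2."""
    x = Dual.lift(x)
    t = math.tanh(x.value)
    return Dual(t, (1.0 - t * t) * x.deriv)


def grad(f):
    """Return a function giving the per-argument gradient of scalar f."""
    def grad_f(*args):
        partials = []
        for i in range(len(args)):
            # Seed argument i with derivative 1 and every other with 0.
            duals = [Dual(float(a), 1.0 if j == i else 0.0)
                     for j, a in enumerate(args)]
            partials.append(Dual.lift(f(*duals)).deriv)
        return tuple(partials)
    return grad_f


def neuron_loss(w, b, x, y):
    """Squared error of a one-weight tanh neuron, built only from primitives."""
    err = tanh(w * x + b) - y
    return err * err


# One partial derivative per argument, in argument order.
dw, db, dx, dy = grad(neuron_loss)(0.5, 0.0, 2.0, 1.0)
```

neuron_loss contains no derivative code of its own; the gradient falls out of the primitives it is composed from. A tensor framework would apply the same idea to tensor operations rather than scalars.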
What is in the box
- Typed layers under School.Nn: linear and convolutional layers, embeddings and positional encodings, normalization, attention, pooling, padding, dropout, and the common activations. Every layer is a plain function with an explicit tensor signature, so shapes are checked at compile time.
- Losses under School.Loss: classification, regression, and embedding losses, plus accuracy and perplexity metrics.
- Optimizers under School.Optim: the SGD family, Adam, AdamW, Adagrad, RMSProp, Lion, and LAMB. Each is a pure step function that threads new parameters and optimizer state explicitly.
- Learning-rate schedules under School.Schedule and School.SchedExt.
- A training loop and a data layer under School.Train and School.Data: a dataset abstraction, a shuffling mini-batch loader, an epoch-by-batch loop combinator, and training metrics.
- A model library under School.Models: an MLP reference architecture and a set of vision and transformer architectures.
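The optimizer and training-loop items above describe a pure step function and an epoch-by-batch loop over shuffled mini-batches. The following Python sketch shows that shape under stated assumptions: it is not School.Optim or School.Train code, every name in it (sgd_momentum_step, batches, mse_grads, train) is hypothetical, and the gradient is written by hand here only to keep the example self-contained.

```python
import random


def sgd_momentum_step(params, velocity, grads, lr=0.1, momentum=0.9):
    """Pure step: returns new (params, velocity) and mutates nothing."""
    new_velocity = tuple(momentum * v + g for v, g in zip(velocity, grads))
    new_params = tuple(p - lr * v for p, v in zip(params, new_velocity))
    return new_params, new_velocity


def batches(dataset, batch_size, rng):
    """Yield shuffled mini-batches; the dataset itself is left untouched."""
    order = list(range(len(dataset)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [dataset[i] for i in order[start:start + batch_size]]


def mse_grads(params, batch):
    """Hand-written gradient of mean squared error for y = w * x + b."""
    w, b = params
    n = len(batch)
    dw = sum(2.0 * (w * x + b - y) * x for x, y in batch) / n
    db = sum(2.0 * (w * x + b - y) for x, y in batch) / n
    return dw, db


def train(dataset, params, epochs, batch_size, seed=0):
    """Epoch-by-batch loop that threads params and optimizer state forward."""
    rng = random.Random(seed)
    velocity = tuple(0.0 for _ in params)
    for _ in range(epochs):
        for batch in batches(dataset, batch_size, rng):
            grads = mse_grads(params, batch)
            # Rebind the names; the previous tuples are never modified.
            params, velocity = sgd_momentum_step(
                params, velocity, grads, lr=0.05, momentum=0.5)
    return params


# Toy data drawn from y = 3x + 1 with x in [-1, 1].
data = [(k / 10.0, 3.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
initial = (0.0, 0.0)
trained = train(data, initial, epochs=200, batch_size=7)
```

Because the step returns fresh tuples instead of updating in place, initial is still (0.0, 0.0) after training, and the caller decides which parameter version to keep.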
How the framework is shaped
School follows the conventions of the language it targets.
- Tensors are typed by shape. A signature like tensor[a, b, f32] -> tensor[b, c, f32] -> tensor[c, f32] -> tensor[a, c, f32] is the type of a linear layer. Shape mismatches are compile-time errors.
- There is no broadcasting. Where two tensors must line up, the code uses expand and reshape explicitly. School layers do this for you inside their forward functions.
- Values are immutable. Optimizers and the training loop do not mutate parameters in place. A step takes the current parameters and returns the next parameters, and the caller threads them forward.
- Differentiation is a transform, not a tape. grad(f)(args) returns the per-argument gradient of a scalar-returning function f. School's layers and losses are written as compositions of differentiable tensor operations so that grad can run through them.
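The shape-typing and no-broadcasting points above can be illustrated with a small Python sketch of a linear layer over nested lists. This is an analogy, not School.Nn code: Python can only check shapes at run time, whereas the text says Chelis rejects mismatches at compile time, and the helpers shape, expand, matmul, add, and linear are hypothetical names. The bias is widened with an explicit expand before the add, because add refuses operands whose shapes differ.

```python
def shape(t):
    """Return (rows, cols) of a rectangular list-of-lists matrix."""
    return len(t), len(t[0])


def expand(vec, rows):
    """Explicitly repeat a length-c vector into a (rows, c) matrix."""
    return [list(vec) for _ in range(rows)]


def matmul(x, w):
    """(a, b) @ (b, c) -> (a, c), rejecting mismatched inner dimensions."""
    (a, b), (b2, c) = shape(x), shape(w)
    if b != b2:
        raise ValueError(f"inner dimensions differ: {b} vs {b2}")
    return [[sum(x[i][k] * w[k][j] for k in range(b)) for j in range(c)]
            for i in range(a)]


def add(p, q):
    """Elementwise add; both operands must already have the same shape."""
    if shape(p) != shape(q):
        raise ValueError(f"shape mismatch: {shape(p)} vs {shape(q)}")
    return [[u + v for u, v in zip(pr, qr)] for pr, qr in zip(p, q)]


def linear(x, w, bias):
    """Mirrors tensor[a, b] -> tensor[b, c] -> tensor[c] -> tensor[a, c]."""
    y = matmul(x, w)
    # No implicit broadcasting: widen the bias to (a, c) before adding.
    return add(y, expand(bias, len(y)))


x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]                 # shape (2, 3)
w = [[1.0, 0.0, 0.0, 1.0],
     [0.0, 1.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 1.0]]            # shape (3, 4)
bias = [10.0, 20.0, 30.0, 40.0]       # shape (4,)

y = linear(x, w, bias)
# y == [[11.0, 22.0, 33.0, 46.0], [14.0, 25.0, 36.0, 55.0]]
```

Passing a bias of the wrong length, such as linear(x, w, [1.0, 2.0]), raises ValueError instead of silently stretching the operand, which is the run-time counterpart of the compile-time error described above.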
The statistical primitives School builds on (distributions, special functions,
linear algebra) come from the upstream nautilus shell, and weight initializers
(Xavier, Kaiming) come from upstream Std.Init. School does not reimplement
these.
Reading this book
Getting started walks a small example from forward pass
to a trained MLP. The reference chapters that follow document the layers, losses,
optimizers, schedules, training loop, data layer, and model library. Each entry
traces to the module that defines it under src/, and every code example is
taken from the validated example corpus under examples/. The final chapter sets
out the scope and limitations of the current surface.