Typechecking and Code Generation

After parsing and AST construction, the implementation has two different but related questions to answer.

First: is the program statically coherent? Second: if it is, how should it run on a host platform?

The typechecker answers the first question. The two code generators answer the second. This chapter covers all three, in that order, because the generators depend on the typechecker and the typechecker depends on the AST — the pipeline runs left to right, and so does the explanation.

3.1 The Typechecker

The typechecker in src/nex/typechecker.cljc is a proper pass over the AST, not a set of ad hoc checks buried in the interpreter.

That separation matters on two levels. At the language level, it gives programmers earlier and clearer feedback — a type error reported before execution is more useful than a runtime failure with a confusing message. At the implementation level, it prevents the interpreter and generators from rediscovering static facts on every evaluation. Facts established once by the typechecker do not need to be re-established downstream.

3.1.1 The Type Environment

Typechecking is fundamentally a process of carrying and refining assumptions. The typechecker makes this process visible by maintaining an explicit environment that records:

- variables and their types
- methods and their signatures
- classes and their definitions
- proven non-nil facts that hold in the current control-flow scope

The environment is created with make-type-env and manipulated through a small set of named operations: env-add-var, env-lookup-var, env-add-method, env-lookup-method, env-add-class, and env-mark-non-nil. Because nearly all of the checker’s knowledge lives in this one explicit structure, the pass is straightforward to inspect and debug. There is no hidden compiler state to account for.
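To make the shape of such an environment concrete, here is a minimal Python model. It is a sketch under stated assumptions, not the Clojure code in src/nex/typechecker.cljc: the class name TypeEnv, the parent-scope chain, and the tuple used for a method signature are all hypothetical choices made to mirror the operations named above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TypeEnv:
    """Hypothetical scoped environment; a model, not the Nex source."""
    parent: Optional["TypeEnv"] = None
    variables: dict = field(default_factory=dict)   # name -> type
    methods: dict = field(default_factory=dict)     # name -> signature
    classes: dict = field(default_factory=dict)     # name -> definition
    non_nil: set = field(default_factory=set)       # names proven non-nil

    def _lookup(self, table, name):
        # Search this scope first, then each enclosing scope in turn.
        env = self
        while env is not None:
            entries = getattr(env, table)
            if name in entries:
                return entries[name]
            env = env.parent
        return None

    def add_var(self, name, type_name):
        self.variables[name] = type_name

    def lookup_var(self, name):
        return self._lookup("variables", name)

    def add_method(self, name, signature):
        self.methods[name] = signature

    def lookup_method(self, name):
        return self._lookup("methods", name)


root = TypeEnv()
root.add_var("count", "Integer")
root.add_method("increment", (["Integer"], "Integer"))  # (param types, return type)

inner = TypeEnv(parent=root)       # e.g. the scope of a block or branch
inner.add_var("total", "Real")     # visible in the inner scope only
```

Because everything the checker knows is a plain value in one structure, a failing check can be investigated by printing the environment at the point of failure.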

3.1.2 The Main Typechecking Flow

The best entry point into the typechecker is check-program. Its control flow is clear and worth reading directly:

1. create a fresh type environment
2. register imported Java classes as placeholder class names
3. collect class information for all declared classes
4. register built-in methods and built-in classes
5. inject any pre-existing variable types, such as REPL bindings
6. register top-level functions as callable values
7. check class bodies
8. check top-level statements in source order

The ordering is not arbitrary. Class information is collected before class bodies are checked so that forward references resolve correctly. Built-ins are registered after user class collection so that the checker has access to both before it begins body-checking. Top-level statements are checked last, preserving the source-order feel of the language while ensuring all declarations are already in scope when they are needed.
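The effect of collecting declarations before checking bodies can be seen in a small Python sketch. The dictionary-based program format and the function name check_program_sketch are assumptions made for illustration; only the ordering idea and the built-in class names come from this chapter.

```python
def check_program_sketch(program):
    """Toy two-phase checker: collect every declaration, then check bodies."""
    known = set(program.get("imports", []))                 # placeholder names
    known |= {c["name"] for c in program["classes"]}        # collect first
    known |= {"Array", "Map", "Set", "Task", "Channel"}     # built-in classes

    errors = []
    for cls in program["classes"]:                          # then check bodies
        for field_name, field_type in cls["fields"].items():
            if field_type not in known:
                errors.append(
                    f"{cls['name']}.{field_name}: unknown type {field_type}"
                )
    return errors


program = {
    "imports": [],
    "classes": [
        {"name": "Node", "fields": {"owner": "Graph"}},   # refers forward to Graph
        {"name": "Graph", "fields": {"root": "Node"}},
    ],
}
errors = check_program_sketch(program)   # empty: the forward reference resolves
```

A single-pass checker that validated Node before it had seen Graph would reject this program; the two-phase ordering is what makes declaration order irrelevant.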

This is a good example of Nex’s general implementation style: a multi-pass pipeline with explicit, non-overlapping responsibilities.

3.2 What the Typechecker Checks

The typechecker is substantial. It handles scalar types, arrays, maps, sets, tasks, channels, and function values; generic classes and their arguments; constructor calls; inheritance-aware method lookup; detachability and nil; convert; spawn result typing; and the validity rules for channel operations and select. This is enough to protect real programs without being so complex that it becomes opaque.

Reading through the file, the implementation clusters into four areas:

- expression checking: computing the type of literals, calls, binary expressions, constructors, and so on
- statement checking: updating the environment across let, assignment, loops, conditionals, and blocks
- class checking: validating fields, constructors, methods, inheritance, and invariants
- built-in registration: giving core types such as Array, Map, Set, Task, and Channel their method signatures

Several features have richer semantics than ordinary method calls and deserve closer attention.

3.2.1 Spawn Typing

spawn is handled by a dedicated path rather than by pretending it is a normal function call. The checker creates a nested environment, introduces a synthetic result binding, and tracks assignments to that binding to infer the task’s result type. A spawn body that assigns no result produces Task; one that assigns a consistent type T produces Task[T]; one with inconsistent assignments is rejected. This precision would not be achievable if spawn were treated as an ordinary call.
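The three outcomes can be sketched in a few lines of Python. The function name infer_spawn_type is hypothetical, and comparing types by simple equality is an assumption; the real checker may apply a richer compatibility rule.

```python
def infer_spawn_type(result_assignment_types):
    """Infer a task type from the types assigned to the synthetic result binding."""
    distinct = set(result_assignment_types)
    if not distinct:
        return "Task"                        # the body never assigns a result
    if len(distinct) == 1:
        return f"Task[{distinct.pop()}]"     # one consistent type
    raise TypeError(
        "inconsistent result types in spawn: " + ", ".join(sorted(distinct))
    )


no_result = infer_spawn_type([])
consistent = infer_spawn_type(["Integer", "Integer"])
```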

3.2.2 Channels and `select`

The same explicit treatment applies to channels and select. The checker registers Channel[T] methods as built-ins, but it also has dedicated rules for constructor forms such as Channel.with_capacity, the send and receive operations, timeout arguments, and the legality of select clauses. This is necessary because select is not a method call — it is a control-flow form whose clauses are restricted to a small set of communication patterns, and those restrictions must be enforced by the checker, not discovered at runtime.

3.2.3 Nil and Detachability

The checker tracks proven non-nil facts in the type environment. This gives it limited but useful flow-sensitive behaviour: after certain checks, a variable may be treated as safely non-detachable within a particular branch. The implementation records locally justified facts in the environment and uses them where appropriate. It does not attempt global dataflow analysis or theorem proving — the goal is precision where it is achievable without obscurity.
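A toy Python checker shows how a locally justified fact can be scoped to one branch. The tuple-based AST, the "if-not-nil" statement kind, and the leading "?" used to mark a detachable type are all assumptions invented for this sketch; they are not Nex syntax.

```python
def check_stmt(stmt, var_types, non_nil, errors):
    """Check one statement of a toy AST; non_nil is the set of proven facts."""
    kind = stmt[0]
    if kind == "use":                        # ("use", name): needs a non-nil value
        name = stmt[1]
        detachable = var_types[name].startswith("?")
        if detachable and name not in non_nil:
            errors.append(f"{name} may be nil here")
    elif kind == "if-not-nil":               # ("if-not-nil", name, then, else)
        _, name, then_body, else_body = stmt
        for s in then_body:
            # The fact holds only inside the branch guarded by the test.
            check_stmt(s, var_types, non_nil | {name}, errors)
        for s in else_body:
            check_stmt(s, var_types, non_nil, errors)


program = [
    ("if-not-nil", "x", [("use", "x")], [("use", "x")]),
    ("use", "x"),
]
errors = []
for stmt in program:
    check_stmt(stmt, {"x": "?String"}, frozenset(), errors)
```

Only the use inside the guarded branch is accepted. The use in the else branch and the use after the conditional are both reported, because the fact was never added to the enclosing scope.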

3.2.4 Type Aliases and Function Types

Before checking any class or routine, the typechecker registers every declare type binding from the program's :type-aliases bucket into the type environment. From then on, a single operation — alias expansion — does all the work. Wherever a type is about to be compared, or a method resolved on a value, the checker first expands aliases to the concrete type they stand for. Because expansion is applied uniformly — to parameter types, to the synthetic result variable, to return types, and to the variable types threaded in from the REPL — an alias-typed value behaves in every respect exactly like its expansion. A variable declared m: Matrix resolves m.get(0).get(1) through Array[Array[Real]], and a parameter f: Transformer can be invoked as f(x) through its underlying function type.

One subtlety keeps aliases honest. The checker treats a bare capitalised name that is neither a known class nor a built-in type as a generic type parameter — that is how G in Stack[G] is recognised. A declared alias is also a capitalised name, so the checker excludes declared aliases from that test: an alias names a concrete type, not a type variable to be inferred.
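Alias expansion and the generic-parameter test can be modelled together in Python. The representation of a type as either a name or a (name, arguments) pair, and the contents of the KNOWN set, are assumptions for illustration; the Matrix alias follows the example given above.

```python
ALIASES = {
    "Matrix": ("Array", [("Array", ["Real"])]),    # Matrix = Array[Array[Real]]
}
KNOWN = {"Array", "Real", "Integer", "Stack"}      # known classes and built-in types


def expand(t):
    """Recursively replace alias names with the types they stand for."""
    if isinstance(t, str):
        return expand(ALIASES[t]) if t in ALIASES else t
    name, args = t
    return (name, [expand(a) for a in args])


def is_generic_param(name):
    """A bare capitalised name that is neither known nor a declared alias."""
    return name[:1].isupper() and name not in KNOWN and name not in ALIASES


expanded = expand("Matrix")
nested = expand(("Array", ["Matrix"]))
```

Without the alias exclusion in is_generic_param, Matrix would be mistaken for a type variable like G. This sketch does not guard against cyclic aliases, which would make expand recurse without end.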

Function types participate in the same machinery. A type expression such as Function(n: Integer): Integer is carried as a structured type that records its parameter types and its return type. A function value with that type knows its own result: invoking it, whether through the explicit f.call1(x) form or the f(x) call sugar, yields the declared return type. Only a bare Function with no recorded signature falls back to Any, because in that case the result genuinely is unknown.
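The difference between a structured function type and a bare Function can be sketched as follows. The FunctionType class and call_result_type are hypothetical names, and modelling a bare Function as a missing parameter list is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FunctionType:
    """Hypothetical structured function type; params=None models a bare Function."""
    params: Optional[Tuple[str, ...]] = None
    result: Optional[str] = None


def call_result_type(fn_type):
    """Type produced by invoking a value of the given function type."""
    if fn_type.params is None:
        return "Any"              # no recorded signature: result is unknown
    return fn_type.result         # declared return type


transformer = FunctionType(params=("Integer",), result="Integer")
bare = FunctionType()
```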

3.2.5 Forward Declarations, Sealed Types, and Once Fields

Forward declarations. A declare function node contributes its signature to the environment without a body. Calls to the function therefore typecheck regardless of where — or whether — the full definition appears later; when the definition does appear, its body is checked against the already-registered signature. The backends never see these nodes: declaration-only functions are filtered out before code generation.

Sealed types and exhaustive match. A match over a value whose static type is a sealed class becomes an exhaustive type switch. The checker computes the complete set of classes that directly inherit the sealed parent — which is well defined precisely because sealed closes the hierarchy — and requires a clause for every one of them. A missing variant is a compile-time non-exhaustive match error; an explicit else branch covers the remainder and suppresses the check.

For that guarantee to hold, the checker also requires a sealed class to be deferred. The exhaustiveness check enumerates only the subclasses; if the sealed parent were itself instantiable, a bare parent instance would be a legal runtime value matching none of the subclass clauses, and the match the checker had certified as total would fail at run time. Rejecting a non-deferred sealed class — “a sealed class must be declared sealed deferred” — closes that gap at its source.
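The exhaustiveness rule reduces to a set difference, as this Python sketch shows. The function name check_match and the Shape hierarchy are hypothetical; the sketch assumes the sealed parent is deferred, so only its direct subclasses can occur at run time.

```python
def check_match(sealed_parent, subclasses_of, clause_types, has_else=False):
    """Return the variants a match fails to cover; an empty list means exhaustive."""
    if has_else:
        return []                                    # else covers the remainder
    required = set(subclasses_of[sealed_parent])     # direct subclasses only
    return sorted(required - set(clause_types))


hierarchy = {"Shape": ["Circle", "Square", "Triangle"]}   # hypothetical sealed class

missing = check_match("Shape", hierarchy, ["Circle", "Square"])
```

If Shape itself could be instantiated, an empty result from check_match would no longer guarantee a total match, which is the gap the deferred requirement closes.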

Once fields. While checking a constructor body the environment carries an __in_constructor__ marker. An assignment to a once field is permitted only when that marker is present; anywhere else it is rejected at compile time with a message naming the field. This is the static half of the guarantee whose runtime half lives in the interpreter and whose code-generation consequences are described in Section 5.7.
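The static half of the once-field rule can be sketched as a single guard. The function name, the dictionary standing in for the environment, and the wording of the error message are assumptions; only the __in_constructor__ marker comes from the description above.

```python
def check_field_assignment(field_name, once_fields, env):
    """Return an error message for an illegal once-field assignment, else None."""
    in_constructor = env.get("__in_constructor__", False)
    if field_name in once_fields and not in_constructor:
        return f"cannot assign once field '{field_name}' outside a constructor"
    return None


once_fields = {"id"}
inside = check_field_assignment("id", once_fields, {"__in_constructor__": True})
outside = check_field_assignment("id", once_fields, {})
```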

3.3 Conservative by Design

The typechecker is not trying to be heroic. It does not perform deep global inference or sophisticated constraint solving. That restraint is deliberate.

Nex is intended to teach programmers how to state intent clearly. A type system that works primarily through invisible cleverness would undermine that goal — if the rules cannot be explained, they cannot be taught. The checker aims to be precise enough to catch real errors and simple enough that a contributor reading the file can understand how every rule is applied.

This is a recurring theme across the Nex implementation: precision matters, but intelligibility matters too. Where they are in tension, Nex usually chooses intelligibility.

3.4 Why Two Backends

Nex currently has two maintained compilation backends: JVM bytecode for JVM deployment and JavaScript for Node.js and browser environments.

These backends solve the same semantic problem in different host runtimes. Their top-level control flow is intentionally similar:

1. parse Nex source
2. type-check it unless explicitly disabled
3. lower or translate the program into the target runtime model
4. emit runtime artifacts needed by that target

Type-checking before generation is the right default. A generator should not silently produce host code for a program that Nex itself considers ill-typed.
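The shared control flow, including the type-checking gate, might look like this in outline. The function compile_source, its skip_typecheck flag, and the stub stages are all hypothetical; neither backend is claimed to expose this interface.

```python
def compile_source(source, parse, typecheck, lower, emit, skip_typecheck=False):
    """Run the shared pipeline; the four stage functions are supplied by the caller."""
    ast = parse(source)
    if not skip_typecheck:
        errors = typecheck(ast)
        if errors:
            # Refuse to generate host code for an ill-typed program.
            raise ValueError("type errors: " + "; ".join(errors))
    return emit(lower(ast))


# Stub stages, purely for illustration.
def parse(src):
    return {"source": src}


def typecheck(ast):
    return ["unknown name y"] if "y" in ast["source"] else []


def lower(ast):
    return ("ir", ast["source"])


def emit(ir):
    return f"artifact for {ir[1]}"


artifact = compile_source("x", parse, typecheck, lower, emit)
```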

Having two targets also imposes a useful discipline. Any feature added to Nex must make sense in three settings — interpreted on the JVM, compiled to JVM bytecode, and translated to JavaScript. That pressure discourages features that only work by accident in one execution model. If you add or change a language feature, you should expect to touch the grammar, the walker, the typechecker, the interpreter, and both backends. This is not duplication for its own sake. It is how Nex keeps semantics explicit across the whole system.

3.5 The JVM Bytecode Compiler

The JVM bytecode compiler is the primary deployment backend. It emits .class files and standalone shaded jars rather than generating host source code.

The compiler is organised around a small set of recurring stages: mapping Nex types to JVM descriptors, lowering AST nodes into compiler IR, emitting classes and methods with ASM, and packaging generated output into file-level artifacts. To read it effectively, start with these entry points and follow the control flow downward:

- compile-ast, compile-file, compile-jar: whole-program and file-level entry points
- lower-repl-cell, lower-class-def: lowering into compiler IR
- compile-unit->bytes, compile-user-class->bytes, compile-launcher->bytes: ASM-backed bytecode emission
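As an illustration of the first stage, mapping types to JVM descriptors, the following Python sketch builds field and method descriptors in the standard JVM format. The particular mapping is an assumption made for the example (for instance, treating Integer as a JVM long); the compiler's actual choices should be read from its source.

```python
# Assumed mapping from Nex type names to JVM field descriptors (illustrative only).
DESCRIPTORS = {
    "Integer": "J",                     # assumption: represented as a JVM long
    "Real": "D",                        # assumption: represented as a JVM double
    "Boolean": "Z",                     # assumption: represented as a JVM boolean
    "String": "Ljava/lang/String;",
}


def descriptor(nex_type):
    """Field descriptor for a type; unknown types fall back to Object."""
    return DESCRIPTORS.get(nex_type, "Ljava/lang/Object;")


def method_descriptor(param_types, return_type=None):
    """Method descriptor in the form (params)return, with V for no result."""
    params = "".join(descriptor(t) for t in param_types)
    ret = "V" if return_type is None else descriptor(return_type)
    return f"({params}){ret}"


sig = method_descriptor(["Integer", "String"], "Boolean")
```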

The compiler emits Nex classes; helper classes for function values and closures; runtime support for contracts and invariants; helpers for arrays, maps, sets, cursors, tasks, channels, and I/O; and launcher classes for file compilation.

One design decision worth noting: the bytecode backend still carries a substantial runtime alongside the generated user code. A compiled Nex jar includes the runtime support classes and the active compiler classpath rather than assuming a separately installed Nex VM. This keeps deployment simple at the cost of larger output, a tradeoff that is appropriate for Nex’s current use cases.

3.6 The JavaScript Generator

The JavaScript generator serves a different purpose. It must preserve Nex’s semantics in a host environment with a different object model, single-threaded event-loop execution, promises rather than JVM blocking primitives, and ES module import conventions. It does not simply mirror the JVM backend — it re-expresses Nex semantics in a JavaScript-native form.

The generator is organised around four concerns: type mapping, expression emission, statement emission, and class emission. Several target-specific areas are worth calling out.

First, the generator must decide when operations should produce await-bearing code. This matters for task and channel semantics, where the event-loop model requires explicit async boundaries that the JVM does not.

Second, JavaScript imports are generated from Nex import ... from ... forms rather than from JVM-style qualified names, reflecting the ES module convention.

Third, target-specific with "javascript" blocks are retained while with "java" blocks are omitted. This is a visible example of compile-time target selection — the same source file can contain platform-specific fragments, and each generator takes only what belongs to it.
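Compile-time target selection amounts to a filter over statements, sketched here in Python. The dictionary-based statement format and the function name select_for_target are assumptions for illustration, not the generator's actual representation.

```python
def select_for_target(statements, target):
    """Keep ordinary statements and only the with-blocks for the given target."""
    kept = []
    for stmt in statements:
        if stmt.get("kind") == "with" and stmt["target"] != target:
            continue                  # block belongs to another backend: omit it
        kept.append(stmt)
    return kept


statements = [
    {"kind": "call", "name": "setup"},
    {"kind": "with", "target": "java", "body": ["jvm-only code"]},
    {"kind": "with", "target": "javascript", "body": ["js-only code"]},
]
for_js = select_for_target(statements, "javascript")
for_jvm = select_for_target(statements, "java")
```

Each backend applies the same filter with a different target string, so the source file never needs to be split by platform.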

For the concurrency side specifically, read generate-spawn-expr, generate-select, and generate-select-clause-js. These functions make explicit how Nex concurrency is lowered into async JavaScript without changing the surface language — the semantics are preserved, but the mechanism is entirely different from the JVM path.

3.7 Backends as Semantic Documents

A backend is not only an output mechanism. It is also a semantic document.

Reading the JVM and JavaScript backends side by side is instructive precisely where they diverge. Where both take the same path, the feature is straightforwardly language-level. Where they diverge — concurrency being the clearest example — the divergence reveals something real about Nex: the language semantics are more fundamental than any one execution model, and the backends are two independent proofs of that claim.

The larger architectural point is that parsing and typechecking are shared while code generation diverges only where the targets genuinely differ. One syntax, one AST, one static model, multiple execution strategies. That separation is what a language implementation should aim for, and Nex largely achieves it.