Chapter 4

Static Semantics

The static semantics describes what can be known about a program before it runs: the type of every expression, the conformance of every class to its parents, and the well-formedness of every contract. A program that passes these checks is elaborated; one that fails is rejected, and never reaches the dynamic semantics of Chapter 5.

We proceed in the manner of The Definition of Standard ML: first the semantic objects—the types and bindings the rules manipulate; then the relations among them—conformance and lookup; then the inference rules that assign types to phrases.

4.1Types

A type is one of the following.

\(\tau\)::=\(\texttt{Integer} \mid \texttt{Integer64} \mid \texttt{Real} \mid \texttt{Decimal}\)— numeric
|\(\texttt{Char} \mid \texttt{Boolean} \mid \texttt{String}\)— other scalars
|\(A[\tau_1,\dots,\tau_n]\)— class type (\(n \ge 0\))
|\(\alpha\)— type variable
|\(\tau_1 \times \cdots \times \tau_n \to \tau\)— function type
|\(\mathsf{Fun}\)— the unconstrained function type
|\(\tau\,?\)— optional type
|\(\mathsf{Any} \mid \mathsf{Void} \mid \mathsf{Nil}\)— top, no-result, the type of nil

Here \(A\) ranges over class names. A class type with no arguments is written \(A\) rather than \(A[\,]\). The scalar types, \(\mathsf{Any}\), \(\mathsf{Function}\) and the collection classes are not primitive in any deep sense: they are the classes of the standard environment (Appendix B), and \(\texttt{Integer}\) and the rest are abbreviations for particular class types. We keep them syntactically distinct only because the lexer does.

\(\mathsf{Any}\) is the type of which every type is a subtype; it is the elaborated form of the root class Any. \(\mathsf{Void}\) is the type assigned to a statement, and to a routine with no declared return type: it is not a value type and may not be the type of a field or parameter. \(\mathsf{Nil}\) is the type of the constant nil; it is a subtype of every optional type and of no other.

4.2The Static Environment

Elaboration takes place against a static environment, written \(C\), comprising three components:

We write \(C(x) = \tau\) for the lookup of an ordinary variable, \(C \oplus \{x : \tau\}\) for the environment \(C\) extended with the binding \(x:\tau\) (shadowing any earlier binding of \(x\)), and \(C(A)\) for the signature of class \(A\). The functions \(\mathit{fields}(C, A)\) and \(\mathit{routine}(C, A, m)\) collect, respectively, the fields and the named routine \(m\) visible in \(A\)—those declared in \(A\) together with those inherited from its parents, with \(A\)’s own declarations overriding inherited ones of the same name.

4.3Conformance

The central relation of the static semantics is conformance, written \(\tau \preceq \tau'\) and read “\(\tau\) conforms to \(\tau'\)” or “\(\tau\) is a subtype of \(\tau'\).” A value of a conforming type may be used wherever the wider type is expected. Conformance is the reflexive, transitive closure of the rules below.

\[ \tau \preceq \tau \tag{C-Refl} \] \[ \frac{\tau \preceq \tau' \quad \tau' \preceq \tau''}{\tau \preceq \tau''} \tag{C-Trans} \] \[ \tau \preceq \mathsf{Any} \tag{C-Top} \]

A class type conforms to its parents, with type arguments substituted through the inheritance clause. If \(A\) is declared inherit \(B[\sigma_1, \dots]\) under generic parameters \(\alpha_1,\dots,\alpha_n\), then

\[ \frac{A[\alpha_1,\dots,\alpha_n] \;\text{inherits}\; B[\sigma_1,\dots,\sigma_m]}{A[\tau_1,\dots,\tau_n] \preceq B[\sigma_1,\dots,\sigma_m][\tau_i/\alpha_i]} \tag{C-Inherit} \]

The numeric types form a small chain reflecting widening: \(\texttt{Integer} \preceq \texttt{Integer64} \preceq \texttt{Real} \preceq \texttt{Decimal}\) is not assumed here; numeric conversions in Nex are explicit except where the standard environment provides a coercion. The one implicit numeric rule the language relies upon is that integer operands are admitted where a real is required by the arithmetic of the standard environment; this is recorded with the arithmetic operators in Appendix B rather than as a conformance.

Optional types

An optional type \(\tau\,?\) is inhabited by the values of \(\tau\) together with nil. Hence \(\tau\) conforms to \(\tau\,?\), and \(\mathsf{Nil}\) conforms to every optional type; but \(\tau\,?\) does not conform to \(\tau\).

\[ \tau \preceq \tau\,? \tag{C-Opt} \qquad \mathsf{Nil} \preceq \tau\,? \tag{C-Nil} \] \[ \frac{\tau \preceq \tau'}{\tau\,? \preceq \tau'\,?} \tag{C-OptMono} \]

Function types

A function type conforms to another when it accepts at least the arguments the other accepts and returns at most what the other promises: its parameters vary contravariantly and its result covariantly. The unconstrained type \(\mathsf{Fun}\) is a supertype of every function type.

\[ \frac{\tau_i \preceq \sigma_i \;(1 \le i \le n) \quad \sigma \preceq \tau}{\sigma_1 \times \cdots \times \sigma_n \to \sigma \;\preceq\; \tau_1 \times \cdots \times \tau_n \to \tau} \tag{C-Fun} \] \[ (\tau_1 \times \cdots \times \tau_n \to \tau) \preceq \mathsf{Fun} \tag{C-FunTop} \]

Note the directions in (C-Fun): for the subtype on the left, each parameter type \(\sigma_i\) must be a supertype of the corresponding \(\tau_i\) (\(\tau_i \preceq \sigma_i\)), while the result \(\sigma\) must be a subtype of \(\tau\). Thus a Function(a: Animal) conforms to a Function(a: Dog) — a handler for any animal may stand in wherever a dog-handler is wanted — but not the reverse.

Why contravariant, and why it is sound Contravariance is what makes conformance a local guarantee. A value that conforms to Function(a: A) genuinely accepts every \(A\), so a call through that type cannot go wrong — there is no “catcall” to chase down later. This is the opposite of the Eiffel tradition, which makes parameters covariant and so admits an unsound relation that must be rescued by a separate whole-program analysis; Nex instead rejects the unsafe direction at the point of conformance, with a diagnostic at the offending value. The same rule governs method redefinition (Section 4.9): an overriding routine may widen a parameter and narrow a return, but not the reverse. The contravariant parameter rule also aligns with the Design-by-Contract substitution principle the language follows elsewhere — an heir may weaken what it demands and strengthen what it guarantees.

4.4Elaboration of Expressions

The judgement \(C \vdash e \Rightarrow \tau\) asserts that, in the static environment \(C\), the expression \(e\) is well-typed with type \(\tau\). The rules follow the grammar of Section 2.7.

Constants and variables

\[ C \vdash \mathit{int} \Rightarrow \texttt{Integer} \tag{4.1} \qquad C \vdash \mathit{real} \Rightarrow \texttt{Real} \tag{4.2} \] \[ C \vdash \mathit{string} \Rightarrow \texttt{String} \tag{4.3} \qquad C \vdash \mathit{char} \Rightarrow \texttt{Char} \tag{4.4} \] \[ C \vdash \texttt{true} \Rightarrow \texttt{Boolean} \qquad C \vdash \texttt{false} \Rightarrow \texttt{Boolean} \tag{4.5} \] \[ C \vdash \texttt{nil} \Rightarrow \mathsf{Nil} \tag{4.6} \] \[ \frac{C(x) = \tau}{C \vdash x \Rightarrow \tau} \tag{4.7} \]

The constant this has the type of the current class, available in \(C\) within the body of a routine; result has the declared return type of the enclosing routine, and is in \(C\) only there (Section 2.9).

Operators

An infix operator denotes a routine of the operand’s class. The arithmetic and comparison operators are elaborated through the signatures recorded in the standard environment; we summarise their effect. For an arithmetic operator \(\odot\) — one of + - * / % ^ — on numeric operands the result is the wider of the two operand types; comparison operators yield Boolean; the equality operators \(=\), \(/\!=\), \(==\), \(!\!=\) yield Boolean for any operands of conforming type; and the logical operators require and yield Boolean.

\[ \frac{C \vdash e_1 \Rightarrow \tau_1 \quad C \vdash e_2 \Rightarrow \tau_2 \quad \tau_1,\tau_2 \;\text{numeric}}{C \vdash e_1 \odot e_2 \Rightarrow \tau_1 \sqcup \tau_2} \tag{4.8} \] \[ \frac{C \vdash e_1 \Rightarrow \tau_1 \quad C \vdash e_2 \Rightarrow \tau_2 \quad (\tau_1 \preceq \tau_2 \;\text{or}\; \tau_2 \preceq \tau_1)}{C \vdash e_1 = e_2 \Rightarrow \texttt{Boolean}} \tag{4.9} \] \[ \frac{C \vdash e_1 \Rightarrow \texttt{Boolean} \quad C \vdash e_2 \Rightarrow \texttt{Boolean}}{C \vdash e_1 \;\texttt{and}\; e_2 \Rightarrow \texttt{Boolean}} \tag{4.10} \]

Here \(\tau_1 \sqcup \tau_2\) is the wider of two numeric types. Rule (4.10) stands for the conjunction; disjunction (or) and negation (not) are analogous.

Member access and calls

A member access \(e.f\) requires that the static type of \(e\) be a class type possessing an accessible field or routine named \(f\). If \(f\) is a field, its type is the result; if \(f\) is a nullary routine, its return type is the result; if \(f\) is applied to arguments, the argument types must conform to the parameter types of the routine.

\[ \frac{C \vdash e \Rightarrow A[\bar\tau] \quad f : \sigma \in \mathit{fields}(C, A[\bar\tau])}{C \vdash e.f \Rightarrow \sigma} \tag{4.11} \] \[ \frac{\begin{array}{c}C \vdash e \Rightarrow A[\bar\tau] \quad \mathit{routine}(C, A[\bar\tau], m) = (\sigma_1 \times \cdots \times \sigma_n \to \sigma) \\ C \vdash e_i \Rightarrow \rho_i \quad \rho_i \preceq \sigma_i \;\; (1 \le i \le n)\end{array}}{C \vdash e.m(e_1,\dots,e_n) \Rightarrow \sigma} \tag{4.12} \]

The accessibility condition is that \(f\) is not declared in a private feature section of \(A\), unless the access occurs textually within \(A\). A bare call \(m(e_1,\dots,e_n)\) is elaborated as a call on this if \(m\) is a routine of the current class, and otherwise as a call of the free function \(m\) (Section 4.8).

Method overloading by arity A class may declare several routines of the same name differing in the number of parameters. \(\mathit{routine}(C, A, m)\) therefore selects among them by the count of arguments at the call site; there is no overloading by parameter type, and free functions may not be overloaded at all. A safe member access \(e\,?.f\) has type \(\sigma\,?\) where \(\sigma\) is the type of \(e.f\): the access yields nil when \(e\) is nil, and so is defined precisely on optional receivers.

Object creation

The expression create \(A[\bar\tau].k(e_1,\dots,e_n)\) requires that \(A\) be a non-deferred class, that \(k\) be one of its constructors, and that the arguments conform to \(k\)’s parameters; its type is \(A[\bar\tau]\). When the constructor and argument list are omitted, \(A\) must have a constructor taking no arguments, or none at all.

\[ \frac{\begin{array}{c} A \;\text{not deferred} \quad k \in \mathit{constructors}(C, A) \quad k : \sigma_1 \times \cdots \times \sigma_n \to A[\bar\tau] \\ C \vdash e_i \Rightarrow \rho_i \quad \rho_i \preceq \sigma_i \;\;(1 \le i \le n) \end{array}}{C \vdash \texttt{create}\; A[\bar\tau].k(e_1,\dots,e_n) \Rightarrow A[\bar\tau]} \tag{4.13} \]

Conditional and anonymous-function expressions

The conditional expression requires a boolean condition and two branches; its type is the join—the least common supertype—of the branch types, written \(\tau_1 \sqcup \tau_2\) and always defined because \(\mathsf{Any}\) is a top.

\[ \frac{C \vdash e \Rightarrow \texttt{Boolean} \quad C \vdash e_1 \Rightarrow \tau_1 \quad C \vdash e_2 \Rightarrow \tau_2}{C \vdash \texttt{when}\; e \;\texttt{then}\; e_1 \;\texttt{else}\; e_2 \;\texttt{end} \Rightarrow \tau_1 \sqcup \tau_2} \tag{4.14} \] \[ \frac{C \oplus \{x_i : \sigma_i\} \vdash \mathit{block} : \sigma}{C \vdash \texttt{fn}\,(x_1{:}\sigma_1,\dots,x_n{:}\sigma_n){:}\sigma\;\texttt{do}\;\mathit{block}\;\texttt{end} \Rightarrow \sigma_1 \times \cdots \times \sigma_n \to \sigma} \tag{4.15} \]

4.5Elaboration of Statements

A statement is elaborated by the judgement \(C \vdash \mathit{stmt} \Rightarrow C'\), producing a possibly extended environment \(C'\); a statement that binds no new name produces \(C\) unchanged. A block is elaborated by threading the environment through its statements.

\[ \frac{C \vdash e \Rightarrow \tau \quad \tau \preceq \sigma \quad x : \notin C}{C \vdash \texttt{let}\; x{:}\sigma := e \Rightarrow C \oplus \{x : \sigma\}} \tag{4.16} \] \[ \frac{C(x) = \sigma \quad C \vdash e \Rightarrow \tau \quad \tau \preceq \sigma \quad x \;\text{not a}\; \texttt{once}\;\text{field outside a constructor}}{C \vdash x := e \Rightarrow C} \tag{4.17} \] \[ \frac{C \vdash e \Rightarrow \texttt{Boolean} \quad C \vdash \mathit{block}_1 : C \quad C \vdash \mathit{block}_2 : C}{C \vdash \texttt{if}\; e \;\texttt{then}\; \mathit{block}_1 \;\texttt{else}\; \mathit{block}_2 \;\texttt{end} \Rightarrow C} \tag{4.18} \]

A let with no type annotation takes \(\sigma\) to be the inferred type \(\tau\) of its initialiser. The premise “\(x\) not a once field outside a constructor” in (4.17) is the static enforcement of immutability: assigning a once field anywhere but in a constructor of its class is rejected. The loop, case, match, across, and do/rescue statements elaborate their constituent blocks under the environment extended with any variable they introduce (the cursor variable of across, the bound variable of a match clause), and produce the original environment: a block does not leak its locals to the statements that follow it.

Type dispatch and exhaustiveness

A match on an expression of class type \(A\) dispatches on the runtime class. Each clause when \(B\) as \(x\) then \(\mathit{block}\) requires \(B \preceq A\) and elaborates its block with \(x\) bound to \(B\).

\[ \frac{\begin{array}{c} C \vdash e \Rightarrow A \quad B_j \preceq A \quad C \oplus \{x_j : B_j\} \vdash \mathit{block}_j : C \;\;(1 \le j \le k)\\ A \;\text{sealed} \;\Rightarrow\; \{B_1,\dots,B_k\} = \mathit{subclasses}(C, A) \;\;\text{or an}\; \texttt{else}\;\text{clause is present} \end{array}}{C \vdash \texttt{match}\; e \;\texttt{of}\; \overline{\texttt{when}\,B_j\,\texttt{as}\,x_j\,\texttt{then}\,\mathit{block}_j}\; \texttt{end} \Rightarrow C} \tag{4.19} \]

When \(A\) is sealed, the set of clause classes must exhaust \(\mathit{subclasses}(C, A)\), unless an else clause is present to cover the remainder. A missing variant is a compile-time error. This exhaustiveness check is the whole purpose of the sealed modifier, and is why a sealed class is required to be deferred (Section 4.9).

Type conversion

The form convert \(e\) to \(x{:}\sigma\) is a boolean expression that attempts a downward or upward cast: it succeeds, binding \(x\) of type \(\sigma\), when the runtime type of \(e\) is related to \(\sigma\) (a supertype or subtype), and otherwise yields false with \(x\) bound to nil. Statically it requires that the type of \(e\) and \(\sigma\) be related by conformance in one direction or the other.

\[ \frac{C \vdash e \Rightarrow \tau \quad (\tau \preceq \sigma \;\text{or}\; \sigma \preceq \tau)}{C \vdash \texttt{convert}\; e \;\texttt{to}\; x{:}\sigma \Rightarrow \texttt{Boolean},\;\; C \oplus \{x : \sigma\,?\}} \tag{4.20} \]

4.6Contract Formation

Every assertion—in a require, an ensure, an invariant, or a loop invariant—must elaborate to Boolean. A precondition is elaborated in the environment of the routine’s entry (parameters and fields in scope); a postcondition in that environment extended with result and with the old forms; an invariant in the environment of the class’s fields.

\[ \frac{C \vdash e \Rightarrow \texttt{Boolean}}{C \vdash (\mathit{label} : e) \;\mathbf{assertion}} \tag{4.21} \] \[ \frac{C \vdash f \Rightarrow \tau \quad f \in \mathit{fields}(C, \text{current class})}{C^{\mathsf{ensure}} \vdash \texttt{old}\; f \Rightarrow \tau} \tag{4.22} \]

Rule (4.22) records the restriction of Section 2.9: old applies only to a field of the current object, and only within a postcondition (the superscript \(\mathsf{ensure}\) marks that environment). The label of an assertion has no effect on its type; it is carried into the dynamic semantics solely to name the condition in a violation report.

4.7Generic Elaboration

A generic class \(A[\alpha_1{\to}K_1,\dots]\) is elaborated once, with its parameters \(\alpha_i\) added to \(\Delta\) and constrained so that any actual argument must conform to \(K_i\). An application \(A[\tau_1,\dots,\tau_n]\) is well-formed when each \(\tau_i\) conforms to the constraint \(K_i\) of the corresponding parameter; its fields and routine signatures are those of \(A\) with \(\tau_i\) substituted for \(\alpha_i\) throughout.

\[ \frac{A[\alpha_1{\to}K_1,\dots,\alpha_n{\to}K_n] \in \Sigma \quad \tau_i \preceq K_i \;\;(1 \le i \le n)}{C \vdash A[\tau_1,\dots,\tau_n] \;\mathbf{type}} \tag{4.23} \]

A parameter written \(?\alpha\) additionally permits \(\mathsf{Nil}\) as an argument. A generic routine is elaborated in the same way, its type parameters local to the routine. Because elaboration substitutes rather than erases, the type \(\texttt{Array[Integer]}\) and the type \(\texttt{Array[String]}\) are distinct types with distinct field and routine signatures, and a value of one does not conform to the other.

4.8Free Functions and the Static World

The free functions of a program are elaborated together with its classes, so that any function may call any other regardless of textual order. A bare application \(m(e_1,\dots,e_n)\), where \(m\) is not a routine of the current class, is elaborated against the signature of the free function \(m\); a declare function signature supplies that type before the definition is seen, which is how mutual recursion is admitted (Section 3.4). A later definition must match its declaration exactly. Free functions, unlike methods, may not be overloaded: each free function name is meant to be unique, and a definition matching an earlier declare function must agree with it in full. (The current implementation does not enforce this uniqueness; see the note on enforcement gaps in Section 4.9.)

4.9Well-Formedness of Classes

Beyond the typing of expressions, a program must satisfy conditions on its class structure. These are checked once, over the whole static world.

Known enforcement gap One check this Definition states is, in the current implementation, not yet performed where described: the uniqueness of free function names (Section 4.8) is not enforced — a second definition of a name silently supersedes the first rather than being rejected. Because the interpreter is the authoritative account of the language (Section 1.3), this is recorded as a gap between the intended static semantics and the implementation, to be closed in one or the other. (The conformance of an overriding routine’s parameters and return type, once a separate gap, is now checked at the override.)

A program all of whose phrases elaborate, and all of whose classes are well-formed, is statically valid. The dynamic semantics of the next chapter is defined only for statically valid programs.