Chapter 2

Syntax of the Core

This chapter defines the lexical structure of Nex and the grammar of its core: the expressions and statements from which the body of every routine is built. The structure peculiar to classes and modules is deferred to Chapter 3.

The grammar is presented in a variant of Backus–Naur form. Nonterminals are set in slanted type and terminals (reserved words and symbols) in typewriter type. Angle brackets ⟨ ⟩ enclose an optional phrase; a phrase followed by a superscript asterisk may be repeated zero or more times, and one followed by a superscript plus, one or more times. Where a production is a derived form, explained by translation into a simpler one, it is so marked and the translation appears in Appendix C.

2.1 Reserved Words

The following are the reserved words of Nex. They may not be used as identifiers.

and across as case class convert create declare deferred do else elseif end ensure feature fn from function if import inherit intern invariant let match nil not note of old once or private raise repeat require rescue result retry sealed select spawn then this timeout to type until variant when with

The boolean constants true and false and the built-in type names Integer, Integer64, Real, Decimal, Char, Boolean, String, and Function are also recognised by the lexer. They are not, strictly, keywords usable in arbitrary positions; they denote particular constants and classes of the standard environment (Appendix B), and a program may not redefine them.

The identifier result is reserved within the body of a routine that declares a return type: it names the cell whose final contents become the routine’s result (Section 3.4). The identifier exception is bound, within a rescue block, to the value being handled (Section 5.7).

2.2 Special Constants

A special constant is an integer, real, character, string, or boolean literal, or the constant nil.

scon ::= int              integer constant
       | real             real constant
       | char             character constant
       | string           string constant
       | true | false     boolean constant
       | nil              the null reference

Integer constants

An integer constant is a non-empty sequence of decimal digits, or a based literal introduced by 0b (binary), 0o (octal), or 0x (hexadecimal). An underscore may appear between two digits as a visual separator and has no other significance. There is no sign: a leading minus is the unary operator of Section 2.6, not part of the constant.

int ::= digit+            decimal
      | 0b bit+           e.g. 0b1111_0000
      | 0o odigit+        e.g. 0o755
      | 0x hdigit+        e.g. 0xFF_AA_33
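
The lexical shape of an integer constant can be checked mechanically. The following Python sketch is illustrative only and forms no part of the definition. The names INT_RE and int_value are hypothetical, and the sketch assumes that an underscore is permitted only between two digits of the digit sequence and that hexadecimal digits may be written in either case.

```python
import re

# Hypothetical recognizer for Nex integer constants (Section 2.2).
# Assumptions: an underscore may appear only between two digits, and
# hexadecimal digits are accepted in upper or lower case.
_DEC = r"[0-9](?:_?[0-9])*"
_BIN = r"0b[01](?:_?[01])*"
_OCT = r"0o[0-7](?:_?[0-7])*"
_HEX = r"0x[0-9A-Fa-f](?:_?[0-9A-Fa-f])*"
INT_RE = re.compile(rf"(?:{_BIN}|{_OCT}|{_HEX}|{_DEC})")

_BASES = {"0b": 2, "0o": 8, "0x": 16}


def int_value(text: str) -> int:
    """Return the value denoted by an integer constant, or raise ValueError."""
    if INT_RE.fullmatch(text) is None:
        raise ValueError(f"not an integer constant: {text!r}")
    digits = text.replace("_", "")  # separators carry no meaning
    prefix = digits[:2]
    if prefix in _BASES:
        return int(digits[2:], _BASES[prefix])
    return int(digits, 10)
```

Because the constant carries no sign, a text such as -1 is rejected here; the minus is left to the unary operator.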

A decimal integer constant has type Integer. The range of Integer is guaranteed to be at least 32 bits but is otherwise host-defined, as is whether a constant exceeding the host range is accepted. The guaranteed and host-defined parts of every scalar value space (integer ranges, the IEEE 754 format of Real, the precision of Decimal, and the Unicode character model of Char and String) are set out in Appendix B, §B.3.

Real constants

A real constant has an optional integer part, a mandatory decimal point, a mandatory fractional part of at least one digit, and an optional exponent. Thus 4.5, 10.0, .5, and 12.0e-3 are real constants, while 10. and 12.e-3 are not: a real constant may not end at the point. A real constant has type Real.

real ::= ⟨digit+⟩ . digit+ ⟨exp⟩
exp  ::= (e|E) ⟨+|-⟩ digit+
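
The shape of a real constant described above can be expressed as a single regular expression. The Python sketch below is illustrative only; the names REAL_RE and is_real_constant are hypothetical, and the sketch assumes that digit separators are not permitted in real constants, since the text mentions them only for integers.

```python
import re

# Optional integer part, mandatory point, at least one fractional digit,
# optional exponent. Assumption: no underscores inside real constants.
REAL_RE = re.compile(r"[0-9]*\.[0-9]+(?:[eE][+-]?[0-9]+)?")


def is_real_constant(text: str) -> bool:
    """Report whether the whole of text has the shape of a real constant."""
    return REAL_RE.fullmatch(text) is not None
```

Applied to the examples of this section, the check accepts 4.5, 10.0, .5, and 12.0e-3, and rejects 10. and 12.e-3.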

Character constants

A character constant takes one of three forms: a # followed by a single character other than a digit or whitespace; a # followed by a decimal code point; or one of the named characters #nul, #space, #newline, #tab, #return. Thus #A, #b, and #65 are character constants. A character constant has type Char.
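
The three forms of character constant can be told apart by inspecting the text after the #. The Python sketch below is illustrative only; the name char_value is hypothetical, and the mapping of the named characters to code points 0, 32, 10, 9, and 13 is an assumption based on their conventional meanings rather than on this chapter.

```python
import re

# Assumed meanings of the named characters (not stated in this chapter).
NAMED = {"nul": "\0", "space": " ", "newline": "\n", "tab": "\t", "return": "\r"}


def char_value(token: str) -> str:
    """Return the character denoted by a complete character constant."""
    if len(token) < 2 or not token.startswith("#"):
        raise ValueError(f"not a character constant: {token!r}")
    body = token[1:]
    if body in NAMED:                      # #newline, #tab, ...
        return NAMED[body]
    if re.fullmatch(r"[0-9]+", body):      # decimal code point, e.g. #65
        return chr(int(body))
    if len(body) == 1 and not body.isspace():
        return body                        # single character, e.g. #A
    raise ValueError(f"not a character constant: {token!r}")
```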

String constants

A string constant is a sequence of characters enclosed in double quotes "" or single quotes ''. A string constant has type String.

No escape interpretation. Nex string literals do not interpret backslash escapes: in a string constant "\n" denotes the two characters backslash and n, not a newline. Control characters are written with the corresponding character constants (#newline, #tab) or obtained from the standard environment. The only role of the backslash in the lexer is to allow a quote character to appear within a string delimited by that same quote.
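
The rule can be illustrated by a small decoder. The Python sketch below is illustrative only; the name string_value is hypothetical. It assumes that the lexer has already delimited the literal, and that a backslash standing before the delimiting quote is dropped from the resulting value while every other backslash is kept; the second point is an assumption, as the text does not say whether the protecting backslash is retained.

```python
def string_value(literal: str) -> str:
    """Decode a complete string literal, delimiters included."""
    if len(literal) < 2 or literal[0] not in ('"', "'") or literal[-1] != literal[0]:
        raise ValueError(f"not a string constant: {literal!r}")
    quote = literal[0]
    body = literal[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body) and body[i + 1] == quote:
            out.append(quote)  # the backslash only protects the delimiter
            i += 2
        elif ch == quote:
            raise ValueError("unprotected delimiter inside string")
        else:
            out.append(ch)     # every other character, backslash included
            i += 1
    return "".join(out)
```

Under these assumptions the literal "\n" decodes to a string of length two, and a single quote inside a double-quoted string needs no backslash.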

2.3 Comments

A comment is introduced by two hyphens -- and extends to the end of the line. Comments do not nest—there being no closing delimiter—and are discarded by the lexer. A comment is equivalent to a single space.

2.4 Identifiers

An identifier begins with a letter or underscore and continues with letters, underscores, and digits. Identifiers are case-sensitive. By convention—and only by convention—class names begin with an upper-case letter and the names of variables, fields, routines, and parameters with a lower-case letter; the grammar does not enforce this.

id ::= (letter|_) (letter|_|digit)*

An identifier that coincides with a reserved word is not an identifier but that reserved word. The longest-match rule (Section 2.5) applies, so classes is a single identifier and not the keyword class followed by es.

2.5 Lexical Analysis

The character stream is converted to a stream of tokens by the following rules. Whitespace—spaces, tabs, carriage returns, and line feeds—and comments separate tokens but are otherwise insignificant; Nex is not an indentation-sensitive language. Each token is the longest sequence of characters that can begin a token at the current position (the maximal munch rule). Thus <= is one token and not two, and := is the assignment operator and not a colon followed by an equals sign.

The reserved words, the operator and punctuation symbols, the special constants, and the identifiers are the tokens of the language. The symbol #{ is a single token opening a set display (Section 2.7 and Appendix C); it must be distinguished by maximal munch from the character constant introducer #.
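
The maximal munch rule can be sketched as follows: at each position every token class is tried, and the longest match wins. The Python sketch below is illustrative only and covers a simplified subset of the token classes (character and string constants and based integers are omitted); the names TOKEN_SPEC and tokenize and the exact symbol set are hypothetical.

```python
import re

# (kind, pattern) pairs for a simplified subset of the Nex tokens.
# Whitespace and comments form the "skip" class and produce no token.
TOKEN_SPEC = [
    ("skip",    r"[ \t\r\n]+|--[^\n]*"),
    ("setopen", r"#\{"),
    ("real",    r"[0-9]*\.[0-9]+(?:[eE][+-]?[0-9]+)?"),
    ("int",     r"[0-9]+"),
    ("word",    r"[A-Za-z_][A-Za-z0-9_]*"),
    ("symbol",  r":=|<=|>=|/=|==|!=|[-+*/%^<>=(){}\[\],.:?]"),
]
COMPILED = [(kind, re.compile(pattern)) for kind, pattern in TOKEN_SPEC]


def tokenize(source: str) -> list:
    """Split source into (kind, text) pairs using the longest-match rule."""
    tokens = []
    pos = 0
    while pos < len(source):
        candidates = [
            (kind, m.group())
            for kind, rx in COMPILED
            if (m := rx.match(source, pos))
        ]
        if not candidates:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        # Longest match wins; on a tie, the class listed first is taken.
        kind, text = max(candidates, key=lambda c: len(c[1]))
        pos += len(text)
        if kind != "skip":
            tokens.append((kind, text))
    return tokens
```

With this sketch the text 4.5 yields one real constant rather than an integer followed by a second constant, and two hyphens begin a comment rather than two minus signs.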

2.6 Operators and Precedence

Nex has a fixed set of infix and prefix operators; the programmer cannot declare new ones or alter their precedence. The operators are listed below from lowest to highest binding power. All binary operators are left-associative except exponentiation ^, which is right-associative. The prefix operators are unary minus - and logical not.

Level         Operators      Description
1 (lowest)    or             logical disjunction (short-circuit)
2             and            logical conjunction (short-circuit)
3             = /= == !=     value and identity (in)equality
4             < <= > >=      ordering comparison
5             + -            addition, subtraction
6             * / % ^        multiplication, division, remainder, power
7 (highest)   - not          prefix negation

Operators are not first-class and do not denote methods directly at the surface; an expression such as a + b is evaluated by the rules of Chapter 5, which appeal to the arithmetic of the operands’ classes. Application—a call or member access—and parentheses bind more tightly than any operator.
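
The effect of the precedence levels and associativity can be seen in a precedence-climbing parser. The Python sketch below is illustrative only and is not the layered grammar of Appendix A. The names BINARY, PREFIX, RIGHT_ASSOC, and parse are hypothetical; the input is assumed to be a list of token strings already produced by the lexer; application and member access are omitted. Where ^ meets another operator of level 6, the grouping produced is a consequence of this particular algorithm and should be read as an assumption.

```python
# Binding power of each binary operator, from the table (1 = lowest).
BINARY = {
    "or": 1, "and": 2,
    "=": 3, "/=": 3, "==": 3, "!=": 3,
    "<": 4, "<=": 4, ">": 4, ">=": 4,
    "+": 5, "-": 5,
    "*": 6, "/": 6, "%": 6, "^": 6,
}
RIGHT_ASSOC = {"^"}
PREFIX = {"-": 7, "not": 7}


def parse(tokens):
    """Parse a list of token strings into nested tuples (op, operands...)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def primary():
        tok = advance()
        if tok == "(":
            inner = expression(1)
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if tok in PREFIX:
            return (tok, expression(PREFIX[tok]))
        return tok  # identifier or constant

    def expression(min_power):
        left = primary()
        while (op := peek()) in BINARY and BINARY[op] >= min_power:
            advance()
            # A left-associative operator demands a strictly tighter right operand.
            next_min = BINARY[op] if op in RIGHT_ASSOC else BINARY[op] + 1
            left = (op, left, expression(next_min))
        return left

    tree = expression(1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

For example, a - b - c groups to the left as (a - b) - c, whereas a ^ b ^ c groups to the right as a ^ (b ^ c).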

Short-circuit conjunction and disjunction. The operators and and or are short-circuiting in every back end: in e₁ and e₂ the operand e₂ is evaluated only if e₁ is true, and in e₁ or e₂ only if e₁ is false. There are no separate and then / or else forms; the plain operators are the short-circuit forms. Consequently x /= nil and x.f is a safe guard. This is a property of the dynamic semantics (Section 5.4) and is recorded here because it affects how an expression is read.
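
The evaluation order can be modelled by passing each operand as a delayed computation. The Python sketch below is illustrative only; the names nex_and and nex_or are hypothetical, and each operand is represented as a zero-argument callable so that it is evaluated only when called.

```python
def nex_and(left, right):
    """Model e1 and e2: the right operand is evaluated only if the left is true."""
    return right() if left() else False


def nex_or(left, right):
    """Model e1 or e2: the right operand is evaluated only if the left is false."""
    return True if left() else right()


# The guard x /= nil and x.f, with Python's None standing in for nil.
x = None
guarded = nex_and(lambda: x is not None, lambda: x.f)
```

Here the second operand is never called, so the access x.f on a null reference does not take place and guarded is False.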

2.7 The Grammar of Expressions

An expression denotes a value. The expression grammar is layered by the precedence of Section 2.6; we present it here in collapsed form, taking the precedence and associativity as given, and exhibit the layered productions in Appendix A.

exp ::= scon                                      special constant
      | id                                        variable or field
      | this                                      the current object
      | result                                    the result cell
      | ( exp )                                   parenthesised
      | exp binop exp                             infix application
      | unop exp                                  prefix application
      | exp . id ⟨( args )⟩                       member access / method call
      | exp ? . id ⟨( args )⟩                     safe member access
      | id ( args )                               function or local call
      | create id ⟨tyargs⟩ ⟨. id ⟨( args )⟩⟩      object creation
      | when exp then exp else exp end            conditional expression
      | fn ⟨gen⟩ ( params ) ⟨: ty⟩ do block end   anonymous function
      | spawn do block end                        task creation
      | old exp                                   pre-state snapshot (in ensure only)
      | arraylit | maplit | setlit                collection displays (Appendix C)

The argument list and the collection displays have the obvious forms:

args     ::= exp (, exp)*
arraylit ::= [ exp (, exp)* ]
maplit   ::= { ⟨exp : exp (, exp : exp)*⟩ }
setlit   ::= #{ ⟨exp (, exp)*⟩ }

The empty braces {} denote the empty map; the empty set is written #{}. These displays are derived forms, expanded into creations and a sequence of insertions in Appendix C.

2.8 The Grammar of Statements

A statement is executed for its effect. A block is a sequence of statements, executed in order. An expression standing alone is a statement: it is executed for its effect and its value is discarded.

block ::= stmt*
stmt  ::= id := exp                                        assignment to a variable or field
        | exp . id := exp                                  assignment to a field
        | let id ⟨: ty⟩ := exp                             local declaration
        | if exp then block (elseif exp then block)* ⟨else block⟩ end
        | from block ⟨inv⟩ ⟨var⟩ until exp do block end    general loop
        | repeat exp do block end                          counted loop
        | across exp as id do block end                    cursor loop
        | case exp of caseclause+ ⟨else stmt⟩ end          constant dispatch
        | match exp of matchclause+ ⟨else block⟩ end       type dispatch
        | select selectclause+ ⟨timeout⟩ ⟨else block⟩ end  communication (Chapter 6)
        | do block ⟨rescue block⟩ end                      scoped block
        | with string do block end                         resource block
        | raise exp                                        raise an exception
        | retry                                            restart the enclosing do
        | exp                                              expression statement
caseclause  ::= scon (, scon)* then stmt
matchclause ::= when id ⟨tyargs⟩ as id then block
inv         ::= invariant assertion+
var         ::= variant exp
The forms governing concurrency—spawn, select, the timeout clause—are listed here for completeness but their meaning is given in Chapter 6. The resource block with and the loop invariant and variant are explained in Chapters 5 and 3 respectively.

2.9 Syntactic Restrictions

Certain conditions, though expressible in the grammar, are forbidden. They are checked before elaboration and a violation is a compile-time error.

Further restrictions that depend on types or on the class hierarchy—the exhaustiveness of a match on a sealed class, the conformance of an overriding routine, the immutability of a once field—belong to the static semantics and are stated in Chapter 4.