This chapter defines the lexical structure of Nex and the grammar of its core: the expressions and statements from which the body of every routine is built. The structure peculiar to classes and modules is deferred to Chapter 3.
The grammar is presented in a variant of Backus–Naur form. Nonterminals
are set in slanted type and terminals—reserved words and
symbols—in typewriter type. Angle brackets
\(\langle\,\cdot\,\rangle\) enclose an optional phrase; a phrase followed by a
superscript asterisk may be repeated zero or more times, and one followed by a
superscript plus, one or more times. Where a production is a derived form,
explained by translation into a simpler one, it is so marked and the
translation appears in Appendix C.
2.1 Reserved Words
The following are the reserved words of Nex. They may not be used as identifiers.
and across as case
class convert create declare
deferred do else elseif
end ensure feature fn
from function if import
inherit intern invariant let
match nil not note
of old once or
private raise repeat require
rescue result retry sealed
select spawn then this
timeout to type until
variant when with
The boolean constants true and false and the
built-in type names Integer, Integer64,
Real, Decimal, Char,
Boolean, String, and Function are also
recognised by the lexer. They are not, strictly, keywords usable in arbitrary
positions; they denote particular constants and classes of the standard environment
(Appendix B), and a program may not redefine them.
The identifier result is reserved within the body of a routine
that declares a return type: it names the cell whose final contents become the
routine’s result (Section 3.4). The identifier exception
is bound, within a rescue block, to the value being handled
(Section 5.7).
2.2 Special Constants
A special constant is an integer, real, character, string, or
boolean literal, or the constant nil.
| scon | ::= | int | integer constant |
| | \| | real | real constant |
| | \| | char | character constant |
| | \| | string | string constant |
| | \| | true \| false | boolean constant |
| | \| | nil | the null reference |
Integer constants
An integer constant is a non-empty sequence of decimal digits, or a based
literal introduced by 0b (binary), 0o (octal), or
0x (hexadecimal). An underscore may appear between two digits as a
visual separator and has no other significance. There is no sign: a leading
minus is the unary operator of Section 2.6, not part of the constant.
| int | ::= | digit+ | decimal |
| | \| | 0b bit+ | binary, e.g. 0b1111_0000 |
| | \| | 0o odigit+ | octal, e.g. 0o755 |
| | \| | 0x hdigit+ | hexadecimal, e.g. 0xFF_AA_33 |
A decimal integer constant has type Integer. The range of
Integer is guaranteed to be at least 32 bits but is otherwise
host-defined, as is whether a constant exceeding the host range is accepted; the
guaranteed and host-defined parts of every scalar value space—integer
ranges, the IEEE 754 format of Real, the precision of
Decimal, and the Unicode character model of Char and
String—are set out in Appendix B, §B.3.
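The lexical rule for integer constants can be sketched as a regular expression. The following Python fragment is an illustration only and not part of the definition: the names INT_RE and int_value are hypothetical, and it assumes that at most one underscore may separate two digits and that hexadecimal digits may be written in either case, neither of which this section settles.

```python
import re


def _digits(cls: str) -> str:
    """One or more digits of class `cls`, with single underscores allowed
    only strictly between two digits (an assumption of this sketch)."""
    return rf"{cls}(?:_?{cls})*"


# int ::= digit+ | 0b bit+ | 0o odigit+ | 0x hdigit+
INT_RE = re.compile(
    rf"0b{_digits('[01]')}"
    rf"|0o{_digits('[0-7]')}"
    rf"|0x{_digits('[0-9A-Fa-f]')}"
    rf"|{_digits('[0-9]')}"
)

_BASES = {"0b": 2, "0o": 8, "0x": 16}


def int_value(text: str) -> int:
    """Return the value denoted by an integer constant, or raise ValueError."""
    if not INT_RE.fullmatch(text):
        raise ValueError(f"not an integer constant: {text!r}")
    digits = text.replace("_", "")  # underscores carry no significance
    prefix = digits[:2]
    if prefix in _BASES:
        return int(digits[2:], _BASES[prefix])
    return int(digits)
```

A leading minus is rejected by this sketch, in keeping with the rule that the sign is the unary operator of Section 2.6 and not part of the constant. Range checking against the host's Integer is deliberately omitted.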
Real constants
A real constant has an optional integer part, a mandatory decimal point, a
mandatory fractional part of at least one digit, and an optional exponent. Thus
4.5, 10.0, .5, and 12.0e-3
are real constants, while 10. and 12.e-3 are
not: a real constant may not end at the point. A real constant has type
Real.
| real | ::= | ⟨digit+⟩ . digit+ ⟨exp⟩ |
| exp | ::= | (e|E) ⟨+|-⟩ digit+ |
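The two productions above translate directly into a single regular expression. The Python sketch below is illustrative only; REAL_RE and is_real are hypothetical names, and the sketch assumes that underscores are not permitted in real constants, since the text mentions them only for integers.

```python
import re

# real ::= <digit+> . digit+ <exp>
# exp  ::= (e|E) <+|-> digit+
REAL_RE = re.compile(r"[0-9]*\.[0-9]+(?:[eE][+-]?[0-9]+)?")


def is_real(text: str) -> bool:
    """True if `text` is, in its entirety, a real constant."""
    return REAL_RE.fullmatch(text) is not None
```

Under this reading the four examples given in the text are accepted, and 10. and 12.e-3 are rejected because no digit follows the point.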
Character constants
A character constant is a # followed by one of three things: a single
character other than a digit or whitespace, a decimal code point, or one
of the named characters #nul, #space,
#newline, #tab, #return. Thus
#A, #b, and #65 are character constants. A
character constant has type Char.
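The three forms of character constant can be distinguished as in the following Python sketch. It is an illustration under assumptions: char_value and NAMED are hypothetical names, and the code points given for the named characters are the conventional ASCII ones, which this section does not itself state.

```python
# Assumed code points for the named characters (conventional ASCII values).
NAMED = {"nul": 0, "space": 32, "newline": 10, "tab": 9, "return": 13}


def char_value(text: str) -> int:
    """Return the code point denoted by a character constant."""
    if len(text) < 2 or not text.startswith("#"):
        raise ValueError(f"not a character constant: {text!r}")
    body = text[1:]
    if body in NAMED:                        # named character, e.g. #newline
        return NAMED[body]
    if body.isascii() and body.isdigit():    # decimal code point, e.g. #65
        return int(body)
    if len(body) == 1 and not body.isspace():  # single character, e.g. #A
        return ord(body)
    raise ValueError(f"not a character constant: {text!r}")
```

With these assumptions #A and #65 denote the same character, which is why a single digit after # is read as a code point and never as the digit character itself.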
String constants
A string constant is a sequence of characters enclosed in double quotes
"…" or single quotes
'…'. A string constant has type
String.
The backslash introduces no control-character escapes: "\n" denotes the two characters backslash and
n, not a newline. Control characters are written with the
corresponding character constants (#newline, #tab)
or obtained from the standard environment. The only role of the backslash in the
lexer is to allow a quote character to appear within a string delimited by
that same quote.
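The treatment of the backslash can be made concrete with a small decoder. The Python sketch below is illustrative, with the hypothetical name string_value; it assumes the lexer has already located the closing quote, and it keeps a backslash that stands directly before the closing quote as a literal backslash, a case this section does not settle.

```python
def string_value(literal: str) -> str:
    """Return the characters denoted by a string constant, quotes included
    in the input. A backslash is special only before the delimiting quote."""
    quote = literal[:1]
    if quote not in ('"', "'") or len(literal) < 2 or literal[-1] != quote:
        raise ValueError(f"not a string constant: {literal!r}")
    body = literal[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body) and body[i + 1] == quote:
            out.append(quote)  # escaped delimiter: drop the backslash
            i += 2
        elif ch == quote:
            raise ValueError("unescaped quote inside string constant")
        else:
            out.append(ch)     # every other character, backslash included
            i += 1
    return "".join(out)
```

Note that a backslash before the other kind of quote is not an escape: within double quotes, a backslash followed by a single quote remains two characters.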
2.3 Comments
A comment is introduced by two hyphens -- and extends to the end
of the line. Comments do not nest—there being no closing delimiter—and
are discarded by the lexer. A comment is equivalent to a single space.
2.4 Identifiers
An identifier begins with a letter or underscore and continues with letters, underscores, and digits. Identifiers are case-sensitive. By convention—and only by convention—class names begin with an upper-case letter and the names of variables, fields, routines, and parameters with a lower-case letter; the grammar does not enforce this.
| id | ::= | (letter|_) (letter|_|digit)* |
An identifier that coincides with a reserved word is not an identifier but
that reserved word. The longest-match rule (Section 2.5) applies, so
classes is a single identifier and not the keyword
class followed by es.
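The interplay of the identifier rule and the reserved words can be sketched as follows. The Python fragment is an illustration, not a normative lexer: classify, ID_RE, and RESERVED are hypothetical names, and "letter" is taken to mean an ASCII letter, which is an assumption. The boolean constants and built-in type names of Section 2.1 are not treated specially here.

```python
import re

# The reserved words of Section 2.1.
RESERVED = frozenset("""
and across as case class convert create declare deferred do else elseif
end ensure feature fn from function if import inherit intern invariant let
match nil not note of old once or private raise repeat require rescue
result retry sealed select spawn then this timeout to type until variant
when with
""".split())

# id ::= (letter|_) (letter|_|digit)*   (ASCII letters assumed)
ID_RE = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def classify(text: str, pos: int = 0):
    """Scan the longest identifier-shaped word at `pos` and report whether it
    is a reserved word. Returns (kind, word), or None if no word begins there."""
    m = ID_RE.match(text, pos)
    if m is None:
        return None
    word = m.group()  # greedy match: the whole word is taken first
    return ("keyword" if word in RESERVED else "id", word)
```

Because the whole word is scanned before the table is consulted, classes is classified as an identifier, and because the comparison is case-sensitive, Class is an identifier too.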
2.5 Lexical Analysis
The character stream is converted to a stream of tokens by the following
rules. Whitespace—spaces, tabs, carriage returns, and line feeds—and
comments separate tokens but are otherwise insignificant; Nex is not an
indentation-sensitive language. Each token is the longest sequence of characters
that forms a valid token starting at the current position (the maximal munch
rule). Thus <= is one token and not two, and := is
the assignment operator and not a colon followed by an equals sign.
The reserved words, the operator and punctuation symbols, the special
constants, and the identifiers are the tokens of the language. The symbol
#{ is a single token opening a set display (Section 2.7 and
Appendix C); it must be distinguished by maximal munch from the character
constant introducer #.
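Maximal munch can be illustrated by a toy tokenizer that tries longer symbols before shorter ones. The Python sketch below is a simplification under stated assumptions: tokens, SYMBOLS, and TOKEN_RE are hypothetical names, the symbol list is gathered from the operators and brackets that appear in this chapter and may be incomplete, identifiers and digit runs are lumped together as words, and real and string constants are not handled.

```python
import re

# Symbols seen in this chapter, sorted longest first so that the ordered
# alternation below always prefers the longer match (maximal munch).
SYMBOLS = sorted(
    ["<=", ">=", ":=", "/=", "==", "!=", "#{", "<", ">", "=", "+", "-", "*",
     "/", "%", "^", "(", ")", "[", "]", "{", "}", ",", ".", ":", "?"],
    key=len, reverse=True,
)

TOKEN_RE = re.compile(
    r"(?P<skip>[ \t\r\n]+|--[^\n]*)"      # whitespace and comments
    r"|(?P<word>[A-Za-z0-9_]+)"           # identifiers, keywords, digit runs
    r"|(?P<sym>" + "|".join(re.escape(s) for s in SYMBOLS) + r")"
    r"|(?P<char>#(?:[A-Za-z0-9]+|\S))"    # character constants, tried after #{
)


def tokens(src: str) -> list[str]:
    """Split `src` into token strings, discarding whitespace and comments."""
    out = []
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise ValueError(f"no token at position {pos}")
        if m.lastgroup != "skip":
            out.append(m.group())
        pos = m.end()
    return out
```

The comment rule is tried before the symbols, so two adjacent hyphens open a comment and are never read as two minus signs; likewise #{ is tried before the character-constant rule, which is the distinction the paragraph above requires.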
2.6 Operators and Precedence
Nex has a fixed set of infix and prefix operators; the programmer cannot
declare new ones or alter their precedence. The operators are listed below from
lowest to highest binding power. All binary operators are left-associative
except exponentiation ^, which is right-associative. The prefix
operators are unary minus - and logical not.
| Level | Operators | Description |
|---|---|---|
| 1 (lowest) | or | logical disjunction (short-circuit) |
| 2 | and | logical conjunction (short-circuit) |
| 3 | = /= == != | value and identity (in)equality |
| 4 | < <= > >= | ordering comparison |
| 5 | + - | addition, subtraction |
| 6 | * / % ^ | multiplication, division, remainder, power |
| 7 (highest) | - not | prefix negation |
Operators are not first-class and do not denote methods directly at the
surface; an expression such as a + b is evaluated by the rules of
Chapter 5, which appeal to the arithmetic of the operands’ classes.
Application—a call or member access—and parentheses bind more tightly
than any operator.
and and or are short-circuiting in every
back end: in e₁ and e₂ the operand e₂ is evaluated
only if e₁ is true, and in e₁ or e₂ only if
e₁ is false. There are no separate and then /
or else forms; the plain operators are the
short-circuit forms. Consequently x /= nil and x.f is a safe
guard. This is a property of the dynamic semantics
(Section 5.4) and is recorded here because it affects how an expression
is read.
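The effect of the precedence table on how an expression is read can be shown with a small precedence-climbing parser. The Python sketch below is illustrative only: parse, BINARY, RIGHT_ASSOC, and PREFIX are hypothetical names, the input is assumed to be already tokenized, operands are bare tokens, and application and member access are omitted. How ^ interacts with the left-associative operators that share its level is this sketch's reading of the table, not something the section spells out.

```python
# Binding power of each binary operator, following the table of Section 2.6.
BINARY = {
    "or": 1, "and": 2,
    "=": 3, "/=": 3, "==": 3, "!=": 3,
    "<": 4, "<=": 4, ">": 4, ">=": 4,
    "+": 5, "-": 5,
    "*": 6, "/": 6, "%": 6, "^": 6,
}
RIGHT_ASSOC = {"^"}
PREFIX = {"-", "not"}
PREFIX_LEVEL = 7


def parse(tokens):
    """Parse a token list into nested tuples: (op, left, right) or (op, arg)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def primary():
        tok = advance()
        if tok == "(":
            inner = expr(1)
            if advance() != ")":
                raise SyntaxError("expected )")
            return inner
        if tok in PREFIX:
            # A prefix operator binds more tightly than any binary operator.
            return (tok, expr(PREFIX_LEVEL))
        return tok

    def expr(min_level):
        left = primary()
        while True:
            op = peek()
            level = BINARY.get(op)
            if level is None or level < min_level:
                return left
            advance()
            # Left-associative: the right operand must bind strictly tighter.
            # Right-associative: it may contain the same level again.
            next_min = level if op in RIGHT_ASSOC else level + 1
            left = (op, left, expr(next_min))

    tree = expr(1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

For instance, parse("x /= nil and x < y".split()) groups the two comparisons beneath and, which is the shape on which the short-circuit guard described above relies.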
2.7 The Grammar of Expressions
An expression denotes a value. The expression grammar is layered by the precedence of Section 2.6; we present it here in collapsed form, taking the precedence and associativity as given, and exhibit the layered productions in Appendix A.
| exp | ::= | scon | special constant |
| | \| | id | variable or field |
| | \| | this | the current object |
| | \| | result | the result cell |
| | \| | ( exp ) | parenthesised |
| | \| | exp binop exp | infix application |
| | \| | unop exp | prefix application |
| | \| | exp . id ⟨( args )⟩ | member access / method call |
| | \| | exp ? . id ⟨( args )⟩ | safe member access |
| | \| | id ( args ) | function or local call |
| | \| | create id ⟨tyargs⟩ ⟨. id ⟨( args )⟩⟩ | object creation |
| | \| | when exp then exp else exp end | conditional expression |
| | \| | fn ⟨gen⟩ ( ⟨params⟩ ) ⟨: ty⟩ do block end | anonymous function |
| | \| | spawn do block end | task creation |
| | \| | old exp | pre-state snapshot (in ensure only) |
| | \| | arraylit \| maplit \| setlit | collection displays (Appendix C) |
The argument list and the collection displays have the obvious forms:
| args | ::= | ⟨exp (, exp)*⟩ |
| arraylit | ::= | [ ⟨exp (, exp)*⟩ ] |
| maplit | ::= | { ⟨exp : exp (, exp : exp)*⟩ } |
| setlit | ::= | #{ ⟨exp (, exp)*⟩ } |
The empty braces {} denote the empty map; the empty set is
written #{}. These displays are derived forms, expanded into
creations and a sequence of insertions in Appendix C.
2.8 The Grammar of Statements
A statement is executed for its effect. A block is a sequence of statements, executed in order. An expression standing alone is a statement, executed for its effect and its value discarded.
| block | ::= | stmt* | |
| stmt | ::= | id := exp | assignment to a variable or field |
| | \| | exp . id := exp | assignment to a field |
| | \| | let id ⟨: ty⟩ := exp | local declaration |
| | \| | if exp then block (elseif exp then block)* ⟨else block⟩ end | |
| | \| | from block ⟨inv⟩ ⟨var⟩ until exp do block end | general loop |
| | \| | repeat exp do block end | counted loop |
| | \| | across exp as id do block end | cursor loop |
| | \| | case exp of caseclause+ ⟨else stmt⟩ end | constant dispatch |
| | \| | match exp of matchclause+ ⟨else block⟩ end | type dispatch |
| | \| | select selectclause+ ⟨timeout⟩ ⟨else block⟩ end | communication (Chapter 6) |
| | \| | do block ⟨rescue block⟩ end | scoped block |
| | \| | with string do block end | resource block |
| | \| | raise exp | raise an exception |
| | \| | retry | restart the enclosing do |
| | \| | exp | expression statement |
| caseclause | ::= | scon (, scon)* then stmt |
| matchclause | ::= | when id ⟨tyargs⟩ as id then block |
| inv | ::= | invariant assertion+ |
| var | ::= | variant exp |
The forms governing concurrency—spawn,
select, the timeout clause—are listed here for completeness but
their meaning is given in Chapter 6. The resource block with
and the loop invariant and variant are explained in Chapters 5
and 3 respectively.
2.9 Syntactic Restrictions
Certain conditions, though expressible in the grammar, are forbidden. They are checked before elaboration and a violation is a compile-time error.
- No two parameters of one routine, nor two fields of one class, nor two local declarations in scope at the same point, may bind the same identifier. An inner let may, however, shadow an outer binding of the same name in a nested block.
- The expression old e may occur only within an ensure clause, and e must there denote a field of the current object (Section 3.4). It may not be applied to a parameter.
- The identifier result may be read or assigned only within a routine that declares a return type.
- The cursor variable of an across loop, the bound variable of a match clause, and the local of a convert are in scope only within the body they introduce.
- A routine has at most one require clause and at most one ensure clause; multiple conditions are written as named assertions within the single clause (Section 3.4).
- retry may occur only within a rescue block.
Further restrictions that depend on types or on the class hierarchy—the
exhaustiveness of a match on a sealed class, the conformance of an
overriding routine, the immutability of a once field—belong to
the static semantics and are stated in Chapter 4.