13

Become a Language Designer!

The chapter Advanced Flow Control ended with the promise of building control abstractions that have the same status as built-in ones like try and yield. We specifically mentioned about a control structure that will enable us to do nondeterministic computing in Slogan.

The key idea here is that expressions in a nondeterministic language can have more than one possible value. Our nondeterministic program evaluator will work by automatically choosing a possible value and keeping track of the choice. If a subsequent requirement is not met, the evaluator will try a different choice, and it will keep trying new choices until the evaluation succeeds, or until we run out of choices. Nondeterministic computing has important applications in fields like artificial intelligence and relational programming.

But before undertaking the grand job of adding nondeterminism to the language, we must become familiar with the facilities that allow us to build new syntactic abstractions in Slogan. Two complementary methods are available for adding new syntax to Slogan. The first two sections of this chapter will discuss these in some depth and show some practical examples. Then we will add nondeterministic operators to Slogan by taking advantage of the knowledge we gained in this chapter and the chapter on flow control.1

13.1 Declaring New Syntax

The simplest way to add new syntax is by using the declare syntax statement. It has the general form shown below:


declare syntax <keyword> <layout> => <expansion>
      

Keyword is a name that uniquely identifies the new syntactic form. Layout defines the structure of the extension as it will appear in code. This can contain operators, identifiers and literals. All identifiers starting with the dollar sign ($) are treated specially by the compiler. They will be replaced by valid Slogan expressions. The expansion part will contain Slogan code that should be generated by the compiler upon successfully parsing the new syntax.

The best way to understand the declare syntax statement, is to use it to implement a new syntactic extension. We will add the do-while construct from the C programming language to Slogan. Do-while is an imperative loop which executes its body at-least once, even if the condition in the while clause is false. We will define the syntax of the do-while loop as


do <expression_to_execute>
while (<condition_to_check>)
      

This syntax will expand to produce a code block that will execute expression_to_execute once and then go into a named let to conditionally execute it in a loop, controlled by the value of condition_to_check. The implementation of this construct follows:


declare syntax
  do $body while($condition)
  => { $body
       letfn loop ()
        when ($condition)
        { $body
          loop() }}
      

Some examples of using the new do-while loop follows:


let i = 0
do
{ showln(i)
  i = i + 1
} while (i < 5)
//> 0
    1
    2
    3
    4
  
do
{ showln(i)
  i = i + 1
} while (i < 5)
//> 5
      

The code generated by declare syntax is hygienic. That means, free variables in the expansion code will refer to bindings visible when the syntax was specified. This is demonstrated in the following program:


let x = 100

declare syntax
  prnx => showln(x)

prnx
//> 100
let (x = 200) prnx
//> 100
      

13.2 Un-hygienic Macros

Another facility for extending the compiler is the macro system. It gives full access to the lower-level compiler APIs for generating code. These extensions are created using the declare macro statement which has the general form:


declare macro <keyword> <layout> => <expander>
      

The keyword and layout parts are similar to that of declare syntax. Expander is Slogan code that will be executed by the compiler to generate the actual expansion of the macro. The module core.compiler define functions that can generate low-level Slogan code. The expander will consist mostly of calls to these functions.

The following example shows a kind of extension that can be defined only by using macros – one that introduces new variable bindings.


declare macro
  bind_names ($names, $values) $body
  => let (bindings = map(^(n, v) [n, v], $names, $values))
       compiler.let_(bindings, $body)
    
// Usage:
bind_names([a, b, c], [1, 2, 3]) a + b + c
// 6
      

It is possible to add custom code parsers and invoke them from macros. The parsers are functions called by the compiler. So they must be defined using a declare function statement. Let us look at an example of this. The macro we are going to define is called str. It expects an asterisk (*) symbol followed by an identifier. The macro converts the identifier to a string. In short, the str macro defines a DSL for converting symbols to strings. Not very useful, but good enough to illustrate compile-time parser functions!


declare function compile_ident(tokenizer)
  let (token = tokenizer.next)
  { when (special_token_to_string(token) <> "*")
      error("expected asterisk")
    let (sym = tokenizer.next)
    { when (not(is_valid_identifier(sym)))
        error("invalid identifier")
      sym }}

declare macro
  str $name
  => symbol_to_string($name),
     [$name:compile_ident]

// Usage:
(str * hello) == "hello"
// true
      

Macros are un-hygienic in that free variables in the expander takes their definition from the context of macro expansion, not from where the macro was defined. This is demonstrated by the program below.


let x = 100

declare macro
  prnx_impure => compiler.call('showln, ['x])

// Usage:
prnx_impure
// 100
let (x = 200) prnx_impure
// 200
      

The gensym function can be used for generating symbolic names that are guaranteed not to conflict with names of user-defined variables.

13.3 Nondeterministic Computations

Nondeterministic computing2, like lazy sequence processing, is useful for "generate and test" applications. Consider the task of starting with two lists of positive integers and finding a pair of integers - one from the first list and one from the second list - whose sum is prime. In previous chapters we saw how to handle this with finite sequence operations and with infinite sequences. Our approach was to generate the sequence of all possible pairs and filter these to select the pairs whose sum is prime. Whether we actually generate the entire sequence of pairs first, or interleave the generating and filtering is immaterial to the essential image of how the computation is organized.

The nondeterministic approach evokes a different image. Imagine simply that we choose (in some way) a number from the first list and a number from the second list and require (using some mechanism) that their sum be prime. This is expressed by the following function:


function prime_sum_pair(xs, ys)
  let (x = an_element_of(xs), y = an_element_of(ys))
  { require(is_prime(x + y));
    [x, y] }
      

It might seem as if this function merely restates the problem, rather than specifying a way to solve it. Nevertheless, this is a legitimate nondeterministic program.

The key idea here is that expressions in a nondeterministic language can have more than one possible value. For instance, an_element_of might return any element of the given list. Our nondeterministic program will work by automatically choosing a possible value and keeping track of the choice. If a subsequent requirement is not met, the program will try a different choice, and it will keep trying new choices until the evaluation succeeds, or until we run out of choices. Just as lazy expressions freed the programmer from the details of how values are delayed and forced, the nondeterministic program will free the programmer from the details of how choices are made.

To enable Slogan to support nondeterminism, we introduce a new syntactic extension called amb. The expression amb [e1, e2, ..., en] will return the value of one of the n expressions ei ambiguously. For example, the expression


[amb [1, 2, 3], amb ['a, 'b]]
      

can have six possible values:


[1, a], [1, b], [2, a], [2, b], [3, a], [3, b]
      

Amb with a single choice produces an ordinary (single) value.

Amb with no choices - the expression amb [] - is an expression with no acceptable values. Operationally, we can think of it as an expression that when evaluated causes the computation to "fail": The computation aborts and no value is produced.

Amb can be implemented with the help of continuations, which are used to keep track of the next choice point the computation should return to. When it is first invoked, amb simply returns the first element from the list of values passed to it. It also establishes a failure continuation, which if invoked will restart the selection process with the rest of the elements in the list. This search process is allowed to continue until the list is empty, at which point the default failure condition is executed. In our implementation, this will raise an error.


function fail()
  error("search tree exhausted")
      

The implementation of amb will keep track of its next choice point by updating fail with a continuation to restart the search with the remaining values:


declare syntax
  amb $values
  => callcc(^(k)
     { let failed_k, fail0 = false, fail
       let xs = callcc(^(fk) { failed_k = fk; $values })
       match (xs)
         [] -> fail()
       | [x] -> { fail = fail0; k(x) }
       | y:ys -> { fail = ^() failed_k(ys); k(y) }})          
      

We will also define a function that will repeat the search until a condition is met. This is the same require function we used in the prime_sum_pair above:


function require(p)
  when(not(p)) fail()
      

With the primitives required for adding nondeterminism to our programs in place, we are ready to define a working version of the prime_sum_pair function:


function prime_sum_pair(xs, ys)
  let (x = amb xs, y = amb ys)
  { require(is_prime(x + y))
    list(x, y) }

// Usage:  
prime_sum_pair([11, 12, 4, 7], [10, 3, 9, 5])
// [12, 5]
  
prime_sum_pair([1,2,3,4,5], [120, 100, 300, 400])
// [1, 100]
      

Exercise 13.1. Having amb as a syntactic extension clearly marks it out as doing something very different from a regular function call. But it is not necessary that is should be a special form recognized by the compiler. How will you define amb as a normal function?

Exercise 13.2 As the argument to amb is a regular list, all its elements are evaluated even before their values may be actually required by the nondeterministic computation. How will you make sure that the expressions passed to amb will be evaluated only when they are explicitly required?

13.3.1 Solving Logic Puzzles

The following puzzle (taken from Dinesman 1968) is typical of a large class of simple logic puzzles:

Baker, Cooper, Fletcher, Miller, and Smith live on different floors of an apartment house that contains only five floors. Baker does not live on the top floor. Cooper does not live on the bottom floor. Fletcher does not live on either the top or the bottom floor. Miller lives on a higher floor than does Cooper. Smith does not live on a floor adjacent to Fletcher's. Fletcher does not live on a floor adjacent to Cooper's. Where does everyone live?

We can determine who lives on each floor in a straightforward way by enumerating all the possibilities and imposing the given restrictions:


function multiple_dwelling()
  let (xs = [1, 2, 3, 4, 5],
       baker = amb xs,
       cooper = amb xs,
       fletcher = amb xs,
       miller = amb xs,
       smith = amb xs)
   { require(is_distinct([baker, cooper, fletcher, miller, smith]))
     require(baker <> 5)
     require(cooper <> 1)
     require(fletcher <> 5)
     require(fletcher <> 1)
     require(miller > cooper)
     require(abs(smith - fletcher) <> 1)
     require(abs(fletcher - cooper) <> 1);
     ['baker:baker, 'cooper:cooper,
      'fletcher:fletcher, 'miller:miller,
      'smith:smith] }
      

The function is_distinct determines if all elements in a list are unique and can be defined as,


function is_distinct(xs)
  if (is_empty(xs)) true
  else if (is_empty(tail(xs))) true
  else if (member(head(xs), tail(xs))) false
  else is_distinct(tail(xs))
      

Calling the multiple_dwelling function will solve the puzzle for us:


multiple_dwelling()
// [baker:3, cooper:2, fletcher:4, miller:5, smith:1]
      

Exercise 13.3 If you carefully read the specification of the puzzle, it will become clear that not all combinations of peoples and floors needs to be generated and searched. For instance, the combination of baker and floor 5 need not be checked as it is explicitly stated that he does not live on that floor. Can you write a more efficient version of multiple_dwelling by eliminating all such redundant checks?


1In the current implementation of Slogan, syntactic abstractions are visible only in the scope they are defined. For instance, a macro can be used only in the closure, module or namespace that it's defined in. New syntax or macro should ideally be defined in the global scope.

2This section is based mostly on content from Chapter 4.3 of SICP.


Next | Previous | Contents