A Tryst With Infinity

In Chapter 7 on State and Modularity, we learned how to use assignment as a tool for modeling the real world. In the preceding chapter we came face-to-face with some of the complications that assignment can raise if used along with concurrency. The root cause of the problem is that a variation in the object being modeled will lead to a state variation in the program as well. Along with the problems of concurrent assignments, objects with varying state has another drawback – they are ignorant of their own history. In other words, once a state change has occurred, it is not possible to figure out how the object looked at a point in time in the past.

In this chapter we will explore another way to model change without assignments. The idea, at an abstract level, is to represent the varying object as a function f(x, t), which return the state of the object x at time t. This function can be implemented using an association list or a hash table which maps a time to the state of x at that time. This will work as long as we don't want the state at an arbitrary time in the future or in the past. But for this abstraction to be useful, we should be able to represent very large or even infinite sequences of states. Slogan realizes this idea in the form of lazy sequences.

Before looking at how lazy sequences help us model systems with state without ever using assignment, we should get familiar with the programming technique that make these sequences possible. This technique is known as lazy or delayed evaluation.

9.1 Delayed Evaluation

If we are allowed to package an expression in such a way that it can be evaluated later on demand, we have delayed evaluation in our language. In fact you already know how to do delayed evaluations! When we package an expression as a function, we are effectively delaying the evaluation of that expression until the function is called. For example, if we want an expensive computation to be evaluated only when its value is actually needed by the program, we package the expression into a function. We call that function at the point in the program where the value needs to be computed and used.

let expensive_computation = ^() { showln("performing long computation...")
                                  12345 }

//> performing long computation...
// 12345
//> performing long computation...
// 12345

The problem with the expensive_computation function is that each time it is called, the whole computation is repeated again. A better version of the function will cache the result once computed and return that value when the function is called again.

let expensive_computation = let (cached_result = false)
                              ^() if (cached_result) cached_result
                                  else { showln("performing long computation...")
                                         cached_result = 12345
                                         cached_result }

//> performing long computation...
// 12345
// 12345

This optimization is known as memoization. Slogan provides the delay operator (~) for expressing memoized functions easily. Any expression can be delayed by prefixing it with the delay operator. The delay operator's return value is an object called the promise which can be forced to evaluate and return the expression's value. This value is cached by the promise object and returned when it is forced again.

We can rewrite the expensive_computation function using delay as shown below:

let expensive_computation = ~{ showln("performing long computation...")
                               12345 }          

// <promise>
//> performing long computation...
// 12345
// 12345

9.2 Lazy Sequences

With that introduction to delayed expressions, we are in a position to understand sequences that can conceptually extend to infinity. The fundamental data structure that can be used to represent a sequence of values is the list. As we saw earlier in this book, a list is just a pair of pairs that is terminated by an empty list. To make this clear, look at the following definitions of two lists with similar layouts:

let xs = [1, 2, 3, 4, 5]
let ys = 1:2:3:4:5:[]

// [1, 2, 3, 4, 5]
xs == ys
// true

The problem with lists created from pairs is that they have to eventually terminate. What if we need a list from which we can fetch values to our soul's content? This can be achieved by making a pair whose head is the first value in the sequence and whose tail is a promise to generate the next value when required. This idea is expressed in the function below, which represents an infinite sequence of integers:

function infinite_integers(i)

Infinite_integers calls itself recursively to generate the next value in the sequence, but that call is delayed until the next value is actually needed. So calling the function will not go into an infinite loop.

let xs = infinite_integers(1)
// [1, ...]
// 1
// 2

To have to explicitly force each element in the sequence can be a bit awkward. The functions first, rest and nth know how to access elements in a sequence, invoking force at the right places.

// 1
// 2
nth(0, xs)
// 1
nth(10, xs)
// 11

9.2.1 Generic Operations on Sequences

In Slogan, list is the default data structure for representing and manipulating a sequence of values. When coupled with higher-order functions like map, fold and filter, programs that manipulate lists can lead to elegant solutions to many data transformation problems. To illustrate this, consider a program for computing the sum of all prime numbers in an interval. As a first attempt, let us define the program in an iterative style:

function sum_primes(a, b)
  letfn loop (count = a, accum = 0)
    if (count > b) accum
    else if (is_prime(count)) loop(inc(count), count + accum)
    else loop(inc(count), accum)

// usage example:
sum_primes(1, 20)
// 77
sum_primes(10000, 1000000)
// 37544665627

We can define this function in a more declarative style by using the higher-order functions we saw in Chapter 6.

function sum_primes(a, b)
  fold(`+`, 0, filter(is_prime, range(a, b)))          

Though the second sum_primes is shorter and sweeter than the first, it is grossly inefficient when it comes to memory consumption. Filter cannot do any work until range returns a whole list. Then a new list is generated by filter which is reduced to a sum by fold.

The inefficiency in using lists becomes painfully apparent if we use the same paradigm to compute the second prime in a large interval:

function nth_prime(n, a, b)
  nth(n, filter(is_prime, range(a, b)))

nth_prime(1, 1, 20)
// 3
nth_prime(1, 10000, 1000000)
// 10009

The function does find the nth prime, but the computational overhead is outrageous for calls that search over a large range of numbers. For instance, the call nth_prime(10000, 1000000) will construct a list of almost a million integers, filter this list by testing each element for primality, and then ignore almost all of the result. If we were programming in the iterative style, we would interleave the enumeration and the filtering, and stop when we reached the second prime.

With lazy sequences we can achieve the best of both worlds: We can formulate programs elegantly as sequence manipulations, while attaining the efficiency of incremental computation. There are higher-order functions that can work with lists and lazy sequences alike. Filter and map are two examples. When working with lazy sequences, they will construct only that part of the sequence that is actually required for the current computation. As the computation makes progress, more of the sequence will be realized maintaining an illusion of an infinite stream of values.

Both the sum_primes and nth_prime functions can be rewritten to take advantage of generic functions that takes lazy sequences as arguments. Specifically, we have to replace the call to range with enumerate and the call to fold with accumulate . While range returns a fully realized list of values, enumerate will give us a lazy sequence of the same values. Accumulate knows how to reduce a lazy sequence to a single value, incrementally.

First we will write the sum_primes function using the new sequence functions:

function sum_primes(a, b)
  accumulate(`+`, 0, filter(is_prime, enumerate(a, b)))

When we call sum_primes with a large range, it will return immediately with a lazy sequence of sums, deferring the actual computation of the sum only when it is explicitly requested by a sequence accessor such as nth:

let s = sum_primes(10000, 1000000)
nth(10000, s)
// 628132235

The performance enhancement we get for the nth_prime function is even more profound.

function nth_prime(n, a, b)
  nth(n, filter(is_prime, enumerate(a, b)))

nth_prime(1, 10000, 1000000) // this call will return immediately
// 10009

9.2.2 Modularity of Lazy Sequences

As we saw in Chapter 7, one of the major benefits of introducing assignment is that we can increase the modularity of our systems by hiding or encapsulating its state in local variables. Let us revisit this idea with the example of a simple Pseudo-random Number Generator. First let us implement a purely functional version of the PRNG that do not make use of local state or assignments1.

function rand(seed)

function rand_update(x)
  let (a = 27, b = 26, m = 127)
    modulo(a * x + b, m)

The functional purity of the rand function comes with a cost, users must take care of passing the accurate current state of the PRNG to each successive call to the function. This is because its implementation is not modular enough to hide and keep track of its internal state.

// 42
// 17
// 104

By bringing assignment to the scene, we will be able to create PRNG objects that can keep track of their own internal state and are more modular.

function make_rand(seed)
  let (x = seed)
    ^() { x = rand_update(x)
          x }

let r = make_rand(10)
// 42
// 17
// 104

If we realize the PRNG using lazy sequences, we will be able to have the benefits of modularity by not incurring the costs associated with assignment. With lazy sequences, we will have a stream of random numbers produced by successive calls to rand_update.

function rand(seed)
  let (x = rand_update(seed))

let rs = rand(10)
nth(0, rs)
// 42
nth(2, rs)
// 104
nth(99, rs)
// 20

We get considerable modularity with this approach, yet we do not have to maintain a changing local state.

9.2 Generators

Generators are functions that can behave as if they return an infinite sequence of values. A generator function can be made to return a value before its body is fully evaluated. This is accomplished by calling the yield expression, which returns a pair of the return value and an object known as an iterator. The iterator object knows how to restart the function where yield was called. If the function is written in such a way that it will return another value:iterator pair, the caller effectively gets a sequence of infinite values.

In the next example, we will write a function that will give us an infinite sequence of random numbers, generated with the help of yield.

function rand(seed)
  letfn loop (x = rand_update(seed))
  { yield x
    loop(rand_update(x)) }

let rs = rand(10)
// 42
// 17
// 104

Note that once an iterator is asked to move forward, there is no way to ask it to get a value from the past. Because of this, they may not be a stand-in replacement for lazy sequences created by the delay operator. A function that generates an iterator should also not call itself recursively. Instead use a named letfn as shown in the example.

By default, yield returns void. The next function can be called to pass a user-defined value to yield, which will become its return value. Please see the reference on next for more information.

The traversal on an iterator can be wrapped into a convenience function. This is achieve by calling the iterator constructor. This call will return a function that can be called to traverse the iterator. Each call to this function will result in returning the next value from the iterator. When all values run out, this function will return false.

let iter = iterator(rand(100))

// 59
// 95

9.3 Custom Sequences

As we saw in the previous sections, first and rest are the generic means for accessing many types of sequences. Their genericity is not limited to built-in sequences like lists, lazy sequences and generators. You can define your own custom sequence objects that can be destructured with these functions. This can is accomplished by defining the object as a closure that responds to the 'first and 'rest messages.

In this section we will define a custom sequence that respond to these messages and see how it automatically gets fitted into the larger context of the language. The data structure we are going to define is simply called a bag. You create a bag with a collection of objects. These objects can be taken out one by one, but there is no guarantee to the order in which they will come out, similar to picking out items from a real bag without looking into it.

The following program defines the bag data structure. The constructor make_bag takes a finite sequence as argument. This is sequence is shuffled and stored in a local list. The returned object redirects the first and rest messages to this list.

function make_bag(objects)
  letfn (objs = shuffle(objects))
    | 'first -> first(objs)
    | 'rest -> rest(objs)

// utility functions for shuffling a sequence.
function shuffle(objects)
  letfn loop (objs = objects, result = [])
    if (not(objs)) result
    else let (len = count(objs))
      if (len == 0) result
      else let (i = random_integer(len))
        loop(remove_nth(i, objs), nth(i, objs):result)

function remove_nth(n, objects)
  letfn loop (objs = objects, i = 0, result = [])
    if (not(objs)) reverse(result)
    else if (i == n) loop(rest(objs), i + 1, result)
    else loop (rest(objs), i + 1, first(objs):result)

This is how you would use the new bag sequence:

let b = make_bag([1,2,3,4,5])
// the results you get may be different,
// because the list is randomly shuffled by the bag.
// 4
// 3

// destructuring a bag:
let [a,b,c,d,e] = b;
// [4, 3, 1, 2, 5]

You can make the bag data structure more versatile by overriding the ref and ref_set messages. See the references for more information.

1The algorithm we use to generate random numbers may not be suitable for real-world use. For production code, use the built-in functions random_integer or random_real.

Next | Previous | Contents