In Chapter 7 on State and Modularity, we learned how to use assignment as a tool for modeling the real world. In the preceding chapter we came face-to-face with some of the complications that assignment can raise when used along with concurrency. The root of the problem is that a change in the object being modeled leads to a change of state in the program as well. Along with the problems of concurrent assignment, objects with varying state have another drawback: they are ignorant of their own history. In other words, once a state change has occurred, it is no longer possible to find out how the object looked at some point in the past.
In this chapter we will explore another way to model change, without assignments. The idea, at an abstract level, is to represent the varying object as a function f(x, t), which returns the state of the object x at time t. This function could be implemented using an association list or a hash table that maps a time to the state of x at that time. This works as long as we don't want the state at an arbitrary time in the future or the past. But for this abstraction to be really useful, we should be able to represent very large or even infinite sequences of states. Slogan realizes this idea in the form of lazy sequences.
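To make the idea concrete, here is a minimal sketch of such a lookup over an association list of time:state pairs. The state_at helper and the sample history are hypothetical, purely for illustration:

let history = [0:'solid, 10:'melting, 20:'liquid]

function state_at(t, hist)
 if (not(hist)) false
 else if (head(first(hist)) == t) tail(first(hist))
 else state_at(t, rest(hist))

Looking up state_at(10, history) walks the pairs until it finds the state recorded at time 10; asking for a time with no recorded state simply yields false, which is exactly the limitation noted above.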
Before looking at how lazy sequences help us model stateful systems without ever using assignment, we should get familiar with the programming technique that makes these sequences possible. This technique is known as lazy or delayed evaluation.
If we are allowed to package an expression in such a way that it can be evaluated later on demand, we have delayed evaluation in our language. In fact, you already know how to do delayed evaluation! When we package an expression as a function, we effectively delay the evaluation of that expression until the function is called. For example, if we want an expensive computation to be performed only when its value is actually needed by the program, we package the expression into a function and call that function at the point in the program where the value must be computed and used.
let expensive_computation = ^() { showln("performing long computation...")
task_sleep(2)
12345 }
expensive_computation()
//> performing long computation...
// 12345
expensive_computation()
//> performing long computation...
// 12345
The problem with the expensive_computation function is that each time it is called, the whole computation is repeated. A better version of the function will cache the result once it has been computed and return the cached value when the function is called again.
let expensive_computation = let (cached_result = false)
^() if (cached_result) cached_result
else { showln("performing long computation...")
task_sleep(2)
cached_result = 12345
cached_result }
expensive_computation()
//> performing long computation...
// 12345
expensive_computation()
// 12345
This optimization is known as memoization. Slogan provides the delay operator (~) for expressing memoized computations easily. Any expression can be delayed by prefixing it with the delay operator. The delay operator returns an object called a promise, which can be forced to evaluate the expression and return its value. This value is cached by the promise object and returned directly when the promise is forced again. We can rewrite expensive_computation using delay as shown below:
let expensive_computation = ~{ showln("performing long computation...")
task_sleep(2)
12345 }
expensive_computation
// <promise>
force(expensive_computation)
//> performing long computation...
// 12345
force(expensive_computation)
// 12345
With that introduction to delayed expressions, we are in a position to understand sequences that can conceptually extend to infinity. The fundamental data structure used to represent a sequence of values is the list. As we saw earlier in this book, a list is just a chain of pairs terminated by an empty list. To make this clear, look at the following definitions of two lists with the same layout:
let xs = [1, 2, 3, 4, 5]
let ys = 1:2:3:4:5:[]
ys
// [1, 2, 3, 4, 5]
xs == ys
// true
The problem with lists created from pairs is that they have to terminate eventually. What if we need a list from which we can fetch values to our heart's content? This can be achieved by making a pair whose head is the first value in the sequence and whose tail is a promise to generate the next value when required. This idea is expressed in the function below, which represents an infinite sequence of integers:
function infinite_integers(i)
i:~infinite_integers(i+1)
The infinite_integers function calls itself recursively to generate the next value in the sequence, but that call is delayed until the next value is actually needed. So calling the function will not go into an infinite loop.
let xs = infinite_integers(1)
xs
// [1, ...]
head(xs)
// 1
head(force(tail(xs)))
// 2
Having to explicitly force each element in the sequence can be a bit awkward. The functions first, rest and nth know how to access elements in a sequence, invoking force at the right places.
first(xs)
// 1
first(rest(xs))
// 2
nth(0, xs)
// 1
nth(10, xs)
// 11
In Slogan, the list is the default data structure for representing and manipulating a sequence of values. When coupled with higher-order functions like map, fold and filter, programs that manipulate lists can lead to elegant solutions to many data transformation problems. To illustrate this, consider a program for computing the sum of all prime numbers in an interval. As a first attempt, let us define the program in an iterative style:
function sum_primes(a, b)
letfn loop (count = a, accum = 0)
if (count > b) accum
else if (is_prime(count)) loop(inc(count), count + accum)
else loop(inc(count), accum)
// usage example:
sum_primes(1, 20)
// 77
sum_primes(10000, 1000000)
// 37544665627
We can define this function in a more declarative style by using the higher-order functions we saw in Chapter 6.
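A minimal sketch of that declarative version, assuming fold takes the combining function, the initial value and the list in that order (the same order accumulate takes later in this chapter):

function sum_primes(a, b)
 fold(`+`, 0, filter(is_prime, range(a, b)))

sum_primes(1, 20)
// 77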
Though the second sum_primes is shorter and sweeter than the first, it is grossly inefficient when it comes to memory consumption. The filter function cannot do any work until range returns a whole list. Then a new list is generated by filter, which is finally reduced to a sum by fold.
The inefficiency in using lists becomes painfully apparent if we use the same paradigm to compute the second prime in a large interval:
function nth_prime(n, a, b)
nth(n, filter(is_prime, range(a, b)))
nth_prime(1, 1, 20)
// 3
nth_prime(1, 10000, 1000000)
// 10009
The function does find the nth prime, but the computational overhead is outrageous for calls that search over a large range of numbers. For instance, the call nth_prime(1, 10000, 1000000) will construct a list of almost a million integers, filter this list by testing each element for primality, and then ignore almost all of the result. If we were programming in the iterative style, we would interleave the enumeration and the filtering, and stop when we reached the second prime.
With lazy sequences we can achieve the best of both worlds: we can formulate programs elegantly as sequence manipulations, while attaining the efficiency of incremental computation. There are higher-order functions that work with lists and lazy sequences alike; filter and map are two examples. When working with lazy sequences, they will construct only the part of the sequence that is actually required for the current computation. As the computation makes progress, more of the sequence will be realized, maintaining the illusion of an infinite stream of values.
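For instance, mapping a function over the infinite_integers sequence defined earlier realizes only the elements we actually access (assuming map, like filter, is lazy-sequence aware as described above):

let squares = map(^(x) x * x, infinite_integers(1))
nth(4, squares)
// 25

Even though squares conceptually contains the square of every positive integer, only the first five elements are realized by this call.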
Both the sum_primes and nth_prime functions can be rewritten to take advantage of generic functions that accept lazy sequences as arguments. Specifically, we have to replace the call to range with enumerate and the call to fold with accumulate. While range returns a fully realized list of values, enumerate will give us a lazy sequence of the same values. The accumulate function knows how to reduce a lazy sequence to a single value, incrementally.
First we will rewrite the sum_primes function using the new sequence functions:
function sum_primes(a, b)
accumulate(`+`, 0, filter(is_prime, enumerate(a, b)))
When we call sum_primes with a large range, it will return immediately with a lazy sequence of partial sums, deferring the actual computation until a sum is explicitly requested by a sequence accessor such as nth:
let s = sum_primes(10000, 1000000)
nth(10000, s)
// 628132235
The performance enhancement we get for the nth_prime function is even more profound.
function nth_prime(n, a, b)
nth(n, filter(is_prime, enumerate(a, b)))
nth_prime(1, 10000, 1000000) // this call will return immediately
// 10009
As we saw in Chapter 7, one of the major benefits of introducing assignment is that we can increase the modularity of our systems by hiding or encapsulating state in local variables. Let us revisit this idea with the example of a simple pseudo-random number generator (PRNG). First, let us implement a purely functional version of the PRNG that does not make use of local state or assignments¹.
function rand(seed)
rand_update(seed)
function rand_update(x)
let (a = 27, b = 26, m = 127)
modulo(a * x + b, m)
The functional purity of the rand function comes at a cost: users must take care to pass the accurate current state of the PRNG to each successive call to the function. This is because its implementation is not modular enough to hide and keep track of its internal state.
rand(10)
// 42
rand(42)
// 17
rand(17)
// 104
By bringing assignment into the picture, we can create PRNG objects that keep track of their own internal state and are more modular.
function make_rand(seed)
let (x = seed)
^() { x = rand_update(x)
x }
let r = make_rand(10)
r()
// 42
r()
// 17
r()
// 104
If we realize the PRNG using lazy sequences, we get the benefits of modularity without incurring the costs associated with assignment. With lazy sequences, we have a stream of random numbers produced by successive calls to rand_update.
function rand(seed)
let (x = rand_update(seed))
x:~rand(x)
let rs = rand(10)
nth(0, rs)
// 42
nth(2, rs)
// 104
nth(99, rs)
// 20
We get considerable modularity with this approach, yet we do not have to maintain a changing local state.
Generators are functions that behave as if they return an infinite sequence of values. A generator function can return a value before its body is fully evaluated. This is accomplished with the yield expression, which returns a pair of the yielded value and an object known as an iterator. The iterator object knows how to restart the function at the point where yield was called. If the function is written in such a way that it will return another value:iterator pair, the caller effectively gets an infinite sequence of values.
In the next example, we will write a function that gives us an infinite sequence of random numbers, generated with the help of yield.
function rand(seed)
letfn loop (x = rand_update(seed))
{ yield x
loop(rand_update(x)) }
let rs = rand(10)
first(rs)
// 42
first(rest(rs))
// 17
first(rest(rs))
// 104
Note that once an iterator has been asked to move forward, there is no way to ask it for a value from the past. Because of this, iterators may not be a drop-in replacement for lazy sequences created by the delay operator. A function that generates an iterator should also not call itself recursively; instead, use a named letfn as shown in the example.
By default, yield returns void. The next function can be called to pass a user-defined value to yield, which will become its return value. Please see the reference on next for more information.
The traversal of an iterator can be wrapped in a convenience function. This is achieved by calling the iterator constructor, which returns a function that can be called to traverse the iterator. Each call to this function returns the next value from the iterator. When all values run out, this function will return false.
let iter = iterator(rand(100))
iter()
// 59
iter()
// 95
As we saw in the previous sections, first and rest are the generic means for accessing many types of sequences. Their genericity is not limited to built-in sequences like lists, lazy sequences and generators. You can define your own custom sequence objects that can be destructured with these functions. This is accomplished by defining the object as a closure that responds to the 'first and 'rest messages.
In this section we will define a custom sequence that responds to these messages and see how it automatically fits into the larger context of the language. The data structure we are going to define is simply called a bag. You create a bag with a collection of objects. These objects can be taken out one by one, but there is no guarantee of the order in which they will come out, similar to picking items out of a real bag without looking into it.
The following program defines the bag data structure. The constructor make_bag takes a finite sequence as argument. This sequence is shuffled and stored in a local list. The returned object redirects the first and rest messages to this list.
function make_bag(objects)
let (objs = shuffle(objects))
^(msg)
| 'first -> first(objs)
| 'rest -> rest(objs)
// utility functions for shuffling a sequence.
function shuffle(objects)
letfn loop (objs = objects, result = [])
if (not(objs)) result
else let (len = count(objs))
if (len == 0) result
else let (i = random_integer(len))
loop(remove_nth(i, objs), nth(i, objs):result)
function remove_nth(n, objects)
letfn loop (objs = objects, i = 0, result = [])
if (not(objs)) reverse(result)
else if (i == n) loop(rest(objs), i + 1, result)
else loop (rest(objs), i + 1, first(objs):result)
This is how you would use the new bag sequence:
let b = make_bag([1,2,3,4,5])
// the results you get may be different,
// because the list is randomly shuffled by the bag.
first(b)
// 4
first(rest(b))
// 3
// destructuring a bag:
let [v, w, x, y, z] = b
[v, w, x, y, z]
// [4, 3, 1, 2, 5]
You can make the bag data structure more versatile by overriding the ref and ref_set messages. See the references for more information.
¹ The algorithm we use to generate random numbers may not be suitable for real-world use. For production code, use the built-in functions random_integer or random_real.