The Slogan Handbook - State and Modularity

The world around us is populated by independent objects. If an object's behavior is influenced by its history, then it is said to have state. For example, a bank account has state if it reports a new balance after a successful withdrawal operation. The bank account we defined in the previous chapter do not have state, because the withdrawal operation returned a new bank account instead of updating the current balance. This is not how bank accounts in the real world works!

To create bank account object that behave exactly like its real world counterpart, we need an operation that can change the value associated with its local variable. Slogan provides the assignment operator for this purpose. Assignment is accomplished by the statement:


var = new_value

Using assignment, we will define a new bank account abstraction that support three fundamental operations – balance for checking the current balance, deposit for adding money to the current balance and withdraw for getting money out of the account.


function make_bank_account(balance)
  where balance > 0
{ function deposit(amount)
    balance = balance + amount

  function withdraw(amount)
    where balance - amount > 0
    balance = balance - amount
    
  ^(message)
  | 'balance -> balance
  | 'deposit -> deposit
  | 'withdraw -> withdraw }

As it can now update its internal state, a single instance of bank account is able to keep track of the running balance over a series of transactions:


let account = make_bank_account(100)
account.deposit(10)
account.balance
// 110
account.withdraw(80)
account.balance
// 30

Assignment helps a lot in improving the modularity of a system. To understand this better, consider a function that adds the number 2 to its argument. For the purpose of reporting some metrics, it should also log the number of times it was called. Without assignment, this is how we would define and use this function:


function add2(x, n)
{ showln("add2: ", n+1)
  x+2:n+1 }

let fn = 0
let r:fn2 = add2(10, fn)
//> add2: 1
r
// 12

let r:fn3 = add2(21, fn2)
//> add2: 2
r
// 23

The user of add2 only cares about the result of calling the function, i.e x added with 2. But now he has to worry about the implementation details of add2 and define extra variables to correctly keep track of the metrics. If the caller is careless about the second argument, the metrics log will become wrong and useless.

The implementation of add2 in not modular because it leaks its internals to other parts of the program. Assignment allows us to easily fix the problem. We will maintain the count in a changing variable visible only to add2:


let add2 =
  let (call_count = 0)
    function (x)
    { call_count = call_count + 1
      showln("add2: ", call_count)
      x + 2 }

add2(10)
//> add2: 1
// 12

add2(21)
//> add2: 2
// 23

7.1.1 The Costs of Assignment

As we saw in the preceding section, assignment helps to model objects in the real world in a modular fashion. But this convenience comes with a price. Once we introduce assignment, it becomes harder to reason about the functions we write. As long as we don't do assignments, two invocations of the same function with the same arguments will produce the same result. In other words, functions without assignment are computing mathematical functions with easily verifiable properties. This style of programming is known as functional programming.

Consider a simple function that decrements a number by a given amount. Let us first write a functional version and how see how simple reasoning techniques can be applied to verify its correctness.


function make_dec(balance)
  ^(amount) balance - amount

Does the function returned by make_dec work as expected? Is it possible to verify its correctness without actually running it? For instance, consider the function returned by the call make_dec(100):


let d = make_dec(100)

What will be the result of calling d(20) followed by a call to d(45)? It will produce 80 and 55. We can verify this by simply substituting the variables balance and amount in the function body with their respective values. We can "expand" the function call to the following steps:


d(20)
=> balance - amount
=> 100 - 20
=> 80

d(45)
=> balance - amount
=> 100 - 45
=> 55

We are able to substitute the variable names with their values because the variables are guaranteed not to change. In other words, they are said to be referentially transparent.

Now let us look at a new version of the decrementor function that depend on assignment:


function make_dec(balance)
  ^(amount)
  { balance = balance - amount
    balance }

Will the simple reasoning based on value substitution work for this new implementation? Let us find out by manually evaluating the same sequence of calls as before:


d(20)
=> { balance = balance - amount; balance }
=> { balance = 100 - 20; 100 }
=> 100

d(45)
=> { balance = balance - amount; balance }
=> { balance = 100 - 20; 100 }
=> 100

Our reasoning is obviously wrong¹. If we actually run the code, we will get 80 and 35 as results. Once assignment is brought in we loose the ability to do such simple reasoning about functions. We can no longer assume that a variable is just a name for a value. It now reflects on the underlying computer architecture. The variable has become a reference to a location in the computer's memory, which can change. The programming model that makes extensive use of assignment is known as imperative programming.

Assignment also affects the way how two objects are compared for equality. Without assignment two objects that were equal when they were created continues to be so for the rest of their lives. This is because their internal state never changes. In fact, a smart compiler can even optimize on storage by putting both objects at the same location in memory and doing a fast equality check by just comparing memory addresses. But once we add assignment to the language, for any equality check to be reliable, the structure of both objects has to be inspected.

Assignment tends to be an expensive operation in modern multi-core chips with multiple levels of caches. In functional programming variables are initialized once and they never changes. This leads to efficient caching of frequently accessed values at the hardware level. When a new value is assigned, the cached values has to be flushed and written back to the main memory. This might make a computation with assignments costlier than a purely functional one.

Assignment is also more susceptible to bugs because the order of doing assignments is important. The situation becomes more complicated once we have multiple threads of execution in the same program. Slogan provide elegant solutions for managing state in concurrent programs. This will be the topic of the next two chapters.

7.2 Imperative Loops

Earlier in this book, we learned how to do iterations with recursive function calls. The named letfn construct is a generalization on this to enable local iterations. The iterations we did so far were functional because they don't have to change state to keep track of the loops boundary conditions. For example, consider the following loop. On each iteration, a new value is created and passed to the recursive call to loop. No mutation of existing state is involved.


letfn loop (i = 0)
  when (i <= 2)
  { showln(i)
    loop(i + 1) }

//> 0
    1
    2

If you have programmed in an imperative language like C or Java, you must be more familiar with writing this kind of iterations using the for or while statements. Though both styles will work for simple loops like the one above, some algorithms can be implemented efficiently and cleanly only in an imperative style. Slogan provides the for looping construct to meet such a requirement. The preceding example can be rewritten using for as shown below:


for (i = 0 to 2)
  showln(i)

//> 0
    1
    2


for (i = 2 downto 0)
  showln(i)

//> 2
    1
    0

You are allowed to have explicit control over the looping counter and condition:


for (i = 5; i < 15; i+3)
  showln(i)

//> 5
   8
   11
   14

The for loop construct is an expression in Slogan. It returns the value of the last expression evaluated in its body.


let s = for (j = 0, i = 5; i < 15; i+3)
{ j = j + i
  j }

s
// 38

You can exit early from the loop by calling the break expression. This expression returns false by default. Break can be asked to return another value, by calling it like a function and passing the required value as argument.


for (i = 0; i < 10; i + 1)
  when (i == 5) break
// false

for (i = 0; i < 10; i + 1)
  when (i == 5) break(i + 1)
// 6

The continue expression allows control to return to the top of the loop. Just like break, an optional argument can be passed to continue. This will become the value of the loop if continue happens to be the last expression evaluated in its body.


for (j = 0, i = 0; i < 10; i + 1)
  if (remainder(i, 3) == 0) continue
  else { j = j + i; j }
// false

for (j = 0, i = 0; i < 10; i + 1)
  if (remainder(i, 3) == 0) continue(j)
  else { j = j + i; j }
// 27

for (j = 0, i = 0; i < 10; i + 1)
  if (remainder(i, 2) == 0) continue
  else { j = j + i; j }
// 25

If your style of programming demands more imperative control structures like the for loop, you can add them to the language! We will figure out how to do this and more in the last chapter of Part I.

7.3 Variations on a Theme

Once we have assignment in our language, we also have mutable data structures. For instance, it is now possible to have functions that change the values in the head or tail of a pair:


let p = 10:20
p
// 10:20

set_head(p, 100)
set_tail(p, head(p) * tail(p))
p
// 100:2000


let xs = [10, 20, 30]
xs[1] = xs[0] + xs[2]
xs
// [10, 40, 30]

let ys = #[10, 20, 30]
ys[1] = ys[1] * ys[2]
ys
// #[10, 600, 30]

let zs = #{'name:'alice, 'age:3}
zs['age] = 2
zs;
// #{name: alice, age: 2}

// for faster updates, use type-specific mutators/accessors
// instead of the generic [] operator.
list_set(xs, 0, "hello")
xs
// [hello, 40, 30]

array_set(ys, 0, \a)
ys
// #[\a, 600, 30]

hashtable_set(zs, 'name, 'bob)
zs
// #{name: bob, age: 2}

7.3.1 An Imperative Queue

In this section, we will see how to define new mutable data structures. We will define a queue, a sequence in which items are inserted at one end (called the rear of the queue) and deleted from the other end (the front). The queue data structure will support the following operations:


make_queue      – construct and return an empty queue
front_queue     – select and return the first item in the queue
insert_queue    – insert a new item at the rear of the queue
delete_queue    – remove the first item in the queue
is_empty_queue  – check if the queue is empty

These operations will internally use the following functions to access and modify the pointers to front and rear of the queue. Two lists will represent the front and rear parts and they enable constant time lookup and insertion operations.


function front_ptr(queue) head(queue)
function rear_ptr(queue) tail(queue)
function set_front_ptr(queue, item) set_head(queue, item)
function set_rear_ptr(queue, item) set_tail(queue, item)


function make_queue() []:[]


function is_empty_queue(q) is_empty(front_ptr(q))

Insertion should check for two conditions. If the queue is empty a list is created with the new item and both the front an rear pointers point to this list. If the queue is not empty, the rear pointer is updated to point to this list. Then the rear pointer is "truncated" to make sure the next insertion need not traverse all the elements in the queue.


function insert_queue(q, item)
  let (new_entry = [item])
    if (is_empty_queue(q))
    { set_front_ptr(q, new_entry)
      set_rear_ptr(q, new_entry) }
    else
    { set_tail(rear_ptr(q), new_entry)
      set_rear_ptr(q, new_entry) }

To select the item at the front of the queue, we return the head of the pair indicated by the front pointer.


function front_queue(q)
  if (is_empty_queue(q))          
    error("front called on an empty queue")
  else
    head(front_ptr(q))

To delete the item at the front of the queue, we merely modify the front pointer so that it now points at the second item in the queue.


function delete_queue(q)          
  if (is_empty_queue(q))
    error("delete called on an empty queue")
  else
    set_front_ptr(q, tail(front_ptr(q)))


let q = make_queue()
insert_queue(q, 1)
insert_queue(q, 2)
front_queue(q)
// 1
delete_queue(q)
insert_queue(q, 3)
front_queue(q)
// 2
delete_queue(q)
front_queue(q)
// 3

Each insert and delete operation mutates the internal state of the queue. An advantage of this is that, we are able to do all operations in constant time. On the downside, we lose a version of the queue with an insert or delete operation. There is no way to "go back" and see the contents of the queue at point past in time. We also cannot easily share this queue across simultaneously executing computations (threads or tasks) without adding code for data synchronization. For applications that require these properties, we need a queue that is immutable.

7.3.2 A Functional Queue

In the pure functional model that disallow assignments, the best we can do is to get amortized constant-time insert and delete operations. That is, any sequence of n insert and delete operations takes a total time that is proportional to some constant times n. Any individual operation might not be constant-time, however.

The functional queue is also represented as a pair of lists, one for the front and the other for the rear. At any instant, the queue content is given by append(front, reverse(rear)). An element is inserted by adding it to the head of rear and deleted by removing it from the head of front. To make this representation work, each element in rear has to be moved to front sooner or later. This should be done only occasionally, when front becomes empty. Then rear is reversed and moved to front. The reverse takes time proportional to the length of rear, i.e., to the number of elements it reverses. Each element that goes through the queue is passed exactly once from rear to front. Allocating the execution time of reverse to each element therefore gives a constant time per element. This is why the queue is amortized.

Having described how our immutable queue works, let us move on to its implementation. The queue constructor needs to accept the front and rear lists as optional arguments. This is because each insert and delete operation has to create a new queue, instead of mutating the existing one. We will also define a helper function called check_queue whose purpose is to reverse and move rear to front if front is empty.


function make_queue(f = [], r = []) f:r
          
function check_queue(q)
  if (is_empty(front_ptr(q)))
    make_queue(reverse(rear_ptr(q)))
  else q

The insert_queue function adds the new item to the head of rear and makes sure the queue is ready to return the next item from its front. Similarly, delete_queue also ensures the required invariant after removing the head of front.


function insert_queue(q, item)
  check_queue(make_queue(front_ptr(q), item:rear_ptr(q)))

function delete_queue(q)
  check_queue(make_queue(tail(front_ptr(q)), rear_ptr(q)))

The program below show operations on the queue. As this queue is immutable, we need to explicitly keep track of instances to reach a desired final state.


let q = insert_queue(insert_queue(make_queue(), 1), 2)
front_queue(q)
// 1
front_queue(delete_queue(insert_queue(delete_queue(q), 3)))
//3

7.4 Modules and Namespaces

The functions that implemented the immutable queue has effectively over-written the imperative queue implementation. Is it possible to have both implementations in the same program, while maintaining a single "protocol" for operations on queues? The obvious solution is to implement the queues as two closures, one imperative and the other functional.

Slogan provide a convenient way to describe closure objects, without having to explicitly define and return a "message-handler" function. This is accomplished by the module expression. A module definition has the following syntax:


module <name> (<export_list>) <definitions>

The export_list contain the names from definitions that must be visible outside the module. These are essentially the "messages" an instance of the module can respond to.


module mutable_queue(make, is_empty:is_empty_queue,
                     front, insert, delete)
{ function front_ptr(queue) head(queue)
  function rear_ptr(queue) tail(queue)
  function set_front_ptr(queue, item) set_head(queue, item)
  function set_rear_ptr(queue, item) set_tail(queue, item)          

  function make() []:[]

  function is_empty_queue(q) is_empty(front_ptr(q))

  function insert(q, item)
    let (new_entry = [item])
      if (is_empty_queue(q))
      { set_front_ptr(q, new_entry)
        set_rear_ptr(q, new_entry) }
      else
      { set_tail(rear_ptr(q), new_entry)
        set_rear_ptr(q, new_entry) }

  function front(q)
    if (is_empty_queue(q))          
      error("front called on an empty queue")
    else
      head(front_ptr(q))  

  function delete(q)          
    if (is_empty_queue(q))
      error("delete called on an empty queue")
    else
      set_front_ptr(q, tail(front_ptr(q))) }

The functions make, is_empty_queue, front, insert and delete are exported and made visible to the consumers of the module. These names constitute the public interface of the module. The functions front_ptr, rear_ptr, set_front_ptr and set_rear_ptr are used in the implementation of the module and need not be published to the outside world.

Note that for all public functions, except for is_empty_queue we have omitted the _queue suffix. This is because the module name makes the extra suffix redundant. We retained the suffix for is_empty_queue so that it won't conflict with the is_empty built-in function used in its implementation. But note that we export is_empty_queue as is_empty. This achieved by the export specification – is_empty:is_empty_queue. The exported name will not conflict with the global is_empty because it is scoped by the module name.


let mq = mutable_queue
let q = mq.make()
mq.insert(q, 1)
mq.insert(q, 2)
mq.front(q)
// 1

mq.delete(q)
mq.insert(q, 3)
mq.front(q)
// 2

mq.delete(q)
mq.front(q)
// 3

Modules are first-class objects in Slogan. They can be assigned to variables, they can be passed as arguments to functions and they can be dynamically created and returned from functions. This is why we could assign the shorter alias mq to refer to the mutable_queue module.


module immutable_queue(make, is_empty:is_empty_queue,
                       front, insert, delete)          
{ function front_ptr(queue) head(queue)
  function rear_ptr(queue) tail(queue)

  function make(f = [], r = []) f:r
          
  function check(q)
    if (is_empty(front_ptr(q)))
      make(reverse(rear_ptr(q)))
    else q

  function is_empty_queue(q) is_empty(front_ptr(q))
    
  function front(q)
    if (is_empty_queue(q))          
      error("front called on an empty queue")
    else head(front_ptr(q))

  function insert(q, item)
    check(make(front_ptr(q), item:rear_ptr(q)))

  function delete(q)
    check(make(tail(front_ptr(q)), rear_ptr(q))) }


let imq = immutable_queue
let q = imq.insert(imq.insert(imq.make(), 1), 2)
imq.front(q)
// 1
imq.front(imq.delete(imq.insert(imq.delete(q), 3)))
// 3

Now we have two separate queue modules that share a single protocol. How can we package these two together so that users can choose between the mutable and immutable implementations as they wish? One way to do this is to put them into a "super" module as shown below:


module queue(mutable_queue, immutable_queue)
{ module mutable_queue(...) { ... }
  module immutable_queue(...) { ... } }


namespace queue
{ module mutable_queue(...) { ... }
  module immutable_queue(...) { ... } }

All members in a namespace are public and are accessible via the :: operator.


let mq = queue::mutable_queue
let imq = queue::immutable_queue

The primary purpose of a namespace is to arrange modules and other namespaces into logical groups. Just like modules, namespaces are first-class objects in Slogan.

¹We did not substitute balance before the assignment operator because the expression 100 = 100 - 20 does not make sense and cannot be evaluated.

7