A Heap for Proletarians

2017-June-22

Hash tables and arrays typically support efficient access to arbitrary elements. What if we need efficient access to elements based on some priority? Say for example, we need to fetch the minimum element in O(1) time. The data type that provide this kind of access is known as a priority queue. Priority queues are typically implemented as heap-ordered trees, in which the element at each node has higher "priority" than the elements at its children. Under this invariant, the element with the top priority will always be at the root of the tree. The priority of each node will be determined by a predicate, like one of the comparison operators.

In this post we take a detailed look at the implementation of a purely functional priority queue using a variant of the heap data structure known as the leftist heap. We will do the implementation in two steps. First, a core set of Clojure functions that manipulate a leftist heap is implemented. Then we will wrap these functions into an object that can be recognized by the built-in sequence functions – first, rest and conj.

The Leftist Heap

In a leftist heap, each node has an associated s-value or rank which is the node's distance to the nearest leaf (an empty-node). The leftist heap also maintains the invariant that the right subtree of each node has the lowest rank. In other words, the left subtree of a node will be at least as large as the right subtree. It is usually taller than the right subtree, giving the heap its name. As a consequence, the right subtree will always have the shortest path to an empty node.

The right subtree of a leftist heap is stored in sorted order. This means two leftist heaps can be merged by simply merging their right subtrees and then swapping the nodes along this path as necessary to restore the leftist property.

Merging leftist heaps has worst-case O(log n) complexity and this is what makes this data structure interesting. The merge operation we just described can be expressed in Clojure as,


(defn heap-merge
  [h1 h2]
  (cond
    (empty-heap? h1) h2
    (empty-heap? h2) h1
    :else (let [[_ x a1 b1] h1
                [_ y a2 b2] h2]
            (if (<= x y)
              (make-node x a1 (heap-merge b1 h2))
              (make-node y a2 (heap-merge h1 b2))))))

Heap-merge makes use of a few helper functions. The make-node function encodes a single node as the list [rank, node-value, left-subtree, right-subtree]. It also calculates the rank of the node and swaps the left and right subtrees as needed.


(defn make-node
  [obj subtree-a subtree-b]
  (let [ra (node-rank subtree-a)
        rb (node-rank subtree-b)]
    (if (>= ra rb)
      [(inc rb) obj subtree-a subtree-b]
      [(inc ra) obj subtree-b subtree-a])))

The node-rank function returns the rank associated with a node:


(defn node-rank
  [h]
  (if (empty-heap? h)
    0
    (first h)))

An empty heap is represented by the keyword :E.


(def empty-heap :E)

(defn empty-heap?
  [h]
  (= h empty-heap))

With the efficient merge function in place, we may proceed to implement the interface of the priority queue. This consists of three functions – pq-insert, pq-find-min and pq-delete-min.

Pq-insert creates a single-node tree with the new element and merges it with an existing heap:


(defn pq-insert
  [obj h]
  (heap-merge [1 obj empty-heap empty-heap] h))

Pq-find-min return the root of the heap:


(defn pq-find-min
  [h]
  (when-not (empty-heap? h)
    (let [[_ obj _ _] h] obj)))

Pq-delete-min will discard the root and return a new heap constructed by merging the two subtrees:


(defn pq-delete-min
  [h]
  (if (empty-heap? h)
    h
    (let [[_ obj left right] h]
      (heap-merge left right))))

As pq-insert and pq-delete-min basically does a merge, both of them run in O(log n) time. Pq-find-min runs in O(1) time.

With these core functions in place, we are ready to use the leftist-heap priority queue!


clojure> (def pq (pq-insert 231
                   (pq-insert 154
                    (pq-insert 900 empty-heap))))
clojure> (pq-find-min pq)
;; 154
clojure> (def pq (pq-delete-min pq))
clojure> (pq-find-min pq)
;; 231
clojure> (def pq (pq-delete-min pq))
clojure> (pq-find-min pq)
;; 900
clojure> (def pq (pq-delete-min pq))
clojure> (pq-find-min pq)
;; nil

Exercise 1. Heap-merge uses the <= numeric comparison function for prioritizing the nodes. Make this function more generic by making the priority-operation configurable by the user.

The Priority Queue as a Seq

Clojure defines many useful operations in terms of sequences. A sequence is immutable and persistent. As a result, they are thread-safe and can share their state freely.

The priority queue we implemented in the previous section is also immutable and persistent. So it would be a good candidate for being a sequence. Let us make the priority queue respond to the first, rest and conj functions. These functions in turn can be used to describe more sophisticated sequence processing operations. This way our custom data structure is given a chance to blend nicely with the rest of Clojure.

To be treated as a sequence, a type has to implement the clojure.lang.ISeq interface. We will use the deftype macro to define a LeftistHeapPriorityQueue that implements the ISeq interface.

While iterating over a sequence, it is idiomatic in Clojure to call the seq function to check for termination. A priority queue may sometimes be involved in an iteration, as part of a debugging session or for fetching all values sorted by priority. So let us also make our new type respond properly to the seq function call. This function is declared in the clojure.lang.Seqable interface.

All these requirements leads to the following final definition of LeftistHeapPriorityQueue:


(deftype LeftistHeapPriorityQueue [pq]
  clojure.lang.ISeq
  ;; return a seq of the rest of the elements,
  ;; return nil if no elements remain.
  (next [_] (let [pq-rest (pq-delete-min pq)]
              (when-not (empty-heap? pq-rest)
                (LeftistHeapPriorityQueue. pq-rest))))

  ;; return the first element.
  (first [_] (pq-find-min pq))

  ;; return the sequence with the first item removed,
  ;; or an empty sequence if no items remain.
  (more [this] (or (next this) '()))`

  ;; return a new sequence with `obj` prefixed,
  ;; invoked via the global `conj`.
  (cons [_ obj] (LeftistHeapPriorityQueue. (pq-insert obj pq)))

  clojure.lang.Seqable
  (seq [this] (when (.next this) this)))

The helper function pq-insert-all used by cons is,


(defn pq-insert-all
  "Insert all elements from the sequence `objs` into the
   priority queue `pq`."
  [objs pq]
  (loop [xs objs, pq pq]
    (if (seq xs)
      (recur (rest xs) (pq-insert (first xs) pq))
      pq)))

Exercise 2. The pq-insert-all function runs in O(n log n) time. Is a more efficient (as in O(n)) alternative implementation possible?

To make it convenient to create priority queues from a sequence of values, let us define a "constructor" of LeftistHeapPriorityQueues:


(defn priority-queue
  ([objs]
   (LeftistHeapPriorityQueue. (pq-insert-all objs empty-heap)))
  ([]
   (priority-queue [])))

Finally, by implementing the print-method multimethod, we can let the REPL know how to display a LeftistHeapPriorityQueue literal:


(defmethod print-method LeftistHeapPriorityQueue
  [tree ^java.io.Writer w]
  (.write w (str "#lhpq<" (.first tree) " ...>")))

Before we say goodbye, here is a REPL session with our new sequence type:

    
clojure> (def a (priority-queue [1 123 12 45]))
clojure> (seq a)
;; #lhpq<1 ...>
clojure> (seq? a)
;; true
clojure> (first a)
;; 1
clojure> (def a (rest a))
clojure> (first a)
;; 12
clojure> (def a (conj a 50))
clojure> (first (rest a))
;; 45
clojure> (first (rest (rest a)))
;; 50

Downloads

The full source code for this post.

References

The original implementation of the leftist-heap priority queue can be found in Chapter 3 of Chris Okasaki's Purely Functional Data Structures.
Inside Clojure/Sequences.