Implementing a Custom Sequence Data Structure in Clojure

March 30, 2016

Tonight we’re partying with Clojure, and it’s Bring Your Own Data structure.

I was working on a problem on exercism.io recently, and I lost my mind a little bit. I learned from The Joy of Clojure that Clojure has some fantastic facilities for extending the language. You can create your own data types that work with lots of core functions (like map, reduce, conj, etc.). This problem seemed like a great chance to try that out!

So welcome to the party! Let’s get started.

Clojure Sequences

Anyone who has played with Clojure, even briefly, is sure to have encountered sequences. Try running map on a vector:

(map even? [1 2 3])         ;; => (false true false)
(class (map even? [1 2 3])) ;; => clojure.lang.LazySeq

You get a sequence back, instead of another vector! This is a great example of Clojure’s interface-oriented philosophy. Anything that implements the clojure.lang.ISeq interface is a sequence. All the basic collection types in Clojure (vectors, lists, hash-maps, sets, strings, etc) implement a function called seq that returns a sequence that is specialized to iterate over each particular collection type. So in our example, when we run map over the vector, we’re actually getting back a sequence that was generated by iterating over our vector.

This sequence interface is what drives the functions map, reduce, first, rest, conj, and more. Any type that implements the sequence interface can be used with any of these functions automatically. So we can let our Binary Search Tree type get in on that action by implementing ISeq. The interface consists of 7 functions:

cons - add an element to the sequence
empty - return a new, empty sequence of the same type
equiv - compare two sequences for equality
first - return the first item in the sequence
more - return the sequence with the first item removed, or an empty sequence if no items remain
next - return the sequence with the first item removed, or nil if no items remain
seq - return the sequence if it has any items, otherwise return nil

`defrecord` vs `deftype`

When you start looking into ways to implement your own types in Clojure, you quickly run across two possibilities that may seem very similar: defrecord and deftype. It was hard for me to suss out the difference, but here is what I’ve gleaned.

defrecord is great for putting some structure around hash-maps. Think SQL tables; each record in the table has the same columns and its own values. Clojure Records give you that same sort of semantic, with additional power as well. Records come with a lot of hash-map behavior out-of-the-box, so if you’re looking to put some guarantees around associative data structures, records might be just what you need.

deftype, on the other hand, seems better suited for defining your own basic data structures. Think linked lists, stacks, queues, etc. Types defined with deftype don’t give you any semantics (hash-map or otherwise) out-of-the-box. But that leaves things cleaner for building your own low-level data structures.

So for our binary search tree, we’ll use deftype.

The implementation

The syntax for deftype macro threw me at first, but apparently it’s a common pattern in Clojure (see also reify, defprotocol, Om’s defui). deftype essentially boils down to this: define arguments for constructing an instance of your type, and the interfaces with their functions your type implements.

So lets see it already! Here’s my implementation of the binary search tree:

(defprotocol IBinarySearchTree
  (left [this])
  (right [this])
  (value [this]))

(deftype Tree [root]
  IBinarySearchTree
  (left [this] (Tree. (:left root)))
  (right [this] (Tree. (:right root)))
  (value [this] (:value root))

  clojure.lang.ISeq
  (cons [this v]
    (letfn [(cons* [{:keys [left right value] :as node} v]
              (if node
                (if (> v value)
                  (assoc node :right (cons* right v))
                  (assoc node :left (cons* left v)))
                {:value v}))]
      (Tree. (cons* root v))))
  (empty [this] (Tree. nil))
  (equiv [this other] (= root (.root other)))
  (first [this]
    (letfn [(first* [{:keys [left value]}]
              (if left
                (recur left)
                value))]
      (first* root)))
  (more [this]
    (letfn [(more* [{:keys [value left right] :as node} path]
              (cond
                left (recur left (conj path :left))
                (seq path) (Tree. (assoc-in root path right)) ;; right may be nil
                :else (Tree. right)))]
      (more* root [])))
  (next [this] (seq (.more this)))
  (seq [this] (when (contains? root :value) this)))

I want to circle back to the algorithms, but first, let’s see how it works:

(def a-tree (Tree. nil)) ;; => #Tree{}
(value a-tree) ;; => nil

(def another-tree (conj a-tree 2 1 3)) ;; => #Tree{:value 2, :left {:value 1}, :right {:value 3}}

(.value another-tree) ;; => 2
(.left another-tree) ;; => #Tree{:value 1}
(.right another-tree) ;; => #Tree{:value 3}

(first another-tree) ;; => 1
(rest another-tree) ;; => #Tree{:value 2, :right {:value 3}}
(map inc another-tree) ;; => (2 3 4)
(reduce + another-tree) ;; => 6

So now the party’s started

The algorithm

Let me first disclaim that I know these implementations aren’t super optimal. For example, they use mundane recursion extensively. Please forgive me, because the point of this post is to show how to implement custom data structures, not how to implement binary search tree algorithms.

Ok, onto the implementation! Most of the functions are straight forward once you know what they’re supposed to do (documentation for all of this kinda sucks at the moment). There are really just two functions that aren’t trivial: cons and more, but before we dig into those I want to talk about the overall strategy.

The binary tree is implemented simply with nested hash-maps. There is a root node (given in the constructor) that has keys :value, :left, and :right. Traversal down the tree follows the :left and :right branches. nil indicates there is no branch in that direction.

`cons`

(cons [this v]
  (letfn [(cons* [{:keys [left right value] :as node} v]
            (if node
              (if (> v value)
                (assoc node :right (cons* right v))
                (assoc node :left (cons* left v)))
              {:value v}))]
    (Tree. (cons* root v))))

This function needs to add an element to our collection. For a binary search tree, that means we need to walk the nodes from the root, going left if the new value is smaller than the node’s value and right if it’s bigger. The function needs to return an instance of our data type. So we simply use recurse down through the tree until we find the right spot for the new value, and we assoc it in. Once our nested hash-map has been added to, we return a new tree with this new root node.

`more`

(more [this]
  (letfn [(more* [{:keys [value left right] :as node} path]
            (cond
              left (recur left (conj path :left))
              (seq path) (Tree. (assoc-in root path right)) ;; right may be nil
              :else (Tree. right)))]
    (more* root [])))

More needs to return an instance of our Tree with the first element removed. The strategy I went with is to go left down the tree until you can’t go any farther left. Now, you’ve found the smallest item in the tree. This is the element we want to remove for the next Tree. So if it has a :right branch, replace this smallest node with that branch. If it doesn’t have a :right branch, then replace this smallest node with nil.

Some practical tips

Override the print method for your type

I skipped this part in the implementation section above to stay on point, but overriding the print method for your data type is very useful. In fact, I got weird errors in the REPL before I did this:

bst=> (conj (Tree. nil) 1)
UnsupportedOperationException nth not supported on this type: Tree  clojure.lang.RT.nthFrom (RT.java:947)

For me, this error came from this line in the default printing function Clojure uses behind the scenes.

If we wanted to, we could implement the Indexed interface and define nth ourselves, but I don’t think it makes much sense to do so for a binary tree.

Instead, another approach we can take is to implement the print-method multimethod for our data type. For this type, I just went with printing the hash-map prefixed with the word “Tree”:

(defmethod print-method Tree [tree ^java.io.Writer w]
  (.write w (str "#Tree" (.root tree))))

Write a helper function to create instances more easily

Creating trees by using conj repeatedly is annoying and clumsy. Write little helper functions to make creating instances of your data type easier:

(defn tree [& items]
  (reduce conj (Tree. nil) items))

(tree 2 1 3) ;; => #Tree{:value 2 :left {:value 1} :right {:value 3}}

Take advantage of your closed over constructor arguments

Even though the syntax looks funny, the deftype macro is still a closure. This means you have full access to your type’s constructor arguments from within your function definitions! Take advantage of that. There’s no need to implement “getter”-style functions to read the data you were constructed with. Remember — objects are a poor man’s closures.

Speaking with a LISP

October, 24 2016

Reading code is like reading any language. With practice, even LISPs like Clojure can become second nature. Learn to read and write Clojure fluently.

Read

Scripting a Dev Environment with Tmux

February, 13 2016

Read

Implementing a Custom Sequence Data Structure in Clojure

Clojure Sequences

`defrecord` vs `deftype`

The implementation

The algorithm

`cons`

`more`

Some practical tips

Override the print method for your type

Write a helper function to create instances more easily

Take advantage of your closed over constructor arguments

Further reading

Categories

Other posts

Speaking with a LISP

Scripting a Dev Environment with Tmux

Implementing a Custom Sequence Data Structure in Clojure

Clojure Sequences

defrecord vs deftype

The implementation

The algorithm

cons

more

Some practical tips

Override the print method for your type

Write a helper function to create instances more easily

Take advantage of your closed over constructor arguments

Further reading

Categories

Other posts

Speaking with a LISP

Scripting a Dev Environment with Tmux

`defrecord` vs `deftype`

`cons`

`more`