Thursday, January 16, 2014

99 Clojure Problems – 7: Flatten a Nested List Structure

Example:

(deftest p07-flatten
  (is (= '(1 1 2 3 5 8) (flatten (list (list 1 1) 2 (list 3 (list 5 8)))))))

Solution:

This was the first problem that forced me think a bit harder. It is marked with two asterisks in the original prolog version of the "challenge", meaning a skilled programmer should be able to solve the problem in 30-90 minutes.

It is also the first problem where I'm not happy with my solution. It feels quite imperative too me and its nested if-structures are not exactly easy on the eyes. I used a multiple arity function, which means the function accepts two (or more) different parameter lists. I did this to supply an initial value for an accumulator. Thinking about it now it seems cleaner to me to hide these implementation details completely from the user. This could be achieved by using a let binding or something similar inside the function without polluting the API. The following first "if" checks whether we are dealing with a sequence at all. If not we are just conjoining this non-sequential element with the accumulator. If the input is a sequence we need another "if" to check for an empty sequence which is used as the escape hatch in the recursion that follows: We thread the accumulator through two recursive calls of the function one for the head of the sequence, one for the rest.

Feeling a bit unsatisfied with what I had come up with I went looking for other people's solutions, again, something I had not done before. I found rodnaph github repo with a rather elegant solution based on reduce (I have included it with minor modifications in my code for easy comparison). Reduce saves one noisy if and leaves us with just one check to decide whether an element is a sequence or not. If it is we call the function recursively. But instead of joining individual elements as I did he just concats the resulting flattened sublist onto the accumulator.

Finally I had a quick look at Clojure's built-in flatten. It interprets the nested lists as a tree and uses its tree-seq function to turn this tree into a lazy sequence of its elements. Tree-seq is a higher-order function that takes a branch detector function as its first argument to decide when to descent into the depths of the given data structure. By using sequential? as the branch detector we follow the nested lists as we walk the tree. Tree-seq then conses all nodes together in a lazy sequence. We are not quite done with the resulting sequence because it contains lots of "duplicates" as each branch node also contains in our case all of its children—the root node of our tree for example is the whole nested input list. Therefore we have to filter out all sequences in the result thereby removing the branching nodes. We return the remaining leaves—our flattened list.

Read more about this series.

No comments: