A comparison of Peg-Thing written in Clojure, Python and functional Python
2019-08-19
Currently, I am learning Clojure reading the great (and free) book Clojure for the brave and true. In chapter 5, the game Peg-Thing is implemented. I thought this would be a good opportunity to compare Python and Clojure. I reimplemented the game in Python two times. First in the 'normal' Python way and a second time in a way that seemed to me like the ideal functional implementation in Python.
Peg-Thing, for those that don't know, is a single player board game. The board has a triangular shape, is missing one peg in the beginning. The following images were ruthlessly stolen from the aforementioned book:
To make a move, you take one peg and move it over another peg to a free position, and remove the peg that was passed over:
The goal is to remove as many pegs as possible before the board reaches a state without valid moves, in which case the game is over.
All implementations utilize the series of triangular numbers, which is defined as follows:
So the series begins with 1, 3, 6, 10, 15, ...
. These numbers represent the
number of the last field in the nth row of a triangle:
Before we start looking at Clojure code, I will provide a little crash course on how to read it.
In Clojure, a function call is written as a list where the first element is the
function name and the remaining elements are its arguments. For example, calling
a function named foo
without arguments would look like this: foo()
in Python
and like this: (foo)
in Clojure. This is the only way to invoke a function in
Clojure, therefore foo(x + 1)
would be (foo (+ x 1))
in Clojure. But
actually a clojure programmer would much rather write (foo (inc x))
as inc
is short for "increment" and returns one plus its argument.
Functions are defined via defn
like this:
(defn <function-name> [<args>]
<Body>)
Functions can be overloaded in which case a function definition looks like this:
(defn <function-name>
([<args1>] <Body1>)
([<args2>] <Body2>))
Anonymous functions can be defined via fn
like this: (fn [<args>] <Body>)
and can be overloaded too. A shortcut to define an anonymous function is
#(<Body>)
. In the second version the %-sign can be used to refer to the
argument if it is the only one or %<n>
for the nth argument if there are
more.
To assign a value to a symbol in the global namespace you can use def
e.g.
(def my-val 1)
or (def my-func (fn [] (println "Hello")))
. def
always
creates a symbol in the global scope, even if called from within a function. To
create a local variable let
is used like this:
(let [<name1> <value1>
<name-n> <value-n>]
<Body>)
The symbols defined by let
are only valid within its body.
When I first started writing this article I wanted to present every version one
by one and then compare them afterwards. I had to realize that this would make
one very long article and therefore decided to directly compare interesting
code snippets. You can download the code for all versions here. For
convenience, I packed together some functional utility functions from other
libraries with Toolz and added some aliases and imports I commonly use. Now I
can use from toolzEx import *
and have everything at my fingertips and it's
used in the functional Python version. In case you want to run them yourself,
you can find it here.
First, we need to generate the triangular numbers. In Clojure it's done like this:
(defn tri*
([] (tri* 0 1))
([c-sum n]
(let [new-sum (+ c-sum n)]
(cons new-sum (lazy-seq (tri* new-sum (inc n))))))
The first line says "define a function named tri*" (Clojure is very generous
with allowed characters in symbol names). This function has two versions: one
without arguments, which is defined in the second line, and one with two
arguments which is defined from the third line on. The first version does
nothing but calling the second version with the arguments 0
and 1
.
The second version is a little mind-bending - first, I'll quickly explain the
elements it consists of:
So calling tri*
without arguments will return a lazy sequence which contains
the already computed value for the first element and the recipe to compute
the rest.
This sequence is then stored with the name tri
like this:
(def tri (tri*))
Then, there are a few utility functions to work with the sequence. I am not going to explain them in depth (if you are interested in that, you can find more details in the book ), but the comments will explain what they do:
; Check whether n is triangular
(defn triangular? [n]
(= n (last (take-while #(<= % n) tri))))
; Get the nth triangular number, i.e. the number of the nth row.
(defn row-tri [n]
(last (take n tri)))
; Find the row of pos by checking the index of the first tri-num
; that is greater than pos
(defn row-num [pos]
(inc (count (take-while #(> pos %) tri))))
This has a certain elegance to it as at any time any number can be requested and only the minimally required amount of computation will take place.
In Python, we could use a generator for this:
def tri_num_gen():
acc = 0
n = 1
while True:
yield acc
acc += n
n += 1
tri_nums = [tri for tri, i in zip(tri_num_gen(), range(26))
def row_num(x):
return next(i for i, tri in enumerate(tri_nums)
if tri >= x)
This approach is much more blunt. We simply generate the first 26 numbers and
hope that they will suffice. On the other hand it has some advantages. We don't
need to define a function to check whether n
is a triangular number as we can
use n in tri_nums
and we don't need an equivalent for row-tri
because we can
use tri_nums[n]
.
The only thing that differs in the functional Python version is:
tri_nums = ltake(26, tri_num_gen())
def row_num(x):
return first(i for i, tri in enumerate(tri_nums)
if tri >= x)
take
and first
are from Toolz,
and ltake(...)
is short for list(take(...))
. I think this version is
a little nicer to read, but it's basically the same.
The Clojure community follows the motto "it is better to have 100 functions operate on one data structure than 10 functions on 10 data structures". And consequently the Clojure implementation uses nested maps (or dicts in Python lingo) to represent the board. The data structure looks like this:
{1 {:pegged true, :connections {6 3, 4 2}},
2 {:pegged true, :connections {9 5, 7 4}},
3 {:pegged true, :connections {10 6, 8 5}},
4 {:pegged true, :connections {13 8, 11 7, 6 5, 1 2}},
5 {:pegged true, :connections {14 9, 12 8}},
6 {:pegged true, :connections {15 10, 13 9, 4 5, 1 3}},
7 {:pegged true, :connections {9 8, 2 4}},
8 {:pegged true, :connections {10 9, 3 5}},
9 {:pegged true, :connections {7 8, 2 5}},
10 {:pegged true, :connections {8 9, 3 6}},
11 {:pegged true, :connections {13 12, 4 7}},
12 {:pegged true, :connections {14 13, 5 8}},
13 {:pegged true, :connections {15 14, 11 12, 6 9, 4 8}},
14 {:pegged true, :connections {12 13, 5 9}},
15 {:pegged true, :connections {13 14, 6 10}},
:rows 5}
as opposed to Python, no colon is needed between keys and values. Also, commas are not required to separate key value pairs, but can be used since they are treated as white space. Just imagine the words starting with a colon were strings. In this map, the positions are used as keys. For every position, we store whether it is pegged, and the target fields that you can jump to together with the fields you jump over to get there. For example: from field 1 you can jump to 6 by hopping over 3 and to 4 by hopping over 2.
But we don't define the board statically. We compute it by the number of rows it should have:
(defn new-board [rows]
(let [max-pos (row-tri rows)]
(reduce (fn [board pos] (add-pos board max-pos pos))
{:rows rows}
(range 1 (inc max-pos)))))
it reduces the possible positions over add-pos
with {:rows rows}
as initial
value. In case you don't know how reduce
works, you can find an explanation
here.
Then add-pos
will add all valid connections:
(defn add-pos [board max-pos pos]
(-> (assoc-in board [pos :pegged] true)
(connect-right max-pos pos)
(connect-down-left max-pos pos)
(connect-down-right max-pos pos)))
assoc-in
will return a new map where the key-path [pos :pegged]
is set to
true. The ->
macro will then take the output of that and provide it as
leftmost input to the connect-right
function. So the line with connect-right
becomes (connect-right (assoc-in board [pos :pegged] true) max-pos pos)
,
the result of that passes as leftmost argument to
connect-down-left
and the result of that to connect-down-right
. The last
result will then be returned. The connection functions will always create two
connections, the one they are named after, and the inverse one. I think this is
a pretty elegant approach as it defines a pipeline of computations that the
initial board "flows" through.
The three connect functions are all pretty similar and I will only show you one of them:
(defn connect-down-left [board max-pos pos]
(let [row (row-num pos)
neighbor (+ row pos)
destination (+ 1 row neighbor)]
(connect board max-pos pos neighbor destination)))
First, row
is defined and then used to get the left down neighbor (in row n,
you have to add exactly n to get there) and then this is repeated to get the
destination. Finally connect
is called to create the connection.
connect
looks like this:
(defn connect [board max-pos from over to]
(if (> to max-pos)
board
(reduce (fn [nb [from to]]
(assoc-in nb [from :connections to] over))
board
[[from to] [to from]])))
If to
(the destination) is greater than max-pos
it would not be on the
board, therefore nothing is done, and the original board is returned. This way
invalid connections are ignored. The anonymous function that is used in reduce
will simply add a the connection, first with the connection ([from to]
) then
with the inverse connection ([to from]
).
In this version, I implemented the board as a class that has a list of fields. A field is a class itself, containing a variable to note whether it is pegged, its own id and a list of possible connections which are named tuples:
Connection = namedtuple('Connection', 'start, to, by')
@dataclass
class Field:
id: int
pegged: bool
connections: List[Connection]
class Board:
def __init__(self, rows):
self.fields = [Field(i, True, []) for i in range(tri_nums[rows] + 1)]
self.rows = rows
self.score = 0
for f in self.fields:
Board._connect(f, self.right_of)
Board._connect(f, self.down_left_of)
Board._connect(f, self.down_right_of)
In vanilla Python, one would use a mutable object here, so I first created the
board with all its fields and then added the connections via the static method
_connect
, which looks like this:
@staticmethod
def _connect(field, next_field):
n = next_field(field)
nn = next_field(n) if n else None
if n and nn:
field.connections.append(Connection(field, nn, n))
nn.connections.append(Connection(nn, field, n))
This time, the targets of the connections are no integers but the actual field
objects. One of the next_field
methods looks like this:
def right_of(self, field):
if field.id not in tri_nums:
return save_get(self.fields, field.id + 1)
The nice thing about this is that it is much easier to understand than the clojure version. Another advantage will become apparent when we take a look at the other methods. Nevertheless this is mutable, and I really have developed a strong appreciation for immutability which tends to prevent a lot of nasty bugs.
Actually, I implemented this version twice, the first time mimicking the clojure implementation, but found that I prefer object notation, which I find easier to read, so I did it a second time, which is the version I'll show here, though I did include the first one in the zip.
I wanted something closer to the vanilla Python version. For that I used
the library Pyrsistent. It provides immutable equivalents for the standard
data structures as well as means to create immutable classes. I used
their PRecord
class which allows to make a class that is basically an
immutable dict with key access like a namedtuple:
Connection = namedtuple('Connection', 'start, end, by')
class Position(PRecord):
pegged = field(bool, initial=True)
connections = pvector_field(Connection)
class Board(PRecord):
rows = field(int)
positions = pmap_field(int, Position)
So far this looks pretty similar to the vanilla version, the difference becomes
apparent if we look at the initialization which I moved to a static method as I
didn't want to mess with the default __init__
. A notable difference is that
here a pmap
is used to store the fields.
@curry
def connections(max_pos, dir_func, pos):
by = dir_func(pos)
end = dir_func(by) if by else None
return [Connection(pos, end, by),
Connection(end, pos, by)]\
if end is not None and end <= max_pos else [None]
class Board(PRecord):
rows = field(int)
positions = pmap_field(int, Position
@staticmethod
def new(rows):
cs = connections(tri_nums[rows])
con_map = thread_last(range(1, tri_nums[rows] + 1),
(mapcat, juxt(cs(right), cs(down_right), cs(down_left))),
concat,
(filter, bool),
(groupby, X.start))
return Board(rows=rows, positions=pmap(
valmap(lambda cons: Position(connections=pvector(cons)),
con_map)))
This code is a little harder to read again. I guess that's natural as going for
immutability is a restriction. Let's look at the new
method.
thread_last
is the same as Clojure's ->
macro
except for the fact that the result from one call is provided as the last argument
to the next call (which makes it completely identical to Clojure's ->>
).
The logic here is as follows: We can't create the fields first and add the
connections afterwards as the fields are immutable, so we have to add the
connections to the fields when they are created. Therefore the connections are
generated first by the connections
function. It is curried which means that
calling it with incomplete parameters will return the function with the provided
parameters already set (i.e.
connections(x, y, z) == connections(x)(y,z) == connections(x)(y)(z)
).
juxt
returns a function that applies all provided functions (the three connect
versions) to its parameter, and returns the results in a list. mapcat
is a
normal map with a concatenate afterwards, so instead of a list of lists of lists
of connections only a list of lists of connections will be returned.
This is then concatenated again, so that we are left with a list of connections
and Nones for those cases where the requested connection was invalid. These are
then filtered out in the next step, where every entry in the list that evaluates
to False
(i.e. the None
s) will be filtered out. The X
in X.start
is just a
rename of fn's underscore, and returns a
function that accesses the start
attribute of its argument. Consequently the
list of connections is transformed into a dict where all connections that start
from the same positions are found under that position as key.
This dict is then transformed into a dict with the same keys where the values are
Position
objects (by valmap
), which is then assigned to the Board
that is
returned by the new method.
For completeness, this is the right
function:
def right(pos):
return pos + 1 if not pos in tri_nums else None
In Clojure the valid-move?
function looks like this:
(defn valid-move? [board from to]
(get (valid-moves board from) to))
It queries all valid moves for the position where move starts, and tries to get
the key with the target position from the resulting dict. If the key exists, an
integer is returned, which evaluates to true (in clojure everything
evaluates to true
except for nil
and false
). valid-moves
looks like
this:
(defn valid-moves [board pos]
(into {}
(filter (fn [[destination jumped]]
(and (not (pegged? board destination))
(pegged? board jumped)))
(get-in board [pos :connections]))))
The point where the computation starts is (get-in board [pos :connections])
which returns the connection map for the given position. In Clojure, iterating a
dict will automatically yield the key value pairs (like dict.items
in Python).
These pairs are then filtered by checking whether the destination position is
unpegged and the jumped position is pegged. The results are then integrated into
a map again with into
. pegged?
is defined as follows:
(defn pegged? [board pos] (get-in board [pos :pegged]))
Logic-wise I really like this implementation, but having to start to read at the
bottom of a valid-moves
feels a little strange. That might be due to lacking
familiarity though.
This version is pretty straightforward:
def valid_connection(self, start, to):
connections = [c for c in self.fields[start].connections
if c.to.id == to]
if len(connections) == 0:
return False
c = connections[0]
if c.start.pegged and c.by.pegged and (not c.to.pegged):
return c
This is where we can really profit of the object notation, it allows for easy and very readable handling of board logic.
The structure of the programs is not completely identical as I wrote them with multiple day gaps, in the functional version the function looks like this:
def find_connection(self, start, end):
try:
con = first(con for con in self(start).connections
if con.end == end)
if self(con.start).pegged and self(con.by).pegged \
and not self(con.end).pegged:
return con
except StopIteration:
return None
I added a __call__
method to the class to directly access the fields by index.
I think this is easy to read as well as elegant. Of course it could have been
done like this in vanilla Python too, if the first
was replaced by a next
.
The main difference is how we access the pegged
fields. I like the vanilla
version in that regard much more, and I thought about how to do this in
this version for quite a while until I realized it is not possible. In the
vanilla version there is a reference circle, which can only be created if you
can manipulate an object after its creation. While I liked the notation for the
pegged
checks that it enables, this is a typical scenario where a more complex
app could have some nasty bugs. And making this mistake is impossible by using
functional idioms. Yay!
In clojure, applying a move to the board looks like this:
(defn make-move [board from to]
(if-let [jumped-pos ( valid-move? board from to )]
(-> (remove-peg board jumped-pos)
(move-peg from to))))
If the move is valid, the jumped peg is removed, and the jumping peg is moved.
This represents the game logic quite well.
Here are move-peg
and remove-peg
:
(defn remove-peg [board pos] (assoc-in board [pos :pegged] false))
(defn place-peg [board pos] (assoc-in board [pos :pegged] true))
(defn move-peg [board from to] (place-peg (remove-peg board from) to))
In the Python version I made sure that only valid moves would be passed to this function, therefore there is no check for validity here.
def update(self, move):
move.start.pegged = False
move.by.pegged = False
move.to.pegged = True
self.score += 1
move
is a connection object as returned by valid_connection
. And we can profit
again from that pesky mutability.
The functional version looks like this:
def make_move(self, start, end, by):
return self.toggle(start).toggle(end).toggle(by)
I like this version the most by far. Each toggle returns a new version of the board,
with the applied change, so we can chain the operations with a dot. It is easy
to read and immutable, jackpot! Here is the toggle
method:
def toggle(self, pos):
return self.transform(['positions', pos, 'pegged'], op.not_)
transform
is defined by PRecord
and as assoc-in
takes a 'path' to the value.
Unlike assoc-in
it takes a function by which the value is changed instead of
taking a new value.
In clojure the main logic is distributed along the IO-functions:
prompt-rows
asks for how many rows the board should have, then creates
the board and calls prompt-empty-peg
which asks which peg to remove, and
removes it. prompt-empty-peg
then calls prompt-move
to ask the player for
a move. Depending on whether the move was valid or not prompt-move
will call
user-entered-invalid-move
, or prompt-move
again, or possibly game-over
.
This is a lot of boring IO-code, and I will not show it here. I don't like this
design as it could (even in clojure) blow up the stack if someone decides to
play enough rounds (which is in the millions and completely unrealistic, but
still).
The normal python version looks like this:
def main():
size = query_size()
board = Board(size)
start_pos = query_valid_start_pos(board)
board.remove_peg(start_pos)
while board.valid_move_exists():
print('This is your board:')
print(board)
move = query_valid_move(board)
board.update(move)
print('This is your final board state:')
print(board)
print('You removed', board.score, 'pegs')
and the functional Python version looks like that:
@trampoline
def play(board):
if not board.valid_move_exists():
return board
print('\nThis is your board:')
print(board)
move = query_valid_move(board)
return R(board.make_move(*move))
def main():
size = query_size()
init_board = Board.new(size)
print(init_board)
start_pos = query_valid_start_pos(init_board)
final_board = play(init_board.toggle(start_pos))
print('This is your final board state:')
print(final_board)
print('You removed', final_board.missing_pegs() - 1, 'pegs')
To avoid mutability I replaced the while loop with a recursive function, the
trampoline
is stolen from fn and allows the use of recursion without
blowing up the stack. Returning R
means "call me again, and use these
parameters".
It's safe to say that the mutable python code is the easiest to read, while the clojure code is the hardest (at least for a Python programmer). Using functional idioms in python is comfortably possible, and makes it easier to avoid bugs by using libraries like Toolz, Fn and Pyrsistent. In Python the functional versions will run a little slower but in most scenarios that does not matter, Clojure on the other hand will probably run faster (I didn't run any benchmarks). In terms of size they are all pretty similar: the clojure file has 142 lines, the vanilla version 121 and the functional version has 124. That being said, I've put way more effort into the functional version than into the vanilla one, and it could certainly be shorter. The same goes for the clojure version, which I assume was optimized by the author of Clojure for the brave and true to be understandable and not to be short or efficient. While it still seems a little strange I am definitely looking forward to learn more Clojure. Again, you can download the source here.
Thanks for reading!