Functionalish Programming
2018-11-03
There already is a myriad of blog posts on functional programming, with this post, I don't try to give an introduction to it, but I want to highlight, how we can get inspirations from it, to improve our code. This is why the post is called "functionalish" programming, instead of "functional" programming.
But before I can talk about what can be learned from it, it is necessary to talk about what functional programming actually is. Let's see what Wikipedia has to say:
In computer science, functional programming is a programming paradigm - a style of building the structure and elements of computer programs - that treats computation as the evaluation of mathematical functions and avoids changing-state and mutable data.
There are already two important principles in there:
The first one, avoiding state change, is a no brainer. Most of us learn very early that it is a bad idea to make a function depend on a global variable. There are some rare cases where that might be a good idea, but generally it's not. If a function is pure, that is its output only depends on its input, and it does not change any sort of global state, like a display for example, then it's easy to test and reliable. Great. But wait, if we aren't allowed to do any i/o because that's state change we cannot write any useful programs. So yeah, we should try to keep functions pure, but only where applicable.
I want to showcase two bugs that cost me a lot of time:
shuffeled_elems = random.shuffle(elems)
. This
function shuffles its argument, it does not return a shuffled version of it.
I was not aware of that, and the fact, that
this
is the first result when googling "python random shuffle" shows that I'm not
the only one. Since in python, a function without return value simply returns
None
there was no easily catchable error.Admittedly, both of those problems could have been prevented by reading the documentation more carefully, or at all in the second case, but avoiding this kind of behavior in your own functions will reduce the probability of errors, which is a good thing.
The second point, "Avoid mutable data", is more controversial. In C++ it would
translate to: only use const
variables, and in python it would translate to:
don't change a variable after its creation.
Of course, I don't want to tell you to not use lists anymore. Since python
lacks some important concepts that functional languages tend to have, like
e.g. tail recursion, it would really not be wise to ban all mutability from your
code. But still, I'd advice to use namedtuples or tuples whenever possible.
If I had used a tuple instead of a list, I would have caught that second bug
instantly.
Another important point in functional programming is to use functions as arguments, and return values. This does not come intuitively, but it can increase code readability a lot.
An example:
coords_of_interest = pipe(mat_acc_map.shape, iter_3D_vol_coords,
filter(lambda coord: mat_acc_map[coord] != 0),
tuple)
I guess two things need to be explained before this becomes clear:
pipe
is a function that takes anything as first argument, and then a bunch
of functions for its varargs. It then feeds the first argument as input to the
first function, the output of the first function as input to the second
function, and so on, and in the end it will return the return value of the last
function. (And in the above example, the input is a numpy array)filter
is curried, which means that if you call it with an incomplete set
of arguments, it will return a function which only takes the remaining
arguments, and remembers the one, that were already set. So if you have a
curried version of filter
you can define a function that takes an iterable
and only returns those elements that are even with
even_only = filter(lambda x: x % 2 == 0)
(By the way, pipe
and filter
are both defined in the awesome toolz library)
So, provided the knowledge over pipe
and curried functions, the source code
example above becomes quite easy to read: "the coordinates of interest are
those coordinates where the value in the array is not zero"
The Code above also has another advantage which does not immediately become obvious.
iter_3D_vol_coords
and filter
both return generators. Which means every
result is computed lazily, i.e. when it's queried. So, tuple
looks at the
first element, that filter
returns, then filter
looks at the first element,
that is returned by iter_3D_vol_coords
, if it is suitable it will be
returned, otherwise the next element will be tested. This avoids to first have
a complete list of coordinates, and then filter it. In this example the
difference isn't too big, but in a lot of scenarios it can reduce the memory
consumption by a lot. If you don't know generators, I recommend
this
answer on stackoverflow.
I want to conclude this post with a few rules of thumb, which I try to respect as much as possible:
pipe
and curry
(but it also has a lot of other
very convenient functions)Following these few rules has made my code safer, easier to read, easier to parallelize, and less RAM intensive.