Tuesday, April 14, 2015

Python Idiom: Collection Pipeline

A common implementation involves calling a set of functions sequentially, with the result of each call being passed to the next.

from math import sqrt, ceil
def transform(value):
   x = float(value)
   x = int(ceil(x))
   x = pow(x, 2)
   x = sqrt(x)
   return x

This is less than ideal because it's verbose and the explicit variable assignment seems unnecessary.  However, the inline representation may be a little tough to read, especially if you have longer names, or different fixed arguments.

from math import sqrt, ceil
def transform(value):
   return sqrt(pow(int(ceil(float(value))), 2))

The other limitation is that the sequence of commands is hard coded.  I have to create a function for each variant I may have.  However, I may have a need for the ability to compose the sequence dynamically.

One alternative is to use a functional idiom to compose all the functions together into a new function.  This new function represents the pipeline that the previous set of functions ran the value through.  The benefit is that we extract the functions into their own data structure (in this case a tuple), where each element represents a step in the pipeline.  You can also build up the sequence dynamically should that be a need.

Here we use foldl, aka reduce, and some lambdas to create the pipeline from the sequence of functions.

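from functools import reduce
from math import sqrt, ceil
# Each element is one step in the pipeline, applied left to right.
fn_sequence = (float, ceil, int, lambda x: pow(x, 2), sqrt)
transform = reduce(lambda a, b: lambda x: b(a(x)), fn_sequence)
transform('2.1') # => 3.0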

Now I have a convenience function that represents the pipeline of functions.  We can extrapolate this type of pipeline solution for more complex and/or more dynamic pipelines, limited only by the sequence of commands.  
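As a minimal sketch of the dynamic case, the reduce expression can be wrapped in a small helper so the steps are assembled at run time.  The names pipeline and build_transform, and the round_up flag, are hypothetical and exist only for illustration.

```python
from functools import reduce
from math import ceil, sqrt


def pipeline(*fns):
    """Compose fns left to right into one function (hypothetical helper).

    Requires at least one function; reduce raises TypeError on an empty sequence.
    """
    return reduce(lambda a, b: lambda x: b(a(x)), fns)


def build_transform(round_up=True):
    """Assemble the steps at run time; round_up is an illustrative flag."""
    steps = [float]
    if round_up:
        steps.append(ceil)  # optional step, included only when requested
    steps += [int, lambda x: pow(x, 2), sqrt]
    return pipeline(*steps)


transform = build_transform()                 # float, ceil, int, square, sqrt
truncating = build_transform(round_up=False)  # same pipeline without ceil
```

Each variant is built from data rather than written out as its own function, which is the point of keeping the steps in a sequence.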

The unfortunate cost of this idiom is the additional n-1 function calls introduced by the lambdas that compose the sequence of functions.  Given this cost, and the cost of function calls in Python in general, it would probably be better to use this idiom in cases where the intermediate or final forms of the composition will be reused.
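If the wrapper calls matter, one way to keep the steps in a data structure without nesting lambdas is a plain loop.  This is a sketch of a different technique, not the reduce-based composition, and run_pipeline is a hypothetical name.

```python
from math import ceil, sqrt


def run_pipeline(fns, value):
    """Apply each function in fns to value, left to right (hypothetical helper)."""
    for fn in fns:
        value = fn(value)
    return value


fn_sequence = (float, ceil, int, lambda x: pow(x, 2), sqrt)
result = run_pipeline(fn_sequence, '2.1')
```

The trade-off is that this runs a value through the steps rather than producing a reusable composed function, so the sequence has to be passed along with each call.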



  1. I can't say that I find the convenience function anything less than unreadable. The first version is, perhaps, verbose, but it is crystal clear in both the order of operations and the operations themselves. The second version requires something of a reverse-Polish notation parsing, but the third is just ... a mess. Imagine trying to maintain that. It is a decent example of the utility of reduce and lambda functions, but it would never pass code review, and even as a personal snippet I'd keep the first version as a block comment above the third, assuming you wanted to keep it around as a reference.

  2. It doesn't seem to me to be a "mess". However, I think it may depend on your familiarity and experience with programming in mixed paradigms, especially functional programming. Pipelines are well known in functional programming because of their similarity to function composition. It may also not be the correct tool for the job. It is simply another way of organizing code, with a set of costs and benefits different from the more straightforward, albeit verbose and hard-coded, solution.