Tuesday, April 14, 2015

Python Idiom: Collection Pipeline


A common implementation involves calling a set of functions sequentially, with the result of each call being passed to the subsequent call.

from math import sqrt, ceil
def transform(value):
   x = float(value)
   x = int(ceil(x))
   x = pow(x, 2)
   x = sqrt(x)
   return x

This is less than ideal because it's verbose and the explicit variable assignment seems unnecessary.  However, the inline representation may be a little tough to read, especially if you have longer names, or different fixed arguments.

from math import sqrt, ceil
def transform(value):
   return sqrt(pow(int(ceil(float(value))), 2))

The other limitation is that the sequence of commands is hard coded.  I have to create a function for each variant I may have.  However, I may have a need for the ability to compose the sequence dynamically.

One alternative is to use a functional idiom to compose all the functions together into a new function.  This new function represents the pipeline the previous set of functions ran the value through.  The benefit is that we extract the functions into their own data structure (in this case a tuple), where each element represents a step in the pipeline.  You can also build up the sequence dynamically should that be a need.

Here we use foldl, aka reduce, and some lambdas to create the pipeline from the sequence of functions.

from functools import reduce  # a builtin in Python 2, an import in Python 3
from math import sqrt, ceil

fn_sequence = (float, ceil, int, lambda x: pow(x, 2), sqrt)
transform = reduce(lambda a, b: lambda x: b(a(x)), fn_sequence)
transform('2.1') # => 3.0


Now I have a convenience function that represents the pipeline of functions.  We can extend this type of pipeline solution to more complex and/or more dynamic pipelines, limited only by the sequence of commands.  The unfortunate cost of this idiom is the additional n-1 function calls, one for each wrapper lambda that reduce creates when composing the sequence, which are paid on every call to the pipeline.
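As a rough sketch of building the sequence dynamically, the helper below folds the value through the steps at call time instead of nesting lambdas up front. The name pipeline is a hypothetical helper made up for this illustration, not something from the standard library.

```python
from functools import reduce
from math import ceil, sqrt

def pipeline(*fns):
    """Return a callable that runs a value through fns from left to right.

    Hypothetical helper for illustration only.
    """
    def run(value):
        # Fold the value through each step; no wrapper lambdas are created.
        return reduce(lambda acc, fn: fn(acc), fns, value)
    return run

# Build the sequence of steps dynamically, then turn it into one function.
steps = [float, ceil, int]
steps.append(lambda x: pow(x, 2))
steps.append(sqrt)

transform = pipeline(*steps)
result = transform('2.1')
```

The steps stay in an ordinary list, so a variant of the pipeline is just a different list passed to the same helper.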


§ 

Friday, March 6, 2015

Python: unittest setUp and tearDown with a ContextManager

Python unittest follows the jUnit structure, but is extremely awkward.  One of the more awkward portions is the use of setUp and tearDown methods.  Python has an elegant way of handling setup and teardown called a ContextManager.  So let's add it.



import unittest
from functools import wraps
from contextlib import contextmanager

def addContextHandler(fn, ctx):
    @wraps(fn)
    def helper(self, *a, **kw):
        if not hasattr(self, ctx):
            return fn(self, *a, **kw)

        with getattr(self, ctx)():
            return fn(self, *a, **kw)

    return helper

unittest.TestCase.run = addContextHandler(unittest.TestCase.run, 'contextTest')

class TestOne(unittest.TestCase):

    @contextmanager
    def contextTest(self):
        # print() with a single argument runs on both Python 2 and 3
        print("starting context")
        yield
        print("ending context")

    def testA(self):
        print("testA")
        self.assertTrue(True)

    def testB(self):
        print("testB")
        self.assertTrue(True)

    def testC(self):
        print("testC")
        self.assertTrue(True)
        
if __name__ == "__main__":
    unittest.main()


Or, if you want to play nice with unittest.TestCase and not modify it directly, you can subclass it.


import unittest

class MyTestCase(unittest.TestCase):
    ctx = 'contextTest'

    def run(self, *a, **kw):
        if not hasattr(self, self.ctx):
            return super(MyTestCase, self).run(*a, **kw)

        with getattr(self, self.ctx)():
            return super(MyTestCase, self).run(*a, **kw)
     
§

Thursday, September 4, 2014

Python Idiom: First Occurrence

Finding the first occurrence in a collection of data is a common problem. 
 

# Non Idiomatic
found_line = None
for line in logfile:
   if regex.match(line):
      found_line = line
      break
return found_line

Compared to

# Idiomatic
# The generator expression needs its own parentheses when next() gets a default
return next((line for line in logfile if regex.match(line)), None)


or

# Idiomatic (thanks to Suresh V)
from itertools import dropwhile
return next(dropwhile(lambda x: not regex.match(x), logfile), None)


The idiomatic solution is not only more compact, but it reads better.  Because the generator expression is evaluated lazily, next pulls lines only until the first match and no intermediate list is ever allocated.
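Here is a runnable version of the idiom as a minimal sketch. The function name first_match and the sample log lines are assumptions made up for the example.

```python
import re

def first_match(lines, pattern):
    """Return the first line matching pattern, or None if nothing matches."""
    regex = re.compile(pattern)
    # next() stops consuming lines as soon as one matches.
    return next((line for line in lines if regex.match(line)), None)

log = ["INFO start", "ERROR disk full", "ERROR again"]
found = first_match(log, r"ERROR")
missing = first_match(log, r"FATAL")
```

The second argument to next is what keeps a miss from raising StopIteration.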

§ 

Saturday, August 30, 2014

Singletons Reconsidered

TL;DR

Don't make a Singleton global, use one only for stateful resources, and don't use one if you can't implement it properly due to language or ability. Add management controls to the interface so that you can control the behavior of the Singleton in cases like testing, debugging or resetting.

Introduction

Everyone by now knows the arguments against Singletons.

Testability

The typical complaint is that Singletons are global, which makes them hard to test and hard to use in tests.  In most languages we can address those issues directly.

  1. Don't make the Singleton global, make it scoped to the Singleton class or module.
  2. Support management controls like a reset or clear method.

There is no reason to make a Singleton global. You should be able to import the class that will return the Singleton. Ideally the Singleton is truly instantiated by the first constructor call, and any other constructor call just returns the already constructed object.  For all usages it becomes just another constructor call that happens to return the same object.

The Singleton should persist state, which does make it harder to test. However, if you add management controls then the Singleton poses no testing problems.  If you add a reset or destroy method, the Singleton class is completely testable.

Hidden Dependencies

If it's no longer a global, that means you have an explicit import or include.  Its inclusion is no longer assumed, and as a result you know if a given module uses the Singleton because it has the import.  The dependencies are no longer hidden; they are explicit and clear.

Violates the Single Responsibility Principle

No it doesn't.  At its core SRP refers to cohesion and coupling.  Two things that aren't cohesive should not be coupled together because changes in one should not impact the other.  However, if they are in the same class you have coupled them together, so when either responsibility changes the entire class has to change as well.  This is tight coupling.

This has nothing to do with an object being a Singleton unless it is somehow exporting its ability to be a Singleton (like a meta class, mixin or template class might).  Being a Singleton is a property of the class; that doesn't mean the behavior is primary, i.e. the intent of the class is not to provide Singleton behavior out to other objects. Since the Singleton behavior is encapsulated and not exposed, SRP remains intact.

Doesn't Work Right in Language X

Yeah, well that's self explanatory.  Don't use language X or if you have to use language X then don't use Singletons. 

Threading

Now that is a real argument.  Yes, Singletons can suck in a threaded application unless the Singleton has semaphores or mutexes to create the appropriate critical sections.  Yes, it's hard to get right, and you may not know you didn't get it right until that weird bug happens in production. HOWEVER, that is an ongoing risk of threaded programming regardless of Singleton usage.  Singletons might make it a little more likely you screw it up, but it's not going to be in some novel way.

This risk is also completely mitigated in the case of a read only Singleton, such as a Config object.

Singletons Done Right IMO

Okay, so I'm not a hotshot programmer.  I consider myself a decent bordering on good programmer.  With all those caveats upfront,  here is how I do Singletons.

Override Instantiation to return the same instance always, or the same instance given the constructor arguments as a unique key.

Make the actual instantiation of the Singleton lazy, so that a constructor call just does the right thing whether it is creating the object for the first time underneath the covers or simply returning the object that already exists.

Always provide an explicit reset or destroy for the Singleton to facilitate testing.
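The three points above can be sketched in Python roughly as follows. This is one possible shape under stated assumptions, not a definitive implementation: the class name Config, its name argument and the settings attribute are all made up for the example, and the lock is there only to cover the threading concern raised earlier.

```python
import threading

class Config(object):
    """Hypothetical Singleton keyed by its constructor argument."""
    _instances = {}
    _lock = threading.Lock()

    def __new__(cls, name='default'):
        key = (cls, name)
        with cls._lock:
            # Lazy: the object is only built on the first constructor call.
            if key not in cls._instances:
                instance = super(Config, cls).__new__(cls)
                instance.name = name
                instance.settings = {}
                cls._instances[key] = instance
            return cls._instances[key]

    def __init__(self, name='default'):
        # __init__ runs on every constructor call, so state is set in __new__.
        pass

    @classmethod
    def reset(cls):
        """Management control: drop all instances, e.g. between tests."""
        with cls._lock:
            cls._instances.clear()

a = Config()
b = Config()
a.settings['debug'] = True
same = a is b                 # both names refer to one object
other = Config('other')      # a different key gives a different instance
Config.reset()
fresh = Config()             # after reset a new object is created
```

Callers just import Config and construct it; nothing about the call site reveals, or depends on, the Singleton behavior.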


§

Sunday, April 20, 2014

Creating a local email archive with: offlineimap and procmail

I synchronize my imap folders to maildir on my local laptop often so I can both have access to my email without a network and utilize my preferred search and email clients.  In order to facilitate how I use email I keep a local archive which is created and filtered by procmail.

Here is an approximation of my crontab (cron runs commands in a minimal environment rather than my login shell, so I put most of the commands in a script):

% crontab -l
# cron does not expand variables in assignments, so the paths are spelled out
HOME=/home/myhome
MAIL=/home/myhome/maildir
PROCMAILD=/home/myhome/.procmail.d
0-59/5 9-18 * * * $HOME/bin/syncemail


Here is the syncemail script:

#!/bin/sh

offlineimap 2>&1 | logger -t offlineimap

# Feed each message that is newer than the procmail log to procmail
find "$MAIL/Disney" -type f -newer "$PROCMAILD/log" | while IFS= read -r i; do
  procmail < "$i"
done


and here are the relevant portions of my .procmailrc:

PMDIR=$HOME/.procmail.d
VERBOSE=off
MAILDIR=$HOME/maildir
DEFAULT=$MAILDIR/mbox
LOGFILE=$PMDIR/log
LOGABSTRACT=all
ARCHIVEBY=`date +%Y-%m`
ARCHIVE=$MAILDIR/archives/$ARCHIVEBY
MKARCHIVE=`test -d ${ARCHIVE} || mkdir -p ${ARCHIVE}`

# Prevent duplicates
:0Wh: $PMDIR/msgid.lock
| /usr/bin/formail -D 100000 $PMDIR/msgid.cache

:0c
${ARCHIVE}/


§

Sunday, March 23, 2014

REST: POST vs PUT for Resource Creation

Questions often come up about whether to use PUT or POST for creating resources in REST APIs.

I've found both are appropriate in different situations.

PUT

PUT is best used when the client is providing the resource id.
PUT https://.../v1/resource/<id>
Per spec PUT is for storing the enclosed entity "under the supplied Request-URI".  This makes it the ideal HTTP method for use when creating or "storing" a resource.  Only when all the requirements for PUT can't be met should POST be considered.   The perfect example is when the client cannot provide the resource id.
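A minimal in-memory sketch of the distinction: PUT stores the entity under the id the client supplies, while POST leaves the choice of id to the server. ResourceStore and its method names are hypothetical, made up for this illustration, and the returned pairs are only a stand-in for an HTTP status code and a Location path.

```python
import uuid

class ResourceStore(object):
    """Hypothetical in-memory stand-in for a /v1/resource collection."""

    def __init__(self):
        self.resources = {}

    def put(self, resource_id, body):
        # The client names the resource; repeating the call overwrites it.
        created = resource_id not in self.resources
        self.resources[resource_id] = body
        status = 201 if created else 200
        return status, '/v1/resource/%s' % resource_id

    def post(self, body):
        # The server picks the id and reports where the resource now lives.
        resource_id = uuid.uuid4().hex
        self.resources[resource_id] = body
        return 201, '/v1/resource/%s' % resource_id

store = ResourceStore()
first_put = store.put('abc', {'name': 'widget'})
second_put = store.put('abc', {'name': 'widget v2'})
post_status, post_location = store.post({'name': 'gadget'})
```

Repeating the PUT leaves exactly one resource under the same URI, whereas every POST in this sketch creates another one.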


POST

POST is best used when the client doesn't know the resource id a priori.
POST https://.../v1/resource
POST shouldn't be the first choice for resource creation because it's really more of a catch-all method.
"The actual function performed by the POST method is determined by the server and is usually dependent on the Request-URI."
It doesn't require anything be created, or made available for later.
"A successful POST does not require that the entity be created as a resource on the origin server or made accessible for future reference. That is, the action performed by the POST method might not result in a resource that can be identified by a URI."
 § 

Wednesday, February 12, 2014

Python: Aggregating Multiple Context Managers

If you make use of context managers you'll eventually run into a situation where you're nesting a number of them in a single with statement.  It can be somewhat unwieldy from a readability point of view to put everything on one line:

with contextmanager1, contextmanager2, contextmanager3, contextmanager4:
    pass


and while you can break it up on multiple lines:

with contextmanager1, \
           contextmanager2, \
           contextmanager3, \
           contextmanager4:
    pass


sometimes that still isn't very readable.  This is more of a problem if you're using the same set of context managers in a number of places.  Ideally you should be able to put the context managers in a variable and use that with however many with statements need them:

handlers = (contextmanager1, contextmanager2, contextmanager3, contextmanager4)
with handlers:
    pass


Of course this doesn't work because handlers is a tuple, not a context manager. This will cause with to raise an exception.  What you can do is create a context manager that aggregates other context managers:

from contextlib import contextmanager
import sys

@contextmanager
def aggregate(handlers):
    for handler in handlers:
        handler.__enter__()

    err = None
    exc_info = (None, None, None)
    try:
        yield
    except Exception as exc:
        # Python 3 unbinds the "as" name after this block, so keep a copy
        err = exc
        exc_info = sys.exc_info()

    # exc_info gets passed to each subsequent handler.__exit__
    # unless one of them suppresses the exception by returning True
    for handler in reversed(handlers):
        if handler.__exit__(*exc_info):
            err = None
            exc_info = (None, None, None)
    if err is not None:
        raise err

So now you can aggregate all the context managers into one and use that one in the with statement:

handlers = (contextmanager1, contextmanager2, contextmanager3, contextmanager4)
with aggregate(handlers):
    pass


You can build up the list of context managers however you want and use aggregate when using them in a with statement.
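On Python 3.3 and later the standard library's contextlib.ExitStack already does the enter and exit bookkeeping, so a shorter variant of aggregate is possible. The sketch below swaps in ExitStack for the hand-written loops; the tracer context manager and the events list are made up purely to show the ordering.

```python
from contextlib import ExitStack, contextmanager

@contextmanager
def aggregate(handlers):
    # ExitStack enters each handler in order and exits them in reverse,
    # passing along any exception exactly as nested with statements would.
    with ExitStack() as stack:
        for handler in handlers:
            stack.enter_context(handler)
        yield

events = []  # records call order, for illustration only

@contextmanager
def tracer(name):
    """Hypothetical context manager that logs when it is entered and exited."""
    events.append('enter ' + name)
    try:
        yield
    finally:
        events.append('exit ' + name)

handlers = (tracer('a'), tracer('b'))
with aggregate(handlers):
    events.append('body')
```

A side benefit of ExitStack is that if a later handler fails to enter, the ones already entered are still exited.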

§ 

Friday, January 17, 2014

Python Metaprogramming: A Brief Decorator Explanation

A brief explanation of how to think about Python decorators.  Given the following decorator definition:


def decorator(fn):
    def replacement(*a, **kw):
        ...
    return replacement

This usage of the decorator

@decorator
def fn():
    return

is functionally equivalent to

def fn():
    return
fn = decorator(fn)

Note that fn is not being executed.  Instead decorator is being passed the callable object fn, and is in turn returning a callable object replacement which is then bound to the name fn.  Whether or not the original callable ever gets called is up to decorator and the replacement callable.

Another thing to consider, which often causes people problems, is the timing of the decorator's execution: it runs once, when the def statement is executed, which for a module-level function is during the loading of the module.  If you want to execute a particular piece of logic during fn's call, then that logic needs to be placed in the replacement callable, not in the decorator.
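The timing difference is easy to observe with a small sketch. The calls list and the greet function are made up for this illustration; they only record when each piece of code runs.

```python
calls = []  # records what ran and when, for illustration only

def decorator(fn):
    # This line runs once, when the decorated def statement is executed.
    calls.append('decorating ' + fn.__name__)

    def replacement(*a, **kw):
        # This line runs on every call of the decorated function.
        calls.append('calling ' + fn.__name__)
        return fn(*a, **kw)
    return replacement

@decorator
def greet():
    return 'hello'

after_definition = list(calls)  # greet has not been called yet
value = greet()
```

At the point where after_definition is captured the decorator body has already run, even though greet itself has never been called.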

So now that everything is clear it's obvious that


@decorator
@make_decorator(args)
def fn():
    return


Is really just

def fn():
    return
fn = decorator(make_decorator(args)(fn))


Which means decorators in a stack are applied from the bottom up: the first decorator listed is the last to be applied.
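A short sketch that records the order makes this concrete. The labels 'outer' and 'inner' and the order list are assumptions made up for the example; make_decorator here is just a stand-in decorator factory.

```python
from functools import wraps

order = []  # records application and call order, for illustration only

def make_decorator(label):
    def decorator(fn):
        order.append('apply ' + label)

        @wraps(fn)
        def replacement(*a, **kw):
            order.append('enter ' + label)
            return fn(*a, **kw)
        return replacement
    return decorator

@make_decorator('outer')
@make_decorator('inner')
def fn():
    return 'done'

result = fn()
```

The decorator nearest the def is applied first, but at call time the outermost replacement runs first because it wraps everything beneath it.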

§

Wednesday, December 25, 2013

Development Server: Automatic Reload on Code Change

There are actually many ways of automatically reloading code when it is modified.  Some are platform/language specific and some are not, although they do depend on certain common behaviors.  This is one I'm using to develop my Python/Gunicorn application and it isn't specific to Python or Gunicorn; however, it does require that you have inotify-tools installed, that your server can run in the foreground, and that it reloads the project when it receives a SIGHUP signal.

wrapper:

#!/bin/sh

SERVER=$1
WATCHED_DIR=.

echo "Starting '$SERVER'..."
$SERVER &
RUNNING_PID=$!

# POSIX sh spells the signal INT; some shells such as dash reject SIGINT
trap 'kill -TERM $RUNNING_PID 2> /dev/null; exit' INT

while true; do
 # Block until something under the watched directory changes
 inotifywait -q --exclude '.*\.py[co]$' \
           -e modify -e close_write -e move \
           -e create -e delete \
           -r "$WATCHED_DIR"
 echo "Hupping '$SERVER'..."
 kill -HUP $RUNNING_PID
done



You can then call it like this:

wrapper 'gunicorn project:main'

This will watch the current directory you're in ('.') and anytime a modification, creation, deletion or move occurs on any file in the current directory, inotifywait will report the event and stop waiting.  This will cause kill to send a SIGHUP to the server, forcing it to reload the project.  inotifywait will then wait on the next filesystem event.

 

Variants/Alternatives

 

There are a few variations on this theme which are fairly simple.

  • If you must kill and then restart the process in order to reload you can move the execution of $SERVER & into the while loop.  You should also change the HUP to a TERM in this case to make sure the process is terminated.
  • If you need to reload when anything changes in multiple directories you can just append the full list to inotifywait or generalize the wrapper and take the directories to watch as an argument.
  • If you want to or have to use something different from inotify-tools you can.  This same process should be usable with any of the file system event notification frameworks as long as they have a script that waits on events or allow you to write a script that waits on events.
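As one example of the last variation, here is a rough polling-based sketch in Python for systems without inotify-tools. It swaps event notification for comparing file modification times, which is cruder but portable. The function names snapshot and watch are hypothetical, and the temporary directory at the end exists only to demonstrate change detection.

```python
import os
import signal
import tempfile
import time

def snapshot(root, ignore=('.pyc', '.pyo')):
    """Map every file path under root to its modification time."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(ignore):
                continue
            path = os.path.join(dirpath, name)
            try:
                state[path] = os.stat(path).st_mtime
            except OSError:
                pass  # the file vanished between listing and stat
    return state

def watch(root, pid, interval=1.0):
    """Send SIGHUP to pid whenever anything under root changes (Unix only)."""
    before = snapshot(root)
    while True:
        time.sleep(interval)
        after = snapshot(root)
        if after != before:
            os.kill(pid, signal.SIGHUP)
            before = after

# Small demonstration of the change detection on a throwaway directory.
demo_dir = tempfile.mkdtemp()
before = snapshot(demo_dir)
with open(os.path.join(demo_dir, 'app.py'), 'w') as handle:
    handle.write('print("hi")\n')
after = snapshot(demo_dir)
detected = before != after
```

Comparing whole snapshots catches creations and deletions as well as modifications, at the cost of walking the tree on every interval.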

§