Thursday, September 4, 2014

Python Idiom: First Occurence

Finding the first occurrence in a collection of data is a common problem. 

# Non Idiomatic
found_line = None
for line in logfile:
   if regex.match(line):
      found_line = line
return found_line

Compared to

# Idiomatic
return next(line for line in logfile if regex.match(line), None)


# Idiomatic (thanks to Suresh V)
from itertools import dropwhile
return next(dropwhile(lambda x: not regex.match(x), logfile), None)

The idiomatic solution is not only more compact, but it reads better.


Saturday, August 30, 2014

Singletons Reconsidered


Don't make it a global, use it only for stateful resources, and don't use them if you can't implement them properly due to language or ability. Add management controls to the interface so that you can control the behavior of the Singleton in cases like testing, debugging or resetting.


 Everyone by now knows the arguments.


The typical complaint is that singletons are global and that makes them hard to test and be in tests.  In most languages we can address those issues directly.

  1. Don't make the Singleton global, make it scoped to the Singleton class or module.
  2. Support management controls like a reset or clear method.
There is no reason to make a Singleton global. You should be able to import the class that will return the Singleton. Ideally you make the Singleton truly instantiate with the first constructor call. Any other constructor call would just be returning the already constructed object.  For all usages it becomes just another constructor call that happens to return the same object.

The Singleton should persist state, which does make it harder to test. However, if you add management controls then the Singleton poses no testing problems.  If you add a reset or destroy to the Singleton class is completely testable.

Hidden Dependencies

If it's no longer a global that means you have an explicit import or include.  It's inclusion is no longer assumed, and as a result you know if a given module uses the Singleton because it has the import.  The dependencies are no longer hidden, they are explicit and clear.

Violates the Single Responsibility Principle

No it doesn't.  At it's core SRP refers to cohesion and coupling.  Two things that aren't cohesive should not be coupled together because changes in one should not impact the other.  However if they are in the same class you have coupled them together so when either responsibility changes the entire class has to change as well.  This is tight coupling.

This has nothing to do with an object being a Singleton unless it is somehow exporting it's ability to be a Singleton (like a meta class, mixin or template class might).  Being a Singleton is a property of the class, that doesn't mean the behavior is primary, ie. The intent of the class is not to provide Singleton behavior out to other objects. Since the Singleton behavior is encapsulated and not exposed SRP remains intact.

Doesn't Work Right in Language X

Yeah, well that's self explanatory.  Don't use language X or if you have to use language X then don't use Singletons. 


Now that is a real argument.  Yes Singletons can suck in a threaded application unless the Singleton has  semaphores or mutexes to create the appropriate critical sections.  Yes it's hard to get right, and you may not know you didn't get it right until that weird bug happens in production. HOWEVER, that is an ongoing risk of threaded programming regardless of Singleton usage.  Singletons might make it a little more likely you screw it up, but it's not going to be in some novel way.

This risk is also completely mitigated in the case of a read only Singleton, such as a Config object.

Singletons Done Right IMO

Okay, so I'm not a hotshot programmer.  I consider myself a decent bordering on good programmer.  With all those caveats upfront,  here is how I do Singletons.

Override Instantiation to return the same instance always, or the same instance given the constructor arguments as a unique key.

Make the actual instantiation of the Singleton lazy. So it just does the right thing regardless of actually creating the object the first time underneath the covers, or simply returning the same object that already exists.

Always provide an explicit reset or destroy for the Singleton to facilitate testing.

Sunday, April 20, 2014

Creating a local email archive with: offlineimap and procmail

I synchronize my imap folders to maildir on my local laptop often so I can both have access to my email without a network and utilize my preferred search and email clients.  In order to facilitate how I use email I keep a local archive which created and filtered by procmail.

Here is an approximation of my crontab (cron doesn't start a shell, so I put most of the commands in a script):

% crontab -l
0-59/5 9-18 * * * $HOME/bin/syncemail 

Here is the syncemail script:


offlineimap 2>&1 | logger -t offlineimap

for i in `find $MAIL/Disney -type f -newer $PROCMAILD/log `; do
  cat "$i" | procmail

and here are the relevant portions of my .procmailrc:

ARCHIVEBY=`date +%Y-%m`
MKARCHIVE=`test -d ${ARCHIVE} || mkdir -p ${ARCHIVE}`

# Prevent duplicates
:0Wh: $PMDIR/msgid.lock
| /usr/bin/formail -D 100000 $PMDIR/msgid.cache



Sunday, March 23, 2014

REST: POST vs PUT for Resource Creation

Questions often come up about whether to use PUT or POST for creating resources in REST APIs.

I've found both are appropriate in different situations.


PUT is best used when the client is providing the resource id.
PUT https://.../v1/resource/<id>
Per spec PUT is for storing the enclosed entity "under the supplied Request-URI".  This makes it the ideal HTTP method for use when creating or "storing" a resource.  Only when all the requirements for PUT can't be met should POST be considered.   The perfect example of when the client cannot provide the resource id.


POST is best used when the client doesn't know the resource id a priori.
POST https://.../v1/resource
POST shouldn't be the first choice for resource creation is because  it's really more of a catchall method.
"The actual function performed by the POST method is determined by the server and is usually dependent on the Request-URI."
It doesn't require anything be created, or made available for later.
"A successful POST does not require that the entity be created as a resource on the origin server or made accessible for future reference. That is, the action performed by the POST method might not result in a resource that can be identified by a URI."

Wednesday, February 12, 2014

Python: Aggregating Multiple Context Managers

If you make use of context managers you'll eventually run into a situation where you're nesting a number of them in a single with statement.  It can be somewhat unwieldy from a readability point of view to put everything on one line:

with contextmanager1, contextmanager2, contextmanager3, contextmanager4:

and while you can break it up on multiple lines:

with contextmanager1, \
           contextmanager2, \
           contextmanager3, \

sometimes that still isn't very readable.  This is more of a problem if you're using the same set of context managers in a number of places.  Ideally you should be able to put the context managers in a variable and use that with however many with statements need them:

handlers = (contextmanager1, contextmanager2, contextmanager3, contextmanager4)
with handlers:

Of course this doesn't work because handlers is a tuple, not a context manager. This will cause with to throw a exception.  What you can do is create a context manager that aggregates other context managers:

from contextlib import contextmanager
import sys

def aggregate(handlers):
    for handler in handlers:
    err = None
    exc_info = (None, None, None)
    except Exception as err:
        exc_info = sys.exc_info()

    # exc_info get's passed to each subsequent handler.__exit__
    # unless one of them suppresses the exception by returning True
    for handler in reversed(handlers):
        if handler.__exit__(*exc_info):
            err = False
            exc_info = (None, None, None)
    if err:
        raise err

So now you can aggregate all the context managers into one and use that one in the with statement:

handlers = (contextmanager1, contextmanager2, contextmanager3, contextmanager4)
with aggregate(handlers):

You can build up the list of context managers however you want and use aggregate when using them in a with statement.


Friday, January 17, 2014

Python Metaprogramming: A Brief Decorator Explanation

A brief explanation on how to think about Python decorators.  Given the following decorator definition:

def decorator(fn):
    def replacement(*a, **kw):
    return replacement

This usage of the decorator

def fn():

is functionaly equivalent to

def fn():
fn = decorator(fn)

Note that fn is not being executed.  Instead decorator is being passed the callable object fn, and is in turn returning a callable object replacement which is then bound to the name fn.  Whether or not the original callable ever gets called is up to decorator and the replacement callable.

Another thing to consider, which often causes people problems, is the timing of the decorator's execution, which is to say during the loading of the module.  If you want to execute a particular piece of logic during fn's call, then that logic needs to be placed in the replacement callable, not in the decorator.

So now that everything is clear it's obvious that

def fn():

Is really just

def fn():
fn = decorator(make_decorator(args)(fn))

Which means the first decorator in a stack is the last to be evaluated.