Friday, April 19, 2013

Python str with custom truth values

I've run into multiple cases where I get a string from some external source, like a config file or a database.  Some values I want to treat as true and others as false for the logic in my code.  For example, let's say I have a status string with a few possible values, some true and some false.
  1. "Yes", "on", "true"
  2. "No", "off", "false"
I can map the values to True or False in Python.  Now I just need a string whose truth values I can configure.

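class BooleanString(str):
    def __nonzero__(self):
        # Python 2 truth hook: true only for the accepted values, case-insensitively
        return self.lower() in ('on', 'yes', 'true')

    __bool__ = __nonzero__  # Python 3 looks up __bool__ instead

bool(BooleanString("YES")) # returns True
bool(BooleanString("some other value")) # returns False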

The thing is, I don't want to hard-code the truth values in the class definition.  I want to pass them in when the derived class is created.

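def mkboolstr(truth):
    def __nonzero__(self):
        return self.lower() in truth
    # Register the hook under both names: __nonzero__ (Python 2) and __bool__ (Python 3)
    return type('BooleanString', (str, ), dict(__nonzero__=__nonzero__, __bool__=__nonzero__))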

BooleanString = mkboolstr(('on', 'yes', 'true'))

bool(BooleanString("TRUE")) # returns True
bool(BooleanString("some other value")) # returns False

And now we have a function which creates a BooleanString class, with the strings to treat as true passed in as a parameter.
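
As a minimal sketch of how the factory above might be used on either Python 2 or Python 3: the name `make_bool_str` is hypothetical, and normalizing the truth values into a lowercase `frozenset` is an assumption added here, not part of the original function.

```python
def make_bool_str(truth):
    """Return a str subclass whose truthiness is membership in `truth`."""
    # Assumption: normalize once so the check is case-insensitive on both sides.
    truthy = frozenset(value.lower() for value in truth)

    def __bool__(self):
        return self.lower() in truthy

    # Python 3 calls __bool__; Python 2 calls __nonzero__. Register both names.
    return type('BooleanString', (str,),
                {'__bool__': __bool__, '__nonzero__': __bool__})


BooleanString = make_bool_str(('on', 'yes', 'true'))
status = BooleanString("Yes")

if status:
    # The instance is still a str, so the original value is preserved.
    print("enabled, raw value: " + status)
```

One caveat with this approach: inherited methods such as `status.strip()` or `status.lower()` return a plain `str`, so the custom truth value does not survive them.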



  1. Could you implement this pattern using a simple function instead? One serious drawback to the readability of your solutions is that the term `Boolean` or `bool` provides no hint as to *what* about the string might be true or false. A function with a name like `is_synchronized()` would make the generic term `bool` almost disappear, replacing it with the convention `is_*`, and would in addition actually explain *what* about the string might have trueness or falsehood.

    1. Yes, you could do that as well. It depends on what you're trying to accomplish, obviously. In my case, for example, I had an object with your is_synchronized() method. I wanted it to return a boolean along with the actual string value. That's how the idea came about. This would allow the result of is_synchronized() to be treated as a boolean without losing any information, because the actual string value would be preserved.

    2. I don't think it depends on what you're trying to accomplish at all, really. It's always going to be better to have an expressive name attached to your operation. The fact that you want all these features out of a "string" or a "boolean" just suggests neither one is the right data structure. If you don't like the idea of passing a string around and using a function like "is_synchronized" to check whether it satisfied some condition, then make a class, give it an attribute carrying your string data and give it another method for checking whether it satisfied the condition. "is_synchronized" is still a good candidate for the name of the method, but even if you stuck with "__nonzero__" as the name, you've still accomplished what you wanted and you've avoided making a str subclass.

      Separately, `True if self == truth else False` is an anti-pattern. What you meant to write there was `self == truth`.

    3. I've fixed the anti-pattern. Good catch, I should have caught it.

      Regarding the idiom itself:

      I am not trying to tie together a bunch of disparate features. I am, singularly, adding a truth value to the string. However, even if I were mixing and matching different features, that wouldn't invalidate the use of the data structure. Otherwise we wouldn't have data structures like treaps, tries, ordered associative maps, etc.

      I do agree that you shouldn't try to mash together too many features or values, à la the god object anti-pattern. However, a BooleanString is miles away from a god object. This is an object which is being extended in a very specific way, AND the extension is to add a derived value/evaluation from the existing value.

      I have taken your criticism and tried to improve the example.

  2. Guess I'm missing what this solution offers over creating some "constants", putting them in a tuple and using the 'in' operator. Seems to me that would accomplish the same thing with better performance and make PEP20 happy.

  3. Good call on the antipattern. I should have seen that.

    With regard to the idiom. We aren't talking about merging a bunch of different features. There is precisely one extension to string. That is treating a particular value as true. Perhaps my example is not the best choice, however there are cases where it would be.

    Treating the strings 'yes', 'on', or 'true' as true may be a more demonstrative example.
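
For comparison, the two alternatives raised in the comments above (a named function over a tuple of constants, and a small class that carries the string) might look roughly like this. The names `TRUE_VALUES`, `is_enabled`, and `Status` are hypothetical and chosen only for illustration.

```python
TRUE_VALUES = ('on', 'yes', 'true')


def is_enabled(value):
    """Return True when `value` is one of the accepted truthy strings."""
    return value.lower() in TRUE_VALUES


class Status(object):
    """Holds the raw string and exposes a named check, without subclassing str."""

    def __init__(self, raw):
        self.raw = raw

    def is_enabled(self):
        return is_enabled(self.raw)


status = Status("On")
print(status.is_enabled(), status.raw)
```

Both versions keep the raw string available; the difference from BooleanString is that the check is spelled out by name at the call site rather than implied by `bool()`.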