Wednesday, April 16, 2008

KeyError breaks the unwritten rules

I ran across an interesting little "feature" in Python today; it cropped up as a bug in an old version of DocumentTemplate that I depend on. It turns out that, unlike other Exceptions in Python, strings within KeyErrors don't round-trip. In other words,


"hello" != str(KeyError("hello"))



This is certainly unintuitive; I would have made the same mistake as the Zope guys. In fact, I have tried it for other Exception classes and most do round-trip. To quote the Zen:

Special cases aren't special enough to break the rules.


What is so special about KeyError that it gets to break the rules?

Here is a little test I whipped up to explore this:



import unittest

class TestExceptionStringRoundTrip(unittest.TestCase):
    def assertStrCast(self, ExceptionClass):
        s = "hello"
        self.assertEqual(s, str(ExceptionClass(s)))

for builtin in dir(__builtins__):
    if builtin.endswith('Error') or builtin.endswith('Exception'):
        def _test(self, ExceptionClass=builtin):
            self.assertStrCast(getattr(__builtins__, ExceptionClass))
        setattr(TestExceptionStringRoundTrip, "test%s" % builtin,
                _test)

if __name__ == "__main__":
    unittest.main()



There is only one failure: KeyError. There are also three errors, from UnicodeDecodeError, UnicodeEncodeError and UnicodeTranslateError, because they don't accept just a string as an argument (they probably should!).
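To make the failure concrete, here is a minimal sketch of what the test trips over, assuming CPython's built-in exceptions. A KeyError built from a single argument stringifies as the repr() of that argument, quotes and all, while the original string is still available untouched in args.

```python
# Assumes CPython's built-in exception behavior.
key_error = KeyError("hello")
value_error = ValueError("hello")

# Most exceptions stringify to the message they were given.
assert str(value_error) == "hello"

# KeyError with one argument stringifies to repr() of that argument,
# so the result carries an extra pair of quotes.
assert str(key_error) == repr("hello")
assert str(key_error) != "hello"

# The original string survives in args, so this does round-trip.
assert key_error.args[0] == "hello"
```

Code that needs the original key back is probably safer reading args[0] than parsing the string form.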

Anyway, a fun little aside.

UPDATE:
Amaury Forgeot d'Arc has beaten me to the punch with a patch to fix this. I was kind of hoping I would get a crack at this, but this guy is just a coding machine :). Looks like he has a lot more experience with Python's source, so I trust his patch is probably a little nicer than mine would have been. Here's the bug report for those who are interested (the patch is linked from the report).

Mac OS X quirks

Today, I roll into work around 10:30, free Starbucks Pike Place Blend in hand (yummy!), log in to my workstation, and see the strangest thing. I do my customary git-status and see that I have made huge changes to one of our unit-tests, TestAccountant, last night. My checkout resides on my personal workstation, not a shared server, so how could one of my co-workers have modified my local tree? I don't even have an ssh server enabled! The only logical conclusion is that I must have sleep-driven over, sleep-coded huge chunks of fairly logical-looking tests, and sleep-driven back, all before I woke up this morning. And, I don't even feel tired!

Well, that probably didn't happen. So, what-the-dilly-yo?

Well, my first guess was that someone had accidentally cloned my repository and was now pushing their changes back to me.

Background: We use git, which is a distributed source control system like Bazaar or BitKeeper. This means that, if set up to do so, anybody can clone anybody else's repository, make changes and push those changes back to 'origin'. This is very different from svn or cvs, which are client-server based and require a single centralized master repository. Distributed source control is very powerful, but since I don't understand it all that well, I must admit, it makes me a little queasy to think of how many things could go wrong.

This theory, as it turned out, was wrong. What happened, you say? Case-insensitive filenames, that's what happened.

I've been using a Mac at work for the past couple of years now. I generally like it except that Macs tend to be so darn different:

/Users instead of /home
/Library instead of /usr/lib or /usr/local/lib.

And now, I've discovered, case-insensitive file names.

It turns out that we have two unit-tests, one named testAccountant and one named TestAccountant. You see where this is going. Only lower-case testAccountant is appearing in my local tree; my Mac has kindly decided to randomly pick which file I may work with :). However, this copy is being diffed against TestAccountant in my local repository. Two completely different files = massive differences.

Not a huge deal, but, let this be a reminder: Never rely on case alone to distinguish files. It ain't guaranteed to be respected across all systems and the bugs that do crop up can manifest themselves in some pretty strange ways.
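One way to catch this before it bites is to scan a list of file names for entries that differ only by case. What follows is a minimal sketch, not anything we actually run: find_case_collisions is a hypothetical helper, and the file list is made up apart from the two test names above.

```python
from collections import defaultdict


def find_case_collisions(paths):
    """Group paths that differ only by letter case (hypothetical helper)."""
    groups = defaultdict(set)
    for path in paths:
        # casefold() gives a case-insensitive key for comparison.
        groups[path.casefold()].add(path)
    # Keep only the keys that map to more than one distinct spelling.
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


# Illustrative file list; README is an invented extra entry.
tracked = ["testAccountant", "TestAccountant", "README"]
collisions = find_case_collisions(tracked)
print(collisions)
```

Feeding it the output of something like git ls-files, split into lines, would be one plausible way to use it on a real tree.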

Now, back to sleep-coding...

Friday, April 11, 2008

Bash History (meme)

$ history|awk '{a[$2]++} END{for(i in a){printf "%5d\t%s\n",a[i],i}}'|sort -rn|head
177 ls
122 cd
91 git-status
62 vi
49 git-commit
44 git-pull
38 python
36 git-push
35 git-diff
33 matrix

Wow, I compulsively type 'ls'. I get the feeling I type that in the brief moment when I don't know what I'm going to do next, even if I have no desire to see the contents of the current directory. Habit, really.
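For anyone who finds the awk one-liner hard to read, here is a rough Python sketch of the same counting step. It assumes each history line has the shape "number command [args]", which is what the awk script relies on by counting field 2. The helper name top_commands and the sample lines are made up for illustration.

```python
from collections import Counter


def top_commands(history_lines, limit=10):
    """Count the second whitespace-separated field of each line (hypothetical helper)."""
    counts = Counter()
    for line in history_lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            counts[fields[1]] += 1
    # most_common() sorts by count, highest first, like `sort -rn | head`.
    return counts.most_common(limit)


# Made-up sample in the "<number> <command> [args]" shape.
sample = ["1 ls", "2 cd src", "3 ls -l", "4 git-status", "5 cd ..", "6 ls"]
print(top_commands(sample, limit=2))
```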

Thursday, April 10, 2008

ASPN Linear Equation Solver In Three Lines!

Just ran across this recipe in the Cookbook. It's a linear equation solver in three lines of Python code. Really clever.

Wednesday, April 9, 2008

pycsvdiff Initial Release

I have just finished my first cut at a csv file differ, pycsvdiff, written in Python. I decided to write my own differ, when, after a cursory glance around, I saw only two promising utilities. One was a cgi script that required uploading your files (not gonna happen!). The other was written in perl ('nuff said).

Some notes:

I have included a --run-tests option to verify things are working

The csv differ is built on top of more generic Table diffing code which is, in turn, built on top of even more generic sequence diffing code. I looked at SequenceMatcher in difflib, but this wasn't quite what I was looking for.

This was my first project written from the ground up using TDD. Needless to say, the ability to refactor with confidence was just, well, amazing! Not to mention the satisfaction that comes from the parade of dots that bolts across the screen (dots FTW!). Now I wish I had gotten one of those nosetests shirts at PyCon.
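For reference, here is a minimal sketch of what difflib's SequenceMatcher reports when pointed at CSV rows, as mentioned in the notes above. This is not pycsvdiff's code; the read_rows helper and the sample data are invented for illustration.

```python
import csv
import io
from difflib import SequenceMatcher


def read_rows(text):
    """Parse CSV text into a list of hashable row tuples (hypothetical helper)."""
    return [tuple(row) for row in csv.reader(io.StringIO(text))]


# Made-up data for illustration.
old_rows = read_rows("id,name\n1,alice\n2,bob\n")
new_rows = read_rows("id,name\n1,alice\n2,robert\n3,carol\n")

# Compare the two files row by row; each row is one sequence element.
matcher = SequenceMatcher(None, old_rows, new_rows)

# Drop the unchanged stretches and keep only the differences.
changes = [op for op in matcher.get_opcodes() if op[0] != "equal"]
print(changes)
```

Each opcode names a range of old rows and a range of new rows, so the result says which rows changed but not which cells within them; cell-level detail would need a second pass over the rows involved.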