Wednesday, August 27, 2008

Wakeup, Facepalm, Yield, repeat...

Today I dove into threading in Python for the first time. Usually the best way to learn is by experimenting, so I decided to write a small test program. To begin, I started with what I thought would be the absolute minimal "threading" program: a single-threaded program that just imports the "threading" module. Not useful, but let's just do a sanity check.


import threading
print "hi"


To my utter amazement, I get back


hi
hi


Good thing I did this sanity check? Something is going on here I don't understand.

So, I start probing. I flush stdout before importing threading-- it still prints twice.

I import time and sleep for a second before importing threading-- it still prints twice.

It gets worse. I fire up an interpreter and try to import threading and... wait for it... I still get 'hi'. WTF?!?


Python 2.5.1 (r251:54863, Feb 4 2008, 21:48:13)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import threading
hi
>>>


It takes me a few minutes after seeing this, but I finally get it.

My test file was named "threading.py". D'oh! I was importing myself, hence the duplicate "hi"s: the script runs once as the main program and once more when it gets imported as a module. A simple but painful lesson in avoiding module name collisions.
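A quick way to catch this kind of collision is to look at where a module was actually loaded from. What follows is only a minimal sketch, written for Python 3 rather than the 2.5 used above; `looks_shadowed` is a hypothetical helper name, and it assumes the standard library lives in the directory that `sysconfig` reports.

```python
import os
import sysconfig
import tempfile
import threading
import types


def looks_shadowed(module):
    """Return True if a supposedly stdlib module was loaded from outside the stdlib directory.

    Hypothetical helper for illustration; assumes sysconfig reports the stdlib location.
    """
    module_file = getattr(module, "__file__", None)
    if not module_file:
        # Built-in modules (e.g. sys) have no file on disk, so they cannot be shadowed this way.
        return False
    stdlib_dir = os.path.realpath(sysconfig.get_paths()["stdlib"])
    return not os.path.realpath(module_file).startswith(stdlib_dir + os.sep)


# The path printed here shows which file "import threading" actually picked up.
print(threading.__file__)
print(looks_shadowed(threading))

# Simulate the mistake from this post: a "threading" module living in some scratch directory.
fake = types.ModuleType("threading")
fake.__file__ = os.path.join(tempfile.gettempdir(), "scratch", "threading.py")
print(looks_shadowed(fake))
```

Printing `threading.__file__` alone would have ended my confusion in seconds: it would have pointed straight at my own test file.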

Wednesday, April 16, 2008

KeyError breaks the unwritten rules

I ran across an interesting little "feature" in Python today; it cropped up as a bug in an old version of DocumentTemplate that I am dependent upon. It turns out that, unlike other Exceptions in Python, strings within KeyErrors don't round-trip. In other words,


"hello" != str(KeyError("hello"))



This is certainly unintuitive; I would have made the same mistake as the Zope guys. In fact, I have tried it for other Exception classes and most do round-trip. To quoteth the Zen:

Special cases aren't special enough to break the rules.


What is so special about KeyError that it gets to break the rules?
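To make the difference concrete, here is a small sketch of the behavior, in Python 3 syntax (the variable names are just for illustration). The `str()` of a KeyError is the `repr()` of its key, while most other exceptions hand the message back unchanged.

```python
# str() of a KeyError is the repr() of its key, so the quotes come along for the ride.
key_error_text = str(KeyError("hello"))
value_error_text = str(ValueError("hello"))

print(key_error_text)    # 'hello'  (quotes included)
print(value_error_text)  # hello

# The original string is still available, untouched, in args.
original = KeyError("hello").args[0]
print(original == "hello")  # True
```

So code that needs the key back should read `args[0]` rather than rely on `str()`.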

Here is a little test I whipped up to explore this:



import unittest

class TestExceptionStringRoundTrip(unittest.TestCase):
    def assertStrCast(self, ExceptionClass):
        s = "hello"
        self.assertEqual(s, str(ExceptionClass(s)))

for builtin in dir(__builtins__):
    if builtin.endswith('Error') or builtin.endswith('Exception'):
        def _test(self, ExceptionClass=builtin):
            self.assertStrCast(getattr(__builtins__, ExceptionClass))
        setattr(TestExceptionStringRoundTrip, "test%s" % builtin,
                _test)

if __name__ == "__main__":
    unittest.main()



There is only one failure: KeyError. There are also three errors, from UnicodeDecodeError, UnicodeEncodeError and UnicodeTranslateError, because their constructors don't accept just a single string as an argument (they probably should!).
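For the curious, the sketch below shows why those three show up as errors rather than failures. It uses Python 3 syntax and only UnicodeDecodeError; the byte string and reason text are made-up illustration values.

```python
# Passing a lone string is rejected by the constructor itself.
try:
    UnicodeDecodeError("hello")
except TypeError as exc:
    constructor_error = exc
    print(type(exc).__name__)  # TypeError

# The five-argument form: encoding, the offending object, start, end, reason.
err = UnicodeDecodeError("utf-8", b"\xff", 0, 1, "invalid start byte")
print(err.encoding, err.start, err.end, err.reason)
```

The test never gets as far as comparing strings for these classes, because building the exception blows up first.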

Anyway, a fun little aside.

UPDATE:
Amaury Forgeot d'Arc has beaten me to the punch with a patch to fix this. I was kind of hoping I would get a crack at this, but this guy is just a coding machine :). Looks like he has a lot more experience with Python's source, so, I trust his patch is probably a little nicer than mine would have been. Here's the bug report for those who are interested (the patch is linked from the report).

Mac OS X quirks

Today, I roll into work around 10:30, free Starbucks Pike Place Blend in hand (yummy!), log in to my workstation, and see the strangest thing. I do my customary git-status and see that I made huge changes to one of our unit-tests, TestAccountant, last night. My checkout resides on my personal workstation, not a shared server, so how could one of my co-workers have modified my local tree? I don't even have an ssh server enabled! The only logical conclusion is that I must have sleep-drove over, sleep-coded huge chunks of fairly logical looking tests, and sleep-drove back, all before I woke up this morning. And, I don't even feel tired!

Well, that probably didn't happen. So, what-the-dilly-yo?

Well, my first guess was that someone had accidentally cloned my repository and was now pushing their changes back to me.

Background: We use git, which is a distributed source control system like Bazaar or BitKeeper. This means that, if set up to do so, anybody can clone anybody else's repository, make changes and push those changes back to 'origin'. This is very different from svn or cvs, which are client-server based and require a single centralized master repository. Distributed source control is very powerful, but since I don't understand it all that well, I must admit, it makes me a little queasy to think of how many things could go wrong.

This theory, as it turned out, was wrong. What happened, you say? Case-insensitive filenames, that's what happened.

I've been using a Mac at work for the past couple of years now. I generally like it except that Macs tend to be so darn different:

/Users instead of /home
/Library instead of /usr/lib or /usr/local/lib.

And now, I've discovered, case-insensitive file names.

It turns out that we have two unit-tests, one named testAccountant and one named TestAccountant. You see where this is going. Only lower-case testAccountant is appearing in my local tree; my Mac has kindly decided to randomly pick which file I may work with :). However, this copy is being diffed against TestAccountant in my local repository. Two completely different files = massive differences.

Not a huge deal, but, let this be a reminder: Never rely on case alone to distinguish files. It ain't guaranteed to be respected across all systems and the bugs that do crop up can manifest themselves in some pretty strange ways.
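A check for this is easy to script. The following is a rough sketch in Python 3, not something we actually run: `find_case_collisions` is a hypothetical helper, the sample paths are invented around the two file names above, and plain `lower()` is only an approximation of how a real filesystem folds case. The listing should come from the repository itself (for example, the output of `git ls-files`), since a case-insensitive disk will only ever show one of the two files.

```python
from collections import defaultdict


def find_case_collisions(paths):
    """Group paths that differ only by letter case.

    Returns a dict mapping the lowercased path to the sorted list of
    original spellings, keeping only groups with more than one spelling.
    """
    groups = defaultdict(set)
    for path in paths:
        groups[path.lower()].add(path)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# Hypothetical listing, e.g. what `git ls-files` might report for the repository.
tracked = ["tests/TestAccountant", "tests/testAccountant", "README"]
print(find_case_collisions(tracked))
```

An empty result means no two tracked paths collide on a case-insensitive system.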

Now, back to sleep-coding...

Friday, April 11, 2008

Bash History (meme)

$ history|awk '{a[$2]++} END{for(i in a){printf "%5d\t%s\n",a[i],i}}'|sort -rn|head
177 ls
122 cd
91 git-status
62 vi
49 git-commit
44 git-pull
38 python
36 git-push
35 git-diff
33 matrix

Wow, I compulsively type 'ls'. I get the feeling I type that in the brief moment when I don't know what I'm going to do next, even if I have no desire to see the contents of the current directory. Habit, really.
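For anyone who finds the awk one-liner hard to read, here is roughly the same counting done in Python 3. It is only a sketch: `top_commands` is a hypothetical name, the sample lines are made up, and unlike awk it skips lines that have no second field.

```python
from collections import Counter


def top_commands(history_lines, n=10):
    """Count the second whitespace-separated field of each line, like awk's $2."""
    counts = Counter()
    for line in history_lines:
        fields = line.split()
        if len(fields) >= 2:
            counts[fields[1]] += 1
    return counts.most_common(n)


# Made-up sample in the shape of `history` output: entry number, then the command line.
sample = [
    "  501  ls",
    "  502  cd src",
    "  503  ls -l",
    "  504  git-status",
    "  505  ls",
]
for command, count in top_commands(sample):
    print("%5d\t%s" % (count, command))
```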

Thursday, April 10, 2008

ASPN Linear Equation Solver In Three Lines!

Just ran across this recipe in the Cookbook. It's a linear equation solver in three lines of Python code. Really clever.

Wednesday, April 9, 2008

pycsvdiff Initial Release

I have just finished my first cut at a csv file differ, pycsvdiff, written in Python. I decided to write my own differ when, after a cursory glance around, I saw only two promising utilities. One was a CGI script that required uploading your files (not gonna happen!). The other was written in Perl ('nuff said).

Some notes:

I have included a --run-tests option to verify things are working

The csv differ is built on top of more generic Table diffing code which is, in turn, built on top of even more generic sequence diffing code. I looked at SequenceMatcher in difflib, but this wasn't quite what I was looking for.

This was my first project written from the ground up using TDD. Needless to say, the ability to refactor with confidence was just, well, amazing! Not to mention the satisfaction that comes from the parade of dots that bolts across the screen (dots FTW!). Now I wish I had gotten one of those nosetests shirts at PyCon.
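To illustrate the layering mentioned in the notes above (csv diff on top of table diff on top of sequence diff), here is a deliberately naive sketch in Python 3. It is not pycsvdiff's actual code: the function names are hypothetical, and the sequence diff simply compares items position by position, which is an assumption made to keep the example short.

```python
import csv
import io
from itertools import zip_longest


def diff_sequences(old, new, missing=None):
    """Generic layer: return (index, old_item, new_item) for every position that differs."""
    pairs = zip_longest(old, new, fillvalue=missing)
    return [(i, a, b) for i, (a, b) in enumerate(pairs) if a != b]


def diff_tables(old_rows, new_rows):
    """Table layer: reuse the sequence diff on rows, then on the cells of each changed row."""
    changes = []
    for row_index, old_row, new_row in diff_sequences(old_rows, new_rows, missing=[]):
        for col_index, old_cell, new_cell in diff_sequences(old_row, new_row):
            changes.append((row_index, col_index, old_cell, new_cell))
    return changes


def diff_csv(old_text, new_text):
    """csv layer: parse both inputs into tables and hand them to the table diff."""
    old_rows = list(csv.reader(io.StringIO(old_text)))
    new_rows = list(csv.reader(io.StringIO(new_text)))
    return diff_tables(old_rows, new_rows)


# Made-up sample data: one changed cell and one added row.
old_csv = "id,name\n1,ann\n2,bob\n"
new_csv = "id,name\n1,ann\n2,rob\n3,cy\n"
for change in diff_csv(old_csv, new_csv):
    print(change)
```

Each layer only knows about the one beneath it, so the csv-specific part stays tiny and the generic pieces can be tested on plain lists.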

Wednesday, February 6, 2008

SQLinForm Beautification

Thanks to the magic of ORMs, the number of SQL queries I have to write is pleasantly small. For those times when I do have to hack out a beast of a query, it's nice to know SQLinForm is out there.

SQLinForm is a SQL beautifier, like tidy is for HTML: you feed it raw SQL, no matter how munged, cobbled or kludged, and it will produce a nicely formatted version of that query that you can then paste back into your code.

Today, I was playing around with it using queries of varying degrees of complexity, and it seemed to handle all of them quite nicely. It supports many common SQL dialects, including MS-SQL, Oracle, and MySQL. There are quite a few configuration options available, covering spacing between tokens, indentation, colorization, and so on. It even has presets for "Large SQL", "Small SQL", and "1-Line SQL". Very nifty.

Unfortunately, this app is currently only available via the Web, so there's no chance of automated code cleanup, at least for now. Apparently there used to be a desktop version which is no longer available, though the developer says there will be a new version out in March 2008. I really hope there is a nice CLI front-end for it. Heck, I'd even be willing to plunk down a few bucks for something as time-saving as this.