Tuesday, March 29, 2005

Sorting in Python

List sorting in Python (the list.sort method) is fast: sorting has been heavily optimised in successive versions of Python.

However, there's always been a penalty to pay if you don't want the sort's default ordering. Maybe you want to sort on one field, for example sorting a list of log items by timestamp. Or maybe you want to sort by some function of each item, for example sorting a list of vectors by magnitude.

Historically there have been two methods of doing this. The simplest is to pass a custom comparison function to list.sort:

def cmp_magnitude(a, b):
    return cmp(a[0]*a[0] + a[1]*a[1],
               b[0]*b[0] + b[1]*b[1])

vectors.sort(cmp=cmp_magnitude)


Here we're sorting 2D vectors by magnitude, using the standard trick to save an expensive square-root operation: sqrt(x) is monotonic in x, so sorting by the square of the magnitude is equivalent to sorting by magnitude.

This code is easy to understand, but slow. It makes a Python function call for each comparison made in the sort; and a sort will typically make many more comparisons than there are elements to sort.

The second, trickier method is known in the Python world as decorate, sort, undecorate (DSU) and in the Perl world as the Schwartzian Transform. The trick here is to decorate each element in the list by prepending a key to each element; to sort the decorated list with the default sort; and then to undecorate the list by stripping off the keys, leaving the sorted list of elements.

decorated = [(val[0]*val[0] + val[1]*val[1], val)
             for val in vectors]
decorated.sort()
vectors = [val for (key, val) in decorated]


This avoids multiple calculation of the key by precomputing each key once; and avoids the multiple expensive calls to a custom comparison function. However, the auxiliary list, decorated, takes extra memory and extra time to construct and deconstruct. And the code is not straightforward to the untrained eye: once you're familiar with DSU, you'll easily identify it as such, but if you haven't seen DSU before it takes a little puzzling through.

And there's another potential pitfall to DSU: it may not have the same behaviour in the face of equal keys as sort-with-cmp. In sort-with-cmp, elements with equal keys will be subject to whatever behaviour sort implements for equal elements. In Python 2.4, sort is guaranteed stable, so the existing order will be preserved. In previous Python versions, sort was not guaranteed stable; although in practice it usually was.

In DSU, elements with equal keys will be sorted by value. This may or may not be a problem, depending on the application. To avoid this and retain stability in elements with equal keys, decorate with both key and index:

decorated = [(val[0]*val[0] + val[1]*val[1], i, val)
             for (i, val) in enumerate(vectors)]
decorated.sort()
vectors = [val for (key, i, val) in decorated]


Or you could borrow from a typical C programming pattern: don't sort elements, sort pointers or indexes to elements:

indexed = [(val[0]*val[0] + val[1]*val[1], i)
           for (i, val) in enumerate(vectors)]
indexed.sort()
vectors = [vectors[i] for (key, i) in indexed]


Python 2.4 adds an extra optional argument to list.sort, key, which can be passed a custom key-computation function. One call to key is made for each element in the list. The list is then sorted by the key of each element, rather than the value of each element. In effect, this internalises DSU within list.sort's implementation. The code is straightforward:

def key_magnitude(a):
    return a[0]*a[0] + a[1]*a[1]

vectors.sort(key=key_magnitude)


This is as clear and concise, and has the same stability in the face of equal keys, as sort-with-cmp. But it also avoids most of the overhead of sort-with-cmp, calling a key function once per element rather than a comparison function multiple times per element.

Some timings, sorting float vectors of various lengths N (code); all times in seconds:

sortN=
1001,00010,000100,0001,000,000
sort(cmp)0.001170.01910.2723.4743.6
DSU0.0002120.002740.2653.5741.9
stable DSU0.0002250.002890.2733.6945.4
index DSU0.0002280.002950.2753.7344.6
sort(key)0.0001680.002020.02780.4486.28
normal sort0.0001050.001650.02390.3785.38

And the same data in a log-log chart:

Chart.

What this shows is that, for small N, the DSU pattern is significantly more efficient than sort-with-cmp. However, at N=10,000 and above, DSU's overhead in list allocation, decoration, and undecoration starts to bite, leading to similar performance to sort-with-cmp. The two stable DSU patterns have slightly higher cost than plain DSU, with index DSU in general performing slightly worse: possibly the indexing overhead in reassembling the undecorated list is to blame.

But in all cases, sort-with-key is an order of magnitude faster than any other sorting pattern, with performance close to that of a regular sort. If you're targetting Python 2.4, it's time to let go of DSU: sort-with-key is the new state-of-the-art.

Comments: