jlm-blog
~jlm

26-Jul-2006

Haskell vs. C readability stawman

Filed under: programming — jlm @ 17:03

Every time I think about learning Haskell, I head over to haskell.org, read the Introduction link, and am offended by the astoundingly absurd quicksort example they provide for its superiority over C and leave in disgust.

First, let’s take a look at their Haskell implementation of quicksort:

    qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)

Wait one second here! Something’s fishy. At the bottom, the basic operation quicksort does is a swap. Where’re the swaps? I don’t see any swapping. I do see splicing though; quicksort doesn’t splice. This isn’t quicksort; quicksort is an in-place sort, this thing here is a list sort. This is not just an implementation detail, it’s fundamental to what quicksort is. It doesn’t splice, it makes swaps. If you have some linked lists, you sort them with something like mergesort. But for arrays, you can’t beat quicksort.

Now, we can start on the C implementation they provide…

    qsort( a, lo, hi ) int a[], hi, lo;

Huh?

If you’re a C programmer, your first impression is to boggle at this syntax. It’s been deprecated for 17 years, and most C programmers nowadays probably don’t even recognize it. What moth-eaten tome did they dig this out of? Variables of hi and lo and h and l? l+1 and l-1 as expressions? That’s the kind of thing you see at the IOCCC. Let’s choose an actual decent example of quicksort for a more honest comparison, like the one from K&R2:


/* qsort:  sort v[left]...v[right] into increasing order */
void qsort(int v[], int left, int right)
{
    int i, last;
    void swap(int v[], int i, int j);

    if (left >= right)    /* do nothing if array contains */
        return;           /* fewer than two elements */
    swap(v, left, (left + right)/2); /* move partition elem */
    last = left;                     /* to v[0] */
    for (i = left+1; i <= right; i++)   /* partition */
        if (v[i] < v[left])
            swap(v, ++last, i);
    swap(v, left, last);        /* restore partition elem */
    qsort(v, left, last-1);
    qsort(v, last+1, right);
}

Much more readable than that strawman example they have on the Haskell intro page!
But it’s still an apples-to-oranges comparison, because the C implementation is doing a real quicksort, and the Haskell is doing a list sort. Show me a real in-place quicksort in Haskell: It’ll turn out looking almost like the implementation in K&R, I wager.

Now, Haskell was written by people who know their computer science. They know all about quicksort, they know their example is comparing different algorithms, and they can write better C code than what they use. Why do they have to be dishonest when promoting the virtues of Haskell, hmm?

6-Jan-2006

C++ style

Filed under: programming — jlm @ 21:44

In C, you only have call-by-value (with some wierdness for arrays). To pass a reference, you explicitly take the variable’s address. This has the benefit that you can easily tell a function call which won’t modify a variable from one which probably will:
  int i;
  f(i); /* i is unchanged */
  g(&i); /* i is likely changed */

However, for composite types, you often want to avoid making a copy when you’re not modifying the variable, so you end up with a function h(const T*) which is called as h(&x) anyway.

C++ adds call-by-reference, so that a call which looks like f(i) no longer tells you that i isn’t modified. This is a shame. Some style rules will give you the benefit of being able to tell modifying from non-modifying functions, while still getting the copy avoidance without burdening the caller with that implementation detail:
  f(T x) /* Call by value: x is a local copy. Use this when you want to play with x's value in f() without affecting the caller. */
  f(const T x) /* Call by value: x is a local copy and const. Use this for easily copied types which f() will only be reading. */

  f(T& x) /* Call by reference: x is shared with the caller and modifyable. Avoid this. */
  f(const T& x) /* Call by reference: x is shared with the caller but not modifyable. Use this for composite types which you don't want to gratuitously copy. Because x is const to f(), the caller can pretty much treat it as if f() got a copy. (The gotcha is that f() has to be aware that it'll see modifications to x made through other references, unlike with a copy.) */
  f(T* x) /* Call by pointer: Use when you want to have f() modify *x for the caller. */
  f(const T* x) /* Call by pointer, but not modifyable through it. Avoid without good casue, as this is generally semantically a call-by-reference. */

With the f(T&) case gone, we once again can be confident that f(i) won’t change i but g(&i) is likely to. More likely than with the C rules even, because we prefer f(const T&) to f(const T*) in cases where we’re merely using pointers to avoid an expensive copy. Pretty nice. Now I just need wide adoption and lint enforcement.

26-Oct-2005

Protecting running scripts from modifications

Filed under: programming — jlm @ 13:48

The shell reads scripts in as it runs, and not even nicely line-by-line, so when you modify a running script, weird things can happen as the shell gets inconsistent versions. Usually this just results in harmless errors like “/home/jlxmodmap not found”, but if you’re like me you worry about “rm tmpfile; process importantfile” turning into “rm importantfile”. The usual technique for the paranoid is to “mv script script.old; cp script.old script”; the shell sees the old script, which you keep unmodified, while you edit the copy. But it’d be nice if you didn’t have to do this everytime…

So, someone told me the trick of having all the work a script does happen in a shell function, which you call at the end of your script. I’m using this for my .xsession (canonical example of a script modified while it’s running), which looks like:

main() {
  do stuff
  do more stuff
  do even more stuff
  exit
}
 
main

voilà. It’s already read in all of main() by the time it runs it, so you can modify it freely, except when it’s just starting up and parsing things.

21-Sep-2005

Berkeley DB locking

Filed under: programming — jlm @ 18:28

The locking in Berkeley DB is fundamentally broken.

This isn’t something that’s bitten me recently, but talk about it did come up recently, and many BDB users are unaware of the problem. It’s especially surprising when you first encounter it because BDB is so solidly done in the rest of its implementation and the problem isn’t advertised. Of course locking will work — everything else does!

The problem is that BDB locks are bare mutexs, and no forcible release mechanism is provided. There is no way to rollback to a state before a lock was acquired. There is no way to retry. You take the lock, do your modifications, and release. Or you take the lock, do some of your modifications, and the circuit breaker trips, power is quickly restored, the server reboots, and now your BDB is hosed. You can’t take the lock because BDB thinks your old instance has it. You can’t release the lock because you don’t have it. If you could release the lock, your database is inconsistent because you can’t rollback. The only way you can do anything is to blow away the BDB environment and make a new one (suffering collateral damage when you just want to destroy a mere lock) and hope that the partial transaction you’re unleashing as a result isn’t a problem (and if it isn’t, why did you need a lock?).

So, what can you do? You can try to build your own transaction system on top of BDB, but that’s a lot of work. A lot of work, when there’s better featured databases out there already. If you were using BDB and need more, MySQL is probably good enough for you. It has partial transaction support (probably more than you want to write!), and does have a library implementation (libmysqld) so you can avoid running a server and just use data files, like in BDB. If you need full transaction support (good luck implementing that on your own), there’s always ye olde enterprise-grade PostgreSQL. Or maybe you don’t need transactions after all? Then Berkeley DB is solid as a brick.

15-Aug-2005

CalDAV

Filed under: programming, web — jlm @ 17:00

One of the sessions I attended at OSCon was about the upcoming CalDAV standard. As calendaring is a weak spot in the Linux/Free Software application space, advances here are particularly welcome.

There was a panel of implementers there, representing:

  • Novell / Hula
  • Mozilla Foundation / Sunbird and Lightning
  • RPI / UW Calendar server
  • OSAF / Cosmo and Scoobie and Chandler

(organization) / (programs)

A lot of the session talked about features of their various products, but I’ll ignore those in favor of looking at the protocol.
CalDAV in a nutshell is WebDAV holding iCalendar files and extended to handle calendar-type queries (“when is Joe Blow free?”). The idea is the basic common protocol one: An open spec will let all these products interoperate, so you can use the calendar app which is your favorite without having to bother with which calendar system is at the other end. (Do you know what software your email server is running?)

The IETF tried to standardize calendaring before, with its CAP (“Calendar Access Protocol”), but no one implemented it. Starting from WebDAV means implementers have less work to do, and hey!, it’s working.

Stuff like room / multiparty scheduling is not addressed, they’re leaving that for a later protocol. “Clearly more research is needed.” Conflicting schedule changes are detected using ETags; resolution is unspecified and left to the clients to decide.

If you’re doing any calendaring, you absolutely want to take a look at it. You’re already supporting iCalendar (right?), and there are WebDAV implementations out there you can use (though alas not particularly cleanly), and from there to CalDAV isn’t a big step for an implementer but is a giant leap for interoperability.

Powered by WordPress