July 15, 2008

Backwards Thinking

When I think about it, it strikes me as odd that someone who thinks E-Ink and modern languages are the best thing since transistors would enjoy a paperback on antiquated compiler techniques. Of course, it's hard to argue when I haven't been able to put the thing aside for the last week. Simply put, Knuth's "Selected Papers on Computer Languages" reads like a novel and is, in my humble opinion, better than his other books. The real difference is in flow (also, I find algorithm complexity immensely dull, but I digress). While TAoCP holds an overwhelming amount of information, the task of documenting that amount of material is simply too much; it just feels rushed. This collection, however, is Knuth in his element. Many of the papers were never formally presented; some are letters, others are just tidied-up notes. It's a very honest writing style and a real pleasure to read because of it. I find myself forgetting that some of the work I'm reading fundamentally changed the world of compilers when it was originally published. Here's to my favorite dead tree in the last several years.

July 4, 2008

Boilerplate is basically bad

or "How Huffman should help my hurting hands"

Typing breaks make me irritable. Usually, I occupy myself by finding someone to bother and striking up a conversation, but this week it seems I was alone in deciding not to vacation for the entire work week. Yesterday, this confluence came to a head and produced a brain-tangent concerning the volume of text that programming demands (i.e., way too freaking much). Not being one to let things go even when I should, I spent the evening creating tools to shortcut some of the common things I do. In the end, I met with only moderate success, and I doubt I will actually adopt any of them in my day-to-day tasks, but I did come out with a better understanding of why I am so annoyed by the status quo.

A program does two things: 1) it tells a computer what to do, and 2) it tells a reader what you told the computer to do. An efficient program does both as concisely as possible. The phrase "as X as possible" should tip you off to the fact that this is an optimization problem. More specifically, it has become an information theory optimization problem in the vein of Kolmogorov complexity. Of course, it's not computable because the latter evaluation criterion requires modeling the human brain, but the point remains. Why should I write this:
#include <stdio.h>
int main( int argc, char **argv ) {

    return 0;
}
when 99% of my code uses that same text? If we were to shove 100 C programs through a Huffman-esque meat grinder, that entire quote (and probably more!) would condense down to a single bit. Of course, it would be unreadable, but I believe I've made it clear that the middle ground for optimizing our problem is much closer to a single bit than it is to that horrible monstrosity. (For the record, yes, I realize C is guilty of this far more than other languages, but I'm trying to make a point.)
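
To make the single-bit claim a little more concrete, here is a minimal sketch of the Huffman step, assuming the boilerplate has already been carved out as one token. The function name huffman_code_lengths, the MAX_SYMBOLS limit, and the token counts in the note that follows are all made up for illustration; nothing here was measured on real code. It computes only the code length of each token, which is all the argument needs.

```c
#include <stddef.h>

#define MAX_SYMBOLS 64  /* arbitrary cap for this sketch */

/* Compute Huffman code lengths (in bits) for n symbols with the given
 * occurrence counts. Writes one length per symbol into lengths[].
 * Returns 0 on success, -1 if n is 0 or larger than MAX_SYMBOLS.
 * Degenerate case: a lone symbol gets length 0. */
int huffman_code_lengths(const unsigned long *counts, size_t n,
                         unsigned *lengths)
{
    unsigned long weight[2 * MAX_SYMBOLS];
    int parent[2 * MAX_SYMBOLS];
    int active[2 * MAX_SYMBOLS];
    size_t total = n;

    if (n == 0 || n > MAX_SYMBOLS)
        return -1;

    /* Every symbol starts out as a parentless leaf. */
    for (size_t i = 0; i < n; i++) {
        weight[i] = counts[i];
        parent[i] = -1;
        active[i] = 1;
    }

    /* Merge the two lightest active nodes until one root remains.
     * A plain linear scan is fine at this size; a real encoder would
     * use a priority queue. */
    for (size_t merges = 0; merges + 1 < n; merges++) {
        int a = -1, b = -1;  /* a = lightest, b = second lightest */
        for (size_t i = 0; i < total; i++) {
            if (!active[i])
                continue;
            if (a < 0 || weight[i] < weight[a]) {
                b = a;
                a = (int)i;
            } else if (b < 0 || weight[i] < weight[b]) {
                b = (int)i;
            }
        }
        weight[total] = weight[a] + weight[b];
        parent[total] = -1;
        active[total] = 1;
        active[a] = active[b] = 0;
        parent[a] = parent[b] = (int)total;
        total++;
    }

    /* A leaf's code length is its depth below the root. */
    for (size_t i = 0; i < n; i++) {
        unsigned depth = 0;
        for (int p = parent[i]; p >= 0; p = parent[p])
            depth++;
        lengths[i] = depth;
    }
    return 0;
}
```

Feed it hypothetical counts of {100, 40, 25, 20, 10}, where the first entry is the boilerplate token showing up once in each of 100 programs and the rest are invented counts for other tokens, and the lengths come back as 1, 2, 3, 4, 4. The boilerplate lands on a single bit, but only because I rigged it to outnumber everything else combined; with realistic token counts it would merely be short.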

I'll spare you the rest of my ravings on simplicity and coding, but suffice it to say, I'll be digging out an information theory textbook and giving this a second look at some point. There's too much code out there, and something needs to be done about it all.