July 4, 2008

Boilerplate is basically bad

or "How Huffman should help my hurting hands"

Typing breaks make me irritable. Usually, I occupy myself by finding someone to bother and striking up a conversation, but this week it seems I was alone in deciding not to take the entire work week off. Yesterday, this confluence came to a head and produced a brain-tangent concerning the volume of text that programming demands (i.e., way too freaking much). Not being one to let things go even when I should, I spent the evening creating tools to shortcut some of the common things I do. In the end, I met with only moderate success, and I doubt I will actually adopt any of them in my day-to-day tasks, but I did come out with a better understanding of why I am so annoyed by the status quo.

A program does two things: 1) it tells a computer what to do, and 2) it tells a reader what you told the computer to do. An efficient program does both as concisely as possible. The phrase "as X as possible" should tip you off to the fact that this is an optimization problem. More specifically, it is an information theory optimization problem in the vein of Kolmogorov complexity. Of course, it's not computable, because evaluating the second criterion requires modeling the human brain, but the point remains. Why should I write this:
#include <stdio.h>
int main( int argc, char **argv ) {

return 0;
}
when 99% of my code uses that same text? If we were to shove 100 C programs through a Huffman-esque meat grinder, that entire snippet (and probably more!) would condense down to a single bit. Of course, the result would be unreadable, but I believe I've made it clear that the sensible middle ground for our optimization problem is much closer to a single bit than it is to that horrible monstrosity. (For the record, yes, I realize C is guilty of this far more than other languages, but I'm trying to make a point.)

I'll spare you the rest of my ravings on simplicity and coding, but suffice it to say, I'll be digging out an information theory textbook and giving this a second look at some point. There's too much code out there, and something needs to be done about it all.
