Two Crocodiles

June 5, 2014

Using LLVM Passes with Clang

LLVM is a useful system for toying with compiler extensions: its ISA is clean and well-documented, its structure is modular and easy to extend, and its user community is large and active. Thus, when I began writing instrumentation in LLVM, I was a little surprised that there wasn't a straightforward path for integrating custom passes into a build flow. Nearly all of the documentation suggests using opt to load and run a custom pass. That leads to a flow that looks something like:

clang -O3 -emit-llvm -S -o source.ll source.c
opt -S -load custom_pass.so -custompass -o source_opt.ll source.ll
llvm-link -o out.ll source_opt.ll ...
llc -filetype=obj -o out.o out.ll
gcc out.o

It's not that complicated, but let's say that you're trying to instrument an existing project that's been built with autotools. Do you rewrite the `.in` files? Do you try to hack together a wrapper script that pretends to be gcc but actually does all of the stuff above? What I'd really like to do is just put my custom pass into clang and use it instead:

clang -O3 -"magic" source.c

Well, as it so happens, LLVM has magic built-in, but it's not where you might think. There's no option to tell clang to load and run a custom pass in a certain phase. There is, however, an option to tell clang to load a custom extension, and there's a mechanism to allow that extension to register itself in a particular phase. This is just as good, so long as we're okay with the specific insertion points that LLVM provides (there are about half a dozen).

So what's the magic?

clang -Xclang -load -Xclang your_custom_pass.so ...

And a couple of lines inserted at the end of your pass code. I've written some demo code to show how it works, and it's available on github:

https://github.com/rdadolf/clangtool
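One detail of that command line is easy to get wrong: every token meant for the clang frontend needs its own -Xclang prefix, so "-load your_custom_pass.so" turns into four separate arguments. Here is a minimal sketch of a helper that builds such a command line. The function name and the file paths are made up for illustration, and it only assembles the arguments; it does not run clang.

```python
import shlex


def clang_plugin_args(plugin_path, extra_args=()):
    """Build a clang argv that loads a pass plugin (hypothetical helper)."""
    # Each frontend token gets its own -Xclang, so "-load <path>"
    # expands to: -Xclang -load -Xclang <path>
    return ["clang", "-Xclang", "-load", "-Xclang", plugin_path, *extra_args]


if __name__ == "__main__":
    argv = clang_plugin_args("./your_custom_pass.so", ["-O3", "-c", "source.c"])
    # Print a shell-safe version of the command instead of executing it.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

For an autotools project, the same string could probably be handed to configure as the compiler (something like CC="clang -Xclang -load -Xclang /path/to/pass.so"), though I'd treat that as an assumption to verify against your own build rather than a guarantee.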

Enjoy!

February 20, 2014

Rogue "heta": How I learned about raw strings in Python.

I don't claim to be a python wizard. Yes, I use python almost every day, but I use it differently than C. I understand C well, or more accurately, I know enough about the language, compiler optimization, and processor architecture to be dangerous when it comes to getting a computer to do tricks. I honestly don't care so much about what the python interpreter is doing because if I'm running into performance or scalability problems, python gets ditched. It's my quick-and-dirty tool.

Well, yesterday things got ugly, and I mean that literally. I was setting axes on a plot generated using matplotlib, and I was met with this on my X-axis:

Hmmm. I don't remember having a "heta" variable. I do remember having a theta, however, so I assumed that I had simply mistyped my label:

ax.set_xlabel('$\theta_2$='+str(theta2))

Double hmmm. Matplotlib supports LaTeX in strings, and I take full advantage of this fact quite often. Now I know some greek letters don't exist in LaTeX. Or rather, some capital greek letters are not included as special LaTeX commands because they are equivalent to their roman versions (for example, \epsilon exists, but \Epsilon does not, but you can just as easily write E and be done with it). None of this, however, applies here, because lowercase and uppercase theta do exist, and I was writing the lowercase version anyways. Omicron is the only greek letter to have neither upper nor lowercase LaTeX commands.

Dead-set on thinking I had somehow screwed up my LaTeX invocation, I never considered that it could be a python problem until I turned to the smart guy I always annoy when I'm being dumb. A couple of minutes and a few surgical Google trips later, I discovered that I had somehow managed to overlook raw strings for over three years (ever since I first picked up python to replace perl). Instead of writing a single LaTeX command, I had inadvertently written a tab character (\t) dutifully followed by four LaTeX variables h, e, t, and a. In hindsight, this should've been an obvious oversight: perl and bash both use single quotes and double quotes to distinguish between uninterpreted and interpreted strings.

A single r prefixing the string, and sure enough, my axes are back to greek, I'm a little less ignorant, and all is good.
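For the record, here is a small sketch of the difference. The value of theta2 is made up for illustration; the rest is plain python string behavior.

```python
# In an ordinary string literal, '\t' is an escape sequence: a tab character.
plain = '$\theta_2$='

# With the r prefix, backslashes are left alone, so LaTeX gets to see \theta.
raw = r'$\theta_2$='

# The ordinary literal is a '$', a tab, and then the rogue "heta".
assert plain == '$' + '\t' + 'heta_2$='

# The raw literal keeps the backslash and the 't' as two separate characters.
assert raw[1:7] == '\\theta'
assert len(raw) == len(plain) + 1

theta2 = 0.5  # hypothetical value, just for the example
label = r'$\theta_2$=' + str(theta2)
print(label)
```

Doubling the backslash ('$\\theta_2$=') gives the same string as the raw version, but the r prefix is easier to read when the whole label is LaTeX.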

February 6, 2014

Mathphobia

I am the first to admit that my mathematics is weak. It has always bothered me, and while I've made strides (many of them recently) to improve myself in this area, I've always had the nagging suspicion that I will always be a weak mathematician. While irrational fears cannot be conquered simply by reading an essay, I did take a small amount of solace when I ran across this today.

Edsger Dijkstra, one of the most famous computer scientists, is usually known for inventing much of modern structured programming, but he spent the last several decades of his life on formal methods---how to write programs that are provably correct.

"Looking back I cannot fail to observe my fear of formal mathematics at the time. In 1970 I had spent more than a decade hoping and then arguing that programming would and should become a mathematical activity; I had (re)arranged the programming task so as to make it better amenable to mathematical treatment, but carefully avoided creating the required mathematics myself. I had to wait for Bob Floyd, who laid the foundation, for Jim King, who showed me the first example that convinced me, and for Tony Hoare, who showed how semantics could be defined in terms of the axioms needed for the proofs of properties of programs, and even then I did not see the significance of their work immediately. I was really slow."

-Edsger Dijkstra, EWD1308, 10 June 2001

July 2, 2010

Zen and the Art of Text Editor Programming

I like technology. I surround myself with computers and electronics, I put a lot of faith in the scientific method as a general solution to life's daily problems, and I believe the pursuit of knowledge is one of the purest aspirations one can have.

I also realize that it can go horribly wrong, which is why when I was idly wandering through the vast information jungle that is the Internet, I was so pleased to stumble across this gem of a program:

OmmWriter

It's a text editor. What distinguishes it from a lot of other software is that it takes such a focused approach to solving one particular problem: providing a peaceful, distraction-free environment so that a writer can focus on writing.

Now you're probably thinking, "I thought you were a vi bigot. What gives? Have you renounced your faith and joined the mindless horde of infidels?" The answer is, of course, no – I will probably use vi for code until the day my fingers fall off. Still, I don't always write code. Sometimes I write words (real ones, with punctuation and everything!), like the ones you're reading now. When I write prose, all the little tricks that make vi a great programming editor are somewhat distracting. I want to write. Just me and the words.

This is where OmmWriter excels. It's a full-screen editor with rudimentary features. You can load files and save files. You can choose one of three fonts and one of three font sizes. It only edits text files, so you cannot format anything. You can adjust the writing area, and you can scroll up and down if necessary. Finally, you can change the background image and sound effects. Now there are probably two questions in your head: "why is he listing features?" and "sound effects!?" To answer the first, it's because I just enumerated ALL of the features. All of them. It is a wonderfully minimalist editor. To answer your second question, don't knock it until you've tried it. I really thought that the sound/background thing was just a huge art gimmick to get publicity for their program, but I've been writing on it all day, and I actually find it really effective. It doesn't distract, and it puts me in a great, relaxed frame of mind.

So in conclusion, here's to you, artists. The unlikely combination of a bit of programming skill, a light brush, and a very focused vision for the right way to write has produced a wonderful little tool and won me over completely.

...oh, and I should probably also mention that because I am an unabashed terminal fanatic, I couldn't stand not being able to run OmmWriter as a command-line tool. So I fixed that little oversight: omm. You're welcome!

June 3, 2010

Stupid Parser Tricks, Part 1: Two Lexers, One Token Stream

So lately I’ve been working with ANTLR, a lexer/parser generator by Terence Parr at USF. It’s been pretty enjoyable, for the most part. There have been a couple of times where I wanted to pull my hair out, but overall, it has really saved me a lot of time and effort in my attempt to write a DSL for some folks at work.

Anyways, in the course of working with it, I stumbled across a couple of neat hacks that I thought I’d share. The first is how to write a hybrid lexer for an embedded DSL. I’m not the first to try something like this: Parr’s island-grammar example (examples-v3.tar.gz) shows how to traverse a hybrid grammar but stops short of merging the ASTs, and there’s a pretty hairy discussion of a more complicated scenario on the ANTLR wiki, but neither of these enables us to make two lexers transparently behave like one (which makes the programmer’s life easier and the downstream code cleaner, if you can get away with it).

Basically, the recognizer classes that ANTLR builds for a grammar are self-contained enough to be called recursively without exploding. The island-grammar example I linked above does this, and it’s a very clever feature. What it doesn’t show you is that it’s possible to splice together the two token streams. The only tricky part is that by default, ANTLR is not written to handle recording multiple tokens for a single lexer rule (ostensibly for efficiency reasons). This means that when the lexer for the embedded language is invoked, we need to modify the base recognizer to handle the deluge of tokens that is produced. It’s not a difficult fix -- we just add a token buffer in-line with the function responsible for passing tokens up the chain and modify the emit function to feed the buffer instead. Once that plumbing is in place, we drop the embedded lexer in, collect the tokens, and feed them one by one to the outer lexer. The driver simply calls the outer lexer and the parser never knows the difference.
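To make the buffering idea concrete, here is a toy, standalone sketch. It does not use the ANTLR runtime at all: the class names, the token tuples, and the curly-brace delimiters for the embedded language are all invented for illustration, and the emit/next_token pair is only loosely modeled on the plumbing described above.

```python
import re
from collections import deque


class BufferedLexer:
    """Toy lexer whose rules may emit any number of tokens per match."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.pending = deque()  # tokens emitted but not yet handed out

    def emit(self, token):
        # Feed the buffer instead of overwriting a single "current token".
        self.pending.append(token)

    def next_token(self):
        # Run rules until something is buffered, then hand out one token.
        while not self.pending:
            while self.pos < len(self.text) and self.text[self.pos].isspace():
                self.pos += 1
            if self.pos >= len(self.text):
                return ("EOF", "")
            self.match_rule()
        return self.pending.popleft()

    def match_rule(self):
        raise NotImplementedError


class InnerLexer(BufferedLexer):
    """Embedded language: integers and '+'."""
    TOKEN = re.compile(r"(\d+)|(\+)")

    def match_rule(self):
        m = self.TOKEN.match(self.text, self.pos)
        if m is None:
            raise SyntaxError("bad embedded input at %d" % self.pos)
        self.pos = m.end()
        if m.group(1) is not None:
            self.emit(("INT", m.group(1)))
        else:
            self.emit(("PLUS", "+"))


class OuterLexer(BufferedLexer):
    """Host language: words, plus {...} regions handed to InnerLexer."""
    WORD = re.compile(r"[A-Za-z_]\w*")
    EMBEDDED = re.compile(r"\{([^}]*)\}")

    def match_rule(self):
        m = self.EMBEDDED.match(self.text, self.pos)
        if m is not None:
            # One outer rule produces many tokens: splice in the inner stream.
            inner = InnerLexer(m.group(1))
            tok = inner.next_token()
            while tok[0] != "EOF":
                self.emit(tok)
                tok = inner.next_token()
            self.pos = m.end()
            return
        m = self.WORD.match(self.text, self.pos)
        if m is None:
            raise SyntaxError("bad input at %d" % self.pos)
        self.emit(("WORD", m.group()))
        self.pos = m.end()


def tokens(text):
    """Driver: it only ever talks to the outer lexer."""
    lexer = OuterLexer(text)
    out = []
    tok = lexer.next_token()
    while tok[0] != "EOF":
        out.append(tok)
        tok = lexer.next_token()
    return out
```

Calling tokens("let x {1 + 22} end") yields a single flat stream in which the INT and PLUS tokens from the embedded region sit between the surrounding WORD tokens, which is the "parser never knows the difference" property. Note that the sketch depends on the braces to find the embedded region, so it has the same limitation as the real thing.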

There’s not enough space to walk through the code, but I’ve provided a python implementation below. Enjoy!

stupid_parser_tricks.tgz (tarball)
-or-
stupid_parser_tricks (individual files)

(caveat emptor: if you don’t have lexer-level syntax for delineating the embedded language, you’re out of luck with this method. That’s where the more complicated scenario I mentioned arises.)

November 14, 2009

Out with the old, in with the new

Well, it's official. I've quit my job and shipped my things out west. In a week or two, I'll be starting my new job in the CASS-MT group at PNNL.

Here's to the start of a new adventure! Wish me luck.

March 28, 2009

Learning a language by writing another

Not too long ago, I was labeled an anachronist during a debate over trends in modern languages. At the time, I was rather amused, given how much time I spend in my professional life fighting to push research techniques into practical use. On reflection, though, his point did have one valid facet: many of the language techniques I bring up often come from variants of older languages even if the languages themselves are much newer (e.g., MultiLisp, Cilk, UPC). So, under the auspices of broadening my horizons, I decided to learn Haskell.

I hit google, and after trying out several tutorials on the language, I found myself extremely frustrated with writers who constantly tried to ignore half of Haskell's features in the name of providing a "simple" picture to the reader. I understand that some of the intended audience doesn't have a programming background, but when the examples are so simplistic that they actually fail to acknowledge that Haskell's type system even exists, I draw the line.

Thankfully, there was an alternative, in the form of a Wikibooks document called "Write Yourself a Scheme in 48 Hours." Naturally, I didn't actually expect it to take 48 hours (though such things have been done), but I'm a big fan of using a non-trivial example program as a learning tool. My history of working in and on various scheme implementations certainly doesn't hurt either; the territory is pretty well-traveled.

As I work my way through the PDF and all of its exercises, I am certainly accumulating my share of quibbles with Haskell, but what would a real language be without that familiar love-hate relationship? Something tells me that I'll be happy I went through the motions in about 3-6 months. Here's to you, future-me.