June 5, 2014

Using LLVM Passes with Clang

LLVM is a useful system for toying with compiler extensions: its IR is clean and well-documented, its structure is modular and easy to extend, and its user community is large and active. Thus, when I began writing instrumentation in LLVM, I was a little surprised that there wasn't a straightforward path for integrating custom passes into a build flow. Nearly all of the documentation suggests using opt to load and run a custom pass. That leads to a flow that looks something like:
clang -O3 -emit-llvm -o source.ll source.c
opt -S -load custom_pass.so -custompass -o source_opt.ll source.ll
llvm-link -o out.ll source_opt.ll ...
llc -filetype=obj -o out.o out.ll
gcc out.o
It's not that complicated, but let's say that you're trying to instrument an existing project that's been built with autotools. Do you rewrite the `.in` files? Do you try to hack together a wrapper script that pretends it's gcc but actually does all of the stuff above? What I'd really like to do is just put my custom pass into clang and use it instead:
clang -O3 -"magic" source.c
Well, as it so happens, LLVM has magic built-in, but it's not where you might think. There's no option to tell clang to load and run a custom pass in a certain phase. There is, however, an option to tell clang to load a custom extension, and there's a mechanism to allow that extension to register itself in a particular phase. This is just as good, so long as we're okay with the specific insertion points that LLVM provides (there are about half a dozen).

So what's the magic?
clang -Xclang -load -Xclang your_custom_pass.so ...
And a couple of lines inserted at the end of your pass code. I've written some demo code to show how it works, and it's available on github:
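For reference, the registration boilerplate looks something like the sketch below. This is written against the legacy pass manager API that clang used at the time; MyPass is a placeholder, not a name from the demo code, and EP_EarlyAsPossible is just one of the handful of extension points LLVM provides:

```cpp
#include "llvm/IR/Function.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/Pass.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"

using namespace llvm;

namespace {
// A placeholder pass; substitute your own FunctionPass here.
struct MyPass : public FunctionPass {
  static char ID;
  MyPass() : FunctionPass(ID) {}
  bool runOnFunction(Function &F) override {
    errs() << "visiting: " << F.getName() << "\n";
    return false; // we didn't modify the IR
  }
};
}
char MyPass::ID = 0;

// These are the "couple of lines": a static RegisterStandardPasses
// object whose constructor hooks the pass into one of clang's
// predefined extension points when the shared object is loaded.
static void registerMyPass(const PassManagerBuilder &,
                           legacy::PassManagerBase &PM) {
  PM.add(new MyPass());
}
static RegisterStandardPasses
    RegisterMine(PassManagerBuilder::EP_EarlyAsPossible, registerMyPass);
```

Compile that into a shared object and the -Xclang -load -Xclang incantation above will run it as part of a normal clang invocation, no opt or llvm-link required.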



February 20, 2014

Rogue "heta": How I learned about raw strings in Python.

I don't claim to be a python wizard. Yes, I use python almost every day, but I use it differently than I use C. I understand C well, or more accurately, I know enough about the language, compiler optimization, and processor architecture to be dangerous when it comes to getting a computer to do tricks. I honestly don't care so much about what the python interpreter is doing, because if I run into performance or scalability problems, python gets ditched. It's my quick-and-dirty tool.

Well, yesterday things got ugly, and I mean that literally. I was setting axes on a plot generated using matplotlib, and I was met with this on my X-axis:
Hmmm. I don't remember having a "heta" variable. I do remember having a theta, however, so I assumed that I had simply mistyped my label:
Double hmmm. Matplotlib supports LaTeX in strings, and I take full advantage of this fact quite often. Now I know some greek letters don't exist in LaTeX. Or rather, some capital greek letters are not included as special LaTeX commands because they are equivalent to their roman versions (for example, \epsilon exists, but \Epsilon does not, but you can just as easily write E and be done with it). None of this, however, applies here, because lowercase and uppercase theta do exist, and I was writing the lowercase version anyways. Omicron is the only greek letter to have neither upper nor lowercase LaTeX commands.

Dead-set on thinking I had somehow screwed up my LaTeX invocation, it never occurred to me that it could be a python problem until I turned to the smart guy I always annoy when I'm being dumb. A couple minutes and a few surgical Google trips later, I discovered that I had somehow managed to overlook raw strings for over three years (since I first picked up python to replace perl). Instead of writing a single LaTeX command, I had inadvertently written a tab character (\t) dutifully followed by four LaTeX variables: h, e, t, and a. In hindsight, the oversight should have been obvious: perl and bash both use single quotes and double quotes to distinguish between interpreted and uninterpreted strings.

A single r prefixing the string, and sure enough, my axes are back to greek, I'm a little less ignorant, and all is good.
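The difference is easy to see in a plain interpreter session, no matplotlib required. This is just a minimal sketch of the escape behavior, not the plotting code itself:

```python
# In an ordinary string literal, the backslash in "\theta" starts an
# escape sequence: "\t" becomes a tab, leaving the literal text "heta".
plain = "$\theta$"
print(repr(plain))   # the repr shows the embedded tab

# A raw string leaves the backslash alone, so LaTeX sees \theta intact.
raw = r"$\theta$"
print(repr(raw))

# The two strings really are different characters:
assert plain == "$" + "\t" + "heta$"
assert raw == "$" + "\\" + "theta$"
```

Pass the raw version to set_xlabel and the axis label renders as an actual theta.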

February 6, 2014


I am the first to admit that my mathematics is weak. It has always bothered me, and while I've made strides (many of them recently) to improve myself in this area, I've always had the nagging suspicion that I will always be a weak mathematician. While irrational fears cannot be conquered simply by reading an essay, I did take a small amount of solace when I ran across this today.

Edsger Dijkstra, one of the most famous computer scientists, is best known for inventing much of modern structured programming, but he spent the last several decades of his life on formal methods---how to write programs that are provably correct.

"Looking back I cannot fail to observe my fear of formal mathematics at the time. In 1970 I had spent more than a decade hoping and then arguing that programming would and should become a mathematical activity; I had (re)arranged the programming task so as to make it better amenable to mathematical treatment, but carefully avoided creating the required mathematics myself. I had to wait for Bob Floyd, who laid the foundation, for Jim King, who showed me the first example that convinced me, and for Tony Hoare, who showed how semantics could be defined in terms of the axioms needed for the proofs of properties of programs, and even then I did not see the significance of their work immediately. I was really slow."

  -Edsger Dijkstra, EWD1308, 10 June 2001