Review: Dragon Book, 2nd Edition

<a href="http://www.amazon.com/exec/obidos/ASIN/0321486811/thinkinginnet-20" target="_blank" rel="noopener noreferrer">Compilers: Principles, Techniques, & Tools, 2nd Ed.</a> by Aho, Lam, Sethi, & Ullman is the perfect book for two niches: people writing C compilers for embedded processors and CS students running the gauntlet of compiler courses. I don't mean to suggest that those niches don't exist, or that for people in those situations The Dragon Book, as it's universally known, is anything but essential reading. For the rest of the world -- that is, the broader-than-professional-compiler-writers-but-still-a-marked-subset-of-professional-programming niche -- the book is not compelling. It is widely considered a "hard read," and, like Shakespeare or Joyce, even a smart reader without guidance or prior exposure will have difficulty gauging how much attention to give a particular point. Its near-universal reach probably does make it "the one book to have if you're having only one," and its depth has earned it a spot on many a shelf for many a year, but in all honesty I found myself disappointed while reading it.

The entire second half of the book is devoted to optimization and OS- and machine-level code generation. It remains the definitive text on those topics, but they lie outside the realm of interest of the large majority of people who might want to write a compiler. Today, most people interested in compiler construction care far more about language-design issues than about chip-level instruction sets and graph-based optimizations. Whether a minority or a majority, I can't say, but many (potential) compiler writers are interested in targeting the CLR and/or the JVM, thereby embracing a simplified runtime environment.

The book seems sadly out of touch with current debates: the index lacks references to, for instance, "duck typing" or "packrat parsing" -- two techniques that are, on the one hand, unquestionably "all the rage" and, on the other, theoretically intriguing. To cover neither, in a book that spends 200 pages on lexical and syntactic analysis, seems almost willfully pig-headed. The book covers its chosen principles in depth (bordering on "is this a book on compilers or automata theory?") but does not venture outside those lines.

Another instance: on page 966, in the appendix, there are two paragraphs discussing "object-oriented vs. phase-oriented" compiler design. To call that perfunctory is to be kind -- perfunctory coverage would at least mention the Visitor design pattern.
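For readers who haven't met it, here's a minimal sketch (mine, not the book's) of the Visitor pattern applied to a toy expression tree -- the obvious way to package each compiler phase as a separate object that walks the same AST:

```python
# Hypothetical toy AST: each node knows only how to hand itself to a visitor.
class Num:
    def __init__(self, value):
        self.value = value
    def accept(self, visitor):
        return visitor.visit_num(self)

class Add:
    def __init__(self, left, right):
        self.left = left
        self.right = right
    def accept(self, visitor):
        return visitor.visit_add(self)

class Evaluator:
    """One compiler 'phase' (here, constant evaluation) as a visitor."""
    def visit_num(self, node):
        return node.value
    def visit_add(self, node):
        return node.left.accept(self) + node.right.accept(self)

tree = Add(Num(1), Add(Num(2), Num(3)))
print(tree.accept(Evaluator()))  # 6
```

A type checker or code generator would be another visitor class over the same nodes, which is exactly the "object-oriented" design the appendix waves at in two paragraphs.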

Again: just in the past few days, there's been talk about how one could or should generate method dispatch in a dynamic language. Joel Spolsky says that Ruby is slow because of method dispatch. Avi Bryant says that's true, but only because of naive code generation (Ruby today does not follow The Hugunin Directive: "Use fast paths for common cases, using native constructs if possible."). Ted Leung refers to Avi's discussion as "the 20-year-old method of in-line method caching." The Dragon Book, in contrast, explains how to construct a stack frame.
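For the curious, the technique Ted names is simple to sketch. Here's a hedged, illustrative Python model (names mine) of a monomorphic inline cache: each call site remembers the last receiver class and the method it resolved to, so repeat calls with the same class skip the full lookup:

```python
# Illustrative sketch of inline method caching, not any VM's actual code.
class CallSite:
    def __init__(self, name):
        self.name = name
        self.cached_class = None
        self.cached_method = None

    def invoke(self, receiver, *args):
        cls = type(receiver)
        if cls is not self.cached_class:
            # Slow path: full method lookup, then fill the cache.
            self.cached_class = cls
            self.cached_method = getattr(cls, self.name)
        # Fast path: the common, "native construct" case.
        return self.cached_method(receiver, *args)

class Dog:
    def speak(self):
        return "woof"

site = CallSite("speak")
print(site.invoke(Dog()))  # "woof" -- a second call with a Dog hits the cache
```

The fast path is one identity check instead of a hash lookup per call, which is the whole point of Avi's complaint about naive code generation.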

Similarly, The Dragon Book spends a tremendous amount of time on efficient "lexing" (lexical analysis, the first step of a compiler, transforms a stream of characters into a stream of tokens), and such is the influence of theory that I would be shocked to ever see a lexer not based on deterministic finite automata (DFAs). Anyone who wrote a lexer using, say, String.Split() and a hashtable would be banished from the He-Man Compiler Writers Club. This despite the fact that such a lexer would be highly transparent in purpose and design and would almost certainly be able to handle any reasonably sized source-code input (I was going to qualify this, but for heaven's sake, we've got gigabytes and gigahertz today).
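To show just how banishable such a lexer is, here's the whole idea in a few lines of Python (str.split() standing in for String.Split(), a dict for the hashtable; the token names are my own invention):

```python
# A deliberately naive, DFA-free lexer: whitespace split plus a table lookup.
KEYWORDS = {"if": "IF", "else": "ELSE", "while": "WHILE"}

def lex(source):
    tokens = []
    for word in source.split():          # the String.Split() step
        if word in KEYWORDS:             # the hashtable step
            tokens.append((KEYWORDS[word], word))
        elif word.isdigit():
            tokens.append(("NUMBER", int(word)))
        else:
            tokens.append(("IDENT", word))
    return tokens

print(lex("if x else 42"))
# [('IF', 'if'), ('IDENT', 'x'), ('ELSE', 'else'), ('NUMBER', 42)]
```

It can't handle operators glued to identifiers the way a real tokenizer must, but its purpose and design are legible at a glance -- which is exactly the trade-off the book never discusses.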

As for "tools," the discussion focuses solely on lex & yacc. A quick example of how creaky that discussion is: the one source-code snippet showing input to lex contains a function that returns a pointer but whose signature declares it as returning an int. So... there's a piece of C code that hasn't been updated in twenty years. True, lex & yacc are the ancestors of virtually every compiler-construction tool -- the more's the pity, then, that the book doesn't make their general design recognizable to those who've used tag-based languages like PHP or ASP. For that matter, a generation familiar with DOMs and XML parsers brings more than a blank slate to the tasks of tree-walking and rewriting.

So The Dragon Book sadly exemplifies the barriers to entry in the world of language design and compiler construction. This is a great book for the 500 teams worldwide building C compilers (yes, seriously, there are hundreds of C compilers). For better or worse, it's undoubtedly the textbook of choice for many CS curricula. It is not a good text for those interested in today's active debates and discussions about language design and implementation.