Explicit Typing, Trail Blazing, and Packrat Parsing

(Warning / Reassurance: this is not a technical post on how to use packrat parsing to evaluate type declarations.)

Steve Yegge calls this image a "treatise on non-inferring static type systems...one of the most insightful Computer Science papers ever published":

He then goes on to say "[Different styles are] perfectly OK with me, as long as they understand that it's not some proven silver bullet ? it's just a style thing." which is exactly true. But the funny cartoon has been reproduced here and there by implicit typing advocates. John Lam wonders "if static typing advocates really look at the world this way."

No, we don't. We look at type declarations this way:

Explicit type declarations are primarily for the task of understanding code.

As Jeff Atwood recently pointed out, "if you ask a software developer what they spend their time doing, they'll tell you that they spend most of their time writing code. However, if you actually observe what software developers spend their time doing, you'll find that they spend of their time trying to understand code."

Now, people of good faith can argue about what should be the result of evaluating "1/3" but silent conversion to an integer of value 0 (as happens in Ruby) is something where one can imagine a misunderstanding arising. Explicit typing advocates think that the additional four keystrokes of "int x = 1/3" is a small price to pay (especially because there is always a type intention during assignment).

(Incidentally, I use the terms "explicit" and "implicit" typing, not the misleading-for-this-purpose terms "static" and "dynamic." See this column from 2004.)

Steve Yegge, in his original post, laments that he finds it "pretty farging depressing that a simple declaration" like ArrayList\<int> myListOfInt = new ArrayList\<int>(); "can be haiku-sized in Java." But ignores that what he's declared has capabilities (i.e., constraints) that go beyond the equivalent "myListOfInt = Array.new"

The fair comparison is:

ArrayList myListOfInt = new ArrayList();
myListOfInt = Array.new

As far as the finger-typing burden goes, for a 60WPM (360CPM) typist, it's 40 characters versus 23 -- 1/9 of a second versus 1/16 of a second. Sometimes I get farging depressed from programming, but not over stuff like that.

Again, people of good faith can argue about which is actually easier and less error-prone to understand. I'd certainly agree that an expression like (int)(((float)a/b)*c) is annoying both to write and understand (did I balance my parentheses? Did I do the necessary casts?). But whether typing is implicit or explicit, the closer a program approaches the ideal of one statement per line the easier it is to understand. And, as programs approach that ideal, the relative burden of typing explicitness largely falls away.

Since it's a blog post, I will use an anecdote to "prove" my point: Packrat parsing is the hot new thing in compiler techniques. It is described very clearly in this article, which uses Haskell as the implementation language. Readable as it is, for me to "get" a complex algorithm, I like to implement it myself. So last Sunday, while watching football, I worked through the article's example in Ruby.

Packrat parsing evolves from a top-down predictive (aka recursive-descent) parser. That means parsing an expression like "1+2*3" from left-to-right, suspending judgement on "what you've got" as long as "what's remaining" is ambiguous. (So, for instance, when "what you've got" is "1+" you don't know if "what's remaining" is going to be a number or, as is in this case, the result of a multiplication.) Writing top-down parsers is easy and, sure enough, by the time Bledsoe even tried to hit T.O., I was getting "7" back my Ruby work.

The knock on recursive-descent parsing is that when an ambiguity is resolved, you have to backtrack up "what you've got" until you find an alternative and then you have to reparse the expression from that point forward. (Traditionally, you'll also hear that top-down parsing cannot handle as many grammar forms as bottom-up, but in the realm of programming languages, that's a bagatelle.) This gives naive recursive-descent parsers such as my own poor Big O characteristics (Intuitively, I'd guess O(n log n) -- anyone know?).

(At this point in the anecdote, I will bite my lip from a long tangent that boils down to "With today's hardware resources and modern tastes in compilation-unit size, is poor Big O a sufficient reason to complicate parsing?")

So anyway, if you read the paper, you'll see that the core of Packrat parsing is to change "what's remaining" from a string to a doubly-recursive data structure that represents "the remaining string, plus partial results of all previous attempts at disambiguation." The paper explains it very well, moving along in Haskell.

According to most arguments about implicit and explicit typing, a football-game-length project is exactly the sort of thing in which an implicitly typed language ought to shine over an explicitly-typed one. But I really had a hard time refactoring my Ruby into a packrat parser. This was due, in no small part, because of the difficulty in understanding the types being built-up in the data structure. The paper, in explicitly-typed Haskell: clear as a bell. My code's behavior on unit tests: virtually indistinguishable from random.

They were deterministic, of course (even I'm not clumsy enough to introduce randomness into an arithmetic parser), but anticipating which ones would be red and which green after making a change? Baffling. By the time the Phillies locked up the game, I was doing most of my work on a legal pad -- writing type annotations!

And then I tried to do a "use closures instead of functors" refactoring which exceeded my Ruby skills and, because I hadn't put the project under version control, succeeded only in screwing the pooch.

But by the time the late game came on, I was confident that if I were assigned the task of writing a four-function calculator, I could do it by writing a Packrat Parser -- in C#.

blogroll

social