On "On Intelligence"

Sunday, May 28, 2006

9:11 AM

On the suggestion of John Lam, I bought Jeff Hawkins' book *On Intelligence* and read it pretty much in one sitting. That was possible because it's a very accessible book, and also because, to a large extent, I had already been exposed to, and was sympathetic toward, its grand themes: minds are what brains do; pattern recognition, association, and prediction are close to, if not indistinguishable from, "intelligence"; and the brain's mechanisms for doing such things are based on highly interconnected hierarchies of localized neuronal structures that exhibit "fire together, wire together" reinforcement learning. (For those interested in such things, it should be noted early that Hawkins punts on the problem of *qualia*, so as far as I am aware, that debate still ends with Dennett and Searle.)

Hawkins' biggest hypothesis, though, is one that I find intriguing but far from self-evident: that a single algorithm producing "predictive ability [from] a hierarchical memory" is sufficient to achieve intelligence. Hawkins was a founder of Palm and Handspring, and I was surprised that *On Intelligence* rests on the structure of the neocortex and not on computational structures. Of course, a premise is that the neural structures he talks about *are* computational, and he presents his theory in terms of a few block diagrams, but the book is several steps away from presenting the source code, as it were.

This is a little disappointing, because Hawkins' hypothesis could be tested with relatively easy computational experiments. That the human brain has a gazillion neurons and a bazillion interconnections is largely irrelevant **if** the phenomena that, Hawkins claims, arise from relatively small collections of neurons. A testbed that maps well onto Hawkins' theories is the world of "complete-information, zero-sum, binary-placement gridded games." Tic-Tac-Toe, Reversi/Othello, and Go are all this type of game: they all take place on a grid and involve sequential placement of opposing symbols. Unlike Poker, there are no hidden-knowledge or probability issues; winning is binary; and unlike Chess or Checkers, you don't have to understand movement. A "play" is simply the changing of a position from an indeterminate state to a determinate one, plus the consequences of that change. Helpfully, Tic-Tac-Toe is close to trivial, while Go is computationally more difficult than Chess. Any **model** that could solve Tic-Tac-Toe and simply be scaled up to Go would be an incredible triumph.
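To make the game family concrete, here is a minimal sketch of a "binary placement gridded game," instantiated as Tic-Tac-Toe. The class and method names are my own illustration, not anything from the book: a "play" flips one grid position from indeterminate to determinate, and the outcome is binary.

```python
# A sketch of the "complete-information, zero-sum, binary-placement gridded
# game" family, with Tic-Tac-Toe as the instance. All names are hypothetical.

class PlacementGame:
    """A game played by sequentially placing opposing symbols on a grid."""

    def __init__(self, size=3):
        self.size = size
        # Every position starts indeterminate ('?'); a play makes it determinate.
        self.grid = ['?'] * (size * size)

    def place(self, index, symbol):
        """A 'play': flip one position from indeterminate to a fixed symbol."""
        if self.grid[index] != '?':
            raise ValueError("position is already determinate")
        self.grid[index] = symbol

    def lines(self):
        """Rows, columns, and diagonals -- the win conditions."""
        n = self.size
        rows = [[r * n + c for c in range(n)] for r in range(n)]
        cols = [[r * n + c for r in range(n)] for c in range(n)]
        diags = [[i * n + i for i in range(n)],
                 [i * n + (n - 1 - i) for i in range(n)]]
        return rows + cols + diags

    def winner(self):
        """Binary outcome: the winning symbol, or None."""
        for line in self.lines():
            symbols = {self.grid[i] for i in line}
            if len(symbols) == 1 and symbols != {'?'}:
                return symbols.pop()
        return None


game = PlacementGame()
for move, symbol in [(0, 'X'), (8, 'O'), (1, 'X'), (4, 'O'), (2, 'X')]:
    game.place(move, symbol)
print(game.winner())  # X wins the top row (positions 0, 1, 2)
```

Note that `lines()` is written in terms of the grid size, which is the point of the family: the rules stay fixed while the board scales.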

And although we probably don't have much insight into what Hawkins' theories predict for the form of a Go-solving intelligence, they are testable with Tic-Tac-Toe. It should be straightforward to create a "cortex-like memory system" whose "sensory inputs" are Tic-Tac-Toe placement sequences. So, for instance, a blank grid followed by an X in the upper-left and then an O in the lower-right might be encoded as: "??? ??? ???, X?? ??? ???, X?? ??? ??O". **If** Hawkins' model is correct, then one would expect certain things to be **emergent** in the **self-organizing** higher levels of his hierarchies. Note that **we are not testing the ability of a connectionist system to play Tic-Tac-Toe** (Tic-Tac-Toe is possible to "solve" with a traditional neural-net approach; Go almost certainly is not). Specifically, it follows from Hawkins' theories that in Tic-Tac-Toe, if the first play is to a corner, then a **single** high-level component should be responsible for "expecting" play to the opposite corner -- **no matter which corner** is initially played -- **even if** the system has never been exposed to play from that particular corner. Optimal play in Tic-Tac-Toe is very straightforward and recognizable, rotational equivalence is trivial compared to the kinds of sensory interpolation the book asserts are explainable, and yet it is not self-evident that Hawkins' model would pass such a test.
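The encoding above can be sketched in a few lines. The `encode` and `rotate` helpers below are my own illustration (not from the book): `encode` turns a sequence of plays into the flat board strings that would serve as "sensory input," and `rotate` makes explicit the rotational equivalence that a Hawkins-style model would have to discover on its own rather than be handed.

```python
# Encode a game as a sequence of 9-character board strings ('?' = indeterminate),
# and show that the four corner openings are quarter-turns of one pattern.
# Both helper names are hypothetical.

def encode(moves):
    """Turn a list of (index, symbol) plays into a sequence of board strings."""
    board = ['?'] * 9
    states = [''.join(board)]
    for index, symbol in moves:
        board[index] = symbol
        states.append(''.join(board))
    return states

def rotate(state):
    """Rotate a 3x3 board string 90 degrees clockwise."""
    # New position (row r, column c) takes the old value at (row 2 - c, column r).
    return ''.join(state[(2 - c) * 3 + r] for r in range(3) for c in range(3))

# X opens in the upper-left, O answers in the lower-right:
game = encode([(0, 'X'), (8, 'O')])
print(game)  # ['?????????', 'X????????', 'X???????O']

# The same opening from another corner is one quarter-turn away:
other_corner = encode([(2, 'X'), (6, 'O')])
print([rotate(s) for s in game] == other_corner)  # True
```

The interesting question is precisely whether the higher levels of the hierarchy would converge on something equivalent to `rotate` from exposure to raw placement sequences alone.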

The book has a companion site at OnIntelligence.org -- I just posted a comment similar to the above paragraph; we'll have to see if anything comes of it.