Thoughts on Hawkins’ “A Thousand Brains”

Jeffrey Hawkins is the co-founder of Numenta, a company that has been pursuing machine intelligence since the early 2000s. Prior to that he founded Palm Computing, maker of the PalmPilot, the most successful Personal Digital Assistant, the sale of which presumably has funded Numenta this past quarter-century.

Hawkins’ approach is iconoclastic. He has no interest in the “biologically inspired” artificial neural networks (ANNs) that the rest of the industry had largely abandoned by the early 2000s and returned to in the teens. Hawkins does not want to be “inspired” by the brain’s components; he wants to understand the brain’s processes and, presumably, recreate them.

Hawkins’ focus is the neocortex, the wrinkly folded surface of the brain, which is only about the size of a dinner napkin and a mere 2.5mm thick. I know nothing about brain structure, so I’ll just take every factual thing offered by Hawkins as true:

Under one square millimeter of neocortex (about 2.5 cubic millimeters), there are roughly one hundred thousand neurons, five hundred million connections between neurons (called synapses), and several kilometers of axons and dendrites…. There are roughly 150,000 cortical columns stacked side by side in a human neocortex.

The wild thing about those numbers is that, by the standards of today’s Machine Learning (ML) models, those are quite manageable sizes! It wouldn’t be trivial to gather enough computers to simulate 150,000 cortical columns, but simulating a few seems doable just from a back-of-the-envelope “How much RAM would the computer need?” perspective.1
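As a toy illustration of that back-of-the-envelope math: if Hawkins’ per-square-millimeter numbers describe roughly one column, and we invent storage costs (a float32 weight plus an int32 target index per synapse — my assumptions, not his), the arithmetic is straightforward:

```python
# Back-of-the-envelope RAM for one cortical column, using Hawkins' numbers.
# Per-synapse and per-neuron storage costs are invented for illustration.

NEURONS_PER_COLUMN = 100_000        # per ~1 mm^2 of neocortex, per Hawkins
SYNAPSES_PER_COLUMN = 500_000_000
BYTES_PER_SYNAPSE = 4 + 4           # one float32 weight + one int32 target index
BYTES_PER_NEURON = 64               # membrane state and bookkeeping (a guess)

column_bytes = (SYNAPSES_PER_COLUMN * BYTES_PER_SYNAPSE
                + NEURONS_PER_COLUMN * BYTES_PER_NEURON)
print(f"one column:      ~{column_bytes / 1e9:.1f} GB")             # ~4.0 GB
print(f"150,000 columns: ~{column_bytes * 150_000 / 1e12:.0f} TB")  # ~600 TB
```

A few gigabytes per column is server-RAM territory; it’s the full 150,000 that gets expensive.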

Hawkins has written two general-audience books on his work. In his 2006 book, On Intelligence, which I wrote about back in the day, he focused on the ability of neocortical columns to recognize patterns. In 2021’s A Thousand Brains, he claims to have cracked the nut of how the neocortex works. The first element is:

  1. The big insight I had was that dendrite spikes are predictions. A dendrite spike occurs when a set of synapses close to each other on a distal dendrite get input at the same time, and it means that the neuron has recognized a pattern of activity in some other neurons. When the pattern of activity is detected, it creates a dendrite spike, which raises the voltage at the cell body, putting the cell into what we call a predictive state. The neuron is then primed to spike.

Which neurons are spiking determines the predictions of that neocortical column. Our memory state is stored in the synapses between neurons.
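Here’s a minimal sketch of how I read that mechanism — my own toy rendering, not Numenta’s actual HTM code: a neuron enters a “predictive state” when enough nearby synapses on any one distal dendrite segment see activity at the same time.

```python
# Toy model of "dendrite spikes are predictions" (my reading, not Numenta's code).
from typing import List, Set

SPIKE_THRESHOLD = 8  # coincident active synapses needed for a dendrite spike (made up)

class Neuron:
    def __init__(self, distal_segments: List[Set[int]]):
        # Each distal segment holds the ids of the presynaptic neurons it watches.
        self.distal_segments = distal_segments
        self.predictive = False

    def update_prediction(self, active_neurons: Set[int]) -> None:
        # A dendrite spike on any one segment raises the cell body's voltage,
        # putting the cell into a predictive state: primed to fire.
        self.predictive = any(
            len(segment & active_neurons) >= SPIKE_THRESHOLD
            for segment in self.distal_segments
        )

n = Neuron(distal_segments=[{1, 2, 3, 4, 5, 6, 7, 8, 9}])
n.update_prediction(active_neurons=set(range(20)))
print(n.predictive)  # True: the pattern was recognized, the neuron is primed
```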

  2. If any input doesn’t match the brain’s prediction—perhaps my spouse fixed the igniter—then my attention is drawn to the area of mis-prediction. This alerts the neocortex that its model of that part of the world needs to be updated.

This is not that different, it seems to me, from the gradient descent used to train ANNs, which nudges each weight based on the “steepness” of the error with respect to that weight. Hawkins introduces the subjective experience of “attention,” which is intriguing but seems to me speculative.2
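For contrast, here is the gradient-descent step in its simplest form — a single weight moving in proportion to the steepness of the error:

```python
# One-weight gradient descent: the update is proportional to the "steepness"
# (the derivative) of the loss with respect to the weight.

def loss(w: float, x: float, target: float) -> float:
    return (w * x - target) ** 2

def gradient(w: float, x: float, target: float) -> float:
    return 2 * (w * x - target) * x  # d(loss)/dw

w, x, target, lr = 0.0, 1.5, 3.0, 0.1
for _ in range(20):
    w -= lr * gradient(w, x, target)  # mis-prediction drives the update
print(round(w, 3))  # ~2.0, i.e. target / x
```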

And then comes the doozy:

  3. Reference frames were the missing ingredient, the key to unraveling the mystery of the neocortex and to understanding intelligence…. The hypothesis I explore in this chapter is that the brain arranges all knowledge using reference frames, and that thinking is a form of moving.

When Hawkins speaks of reference frames, he’s talking about no more and no less than the construct used in 3D programming:

[Figure: a 3D reference frame]
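In code, that construct is just an origin plus a set of basis vectors, which lets you express the same point in different coordinate systems — a minimal numpy sketch (my illustration, not anything from the book):

```python
import numpy as np

class ReferenceFrame:
    def __init__(self, origin, axes):
        self.origin = np.asarray(origin, dtype=float)  # frame origin, world coords
        self.axes = np.asarray(axes, dtype=float)      # rows = the frame's basis vectors

    def to_local(self, world_point):
        # Express a world-space point relative to this frame.
        return self.axes @ (np.asarray(world_point) - self.origin)

    def to_world(self, local_point):
        # Inverse: for orthonormal axes, the transpose undoes the rotation.
        return self.axes.T @ np.asarray(local_point) + self.origin

# A frame shifted to (1, 0, 0) and rotated 90 degrees about the z axis:
frame = ReferenceFrame(origin=[1, 0, 0],
                       axes=[[0, 1, 0], [-1, 0, 0], [0, 0, 1]])
print(frame.to_local([1, 2, 0]))  # [2. 0. 0.]
```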

Well, maybe a little more, as he allows that “…it is possible that the reference frames that are most useful for certain concepts have more than three dimensions.” This is a big caveat that I’ll return to in a bit.

There are two things that jump out to me about Hawkins’ model:

  1. Generating reference frames is far from an easy problem, and
  2. If thinking is movement, then that should be testable. On a chess board, “Is move e3-e8 a good move?” should take more time to answer than “Is move e3-c3 a good move?” because there’s more movement involved. Right? (Yes, I know there are no kings, so it’s not a real position.) A rough sketch of this prediction follows after the list.
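Here is that prediction as a sketch, assuming — purely for illustration — that “mental movement” tracks the number of squares a piece traverses in the board’s 2D reference frame:

```python
# Toy "mental distance" of a chess move in the board's 2D reference frame.
# The (untested!) prediction: response time scales with squares traversed.

def square_to_xy(square: str) -> tuple[int, int]:
    return ord(square[0]) - ord("a"), int(square[1]) - 1

def traversal_distance(move: str) -> int:
    start, end = move.split("-")
    (x1, y1), (x2, y2) = square_to_xy(start), square_to_xy(end)
    return max(abs(x2 - x1), abs(y2 - y1))  # squares a sliding piece passes

print(traversal_distance("e3-c3"))  # 2 squares of "movement"
print(traversal_distance("e3-e8"))  # 5 squares -- more time to think about?
```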

Well, maybe not, if there are extra dimensions involved. If you add a new dimension, you can get from here to there without going through the intermediate locales in the old dimensions. (Refer to any science fiction movie where they try to explain “hyperjumps” and they fold up a piece of paper and stick a pin through it.)
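The folded-paper trick is easy to make concrete: two points 100 units apart along a 1D strip become near neighbors once the strip is bent through a second dimension. (Entirely my toy construction:)

```python
import numpy as np

# A 1D "paper strip": to travel from position 0 to position 100 within the
# strip, you pass through every intermediate position.
flat_distance = 100.0

def fold(t: float) -> np.ndarray:
    # Bend the strip into a hairpin in 2D: the two ends land next to each other.
    return np.array([min(t, 100.0 - t), 0.0 if t <= 50.0 else 1.0])

a, b = fold(0.0), fold(100.0)
print(flat_distance)          # 100.0 along the strip
print(np.linalg.norm(a - b))  # 1.0 through the extra dimension: a "hyperjump"
```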

So maybe when you know a lot about chess you don’t use a 2D or 3D reference frame, but one with a bunch of dimensions that allow you to quickly “hyperjump” between thoughts.

But if so, that raises several questions:

  • Is it easy to add dimensions to reference frames? Does it happen all the time or is it rare?
  • How many dimensions do we have for abstract concepts like “democracy,” or “politeness”? Do they have the same number of dimensions?
  • Are conceptual reference frames Euclidean, with orthogonal axes? What defines orthogonality in the reference frame for "democracy"? How are these geometries built?
  • Would it be accurate to say that word embeddings are also using “reference frames”? Are the referents of the high-dimensional reference frames we use for “democracy” and the like more accessible than those of Word2Vec? Are they polysemantic? (A toy embedding sketch follows after this list.)

And most importantly:

  • When you’re talking about arbitrary-dimensional reference frames, does the claim “thinking is a form of moving” clarify things? Because movement in arbitrary-dimensional space is not very constrained: it is harder to create the “you can’t get there from here” or “to get there you have to pass through this intermediate area” situations that intuitively crop up when the claim is made about 2D or 3D reference frames.
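For what it’s worth, the word-embedding version of “thinking is movement” is easy to sketch: concepts are points in a high-dimensional space, relations are directions, and moving along a direction takes you from one concept to another. The vectors below are invented for illustration — real Word2Vec embeddings are learned from text and have hundreds of dimensions:

```python
import numpy as np

# Toy 4-dimensional "embeddings", hand-made for illustration only.
emb = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.1, 0.8, 0.0]),
    "man":   np.array([0.1, 0.9, 0.1, 0.2]),
    "woman": np.array([0.1, 0.2, 0.9, 0.2]),
    "apple": np.array([0.0, 0.1, 0.1, 0.9]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def nearest(vec, exclude=()):
    return max((w for w in emb if w not in exclude),
               key=lambda w: cosine(emb[w], vec))

# "Movement" along the man -> woman direction, applied to king:
target = emb["king"] - emb["man"] + emb["woman"]
print(nearest(target, exclude={"king", "man", "woman"}))  # queen
```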

I’m just a dumb programmer, and while I know it’s too much to hope for a general-audience book to provide example code, or even a mathematical equation or two, the lack of any explanatory diagram beyond a cutaway schematic of the cells in a neocortical column feels unsatisfying. The chasm between “this is definitely how the neocortex is structured” and “this is how thinking works” seems unbridged, at least to me.

Outside of his books, there doesn’t seem to be much code-based exploration of Hawkins’ ideas. There’s exactly one GitHub repository (and accompanying paper) that tries to convert his concepts into code, and it seems pretty interesting, although hardly a breakthrough. What goes on inside Numenta remains a mystery: I’ve asked a few times over the past twenty years in various discussion groups, to little response. Numenta was in the news last year for accelerating inference in Large Language Models, which seem like perfect exemplars of the “biologically inspired but not really” approach that Hawkins disdains.

1: Well, maybe not. It depends on the complexity of the operations done within the cortical columns. For the “biologically inspired” “deep learning” systems we have grown used to, these operations tend to actually be fairly simple and massively parallel. Perhaps that won’t hold true for the types of systems Hawkins envisions.

2: Hawkins’ “attention” is the common-sense meaning of the term, not the “degree to which one element should influence another” meaning that “attention” carries in modern ML.