Data Volumes Trumping Core Multiplication? Interesting Thought
Bill de hÓra makes an intriguing pitch that programming will be affected more by increasing data volumes than by the transition to multi-/many-core. His evidence is anecdotal (we don't have the same metaphysical certainty that all of us will be dealing with much larger datasets as we have that we will all be dealing with multiple, and then many, cores), but the argument is logical. The speed of a single stream of in-cache instructions is blazing: short of chaotic functions, it's hard to imagine perceptibly slow scenarios that don't involve large amounts of data.
What I find especially thought-provoking about this argument is that it runs counter to another post I was going to write about YAGNI infrastructure. Not long ago, Alan Zeichick ranked databases, and Ian Griffiths questioned whether he had taken price-performance into account. Even allowing that OSS has its own costs (training, tools, administration, etc.), I've noticed that few real-world CEOs understand where their companies stand in relation to scaling. In my experience, they often over-buy software and hardware capacity and under-buy contingency capacity.
It seems to me that nowadays we work more and more with data streams rather than data sets. On a transaction-by-transaction basis, I think it's an uncommon application that uses more data than can fit into several gigabytes of RAM (obvious exception: multimedia data). Never mind multi-node MapReduce; my point is that many "real" business systems could get by with a single-node, non-relational data access layer.
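To make the "single-node, non-relational data access layer" idea concrete, here is a minimal sketch, assuming a hypothetical account-balance workload. All names (`Account`, `InMemoryStore`, `process_stream`) are illustrative inventions, not anything from a real system; persistence, durability, and concurrency control are deliberately omitted. The point is only that each transaction in the stream touches one key in an in-memory dictionary, with no joins and no query planner.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class Account:
    account_id: str
    balance: int = 0  # stored in cents to avoid float rounding


@dataclass
class InMemoryStore:
    """Hypothetical single-node, non-relational store: all state lives in one dict."""
    accounts: Dict[str, Account] = field(default_factory=dict)

    def get(self, account_id: str) -> Optional[Account]:
        # Key lookup only; returns None for unknown accounts.
        return self.accounts.get(account_id)

    def apply(self, account_id: str, amount: int) -> Account:
        # Create the record the first time the key is seen, then update it in place.
        acct = self.accounts.setdefault(account_id, Account(account_id))
        acct.balance += amount
        return acct


def process_stream(store: InMemoryStore, txns: Iterable[Tuple[str, int]]) -> InMemoryStore:
    # Consume the transaction stream one item at a time; each item touches a single key.
    for account_id, amount in txns:
        store.apply(account_id, amount)
    return store


store = process_stream(InMemoryStore(), [("alice", 500), ("bob", 200), ("alice", -150)])
print(store.get("alice").balance)  # 350
```

A real system would still need something like a write-ahead log or snapshots to survive restarts; the sketch only shows how little machinery the per-transaction data access itself requires when the working set fits in RAM.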
What I'm saying runs directly counter to what de hÓra is describing, and yet it points to the same heresy: "maybe we ought not to start from the assumption of a relational DB." No conclusion, just food for thought...
Reflection: I think I let my attention wander. The world de hÓra is describing is that of high-performance computing, and I wandered into general business computing. The two intersect, of course, but are generally not the same. So the thought becomes that powerful relational databases are being squeezed from both the low end ("eh, just put it in memory") and the high end ("ok, so this is our distributed tuple-space...").