Raymond Chen Says Backward Compatibility Does Not Affect Windows Performance

For weeks, I've been chewing over <a href="http://blogs.msdn.com/oldnewthing/archive/2007/07/23/4003873.aspx" target="_blank" rel="noopener noreferrer">this post by Raymond Chen</a>, in which he says:

[T]he real cost of compatibility is not in the hacks. The hacks are small potatoes. Most hacks are just a few lines of code (sometimes as few as zero), so the impact on performance is fairly low.

The idea of a non-backward-compatible version of Windows is something that I've mused about (as has Alan Zeichick). I'm not going to pick an argument with Chen, of course, but I wonder if he's not being a little disingenuous. Even a few lines of code in a core routine can have an effect if they affect cache behavior; okay, that's niggling... But still, he says that a non-compatible version wouldn't be much faster and then goes on to say:

[T]he real cost of compatibility is in the design.

If you're going to design a feature that enhances the window manager in some way, you have to think about how existing programs are going to react to your feature. These are programs that predate your feature and naturally know nothing about it. Does your feature alter the message order? Does it introduce a new point of re-entrancy? Does it cause a function to begin dispatching messages that previously did not? You may be forced to design your feature differently in order to accommodate these concerns. These issues aren't things you can "take out"; they are inherently part of the feature design.

Well, yeah. But isn't that kind of like saying "the real cost of compatibility is not how fast you can type in the code, it's in the work"?

Surely (well, not surely, but surely "likely") a version of Windows where backward compatibility was negotiable would have more flexibility for the type of redesign / refactoring that Windows will need for the manycore era? If nothing else, ~~surely~~ intuitively one would think that the very concept of the Windows message loop (much less message ordering) would become highly problematic when trying to figure out how to exploit many cores.