I used to be on top of Artificial Intelligence -- I wrote a column for AI Expert, the leading trade magazine in the AI field at the time, and ultimately became its Editor-in-Chief. I've tried to stay familiar with the field, if not professionally competent in it. That has been rather difficult because the AI field has largely put aside grand theories and adopted two pragmatic themes: statistical techniques and mixed approaches.
Statistical techniques rely on large bodies of data that allow you to guess, for instance, that "push comes to"->"shove" not from any understanding of metaphor or causation but because the word "push" followed by "comes" followed by "to" is followed, say, 87.3% of the time by the word "shove". Statistical techniques excel at extracting patterns from large input sets.
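To make the "push comes to" example concrete, here is a minimal sketch of that kind of frequency-based guessing. The corpus, the function names `train` and `predict`, and the resulting percentage are all invented for illustration; this is not a description of how Watson itself works.

```python
from collections import Counter, defaultdict

def train(corpus, n=3):
    """Count which word follows each run of n consecutive words."""
    follow = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for i in range(len(words) - n):
            context = tuple(words[i:i + n])
            follow[context][words[i + n]] += 1
    return follow

def predict(follow, context):
    """Return (most frequent next word, its relative frequency), or None."""
    counts = follow.get(tuple(w.lower() for w in context))
    if not counts:
        return None
    word, hits = counts.most_common(1)[0]
    return word, hits / sum(counts.values())

# Toy corpus, made up for this example only.
corpus = [
    "when push comes to shove we will manage",
    "if push comes to shove call me",
    "push comes to shove eventually",
    "when push comes to nothing we relax",
]
model = train(corpus)
print(predict(model, ["push", "comes", "to"]))  # ('shove', 0.75)
```

Nothing in the code knows what a push or a shove is. It only counts what followed what, and it has no answer at all for a context it has never seen.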
Mixed approaches use different strategies to tackle different aspects or stages of a problem. Imagine a blackboard around which people raise their hands, come forward, add or erase a small bit of information, and step back into the crowd. For instance, one (relatively) simple tool might know that "X comes to Y" implies temporal ordering. Another might say that temporal ordering implies escalation. And another might say "A 'Shove' is an escalation of a 'Push'".
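The blackboard image can be sketched in a few lines. This is a toy under stated assumptions: the fact tuples, the three source functions, and the `run_blackboard` loop are hypothetical names chosen for this example, and real systems of this kind are far more elaborate.

```python
UNKNOWN = "?"

def temporal_source(board):
    """'X comes to Y' implies that X precedes Y."""
    return {("precedes", x, y) for (kind, x, y) in board if kind == "comes_to"}

def escalation_source(board):
    """Temporal ordering implies escalation."""
    return {("escalates_to", x, y) for (kind, x, y) in board if kind == "precedes"}

def lexicon_source(board):
    """A 'shove' is an escalation of a 'push'."""
    escalations = {"push": "shove"}
    return {
        ("answer", x, escalations[x])
        for (kind, x, y) in board
        if kind == "escalates_to" and y == UNKNOWN and x in escalations
    }

def run_blackboard(board, sources):
    """Let every source contribute until a full round adds nothing new."""
    board = set(board)
    changed = True
    while changed:
        changed = False
        for source in sources:
            new_facts = source(board) - board
            if new_facts:
                board |= new_facts
                changed = True
    return board

# The sources are listed "out of order" on purpose: each one simply
# steps forward whenever the board holds something it can build on.
board = run_blackboard(
    {("comes_to", "push", UNKNOWN)},
    [lexicon_source, escalation_source, temporal_source],
)
answers = [value for (kind, _, value) in board if kind == "answer"]
print(answers)  # ['shove']
```

Each source is small and knows only one thing. The answer emerges from the chain of facts they leave on the shared board, not from any single one of them.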
The more I read about Watson, the more it seems that while Watson used mixed approaches, what it's mixing are almost all statistical techniques. So while it would undoubtedly be able to answer that "shove" is what "push often comes to..." I think it would do so without any reasoning, or schema, about temporal ordering or escalation.
The problem with statistical techniques is that they do not generalize.
If a child were shown how to win tic-tac-toe by always starting with an 'X' in the upper-left box, and we then asked them whether they could always win by starting in another corner, we would be disappointed if they couldn't figure it out. Maybe not at first, but if tic-tac-toe were something they enjoyed, they would eventually recognize the pattern. If they never achieved that recognition, it would be troubling.
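One way to state the pattern we expect the child to recognize is as a symmetry of the board: rotate the grid a quarter turn and the upper-left corner becomes another corner. The sketch below is only an illustration of that idea, with hypothetical helper names `rotate` and `equivalent_openings`; it makes no claim about how a child actually reasons.

```python
def rotate(cell):
    """Rotate a (row, col) cell of a 3x3 board 90 degrees clockwise."""
    row, col = cell
    return (col, 2 - row)

def equivalent_openings(cell):
    """All cells reachable from `cell` by quarter turns of the board."""
    cells = set()
    for _ in range(4):
        cells.add(cell)
        cell = rotate(cell)
    return cells

UPPER_LEFT = (0, 0)
print(sorted(equivalent_openings(UPPER_LEFT)))
# [(0, 0), (0, 2), (2, 0), (2, 2)]
```

A strategy that starts in the upper-left corner carries over to the other three corners by applying the same rotation to every move. A table of memorized upper-left games contains none of that knowledge unless someone supplies it.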
Pattern recognition, not pattern extraction, seems to be "how" we work. If pattern extraction were at the core, we wouldn't be troubled by sharks when entering the ocean and we wouldn't spend money on lottery tickets, since the actual frequencies of shark attacks and jackpots are vanishingly small.
So it seems that Watson uses a fundamentally different "how" in its achievement. Yet the capability of rapidly and accurately answering questions (ones that have been intentionally obfuscated!) is clearly epochal. Watson plainly has a role in medicine (diagnostics), law and regulatory compliance (is there precedent? is this a restricted behavior?), and intelligence (where's the next revolution likely?). The problems of "Big Data" are very much on the minds of the software development community, and Watson is a stunning leap forward in combining big data, processing power, and specialized algorithms.