Bad Comparison: 14 Line Python RegEx evaluator vs. Microsoft's 14K lines

Wesner Moise points to "generalized regular expression matching" as a moderately hard problem that might serve as the basis for comparing programming languages and approaches. He says "Microsoft's implementation of regular expression matching over strings is spread across 24 files and 14,455 lines of code including comments and whitespace." (I'm not sure how he'd know that -- I assume he's talking about Rotor source code.)

He wonders if a functional approach could be more efficient and points to a 14-line Python program.

No, no, no: they are two incredibly different capabilities.

The Python program implements something like the original definition of a regular expression --restricted to that which can be expressed in a single line of Extended Backus-Naur Form without recursion. "Regular expression support" for today's languages means something very, very different, starting with compatibility with Perl5 and going from there. Backslashes, named groups, etc. are complex features that require, in any language, something in excess of 14 lines.

Having said that, regular expressions are a good fit for functional approaches. But, just to point out the lack of silver bullets, attempting to parse a left-recursive grammar (a grammar with a production of the form A->AB) will hang a simplistic recursive-descent parser.