Worse Than Crashing? Easy: Mistaking "MAY" for "SHALL" Near Money

Tue 07 August 2007

Jeff Atwood wonders "what's worse than crashing?" and gives a general answer: "application causes data loss and/or corruption." Uh huh. Let's fill in between the lines on that:

I once worked on a supply chain system where one of the returns of the supplier's Cancel() function was along the lines of CANCELED_PENALTIES_APPLY. Our cancellation logic ran along these lines:

    begin cancellation_transaction
    cancellation_results = Cancel()
    if cancellation_results == CANCELED_PENALTIES_APPLY
        cancel_cancellation = ... other business logic, maybe involve a human in the decision ...
        if cancel_cancellation
            rollback cancellation_transaction


As with many digitally-mediated marketplaces, once the switch was thrown on this thing, the amount of money flowing through it was substantial. About six weeks after deploying, the s*** hit the fan. Cancellation penalties amounting to tens of thousands of dollars had accrued. We had ass-u-med that because the call had that "Penalties apply" return value, we'd, you know, get it whenever penalties actually applied (we did during testing). "Oh no, that may or may not be returned. You always have to check the [free-form text] cancel penalties," the supplier told us (during the s***-storm) without the slightest acknowledgement of guilt. ("But they're free-form text," you might note, "How is one supposed to check that?" The answer: there was no 100% way to automate it.)
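In hindsight, the defensive reading of that contract is that the status code is sufficient but not necessary evidence of a penalty. Here is a minimal Python sketch of that idea, not what we actually shipped. Every name in it (CancelResult, penalty_text, decide, the list of "no penalty" phrases) is made up for illustration, and it assumes that empty penalty text really does mean no penalty, which you would want the supplier to confirm in writing.

```python
from dataclasses import dataclass
from enum import Enum, auto


class CancelStatus(Enum):
    CANCELED = auto()
    CANCELED_PENALTIES_APPLY = auto()


class Decision(Enum):
    COMMIT = auto()        # no sign of a penalty anywhere
    NEEDS_REVIEW = auto()  # penalty flagged, or cannot be ruled out


@dataclass
class CancelResult:
    status: CancelStatus
    penalty_text: str = ""  # free-form text from the supplier (hypothetical field)


# Hypothetical phrases we are willing to read as "definitely no penalty".
# Anything else in the free-form text is treated as a possible penalty.
NO_PENALTY_PHRASES = {"", "none", "no penalty", "no penalties"}


def decide(result: CancelResult) -> Decision:
    """Treat the status code as a MAY: trust it when set, never when absent."""
    if result.status is CancelStatus.CANCELED_PENALTIES_APPLY:
        return Decision.NEEDS_REVIEW
    text = result.penalty_text.strip().lower()
    if text in NO_PENALTY_PHRASES:
        return Decision.COMMIT
    # We cannot parse free-form text reliably, so fail toward a human.
    return Decision.NEEDS_REVIEW
```

The point of the design is the last line: since there is no 100% way to automate reading the text, anything unrecognized goes to a person instead of being silently committed.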

Tens of thousands of dollars' worth of damage, and if it weren't for dumb luck we wouldn't have caught it until a quarterly audit months later. I didn't participate in that particular coding task, but even if I had, I really doubt that I would have flushed out the "MAY" rather than "SHALL" that caused all the trouble. Such SNAFUs are, far more than any mythical belief in purity, why smart people still occasionally spend time trying to "nail down" requirements.

More recently, I worked on a system that had an off-by-1 error when applying taxes. Only in acceptance testing did we flush out a sequence, not terribly uncommon in the real world, that triggered the bug. That would have been really nasty, precisely because the pain would have been spread out among clients, and heaven knows how it would have been caught and resolved.

Money. It's the root of all evil, you know.