::: {style="font-size: 10pt;"}
Faithful readers know that I'm learning Scala, somewhat reluctantly. A few weeks ago, I was reading New Scientist magazine and saw this writeup of Peter Norvig, Director of Research at Google, which mentioned that he was on the review board that studied the crash of the Mars Climate Orbiter. That sparked a question in me, because I'd recently watched Norvig's talk "What To Demand From A Scientific Computing Language" which says "don't spend too much time fretting about language choice," but further goes on to argue that Python is a heckuva good scientific computing language. So I asked him:
The Mars Climate Orbiter disaster would seem to be a case study for a type system with more compile-time strictness. Obviously, an SI unit system is not part of Scala's standard library, but I find it interesting that you have not cast all aside and picked up the banner of Haskell or what-have-you.
He replied:
Beyond the initial error, the reasons why the error proved fatal were more around organizational structure than around language choice:
(1) An anomaly was detected early on, but was not entered into an official issue-tracking database. Better practices would force all such things to be tracked.
(2) The team was separated between JPL in California and Lockheed-Martin in Colorado, so there were no lunch-time discussions about "hey, did you get that anomaly straightened out? No? Well, let's look into it more carefully..."
(3) The faulty code was not carefully code-reviewed, because of improper code re-use. On the previous mission, this file was just a log file, not used during flight operations, and so was not subject to careful scrutiny. In MCO, the file and surrounding code were re-used, but then at some point they promoted it to be a part of actual navigation, and unfortunately nobody went back and subjected the relevant code to careful review.
(4) Bad onboarding process for new engineers: The faulty code was written by a new engineer -- first week (or maybe first month or so) on the job. This was deemed ok because originally it was "just a log file", not mission-critical.
:::