Implementing Dataflow with Threads

Here's a scholarly but clearly written article on a generalized algorithm for implementing dataflow using threads and shared memory. Dataflow is a very intuitive model of computation (think "spreadsheet"): each value is computed from its inputs as soon as those inputs are available. When I saw this paper, I thought, "Wasn't it already known that dataflow could be parallelized automatically?" Maybe not. Either way, it is now. The authors even show how their algorithm can be tweaked to improve cache coherency. Nice.
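To make the "spreadsheet on threads" idea concrete, here is a minimal Python sketch. It is my own illustration, not the authors' algorithm. It assumes the graph is acyclic, and the function name `run_dataflow` and the graph-as-dict layout are hypothetical. Each node keeps a counter of inputs it is still waiting on. When a node finishes, it stores its result in shared memory, decrements its dependents' counters, and hands any dependent whose counter reaches zero to a `ThreadPoolExecutor`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_dataflow(graph, max_workers=4):
    """Evaluate an acyclic dataflow graph on a thread pool (hypothetical helper).

    graph maps a node name to (function, [names of input nodes]).
    A node fires as soon as all of its inputs have values, so
    independent nodes may be scheduled concurrently.
    """
    if not graph:
        return {}
    # Number of inputs each node is still waiting on.
    remaining = {name: len(inputs) for name, (_, inputs) in graph.items()}
    # Reverse edges: which nodes consume each node's output.
    dependents = {name: [] for name in graph}
    for name, (_, inputs) in graph.items():
        for src in inputs:
            dependents[src].append(name)

    sources = [name for name, count in remaining.items() if count == 0]
    if not sources:
        raise ValueError("graph has no source nodes (is it cyclic?)")

    results = {}                 # shared memory: node name -> computed value
    errors = []
    lock = threading.Lock()      # guards results, remaining, pending, errors
    finished = threading.Event()
    pending = len(graph)         # nodes that have not completed yet

    def fire(pool, name):
        nonlocal pending
        func, inputs = graph[name]
        # Every input was stored before this node was scheduled.
        with lock:
            args = [results[src] for src in inputs]
        try:
            value = func(*args)
        except Exception as exc:
            with lock:
                errors.append(exc)
            finished.set()
            return
        ready = []
        with lock:
            results[name] = value
            pending -= 1
            for dep in dependents[name]:
                remaining[dep] -= 1
                if remaining[dep] == 0:
                    ready.append(dep)
            if pending == 0:
                finished.set()
        # Schedule newly enabled nodes outside the lock.
        for dep in ready:
            pool.submit(fire, pool, dep)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for name in sources:
            pool.submit(fire, pool, name)
        finished.wait()
    if errors:
        raise errors[0]
    return results


# Spreadsheet-style example: "sum" and "prod" both depend only on "a" and "b",
# so they can be scheduled at the same time once those cells are ready.
graph = {
    "a": (lambda: 2, []),
    "b": (lambda: 3, []),
    "sum": (lambda a, b: a + b, ["a", "b"]),
    "prod": (lambda a, b: a * b, ["a", "b"]),
    "total": (lambda s, p: s + p, ["sum", "prod"]),
}
print(run_dataflow(graph)["total"])  # prints 11
```

The dependency counters are what make the parallelism automatic: nobody has to say which cells may run together, because any node whose inputs are all present is free to fire. One caveat: with CPython's global interpreter lock, pure-Python node functions won't actually execute in parallel, so this shows the scheduling structure rather than a real speedup.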