Thread Creation Overhead Can Trip Up Pros

Michael Seuss has a good blog piece on parallelizing code that contains loop-carried dependencies, which is to say, code such as the following, where the calculation in one pass depends on the previous pass's calculation. The moral of the story, though, is that even when the loop runs to the point where the doubles start to overflow (i.e., at the upper limit of the code's capability), thread-creation overhead makes the parallel version an order of magnitude slower than the non-parallel one! (And this in code submitted by boffins on the OpenMP mailing list.)

This is a great example of why neither of the simplistic approaches to parallelization ("everything's a future" or "let the programmer decide") will ultimately prevail, and why something akin to run-time optimization (à la HotSpot) will have to be used.



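    const double up = 1.1;      /* growth factor applied on every pass */
    double Sn = 1000.0;         /* running value carried from pass to pass */
    double opt[N + 1];          /* N is assumed to be defined elsewhere */
    int n;
    for (n = 0; n <= N; ++n) {
        opt[n] = Sn;            /* reads the value produced by the previous pass */
        Sn *= up;               /* loop-carried dependency: the next pass needs this */
    }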
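One common way to break this particular dependency (not necessarily the technique used in the linked post) is to notice that opt[n] is just 1000.0 * 1.1^n, so each element can be computed from n alone and the iterations become independent. The sketch below assumes that rewrite. The function names and the PAR_THRESHOLD value are hypothetical illustrations, and the threshold is an untuned guess. It uses OpenMP's standard if clause on the parallel-for pragma, so short loops stay on one thread. That clause is a crude, static stand-in for the kind of run-time decision argued for above.

```c
/* Hypothetical cutoff: below this trip count the parallel region is assumed
   not to pay for its thread start-up cost. The value is a guess, not tuned. */
#define PAR_THRESHOLD 100000

/* Integer power by repeated squaring (e must be >= 0). Written by hand so
   the sketch does not need to link against libm. */
static double ipow(double base, int e)
{
    double result = 1.0;
    while (e > 0) {
        if (e & 1)
            result *= base;
        base *= base;
        e >>= 1;
    }
    return result;
}

/* Original form: each element depends on Sn from the previous iteration,
   so the loop cannot simply be split across threads. */
static void fill_sequential(double *opt, int len, double s0, double up)
{
    double Sn = s0;
    for (int n = 0; n < len; ++n) {
        opt[n] = Sn;
        Sn *= up;
    }
}

/* Closed form: opt[n] = s0 * up^n, so every iteration is independent.
   The OpenMP if clause keeps short loops on one thread; compiled without
   OpenMP, the pragma is ignored and the loop simply runs sequentially. */
static void fill_closed_form(double *opt, int len, double s0, double up)
{
    #pragma omp parallel for if(len > PAR_THRESHOLD)
    for (int n = 0; n < len; ++n) {
        opt[n] = s0 * ipow(up, n);
    }
}

/* Largest relative difference between two arrays (assumes a[i] != 0).
   The two fills round differently, so they agree only to within
   floating-point error, not bit for bit. */
static double max_rel_diff(const double *a, const double *b, int len)
{
    double worst = 0.0;
    for (int i = 0; i < len; ++i) {
        double d = (a[i] - b[i]) / a[i];
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

Called as fill_closed_form(opt, N + 1, 1000.0, 1.1), it fills the same array as the original loop, up to rounding. Note that each element now costs O(log n) multiplications instead of one, which is exactly the sort of extra work, on top of thread start-up, that can leave a parallelized version slower than the plain sequential loop.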