I've been writing a series of articles for DevX on concurrent programming. The final installment was supposed to be "Multicore for multimedia." Plan A was to speed up the MAD (MPEG Audio Decoder) library using OpenMP. That went well enough, except that the code was so clean that I couldn't get any impressive wins without committing to substantial design changes (like concentrating the processing phases of a single audio channel on a single core). So Plan B was to write a video filter. The problem there is that Adobe Premiere Pro already uses multiple threads to perform the callbacks to the video filter, distributing one frame to one core, the next frame to the next. That's pretty much optimal. With every core already busy, adding my own threads on top actually decreased performance.
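To make the Plan B problem concrete, here is a minimal sketch of a per-frame filter callback in C with OpenMP. It is not my actual filter and not the Premiere Pro plug-in API: the function `invert_frame`, its parameters, and the `host_is_parallel` flag are all hypothetical names made up for illustration. The idea it shows is that when the host already hands one frame to each core, the loop inside the callback is probably best left serial, which OpenMP's `if` clause lets you express.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-frame filter callback: inverts every byte of one frame.
 * host_is_parallel is an assumed flag for this sketch, not part of any
 * real plug-in API. Pass nonzero when the host already runs one callback
 * per core. */
void invert_frame(uint8_t *pixels, int width, int height, int host_is_parallel)
{
    /* The if() clause keeps this loop serial when the host is already
     * parallel across frames; threads created here would only compete
     * with the host's threads for the same cores. */
    #pragma omp parallel for if(!host_is_parallel)
    for (int y = 0; y < height; y++) {
        uint8_t *row = pixels + (size_t)y * (size_t)width;
        for (int x = 0; x < width; x++) {
            row[x] = (uint8_t)(255 - row[x]);
        }
    }
}
```

Compiled without OpenMP support, the pragma is ignored and the loop simply runs serially, so the result is the same either way; only the timing differs.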
So now I've got to figure out a Plan C. Oh, and I recommend both MAD and Adobe Premiere Pro: they're well-engineered pieces of code.