"Benchmarking inside a Virtual PC is wrong at every possible level" : Really?

Is this true? I was under the impression that virtualization technology had gotten to the point where this kind of work was fine. If not, I'll try things out on real hardware.

Update: Trying out the n-queens problem (which I think is probably a pretty close match) with n = 14 I get these results


Native VMWare VPC 29.187 31.656 30.183 28.984 31.375 30.023 29.000 31.344 30.083 29.046 31.406 30.153 29.062 31.156 30.073


It seems to me that the virtual machines do have a measurable overhead. On the other hand, the standard deviation is less than 1/2 of 1 percent in all cases (.2% for Native and VPC, .5% for VMWare). That makes me think that virtual machines are probably okay for this kind of benchmarking (because, despite the overhead, they seem to have stable performance). Remember, one of the rules of the contest is that there's no file io and these are console apps.

On the other other hand, the problem with my hexodoku benchmark (like most short benchmarks) is that program loading time is likely to be a very significant portion of solving order 4 Sudoku ("Hexoduko"). It would certainly make sense that virtualized overhead is greater with file access / loading and so that weighs against using a VPC.