Originally Posted by LaurV
The common wisdom is that new processors will give you better output if you run only a few (1-2) workers, each with more than one thread, such that the total number of threads is not higher than the number of your physical cores. It didn't use to be like that, but with CPUs getting lots of cores, the limiting factor became memory bandwidth: 16 workers would need to exchange data for 16 tests. For example, on my system (10 cores, 20 threads, lots of cache memory) I get the best output with 2 workers, each running 5 threads. Hyper-threading is not useful for most work types; it only produces more heat, not more output.

I see. Memory throughput being the bottleneck sounds quite plausible. I'll run the benchmark, thanks.
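
To decide which settings to benchmark, here is a rough sketch that lists worker/thread combinations obeying the rule above (workers times threads per worker must not exceed the physical core count). The function name `candidate_configs` and the `max_workers` cap are my own assumptions for illustration, not anything from the program itself, and the benchmark still has to say which combination actually wins.

```python
def candidate_configs(physical_cores, max_workers=2):
    """List (workers, threads_per_worker) pairs to benchmark.

    Hypothetical helper: for each worker count up to max_workers, give each
    worker as many threads as fit without exceeding the physical core count.
    """
    if physical_cores < 1 or max_workers < 1:
        raise ValueError("physical_cores and max_workers must be at least 1")

    configs = []
    for workers in range(1, min(max_workers, physical_cores) + 1):
        # Integer division keeps workers * threads <= physical_cores.
        threads = physical_cores // workers
        configs.append((workers, threads))
    return configs


if __name__ == "__main__":
    # 10 physical cores, as in the quoted example; hyper-threads are ignored.
    for workers, threads in candidate_configs(10):
        total = workers * threads
        print(f"{workers} worker(s) x {threads} thread(s) = {total} threads")
```

With 10 physical cores this includes the 2 workers x 5 threads setup from the quote, alongside 1 worker x 10 threads for comparison.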
