Old 2020-11-15, 18:53   #3
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest


Quote:
Timings for 2048K FFT length (6 cores, 6 workers): 18.79, 19.54, 18.55, 18.68, 18.50, 18.08 ms. Throughput: 321.19 iter/sec.
Six workers, six corresponding average times per iteration at the stated fft length. Each worker's rate in iterations/sec is 1000 ms/sec divided by its average iteration time in ms. The total throughput is the sum of those six rates. The same arithmetic generalizes from six workers to N.
1000/18.79 = 53.22 iter/sec
1000/19.54 = 51.18
1000/18.55 = 53.91
1000/18.68 = 53.53
1000/18.50 = 54.05
1000/18.08 = 55.31
Sum of six = 321.20 iter/sec (the 0.01 iter/sec difference from the reported 321.19 is probably due to the timings being rounded to two decimal places)
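The same arithmetic as a short Python sketch, for anyone who wants to paste in their own benchmark line. The function name throughput_iter_per_sec is just an illustrative choice, not anything from prime95; the timings are the six from the quoted benchmark.

```python
def throughput_iter_per_sec(times_ms):
    """Total throughput: sum over workers of 1000 ms/sec divided by ms per iteration."""
    return sum(1000.0 / t for t in times_ms)

# Average ms/iteration for each of the six workers, from the quoted benchmark line
times_ms = [18.79, 19.54, 18.55, 18.68, 18.50, 18.08]

for t in times_ms:
    print(f"1000/{t:.2f} = {1000.0 / t:.2f} iter/sec")

print(f"Sum of {len(times_ms)} = {throughput_iter_per_sec(times_ms):.2f} iter/sec")
```

It works for any number of workers N; just change the list.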

Work required for an iteration is roughly proportional to exponent * log(exponent) * log(log(exponent)), and fft length is a nearly linear function of exponent, while the processor's rate of work is fairly constant. So iteration time rises with exponent, slightly faster than linearly. See for example the last two attachments of https://www.mersenneforum.org/showpo...19&postcount=5, right columns; the rate is constant within +-20% over 2M-64M fft length. (Numerous processor types have been exhaustively benchmarked and posted in that thread.) Large multiprecision multiplication scales this way for some rather fundamental reasons; see Donald Knuth, Seminumerical Algorithms, or https://www.mersenneforum.org/showpo...21&postcount=7
If it still doesn't make sense that iteration time depends on fft length or exponent, time yourself squaring a one-digit decimal number, then a 4-digit number, then a 10-digit number.
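A rough sketch of that scaling estimate in Python. Assumptions: natural logs (the base is not stated above and only shifts the result slightly), the constant factor is dropped so only ratios mean anything, and the function names and the two example exponents are made up for illustration.

```python
import math

def relative_work(exponent):
    """Rough per-iteration work, up to an unknown constant factor.

    Uses exponent * log(exponent) * log(log(exponent)) with natural logs (an assumption).
    """
    log_p = math.log(exponent)
    return exponent * log_p * math.log(log_p)

def work_ratio(p_small, p_large):
    """How many times more work one iteration costs at p_large than at p_small."""
    return relative_work(p_large) / relative_work(p_small)

# Illustrative exponents only; substitute the ones you care about
print(f"Doubling the exponent multiplies per-iteration work by about "
      f"{work_ratio(50_000_000, 100_000_000):.2f}")
```

Doubling the exponent costs a bit more than double the work per iteration, which is why iteration times climb with fft length even though the processor's rate of work stays about the same.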
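If doing it by hand is unappealing, here is a minimal sketch of the same experiment on a computer, timing the squaring of integers of increasing size. Caveats: Python's built-in big integers do not use FFT multiplication, so the growth rate will differ from prime95's; the only point is that bigger operands take longer to square. The digit counts, repeat count, and function name are arbitrary choices for illustration.

```python
import timeit

def time_squaring(digit_counts, reps=20):
    """Return {digits: average seconds to square one integer with that many decimal digits}."""
    results = {}
    for digits in digit_counts:
        n = 10**digits - 1  # all nines, so exactly `digits` decimal digits
        total = timeit.timeit(lambda: n * n, number=reps)
        results[digits] = total / reps
    return results

if __name__ == "__main__":
    for digits, seconds in time_squaring([1_000, 10_000, 100_000]).items():
        print(f"{digits:>7} digits: {seconds * 1e6:10.1f} microseconds per squaring")
```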

What is most efficient on a given system depends on system and processor details and fft length. The optimal number of workers can change with exponent or fft length. Dual-Xeon systems do MUCH better with 2 or more workers than with one; single-worker throughput on the Knights Landing I'm benchmarking now is positively dreadful (less than 10% of maximum in some fft lengths).
Hyperthreading usually is not an advantage in fft-based multiplication, but in some cases it helps.
Benchmarking the candidate configurations on your own system is the right thing to do.

Welcome to the forum. And the learning curve.

Last fiddled with by kriesel on 2020-11-15 at 19:54