mersenneforum.org  

2020-11-15, 17:12   #1
diocupro
 
Nov 2020

2 Posts
Interpretation of results

Hi,

I ran a benchmark and am having trouble interpreting it because I have no idea what the different values mean.

This is part of my output:
Quote:
Timings for 2048K FFT length (6 cores, 1 worker): 2.42 ms. Throughput: 413.85 iter/sec.
Timings for 2048K FFT length (6 cores, 6 workers): 18.79, 19.54, 18.55, 18.68, 18.50, 18.08 ms. Throughput: 321.19 iter/sec.
Timings for 2048K FFT length (6 cores hyperthreaded, 1 worker): 3.07 ms. Throughput: 325.58 iter/sec.
Timings for 2048K FFT length (6 cores hyperthreaded, 6 workers): 32.95, 19.79, 20.43, 17.03, 23.78, 17.67 ms. Throughput: 287.22 iter/sec.
Why does the throughput decrease as the FFT length increases?
What do the milliseconds mean?
2020-11-15, 18:17   #2
Uncwilly
6809 > 6502
 
"""""""""""""""""""
Aug 2003
101×103 Posts

2·4,441 Posts

The bigger the FFT size, the more work the processor needs to do per iteration.

The milliseconds are the time per iteration. The smaller this value, the faster the testing.
2020-11-15, 18:53   #3
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

17×277 Posts

Quote:
Timings for 2048K FFT length (6 cores, 6 workers): 18.79, 19.54, 18.55, 18.68, 18.50, 18.08 ms. Throughput: 321.19 iter/sec.
Six workers, so six average times per iteration at the stated fft length. Each worker's rate in iterations/sec is 1000 ms/sec / (average iteration time in ms). The total throughput is the sum of those six figures. Generalize from six workers to N.
1000/18.79 = 53.22 iter/sec
1000/19.54 = 51.18
1000/18.55 = 53.91
1000/18.68 = 53.53
1000/18.50 = 54.05
1000/18.08 = 55.31
Sum of six = 321.20 iter/sec (the 0.01 iter/sec difference from the reported 321.19 is probably due to 2-digit roundoff)
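
The same arithmetic as a short Python sketch, in case it helps to check other rows of a benchmark. The function name `total_throughput` is just an illustrative choice; the times are the six values from the quoted line.

```python
def total_throughput(times_ms):
    """Sum the per-worker rates (iter/sec) given average iteration times in ms."""
    return sum(1000.0 / t for t in times_ms)

# Average ms per iteration for the six workers in the quoted benchmark line.
times_ms = [18.79, 19.54, 18.55, 18.68, 18.50, 18.08]

# Each worker's own rate: 1000 ms/sec divided by its ms per iteration.
per_worker = [1000.0 / t for t in times_ms]
total = total_throughput(times_ms)

for t, rate in zip(times_ms, per_worker):
    print(f"{t:.2f} ms -> {rate:.2f} iter/sec")
print(f"total: {total:.2f} iter/sec")
```

Passing a one-element list gives the single-worker case, which is the "generalize from six workers to N" step.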

Work required for an iteration is roughly exponent * log(exponent) * log(log(exponent)), and fft length is a nearly linear function of exponent, while the processor's rate of work is fairly constant. See for example the last two attachments of https://www.mersenneforum.org/showpo...19&postcount=5, right columns; the rate is constant within ±20% over 2M-64M fft length. (Numerous processor types have been exhaustively benchmarked and posted in that thread.) Large multiprecision multiplication scales this way for some rather fundamental reasons; see Donald Knuth, Seminumerical Algorithms, or https://www.mersenneforum.org/showpo...21&postcount=7
If it still doesn't make sense that iteration time depends on fft length or exponent, time yourself squaring a one-digit decimal number, then a 4-digit number, then a 10-digit number.
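
To get a feel for that growth rate, here is a rough Python sketch that evaluates the approximate work formula above at a few exponents. The exponents are arbitrary illustrative values, not taken from the benchmark, and the formula is only the rough model stated above, so the ratios are indicative at best.

```python
import math

def relative_work(exponent):
    """Rough per-iteration work model: p * log(p) * log(log(p))."""
    return exponent * math.log(exponent) * math.log(math.log(exponent))

# Hypothetical exponents chosen only to show the effect of doubling.
base = 40_000_000
for p in (40_000_000, 80_000_000, 160_000_000):
    ratio = relative_work(p) / relative_work(base)
    print(f"exponent {p}: about {ratio:.2f}x the work per iteration")
```

Under this model, doubling the exponent slightly more than doubles the work per iteration, so with a roughly constant processor rate the iterations per second drop accordingly.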

What is most efficient on a given system depends on system and processor details and fft length. The optimal number of workers can change with exponent or fft length. Dual-Xeon systems do MUCH better with two or more workers than with one; single-worker throughput on the Knights Landing I'm benchmarking now is positively dreadful (less than 10% of maximum at some fft lengths).
Hyperthreading usually is not an advantage in fft-based multiplication, but in some cases it does help.
Benchmarking them is the right thing to do.

Welcome to the forum. And the learning curve.

Last fiddled with by kriesel on 2020-11-15 at 19:54
2020-11-15, 18:56   #4
Runtime Error
 
Sep 2017
USA

181 Posts

The "best" setting based on your benchmarks is (6 cores, 1 worker) because it has the most throughput measured by iterations per second.
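
For anyone who wants to pick the winner from a longer benchmark automatically, a minimal Python sketch follows. It assumes every line has the same shape as the four quoted in the first post; the regular expression and the `best_config` helper are illustrative and not part of Prime95.

```python
import re

# Assumed line shape, taken from the quoted output:
# "Timings for 2048K FFT length (<config>): <times> ms. Throughput: <n> iter/sec."
LINE_RE = re.compile(
    r"Timings for (?P<fft>\d+)K FFT length \((?P<config>[^)]*)\): "
    r".*?Throughput: (?P<throughput>[\d.]+) iter/sec"
)

def best_config(lines):
    """Return (throughput, config, fft_length_in_K) for the highest throughput."""
    best = None
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip anything that is not a timing line
        row = (float(m.group("throughput")), m.group("config"), int(m.group("fft")))
        if best is None or row[0] > best[0]:
            best = row
    return best

benchmark = [
    "Timings for 2048K FFT length (6 cores, 1 worker): 2.42 ms. Throughput: 413.85 iter/sec.",
    "Timings for 2048K FFT length (6 cores, 6 workers): 18.79, 19.54, 18.55, 18.68, 18.50, 18.08 ms. Throughput: 321.19 iter/sec.",
    "Timings for 2048K FFT length (6 cores hyperthreaded, 1 worker): 3.07 ms. Throughput: 325.58 iter/sec.",
    "Timings for 2048K FFT length (6 cores hyperthreaded, 6 workers): 32.95, 19.79, 20.43, 17.03, 23.78, 17.67 ms. Throughput: 287.22 iter/sec.",
]

best = best_config(benchmark)
print(best)
```

Since the best setting can differ between fft lengths, a fuller version would group the lines by fft length before comparing.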

Note 1: Running multiple workers usually won't help because the fast Fourier transform (FFT) data for candidate exponents is too big to fit into the processor's cache. The data has to be held in RAM, so RAM speed often becomes the limiting factor instead of processor speed. Adding workers only makes the RAM bottleneck worse.

Note 2: Hyperthreading basically means the operating system can schedule two tasks for each core to perform because there is usually downtime between working on each task. This does not help for Prime95 since that task will fully utilize the core.
2020-11-15, 19:17   #5
diocupro
 
Nov 2020

10₂ Posts

Thanks for your explanation.
2020-11-17, 23:09   #6
ZFR
 
Feb 2008
Bray, Ireland

4B₁₆ Posts

That's some cool homework you got.

All times are UTC. The time now is 19:05.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.