mersenneforum.org Mlucas v19.1 (latest) available

2021-02-17, 23:02   #12
ewmayer
∂²ω=0

Sep 2002
República de California

2³×1,453 Posts

@Lorenzo: Thanks for all the timings, that is very useful. I will add a note recommending '-cpu 0:7' for M1 users to the README. Might you have a wall-plug wattmeter you can use to compare (under-load - idle) wattages for those 2 systems, for whatever FFT lengths they are using to run their current GIMPS assignments? I'd be curious to get some idea regarding relative performance-per-watt. In any event, happy crunching!

2021-02-18, 09:36   #13
Lorenzo

Aug 2010
Republic of Belarus

10110010₂ Posts

Quote:
 Originally Posted by ewmayer @Lorenzo: Thanks for all the timings, that is very useful. I will add a note recommending '-cpu 0:7' for M1 users to the README. Might you have a wall-plug wattmeter you can use to compare (under-load - idle) wattages for those 2 systems, for whatever FFT lengths they are using to run their current GIMPS assignments? I'd be curious to get some idea regarding relative performance-per-watt. In any event, happy crunching!

Sorry, but I don't have the wall-plug wattmeter.

2021-02-18, 14:32   #14
ldesnogu

Jan 2008
France

543₁₀ Posts

Quote:
 Originally Posted by Lorenzo I just want to share my experience with Apple M1 CPU.
Thanks for the results!

What machine is that? Mini, MBA or MBP? I'd expect MBA to throttle given the noise my MBP does when running 4 threads

2021-02-18, 14:46   #15
Lorenzo

Aug 2010
Republic of Belarus

2·89 Posts

Quote:
 Originally Posted by ldesnogu Thanks for the results! What machine is that? Mini, MBA or MBP? I'd expect MBA to throttle given the noise my MBP does when running 4 threads
Hi. This is an Apple Mac mini M1.

2021-02-18, 21:08   #16
ewmayer
∂²ω=0

Sep 2002
República de California

10110101101000₂ Posts

Quote:
 Originally Posted by Lorenzo Hi. This is an Apple Mac mini M1.
Some pics via Amazon.com here. The pic of the rear side shows an exhaust vent similar to those on my Intel NUCs - Lorenzo, are there intake vents on the bottom?

Had a closer look at some of your -cpu 0:7 timings ... the only obvious anomaly is at the very end, 26624K FFT, the timing for that is anomalously large. This is mainly for down-the-road as this FFT is way beyond the GIMPS wavefront, but looking at the pattern of best-timing FFT radices for the rows above it, this machine seems to really like larger leading FFT radices (call the leftmost radix r0) and combos of the form r0,16,32,32 and r0,32,32,32. At this 26M FFT length, there is no such available combo because I did not (yet) implement a radix-416 FFT-pass routine, thus instead of 416,32,32,32 the best we can do is 208,16,16,16,16, which means an extra pass through the data each iteration.

If you would be so kind, could you pause any running jobs (I believe 'kill -STOP [pid]' works on MacOS same as Linux, then 'kill -CONT [pid]' to resume, and either 'pidof' or 'top' will give you the process ID) and re-run just the 26M-FFT timing? Here is how:

./Mlucas -iters 1000 -cpu 0:7 -fftlen 26624 >& test.log

After that completes, paste the new last-line that got appended to mlucas.cfg as a result, and please attach the test.log . Thanks.

Last fiddled with by ewmayer on 2021-02-18 at 21:09
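
The radix arithmetic in the post above can be checked with a short sketch. It assumes that the product of the radices equals half the FFT length in doubles (FFT length in K, times 1024, divided by 2), which is consistent with the timing rows posted in this thread, and that each radix corresponds to one pass through the data. The function names are illustrative and are not part of Mlucas.

```python
from math import prod

def radix_product_matches(fft_len_k, radices):
    """True if the radices multiply to half the FFT length (in doubles)."""
    return prod(radices) == fft_len_k * 1024 // 2

def num_passes(radices):
    """Assumption: one pass through the data per radix in the combo."""
    return len(radices)

wished = (416, 32, 32, 32)         # would need a radix-416 routine
available = (208, 16, 16, 16, 16)  # alternative named in the post

for combo in (wished, available):
    print(combo, radix_product_matches(26624, combo), num_passes(combo))
```

Both combos cover the same 26624K length; the second simply spreads it over one more radix, which is the extra pass mentioned in the post.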

2021-02-18, 22:12   #17
Lorenzo

Aug 2010
Republic of Belarus

2·89 Posts

Quote:
 Originally Posted by ewmayer ... the only obvious anomaly is at the very end, 26624K FFT, the timing for that is anomalously large.
Actually, when I posted the results for the i3-8100 I cut the line with the timing for 26624. I thought the same, that some heavy load from a background application had interfered while I ran the benchmark.
So I just re-ran it on the i3-8100 (Oracle Linux 7) and I see the same thing: a big jump from ~69 to ~141 msec, exactly at 26624.
So I think it's not a platform-specific issue.
Code:
18432  msec/iter =   53.16  ROE[avg,max] = [0.236424995, 0.281250000]  radices = 288 32 32 32  0  0  0  0  0  0
20480  msec/iter =   62.92  ROE[avg,max] = [0.237479031, 0.312500000]  radices = 320 32 32 32  0  0  0  0  0  0
22528  msec/iter =   66.03  ROE[avg,max] = [0.228240432, 0.312500000]  radices = 352 32 32 32  0  0  0  0  0  0
24576  msec/iter =   69.49  ROE[avg,max] = [0.261424145, 0.343750000]  radices = 768 16 32 32  0  0  0  0  0  0
26624  msec/iter =  144.86  ROE[avg,max] = [0.272725339, 0.343750000]  radices =  52 16 16 32 32  0  0  0  0  0
26624  msec/iter =  141.33  ROE[avg,max] = [0.272368315, 0.375000000]  radices =  52 16 16 32 32  0  0  0  0  0
24576  msec/iter =   68.38  ROE[avg,max] = [0.261777142, 0.359375000]  radices = 768 16 32 32  0  0  0  0  0  0
26624  msec/iter =  141.06  ROE[avg,max] = [0.272368315, 0.375000000]  radices =  52 16 16 32 32  0  0  0  0  0
Attached Files
 test.log (3.1 KB, 18 views)
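
For anyone who wants to compare such lines programmatically, here is a minimal sketch of a parser for the timing-line layout shown above. The regular expression is inferred only from the lines posted in this thread, not from the Mlucas sources, and the names `parse_cfg_line` and `usec_per_k` are made up for illustration.

```python
import re

# Layout inferred from the lines posted in this thread (an assumption).
CFG_LINE = re.compile(
    r"^\s*(\d+)\s+msec/iter\s*=\s*([\d.]+)\s+"
    r"ROE\[avg,max\]\s*=\s*\[([\d.]+),\s*([\d.]+)\]\s+"
    r"radices\s*=\s*([\d\s]+)$"
)

def parse_cfg_line(line):
    """Parse one timing line; return None if the line does not match."""
    m = CFG_LINE.match(line)
    if m is None:
        return None
    # Trailing zeros are padding; keep only the radices actually used.
    radices = [int(r) for r in m.group(5).split() if int(r) != 0]
    return {
        "fft_k": int(m.group(1)),
        "msec": float(m.group(2)),
        "roe_avg": float(m.group(3)),
        "roe_max": float(m.group(4)),
        "radices": radices,
    }

def usec_per_k(entry):
    """Per-iteration time normalized by FFT length (microseconds per K)."""
    return 1000.0 * entry["msec"] / entry["fft_k"]

lines = [
    "24576  msec/iter =   69.49  ROE[avg,max] = [0.261424145, 0.343750000]  radices = 768 16 32 32  0  0  0  0  0  0",
    "26624  msec/iter =  144.86  ROE[avg,max] = [0.272725339, 0.343750000]  radices =  52 16 16 32 32  0  0  0  0  0",
]
entries = [parse_cfg_line(s) for s in lines]
for e in entries:
    print(e["fft_k"], e["radices"], round(usec_per_k(e), 2))
```

Normalizing by FFT length makes the jump at 26624 easy to see without eyeballing the raw msec/iter column.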

2021-02-19, 07:45   #18
Lorenzo

Aug 2010
Republic of Belarus

2·89 Posts

Full test for large fft on i3-8100:
Code:
 8192  msec/iter =   19.61  ROE[avg,max] = [0.272732764, 0.375000000]  radices = 256 32 32 16  0  0  0  0  0  0
 9216  msec/iter =   23.07  ROE[avg,max] = [0.239072536, 0.312500000]  radices = 288 16 32 32  0  0  0  0  0  0
10240  msec/iter =   27.33  ROE[avg,max] = [0.271287049, 0.375000000]  radices = 320 32 32 16  0  0  0  0  0  0
11264  msec/iter =   28.73  ROE[avg,max] = [0.271818621, 0.375000000]  radices = 352 32 32 16  0  0  0  0  0  0
12288  msec/iter =   32.18  ROE[avg,max] = [0.259570478, 0.312500000]  radices = 768 16 16 32  0  0  0  0  0  0
13312  msec/iter =   36.87  ROE[avg,max] = [0.254703482, 0.312500000]  radices = 208 32 32 32  0  0  0  0  0  0
14336  msec/iter =   39.92  ROE[avg,max] = [0.234003331, 0.296875000]  radices = 224 32 32 32  0  0  0  0  0  0
15360  msec/iter =   42.65  ROE[avg,max] = [0.245504855, 0.312500000]  radices = 960 16 16 32  0  0  0  0  0  0
16384  msec/iter =   44.85  ROE[avg,max] = [0.272600878, 0.375000000]  radices = 256 32 32 32  0  0  0  0  0  0
18432  msec/iter =   52.67  ROE[avg,max] = [0.236424995, 0.281250000]  radices = 288 32 32 32  0  0  0  0  0  0
20480  msec/iter =   61.48  ROE[avg,max] = [0.237479031, 0.312500000]  radices = 320 32 32 32  0  0  0  0  0  0
22528  msec/iter =   65.70  ROE[avg,max] = [0.228240432, 0.312500000]  radices = 352 32 32 32  0  0  0  0  0  0
24576  msec/iter =   68.40  ROE[avg,max] = [0.261424145, 0.343750000]  radices = 768 16 32 32  0  0  0  0  0  0
26624  msec/iter =  141.14  ROE[avg,max] = [0.272725339, 0.343750000]  radices =  52 16 16 32 32  0  0  0  0  0
28672  msec/iter =  106.92  ROE[avg,max] = [0.252042892, 0.312500000]  radices = 224 16 16 16 16  0  0  0  0  0
30720  msec/iter =  114.56  ROE[avg,max] = [0.288327813, 0.375000000]  radices = 240 16 16 16 16  0  0  0  0  0
32768  msec/iter =  101.20  ROE[avg,max] = [0.238132941, 0.312500000]  radices = 1024 16 32 32  0  0  0  0  0  0
36864  msec/iter =  137.73  ROE[avg,max] = [0.265349020, 0.312500000]  radices = 288 16 16 16 16  0  0  0  0  0
40960  msec/iter =  161.66  ROE[avg,max] = [0.251543120, 0.312500000]  radices = 320 16 16 16 16  0  0  0  0  0
45056  msec/iter =  170.85  ROE[avg,max] = [0.244248223, 0.312500000]  radices = 352 16 16 16 16  0  0  0  0  0
49152  msec/iter =  153.04  ROE[avg,max] = [0.255821747, 0.343750000]  radices = 768 32 32 32  0  0  0  0  0  0
53248  msec/iter =  293.04  ROE[avg,max] = [0.262757669, 0.312500000]  radices =  52 16 32 32 32  0  0  0  0  0
57344  msec/iter =  270.81  ROE[avg,max] = [0.265370288, 0.375000000]  radices = 224 16 16 16 32  0  0  0  0  0
61440  msec/iter =  204.05  ROE[avg,max] = [0.246525841, 0.343750000]  radices = 960 32 32 32  0  0  0  0  0  0

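
One way to read the i3-8100 timings above is to normalize each one by FFT length and group the rows by how many radices they use. The sketch below does that with the values copied by hand from this post; the helper names are illustrative, and treating the radix count as the number of passes through the data is an assumption taken from the discussion earlier in the thread.

```python
from collections import defaultdict

# (FFT length in K, msec/iter, nonzero radices) from the i3-8100 post.
ROWS = [
    (8192, 19.61, (256, 32, 32, 16)),
    (9216, 23.07, (288, 16, 32, 32)),
    (10240, 27.33, (320, 32, 32, 16)),
    (11264, 28.73, (352, 32, 32, 16)),
    (12288, 32.18, (768, 16, 16, 32)),
    (13312, 36.87, (208, 32, 32, 32)),
    (14336, 39.92, (224, 32, 32, 32)),
    (15360, 42.65, (960, 16, 16, 32)),
    (16384, 44.85, (256, 32, 32, 32)),
    (18432, 52.67, (288, 32, 32, 32)),
    (20480, 61.48, (320, 32, 32, 32)),
    (22528, 65.70, (352, 32, 32, 32)),
    (24576, 68.40, (768, 16, 32, 32)),
    (26624, 141.14, (52, 16, 16, 32, 32)),
    (28672, 106.92, (224, 16, 16, 16, 16)),
    (30720, 114.56, (240, 16, 16, 16, 16)),
    (32768, 101.20, (1024, 16, 32, 32)),
    (36864, 137.73, (288, 16, 16, 16, 16)),
    (40960, 161.66, (320, 16, 16, 16, 16)),
    (45056, 170.85, (352, 16, 16, 16, 16)),
    (49152, 153.04, (768, 32, 32, 32)),
    (53248, 293.04, (52, 16, 32, 32, 32)),
    (57344, 270.81, (224, 16, 16, 16, 32)),
    (61440, 204.05, (960, 32, 32, 32)),
]

def usec_per_k(fft_k, msec):
    """Per-iteration time normalized by FFT length (microseconds per K)."""
    return 1000.0 * msec / fft_k

def by_pass_count(rows):
    """Group normalized timings by the number of radices in each row."""
    groups = defaultdict(list)
    for fft_k, msec, radices in rows:
        groups[len(radices)].append(usec_per_k(fft_k, msec))
    return dict(groups)

groups = by_pass_count(ROWS)
for passes, costs in sorted(groups.items()):
    print(f"{passes} radices: min {min(costs):.2f}, max {max(costs):.2f} usec per K")
```

On this data set every five-radix row costs more per K than any four-radix row, and the two rows with leading radix 52 are the most expensive of all.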
2021-02-19, 09:20   #19
LaurV
Romulan Interpreter

Jun 2011
Thailand

5×1,871 Posts

Quote:
 Originally Posted by Lorenzo Sorry, but I don't have the wall-plug wattmeter.
Whaaaaattttt?
You must buy one, try Aliexpress, here, you can even smart-measure some parameters of your "wife" with it! (whatever that means)

(photo for posterity, in case they change it; to be clear, this is a joke, I do not promote nor endorse that product, but "one click operation for wife" I would buy any time!).

Last fiddled with by LaurV on 2021-02-19 at 09:22

2021-02-19, 19:53   #20
ewmayer
∂²ω=0

Sep 2002
República de California

2³·1,453 Posts

Quote:
 Originally Posted by Lorenzo Full test for large fft on i3-8100: [snip]
Thanks - that is very helpful as far as future roadmapping goes - so for selected FFT lengths:
o 26M: r0 = 208 needs to be made more accurate (rejected in your test.log due to excess ROE), also need r0 = 416;
o 28,30M: Need r0 = 448,480;
o 36,40,44,52,56M: Need r0 = 576,640,704,832,896.
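
The leading radices in this list can be reproduced with a few lines of arithmetic. This is a sketch under two assumptions: that the target combos have the form r0,32,32,32 discussed earlier in the thread, and that the radices multiply to half the FFT length in doubles. The function name `leading_radix` is hypothetical.

```python
def leading_radix(fft_len_k, trailing=(32, 32, 32)):
    """Leading radix r0 such that r0 * product(trailing) is half the FFT length."""
    half = fft_len_k * 1024 // 2
    tail = 1
    for r in trailing:
        tail *= r
    if half % tail != 0:
        raise ValueError("trailing radices do not divide the half-length")
    return half // tail

# FFT lengths in K behind the 26, 28, 30, 36, 40, 44, 52 and 56M entries.
FFT_LENGTHS_K = (26624, 28672, 30720, 36864, 40960, 45056, 53248, 57344)

needed = [leading_radix(k) for k in FFT_LENGTHS_K]
for k, r0 in zip(FFT_LENGTHS_K, needed):
    print(f"{k}K -> r0 = {r0}")
```

With three trailing radix-32 passes the calculation reduces to dividing the FFT length in K by 64.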
