#166 |
Banned
"Luigi"
Aug 2002
Team Italia
29×167 Posts |
Any hints about the addition of the PRP option for small Mersenne exponents?
It would be optimal for our Pis. |
#167 |
∂²ω=0
Sep 2002
República de California
2²×5×587 Posts |
@Luigi:
Work continues on PRP support ... but slowly. I am moving at the end of August, so Mlucas work has necessarily been taking a back seat to all the work involved with that. |
#168 | |
Banned
"Luigi"
Aug 2002
Team Italia
29·167 Posts |
Hope everything goes smoothly. |
#169 |
∂²ω=0
Sep 2002
República de California
2²×5×587 Posts |
#170 | |
"Composite as Heck"
Oct 2017
901₁₀ Posts |
#171 |
"Sam Laur"
Dec 2018
Turku, Finland
317 Posts |
I somewhat doubt that 10x factor. On the Odroid forum, someone did a couple of test runs of Mlucas on a test version of the cancelled Odroid N1: at 2560K FFT the timing was 87.7 msec/iter running on the two "big" A72 cores. For comparison, my Ryzen 3 2200G does 5.07 ms/iter at 2560K when running mprime with 4 cores and 1 worker (throughput is slightly higher with 4 workers). And aren't Intel chips supposed to be much faster than that?
Would someone please make more recent benchmarks available? The benchmark tables on both mersenne.org and mersenne.ca are a bit old now. Anyway, I ordered a Rock960 board. It hasn't arrived yet, but it also has that RK3399 chip on it. It's more expensive than those other boards, but it's just one board for testing and general fooling around: thermals, power consumption, etc. There are boards with e.g. the Kirin 970 (4x A73, 4x A53), but they are really quite expensive now. Maybe after a while they'll get cheaper too. Or maybe something else will. |
#172 |
∂²ω=0
Sep 2002
República de California
2²×5×587 Posts |
Re. the Odroid-N1 tests, I had a beta tester with one of those do an Mlucas build and timings on it. Long story short, we got the best total throughput running one 2-thread job on the dual A72 cores and one 4-thread job on the quad A53, and that total throughput was ~2.2x what I get running on my quad-A53-core Odroid C2.
No idea whether the N2 will be appreciably better FLOPS-wise than the N1, and no word (AFAIK) on when the much-delayed N2 will finally be available, hopefully early in 2019. Looking down the road to 2020, the best hope for an x86-competitive ARM implementation may be in Apple's PC roadmap. More frustrating waiting! |
#173 | |
"Composite as Heck"
Oct 2017
17·53 Posts |
![]() Quote:
Taking your 87.7 ms/it for 2xA72 and my 160 ms/it for 4xA53 at 2560K from the previous page, we get a combined ~17.65 it/sec, effectively ~56.65 ms/it for an RK3399. Ewmayer's post on the Odroid forum suggests there may be a 10% slowdown when running both clusters simultaneously. So ~11.2 RK3399s may be equivalent to a 2200G (~12.4 if the 10% slowdown is present). From the mersenne.ca benchmark of an i3 8100 at 3.67 ms/it for 2560K, we can estimate that an 8100 roughly translates to ~15.4 RK3399s (~17.2 if the 10% slowdown is present). That's a lot of compounded estimates, so a big pinch of salt is required. Oh dear. |
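The arithmetic in that estimate can be reproduced with a short Python sketch. It assumes, as the post does, that the throughputs (iterations per second) of jobs running side by side simply add, and that any slowdown applies uniformly to the combined throughput. The function name `combined_ms_per_iter` and its `slowdown` parameter are made up for illustration and are not part of Mlucas or mprime.

```python
def combined_ms_per_iter(ms_per_iter_jobs, slowdown=0.0):
    """Effective ms/iter of several jobs running side by side.

    Assumption: throughputs (iter/sec) add. `slowdown` is the assumed
    fractional loss of total throughput when all jobs run at once
    (0.10 means 10%).
    """
    total_iters_per_sec = sum(1000.0 / ms for ms in ms_per_iter_jobs)
    total_iters_per_sec *= 1.0 - slowdown
    return 1000.0 / total_iters_per_sec


# Timings quoted in the thread, all at 2560K FFT.
rk3399 = combined_ms_per_iter([87.7, 160.0])                  # 2xA72 + 4xA53
rk3399_slow = combined_ms_per_iter([87.7, 160.0], slowdown=0.10)

for name, ms in [("Ryzen 3 2200G", 5.07), ("i3 8100", 3.67)]:
    print(f"{name}: ~{rk3399 / ms:.1f} RK3399 boards, "
          f"~{rk3399_slow / ms:.1f} with a 10% slowdown")
```

The ratios are only as good as the inputs: the two ARM timings come from different boards and different runs, so this is a rough comparison rather than a benchmark.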
|
#174 |
"Sam Laur"
Dec 2018
Turku, Finland
317 Posts |
Just for completeness' sake, here are the full iteration timings for the Raspberry Pi 3B+ I'm running. Taking care of thermals gave a significant boost over no heatsink, but I got a further 3% by disconnecting the display and stopping X. Just having it run in the background doing "nothing" still apparently consumes a minor portion of CPU time.
Gentoo 64-bit "sakaki", probably version 1.21 of the package since I originally downloaded and installed it in August for other purposes.
uname -a says Linux pi64 4.14.44-v8-4fca48b7612d-bis+ #2 SMP PREEMPT Fri Jun 1 15:55:22 BST 2018 aarch64 GNU/Linux
Stock 1.4 GHz, and four Cortex-A53 cores on the BCM2837B0. No overclocking attempts have been made (or will be made).
Running Mlucas_c2simd 17.1 (downloaded precompiled binary)
Code:
17.1
1024 msec/iter = 47.12 ROE[avg,max] = [0.255116697, 0.375000000] radices = 256 8 16 16 0 0 0 0 0 0
1152 msec/iter = 54.31 ROE[avg,max] = [0.223389521, 0.281250000] radices = 144 16 16 16 0 0 0 0 0 0
1280 msec/iter = 58.89 ROE[avg,max] = [0.264171231, 0.375000000] radices = 160 16 16 16 0 0 0 0 0 0
1408 msec/iter = 68.59 ROE[avg,max] = [0.228616585, 0.312500000] radices = 176 16 16 16 0 0 0 0 0 0
1536 msec/iter = 75.89 ROE[avg,max] = [0.252626651, 0.343750000] radices = 192 16 16 16 0 0 0 0 0 0
1664 msec/iter = 83.16 ROE[avg,max] = [0.272233409, 0.406250000] radices = 208 16 16 16 0 0 0 0 0 0
1792 msec/iter = 90.47 ROE[avg,max] = [0.222731285, 0.312500000] radices = 224 16 16 16 0 0 0 0 0 0
1920 msec/iter = 98.88 ROE[avg,max] = [0.255165462, 0.375000000] radices = 240 16 16 16 0 0 0 0 0 0
2048 msec/iter = 105.43 ROE[avg,max] = [0.238688298, 0.312500000] radices = 256 16 16 16 0 0 0 0 0 0
2304 msec/iter = 123.05 ROE[avg,max] = [0.249503539, 0.312500000] radices = 288 16 16 16 0 0 0 0 0 0
2560 msec/iter = 143.92 ROE[avg,max] = [0.233106476, 0.312500000] radices = 160 32 16 16 0 0 0 0 0 0
2816 msec/iter = 165.29 ROE[avg,max] = [0.260105912, 0.375000000] radices = 176 32 16 16 0 0 0 0 0 0
3072 msec/iter = 181.54 ROE[avg,max] = [0.261096695, 0.375000000] radices = 192 32 16 16 0 0 0 0 0 0
3328 msec/iter = 198.91 ROE[avg,max] = [0.282578930, 0.375000000] radices = 208 32 16 16 0 0 0 0 0 0
3584 msec/iter = 215.88 ROE[avg,max] = [0.251145062, 0.375000000] radices = 224 32 16 16 0 0 0 0 0 0
3840 msec/iter = 235.33 ROE[avg,max] = [0.246073929, 0.343750000] radices = 240 32 16 16 0 0 0 0 0 0
4096 msec/iter = 253.36 ROE[avg,max] = [0.226999763, 0.281250000] radices = 256 32 16 16 0 0 0 0 0 0
4608 msec/iter = 294.28 ROE[avg,max] = [0.249245933, 0.375000000] radices = 288 32 16 16 0 0 0 0 0 0
5120 msec/iter = 330.03 ROE[avg,max] = [0.236507015, 0.312500000] radices = 160 32 32 16 0 0 0 0 0 0
5632 msec/iter = 376.08 ROE[avg,max] = [0.259536082, 0.343750000] radices = 176 32 32 16 0 0 0 0 0 0
6144 msec/iter = 417.59 ROE[avg,max] = [0.245978727, 0.343750000] radices = 192 32 32 16 0 0 0 0 0 0
6656 msec/iter = 456.34 ROE[avg,max] = [0.266108247, 0.375000000] radices = 208 32 32 16 0 0 0 0 0 0
7168 msec/iter = 498.72 ROE[avg,max] = [0.225733680, 0.312500000] radices = 224 32 32 16 0 0 0 0 0 0
7680 msec/iter = 545.44 ROE[avg,max] = [0.236645483, 0.312500000] radices = 240 32 32 16 0 0 0 0 0 0 |
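For anyone who wants to compare timing tables like the one above, here is a minimal Python sketch that parses entries of this shape. The layout (FFT length in K, `msec/iter`, `ROE[avg,max]`, `radices`) is inferred only from the lines posted here, not from any Mlucas documentation, and the names `CFG_LINE` and `parse_cfg_line` are hypothetical. The check that twice the product of the nonzero radices equals the FFT length is an observation that holds for the posted rows, not a documented guarantee.

```python
import math
import re

# Pattern inferred from the timing lines posted above (an assumption,
# not an official description of the file format).
CFG_LINE = re.compile(
    r"^\s*(\d+)\s+msec/iter\s*=\s*([\d.]+)\s+"
    r"ROE\[avg,max\]\s*=\s*\[\s*([\d.]+)\s*,\s*([\d.]+)\s*\]\s+"
    r"radices\s*=\s*((?:\d+\s*)+)$"
)


def parse_cfg_line(line):
    """Return a dict for one timing line, or None for anything else."""
    m = CFG_LINE.match(line)
    if m is None:
        return None
    fft_k, msec, roe_avg, roe_max, radices = m.groups()
    return {
        "fft_k": int(fft_k),
        "msec_per_iter": float(msec),
        "roe_avg": float(roe_avg),
        "roe_max": float(roe_max),
        # Trailing zeros are padding; keep only the radices actually used.
        "radices": [int(r) for r in radices.split() if int(r) != 0],
    }


sample = """\
17.1
1024 msec/iter = 47.12 ROE[avg,max] = [0.255116697, 0.375000000] radices = 256 8 16 16 0 0 0 0 0 0
2560 msec/iter = 143.92 ROE[avg,max] = [0.233106476, 0.312500000] radices = 160 32 16 16 0 0 0 0 0 0
"""

rows = [row for row in map(parse_cfg_line, sample.splitlines()) if row]
for row in rows:
    # In the posted rows, 2 * product(radices) equals the FFT length.
    length_k = 2 * math.prod(row["radices"]) // 1024
    usec_per_k = 1000.0 * row["msec_per_iter"] / row["fft_k"]
    print(row["fft_k"], length_k, f"{usec_per_k:.1f} usec/iter per K")
```

Normalizing to time per K of FFT length makes it easier to see how timings scale as the FFT grows, for example when comparing the 2560K entry against other boards mentioned in the thread.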
#175 |
Jan 2019
43 Posts |
#176 | |
∂²ω=0
Sep 2002
República de California
10110111011100₂ Posts |
![]() Quote:
On the other hand, were you to do a build of Mlucas and post comparative timings vs. your app on your Android hardware, that could be useful. |
|