mersenneforum.org ARM builds and SIMD-assembler prospects

2018-07-20, 13:50   #166
ET_
Banned

"Luigi"
Aug 2002
Team Italia

7·691 Posts

Any hints about the addition of the PRP option for small Mersenne exponents? It would be optimal for our Pis 😀

2018-07-21, 20:41   #167
ewmayer
∂2ω=0

Sep 2002
República de California

13·29·31 Posts

@Luigi: Work continues on PRP support ... but slowly. I am moving at end of August, so Mlucas work necessarily has been taking a back seat to all the work involved with that.

2018-07-22, 11:35   #168
ET_
Banned

"Luigi"
Aug 2002
Team Italia

7×691 Posts

Quote:
 Originally Posted by ewmayer @Luigi: Work continues on PRP support ... but slowly. I am moving at end of August, so Mlucas work necessarily has been taking a back seat to all the work involved with that.
I didn't remember that, and I didn't mean to sound snappy; that wasn't my intention.

Hope everything goes smooth.

2018-07-24, 20:28   #169
ewmayer
∂2ω=0

Sep 2002
República de California

13·29·31 Posts

Quote:
 Originally Posted by ET_ I didn't remember that, and I didn't mean to sound snappy; that wasn't my intention.
You didn't remember that because I never previously mentioned it around here. :)

Thanks for the smoothness Glückwunsch!

2018-12-29, 18:35   #170
M344587487

"Composite as Heck"
Oct 2017

368₁₆ Posts

Quote:
 Originally Posted by ewmayer Glad you got at least a nice chunk of the "missing FLOPS" back. I'm looking forward to buying an Odroid N1 once they go on sale, hopefully within a month - one of the beta-testers ran Mlucas benchmarks, and using all 6 cores (one 2-threaded job running on the 'big' 2-core a72 cpu, another 4-threaded one on the 'little' 4-core a53 cpu) he gets 2.2-2.3x the total throughput of an Odroid C2, which means ~3.5x the total throughput of a Pi3. N1 pricing is estimated at ~$110, i.e. about the same $/FLOP as the C2. We can only hope this sparks a full-blown 'multi-socket war' amongst the various ARM-micro-PC manufacturers. :) Even for the N1 one still needs ~10 of them to match the LL-test throughput of a cutting-edge Intel quad, but things are getting close to the "interestingness" level as far as wide-scale adoption goes.
If "~10 RK3399 boards equal one quad-core Intel" holds for all RK3399 boards, then the "interestingness" level has improved thanks to prices (though it's still in x86's favour).

10x Neo4 + 10x SD cards = ~£400, PSUs probably ~£30, no need for a switch, so total ~£430. Not too shabby. Is there a stronger SoC than 2xA72+4xA53 that makes more bang-for-buck sense?

2018-12-29, 19:48   #171
nomead

"Sam Laur"
Dec 2018
Turku, Finland

317 Posts

I doubt that 10x factor somewhat. On the Odroid forum someone ran a couple test runs on Mlucas and a test version of the cancelled Odroid N1. 2560K fft and the timing was 87.7 msec/iter running on the two "big" A72 cores. As a comparison my Ryzen 3 2200G is doing 5.07 ms/iter at 2560K when running mprime, 4 cores 1 worker (throughput is slightly higher for 4 workers) and aren't Intel chips supposed to be much faster than that? Would someone please make more recent benchmarks available - the benchmark tables on both mersenne.org and mersenne.ca are a bit old now.

But anyway, I ordered a Rock960 board. It hasn't arrived yet, but it's also got that RK3399 chip on it. It's more expensive than those other boards, but it's just one board for testing and general fooling around, thermals and power consumption etc.

There are boards with e.g. Kirin 970 (4x A73, 4x A53) but they are really quite expensive now. Maybe after a while they'll get cheaper too. Or maybe something else will.

2018-12-29, 22:24   #172
ewmayer
∂2ω=0

Sep 2002
República de California

13×29×31 Posts

Re. the Odroid-N1 tests, I had a beta tester of one of those do an Mlucas build and timings on that - long story short, we got best total throughput running one 2-thread job on the dual A72 core and one 4-thread job on the quad A53, and said total throughput was ~2.2x what I get running on my quad-A53-core Odroid C2.

No idea whether the N2 will be appreciably better FLOPS-wise than the N1, and no word (AFAIK) on when the much-delayed N2 will finally be available, hopefully early in 2019.

Looking down the road to 2020, the best hope for an x86-competitive ARM implementation may be in Apple's PC roadmap. More frustrating waiting!

2018-12-29, 23:18   #173
M344587487

"Composite as Heck"
Oct 2017

368₁₆ Posts

Quote:
 Originally Posted by nomead I doubt that 10x factor somewhat. On the Odroid forum someone ran a couple test runs on Mlucas and a test version of the cancelled Odroid N1. 2560K fft and the timing was 87.7 msec/iter running on the two "big" A72 cores. As a comparison my Ryzen 3 2200G is doing 5.07 ms/iter at 2560K when running mprime, 4 cores 1 worker (throughput is slightly higher for 4 workers) and aren't Intel chips supposed to be much faster than that? Would someone please make more recent benchmarks available - the benchmark tables on both mersenne.org and mersenne.ca are a bit old now. ...

Taking your 87.7 ms/it for 2xA72 and my 160 msec/it for 2560K 4xA53 from the previous page we get ~17.65 it/sec, effectively ~56.65ms/it for an RK3399. Ewmayer's post on the Odroid forum suggests there may be a 10% slowdown running both clusters simultaneously. So ~11.2 RK3399 may be equivalent to a 2200G (~12.4 if 10% slowdown is present). From the mersenne.ca bench of an i3 8100 of 3.67ms/it for 2560K we can estimate that an 8100 roughly translates to ~15.4 RK3399 (~17.2 if 10% slowdown is present). That's a lot of compounded estimates so a big pinch of salt is required.
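
For anyone who wants to re-run the arithmetic above with their own timings, here is a minimal Python sketch. It assumes the two clusters run independent jobs, so their iteration rates simply add, and it models the possible 10% slowdown as a flat 10% cut in combined throughput. The function names are made up for illustration and are not part of Mlucas or mprime.

```python
def combined_ms_per_iter(cluster_ms_per_iter, slowdown=0.0):
    """Effective ms/iter of a board running one job per CPU cluster.

    Assumption: the jobs are independent, so iteration rates add.
    `slowdown` is the fractional throughput loss from running all
    clusters at once (0.10 models the possible 10% penalty).
    """
    iters_per_sec = sum(1000.0 / ms for ms in cluster_ms_per_iter)
    iters_per_sec *= 1.0 - slowdown
    return 1000.0 / iters_per_sec


def boards_to_match(reference_ms_per_iter, board_ms_per_iter):
    """How many boards are needed to equal one reference CPU's throughput."""
    return board_ms_per_iter / reference_ms_per_iter


# Timings quoted in this thread, all at 2560K FFT.
rk3399 = combined_ms_per_iter([87.7, 160.0])                 # 2xA72 + 4xA53
rk3399_slow = combined_ms_per_iter([87.7, 160.0], slowdown=0.10)

for name, ms in [("Ryzen 3 2200G", 5.07), ("i3 8100", 3.67)]:
    print(f"{name}: ~{boards_to_match(ms, rk3399):.1f} RK3399 "
          f"(~{boards_to_match(ms, rk3399_slow):.1f} with 10% slowdown)")
```

Swapping in timings from a different board or FFT length only needs the list of per-cluster ms/iter values changed; the same caveat about compounded estimates applies.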

Quote:
 Originally Posted by ewmayer ... Looking down the road to 2020, the best hope for an x86-competitive ARM implementation may be in Apple's PC roadmap. More frustrating waiting!
Oh dear.

2019-01-13, 20:52   #175
thorken

Jan 2019

43 Posts

take a look at this post

2019-01-14, 20:17   #176
ewmayer
∂2ω=0

Sep 2002
República de California

26647₈ Posts

Quote:
If you're not willing to post source, you're going to continue being greeted with skepticism. All the major current clients used by GIMPSers allow users to inspect the source and to build directly from it, should they so desire.

On the other hand, were you to do a build of Mlucas and post comparative timings vs your app on your Android hardware, that could be useful.
