mersenneforum.org  

2018-07-20, 13:50   #166
ET_
Banned
"Luigi"
Aug 2002
Team Italia
3·5·17·19 Posts

Any hints about the addition of the PRP option for small Mersenne exponents?

It would be optimal for our Pis 😀

2018-07-21, 20:41   #167
ewmayer
2ω=0
Sep 2002
República de California
2^4×733 Posts

@Luigi:

Work continues on PRP support ... but slowly. I am moving at end of August, so Mlucas work necessarily has been taking a back seat to all the work involved with that.

2018-07-22, 11:35   #168
ET_

Quote:
Originally Posted by ewmayer
@Luigi:

Work continues on PRP support ... but slowly. I am moving at end of August, so Mlucas work necessarily has been taking a back seat to all the work involved with that.

I didn't remember that, and I didn't want to look snappy; that wasn't my intention.

Hope everything goes smoothly.

2018-07-24, 20:28   #169
ewmayer

Quote:
Originally Posted by ET_
I didn't remember that, and I didn't want to look snappy; that wasn't my intention.

You didn't remember that because I never previously mentioned it around here. :)

Thanks for the smoothness Glückwunsch!

2018-12-29, 18:35   #170
M344587487
"Composite as Heck"
Oct 2017
1101111101₂ Posts

Quote:
Originally Posted by ewmayer
Glad you got at least a nice chunk of the "missing FLOPS" back. I'm looking forward to buying an Odroid N1 once they go on sale, hopefully within a month - one of the beta-testers ran Mlucas benchmarks, and using all 6 cores (one 2-threaded job running on the 'big' 2-core a72 cpu, another 4-threaded one on the 'little' 4-core a53 cpu) he gets 2.2-2.3x the total throughput of an Odroid C2, which means ~3.5x the total throughput of a Pi3. N1 pricing is estimated at ~$110, i.e. about the same $/FLOP as the C2. We can only hope this sparks a full-blown 'multi-socket war' amongst the various ARM-micro-PC manufacturers. :) Even for the N1 one still needs ~10 of them to match the LL-test throughput of a cutting-edge Intel quad, but things are getting close to the "interestingness" level as far as wide-scale adoption goes.

If "~10 RK3399 boards equal one quad-core Intel" holds for all RK3399 boards, then the "interestingness" level has improved thanks to prices (though it's still in x86's favour).

10x Neo4 + 10x SD cards = ~£400, PSUs probably ~£30, and there's no need for a switch, so the total is ~£430. Not too shabby. Is there a stronger SoC than 2xA72+4xA53 that makes more bang-for-buck sense?

2018-12-29, 19:48   #171
nomead
"Sam Laur"
Dec 2018
Turku, Finland
317 Posts

I doubt that 10x factor somewhat. On the Odroid forum someone ran a couple test runs on Mlucas and a test version of the cancelled Odroid N1. 2560K fft and the timing was 87.7 msec/iter running on the two "big" A72 cores. As a comparison my Ryzen 3 2200G is doing 5.07 ms/iter at 2560K when running mprime, 4 cores 1 worker (throughput is slightly higher for 4 workers) and aren't Intel chips supposed to be much faster than that?

Would someone please make more recent benchmarks available - the benchmark tables on both mersenne.org and mersenne.ca are a bit old now.

But anyway, I ordered a Rock960 board. It hasn't arrived yet, but it's also got that RK3399 chip on it. It's more expensive than those other boards, but it's just one board for testing and general fooling around, thermals and power consumption etc.

There are boards with e.g. Kirin 970 (4x A73, 4x A53) but they are really quite expensive now. Maybe after a while they'll get cheaper too. Or maybe something else will.

2018-12-29, 22:24   #172
ewmayer

Re. the Odroid-N1 tests, I had a beta tester of one of those do an Mlucas build and timings on that. Long story short, we got the best total throughput running one 2-thread job on the dual A72 cores and one 4-thread job on the quad A53, and that total throughput was ~2.2x what I get running on my quad-A53-core Odroid C2.

No idea whether the N2 will be appreciably better FLOPS-wise than the N1, and no word (AFAIK) on when the much-delayed N2 will finally be available, hopefully early in 2019.

Looking down the road to 2020, the best hope for an x86-competitive ARM implementation may be in Apple's PC roadmap. More frustrating waiting!

2018-12-29, 23:18   #173
M344587487

Quote:
Originally Posted by nomead
I doubt that 10x factor somewhat. On the Odroid forum someone ran a couple test runs on Mlucas and a test version of the cancelled Odroid N1. 2560K fft and the timing was 87.7 msec/iter running on the two "big" A72 cores. As a comparison my Ryzen 3 2200G is doing 5.07 ms/iter at 2560K when running mprime, 4 cores 1 worker (throughput is slightly higher for 4 workers) and aren't Intel chips supposed to be much faster than that?

Would someone please make more recent benchmarks available - the benchmark tables on both mersenne.org and mersenne.ca are a bit old now.
...

Taking your 87.7 ms/it for 2xA72 and my 160 msec/it for 2560K 4xA53 from the previous page we get ~17.65 it/sec, effectively ~56.65ms/it for an RK3399. Ewmayer's post on the Odroid forum suggests there may be a 10% slowdown running both clusters simultaneously. So ~11.2 RK3399 may be equivalent to a 2200G (~12.4 if 10% slowdown is present). From the mersenne.ca bench of an i3 8100 of 3.67ms/it for 2560K we can estimate that an 8100 roughly translates to ~15.4 RK3399 (~17.2 if 10% slowdown is present). That's a lot of compounded estimates so a big pinch of salt is required.
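
The arithmetic above can be reproduced with a short Python sketch. It assumes the two clusters run as independent jobs whose rates (iterations per second) simply add, and it models the possible 10% slowdown as a flat 0.9 factor on the combined rate. The function and constant names are made up for illustration; only the timings come from this thread.

```python
def combined_ms_per_iter(job_ms_per_iter):
    """Effective ms/iter of several independent jobs: add the rates, then invert."""
    total_rate = sum(1000.0 / ms for ms in job_ms_per_iter)  # iterations/sec
    return 1000.0 / total_rate


def boards_to_match(reference_ms, board_ms, slowdown=0.0):
    """Number of boards whose summed throughput equals one reference CPU.

    slowdown is the assumed fractional throughput loss per board (0.10 = 10%).
    """
    return board_ms / (reference_ms * (1.0 - slowdown))


# Timings quoted in this thread, all at 2560K FFT.
A72_PAIR_MS = 87.7      # 2x A72 (Odroid N1 test run)
A53_QUAD_MS = 160.0     # 4x A53
RYZEN_2200G_MS = 5.07   # mprime, 4 cores, 1 worker
I3_8100_MS = 3.67       # mersenne.ca benchmark

rk3399_ms = combined_ms_per_iter([A72_PAIR_MS, A53_QUAD_MS])

if __name__ == "__main__":
    print(f"RK3399 effective: {rk3399_ms:.2f} ms/iter")
    for name, ref_ms in (("2200G", RYZEN_2200G_MS), ("i3 8100", I3_8100_MS)):
        ideal = boards_to_match(ref_ms, rk3399_ms)
        slowed = boards_to_match(ref_ms, rk3399_ms, slowdown=0.10)
        print(f"{name}: ~{ideal:.1f} boards (~{slowed:.1f} with 10% slowdown)")
```

Adding rates rather than averaging the ms/iter figures is the key step: two jobs at 87.7 and 160 ms/iter together finish about 17.65 iterations per second, which is where the ~56.65 ms/iter figure comes from.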


Quote:
Originally Posted by ewmayer
...
Looking down the road to 2020, the best hope for an x86-competitive ARM implementation may be in Apple's PC roadmap. More frustrating waiting!

Oh dear.

2018-12-30, 19:07   #174
nomead
Raspberry Pi 3B+

Just for completeness' sake, here are the full iteration timings for the Raspberry Pi 3B+ I'm running. Taking care of thermals gave a significant boost over no heatsink, but I got a further 3% by disconnecting the display and stopping X. Just having it run in the background doing "nothing" still apparently consumes a minor portion of CPU time.

Gentoo 64-bit "sakaki", probably version 1.21 of the package since I originally downloaded and installed it in August for other purposes. uname -a says

Linux pi64 4.14.44-v8-4fca48b7612d-bis+ #2 SMP PREEMPT Fri Jun 1 15:55:22 BST 2018 aarch64 GNU/Linux

Stock 1.4 GHz, and four Cortex-A53 cores on the BCM2837B0. No overclocking attempts have been made (or will be made). Running Mlucas_c2simd 17.1 (downloaded precompiled binary)

Code:
17.1
      1024  msec/iter =   47.12  ROE[avg,max] = [0.255116697, 0.375000000]  radices = 256  8 16 16  0  0  0  0  0  0
      1152  msec/iter =   54.31  ROE[avg,max] = [0.223389521, 0.281250000]  radices = 144 16 16 16  0  0  0  0  0  0
      1280  msec/iter =   58.89  ROE[avg,max] = [0.264171231, 0.375000000]  radices = 160 16 16 16  0  0  0  0  0  0
      1408  msec/iter =   68.59  ROE[avg,max] = [0.228616585, 0.312500000]  radices = 176 16 16 16  0  0  0  0  0  0
      1536  msec/iter =   75.89  ROE[avg,max] = [0.252626651, 0.343750000]  radices = 192 16 16 16  0  0  0  0  0  0
      1664  msec/iter =   83.16  ROE[avg,max] = [0.272233409, 0.406250000]  radices = 208 16 16 16  0  0  0  0  0  0
      1792  msec/iter =   90.47  ROE[avg,max] = [0.222731285, 0.312500000]  radices = 224 16 16 16  0  0  0  0  0  0
      1920  msec/iter =   98.88  ROE[avg,max] = [0.255165462, 0.375000000]  radices = 240 16 16 16  0  0  0  0  0  0
      2048  msec/iter =  105.43  ROE[avg,max] = [0.238688298, 0.312500000]  radices = 256 16 16 16  0  0  0  0  0  0
      2304  msec/iter =  123.05  ROE[avg,max] = [0.249503539, 0.312500000]  radices = 288 16 16 16  0  0  0  0  0  0
      2560  msec/iter =  143.92  ROE[avg,max] = [0.233106476, 0.312500000]  radices = 160 32 16 16  0  0  0  0  0  0
      2816  msec/iter =  165.29  ROE[avg,max] = [0.260105912, 0.375000000]  radices = 176 32 16 16  0  0  0  0  0  0
      3072  msec/iter =  181.54  ROE[avg,max] = [0.261096695, 0.375000000]  radices = 192 32 16 16  0  0  0  0  0  0
      3328  msec/iter =  198.91  ROE[avg,max] = [0.282578930, 0.375000000]  radices = 208 32 16 16  0  0  0  0  0  0
      3584  msec/iter =  215.88  ROE[avg,max] = [0.251145062, 0.375000000]  radices = 224 32 16 16  0  0  0  0  0  0
      3840  msec/iter =  235.33  ROE[avg,max] = [0.246073929, 0.343750000]  radices = 240 32 16 16  0  0  0  0  0  0
      4096  msec/iter =  253.36  ROE[avg,max] = [0.226999763, 0.281250000]  radices = 256 32 16 16  0  0  0  0  0  0
      4608  msec/iter =  294.28  ROE[avg,max] = [0.249245933, 0.375000000]  radices = 288 32 16 16  0  0  0  0  0  0
      5120  msec/iter =  330.03  ROE[avg,max] = [0.236507015, 0.312500000]  radices = 160 32 32 16  0  0  0  0  0  0
      5632  msec/iter =  376.08  ROE[avg,max] = [0.259536082, 0.343750000]  radices = 176 32 32 16  0  0  0  0  0  0
      6144  msec/iter =  417.59  ROE[avg,max] = [0.245978727, 0.343750000]  radices = 192 32 32 16  0  0  0  0  0  0
      6656  msec/iter =  456.34  ROE[avg,max] = [0.266108247, 0.375000000]  radices = 208 32 32 16  0  0  0  0  0  0
      7168  msec/iter =  498.72  ROE[avg,max] = [0.225733680, 0.312500000]  radices = 224 32 32 16  0  0  0  0  0  0
      7680  msec/iter =  545.44  ROE[avg,max] = [0.236645483, 0.312500000]  radices = 240 32 32 16  0  0  0  0  0  0
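
For anyone who wants to post-process timings like these, here is a minimal Python sketch that parses lines in the format shown above. The regular expression is inferred from this listing only and is not taken from the Mlucas source. The check that the nonzero radices multiply to half the FFT length (in doubles, with the first column read as units of 1024) holds for the rows in this table; treating it as a general rule is an assumption.

```python
import re
from math import prod

# Three rows copied from the listing above.
SAMPLE = """\
      1024  msec/iter =   47.12  ROE[avg,max] = [0.255116697, 0.375000000]  radices = 256  8 16 16  0  0  0  0  0  0
      2560  msec/iter =  143.92  ROE[avg,max] = [0.233106476, 0.312500000]  radices = 160 32 16 16  0  0  0  0  0  0
      7680  msec/iter =  545.44  ROE[avg,max] = [0.236645483, 0.312500000]  radices = 240 32 32 16  0  0  0  0  0  0
"""

LINE_RE = re.compile(
    r"^\s*(\d+)\s+msec/iter\s*=\s*([\d.]+)"
    r"\s+ROE\[avg,max\]\s*=\s*\[([\d.]+),\s*([\d.]+)\]"
    r"\s+radices\s*=\s*([\d\s]+)$"
)


def parse_line(line):
    """Return a dict for one timing line, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "fft_k": int(m.group(1)),            # FFT length in units of 1024 doubles
        "ms_per_iter": float(m.group(2)),
        "roe_avg": float(m.group(3)),
        "roe_max": float(m.group(4)),
        # Trailing zeros are padding, so keep only the nonzero radices.
        "radices": [int(r) for r in m.group(5).split() if int(r) != 0],
    }


rows = [r for r in (parse_line(l) for l in SAMPLE.splitlines()) if r is not None]

if __name__ == "__main__":
    for row in rows:
        half_length_ok = prod(row["radices"]) == row["fft_k"] * 1024 // 2
        per_k = row["ms_per_iter"] / row["fft_k"]
        print(f'{row["fft_k"]:5d}K  {row["ms_per_iter"]:7.2f} ms/iter  '
              f'{per_k:.4f} ms/iter per K  radix product = N/2: {half_length_ok}')
```

The per-K column makes the scaling visible: the cost per unit of FFT length rises from the 1024K row to the 7680K row, so the timings grow faster than linearly with FFT length on this board.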

2019-01-13, 20:52   #175
thorken
Jan 2019
4310 Posts
take a look at this post

https://www.mersenneforum.org/showthread.php?t=23994

2019-01-14, 20:17   #176
ewmayer

Quote:
Originally Posted by thorken

If you're not willing to post source, you're going to continue being greeted with skepticism. All the major current clients used by GIMPSers allow users to inspect the source and to build directly from it, should they so desire.

On the other hand, were you to do a build of Mlucas and post comparative timings vs your app on your Android hardware, that could be useful.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.