mersenneforum.org ARM builds and SIMD-assembler prospects

 2018-07-20, 13:50 #166 ET_ Banned     "Luigi" Aug 2002 Team Italia 29×167 Posts Any hints about adding the PRP option for small Mersenne exponents? It would be ideal for our Pis 😀
 2018-07-21, 20:41 #167 ewmayer ∂²ω=0     Sep 2002 República de California 2²×5×587 Posts @Luigi: Work continues on PRP support ... but slowly. I am moving at end of August, so Mlucas work necessarily has been taking a back seat to all the work involved with that.
2018-07-22, 11:35   #168
ET_
Banned

"Luigi"
Aug 2002
Team Italia

29·167 Posts

Quote:
 Originally Posted by ewmayer @Luigi: Work continues on PRP support ... but slowly. I am moving at end of August, so Mlucas work necessarily has been taking a back seat to all the work involved with that.
I didn't remember that, and I didn't mean to sound snappy; it wasn't my intention.

Hope everything goes smoothly.

2018-07-24, 20:28   #169
ewmayer
∂²ω=0

Sep 2002
República de California

2²×5×587 Posts

Quote:
 Originally Posted by ET_ I didn't remember that, and I didn't mean to sound snappy; it wasn't my intention.
You didn't remember that because I never previously mentioned it around here. :)

Thanks for the good wishes for a smooth move!

2018-12-29, 18:35   #170
M344587487

"Composite as Heck"
Oct 2017

901₁₀ Posts

Quote:
 Originally Posted by ewmayer Glad you got at least a nice chunk of the "missing FLOPS" back. I'm looking forward to buying an Odroid N1 once they go on sale, hopefully within a month. One of the beta testers ran Mlucas benchmarks, and using all 6 cores (one 2-threaded job running on the 'big' 2-core A72 CPU, another 4-threaded one on the 'little' 4-core A53 CPU) he gets 2.2-2.3x the total throughput of an Odroid C2, which means ~3.5x the total throughput of a Pi3. N1 pricing is estimated at ~$110, i.e. about the same $/FLOP as the C2. We can only hope this sparks a full-blown 'multi-socket war' amongst the various ARM-micro-PC manufacturers. :) Even for the N1 one still needs ~10 of them to match the LL-test throughput of a cutting-edge Intel quad, but things are getting close to the "interestingness" level as far as wide-scale adoption goes.
If "~10 RK3399s equal one Intel quad-core" holds for all RK3399 boards, then the "interestingness" level has improved thanks to prices (though it's still in x86's favour).

10x Neo4 + 10x SD cards = ~£400, PSUs probably ~£30, and no need for a switch, so ~£430 in total. Not too shabby. Is there a stronger SoC than 2xA72+4xA53 that makes more sense in terms of bang for the buck?

 2018-12-29, 19:48 #171 nomead     "Sam Laur" Dec 2018 Turku, Finland 317 Posts I somewhat doubt that 10x factor. On the Odroid forum, someone ran a couple of Mlucas test runs on a test version of the cancelled Odroid N1: at 2560K FFT length the timing was 87.7 msec/iter on the two "big" A72 cores. For comparison, my Ryzen 3 2200G does 5.07 ms/iter at 2560K running mprime with 4 cores and 1 worker (throughput is slightly higher with 4 workers), and aren't Intel chips supposed to be much faster than that? Would someone please make more recent benchmarks available? The benchmark tables on both mersenne.org and mersenne.ca are a bit old now. Anyway, I ordered a Rock960 board. It hasn't arrived yet, but it also has the RK3399 chip on it. It's more expensive than those other boards, but it's just one board for testing and general fooling around (thermals, power consumption, etc.). There are boards with e.g. the Kirin 970 (4x A73, 4x A53), but they are still quite expensive. Maybe they'll get cheaper after a while too. Or maybe something else will.
 2018-12-29, 22:24 #172 ewmayer ∂²ω=0     Sep 2002 República de California 2²×5×587 Posts Re. the Odroid-N1 tests, I had a beta tester of one of those do an Mlucas build and timings on it. Long story short, we got the best total throughput running one 2-thread job on the dual-A72 cluster and one 4-thread job on the quad-A53 cluster, and that total throughput was ~2.2x what I get on my quad-A53 Odroid C2. No idea whether the N2 will be appreciably better FLOPS-wise than the N1, and no word (AFAIK) on when the much-delayed N2 will finally be available; hopefully early in 2019. Looking down the road to 2020, the best hope for an x86-competitive ARM implementation may be in Apple's PC roadmap. More frustrating waiting!
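The best-throughput setup described above is "one job per cluster, with as many threads as that cluster has cores". A minimal Python sketch of that planning step, assuming a Linux kernel that exposes cpufreq in sysfs and assuming the big cores report a higher cpuinfo_max_freq than the little ones (true for the RK3399's A72/A53 split, not guaranteed on every big.LITTLE part). The function names are hypothetical; the sketch only computes the plan and does not launch or pin any jobs.

```python
from pathlib import Path


def read_max_freqs(sysfs="/sys/devices/system/cpu"):
    """Map CPU index -> cpuinfo_max_freq (kHz), read from Linux cpufreq sysfs.

    CPUs without a cpufreq entry are skipped.
    """
    freqs = {}
    for cpu_dir in Path(sysfs).glob("cpu[0-9]*"):
        max_freq = cpu_dir / "cpufreq" / "cpuinfo_max_freq"
        if max_freq.exists():
            freqs[int(cpu_dir.name[3:])] = int(max_freq.read_text())
    return freqs


def plan_jobs(freqs):
    """Group CPUs into clusters by max frequency and plan one job per cluster.

    Returns [(cpu_list, thread_count), ...], fastest cluster first.
    Assumption: all cores of a cluster share one max frequency and different
    clusters have different ones.
    """
    clusters = {}
    for cpu, khz in freqs.items():
        clusters.setdefault(khz, []).append(cpu)
    return [(sorted(cpus), len(cpus))
            for khz, cpus in sorted(clusters.items(), reverse=True)]


if __name__ == "__main__":
    for cpus, threads in plan_jobs(read_max_freqs()):
        print(f"job on CPUs {cpus} with {threads} threads")
```

Each (cpu_list, thread_count) pair could then be pinned with e.g. `taskset -c` or Python's `os.sched_setaffinity`; the client's own thread-count option isn't named in the thread, so it is left out here.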
2018-12-29, 23:18   #173
M344587487

"Composite as Heck"
Oct 2017

17·53 Posts

Quote:
 Originally Posted by nomead I somewhat doubt that 10x factor. On the Odroid forum, someone ran a couple of Mlucas test runs on a test version of the cancelled Odroid N1: at 2560K FFT length the timing was 87.7 msec/iter on the two "big" A72 cores. For comparison, my Ryzen 3 2200G does 5.07 ms/iter at 2560K running mprime with 4 cores and 1 worker (throughput is slightly higher with 4 workers), and aren't Intel chips supposed to be much faster than that? Would someone please make more recent benchmarks available? The benchmark tables on both mersenne.org and mersenne.ca are a bit old now. ...

Taking your 87.7 ms/it for 2xA72 and my 160 msec/it at 2560K for 4xA53 from the previous page, and adding the two clusters' iteration rates (1000/87.7 + 1000/160), we get ~17.65 it/sec, effectively ~56.65 ms/it for an RK3399. Ewmayer's post on the Odroid forum suggests there may be a 10% slowdown when running both clusters simultaneously. So ~11.2 RK3399s may be equivalent to a 2200G (~12.4 if the 10% slowdown is present). From the mersenne.ca benchmark of an i3-8100 at 3.67 ms/it for 2560K, we can estimate that an 8100 roughly translates to ~15.4 RK3399s (~17.2 if the 10% slowdown is present). That's a lot of compounded estimates, so a big pinch of salt is required.
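The estimate above can be reproduced with a small Python sketch. It rests on the same assumptions as the post: independent jobs on the two clusters simply add their iteration rates (iterations/sec = 1000 / ms-per-iter), and the possible 10% slowdown is modelled as a 0.9 factor on board throughput. The function names are illustrative, not taken from any GIMPS tool.

```python
def combined_ms_per_iter(*ms_per_iter):
    """Effective ms/iter of several independent jobs run side by side.

    Assumes the jobs don't slow each other down: their iteration rates
    (iterations per second = 1000 / ms_per_iter) simply add.
    """
    rate = sum(1000.0 / t for t in ms_per_iter)  # total iterations per second
    return 1000.0 / rate


def boards_needed(board_ms, reference_ms, efficiency=1.0):
    """Number of boards whose combined throughput matches one reference machine.

    efficiency < 1.0 models a throughput loss on the board (e.g. 0.9 for the
    possible 10% slowdown when both clusters are busy).
    """
    return board_ms / (reference_ms * efficiency)


rk3399 = combined_ms_per_iter(87.7, 160.0)  # 2x A72 job + 4x A53 job at 2560K
print(f"RK3399 effective: {rk3399:.2f} ms/iter")
for name, ref_ms in [("Ryzen 3 2200G", 5.07), ("i3-8100", 3.67)]:
    print(f"{name}: ~{boards_needed(rk3399, ref_ms):.1f} boards "
          f"(~{boards_needed(rk3399, ref_ms, 0.9):.1f} with 10% slowdown)")
```

Run as-is, it reproduces the figures in the post: ~56.65 ms/it per RK3399, ~11.2 (~12.4) boards per 2200G and ~15.4 (~17.2) per i3-8100.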

Quote:
 Originally Posted by ewmayer ... Looking down the road to 2020, the best hope for an x86-competitive ARM implementation may be in Apple's PC roadmap. More frustrating waiting!
Oh dear.

 2018-12-30, 19:07 #174 nomead     "Sam Laur" Dec 2018 Turku, Finland 317 Posts Raspberry Pi 3B+

Just for completeness' sake, here are the full iteration timings for the Raspberry Pi 3B+ I'm running. Taking care of thermals gave a significant boost over running with no heatsink, and I got a further 3% by disconnecting the display and stopping X: just sitting in the background doing "nothing", it apparently still consumes a minor portion of CPU time. The OS is Gentoo 64-bit "sakaki", probably version 1.21 of the package, since I originally downloaded and installed it in August for other purposes. uname -a says: Linux pi64 4.14.44-v8-4fca48b7612d-bis+ #2 SMP PREEMPT Fri Jun 1 15:55:22 BST 2018 aarch64 GNU/Linux. Stock 1.4 GHz, with four Cortex-A53 cores on the BCM2837B0. No overclocking attempts have been made (or will be made). Running Mlucas_c2simd 17.1 (downloaded precompiled binary):

Code:
17.1
1024 msec/iter = 47.12 ROE[avg,max] = [0.255116697, 0.375000000] radices = 256 8 16 16 0 0 0 0 0 0
1152 msec/iter = 54.31 ROE[avg,max] = [0.223389521, 0.281250000] radices = 144 16 16 16 0 0 0 0 0 0
1280 msec/iter = 58.89 ROE[avg,max] = [0.264171231, 0.375000000] radices = 160 16 16 16 0 0 0 0 0 0
1408 msec/iter = 68.59 ROE[avg,max] = [0.228616585, 0.312500000] radices = 176 16 16 16 0 0 0 0 0 0
1536 msec/iter = 75.89 ROE[avg,max] = [0.252626651, 0.343750000] radices = 192 16 16 16 0 0 0 0 0 0
1664 msec/iter = 83.16 ROE[avg,max] = [0.272233409, 0.406250000] radices = 208 16 16 16 0 0 0 0 0 0
1792 msec/iter = 90.47 ROE[avg,max] = [0.222731285, 0.312500000] radices = 224 16 16 16 0 0 0 0 0 0
1920 msec/iter = 98.88 ROE[avg,max] = [0.255165462, 0.375000000] radices = 240 16 16 16 0 0 0 0 0 0
2048 msec/iter = 105.43 ROE[avg,max] = [0.238688298, 0.312500000] radices = 256 16 16 16 0 0 0 0 0 0
2304 msec/iter = 123.05 ROE[avg,max] = [0.249503539, 0.312500000] radices = 288 16 16 16 0 0 0 0 0 0
2560 msec/iter = 143.92 ROE[avg,max] = [0.233106476, 0.312500000] radices = 160 32 16 16 0 0 0 0 0 0
2816 msec/iter = 165.29 ROE[avg,max] = [0.260105912, 0.375000000] radices = 176 32 16 16 0 0 0 0 0 0
3072 msec/iter = 181.54 ROE[avg,max] = [0.261096695, 0.375000000] radices = 192 32 16 16 0 0 0 0 0 0
3328 msec/iter = 198.91 ROE[avg,max] = [0.282578930, 0.375000000] radices = 208 32 16 16 0 0 0 0 0 0
3584 msec/iter = 215.88 ROE[avg,max] = [0.251145062, 0.375000000] radices = 224 32 16 16 0 0 0 0 0 0
3840 msec/iter = 235.33 ROE[avg,max] = [0.246073929, 0.343750000] radices = 240 32 16 16 0 0 0 0 0 0
4096 msec/iter = 253.36 ROE[avg,max] = [0.226999763, 0.281250000] radices = 256 32 16 16 0 0 0 0 0 0
4608 msec/iter = 294.28 ROE[avg,max] = [0.249245933, 0.375000000] radices = 288 32 16 16 0 0 0 0 0 0
5120 msec/iter = 330.03 ROE[avg,max] = [0.236507015, 0.312500000] radices = 160 32 32 16 0 0 0 0 0 0
5632 msec/iter = 376.08 ROE[avg,max] = [0.259536082, 0.343750000] radices = 176 32 32 16 0 0 0 0 0 0
6144 msec/iter = 417.59 ROE[avg,max] = [0.245978727, 0.343750000] radices = 192 32 32 16 0 0 0 0 0 0
6656 msec/iter = 456.34 ROE[avg,max] = [0.266108247, 0.375000000] radices = 208 32 32 16 0 0 0 0 0 0
7168 msec/iter = 498.72 ROE[avg,max] = [0.225733680, 0.312500000] radices = 224 32 32 16 0 0 0 0 0 0
7680 msec/iter = 545.44 ROE[avg,max] = [0.236645483, 0.312500000] radices = 240 32 32 16 0 0 0 0 0 0
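To compare timings like these against other machines, it helps to pull the rows into structured data. Below is a minimal Python sketch of a parser for the row format shown above (FFT length in K, msec/iter, ROE[avg,max], radices), assuming one row per line as in the listing; the function names are hypothetical. It also checks an invariant that holds for every row of this particular table: the product of the nonzero radices is exactly half the FFT length (K × 1024), presumably because the real-data transform is run as a complex FFT of half that length.

```python
import re
from math import prod

# One timing row, e.g. "2560 msec/iter = 143.92 ROE[avg,max] = [...] radices = ..."
CFG_LINE = re.compile(
    r"(?P<fft>\d+)\s+msec/iter\s*=\s*(?P<ms>[\d.]+)\s+"
    r"ROE\[avg,max\]\s*=\s*\[(?P<roe_avg>[\d.]+),\s*(?P<roe_max>[\d.]+)\]\s+"
    r"radices\s*=\s*(?P<radices>[\d\s]+)"
)


def parse_cfg_line(line):
    """Parse one timing row; return None for anything else (e.g. the '17.1' header)."""
    m = CFG_LINE.search(line)
    if m is None:
        return None
    ms = float(m.group("ms"))
    return {
        "fft_k": int(m.group("fft")),
        "ms_per_iter": ms,
        "iters_per_sec": 1000.0 / ms,
        "roe_avg": float(m.group("roe_avg")),
        "roe_max": float(m.group("roe_max")),
        # Trailing zeros are unused radix slots, so drop them.
        "radices": [r for r in map(int, m.group("radices").split()) if r != 0],
    }


def parse_cfg(text):
    """Parse every timing row in a block of text, skipping non-matching lines."""
    return [row for row in map(parse_cfg_line, text.splitlines()) if row]


def radices_consistent(row):
    """True if the product of the radices is half the FFT length (K * 1024)."""
    return prod(row["radices"]) * 2 == row["fft_k"] * 1024
```

For example, the 2560K row parses to about 6.95 iterations/sec with radices [160, 32, 16, 16], whose product 1310720 is half of 2560 × 1024.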
 2019-01-13, 20:52 #175 thorken   Jan 2019 43 Posts take a look at this post
2019-01-14, 20:17   #176
ewmayer
∂²ω=0

Sep 2002
República de California

10110111011100₂ Posts

Quote:
If you're not willing to post source, you're going to continue being greeted with skepticism. All the major current clients used by GIMPSers allow users to inspect the source and to build directly from it, should they so desire.

On the other hand, were you to do a build of Mlucas and post comparative timings vs. your app on your Android hardware, that could be useful.


All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
