#1 |
∂²ω=0
Sep 2002
República de California
2⁶·181 Posts
So I finally got round to registering for the Odroid forum and posting re. the Mlucas-for-ARMv8 SIMD-code port yesterday. In my post I also loudly yearned for an Odroid update based on the newer/faster Cortex A57; today I got a "Wish granted" reply from user 'rooted' pointing to this thread, posted (not sure if coincidentally) just a few hours after I started mine:
https://forum.odroid.com/viewtopic.php?t=29932
#2 |
"Victor de Hollander"
Aug 2011
the Netherlands
498₁₆ Posts
Mooaarrr power, can't go wrong with that
#3 |
"Mark"
Apr 2003
Between here and the
1100000011100₂ Posts
Where's the PS2 port?
#4 |
"Composite as Heck"
Oct 2017
3·5·7² Posts
That's interesting, I have some open questions
I like that it's 12V instead of 5V: it works better with a PSU as the power source, since 12V is the main rail. I'm augmenting an x86 system with some Pi/Pi clones which use 5V, and it would be pretty cool to be able to power some 12V boards from the same molex connectors.

I tried to find some benchmarks for comparison.

Mediatek-MT8176 (2.1GHz 2-core A72, 4-core A53): https://www.notebookcheck.net/Mediat....187985.0.html
Geekbench 4.1/4.2 64 bit single-core score: 1541
Geekbench 4.1/4.2 64 bit multi-core score: 2489

Mediatek-MT6735 (4-core 1.5GHz A53): https://www.notebookcheck.net/Mediat....147799.0.html
Geekbench 4.1/4.2 64 bit single-core score: 519
Geekbench 4.1/4.2 64 bit multi-core score: 1430

I know it's very rule-of-thumb as it is, but an A72 core having triple the bench score of an A53 core may mean the board can do 2.5x the throughput of a 4-core A53 SoC. The multi-core score backs this up if the A53 cores were idle during its run, or maybe the benchmarks can't be compared in this way and this is all fluff.
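For anyone wanting to redo the rule-of-thumb arithmetic, here is a minimal sketch. It assumes throughput simply adds up across cores (no memory-bandwidth or thermal limits) and that the MT8176 single-core score reflects an A72 core; the helper name `relative_throughput` is made up for illustration.

```python
# Geekbench 4.1/4.2 64-bit scores quoted in the post above.
mt8176 = {"single": 1541, "multi": 2489}  # 2x A72 + 4x A53
mt6735 = {"single": 519, "multi": 1430}   # 4x A53

# Per-core ratio, assuming the MT8176 single-core run landed on an A72 core.
a72_vs_a53 = mt8176["single"] / mt6735["single"]


def relative_throughput(n_a72, n_a53, a72_vs_a53, baseline_a53_cores=4):
    """Throughput relative to a 4-core A53 SoC, assuming perfect additivity.

    Hypothetical helper: each A72 counts as `a72_vs_a53` A53 cores.
    """
    return (n_a72 * a72_vs_a53 + n_a53) / baseline_a53_cores


all_six_cores = relative_throughput(2, 4, a72_vs_a53)  # both clusters busy
a72_cluster_only = relative_throughput(2, 0, a72_vs_a53)  # A53 cores idle
measured_multi = mt8176["multi"] / mt6735["multi"]  # ratio of multi-core scores

print(f"A72 vs A53 per core:        {a72_vs_a53:.2f}")
print(f"2x A72 + 4x A53 vs 4x A53:  {all_six_cores:.2f}")
print(f"2x A72 only vs 4x A53:      {a72_cluster_only:.2f}")
print(f"Measured multi-core ratio:  {measured_multi:.2f}")
```

Under these assumptions the all-cores estimate comes out near 2.5x, while the measured multi-core ratio sits between the A72-only and all-cores estimates.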
#5 | |
Jan 2008
France
542₁₀ Posts
Quote:
#6 | |
"Composite as Heck"
Oct 2017
3·5·7² Posts
Quote:
#7 |
"Victor de Hollander"
Aug 2011
the Netherlands
2³·3·7² Posts
According to ARM, the A72 core should perform about 26% better than the A57 in floating point (same frequency, process and memory subsystem).
However, looking at the bottom of this: https://www.anandtech.com/show/11088...ce-and-power/2 it looks like the A57 and A72 are much closer in performance/MHz (A72 maybe 10% better than A57), with an A53 core at slightly less than half of an A57 core.
#8 | |
∂²ω=0
Sep 2002
República de California
2D40₁₆ Posts
Quote:
Anyhow, the quickest way to find out is to get hold of one of the dev-boards they're gifting a small subset of Odroiders. I asked to be included in the list of potential grantees, so we can hope. In the meantime, though, if someone has access to one of the currently-available (and pricier) big.LITTLE dual-cortex-CPUs-of-different-flavors dev-boards, they could play with this aspect.
#9 |
Sep 2003
3²·7·41 Posts
Looks like the compiler flags to use with gcc for this announced product would be -march=armv8-a -mtune=cortex-a72.cortex-a53
#10 |
∂²ω=0
Sep 2002
República de California
2⁶×181 Posts
Note that in my C2 SIMD builds I found a slight negative timing impact from using the a53 arch-flags, so I eschew them. YMMV, but AFAICT the only good reason to invoke such flags is if your platform requires them, which is sometimes not easy to tell. E.g. I had one builder whose build segfaulted at runtime sans the arch-flags for his CPU; said issue was cured by invoking them on rebuild.
#11 |
∂²ω=0
Sep 2002
República de California
2⁶×181 Posts
Thanks to a well-placed Odroider who was one of the selected recipients of a pre-release N1 system and was kind enough to try out my code on it, we have N1 timings to mull over. A couple of notes:

1. His Debian build bonked with segfaults, likely a similar miscompilation issue to the one TomW hit (but I haven't bothered to do the deeper digging needed to precisely localize the cause). But my C2 build (under the standard Ubuntu distro Hardkernel ships with that unit) worked for him. Since that same build appears to run in drop-in mode on a surprising variety of ARMv8 platforms (including Raspberry Pi 3), I've posted it to the Mlucas ftp site and added a corresponding link/verbiage to the Mlucas readme page.

2. I'm still waiting for more data re. running code on both sockets (i.e. 6 total cores/threads), but preliminarily it looks like running separate jobs on the A72 and the A53 is best, as I surmised would be the case.

A72, 2-cores - I've snipped the ROE stats column from both mlucas.cfg files' data for the sake of readability:

Code:

    1024  msec/iter =  43.98  radices = 256  8 16 16  0
    1152  msec/iter =  47.97  radices = 144 16 16 16  0
    1280  msec/iter =  51.81  radices = 160 16 16 16  0
    1408  msec/iter =  60.31  radices = 176 16 16 16  0
    1536  msec/iter =  65.26  radices = 192 16 16 16  0
    1664  msec/iter =  71.93  radices = 208 16 16 16  0
    1792  msec/iter =  78.33  radices = 224 16 16 16  0
    1920  msec/iter =  85.62  radices = 240 16 16 16  0
    2048  msec/iter =  91.51  radices = 256 16 16 16  0
    2304  msec/iter = 108.23  radices = 288 16 16 16  0
    2560  msec/iter = 121.80  radices = 160 32 16 16  0
    2816  msec/iter = 140.07  radices = 176 32 16 16  0
    3072  msec/iter = 149.53  radices = 192 32 16 16  0
    3328  msec/iter = 165.62  radices = 208 32 16 16  0
    3584  msec/iter = 180.50  radices = 224 32 16 16  0
    3840  msec/iter = 195.86  radices = 240 32 16 16  0
    4096  msec/iter = 212.20  radices = 256 32 16 16  0
    4608  msec/iter = 249.22  radices = 288 32 16 16  0
    5120  msec/iter = 278.39  radices = 160 32 32 16  0
    5632  msec/iter = 316.39  radices = 176 32 32 16  0
    6144  msec/iter = 339.48  radices = 192 32 32 16  0
    6656  msec/iter = 376.19  radices = 208 32 32 16  0
    7168  msec/iter = 407.32  radices = 224 32 32 16  0
    7680  msec/iter = 446.24  radices = 240 32 32 16  0

Code:

                                                Speedup vs A53x4:
    1024  msec/iter =  31.18  radices =  64  8  8  8 16   1.41
    1152  msec/iter =  35.09  radices = 288  8 16 16  0   1.37
    1280  msec/iter =  41.47  radices = 160 16 16 16  0   1.25
    1408  msec/iter =  48.19  radices = 176 16 16 16  0   1.25
    1536  msec/iter =  51.76  radices =  48 32 32 16  0   1.26
    1664  msec/iter =  57.13  radices = 208 16 16 16  0   1.26
    1792  msec/iter =  60.23  radices = 224 16 16 16  0   1.30
    1920  msec/iter =  65.86  radices = 240 16 16 16  0   1.30
    2048  msec/iter =  66.59  radices = 128 16 16 32  0   1.37
    2304  msec/iter =  75.49  radices = 144 16 16 32  0   1.43
    2560  msec/iter =  80.48  radices = 160  8  8  8 16   1.51
    2816  msec/iter =  94.42  radices = 176  8  8  8 16   1.48
    3072  msec/iter = 102.73  radices = 192  8  8  8 16   1.46
    3328  msec/iter = 110.71  radices = 208  8  8  8 16   1.50
    3584  msec/iter = 115.94  radices = 224  8  8  8 16   1.56
    3840  msec/iter = 125.06  radices = 240  8  8  8 16   1.57
    4096  msec/iter = 134.47  radices = 256  8  8  8 16   1.58
    4608  msec/iter = 150.90  radices = 288  8  8  8 16   1.66
    5120  msec/iter = 181.31  radices = 160  8  8 16 16   1.54
    5632  msec/iter = 210.01  radices = 176  8  8 16 16   1.51
    6144  msec/iter = 227.63  radices = 192  8  8 16 16   1.49
    6656  msec/iter = 248.11  radices = 208  8  8 16 16   1.52
    7168  msec/iter = 261.58  radices = 224  8  8 16 16   1.56
    7680  msec/iter = 284.01  radices = 240  8  8 16 16   1.57

Last fiddled with by ewmayer on 2018-03-03 at 02:13
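For anyone who wants to recompute the speedup column from two sets of mlucas.cfg-style timings, here is a minimal sketch. It assumes the first column is the FFT length in K and that the posted speedup is the first block's msec/iter divided by the second block's (the posted figures are consistent with that reading). The function names and the three-row samples are illustrative only, not part of Mlucas.

```python
import re

# A few rows copied from the two timing blocks in the post above.
BLOCK_1 = """\
1024  msec/iter =  43.98  radices = 256  8 16 16  0
2048  msec/iter =  91.51  radices = 256 16 16 16  0
4096  msec/iter = 212.20  radices = 256 32 16 16  0
"""
BLOCK_2 = """\
1024  msec/iter =  31.18  radices =  64  8  8  8 16
2048  msec/iter =  66.59  radices = 128 16 16 32  0
4096  msec/iter = 134.47  radices = 256  8  8  8 16
"""

# Leading integer (assumed FFT length in K) followed by the msec/iter value.
ROW = re.compile(r"^\s*(\d+)\s+msec/iter\s*=\s*([\d.]+)")


def parse_timings(text):
    """Map FFT length -> msec/iter for every line matching ROW."""
    timings = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            timings[int(m.group(1))] = float(m.group(2))
    return timings


def speedups(slower, faster):
    """Ratio slower/faster for each FFT length present in both sets."""
    return {k: slower[k] / faster[k] for k in sorted(slower.keys() & faster.keys())}


t1 = parse_timings(BLOCK_1)
t2 = parse_timings(BLOCK_2)
ratios = speedups(t1, t2)

for fft_k, r in ratios.items():
    print(f"{fft_k:5d}K  {t1[fft_k]:7.2f} -> {t2[fft_k]:7.2f} msec/iter  speedup {r:.2f}")
```

Pointing `parse_timings` at the full contents of two mlucas.cfg files should work the same way, since lines that don't match the pattern are skipped.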