mersenneforum.org  

Old 2019-07-05, 05:44   #100
ldesnogu
 
Jan 2008
France

3×179 Posts

Quote:
Originally Posted by nomead View Post
In the same directory as the temp file, see the type file. It should tell what the corresponding temperature reading is for.
Very useful, thanks!
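
For anyone wanting to script the temp/type pairing described above, here is a minimal sketch. It assumes the usual Linux sysfs layout (`/sys/class/thermal/thermal_zone*/`, with `temp` reported in millidegrees Celsius); the quoted post does not name the directory, so treat that path as an assumption. The function name `read_thermal_zones` is made up for illustration.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {"<zone>:<type label>": degrees Celsius} for each zone under base.

    Assumes each thermal_zone* directory holds a `temp` file (millidegrees C)
    next to a `type` file naming the sensor, as on typical Linux systems.
    """
    root = Path(base)
    if not root.is_dir():
        return {}
    readings = {}
    for zone in sorted(root.glob("thermal_zone*")):
        try:
            label = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            # Missing or unreadable sensor files are skipped, not fatal.
            continue
        # Prefix with the zone name because several zones can share a label.
        readings[f"{zone.name}:{label}"] = millideg / 1000.0
    return readings
```

Calling `read_thermal_zones()` with no argument reads the live system; passing another directory is mainly useful for testing.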

Quote:
Originally Posted by ewmayer View Post
Laurent, thanks for the data. Based on your timings, it seems the OS is doing a decent job of load-balancing even in the affinity-fail cases, though some of the large run-to-run timing variability you observe may be due to that.
That's my hypothesis too. I would add that, given that they are close to the end of the run, it's possible threads are being moved to less powerful cores to reduce temperature; the other possible explanation is a hardware frequency throttler that prevents the chip from getting damaged.

Quote:
Oh, would you be so kind as to post a copy of your /proc/cpuinfo file? Thanks.
Here you go:
Code:
processor    : 0
BogoMIPS    : 38.40
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp
CPU implementer    : 0x51
CPU architecture: 8
CPU variant    : 0x7
CPU part    : 0x803
CPU revision    : 12

processor    : 1
BogoMIPS    : 38.40
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp
CPU implementer    : 0x51
CPU architecture: 8
CPU variant    : 0x7
CPU part    : 0x803
CPU revision    : 12

processor    : 2
BogoMIPS    : 38.40
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp
CPU implementer    : 0x51
CPU architecture: 8
CPU variant    : 0x7
CPU part    : 0x803
CPU revision    : 12

processor    : 3
BogoMIPS    : 38.40
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp
CPU implementer    : 0x51
CPU architecture: 8
CPU variant    : 0x7
CPU part    : 0x803
CPU revision    : 12

processor    : 4
BogoMIPS    : 38.40
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp
CPU implementer    : 0x51
CPU architecture: 8
CPU variant    : 0x6
CPU part    : 0x802
CPU revision    : 13

processor    : 5
BogoMIPS    : 38.40
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp
CPU implementer    : 0x51
CPU architecture: 8
CPU variant    : 0x6
CPU part    : 0x802
CPU revision    : 13

processor    : 6
BogoMIPS    : 38.40
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp
CPU implementer    : 0x51
CPU architecture: 8
CPU variant    : 0x6
CPU part    : 0x802
CPU revision    : 13

processor    : 7
BogoMIPS    : 38.40
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp
CPU implementer    : 0x51
CPU architecture: 8
CPU variant    : 0x6
CPU part    : 0x802
CPU revision    : 13
CPUs 0-3 are the A53 derivatives and CPUs 4-7 are the A75 derivatives.
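
The split between the two clusters can be read off mechanically from the `CPU part` field. The sketch below groups processor numbers by that field; `group_cores_by_part` and the abbreviated `SAMPLE` text are illustrative names, and the sample keeps only two of the eight entries from the listing above.

```python
from collections import defaultdict


def group_cores_by_part(cpuinfo_text):
    """Map each 'CPU part' value to the list of processor numbers reporting it."""
    clusters = defaultdict(list)
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor entries
        key, value = key.strip(), value.strip()
        if key == "processor":
            current = int(value)
        elif key == "CPU part" and current is not None:
            clusters[value].append(current)
    return dict(clusters)


# Abbreviated sample in the same format as the listing (processors 3 and 4 only).
SAMPLE = """\
processor    : 3
CPU implementer    : 0x51
CPU part    : 0x803

processor    : 4
CPU implementer    : 0x51
CPU part    : 0x802
"""

print(group_cores_by_part(SAMPLE))
```

On the real machine one would pass the contents of `/proc/cpuinfo` instead of `SAMPLE`. The resulting core lists are the kind of thing one would use when deciding which cores to pin threads to.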
Old 2019-10-07, 22:36   #101
paulunderwood
 
Sep 2002
Database er0rr

5·701 Posts

121 days on the a73 (N2):

Code:
M91410271 is not prime. Res64: 415636F42C8F81__. Program: E18.0. Final residue shift count = 48867609
I am doing another WR LL test followed by a DC test.

Last fiddled with by paulunderwood on 2019-10-07 at 22:36
Old 2019-10-08, 03:36   #102
ewmayer
2ω=0
 
Sep 2002
República de California

3·7·19·29 Posts

Thanks for the cycles! That's about what I get @5120K (~120 ms/iter) on the 4-core Snapdragon CPU of my Galaxy S7s - but I need to put them into good airflow (USB fan) to get the timings down to that. Are you putting your N2 into airflow or just passively using the big attached heatsink?
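
As a rough cross-check of the "about what I get" remark, the 121-day figure can be turned into an average per-iteration time. This sketch assumes the run was a Lucas-Lehmer test (p - 2 squarings for exponent p) and that the 121 days were continuous wall-clock time; neither is stated outright in the posts, and the function name is made up for illustration.

```python
def ll_ms_per_iteration(exponent, days):
    """Average milliseconds per iteration for an LL test of M(exponent)."""
    iterations = exponent - 2  # an LL test of M_p performs p - 2 squarings
    return days * 86400 * 1000.0 / iterations


# Exponent and duration taken from the result posted above.
print(round(ll_ms_per_iteration(91410271, 121), 1))  # -> 114.4
```

That lands in the same neighbourhood as the ~120 ms/iter quoted for the Snapdragon, bearing in mind that "121 days" is itself a rounded figure.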
Old 2019-10-08, 03:46   #103
paulunderwood
 
Sep 2002
Database er0rr

DB1₁₆ Posts

Quote:
Originally Posted by ewmayer View Post
Thanks for the cycles! That's about what I get @5120K (~120 ms/iter) on the 4-core Snapdragon CPU of my Galaxy S7s - but I need to put them into good airflow (USB fan) to get the timings down to that. Are you putting your N2 into airflow or just passively using the big attached heatsink?
I have an 80 mm USB fan directly underneath the heatsink base. It adds significantly to the wattage of the system. This N2 is my desktop machine. During the forthcoming summer months I plan to shut down all x64 machines and just run the N2 and RPi 3B+ -- any extra heat here will not be welcome. During the depths of winter I will run some 140 x64 cores to keep me warm.
Old 2020-03-11, 05:43   #104
paulunderwood
 
Sep 2002
Database er0rr

110110110001₂ Posts

I am impressed by how fast Mlucas is on the A73 cores of my Odroid N2 compared to 8 cores of a single-chip AMD 6176 (12-core Opteron at 2.3 GHz) running mprime:
Code:
mprime on 8 cores of AMD Opteron: ms/iter: 47.678
mlucas on 4 cores of ARM a73: 119.6611 msec/iter
(I could probably get more throughput on the Opteron, but I am running down tasks for it for the upcoming summer months.)
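
To put the two timings on a per-core footing, here is a small back-of-the-envelope sketch. It multiplies each ms/iter figure by the core count, which assumes perfect scaling across cores and ignores that the two programs and FFT lengths differ, so it is only a rough guide. The helper name `core_ms_per_iter` is made up for illustration.

```python
def core_ms_per_iter(ms_per_iter, cores):
    """Core-milliseconds spent per iteration, assuming perfect multi-core scaling."""
    return ms_per_iter * cores


opteron = core_ms_per_iter(47.678, 8)   # mprime on 8 Opteron cores
a73 = core_ms_per_iter(119.6611, 4)     # Mlucas on 4 A73 cores

wall_clock_ratio = 119.6611 / 47.678    # how much longer the N2 takes per iteration
per_core_ratio = a73 / opteron          # same comparison, normalised per core

print(f"Opteron: {opteron:.1f} core-ms/iter, A73: {a73:.1f} core-ms/iter")
print(f"wall-clock ratio {wall_clock_ratio:.2f}, per-core ratio {per_core_ratio:.2f}")
```

Under those assumptions the N2 is about 2.5 times slower in wall-clock terms but only about 1.25 times slower per core.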

Last fiddled with by paulunderwood on 2020-03-11 at 05:47
Old 2020-03-11, 19:27   #105
ewmayer
2ω=0
 
Sep 2002
República de California

3×7×19×29 Posts

Quote:
Originally Posted by paulunderwood View Post
I am impressed by how fast Mlucas is on the A73 cores of my Odroid N2 compared to 8 cores of a single-chip AMD 6176 (12-core Opteron at 2.3 GHz) running mprime:
Code:
mprime on 8 cores of AMD Opteron: ms/iter: 47.678
mlucas on 4 cores of ARM a73: 119.6611 msec/iter
(I could probably get more throughput on the Opteron, but I am running down tasks for it for the upcoming summer months.)
What FFT length is that at? That's actually a pretty decent apples-to-apples comparison, given the 128-bitness of both those CPUs' SIMD units. Also note that w.r.t. ARM, much depends on what kinds of hardware resources support that SIMD - I've never gotten a definitive answer from anyone on this, but based on scalar-FFT vs 128-bit SIMD build timings on ARM, where the former runs ~2/3 as fast as the latter (compared to ~1/3 for Mlucas built both ways on my Core2Duo), I'm reasonably certain that on most ARM implementations, the SIMD uses the same underlying functional units (adders, multipliers, etc.) as the non-SIMD. The SIMD thus gets its speedup not from more transistors dedicated to it as is the case on x86, but from the combination of a leaner instruction stream (fewer instructions needed to process the same data volume) and better register usage.

My surmise makes sense also in the context of ARM aiming for the performance-per-Watt-hour market segment - you want as little silicon as possible underpinning the instruction set, so non-SIMD and SIMD instructions sharing the same hardware makes sense. Said sharing is facilitated by not having a crufty legacy non-IEEE register data format such as x86 has with its 80-bit register-double-floats.

What's really needed on the ARM front is wider SIMD, cheap manycore, and perhaps some higher-end implementations which do provide dedicated silicon to feed that SIMD.
Old 2020-03-11, 20:10   #106
paulunderwood
 
Sep 2002
Database er0rr

5·701 Posts

Quote:
Originally Posted by ewmayer View Post
What FFT length is that at?
AMD: [Work thread Mar 11 14:08] Resuming Gerbicz error-checking PRP test of M103xxxxxx using AMD K10 type-2 FFT length 5600K, Pass1=896, Pass2=6400, clm=4, 8 threads

ARM: M103xxxxxx: using FFT length 5632K = 5767168 8-byte floats, initial residue shift count = 61642481
this gives an average 18.027884223244406 bits per digit
The test will be done in form of a 3-PRP test.
Using complex FFT radices 352 16 16 32
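
The numbers in the Mlucas log line are related by simple arithmetic: an FFT length of N "K" means N × 1024 doubles, and the reported bits per digit is the exponent divided by that count. A minimal sketch follows (function names are illustrative). The exponent in the log is masked, so the bits-per-digit example instead reuses the exponent from the LL result earlier in the thread paired with the 5120K length ewmayer mentions; that pairing is hypothetical.

```python
def fft_length_in_doubles(length_k):
    """Number of 8-byte floats in an FFT whose length is given in K (1K = 1024)."""
    return length_k * 1024


def bits_per_digit(exponent, length_k):
    """Average number of bits of the Mersenne number stored per FFT element."""
    return exponent / fft_length_in_doubles(length_k)


# Matches the "5632K = 5767168 8-byte floats" figure in the log.
print(fft_length_in_doubles(5632))

# Hypothetical pairing: M91410271 at 5120K (not confirmed by the thread).
print(f"{bits_per_digit(91410271, 5120):.3f}")
```

The same helper gives 5734400 doubles for the 5600K length mprime picked on the Opteron, so the two runs use slightly different transform sizes.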
Old 2020-03-11, 20:26   #107
ldesnogu
 
Jan 2008
France

3×179 Posts

Quote:
Originally Posted by ewmayer View Post
What FFT length is that at? That's actually a pretty decent apples-to-apples comparison, given the 128-bitness of both those CPUs' SIMD units. Also note that w.r.t. ARM, much depends on what kinds of hardware resources support that SIMD - I've never gotten a definitive answer from anyone on this, but based on scalar-FFT vs 128-bit SIMD build timings on ARM, where the former runs ~2/3 as fast as the latter (compared to ~1/3 for Mlucas built both ways on my Core2Duo), I'm reasonably certain that on most ARM implementations, the SIMD uses the same underlying functional units (adders, multipliers, etc.) as the non-SIMD. The SIMD thus gets its speedup not from more transistors dedicated to it as is the case on x86, but from the combination of a leaner instruction stream (fewer instructions needed to process the same data volume) and better register usage.

My surmise makes sense also in the context of ARM aiming for the performance-per-Watt-hour market segment - you want as little silicon as possible underpinning the instruction set, so non-SIMD and SIMD instructions sharing the same hardware makes sense. Said sharing is facilitated by not having a crufty legacy non-IEEE register data format such as x86 has with its 80-bit register-double-floats.

What's really needed on the ARM front is wider SIMD, cheap manycore, and perhaps some higher-end implementations which do provide dedicated silicon to feed that SIMD.
You can find Software Optimization guides here: https://developer.arm.com/docs

Nothing for the A73, it seems, but there are guides for the A75, A76, A77 and N1.

For A73 there is this: https://www.anandtech.com/show/10347...mis-unveiled/2
Old 2020-03-11, 21:46   #108
ewmayer
2ω=0
 
Sep 2002
República de California

3·7·19·29 Posts

Quote:
Originally Posted by ldesnogu View Post
You can find Software Optimization guides here: https://developer.arm.com/docs

Nothing for the A73, it seems, but there are guides for the A75, A76, A77 and N1.

For A73 there is this: https://www.anandtech.com/show/10347...mis-unveiled/2
That Anandtech article addresses the how-many-underlying-functional-units question:
Quote:
We find two simple ALUs capable of basic operations such as additions and shifting. Integer multiplication, division and multiply-accumulate operations are handled by a dedicated multi-cycle integer pipeline. Floating point as well as ASIMD and NEON operations are handled by two pipelines, some of the capabilities we’ll go into detail later on. We find a single branch monitor and two dedicated Load and Store AGUs.
Old 2020-07-22, 02:59   #109
paulunderwood
 
Sep 2002
Database er0rr

5×701 Posts

My N2 just finished its first wavefront PRP test successfully, modulo a double check. No GEC errors were encountered.
Old 2020-07-22, 10:39   #110
ET_
Banned
 
"Luigi"
Aug 2002
Team Italia

12A4₁₆ Posts

Quote:
Originally Posted by paulunderwood View Post
My N2 just finished its first wavefront PRP test successfully, modulo a double check. No GEC errors were encountered.
I just noticed the new N2+ has been announced.
BTW, how do you deal with the big.LITTLE 6-core architecture? You probably already told us, but I lost the pointer, so if you have it at hand...

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.