mersenneforum.org  

2019-02-21, 00:01   #1
ewmayer
2ω=0

Sep 2002
República de California

2·13·443 Posts
Mlucas v18 available

Mlucas v18 has gone live. Use this thread to report bugs, build issues, and for any other related discussion.

Last fiddled with by ewmayer on 2019-03-06 at 21:55
2019-02-21, 08:09   #2
moebius

Jul 2009
Germany

347 Posts

I've always wanted to try it out, but unfortunately I cannot compile a multi-threaded build because I still use Windows 7 Professional. It would be great if someone could upload an exe file for the AMD K10 architecture.
2019-02-21, 08:14   #3
M344587487

"Composite as Heck"
Oct 2017

1001111000₂ Posts

Must be my birthday :)
2019-02-21, 12:03   #4
M344587487

"Composite as Heck"
Oct 2017

1001111000₂ Posts

It compiles and doesn't segfault on the Samsung S7. Well done on fixing the ARM issues, this is great.
2019-02-21, 17:45   #5
moebius

Jul 2009
Germany

347 Posts

Quote:
Originally Posted by M344587487
It compiles and doesn't segfault on the Samsung S7. Well done on fixing the ARM issues, this is great.

Your build could run on my Samsung Galaxy A3 8-core smartphone (S7 architecture), but I'm pessimistic about 88M exponents...
2019-02-21, 19:29   #6
ewmayer
2ω=0

Sep 2002
República de California

2CFE₁₆ Posts

Quote:
Originally Posted by M344587487
It compiles and doesn't segfault on the Samsung S7. Well done on fixing the ARM issues, this is great.

Thanks for the build - were you previously forced to use the v17.1 precompiled binary on that platform due to your-own-build-crashed issues?
2019-02-21, 20:24   #7
M344587487

"Composite as Heck"
Oct 2017

632₁₀ Posts

Quote:
Originally Posted by moebius
Your build could run on my Samsung Galaxy A3 8-core smartphone (S7 architecture), but I'm pessimistic about 88M exponents...

Sounds like your phone has a Snapdragon 415, which is a 28nm 4xA53 + 4xA53. It should work, but unfortunately it doesn't come close in efficiency to an S7's 14nm 4xM1 + 4xA53. It should handily beat a Raspberry Pi 3's 40nm chip in efficiency and throughput, and slot in somewhere behind the 20nm 10-core Helio X25 ( https://www.mersenneforum.org/showpo...8&postcount=83 ). Attached is the v18 ARM asimd binary from the S7 on the off chance you find it useful. AFAIK you need a rooted phone to run it, and if you have a rooted phone you could easily build Mlucas from source yourself, but there it is.

Quote:
Originally Posted by ewmayer
Thanks for the build - were you previously forced to use the v17.1 precompiled binary on that platform due to your-own-build-crashed issues?

Yes, luckily your C2 binary worked flawlessly on everything ARMv8. Did you do anything special with that build to ensure compatibility, or was it a normal -DUSE_ARM_V8_SIMD -DUSE_THREADS build?


I'll try to create an APK tomorrow. There's a chance it works where v17.1 failed, as there were clobber-related error messages like this:
Code:
/home/u18/AndroidStudioProjects/MlucasAPK/app/src/main/cpp/mi64.c:813:19: error: unknown register name 'rax' in asm
                : "cc","memory","rax","rbx","rcx","rsi","r10","r11"     /* Clobbered registers */\
It's just as likely that I was accidentally trying to compile x86 code. At that point it got thrown at a virtual wall, so I never investigated.
Attached Files
File Type: bz2 mlucas_v18_armv8_asimd.bz2 (1.60 MB, 67 views)
2019-02-21, 22:00   #8
ewmayer
2ω=0

Sep 2002
República de California

2·13·443 Posts

Quote:
Originally Posted by M344587487
Yes, luckily your C2 binary worked flawlessly on everything ARMv8. Did you do anything special with that build to ensure compatibility, or was it a normal -DUSE_ARM_V8_SIMD -DUSE_THREADS build?

Actually, it was another user who built that posted binary under his Gentoo distro, but yes, just the usual flags. It was similar to my Odroid C2 builds, in that the compiler just happened not to use any of the left-off-clobber-list registers or use 64-bit addresses for those erroneous 32-bit loads in the asm. In fact those remaining bugs survived *because* they simply happened not to trigger any errors in my build - all the similar bugs-along-the-way of my ARMv8 development work which did cause runtime errors in my C2 builds were obviously tracked down and fixed prior to the v17.1 release, the first one with ARMv8 assembly support.

Quote:
I'll try to create an APK tomorrow. There's a chance it works where v17.1 failed, as there were clobber-related error messages like this:
Code:
/home/u18/AndroidStudioProjects/MlucasAPK/app/src/main/cpp/mi64.c:813:19: error: unknown register name 'rax' in asm
                : "cc","memory","rax","rbx","rcx","rsi","r10","r11"     /* Clobbered registers */\
It's just as likely that I was accidentally trying to compile x86 code. At that point it got thrown at a virtual wall, so I never investigated.

Yes, that particular error looks like a piece of x86_64 asm is trying to get built. But I will be interested to hear the results of further build attempts on a wider variety of ARM hardware/OS combinations. Thanks for posting a binary for others to use, but I do hope they will also first try a build-from-source, because that's what I need in order to shake out remaining bugs and portability issues.
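
For anyone hitting a similar "unknown register name" error, one quick sanity check is to look at which architecture the toolchain actually targets. Both gcc and clang print their target triplet with -dumpmachine. The sketch below only classifies such a triplet string; target_family is a hypothetical helper name, not part of Mlucas, and the example triplet is just an illustration.

```shell
# Classify a compiler target triplet (as printed by `gcc -dumpmachine`
# or `clang -dumpmachine`) as x86_64, ARMv8, or something else.
# target_family is a hypothetical helper, not part of Mlucas.
target_family() {
    case "$1" in
        x86_64-*|amd64-*)  echo x86_64 ;;
        aarch64-*|arm64-*) echo armv8 ;;
        *)                 echo other ;;
    esac
}

# Illustrative triplet; on a real system pass "$(gcc -dumpmachine)" instead.
target_family aarch64-linux-android
```

If this reports x86_64 while the build is meant for an ARM device, the x86 asm paths are probably being selected for the wrong target.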
2019-02-22, 07:16   #9
moebius

Jul 2009
Germany

101011011₂ Posts

Quote:
Originally Posted by M344587487
Sounds like your phone has a Snapdragon 415, which is a 28nm 4xA53 + 4xA53.

No, it is a newer one (Samsung Galaxy A3 (2017), octa-core, 1600 MHz, ARM Cortex-A53, 64-bit, 14 nm). Thanks, I'll have to look for a root tool for it.
2019-02-22, 12:03   #10
ATH
Einyen

Dec 2003
Denmark

2⁷·23 Posts

Compiled it on the usual c5d.9xlarge with 18 cores and 36 threads:
gcc -c -O3 -march=skylake-avx512 -DUSE_AVX512 -DUSE_THREADS ../src/*.c >& build.log
grep -i error build.log
[Assuming above grep comes up empty]
gcc -o Mlucas *.o -lm -lpthread -lrt


-DCARRY_16_WAY is not needed in v18, right?

This time using all 18 cores was fastest for some reason.

Code:
18.0
./Mlucas -fftlen 4608 -iters 10000 -nthread 36
      4608  msec/iter =    3.24  ROE[avg,max] = [0.246743758, 0.312500000]  radices = 144 16 32 32  0  0  0  0  0  0	10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107

./Mlucas -fftlen 4608 -iters 10000 -nthread 34
      4608  msec/iter =    3.18  ROE[avg,max] = [0.246743758, 0.312500000]  radices = 144 16 32 32  0  0  0  0  0  0	10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107

./Mlucas -fftlen 4608 -iters 10000 -nthread 32
      4608  msec/iter =    3.15  ROE[avg,max] = [0.246743758, 0.312500000]  radices = 144 16 32 32  0  0  0  0  0  0	10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107

./Mlucas -fftlen 4608 -iters 10000 -nthread 30
      4608  msec/iter =    3.07  ROE[avg,max] = [0.246740330, 0.312500000]  radices = 144 16 32 32  0  0  0  0  0  0	10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107

./Mlucas -fftlen 4608 -iters 10000 -nthread 28
      4608  msec/iter =    3.03  ROE[avg,max] = [0.246740330, 0.312500000]  radices = 144 16 32 32  0  0  0  0  0  0	10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107

./Mlucas -fftlen 4608 -iters 10000 -nthread 26
      4608  msec/iter =    3.08  ROE[avg,max] = [0.246740330, 0.312500000]  radices = 144 16 32 32  0  0  0  0  0  0	10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107

./Mlucas -fftlen 4608 -iters 10000 -cpu 0:17
      4608  msec/iter =    2.96  ROE[avg,max] = [0.246740330, 0.312500000]  radices = 144 16 32 32  0  0  0  0  0  0	10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107

./Mlucas -fftlen 4608 -iters 10000 -cpu 0:16
      4608  msec/iter =    3.12  ROE[avg,max] = [0.246740330, 0.312500000]  radices = 144 16 32 32  0  0  0  0  0  0	10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107

./Mlucas -fftlen 4608 -iters 10000 -cpu 0:15
      4608  msec/iter =    3.09  ROE[avg,max] = [0.246740330, 0.312500000]  radices = 144 16 32 32  0  0  0  0  0  0	10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107

./Mlucas -fftlen 4608 -iters 10000 -cpu 0:14
      4608  msec/iter =    4.05  ROE[avg,max] = [0.246727988, 0.312500000]  radices = 144 16 32 32  0  0  0  0  0  0	10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107

./Mlucas -fftlen 4608 -iters 10000 -cpu 0:13
      4608  msec/iter =    4.18  ROE[avg,max] = [0.246727988, 0.312500000]  radices = 144 16 32 32  0  0  0  0  0  0	10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107

./Mlucas -fftlen 4608 -iters 10000 -cpu 18:35
      4608  msec/iter =    3.00  ROE[avg,max] = [0.246740330, 0.312500000]  radices = 144 16 32 32  0  0  0  0  0  0	10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107

./Mlucas -fftlen 4608 -iters 10000 -cpu 0:34:2
      4608  msec/iter =    4.27  ROE[avg,max] = [0.246740330, 0.312500000]  radices = 144 16 32 32  0  0  0  0  0  0	10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
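
When comparing many runs like these, a small shell sketch can pull out the msec/iter figures and report the fastest. It assumes the result-line layout shown above (a field `msec/iter`, then `=`, then the number); best_msec is a hypothetical helper name, and the two sample lines are shortened copies of the -nthread 36 and -cpu 0:17 results.

```shell
# Read Mlucas result lines on stdin, find each "msec/iter = <number>"
# field triple, and print the smallest value seen.
# best_msec is a hypothetical helper, not part of Mlucas.
best_msec() {
    awk '{
        for (i = 1; i <= NF - 2; i++)
            if ($i == "msec/iter" && $(i + 1) == "=") {
                v = $(i + 2) + 0
                if (!seen || v < best) { best = v; seen = 1 }
            }
    }
    END { if (seen) printf "%.2f\n", best }'
}

printf '%s\n' \
    '4608  msec/iter =    3.24  ROE[avg,max] = [0.246743758, 0.312500000]' \
    '4608  msec/iter =    2.96  ROE[avg,max] = [0.246740330, 0.312500000]' |
    best_msec
```

For the two sample lines this prints 2.96, the -cpu 0:17 timing.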

In the README.html, should this be -cpu 0:n-1?

Quote:
Hyperthreaded x86 CPUs: If Intel, use -cpu 0:n, where n is the number of physical cores on your system

Last fiddled with by ATH on 2019-02-22 at 12:07
2019-02-22, 23:44   #11
ewmayer
2ω=0

Sep 2002
República de California

2·13·443 Posts

Quote:
Originally Posted by ATH
-DCARRY_16_WAY is not needed in v18, right?

Correct - if you open platform.h and search for CARRY_16_WAY you'll see it's now on by default for avx-512 builds.

Quote:
This time using all 18 cores was fastest for some reason.

What are your best-radix-set timings for 8 and 9 threads at 4608K? I'm curious how much more parallelism we're getting at the higher 16- and 18-thread counts.

Quote:
In the README.html, should this be -cpu 0:n-1?
That snip indeed needs an edit, but of a different kind - the section in question is describing (or attempting to :) the simplest way to maximize total system throughput on most multicore x86 systems. That is 1 LL test per physical core, with each such job using 2-threads on Intel hyperthreaded CPUs and 1-thread otherwise (Intel non-HT, AMD, ARM, etc). Because of the way Intel numbers its logical cores, on a system with n physical cores, logical cores j and n+j map to phys-core j, for j = 0,...,n-1. So to generate a proper mlucas.cfg file for such a set-up, one should use -cpu 0,n (note: comma, not colon), then copy the resulting cfg-file to each of n run directories which will host such a 2-thread-on-1-physical-core job.
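
To make the numbering concrete, here is a minimal shell sketch that prints the -cpu argument for each of the n per-core jobs. It assumes the Intel hyperthreaded numbering described above, where logical cores j and n+j share physical core j. cpu_pairs is a hypothetical helper name, not part of Mlucas, and it only prints the argument strings without launching anything.

```shell
# Print one "-cpu j,n+j" argument per physical core, assuming logical
# cores j and n+j map to physical core j (Intel hyperthreaded numbering).
# cpu_pairs is a hypothetical helper, not part of Mlucas.
cpu_pairs() {
    n=$1    # number of physical cores
    j=0
    while [ "$j" -lt "$n" ]; do
        echo "-cpu $j,$((n + j))"
        j=$((j + 1))
    done
}

cpu_pairs 2
```

For n = 2 this prints `-cpu 0,2` and `-cpu 1,3`, one line per run directory.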

From a job-management perspective it's of course easier to just run 1 job using all the physical cores, and as long as n <= 4 one won't sacrifice much total throughput by doing so. So on both my non-HT Intel quad Haswell and my quad-ARM64-core Odroid C2 I use -cpu 0:3, as I do on my HT-enabled dual-core Intel Broadwell NUC because there I want to use 2-threads-per-physical-core and a single 4-thread job gives me nearly the same throughput as separate jobs using -cpu 0,2 and -cpu 1,3.

I need to carefully re-read the README.html page to try to catch remaining such ,-versus-: mixups, because they are easy to overlook.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.