#1 |
∂²ω=0
Sep 2002
República de California
5·7·331 Posts |
Mlucas v18 has gone live. Use this thread to report bugs, build issues, and for any other related discussion.
Last fiddled with by ewmayer on 2019-03-06 at 21:55 |
#2 |
Jul 2009
Germany
547 Posts |
I always wanted to try it out, but unfortunately I cannot compile a multi-threaded build because I still use Windows 7 Professional. It would be great if someone would upload an exe file for the AMD K10 architecture.
#3 |
"Composite as Heck"
Oct 2017
11·67 Posts |
Must be my birthday :)
#4 |
"Composite as Heck"
Oct 2017
737₁₀ Posts |
It compiles and doesn't segfault on the Samsung S7. Well done on fixing the ARM issues, this is great.
#5 |
Jul 2009
Germany
547₁₀ Posts |
#6 |
∂²ω=0
Sep 2002
República de California
5×7×331 Posts |
#7 | ||
"Composite as Heck"
Oct 2017
11·67 Posts |
Quote:
Sounds like your phone has a Snapdragon 415, which is a 28nm 4xA53 4xA53. It should work, but unfortunately it doesn't come close in efficiency to an S7's 14nm 4xM1 4xA53. It should handily beat a Raspberry Pi 3's 40nm in efficiency and throughput, and slot somewhere behind the 20nm 10-core Helio X25 ( https://www.mersenneforum.org/showpo...8&postcount=83 ).
Attached is the v18 ARM asimd binary from the S7 on the off chance you find it useful. AFAIK you need a rooted phone to run it, and if you have a rooted phone you could easily build Mlucas from source yourself, but there it is.
Quote:
I'll try to create an APK tomorrow. There's a chance it works where v17.1 failed, as there were clobber-related error messages like this:
Code:
/home/u18/AndroidStudioProjects/MlucasAPK/app/src/main/cpp/mi64.c:813:19: error: unknown register name 'rax' in asm
: "cc","memory","rax","rbx","rcx","rsi","r10","r11" /* Clobbered registers */\
#8 | ||
∂²ω=0
Sep 2002
República de California
5×7×331 Posts |
Quote:
Quote:
#9 |
Jul 2009
Germany
1000100011₂ Posts |
#10 | |
Einyen
Dec 2003
Denmark
23·131 Posts |
Compiled it on the usual c5d.9xlarge with 18 cores and 36 threads:
gcc -c -O3 -march=skylake-avx512 -DUSE_AVX512 -DUSE_THREADS ../src/*.c >& build.log
grep -i error build.log
[Assuming above grep comes up empty]
gcc -o Mlucas *.o -lm -lpthread -lrt

-DCARRY_16_WAY is not needed in v18 right? This time all 18 cores was fastest for some reason.
Code:
18.0
./Mlucas -fftlen 4608 -iters 10000 -nthread 36
4608 msec/iter = 3.24 ROE[avg,max] = [0.246743758, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -nthread 34
4608 msec/iter = 3.18 ROE[avg,max] = [0.246743758, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -nthread 32
4608 msec/iter = 3.15 ROE[avg,max] = [0.246743758, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -nthread 30
4608 msec/iter = 3.07 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -nthread 28
4608 msec/iter = 3.03 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -nthread 26
4608 msec/iter = 3.08 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:17
4608 msec/iter = 2.96 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:16
4608 msec/iter = 3.12 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:15
4608 msec/iter = 3.09 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:14
4608 msec/iter = 4.05 ROE[avg,max] = [0.246727988, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:13
4608 msec/iter = 4.18 ROE[avg,max] = [0.246727988, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 18:35
4608 msec/iter = 3.00 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:34:2
4608 msec/iter = 4.27 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107

From the README.html should this be -cpu 0:n-1 ?
Quote:
Last fiddled with by ATH on 2019-02-22 at 12:07 |
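For anyone who wants to rank these runs quickly, here is a minimal Python sketch. The msec/iter values are copied from the benchmark output in this post (FFT length 4608, 10000 iterations); the names `timings` and `rank_by_speed` are just illustrative and are not part of Mlucas.

```python
# msec/iter per core/thread option, copied from the benchmark runs in this post.
timings = {
    "-nthread 36": 3.24,
    "-nthread 34": 3.18,
    "-nthread 32": 3.15,
    "-nthread 30": 3.07,
    "-nthread 28": 3.03,
    "-nthread 26": 3.08,
    "-cpu 0:17": 2.96,
    "-cpu 0:16": 3.12,
    "-cpu 0:15": 3.09,
    "-cpu 0:14": 4.05,
    "-cpu 0:13": 4.18,
    "-cpu 18:35": 3.00,
    "-cpu 0:34:2": 4.27,
}


def rank_by_speed(results):
    """Return (option, msec_per_iter) pairs sorted fastest first."""
    return sorted(results.items(), key=lambda item: item[1])


ranked = rank_by_speed(timings)
fastest_option, fastest_ms = ranked[0]

for option, ms in ranked:
    # Slowdown relative to the fastest configuration, in percent.
    slowdown = 100.0 * (ms / fastest_ms - 1.0)
    print(f"{option:<12} {ms:.2f} msec/iter  (+{slowdown:.1f}%)")
```

Sorting by msec/iter puts `-cpu 0:17` (one thread on each of the 18 cores) at the top and the stride-2 run `-cpu 0:34:2` at the bottom.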
#11 | ||
∂²ω=0
Sep 2002
República de California
11585₁₀ Posts |
Correct - if you open platform.h and search for CARRY_16_WAY you'll see it's now on by default for avx-512 builds.
Quote:
Quote:
From a job-management perspective it's of course easier to just run 1 job using all the physical cores, and as long as n <= 4 one won't sacrifice much total throughput by doing so. So on both my non-HT Intel quad Haswell and my quad-ARM64-core Odroid C2 I use -cpu 0:3. I do the same on my HT-enabled dual-core Intel Broadwell NUC, because there I want to use 2 threads per physical core, and a single 4-thread job gives me nearly the same throughput as separate jobs using -cpu 0,2 and -cpu 1,3. I need to carefully re-read the README.html page to try to catch any remaining such ,-versus-: mixups, because they are easy to overlook.
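To make the ,-versus-: distinction concrete, here is a minimal Python sketch of how a -cpu argument could be expanded into a list of core indices. It assumes comma-separated items of the form lo, lo:hi or lo:hi:stride with an inclusive upper bound, which is consistent with the usage in this thread (-cpu 0:3, -cpu 0,2, -cpu 0:34:2). `parse_cpu_spec` is a hypothetical helper written for illustration, not code from Mlucas.

```python
from typing import List


def parse_cpu_spec(spec: str) -> List[int]:
    """Expand a '-cpu'-style string into a sorted list of core indices.

    Assumed grammar (an illustration, not taken from the Mlucas source):
    comma-separated items, each 'lo', 'lo:hi' or 'lo:hi:stride',
    where 'hi' is inclusive.
    """
    cores = set()
    for item in spec.split(","):
        parts = [int(p) for p in item.split(":")]
        if len(parts) == 1:
            lo, hi, stride = parts[0], parts[0], 1
        elif len(parts) == 2:
            lo, hi, stride = parts[0], parts[1], 1
        elif len(parts) == 3:
            lo, hi, stride = parts
        else:
            raise ValueError(f"too many ':' fields in {item!r}")
        if lo < 0 or hi < lo or stride < 1:
            raise ValueError(f"invalid range {item!r}")
        # range() excludes its end point, so add 1 to make 'hi' inclusive.
        cores.update(range(lo, hi + 1, stride))
    return sorted(cores)


print(parse_cpu_spec("0:3"))     # colon: a range of four cores
print(parse_cpu_spec("0,3"))     # comma: just the two listed cores
print(parse_cpu_spec("0:34:2"))  # every second core from 0 through 34
```

Under this reading, `0:n-1` selects all n cores, while `0,n-1` selects only the first and the last, so the mixup silently changes the thread count.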