mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Software > Mlucas

Old 2017-03-13, 20:08   #34
ldesnogu

Quote:
Originally Posted by ewmayer View Post
I'm not familiar enough with ARM to understand why -m64 is unsupported in GCC, but correctly handling aarch64 in platform.h will cause the build to be in 64-bit mode. (I had assumed -m64 was needed to trigger the aarch64-related predefs, but your output from [1] will settle that.)
gcc for ARM comes in two flavors: one targets 64-bit code and the other targets 32-bit code, so there's no need for -m64 or -m32.
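As a quick sanity check (generic gcc queries, nothing Mlucas-specific), the baked-in target of a given gcc can be inspected directly instead of passing -m64/-m32:

```shell
# The bitness is baked into the compiler itself, so ask the compiler
# which target it was built for:
gcc -dumpmachine                              # prints the target triplet
echo | gcc -dM -E - | grep -c '__aarch64__'   # 1 for a 64-bit ARM gcc, else 0
```

The same queries work on a cross compiler such as the aarch64-none-linux-gnu-gcc used above.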
Old 2017-03-13, 20:22   #35
ldesnogu

Quote:
Originally Posted by ewmayer View Post
gcc -c -Os -m64 -DUSE_THREADS ../Mlucas.c
For that to succeed, you need this:
Code:
$ diff platform.h~ platform.h
714a715,728
> #elif defined(__AARCH64EL__)
>     #ifndef OS_BITS
>         #define OS_BITS 32
>     #endif
>     #define CPU_TYPE
>     #define CPU_IS_ARM_EABI
>     #if(defined(__GNUC__) || defined(__GNUG__))
>         #define COMPILER_TYPE
>         #define COMPILER_TYPE_GCC
>     #else
>         #define COMPILER_TYPE
>         #define COMPILER_TYPE_UNKNOWN
>     #endif
>
And it compiles:
Code:
$ aarch64-none-linux-gnu-gcc -Os -DUSE_THREADS -c *.c
$ aarch64-none-linux-gnu-gcc -Os -DUSE_THREADS *.o -o mlucas64 -lm -lpthread
$ file mlucas64
mlucas64: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, for GNU/Linux 3.7.0, not stripped
Tested with QEMU: it starts, but I have no clue how I should launch the binary to do something sensible that doesn't take forever :)

Last fiddled with by ldesnogu on 2017-03-13 at 20:38
Old 2017-03-13, 21:41   #36
ewmayer

Quote:
Originally Posted by Lorenzo View Post
Ok! I have done it!
Code:
ubuntu@pine64:~/Solaris2/mlucas-14.1$ gcc -dM -E - < /dev/null
[snip]
Thanks! The key predefine there is __aarch64__, which is also the trigger in the .h file I posted ... so the latter should allow you to build. So I don't understand the raft of 'stray character' errors you get with that one - here are lines 88-90 of that header:
Code:
#elif(defined(_AIX))
	#define	OS_TYPE
	#define	OS_TYPE_AIX
Can you open both the original and new .h in an editor, and compare the file encodings? If those are the same, can you diff your local copies of the two file versions? Maybe that will reveal something relevant to the stray-octals errors you are getting.
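One way to hunt for such stray bytes from the shell (a generic sketch; demo.h here is just a stand-in for the downloaded platform.h):

```shell
# Fabricate a header with a UTF-8 no-break space (0xC2 0xA0) smuggled in,
# a classic source of gcc "stray '\302'" errors, then locate it.
printf '#define OS_TYPE\n#define\302\240OS_TYPE_LINUX\n' > demo.h
grep -n '[^ -~]' demo.h   # print lines containing non-printable/non-ASCII bytes
```

`file demo.h` will likewise no longer report plain ASCII text once such bytes are present.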

Quote:
Originally Posted by ldesnogu View Post
For that to succeed, you need this:
Code:
$ diff platform.h~ platform.h
714a715,728
> #elif defined(__AARCH64EL__)
>     #ifndef OS_BITS
>         #define OS_BITS 32
>     #endif
[snip]
Tested with QEMU, it starts but I have no clue how I should launch the binary to do something sensible that doesn't take forever :)
That sets the wrong value of OS_BITS. For basic C-code Mlucas builds that won't matter much, except for various utility functions which make heavy use of 64-bit-int math (e.g. the quad-float library used for high-precision inits of double constants), but for future asm-code builds we need the right bitness to be set. The predef section beginning at line 792 in the .h I posted should work just fine for Lorenzo, and for you as well. Did you try building with that, or did you just make your mod above and use it? Please try the unmodified .h file - the one with the __aarch64__ predef stuff at line 792 - and let me know if you get the same unrecognized-char errors as Lorenzo.
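For reference, a sketch of what a 64-bit-correct clause might look like, under the assumptions stated above (trigger on __aarch64__, OS_BITS = 64). This is my reconstruction in the style of the diff quoted earlier, not ewmayer's actual attachment; in the real platform.h it would be an #elif in the existing chain:

```c
/* Hypothetical corrected platform.h clause (an #elif in the real chain):
 * key off the __aarch64__ predefine and report the true target width. */
#if defined(__aarch64__)
    #ifndef OS_BITS
        #define OS_BITS 64          /* aarch64 is a 64-bit target */
    #endif
    #define CPU_TYPE
    #define CPU_IS_ARM_EABI
    #if(defined(__GNUC__) || defined(__GNUG__))
        #define COMPILER_TYPE
        #define COMPILER_TYPE_GCC
    #else
        #define COMPILER_TYPE
        #define COMPILER_TYPE_UNKNOWN
    #endif
#endif
```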

You can quick-test the binary by trying some timing runs at a specific FFT length, say

./Mlucas -fftlen 1024 -nthread 1

will try all radix combos available @1024K and write the best-timing one to the mlucas.cfg file. You can also play with the threadcount - note the default there is to try to use all available cores.
Old 2017-03-13, 22:35   #37
ldesnogu

Quote:
Originally Posted by ewmayer View Post
That sets the wrong value of OS_BITS [snip] Please try the unmodified .h file - the one with the __aarch64__ predef stuff at line 792 and let me know if you get the same unrecognized-char errors as Lorenzo.
Silly me, I had missed your attachment. It compiles fine with it. So Lorenzo's error comes from somewhere else.

Quote:
You can quick-test the binary by trying some timing runs at a specific FFT length, say

./Mlucas -fftlen 1024 -nthread 1

will try all radix combos available @1024K and write the best-timing one to the mlucas.cfg file. You can also play with the threadcount - note the default there is to try to use all available cores.
Code:
/work/qemu/qemu/aarch64-linux-user/qemu-aarch64 -L /work/Cross/fsf-6.169/aarch64-none-linux-gnu/libc ./mlucas64 -fftlen 1024 -nthread 1 -iters 1

    Mlucas 14.1

    http://hogranch.com/mayer/README.html

INFO: testing qfloat routines...
CPU Family = ARM Embedded ABI, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 6.3.1 20170118.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: MLUCAS_PATH is set to ""
INFO: using 53-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation. 
INFO: testing IMUL routines...
INFO: System has 4 available processor cores.
INFO: testing FFT radix tables...
All MaxErr are at 0.
Old 2017-03-13, 22:46   #38
ewmayer

Thanks, Laurent - so I suspect a file-encoding issue with Lorenzo's .h file downloaded from my post, or perhaps his unzip utility inserted a bunch of garbage chars.
Old 2017-03-14, 06:56   #39
Lorenzo

Quote:
Originally Posted by ewmayer View Post
Thanks, Laurent - so I suspect a file-encoding issue, with Lorenzo's .h file downloaded from my post, or perhaps his unzip utility inserted a bunch of garbage chars.
Right! Sorry, I found the issue.
It's working nicely! So without SIMD optimization it looks like:
Code:
ubuntu@pine64:~/Solaris2/mlucas-14.1$ cat mlucas.cfg
14.1
      1024  msec/iter =  114.57  ROE[avg,max] = [0.250000000, 0.250000000]  radices =  32 32 16 32  0  0  0  0  0  0
      1152  msec/iter =  109.04  ROE[avg,max] = [0.206808036, 0.250000000]  radices = 288  8 16 16  0  0  0  0  0  0
      1280  msec/iter =  133.03  ROE[avg,max] = [0.236600167, 0.281250000]  radices = 160 16 16 16  0  0  0  0  0  0
      1408  msec/iter =  140.47  ROE[avg,max] = [0.273688616, 0.343750000]  radices = 176 16 16 16  0  0  0  0  0  0
      1536  msec/iter =  161.30  ROE[avg,max] = [0.223493304, 0.281250000]  radices = 192 16 16 16  0  0  0  0  0  0
      1664  msec/iter =  166.09  ROE[avg,max] = [0.246149554, 0.312500000]  radices = 208 16 16 16  0  0  0  0  0  0
      1792  msec/iter =  180.60  ROE[avg,max] = [0.220703125, 0.281250000]  radices = 224 16 16 16  0  0  0  0  0  0
      1920  msec/iter =  198.81  ROE[avg,max] = [0.222460938, 0.250000000]  radices = 240 16 16 16  0  0  0  0  0  0
      2048  msec/iter =  206.38  ROE[avg,max] = [0.278125000, 0.281250000]  radices = 256 16 16 16  0  0  0  0  0  0
      2304  msec/iter =  242.52  ROE[avg,max] = [0.208269392, 0.250000000]  radices = 288 16 16 16  0  0  0  0  0  0
      2560  msec/iter =  308.94  ROE[avg,max] = [0.243164062, 0.281250000]  radices = 160 16 16 32  0  0  0  0  0  0
      2816  msec/iter =  329.54  ROE[avg,max] = [0.272896903, 0.343750000]  radices = 176 16 16 32  0  0  0  0  0  0
      3072  msec/iter =  371.71  ROE[avg,max] = [0.225892857, 0.281250000]  radices = 192 16 16 32  0  0  0  0  0  0
      3328  msec/iter =  388.66  ROE[avg,max] = [0.241322545, 0.281250000]  radices = 208 16 16 32  0  0  0  0  0  0
      3584  msec/iter =  414.33  ROE[avg,max] = [0.220870536, 0.250000000]  radices = 224 16 16 32  0  0  0  0  0  0
      3840  msec/iter =  453.97  ROE[avg,max] = [0.213636998, 0.265625000]  radices = 240 16 16 32  0  0  0  0  0  0
      4096  msec/iter =  472.52  ROE[avg,max] = [0.247321429, 0.250000000]  radices = 256 16 16 32  0  0  0  0  0  0
      4608  msec/iter =  544.08  ROE[avg,max] = [0.201870292, 0.222656250]  radices = 288 16 16 32  0  0  0  0  0  0
      5120  msec/iter =  673.79  ROE[avg,max] = [0.239508929, 0.312500000]  radices = 160 16 32 32  0  0  0  0  0  0
      5632  msec/iter =  693.38  ROE[avg,max] = [0.278264509, 0.343750000]  radices = 176 16 32 32  0  0  0  0  0  0
      6144  msec/iter =  776.30  ROE[avg,max] = [0.213504464, 0.250000000]  radices = 192 16 32 32  0  0  0  0  0  0
      6656  msec/iter =  814.97  ROE[avg,max] = [0.242299107, 0.281250000]  radices = 208 16 32 32  0  0  0  0  0  0
      7168  msec/iter =  870.94  ROE[avg,max] = [0.219768415, 0.312500000]  radices = 224 16 32 32  0  0  0  0  0  0
      7680  msec/iter =  955.79  ROE[avg,max] = [0.222209821, 0.250000000]  radices = 240 16 32 32  0  0  0  0  0  0
Old 2017-03-14, 07:22   #40
ewmayer

Quote:
Originally Posted by Lorenzo View Post
Right! Sorry, I found the issue.
It's working nicely! So without SIMD optimization it looks like:
Code:
ubuntu@pine64:~/Solaris2/mlucas-14.1$ cat mlucas.cfg
14.1
      1024  msec/iter =  114.57  ROE[avg,max] = [0.250000000, 0.250000000]  radices =  32 32 16 32  0  0  0  0  0  0
      1152  msec/iter =  109.04  ROE[avg,max] = [0.206808036, 0.250000000]  radices = 288  8 16 16  0  0  0  0  0  0
      1280  msec/iter =  133.03  ROE[avg,max] = [0.236600167, 0.281250000]  radices = 160 16 16 16  0  0  0  0  0  0
[snip]
Glad to hear it - what was the issue with the updated .h file? I'd like to know in case another user hits something similar in the future.

The only timing that really pops out is the anomalously low one @1152K ... but SIMD timings will be the ones of real interest.
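To quantify "pops out": dividing each msec/iter by the FFT length (values copied from the mlucas.cfg quoted above) shows 1152K as the clear outlier. A rough per-K cost sketch:

```shell
# Per-1K-of-FFT cost for the first three rows of Lorenzo's mlucas.cfg.
awk 'BEGIN {
    t[1024] = 114.57; t[1152] = 109.04; t[1280] = 133.03
    n = split("1024 1152 1280", k, " ")
    for (i = 1; i <= n; i++)
        printf "%sK: %.4f msec/iter per K\n", k[i], t[k[i]] / k[i]
}'
```

This prints roughly 0.112, 0.095 and 0.104 msec/iter per K respectively, i.e. the 1152K run is anomalously cheap per point, matching the leading-radix-288 oddity in that row.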

How many threads did you run your self-test with? (Your screen output will indicate that, e.g. NTHREADS = {some value >= 1}.)
Old 2017-03-14, 07:46   #41
Lorenzo

The issue was that the file was unzipped incorrectly by me. So in general it's OK.

I ran ./mlucas -s m. So it looks like Mlucas used all 4 cores (threads) correctly. I didn't play with threads yet.

So in general it's very slow.

Last fiddled with by Lorenzo on 2017-03-14 at 08:19
Old 2017-03-14, 09:12   #42
ewmayer

Quote:
Originally Posted by Lorenzo View Post
I ran ./mlucas -s m. So it looks like Mlucas used all 4 cores (threads) correctly. I didn't play with threads yet.

So in general it's very slow.
Yes - even with a 2-3x speedup from use of SIMD, the ARM will be more about performance per watt (and per hardware $) than speed-per-core.
Old 2017-03-14, 10:19   #43
ET_

Quote:
Originally Posted by ewmayer View Post
Yes - even with a 2-3x speedup from use of SIMD, the ARM will be more about performance per watt (and per hardware $) than speed-per-core.
The following mlucas.cfg file was generated on a 2.8 GHz AMD Opteron running RedHat 64-bit linux.
Code:
        2048  sec/iter =    0.134  ROE[min,max] = [0.281250000, 0.343750000]  radices =  32 32 32 32  0  0  0  0  0  0  [Any text offset from the list-ending 0 by whitespace is ignored]
        2304  sec/iter =    0.148  ROE[min,max] = [0.242187500, 0.281250000]  radices =  36  8 16 16 16  0  0  0  0  0
        2560  sec/iter =    0.166  ROE[min,max] = [0.281250000, 0.312500000]  radices =  40  8 16 16 16  0  0  0  0  0
        2816  sec/iter =    0.188  ROE[min,max] = [0.328125000, 0.343750000]  radices =  44  8 16 16 16  0  0  0  0  0
        3072  sec/iter =    0.222  ROE[min,max] = [0.250000000, 0.250000000]  radices =  24 16 16 16 16  0  0  0  0  0
        3584  sec/iter =    0.264  ROE[min,max] = [0.281250000, 0.281250000]  radices =  28 16 16 16 16  0  0  0  0  0
        4096  sec/iter =    0.300  ROE[min,max] = [0.250000000, 0.312500000]  radices =  16 16 16 16 32  0  0  0  0  0
The following mlucas.cfg file was generated on a 1.4 GHz ARM running 64-bit linux.
Code:
      2048  msec/iter =  206.38  ROE[avg,max] = [0.278125000, 0.281250000]  radices = 256 16 16 16  0  0  0  0  0  0
      2304  msec/iter =  242.52  ROE[avg,max] = [0.208269392, 0.250000000]  radices = 288 16 16 16  0  0  0  0  0  0
      2560  msec/iter =  308.94  ROE[avg,max] = [0.243164062, 0.281250000]  radices = 160 16 16 32  0  0  0  0  0  0
      2816  msec/iter =  329.54  ROE[avg,max] = [0.272896903, 0.343750000]  radices = 176 16 16 32  0  0  0  0  0  0
      3072  msec/iter =  371.71  ROE[avg,max] = [0.225892857, 0.281250000]  radices = 192 16 16 32  0  0  0  0  0  0
      3328  msec/iter =  388.66  ROE[avg,max] = [0.241322545, 0.281250000]  radices = 208 16 16 32  0  0  0  0  0  0
      3584  msec/iter =  414.33  ROE[avg,max] = [0.220870536, 0.250000000]  radices = 224 16 16 32  0  0  0  0  0  0
      3840  msec/iter =  453.97  ROE[avg,max] = [0.213636998, 0.265625000]  radices = 240 16 16 32  0  0  0  0  0  0
      4096  msec/iter =  472.52  ROE[avg,max] = [0.247321429, 0.250000000]  radices = 256 16 16 32  0  0  0  0  0  0
In other words, a 4-threaded ARM is about 1.5x slower than one core of a 2.8 GHz Opteron.
With a 3x SIMD speedup its efficiency would be 0.5x on a per-core comparison, and 1:1 on a per-core-and-GHz comparison with the Opteron.

That is to say, a minicluster of 20 ARM cores would be 20x faster on a per-GHz measurement and 10x faster on a per-core measurement, while costing about as much as the single Opteron system. Not to speak of the energy savings...
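The 1.5x figure can be reproduced from the 2048K rows of the two tables above (0.134 sec/iter on one Opteron core vs. 206.38 msec/iter on the 4-thread Pine64), e.g. with a one-liner:

```shell
# Opteron 2048K: 0.134 sec/iter; Pine64 (4 threads) 2048K: 206.38 msec/iter.
awk 'BEGIN {
    opteron_ms = 0.134 * 1000        # convert sec/iter to msec/iter
    arm_ms     = 206.38
    printf "ARM/Opteron ratio at 2048K: %.2fx\n", arm_ms / opteron_ms
}'
# prints: ARM/Opteron ratio at 2048K: 1.54x
```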

Last fiddled with by ET_ on 2017-03-14 at 10:20
Old 2017-03-14, 18:42   #44
VictordeHolland

You got it working, nice!
That is a Pine64 with 4x ARM Cortex A53 cores (@1.4GHz) right?

I'm a little bit surprised it is about as fast as my Odroid-U2 (4x ARM Cortex A9 cores @1.7GHz), which is only 32-bit and a much older architecture.
http://mersenneforum.org/showpost.ph...5&postcount=94
Code:
      1024  msec/iter =  121.70  ROE[avg,max] = [0.298214286, 0.312500000]  radices = 128 16 16 16  0  0  0  0  0  0
      1152  msec/iter =  142.69  ROE[avg,max] = [0.225310407, 0.250000000]  radices = 144 16 16 16  0  0  0  0  0  0
      1280  msec/iter =  161.44  ROE[avg,max] = [0.251618304, 0.312500000]  radices = 160 16 16 16  0  0  0  0  0  0
      1408  msec/iter =  185.52  ROE[avg,max] = [0.297056362, 0.375000000]  radices = 176 16 16 16  0  0  0  0  0  0
      1536  msec/iter =  195.56  ROE[avg,max] = [0.234742955, 0.312500000]  radices = 192 16 16 16  0  0  0  0  0  0
      1664  msec/iter =  208.36  ROE[avg,max] = [0.254631696, 0.312500000]  radices = 208 16 16 16  0  0  0  0  0  0
      1792  msec/iter =  222.32  ROE[avg,max] = [0.234012277, 0.250000000]  radices = 224 16 16 16  0  0  0  0  0  0
      1920  msec/iter =  243.65  ROE[avg,max] = [0.235016741, 0.281250000]  radices = 240 16 16 16  0  0  0  0  0  0
      2048  msec/iter =  255.25  ROE[avg,max] = [0.310714286, 0.312500000]  radices = 256 16 16 16  0  0  0  0  0  0
      2304  msec/iter =  297.26  ROE[avg,max] = [0.228341239, 0.281250000]  radices = 288 16 16 16  0  0  0  0  0  0
      2560  msec/iter =  339.70  ROE[avg,max] = [0.256682478, 0.312500000]  radices = 160 16 16 32  0  0  0  0  0  0
      2816  msec/iter =  384.56  ROE[avg,max] = [0.296219308, 0.375000000]  radices = 176 16 16 32  0  0  0  0  0  0
      3072  msec/iter =  413.85  ROE[avg,max] = [0.239704241, 0.281250000]  radices = 192 16 16 32  0  0  0  0  0  0
      3584  msec/iter =  370.28  ROE[avg,max] = [0.231487165, 0.281250000]  radices = 224 16 16 32  0  0  0  0  0  0    
      4096  msec/iter =  455.10  ROE[avg,max] = [0.282142857, 0.312500000]  radices = 128 16 32 32  0  0  0  0  0  0
In that post I also made a comparison with an Intel Core2Duo E7400 @2.8GHz running Mprime 28.7. Looking back at it, that comparison might not have been entirely fair (Mlucas vs. Mprime).
So I dusted off the machine and also ran Mlucas:

Intel Core2Duo E7400 @2.8GHz
NTHREADS = 1
Code:
14.1
      1024  msec/iter =   33.76  ROE[avg,max] = [0.264564732, 0.265625000]  radices =  32 32 16 32  0  0  0  0  0  0
      1152  msec/iter =   40.30  ROE[avg,max] = [0.237220982, 0.273437500]  radices =  36 16 32 32  0  0  0  0  0  0
      1280  msec/iter =   45.42  ROE[avg,max] = [0.251841518, 0.296875000]  radices =  40 16 32 32  0  0  0  0  0  0
      1408  msec/iter =   52.31  ROE[avg,max] = [0.285110910, 0.375000000]  radices =  44 16 32 32  0  0  0  0  0  0
      1536  msec/iter =   53.31  ROE[avg,max] = [0.239299665, 0.281250000]  radices =  24 32 32 32  0  0  0  0  0  0
      1664  msec/iter =   61.81  ROE[avg,max] = [0.261802455, 0.312500000]  radices =  52 16 32 32  0  0  0  0  0  0
      1792  msec/iter =   65.81  ROE[avg,max] = [0.267229353, 0.312500000]  radices =  28 32 32 32  0  0  0  0  0  0
      1920  msec/iter =   70.98  ROE[avg,max] = [0.243638393, 0.281250000]  radices =  60 16 32 32  0  0  0  0  0  0
      2048  msec/iter =   71.88  ROE[avg,max] = [0.257366071, 0.257812500]  radices =  32 32 32 32  0  0  0  0  0  0
      2304  msec/iter =   81.60  ROE[avg,max] = [0.236948940, 0.281250000]  radices =  36 32 32 32  0  0  0  0  0  0
      2560  msec/iter =   90.96  ROE[avg,max] = [0.255691964, 0.312500000]  radices =  40 32 32 32  0  0  0  0  0  0
      2816  msec/iter =  102.69  ROE[avg,max] = [0.283956473, 0.343750000]  radices =  44 32 32 32  0  0  0  0  0  0
      3072  msec/iter =  112.85  ROE[avg,max] = [0.233879743, 0.265625000]  radices =  48 32 32 32  0  0  0  0  0  0
      3328  msec/iter =  123.71  ROE[avg,max] = [0.267947824, 0.312500000]  radices =  52 32 32 32  0  0  0  0  0  0
      3584  msec/iter =  135.08  ROE[avg,max] = [0.267689732, 0.301757812]  radices =  56 32 32 32  0  0  0  0  0  0
      3840  msec/iter =  144.52  ROE[avg,max] = [0.242107282, 0.281250000]  radices =  60 32 32 32  0  0  0  0  0  0
      4096  msec/iter =  154.69  ROE[avg,max] = [0.263169643, 0.281250000]  radices =  64 32 32 32  0  0  0  0  0  0
      4608  msec/iter =  177.26  ROE[avg,max] = [0.236798968, 0.281250000]  radices =  36 16 16 16 16  0  0  0  0  0
      5120  msec/iter =  201.17  ROE[avg,max] = [0.257240513, 0.312500000]  radices =  40 16 16 16 16  0  0  0  0  0
      5632  msec/iter =  224.76  ROE[avg,max] = [0.291057478, 0.375000000]  radices =  44 16 16 16 16  0  0  0  0  0
      6144  msec/iter =  244.47  ROE[avg,max] = [0.233741978, 0.265625000]  radices =  48 16 16 16 16  0  0  0  0  0
      6656  msec/iter =  271.08  ROE[avg,max] = [0.264965820, 0.312500000]  radices =  52 16 16 16 16  0  0  0  0  0
      7168  msec/iter =  292.72  ROE[avg,max] = [0.274094936, 0.312500000]  radices =  56 16 16 16 16  0  0  0  0  0
      7680  msec/iter =  312.74  ROE[avg,max] = [0.249065290, 0.290039062]  radices =  60 16 16 16 16  0  0  0  0  0
NTHREADS = 2
Code:
14.1
      1024  msec/iter =   21.01  ROE[avg,max] = [0.273214286, 0.281250000]  radices =  32 16 32 32  0  0  0  0  0  0
      1152  msec/iter =   25.43  ROE[avg,max] = [0.237220982, 0.273437500]  radices =  36 16 32 32  0  0  0  0  0  0
      1280  msec/iter =   28.85  ROE[avg,max] = [0.259319196, 0.312500000]  radices =  20 32 32 32  0  0  0  0  0  0
      1408  msec/iter =   35.14  ROE[avg,max] = [0.280566406, 0.343750000]  radices = 176 16 16 16  0  0  0  0  0  0
      1536  msec/iter =   33.98  ROE[avg,max] = [0.239299665, 0.281250000]  radices =  24 32 32 32  0  0  0  0  0  0
      1664  msec/iter =   38.98  ROE[avg,max] = [0.261802455, 0.312500000]  radices =  52 16 32 32  0  0  0  0  0  0
      1792  msec/iter =   40.84  ROE[avg,max] = [0.267229353, 0.312500000]  radices =  28 32 32 32  0  0  0  0  0  0
      1920  msec/iter =   45.63  ROE[avg,max] = [0.243638393, 0.281250000]  radices =  60 16 32 32  0  0  0  0  0  0
      2048  msec/iter =   45.92  ROE[avg,max] = [0.257366071, 0.257812500]  radices =  32 32 32 32  0  0  0  0  0  0
      2304  msec/iter =   54.36  ROE[avg,max] = [0.236948940, 0.281250000]  radices =  36 32 32 32  0  0  0  0  0  0
      2560  msec/iter =   54.64  ROE[avg,max] = [0.255691964, 0.312500000]  radices =  40 32 32 32  0  0  0  0  0  0
      2816  msec/iter =   63.06  ROE[avg,max] = [0.283956473, 0.343750000]  radices =  44 32 32 32  0  0  0  0  0  0
      3072  msec/iter =   67.77  ROE[avg,max] = [0.233879743, 0.265625000]  radices =  48 32 32 32  0  0  0  0  0  0
      3328  msec/iter =   74.36  ROE[avg,max] = [0.267947824, 0.312500000]  radices =  52 32 32 32  0  0  0  0  0  0
      3584  msec/iter =   79.71  ROE[avg,max] = [0.267689732, 0.301757812]  radices =  56 32 32 32  0  0  0  0  0  0
      3840  msec/iter =   87.04  ROE[avg,max] = [0.242107282, 0.281250000]  radices =  60 32 32 32  0  0  0  0  0  0
      4096  msec/iter =   92.87  ROE[avg,max] = [0.263169643, 0.281250000]  radices =  64 32 32 32  0  0  0  0  0  0
      4608  msec/iter =  106.31  ROE[avg,max] = [0.238187081, 0.281250000]  radices = 288 16 16 32  0  0  0  0  0  0
      5120  msec/iter =  116.95  ROE[avg,max] = [0.241458566, 0.312500000]  radices = 160 16 32 32  0  0  0  0  0  0
      5632  msec/iter =  147.80  ROE[avg,max] = [0.278641183, 0.312500000]  radices = 176 16 32 32  0  0  0  0  0  0
      6144  msec/iter =  150.32  ROE[avg,max] = [0.247349330, 0.281250000]  radices = 192 16 32 32  0  0  0  0  0  0
      6656  msec/iter =  164.51  ROE[avg,max] = [0.250781250, 0.289062500]  radices = 208 16 32 32  0  0  0  0  0  0
      7168  msec/iter =  172.77  ROE[avg,max] = [0.277169364, 0.343750000]  radices = 224 16 32 32  0  0  0  0  0  0
      7680  msec/iter =  191.50  ROE[avg,max] = [0.253627232, 0.281250000]  radices = 240 16 32 32  0  0  0  0  0  0
I also reran the Mprime 28.7 benchmark:
Code:
[Tue Mar 14 19:28:48 2017]
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Intel(R) Core(TM)2 Duo CPU     E7400  @ 2.80GHz
CPU speed: 2800.02 MHz, 2 cores
CPU features: Prefetch, SSE, SSE2, SSE4
L1 cache size: 32 KB
L2 cache size: 3 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 256
Prime95 64-bit version 28.7, RdtscTiming=1
Best time for 1024K FFT length: 16.199 ms., avg: 16.704 ms.
Best time for 1280K FFT length: 20.961 ms., avg: 21.575 ms.
Best time for 1536K FFT length: 26.163 ms., avg: 27.718 ms.
Best time for 1792K FFT length: 30.755 ms., avg: 32.141 ms.
Best time for 2048K FFT length: 34.946 ms., avg: 38.731 ms.
Best time for 2560K FFT length: 43.191 ms., avg: 46.909 ms.
Best time for 3072K FFT length: 53.965 ms., avg: 59.120 ms.
Best time for 3584K FFT length: 69.864 ms., avg: 83.959 ms.
Best time for 4096K FFT length: 71.973 ms., avg: 72.495 ms.
Best time for 5120K FFT length: 87.800 ms., avg: 88.870 ms.
Best time for 6144K FFT length: 110.473 ms., avg: 111.362 ms.
Best time for 7168K FFT length: 131.831 ms., avg: 132.743 ms.
Best time for 8192K FFT length: 146.812 ms., avg: 147.631 ms.
Timing FFTs using 2 threads.
Best time for 1024K FFT length: 15.401 ms., avg: 15.644 ms.
Best time for 1280K FFT length: 18.143 ms., avg: 19.026 ms.
Best time for 1536K FFT length: 21.927 ms., avg: 22.995 ms.
Best time for 1792K FFT length: 26.605 ms., avg: 27.481 ms.
Best time for 2048K FFT length: 30.460 ms., avg: 31.351 ms.
Best time for 2560K FFT length: 38.699 ms., avg: 39.689 ms.
Best time for 3072K FFT length: 47.988 ms., avg: 49.353 ms.
Best time for 3584K FFT length: 85.181 ms., avg: 85.865 ms.
Best time for 4096K FFT length: 62.209 ms., avg: 66.705 ms.
Best time for 5120K FFT length: 79.554 ms., avg: 80.260 ms.
Best time for 6144K FFT length: 92.489 ms., avg: 94.000 ms.
Best time for 7168K FFT length: 116.309 ms., avg: 119.709 ms.
Best time for 8192K FFT length: 125.236 ms., avg: 128.261 ms.

Timings for 1024K FFT length (1 cpu, 1 worker): 16.37 ms.  Throughput: 61.08 iter/sec.
Timings for 1024K FFT length (2 cpus, 2 workers): 30.59, 31.69 ms.  Throughput: 64.25 iter/sec.
Timings for 1280K FFT length (1 cpu, 1 worker): 21.24 ms.  Throughput: 47.07 iter/sec.
Timings for 1280K FFT length (2 cpus, 2 workers): 37.86, 39.14 ms.  Throughput: 51.96 iter/sec.
Timings for 1536K FFT length (1 cpu, 1 worker): 26.08 ms.  Throughput: 38.34 iter/sec.
Timings for 1536K FFT length (2 cpus, 2 workers): 45.43, 47.68 ms.  Throughput: 42.99 iter/sec.
Timings for 1792K FFT length (1 cpu, 1 worker): 31.05 ms.  Throughput: 32.21 iter/sec.
Timings for 1792K FFT length (2 cpus, 2 workers): 52.50, 53.32 ms.  Throughput: 37.81 iter/sec.
Timings for 2048K FFT length (1 cpu, 1 worker): 35.05 ms.  Throughput: 28.53 iter/sec.
Timings for 2048K FFT length (2 cpus, 2 workers): 61.40, 63.17 ms.  Throughput: 32.12 iter/sec.
Timings for 2560K FFT length (1 cpu, 1 worker): 43.36 ms.  Throughput: 23.06 iter/sec.
Timings for 2560K FFT length (2 cpus, 2 workers): 77.50, 79.16 ms.  Throughput: 25.54 iter/sec.
Timings for 3072K FFT length (1 cpu, 1 worker): 53.71 ms.  Throughput: 18.62 iter/sec.
Timings for 3072K FFT length (2 cpus, 2 workers): 96.11, 97.25 ms.  Throughput: 20.69 iter/sec.
Timings for 3584K FFT length (1 cpu, 1 worker): 67.86 ms.  Throughput: 14.74 iter/sec.
Timings for 3584K FFT length (2 cpus, 2 workers): 164.50, 169.02 ms.  Throughput: 12.00 iter/sec.
Timings for 4096K FFT length (1 cpu, 1 worker): 71.87 ms.  Throughput: 13.91 iter/sec.
[Tue Mar 14 19:33:59 2017]
Timings for 4096K FFT length (2 cpus, 2 workers): 127.57, 128.14 ms.  Throughput: 15.64 iter/sec.
Timings for 5120K FFT length (1 cpu, 1 worker): 87.87 ms.  Throughput: 11.38 iter/sec.
Timings for 5120K FFT length (2 cpus, 2 workers): 153.62, 158.10 ms.  Throughput: 12.83 iter/sec.
Timings for 6144K FFT length (1 cpu, 1 worker): 110.52 ms.  Throughput:  9.05 iter/sec.
Timings for 6144K FFT length (2 cpus, 2 workers): 187.40, 186.73 ms.  Throughput: 10.69 iter/sec.
Timings for 7168K FFT length (1 cpu, 1 worker): 132.18 ms.  Throughput:  7.57 iter/sec.
Timings for 7168K FFT length (2 cpus, 2 workers): 236.89, 243.20 ms.  Throughput:  8.33 iter/sec.
Timings for 8192K FFT length (1 cpu, 1 worker): 151.83 ms.  Throughput:  6.59 iter/sec.
Timings for 8192K FFT length (2 cpus, 2 workers): 263.17, 260.16 ms.  Throughput:  7.64 iter/sec.
BTW: Is it possible to compile and run Mlucas on Windows 7/10? If so, I could try to run benchmarks on my i5 2500k and/or i7 3770k.

Last fiddled with by VictordeHolland on 2017-03-14 at 18:50 Reason: Mlucas on Windows????