20140920, 00:10  #23  
∂^{2}ω=0
Sep 2002
República de California
2×13×443 Posts 
Quote:


20141213, 05:05  #24 
∂^{2}ω=0
Sep 2002
República de California
10110011111110_{2} Posts 
V14.1 is available; details via the readme-file link in the opening post.

20141213, 06:02  #25 
Romulan Interpreter
Jun 2011
Thailand
2220_{16} Posts 
How does the newer version compare with P95? I mean, I have read your "less than two times slower" stuff there, but I assume that is a figure of speech...
(Hey, I am the guy who DCed Mike's work, remember?)
20141213, 07:39  #26  
∂^{2}ω=0
Sep 2002
República de California
2×13×443 Posts 
Quote:
Code:
FFT(K)   msec/iter (4-threaded)
1024      2.65
1152      3.15
1280      3.43
1408      4.01
1536      4.19
1664      4.61
1792      4.81
1920      5.29
2048      5.35
2304      6.07
2560      6.51
2816      7.54
3072      8.40
3328      8.74
3584      9.13
3840     10.16
4096     10.54
4608     11.98
5120     13.80
5632     15.92
6144     17.54
6656     18.62
7168     19.69
7680     22.00

20141213, 12:32  #27 
Jan 2008
France
2^{4}×3×11 Posts 
For comparison, http://mersenneforum.org/showpost.ph...&postcount=633
i5-4670K @ 3.8 GHz, Dual DDR3 1600
Code:
Best time for 1024K FFT length: 1.336 ms., avg: 1.374 ms.
Best time for 1280K FFT length: 1.839 ms., avg: 1.865 ms.
Best time for 1536K FFT length: 2.333 ms., avg: 2.370 ms.
Best time for 1792K FFT length: 2.833 ms., avg: 3.277 ms.
Best time for 2048K FFT length: 3.350 ms., avg: 3.374 ms.
Best time for 2560K FFT length: 4.239 ms., avg: 4.276 ms.
Best time for 3072K FFT length: 5.124 ms., avg: 5.155 ms.
Best time for 3584K FFT length: 6.006 ms., avg: 6.042 ms.
Best time for 4096K FFT length: 6.970 ms., avg: 7.000 ms.
Best time for 5120K FFT length: 8.705 ms., avg: 8.745 ms.
Best time for 6144K FFT length: 10.496 ms., avg: 10.543 ms.
Best time for 7168K FFT length: 12.371 ms., avg: 12.451 ms.
Best time for 8192K FFT length: 14.673 ms., avg: 14.735 ms.
20141213, 22:12  #28  
∂^{2}ω=0
Sep 2002
República de California
2×13×443 Posts 
Quote:
BTW, if anyone has access to a Broadwell system running Linux (or MinGW-w64 under Windoze), I'd very much appreciate timings on such, and I have some special preprocessor flags to try for Broadwell as well.

20141214, 11:03  #29  
Jan 2008
France
528_{10} Posts 
Quote:
Quote:


20141214, 12:05  #30 
Jan 2008
France
2^{4}×3×11 Posts 
I gave Mlucas a try on my i7-4770K.
Code:
gcc -c -Os -m64 -DUSE_AVX2 -DUSE_THREADS *.c
rm -f rng*.o util.o qfloat.o
gcc -c -O1 -m64 -DUSE_AVX2 -DUSE_THREADS rng*.c util.c qfloat.c
gcc -o Mlucas *.o -lm -lpthread -lrt
./Mlucas -fftlen 192 -iters 100 -radset 0 -nthread 2
...
100 iterations of M3888517 with FFT length 196608 = 192 K
Res64: 579D593FCE0707B2. AvgMaxErr = 0.274916295. MaxErr = 0.343750000. Program: E14.1
Res mod 2^36     = 67881076658
Res mod 2^35 - 1 = 21674900403
Res mod 2^36 - 1 = 42893438228
Code:
This particular testcase should produce the following 100-iteration residues, with some platform-dependent variability in the roundoff errors:
100 iterations of M3888509 with FFT length 196608 = 192 K
Res64: 71E61322CCFB396C. AvgMaxErr = 0.226967076. MaxErr = 0.281250000. Program: E3.0x
Res mod 2^36     = 12028950892
Res mod 2^35 - 1 = 29259839105
Res mod 2^36 - 1 = 50741070790
How do you get output similar to the Prime95 benchmark?
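One quick sanity check on the two outputs quoted above: in both of them, the reported "Res mod 2^36" value is simply the low 36 bits of the Res64 hex residue. The sketch below verifies that for the numbers copied from this post. The helper name `low_bits` is made up for illustration and is not part of Mlucas; the mod 2^35 - 1 and mod 2^36 - 1 values depend on the full residue and cannot be rechecked from Res64 alone.

```python
def low_bits(res64_hex: str, nbits: int = 36) -> int:
    """Return the low `nbits` bits of a 64-bit residue given as a hex string."""
    return int(res64_hex, 16) & ((1 << nbits) - 1)


# (Res64, reported "Res mod 2^36") pairs copied from the two outputs above.
reported = {
    "579D593FCE0707B2": 67881076658,  # M3888517 run, Program: E14.1
    "71E61322CCFB396C": 12028950892,  # M3888509 reference output, Program: E3.0x
}

if __name__ == "__main__":
    for res64, mod36 in reported.items():
        status = "ok" if low_bits(res64) == mod36 else "MISMATCH"
        print(f"Res64 {res64}: low 36 bits = {low_bits(res64)} ({status})")
```

The two Res64 values differ because the runs used different exponents (M3888517 versus M3888509), not because either output is internally inconsistent.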
20141217, 02:30  #31  
∂^{2}ω=0
Sep 2002
República de California
2×13×443 Posts 
Quote:
./Mlucas -m 3888509 -fftlen 192 -iters 100 -radset 0 -nthread 2
you will see the result indicated on the webpage (which I have since corrected). Thanks for the catch.
Quote:
./Mlucas -s m -iters 1000
1000 iters gives cleaner timings (and better roundoff testing) than the "quick look" 100-iter tests. With no #threads specified the code will use all the physical cores on your system. The README page discusses all this stuff.

20141221, 07:34  #32  
∂^{2}ω=0
Sep 2002
República de California
2×13×443 Posts 
Quote:
Quote:
Here are the 4-threaded results for my Haswell system:
[Worker #1 Dec 19 16:21] Timing FFTs using 4 threads.
[Worker #1 Dec 19 16:21] Timing 39 iterations of 1024K FFT length. Best time: 1.293 ms., avg time: 1.344 ms.
[Worker #1 Dec 19 16:21] Timing 31 iterations of 1280K FFT length. Best time: 1.825 ms., avg time: 1.850 ms.
[Worker #1 Dec 19 16:21] Timing 26 iterations of 1536K FFT length. Best time: 1.993 ms., avg time: 2.305 ms.
[Worker #1 Dec 19 16:21] Timing 25 iterations of 1792K FFT length. Best time: 2.317 ms., avg time: 2.356 ms.
[Worker #1 Dec 19 16:21] Timing 25 iterations of 2048K FFT length. Best time: 2.766 ms., avg time: 2.785 ms.
[Worker #1 Dec 19 16:21] Timing 25 iterations of 2560K FFT length. Best time: 3.462 ms., avg time: 3.500 ms.
[Worker #1 Dec 19 16:21] Timing 25 iterations of 3072K FFT length. Best time: 4.141 ms., avg time: 4.190 ms.
[Worker #1 Dec 19 16:21] Timing 25 iterations of 3584K FFT length. Best time: 4.957 ms., avg time: 5.009 ms.
[Worker #1 Dec 19 16:21] Timing 25 iterations of 4096K FFT length. Best time: 5.639 ms., avg time: 5.722 ms.
[Worker #1 Dec 19 16:21] Timing 25 iterations of 5120K FFT length. Best time: 7.151 ms., avg time: 7.202 ms.
[Worker #1 Dec 19 16:21] Timing 25 iterations of 6144K FFT length. Best time: 8.471 ms., avg time: 8.639 ms.
[Worker #1 Dec 19 16:21] Timing 25 iterations of 7168K FFT length. Best time: 10.197 ms., avg time: 10.272 ms.
[Worker #1 Dec 19 16:21] Timing 25 iterations of 8192K FFT length. Best time: 11.917 ms., avg time: 11.952 ms.
Now assembling the average times for 4-threaded Prime95 and Mlucas at the above FFT lengths, plus the intermediate radix-9/11/13/15-based ones supported by Mlucas. This is an update of the previous table, now using 10000-iter timings run after a reboot, right after which I ran the above Prime95 timing test. The last column is the resulting [Mlucas/Prime95] timing ratio; for cases where the FFT length in question is not supported by Prime95, its timing at the next-higher length is used as the denominator:
Code:
FFTlen   Prime95     Mlucas      Timing Ratio
(Kdbl)   msec/iter   msec/iter   [Mlucas/P95]
1024      1.344       2.60       1.93
1152      -           3.13       1.69
1280      1.850       3.56       1.92
1408      -           3.98       1.73
1536      2.305       4.02       1.74
1664      -           4.63       1.97
1792      2.356       4.70       1.99
1920      -           5.29       1.90
2048      2.785       5.29       1.90
2304      -           6.00       1.71
2560      3.500       6.44       1.84
2816      -           7.47       1.78
3072      4.190       8.25       1.97
3328      -           8.84       1.76
3584      5.009       9.02       1.80
3840      -          10.06       1.76
4096      5.722      10.46       1.83
4608      -          11.78       1.64
5120      7.202      13.47       1.87
5632      -          15.52       1.80
6144      8.639      17.40       2.01
6656      -          18.48       1.80
7168     10.272      19.02       1.85
7680      -          21.49       1.80
8192     11.952      22.33       1.87
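For anyone wanting to redo this comparison with their own numbers, here is a minimal sketch of how the ratio column can be recomputed, including the "next-higher Prime95 length" rule for FFT lengths Prime95 does not support. The timings are copied from the table in this post; the function and variable names are made up for illustration and come from neither program.

```python
# Average 4-threaded timings in msec/iter, keyed by FFT length in Kdoubles.
prime95 = {
    1024: 1.344, 1280: 1.850, 1536: 2.305, 1792: 2.356, 2048: 2.785,
    2560: 3.500, 3072: 4.190, 3584: 5.009, 4096: 5.722, 5120: 7.202,
    6144: 8.639, 7168: 10.272, 8192: 11.952,
}
mlucas = {
    1024: 2.60, 1152: 3.13, 1280: 3.56, 1408: 3.98, 1536: 4.02,
    1664: 4.63, 1792: 4.70, 1920: 5.29, 2048: 5.29, 2304: 6.00,
    2560: 6.44, 2816: 7.47, 3072: 8.25, 3328: 8.84, 3584: 9.02,
    3840: 10.06, 4096: 10.46, 4608: 11.78, 5120: 13.47, 5632: 15.52,
    6144: 17.40, 6656: 18.48, 7168: 19.02, 7680: 21.49, 8192: 22.33,
}


def p95_denominator(fft_k, p95):
    """Prime95 timing at fft_k, or at the next-higher length it supports."""
    return p95[min(k for k in p95 if k >= fft_k)]


def timing_ratios(ml, p95):
    """Map each Mlucas FFT length to its Mlucas/Prime95 ratio, rounded to 2 places."""
    return {k: round(t / p95_denominator(k, p95), 2) for k, t in sorted(ml.items())}


if __name__ == "__main__":
    for fft_k, ratio in timing_ratios(mlucas, prime95).items():
        print(f"{fft_k:5d}K  {ratio:.2f}")
```

Using the next-higher length as the denominator reflects the fact that a Prime95 user whose exponent needs, say, a 1152K transform would have to run it at 1280K instead.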

20150522, 06:40  #33 
∂^{2}ω=0
Sep 2002
República de California
2×13×443 Posts 
Here is the head-to-head comparison on my new Xyzzy-built Broadwell (i3) NUC, both programs run 4-threaded on the 2 physical cores of the system (that setup gives the best per-iteration timing for both on this system); these timings and ratios can be compared to the Haswell ones in the above post:
Code:
FFTlen   Prime95     Mlucas      Timing Ratio
(Kdbl)   msec/iter   msec/iter   [Mlucas/P95]   Comments
1024      3.894       6.869      1.76
1152      4.634       8.294      1.79
1280      4.990       8.702      1.74
1408      5.502      10.118      1.84           [Prime95 1440K]
1536      6.203      10.298      1.66
1664      6.506      11.562      1.78           [Prime95: average of the 1600K and 1728K timings]
1792      7.473      11.904      1.59
1920      7.843      13.186      1.68
2048      7.898      13.946      1.77
2304      8.889      15.846      1.78
2560      9.930      17.281      1.74
2816     11.369      19.931      1.75           [Prime95 2880K]
3072     12.465      22.373      1.79
3328     13.688      23.541      1.72           [Prime95 3360K]
3584     14.567      25.318      1.74
3840     16.079      27.987      1.74
4096     16.917      29.488      1.74
4608     19.762      34.077      1.72
5120     21.736      37.573      1.73
5632     25.657      43.197      1.68           [Prime95 5760K]
6144     26.867      50.179      1.87
6656     30.958      51.091      1.65           [Prime95 6720K]
7168     32.399      54.929      1.70
7680     34.025      60.411      1.78
8192     34.791      65.911      1.89
Avg:                             1.75
Last fiddled with by ewmayer on 20150522 at 06:41
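As a small cross-check of the "Avg: 1.75" line, the sketch below takes the 25 ratios as listed in the Broadwell table (already rounded to two places) and computes their mean and spread. The names are illustrative only.

```python
from statistics import mean

# [Mlucas/P95] ratios copied from the Broadwell table, keyed by FFT length (Kdbl).
broadwell_ratio = {
    1024: 1.76, 1152: 1.79, 1280: 1.74, 1408: 1.84, 1536: 1.66,
    1664: 1.78, 1792: 1.59, 1920: 1.68, 2048: 1.77, 2304: 1.78,
    2560: 1.74, 2816: 1.75, 3072: 1.79, 3328: 1.72, 3584: 1.74,
    3840: 1.74, 4096: 1.74, 4608: 1.72, 5120: 1.73, 5632: 1.68,
    6144: 1.87, 6656: 1.65, 7168: 1.70, 7680: 1.78, 8192: 1.89,
}

avg_ratio = mean(broadwell_ratio.values())
best_len = min(broadwell_ratio, key=broadwell_ratio.get)   # smallest ratio
worst_len = max(broadwell_ratio, key=broadwell_ratio.get)  # largest ratio

if __name__ == "__main__":
    print(f"mean ratio : {avg_ratio:.2f}")
    print(f"best case  : {best_len}K at {broadwell_ratio[best_len]:.2f}")
    print(f"worst case : {worst_len}K at {broadwell_ratio[worst_len]:.2f}")
```

On these figures the ratio ranges from 1.59 (1792K) to 1.89 (8192K) around a mean that rounds to 1.75.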