#1
∂²ω=0
Sep 2002
República de California
5·2,351 Posts
The most recent official release is always available at
http://www.mersenneforum.org/mayer/README.html

I'll announce major updates/bugfixes/new-prebuilt-binaries on this thread.

=======================

06 November 2009: An Alpha version of Mlucas 3.0 is available at the above page.

Major new features:

- SSE2 support for Win32 and 32- and 64-bit Linux. Thanks to my late-in-life conversion to assembly coding I'm a few years behind George on this, but I think it's not too shabby for a first go. It's a bit slower than Prime95 cycle-for-cycle, but I'd appreciate it if some folks would be willing to give up a bit of throughput in order to help test the software. Suggestions for speedups from the ASM experts are especially welcome.
- Platform-independent savefile support.
- Coming soon: Trial-factoring support.
- Coming soon: PrimeNet support.
- Coming later: Multithreading support for SSE2 code. (This is more important for new-prime verification than for GIMPS users.)
- Coming later: Qt-based GUI.

Let me know if you have any download/test/build issues,
-Ernst

Last fiddled with by ewmayer on 2017-07-03 at 00:43. Reason: url updated to reflect ftp-site migration
#2
∂²ω=0
Sep 2002
República de California
5·2,351 Posts
[crickets chirping]
#3
Aug 2002
Termonfeckin, IE
2⁴×173 Posts
Be happy to test. Note that an edit of an existing post is not caught by the "new post" mechanism. Send me a PM.
#4
∂²ω=0
Sep 2002
República de California
10110111101011₂ Posts
All the code and build/run instructions are at the above link, so let me know how it goes, whether you think the readme page could be clearer about anything, etc.

BTW, I've been running the new code (even as I continued to expand and improve the SSE2 support) more or less continuously for the past 18 months, first on my Win32 box, then (after its fan died and I bought a MacBook for my 64-bit Linux port work) on my 6-month-old MacBook, so I have full confidence in the stability and functional correctness of the LL-test core.

At this point the coming year will be all about adding PrimeNet support, speeding up the trial-factoring capability (also already thoroughly tested) enough to make it releaseworthy, and (hopefully) squeezing some extra speed out of the inline assembler by way of detailed profiling and playing with stuff like prefetch, TLB priming, etc.

Last fiddled with by ewmayer on 2009-11-09 at 16:12
#5
P90 years forever!
Aug 2002
Yeehaw, FL
8165₁₀ Posts
#6
Jun 2003
Ottawa, Canada
3·17·23 Posts
Yay, it is finally publicly available. I will take a look at this in a couple of weeks after things settle down a bit here.
#7
Jan 2008
France
596₁₀ Posts
Here is the result of -s m/l on my i7 920 (stock speed) running x86_64; the compiler is gcc 4.4.1.
Code:
1024 sec/iter = 0.028 ROE[min,max] = [0.250000000, 0.312500000] radices = 32 16 32 32 0 0 0 0 0 0
1152 sec/iter = 0.033 ROE[min,max] = [0.250000000, 0.250000000] radices = 36 32 32 16 0 0 0 0 0 0
1280 sec/iter = 0.037 ROE[min,max] = [0.250000000, 0.343750000] radices = 20 32 32 32 0 0 0 0 0 0
1408 sec/iter = 0.042 ROE[min,max] = [0.312500000, 0.312500000] radices = 44 16 32 32 0 0 0 0 0 0
1536 sec/iter = 0.045 ROE[min,max] = [0.265625000, 0.269042969] radices = 24 32 32 32 0 0 0 0 0 0
1792 sec/iter = 0.055 ROE[min,max] = [0.312500000, 0.312500000] radices = 28 32 32 32 0 0 0 0 0 0
2048 sec/iter = 0.061 ROE[min,max] = [0.281250000, 0.343750000] radices = 16 16 16 16 16 0 0 0 0 0
2304 sec/iter = 0.072 ROE[min,max] = [0.242187500, 0.281250000] radices = 36 32 32 32 0 0 0 0 0 0
2560 sec/iter = 0.078 ROE[min,max] = [0.281250000, 0.312500000] radices = 20 16 16 16 16 0 0 0 0 0
2816 sec/iter = 0.093 ROE[min,max] = [0.328125000, 0.343750000] radices = 44 32 32 32 0 0 0 0 0 0
3072 sec/iter = 0.098 ROE[min,max] = [0.250000000, 0.250000000] radices = 24 16 16 16 16 0 0 0 0 0
3584 sec/iter = 0.114 ROE[min,max] = [0.281250000, 0.281250000] radices = 28 16 16 16 16 0 0 0 0 0
4096 sec/iter = 0.122 ROE[min,max] = [0.250000000, 0.312500000] radices = 16 16 16 16 32 0 0 0 0 0
4608 sec/iter = 0.147 ROE[min,max] = [0.257812500, 0.257812500] radices = 36 16 16 16 16 0 0 0 0 0
5120 sec/iter = 0.157 ROE[min,max] = [0.281250000, 0.312500000] radices = 20 16 16 16 32 0 0 0 0 0
5632 sec/iter = 0.191 ROE[min,max] = [0.375000000, 0.375000000] radices = 44 16 16 16 16 0 0 0 0 0
6144 sec/iter = 0.198 ROE[min,max] = [0.250000000, 0.296875000] radices = 24 16 16 16 32 0 0 0 0 0
7168 sec/iter = 0.232 ROE[min,max] = [0.268554688, 0.281250000] radices = 28 16 16 16 32 0 0 0 0 0
8192 sec/iter = 0.253 ROE[min,max] = [0.281250000, 0.312500000] radices = 16 16 16 32 32 0 0 0 0 0

Last fiddled with by ldesnogu on 2009-11-09 at 22:15
#8
∂²ω=0
Sep 2002
República de California
10110111101011₂ Posts
Thanks, Laurent. Interesting that FFT lengths of the form 11*2^k are actually (modestly) useful on your 920. On both my Core2-based machines (WinXP/32-bit/MSVC and MacOS/64-bit/GCC-4.2) those are slower than the next-larger FFT length, often by quite a lot; you can see this in the sample timing tables on my README page. Your timings are much closer to what I would expect based on arithmetic opcount: since data access patterns and memory footprints are similar, I expected opcount would be the main determinant of timing across a variety of platforms. (It is, except for the "surprise" I got with the 11*2^k data.)
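To make the 11*2^k observation concrete, here is a minimal Python sketch that checks it against the sec/iter figures posted in #7. It is not part of Mlucas; the helper names are made up for illustration, and dividing by N*log2(N) is only a crude stand-in for the real arithmetic opcount (an assumption, since the actual opcount depends on the radix set).

```python
import math

# (FFT length in K, sec/iter) copied from the i7 920 table in post #7.
timings = {
    1024: 0.028, 1152: 0.033, 1280: 0.037, 1408: 0.042, 1536: 0.045,
    1792: 0.055, 2048: 0.061, 2304: 0.072, 2560: 0.078, 2816: 0.093,
    3072: 0.098, 3584: 0.114, 4096: 0.122, 4608: 0.147, 5120: 0.157,
    5632: 0.191, 6144: 0.198, 7168: 0.232, 8192: 0.253,
}

def is_11_times_power_of_two(k):
    """True if k = 11 * 2^j for some integer j >= 0 (hypothetical helper)."""
    if k % 11:
        return False
    q = k // 11
    return q & (q - 1) == 0

def next_larger(k, table):
    """Smallest tabulated FFT length above k, or None if k is the largest."""
    larger = [n for n in table if n > k]
    return min(larger) if larger else None

def normalized_cost(k, sec):
    """sec/iter divided by N*log2(N); an assumed, crude proxy for opcount."""
    n = k * 1024
    return sec / (n * math.log2(n))

# An 11*2^k length is "useful" if it beats the next-larger length.
useful = {}
for k in sorted(timings):
    if is_11_times_power_of_two(k):
        nxt = next_larger(k, timings)
        useful[k] = timings[k] < timings[nxt]
        print(f"{k}K: {timings[k]:.3f} s vs {nxt}K: {timings[nxt]:.3f} s, useful={useful[k]}")

# How flat is the timing once the N*log2(N) scaling is divided out?
costs = [normalized_cost(k, s) for k, s in timings.items()]
spread = max(costs) / min(costs)
print(f"normalized cost spread (max/min): {spread:.2f}")
```

On these numbers all three 11*2^k lengths (1408K, 2816K, 5632K) come out a few milliseconds per iteration ahead of the next-larger length, which is the "modestly useful" case described above.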
#9
Jan 2008
France
254₁₆ Posts
At first I thought it could be some compiler issue but running your executable (compiled with gcc 4.2.1) gives very similar results:
Code:
1024 sec/iter = 0.028 ROE[min,max] = [0.250000000, 0.312500000] radices = 32 16 32 32 0 0 0 0 0 0
1152 sec/iter = 0.034 ROE[min,max] = [0.250000000, 0.250000000] radices = 36 32 32 16 0 0 0 0 0 0
1280 sec/iter = 0.037 ROE[min,max] = [0.250000000, 0.343750000] radices = 20 32 32 32 0 0 0 0 0 0
1408 sec/iter = 0.042 ROE[min,max] = [0.312500000, 0.312500000] radices = 44 16 32 32 0 0 0 0 0 0
1536 sec/iter = 0.045 ROE[min,max] = [0.265625000, 0.269042969] radices = 24 32 32 32 0 0 0 0 0 0
1792 sec/iter = 0.056 ROE[min,max] = [0.312500000, 0.312500000] radices = 28 32 32 32 0 0 0 0 0 0
2048 sec/iter = 0.060 ROE[min,max] = [0.281250000, 0.343750000] radices = 16 16 16 16 16 0 0 0 0 0
2304 sec/iter = 0.073 ROE[min,max] = [0.242187500, 0.281250000] radices = 36 32 32 32 0 0 0 0 0 0
2560 sec/iter = 0.077 ROE[min,max] = [0.281250000, 0.312500000] radices = 20 16 16 16 16 0 0 0 0 0
2816 sec/iter = 0.094 ROE[min,max] = [0.328125000, 0.343750000] radices = 44 32 32 32 0 0 0 0 0 0
3072 sec/iter = 0.097 ROE[min,max] = [0.250000000, 0.250000000] radices = 24 16 16 16 16 0 0 0 0 0
3584 sec/iter = 0.114 ROE[min,max] = [0.281250000, 0.281250000] radices = 28 16 16 16 16 0 0 0 0 0
4096 sec/iter = 0.122 ROE[min,max] = [0.250000000, 0.312500000] radices = 16 16 16 16 32 0 0 0 0 0
4608 sec/iter = 0.147 ROE[min,max] = [0.257812500, 0.257812500] radices = 36 16 16 16 16 0 0 0 0 0
5120 sec/iter = 0.156 ROE[min,max] = [0.281250000, 0.312500000] radices = 20 16 16 16 32 0 0 0 0 0
5632 sec/iter = 0.193 ROE[min,max] = [0.375000000, 0.375000000] radices = 44 16 16 16 16 0 0 0 0 0
6144 sec/iter = 0.196 ROE[min,max] = [0.250000000, 0.296875000] radices = 24 16 16 16 32 0 0 0 0 0
7168 sec/iter = 0.231 ROE[min,max] = [0.268554688, 0.281250000] radices = 28 16 16 16 32 0 0 0 0 0
8192 sec/iter = 0.252 ROE[min,max] = [0.281250000, 0.312500000] radices = 16 16 16 32 32 0 0 0 0 0

Last fiddled with by ldesnogu on 2009-11-09 at 23:16
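For anyone who wants to quantify "very similar", a small Python sketch comparing the two runs follows. The sec/iter values are copied from the two tables in #7 and #9; the variable names are invented for this illustration. Keep in mind the timings are printed to three decimals, so a difference of 0.001 is at the resolution limit of the output.

```python
# FFT lengths in K, shared by both tables.
lengths_k = [1024, 1152, 1280, 1408, 1536, 1792, 2048, 2304, 2560, 2816,
             3072, 3584, 4096, 4608, 5120, 5632, 6144, 7168, 8192]

# sec/iter from post #7 (own build, gcc 4.4.1).
own_build = [0.028, 0.033, 0.037, 0.042, 0.045, 0.055, 0.061, 0.072, 0.078, 0.093,
             0.098, 0.114, 0.122, 0.147, 0.157, 0.191, 0.198, 0.232, 0.253]

# sec/iter from post #9 (prebuilt executable, gcc 4.2.1).
prebuilt = [0.028, 0.034, 0.037, 0.042, 0.045, 0.056, 0.060, 0.073, 0.077, 0.094,
            0.097, 0.114, 0.122, 0.147, 0.156, 0.193, 0.196, 0.231, 0.252]

# Relative difference of the prebuilt run against the own-build run, per length.
rel_diff = {k: abs(b - a) / a for k, a, b in zip(lengths_k, own_build, prebuilt)}

worst_k = max(rel_diff, key=rel_diff.get)
max_abs = max(abs(b - a) for a, b in zip(own_build, prebuilt))

print(f"largest relative difference: {rel_diff[worst_k]:.1%} at {worst_k}K")
print(f"largest absolute difference: {max_abs:.3f} sec/iter")
```

The per-length differences are small enough that the compiler version does not look like the explanation for the 11*2^k behavior on this machine.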
#10
May 2009
7 Posts
Congratulations on this milestone!
May I ask about the roadmap for the RISC versions of Mlucas? It is fully understandable why they wouldn't be a priority, but one can still hope, right? A feature like PrimeNet integration would be an awesome advance!
-smoky
#11
Jul 2006
Calgary
5²·17 Posts
While trying Mlucas 3.0x (binary download for Linux 64)
./Mlucas_AMD64 -s a
on an AMD Sempron 64 on 2.6.26-2-amd64 x86_64 GNU/Linux
model name : AMD Sempron(tm) Processor 2600+
stepping : 2
cpu MHz : 1600.059
cache size : 128 KB

It ran all the way through the full set of sizes on the first try, but mprime was running in the background, so I deleted mlucas.cfg and tried again just to see if it was different. It crashes now at:
M4521557: using FFT length 224K = 229376 8-byte floats.
this gives an average 19.712424142020090 bits per digit
Using complex FFT radices 28 16 16 16
Segmentation fault

3 tries, always the same place. I tried again with mprime in the background and it crashes again, same place.

Trying it now with -s m failed at:
M34573867: using FFT length 1792K = 1835008 8-byte floats.
this gives an average 18.841262272426061 bits per digit
Using complex FFT radices 28 8 16 16 16
Segmentation fault

With -s l:
M134113933: using FFT length 7168K = 7340032 8-byte floats.
this gives an average 18.271573339189803 bits per digit
Using complex FFT radices 28 32 16 16 16
Segmentation fault

Seems like a problem with the radix 28?

Last fiddled with by lfm on 2009-11-12 at 10:44
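The numbers in the three crash reports can be cross-checked with a short Python sketch. It assumes (from the logs themselves, not from the Mlucas source) that "bits per digit" is the exponent divided by the real FFT length, and that the listed complex radices multiply out to half that length. The function names are hypothetical.

```python
from math import prod

# (exponent p, FFT length in K, complex FFT radices) from the three crash logs above.
crashes = [
    (4521557, 224, [28, 16, 16, 16]),
    (34573867, 1792, [28, 8, 16, 16, 16]),
    (134113933, 7168, [28, 32, 16, 16, 16]),
]

def bits_per_digit(p, fft_k):
    """Average bits per digit, assuming p / (number of 8-byte floats)."""
    return p / (fft_k * 1024)

def radices_match_length(fft_k, radices):
    """Assumed relation: product of complex radices equals half the real length."""
    return prod(radices) == fft_k * 1024 // 2

for p, fft_k, radices in crashes:
    print(f"M{p}: {fft_k}K, {bits_per_digit(p, fft_k):.15f} bits per digit, "
          f"radix product ok={radices_match_length(fft_k, radices)}, "
          f"leading radix={radices[0]}")

# What the three failing cases have in common.
leading = {radices[0] for _, _, radices in crashes}
print("leading radices in crashing cases:", leading)
```

All three failing configurations share a leading radix of 28, which fits the guess above. It is suggestive rather than conclusive, though: the i7 920 tables earlier in the thread include radix sets starting with 28 (e.g. 1792K and 3584K) that ran without trouble on that machine.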