mersenneforum.org Mlucas version 17.1

 2004-08-27, 18:18 #1 ewmayer ∂²ω=0     Sep 2002 República de California 5·2,351 Posts Mlucas version 17.1
The most recent official release is always available at http://www.mersenneforum.org/mayer/README.html

I'll announce major updates/bugfixes/new-prebuilt-binaries on this thread.

=======================

06 November 2009: An Alpha version of Mlucas 3.0 is available at the above page.

Major new features:

- SSE2 support for Win32 and 32- and 64-bit Linux. Thanks to my late-in-life conversion to assembly coding I'm a few years behind George on this, but I think it's not too shabby for a first go. It's a bit slower than Prime95 cycle-for-cycle, but I'd appreciate it if some folks would be willing to give up a bit of throughput in order to help test the software. Suggestions for speedups from the ASM experts are especially welcome.
- Platform-independent savefile support.
- Coming soon: Trial-factoring support.
- Coming soon: Primenet support.
- Coming later: Multithreading support for SSE2 code. (This is more important for new-prime verification than for GIMPS users.)
- Coming later: QT-based GUI.

Let me know if you have any download/test/build issues,
-Ernst

Last fiddled with by ewmayer on 2017-07-03 at 00:43 Reason: url updated to reflect ftp-site migration
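Trial factoring of Mersenne numbers, listed above as coming soon, rests on two standard facts: for an odd prime exponent p, every prime factor q of M(p) = 2^p - 1 has the form q = 2kp + 1 and satisfies q ≡ ±1 (mod 8). A minimal sketch under those facts follows; the function name `trial_factor` is hypothetical, and this is not how Mlucas implements it (a production factorer would sieve the candidate k values and use multiword modular arithmetic rather than Python big integers).

```python
def trial_factor(p, k_max):
    """Return the smallest divisor q = 2*k*p + 1 of 2**p - 1 with 1 <= k <= k_max, or None.

    Sketch only: p is assumed to be an odd prime. Candidates that are not
    +/-1 mod 8 are skipped, since no prime factor of 2**p - 1 can have that form.
    """
    for k in range(1, k_max + 1):
        q = 2 * k * p + 1
        if q % 8 not in (1, 7):
            continue
        # q divides 2**p - 1 exactly when 2**p ≡ 1 (mod q)
        if pow(2, p, q) == 1:
            return q
    return None


if __name__ == "__main__":
    print(trial_factor(11, 10))  # 2**11 - 1 = 2047 = 23 * 89
```

The three-argument `pow` keeps every intermediate value below q, which is why trial factoring is cheap per candidate even when p is in the tens of millions.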
 2009-11-09, 15:09 #2 ewmayer ∂2ω=0     Sep 2002 República de California 5·2,351 Posts [crickets chirping]
 2009-11-09, 15:34 #3 garo     Aug 2002 Termonfeckin, IE 2⁴×173 Posts Be happy to test. Note that an edit of an existing post is not caught by the "new post" mechanism. Send me a PM.
2009-11-09, 16:11   #4
ewmayer
∂²ω=0

Sep 2002
República de California

10110111101011₂ Posts

Quote:
 Originally Posted by garo Be happy to test. Note that an edit of an existing post is not caught by the "new post" mechanism. Send me a PM.
Yeah, I realized this morning that although I'd updated the thread, it needed an actual new post to advertise the fact.

All the code and build/run instructions are at the above link, so let me know how it goes, if you think the readme page could be clearer about anything, etc.

BTW, I've been running the new code (even as I continued to expand and improve the SSE2 support) more or less continuously for the past 18 months: first on my Win32 box, then, after its fan died and I bought a MacBook for my 64-bit Linux port work, on that 6-month-old MacBook. So I have full confidence in the stability and functional correctness of the LL-test core. At this point the coming year will be all about adding Primenet support, speeding up the trial-factoring capability (also already thoroughly tested) enough to make it release-worthy, and (hopefully) squeezing some extra speed out of the inline assembler by way of detailed profiling and playing with stuff like prefetch, TLB priming, etc.
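The LL-test core mentioned above iterates s ← s² - 2 (mod M(p)) starting from s = 4; for an odd prime p, M(p) = 2^p - 1 is prime exactly when s = 0 after p - 2 iterations. Mlucas performs each squaring with a floating-point FFT; the sketch below instead uses Python's built-in big integers, so it only illustrates the recurrence and is practical only for small exponents. The function name is hypothetical.

```python
def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime; p is assumed to be prime."""
    if p == 2:
        return True  # M(2) = 3 is prime; the recurrence below needs p >= 3
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        # Mlucas does this squaring via an FFT; here it is a plain big-int multiply
        s = (s * s - 2) % m
    return s == 0


if __name__ == "__main__":
    print([p for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31) if lucas_lehmer(p)])
```

The cost of each iteration is dominated by the squaring of a p-bit number, which is exactly the step the FFT lengths in the self-test tables in this thread are sized for.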

Last fiddled with by ewmayer on 2009-11-09 at 16:12

2009-11-09, 16:50   #5
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

8165₁₀ Posts

Quote:
 Originally Posted by ewmayer TLB priming.
TLB priming is only necessary on early versions of the Pentium 4.

 2009-11-09, 16:57 #6 Jeff Gilchrist     Jun 2003 Ottawa, Canada 3·17·23 Posts Yay, it is finally publicly available. I will take a look at this in a couple of weeks after things settle down a bit here.
 2009-11-09, 22:01 #7 ldesnogu     Jan 2008 France 596₁₀ Posts Here is the result of -s m/l on my i7 920 (stock speed) running x86_64; the compiler is gcc 4.4.1.
Code:
1024  sec/iter = 0.028  ROE[min,max] = [0.250000000, 0.312500000]  radices = 32 16 32 32 0 0 0 0 0 0
1152  sec/iter = 0.033  ROE[min,max] = [0.250000000, 0.250000000]  radices = 36 32 32 16 0 0 0 0 0 0
1280  sec/iter = 0.037  ROE[min,max] = [0.250000000, 0.343750000]  radices = 20 32 32 32 0 0 0 0 0 0
1408  sec/iter = 0.042  ROE[min,max] = [0.312500000, 0.312500000]  radices = 44 16 32 32 0 0 0 0 0 0
1536  sec/iter = 0.045  ROE[min,max] = [0.265625000, 0.269042969]  radices = 24 32 32 32 0 0 0 0 0 0
1792  sec/iter = 0.055  ROE[min,max] = [0.312500000, 0.312500000]  radices = 28 32 32 32 0 0 0 0 0 0
2048  sec/iter = 0.061  ROE[min,max] = [0.281250000, 0.343750000]  radices = 16 16 16 16 16 0 0 0 0 0
2304  sec/iter = 0.072  ROE[min,max] = [0.242187500, 0.281250000]  radices = 36 32 32 32 0 0 0 0 0 0
2560  sec/iter = 0.078  ROE[min,max] = [0.281250000, 0.312500000]  radices = 20 16 16 16 16 0 0 0 0 0
2816  sec/iter = 0.093  ROE[min,max] = [0.328125000, 0.343750000]  radices = 44 32 32 32 0 0 0 0 0 0
3072  sec/iter = 0.098  ROE[min,max] = [0.250000000, 0.250000000]  radices = 24 16 16 16 16 0 0 0 0 0
3584  sec/iter = 0.114  ROE[min,max] = [0.281250000, 0.281250000]  radices = 28 16 16 16 16 0 0 0 0 0
4096  sec/iter = 0.122  ROE[min,max] = [0.250000000, 0.312500000]  radices = 16 16 16 16 32 0 0 0 0 0
4608  sec/iter = 0.147  ROE[min,max] = [0.257812500, 0.257812500]  radices = 36 16 16 16 16 0 0 0 0 0
5120  sec/iter = 0.157  ROE[min,max] = [0.281250000, 0.312500000]  radices = 20 16 16 16 32 0 0 0 0 0
5632  sec/iter = 0.191  ROE[min,max] = [0.375000000, 0.375000000]  radices = 44 16 16 16 16 0 0 0 0 0
6144  sec/iter = 0.198  ROE[min,max] = [0.250000000, 0.296875000]  radices = 24 16 16 16 32 0 0 0 0 0
7168  sec/iter = 0.232  ROE[min,max] = [0.268554688, 0.281250000]  radices = 28 16 16 16 32 0 0 0 0 0
8192  sec/iter = 0.253  ROE[min,max] = [0.281250000, 0.312500000]  radices = 16 16 16 32 32 0 0 0 0 0
EDIT: added results of -s l.
Last fiddled with by ldesnogu on 2009-11-09 at 22:15
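In these tables the product of the listed radices (ignoring the trailing zero padding) is exactly half the FFT length in 8-byte floats; for example 32·16·32·32 = 524288 = 1024K/2. That fits the "complex FFT radices" wording Mlucas prints, presumably because the real-valued residue is processed as a complex array of half the length. A small check over a few rows copied from the table; the helper name `fft_length_from_radices` is made up for illustration.

```python
from math import prod

# (FFT length in K, radices) copied from the -s m/l table above,
# with the trailing zero entries of each radix list dropped.
ROWS = [
    (1024, [32, 16, 32, 32]),
    (1408, [44, 16, 32, 32]),
    (2048, [16, 16, 16, 16, 16]),
    (3584, [28, 16, 16, 16, 16]),
    (8192, [16, 16, 16, 32, 32]),
]


def fft_length_from_radices(radices):
    """Real FFT length (in doubles) implied by a set of complex-FFT radices."""
    return 2 * prod(radices)


for length_k, radices in ROWS:
    n = fft_length_from_radices(radices)
    print(length_k, n, n == length_k * 1024)
```

The same reading explains where the non-power-of-2 lengths come from: the leading radix carries the odd factor (36 = 9·4, 20 = 5·4, 44 = 11·4, 24 = 3·8, 28 = 7·4) and the remaining radices are powers of 2.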
2009-11-09, 22:57   #8
ewmayer
∂²ω=0

Sep 2002
República de California

10110111101011₂ Posts

Quote:
 Originally Posted by ldesnogu Here is the result of -s m/l on my i7 920 (stock speed) running x86_64; the compiler is gcc 4.4.1.
Thanks, Laurent - Interesting that FFT lengths of the form 11*2^k are actually (modestly) useful on your 920 ... on both my Core2-based machines (WinXP/32-bit/MSVC and MacOS/64-bit/GCC-4.2) those are slower than the next-larger FFT length, often by quite a lot - you can see this in the sample timing tables on my README page. Your timings are much closer to what I would expect based on arithmetic opcount: since data access patterns are similar, and memory footprints also, I expected opcount would be the major determinant of timing across a variety of platforms. (It is, except for the "surprise" I got with the 11*2^k data.)
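The comparison described above can be read straight off Laurent's first table: for each FFT length of the form 11·2^k (1408K, 2816K, 5632K), divide its sec/iter by that of the next-larger length in the table (1536K, 3072K, 6144K). The sketch below uses only those posted numbers; the function names are hypothetical, and the plain length ratio is printed alongside as a naive reference point (what the time ratio would be if cost scaled simply with FFT length), not as Ernst's opcount model.

```python
# sec/iter for a subset of rows of the first i7 920 table above (FFT length in K)
TIMINGS = {1408: 0.042, 1536: 0.045, 2816: 0.093, 3072: 0.098,
           5632: 0.191, 6144: 0.198}


def is_11_times_power_of_2(k):
    """True if k == 11 * 2**j for some j >= 0."""
    if k <= 0 or k % 11:
        return False
    q = k // 11
    return q & (q - 1) == 0


def compare_with_next(timings):
    """Yield (length, next length, time ratio, length ratio) for each 11*2^k length."""
    lengths = sorted(timings)
    for a, b in zip(lengths, lengths[1:]):
        if is_11_times_power_of_2(a):
            yield a, b, timings[a] / timings[b], a / b


for a, b, t_ratio, n_ratio in compare_with_next(TIMINGS):
    print(f"{a}K vs {b}K: time ratio {t_ratio:.3f}, length ratio {n_ratio:.3f}")
```

A time ratio below 1 means the 11·2^k length beat the next-larger one, which is what the i7 920 data shows; on the Core2 machines described above, the same ratio would come out above 1.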

 2009-11-09, 23:15 #9 ldesnogu     Jan 2008 France 254₁₆ Posts At first I thought it could be some compiler issue, but running your executable (compiled with gcc 4.2.1) gives very similar results:
Code:
1024  sec/iter = 0.028  ROE[min,max] = [0.250000000, 0.312500000]  radices = 32 16 32 32 0 0 0 0 0 0
1152  sec/iter = 0.034  ROE[min,max] = [0.250000000, 0.250000000]  radices = 36 32 32 16 0 0 0 0 0 0
1280  sec/iter = 0.037  ROE[min,max] = [0.250000000, 0.343750000]  radices = 20 32 32 32 0 0 0 0 0 0
1408  sec/iter = 0.042  ROE[min,max] = [0.312500000, 0.312500000]  radices = 44 16 32 32 0 0 0 0 0 0
1536  sec/iter = 0.045  ROE[min,max] = [0.265625000, 0.269042969]  radices = 24 32 32 32 0 0 0 0 0 0
1792  sec/iter = 0.056  ROE[min,max] = [0.312500000, 0.312500000]  radices = 28 32 32 32 0 0 0 0 0 0
2048  sec/iter = 0.060  ROE[min,max] = [0.281250000, 0.343750000]  radices = 16 16 16 16 16 0 0 0 0 0
2304  sec/iter = 0.073  ROE[min,max] = [0.242187500, 0.281250000]  radices = 36 32 32 32 0 0 0 0 0 0
2560  sec/iter = 0.077  ROE[min,max] = [0.281250000, 0.312500000]  radices = 20 16 16 16 16 0 0 0 0 0
2816  sec/iter = 0.094  ROE[min,max] = [0.328125000, 0.343750000]  radices = 44 32 32 32 0 0 0 0 0 0
3072  sec/iter = 0.097  ROE[min,max] = [0.250000000, 0.250000000]  radices = 24 16 16 16 16 0 0 0 0 0
3584  sec/iter = 0.114  ROE[min,max] = [0.281250000, 0.281250000]  radices = 28 16 16 16 16 0 0 0 0 0
4096  sec/iter = 0.122  ROE[min,max] = [0.250000000, 0.312500000]  radices = 16 16 16 16 32 0 0 0 0 0
4608  sec/iter = 0.147  ROE[min,max] = [0.257812500, 0.257812500]  radices = 36 16 16 16 16 0 0 0 0 0
5120  sec/iter = 0.156  ROE[min,max] = [0.281250000, 0.312500000]  radices = 20 16 16 16 32 0 0 0 0 0
5632  sec/iter = 0.193  ROE[min,max] = [0.375000000, 0.375000000]  radices = 44 16 16 16 16 0 0 0 0 0
6144  sec/iter = 0.196  ROE[min,max] = [0.250000000, 0.296875000]  radices = 24 16 16 16 32 0 0 0 0 0
7168  sec/iter = 0.231  ROE[min,max] = [0.268554688, 0.281250000]  radices = 28 16 16 16 32 0 0 0 0 0
8192  sec/iter = 0.252  ROE[min,max] = [0.281250000, 0.312500000]  radices = 16 16 16 32 32 0 0 0 0 0
Last fiddled with by ldesnogu on 2009-11-09 at 23:16
 2009-11-12, 01:40 #10 smoky   May 2009 7 Posts Congratulations on this milestone! May I ask about the roadmap for the RISC versions of Mlucas? It is fully understandable why they wouldn't be a priority, but one can still hope, right? A feature like PrimeNet integration would be an awesome advance! -smoky
 2009-11-12, 10:22 #11 lfm     Jul 2006 Calgary 5²·17 Posts While trying Mlucas 3.0x (binary download for Linux 64):

./Mlucas_AMD64 -s a

on an AMD Sempron 64 running 2.6.26-2-amd64 x86_64 GNU/Linux:

model name : AMD Sempron(tm) Processor 2600+
stepping : 2
cpu MHz : 1600.059
cache size : 128 KB

It ran all through the full set of sizes on the first try, but mprime was running in the background, so I deleted mlucas.cfg and tried again just to see if it was different. It crashes now at:

M4521557: using FFT length 224K = 229376 8-byte floats.
this gives an average 19.712424142020090 bits per digit
Using complex FFT radices 28 16 16 16
Segmentation fault

3 tries, always the same place. I tried again with mprime in the background and it crashes again, in the same place.

Trying it now with -s m, it failed at:

M34573867: using FFT length 1792K = 1835008 8-byte floats.
this gives an average 18.841262272426061 bits per digit
Using complex FFT radices 28 8 16 16 16
Segmentation fault

With -s l:

M134113933: using FFT length 7168K = 7340032 8-byte floats.
this gives an average 18.271573339189803 bits per digit
Using complex FFT radices 28 32 16 16 16
Segmentation fault

Seems like a problem with radix 28?

Last fiddled with by lfm on 2009-11-12 at 10:44
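In these logs, the "bits per digit" figure matches the exponent divided by the FFT length in 8-byte floats (with K = 1024), i.e. the average number of bits of the residue held in each floating-point word of the transform. A quick check against the three crash logs above; the helper name `bits_per_digit` is hypothetical.

```python
def bits_per_digit(p, fft_len):
    """Average number of bits of a mod-(2**p - 1) residue held in each FFT word."""
    return p / fft_len


# (exponent, FFT length in K) from the three crash logs above; K = 1024 doubles
CASES = [(4521557, 224), (34573867, 1792), (134113933, 7168)]

for p, length_k in CASES:
    n = length_k * 1024
    print(f"M{p}: {n} floats, {bits_per_digit(p, n):.15f} bits per digit")
```

Note that larger exponents get fewer bits per word at their chosen FFT length (about 19.7 at 224K versus about 18.3 at 7168K), consistent with the usual need to leave more headroom for roundoff error as transforms get longer.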
