mersenneforum.org  

Old 2004-08-27, 18:18   #1
ewmayer
2ω=0
 
 
Sep 2002
República de California

2·13·443 Posts
Mlucas version 17.1

The most recent official release is always available at

http://www.mersenneforum.org/mayer/README.html

I'll announce major updates/bugfixes/new-prebuilt-binaries on this thread.

=======================

06 November 2009: An Alpha version of Mlucas 3.0 is available at the above page

Major new features:

- SSE2 support for Win32 and 32-and-64-bit Linux. Thanks to my late-in-life conversion to assembly coding I'm a few years behind George on this, but I think it's not too shabby for a first go. It's a bit slower than Prime95 cycle-for-cycle, but I'd appreciate it if some folks would be willing to give up a bit of throughput in order to help test the software. Suggestions for speedups from the ASM experts are especially welcome.

- Platform-independent savefile support.

- Coming soon: Trial-factoring support.

- Coming soon: Primenet support.

- Coming later: Multithreading support for SSE2 code. (This is more important for new-prime verify than for GIMPS users).

- Coming later: QT-based GUI.

Let me know if you have any download/test/build issues,
-Ernst

Last fiddled with by ewmayer on 2017-07-03 at 00:43 Reason: url updated to reflect ftp-site migration
Old 2009-11-09, 15:09   #2
ewmayer
2ω=0
 
 
Sep 2002
República de California

2CFE₁₆ Posts
Default

[crickets chirping]
Old 2009-11-09, 15:34   #3
garo
 
 
Aug 2002
Termonfeckin, IE

5×19×29 Posts
Default

Be happy to test. Note that an edit of an existing post is not caught by the "new post" mechanism. Send me a PM.
Old 2009-11-09, 16:11   #4
ewmayer
2ω=0
 
 
Sep 2002
República de California

2·13·443 Posts
Default

Quote:
Originally Posted by garo
Be happy to test. Note that an edit of an existing post is not caught by the "new post" mechanism. Send me a PM.

Yeah, I realized this morning that although I'd updated the thread, it needed an actual new post to advertise the fact.

All the code and build/run instructions are at the above link, so let me know how it goes, if you think the readme page could be clearer about anything, etc.

BTW, I've been running the new code more or less continuously for the past 18 months, even as I continued to expand and improve the SSE2 support: first on my Win32 box, then (after its fan died and I bought a MacBook for my 64-bit Linux port work) on my 6-month-old MacBook. So I have full confidence in the stability and functional correctness of the LL-test core. The coming year will be all about adding PrimeNet support, speeding up the trial-factoring capability (also already thoroughly tested) enough to make it release-worthy, and (hopefully) squeezing some extra speed out of the inline assembler by way of detailed profiling and playing with stuff like prefetch, TLB priming, etc.

Last fiddled with by ewmayer on 2009-11-09 at 16:12
Old 2009-11-09, 16:50   #5
Prime95
P90 years forever!
 
 
Aug 2002
Yeehaw, FL

71×101 Posts
Default

Quote:
Originally Posted by ewmayer
TLB priming.

TLB priming is only necessary on early versions of the Pentium 4.
Old 2009-11-09, 16:57   #6
Jeff Gilchrist
 
 
Jun 2003
Ottawa, Canada

7·167 Posts
Default

Yay, it is finally publicly available. I will take a look at this in a couple of weeks after things settle down a bit here.
Old 2009-11-09, 22:01   #7
ldesnogu
 
 
Jan 2008
France

2⁴×3×11 Posts
Default

Here is the result of -s m/l on my i7 920 (stock speed) running x86_64; the compiler is gcc 4.4.1.

Code:
      1024  sec/iter =    0.028  ROE[min,max] = [0.250000000, 0.312500000]  radices =  32 16 32 32  0  0  0  0  0  0
      1152  sec/iter =    0.033  ROE[min,max] = [0.250000000, 0.250000000]  radices =  36 32 32 16  0  0  0  0  0  0
      1280  sec/iter =    0.037  ROE[min,max] = [0.250000000, 0.343750000]  radices =  20 32 32 32  0  0  0  0  0  0
      1408  sec/iter =    0.042  ROE[min,max] = [0.312500000, 0.312500000]  radices =  44 16 32 32  0  0  0  0  0  0
      1536  sec/iter =    0.045  ROE[min,max] = [0.265625000, 0.269042969]  radices =  24 32 32 32  0  0  0  0  0  0
      1792  sec/iter =    0.055  ROE[min,max] = [0.312500000, 0.312500000]  radices =  28 32 32 32  0  0  0  0  0  0
      2048  sec/iter =    0.061  ROE[min,max] = [0.281250000, 0.343750000]  radices =  16 16 16 16 16  0  0  0  0  0
      2304  sec/iter =    0.072  ROE[min,max] = [0.242187500, 0.281250000]  radices =  36 32 32 32  0  0  0  0  0  0
      2560  sec/iter =    0.078  ROE[min,max] = [0.281250000, 0.312500000]  radices =  20 16 16 16 16  0  0  0  0  0
      2816  sec/iter =    0.093  ROE[min,max] = [0.328125000, 0.343750000]  radices =  44 32 32 32  0  0  0  0  0  0
      3072  sec/iter =    0.098  ROE[min,max] = [0.250000000, 0.250000000]  radices =  24 16 16 16 16  0  0  0  0  0
      3584  sec/iter =    0.114  ROE[min,max] = [0.281250000, 0.281250000]  radices =  28 16 16 16 16  0  0  0  0  0
      4096  sec/iter =    0.122  ROE[min,max] = [0.250000000, 0.312500000]  radices =  16 16 16 16 32  0  0  0  0  0
      4608  sec/iter =    0.147  ROE[min,max] = [0.257812500, 0.257812500]  radices =  36 16 16 16 16  0  0  0  0  0
      5120  sec/iter =    0.157  ROE[min,max] = [0.281250000, 0.312500000]  radices =  20 16 16 16 32  0  0  0  0  0
      5632  sec/iter =    0.191  ROE[min,max] = [0.375000000, 0.375000000]  radices =  44 16 16 16 16  0  0  0  0  0
      6144  sec/iter =    0.198  ROE[min,max] = [0.250000000, 0.296875000]  radices =  24 16 16 16 32  0  0  0  0  0
      7168  sec/iter =    0.232  ROE[min,max] = [0.268554688, 0.281250000]  radices =  28 16 16 16 32  0  0  0  0  0
      8192  sec/iter =    0.253  ROE[min,max] = [0.281250000, 0.312500000]  radices =  16 16 16 32 32  0  0  0  0  0
EDIT: added results of -s l.

Last fiddled with by ldesnogu on 2009-11-09 at 22:15
Old 2009-11-09, 22:57   #8
ewmayer
2ω=0
 
 
Sep 2002
República de California

2·13·443 Posts
Default

Quote:
Originally Posted by ldesnogu
Here is the result of -s m/l on my i7 920 (stock speed) running x86_64; the compiler is gcc 4.4.1.

Thanks, Laurent. Interesting that FFT lengths of the form 11*2^k are actually (modestly) useful on your 920. On both my Core2-based machines (WinXP/32-bit/MSVC and MacOS/64-bit/GCC-4.2) those are slower than the next-larger FFT length, often by quite a lot; you can see this in the sample timing tables on my README page. Your timings are much closer to what I would expect based on arithmetic opcount: since data access patterns and memory footprints are similar, I expected opcount would be the major timing factor across a variety of platforms. (It is, except for the "surprise" I got with the 11*2^k data.)
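
For anyone who wants to repeat the 11*2^k comparison on their own timing table, here is a minimal Python sketch. It is not part of Mlucas; the helper names (`parse`, `odd_part`, `compare_11_lengths`) are made up for illustration, and the rows are copied from the i7 920 table posted earlier in the thread. It only checks whether each 11*2^k length is faster than the next-larger length present in the table.

```python
import re

# Rows copied verbatim from the i7 920 (gcc 4.4.1) table earlier in the thread.
ROWS = """\
      1408  sec/iter =    0.042  ROE[min,max] = [0.312500000, 0.312500000]  radices =  44 16 32 32  0  0  0  0  0  0
      1536  sec/iter =    0.045  ROE[min,max] = [0.265625000, 0.269042969]  radices =  24 32 32 32  0  0  0  0  0  0
      2816  sec/iter =    0.093  ROE[min,max] = [0.328125000, 0.343750000]  radices =  44 32 32 32  0  0  0  0  0  0
      3072  sec/iter =    0.098  ROE[min,max] = [0.250000000, 0.250000000]  radices =  24 16 16 16 16  0  0  0  0  0
      5632  sec/iter =    0.191  ROE[min,max] = [0.375000000, 0.375000000]  radices =  44 16 16 16 16  0  0  0  0  0
      6144  sec/iter =    0.198  ROE[min,max] = [0.250000000, 0.296875000]  radices =  24 16 16 16 32  0  0  0  0  0
"""

# Matches the FFT length (in K) and the sec/iter figure at the start of a row.
ROW_RE = re.compile(r"^\s*(\d+)\s+sec/iter\s*=\s*([0-9.]+)", re.MULTILINE)


def parse(text):
    """Return a list of (fft_length_in_K, sec_per_iter), sorted by length."""
    return sorted((int(k), float(t)) for k, t in ROW_RE.findall(text))


def odd_part(n):
    """Strip all factors of 2 from n."""
    while n % 2 == 0:
        n //= 2
    return n


def compare_11_lengths(timings):
    """For each 11*2^k length, compare it with the next-larger length in the table."""
    results = []
    for (k, t), (k_next, t_next) in zip(timings, timings[1:]):
        if odd_part(k) == 11:
            results.append((k, t, k_next, t_next, t < t_next))
    return results


for k, t, k_next, t_next, faster in compare_11_lengths(parse(ROWS)):
    verdict = "faster" if faster else "not faster"
    print(f"{k}K ({t} s/iter) is {verdict} than {k_next}K ({t_next} s/iter)")
```

Pasting a full table into `ROWS` works the same way, as long as the rows keep the `sec/iter =` layout shown above.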
Old 2009-11-09, 23:15   #9
ldesnogu
 
 
Jan 2008
France

2⁴·3·11 Posts
Default

At first I thought it could be some compiler issue, but running your executable (compiled with gcc 4.2.1) gives very similar results:

Code:
      1024  sec/iter =    0.028  ROE[min,max] = [0.250000000, 0.312500000]  radices =  32 16 32 32  0  0  0  0  0  0
      1152  sec/iter =    0.034  ROE[min,max] = [0.250000000, 0.250000000]  radices =  36 32 32 16  0  0  0  0  0  0
      1280  sec/iter =    0.037  ROE[min,max] = [0.250000000, 0.343750000]  radices =  20 32 32 32  0  0  0  0  0  0
      1408  sec/iter =    0.042  ROE[min,max] = [0.312500000, 0.312500000]  radices =  44 16 32 32  0  0  0  0  0  0
      1536  sec/iter =    0.045  ROE[min,max] = [0.265625000, 0.269042969]  radices =  24 32 32 32  0  0  0  0  0  0
      1792  sec/iter =    0.056  ROE[min,max] = [0.312500000, 0.312500000]  radices =  28 32 32 32  0  0  0  0  0  0
      2048  sec/iter =    0.060  ROE[min,max] = [0.281250000, 0.343750000]  radices =  16 16 16 16 16  0  0  0  0  0
      2304  sec/iter =    0.073  ROE[min,max] = [0.242187500, 0.281250000]  radices =  36 32 32 32  0  0  0  0  0  0
      2560  sec/iter =    0.077  ROE[min,max] = [0.281250000, 0.312500000]  radices =  20 16 16 16 16  0  0  0  0  0
      2816  sec/iter =    0.094  ROE[min,max] = [0.328125000, 0.343750000]  radices =  44 32 32 32  0  0  0  0  0  0
      3072  sec/iter =    0.097  ROE[min,max] = [0.250000000, 0.250000000]  radices =  24 16 16 16 16  0  0  0  0  0
      3584  sec/iter =    0.114  ROE[min,max] = [0.281250000, 0.281250000]  radices =  28 16 16 16 16  0  0  0  0  0
      4096  sec/iter =    0.122  ROE[min,max] = [0.250000000, 0.312500000]  radices =  16 16 16 16 32  0  0  0  0  0
      4608  sec/iter =    0.147  ROE[min,max] = [0.257812500, 0.257812500]  radices =  36 16 16 16 16  0  0  0  0  0
      5120  sec/iter =    0.156  ROE[min,max] = [0.281250000, 0.312500000]  radices =  20 16 16 16 32  0  0  0  0  0
      5632  sec/iter =    0.193  ROE[min,max] = [0.375000000, 0.375000000]  radices =  44 16 16 16 16  0  0  0  0  0
      6144  sec/iter =    0.196  ROE[min,max] = [0.250000000, 0.296875000]  radices =  24 16 16 16 32  0  0  0  0  0
      7168  sec/iter =    0.231  ROE[min,max] = [0.268554688, 0.281250000]  radices =  28 16 16 16 32  0  0  0  0  0
      8192  sec/iter =    0.252  ROE[min,max] = [0.281250000, 0.312500000]  radices =  16 16 16 32 32  0  0  0  0  0

Last fiddled with by ldesnogu on 2009-11-09 at 23:16
Old 2009-11-12, 01:40   #10
smoky
 
May 2009

7 Posts
Default

Congratulations on this milestone!

May I ask about the roadmap for the RISC versions of Mlucas? It is fully understandable why they wouldn't be a priority, but one can still hope, right? A feature like PrimeNet integration would be an awesome advance!

-smoky
Old 2009-11-12, 10:22   #11
lfm
 
 
Jul 2006
Calgary

5²·17 Posts
Default

While trying Mlucas 3.0x (binary download for Linux 64)

./Mlucas_AMD64 -s a

on an AMD Sempron 64 running 2.6.26-2-amd64 x86_64 GNU/Linux
model name : AMD Sempron(tm) Processor 2600+
stepping : 2
cpu MHz : 1600.059
cache size : 128 KB

It ran all the way through the full set of sizes on the first try, but mprime was running in the background, so I deleted mlucas.cfg and tried again just to see if the result was different. It now crashes at:

M4521557: using FFT length 224K = 229376 8-byte floats.
this gives an average 19.712424142020090 bits per digit
Using complex FFT radices 28 16 16 16
Segmentation fault

3 tries, always the same place.

I tried again with mprime in background again and it crashes again, same place.

Trying it now with -s m failed at:

M34573867: using FFT length 1792K = 1835008 8-byte floats.
this gives an average 18.841262272426061 bits per digit
Using complex FFT radices 28 8 16 16 16
Segmentation fault

with -s l

M134113933: using FFT length 7168K = 7340032 8-byte floats.
this gives an average 18.271573339189803 bits per digit
Using complex FFT radices 28 32 16 16 16
Segmentation fault


Seems like a problem with radix 28?
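
As a sanity check on the numbers in the three crash reports, here is a small Python sketch (not Mlucas code; the function names are made up for illustration). It assumes the real FFT length is twice the product of the listed complex radices, which matches all three reported lengths, and that "bits per digit" is simply the exponent divided by that length.

```python
from math import prod

# (exponent p, complex FFT radices) copied from the three crash reports above.
CRASHES = [
    (4521557, [28, 16, 16, 16]),
    (34573867, [28, 8, 16, 16, 16]),
    (134113933, [28, 32, 16, 16, 16]),
]


def real_fft_length(radices):
    # Assumption: the complex radices multiply to N/2, so N = 2 * product.
    return 2 * prod(radices)


def bits_per_digit(p, n):
    # Average number of bits of the p-bit residue stored per FFT element.
    return p / n


for p, radices in CRASHES:
    n = real_fft_length(radices)
    print(
        f"M{p}: N = {n} ({n // 1024}K), "
        f"{bits_per_digit(p, n):.6f} bits/digit, leading radix {radices[0]}"
    )
```

All three failing configurations reproduce the reported lengths (224K, 1792K, 7168K) and share the leading radix 28, while the trailing radices differ.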

Last fiddled with by lfm on 2009-11-12 at 10:44

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.