mersenneforum.org  

2005-06-09, 00:58   #1
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

2²·1,789 Posts
Beta version 24.12 available

The first beta of version 24.12 can be downloaded from:

Windows: ftp://mersenne.org/gimps/p95v2412.zip
Linux: ftp://mersenne.org/gimps/mprime2412.zip.tar.gz or
Static Linux: ftp://mersenne.org/gimps/sprime2412.zip.tar.gz

This version is 2-10% faster on all SSE2 machines.
The SSE2 code supports FFT sizes up to 32 million, but don't use them yet; they need some more QA (see next post).
It has a workaround for the Error 3 problem.
It should be harder for computers to spontaneously rename themselves.
2005-06-09, 01:06   #2
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

2²·1,789 Posts

I'm down to just one idea left to speed up the code. After working on that for a while, I will need volunteers to time prime95 on various L2 cache sizes and I will need volunteers to QA the large FFT sizes.

I especially need an AMD SSE2 machine with 256K L2 cache.

I'll also need a 1MB L2 cache P4 and 512K L2 cache AMD64 machine, but these are very common.

I need an AMD64 and P4 machine with 2GB of memory to QA the large FFTs.

If you can help, please revisit this section of the forums over the next week or two. I'll announce a test version to download for benchmarking and QAing.
2005-06-09, 01:10   #3
sdbardwick
Aug 2002
North San Diego County

2³·5·17 Posts

Spiffy!
Code:
Intel(R) Pentium(R) 4 CPU 2.40GHz
CPU speed: 2392.49 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE, SSE2
L1 cache size: 8 KB
L2 cache size: 512 KB
L1 cache line size: 64 bytes
L2 cache line size: 128 bytes
TLBS: 64
Prime95 version 23.8, RdtscTiming=1
Best time for 512K FFT length: 23.986 ms.
Best time for 640K FFT length: 29.059 ms.
Best time for 768K FFT length: 35.340 ms.
Best time for 896K FFT length: 41.411 ms.
Best time for 1024K FFT length: 46.952 ms.
Best time for 1280K FFT length: 61.214 ms.
Best time for 1536K FFT length: 73.665 ms.
Best time for 1792K FFT length: 88.070 ms.
Best time for 2048K FFT length: 101.813 ms.
[Wed Jun 08 18:07:27 2005]
Compare your results to other computers at http://www.mersenne.org/bench.htm
That web page also contains instructions on how your results can be included.
 
Intel(R) Pentium(R) 4 CPU 2.40GHz
CPU speed: 2392.31 MHz
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2
L1 cache size: 8 KB
L2 cache size: 512 KB
L1 cache line size: 64 bytes
L2 cache line size: 128 bytes
TLBS: 64
Prime95 32-bit version 24.12, RdtscTiming=1
Best time for 512K FFT length: 21.825 ms.
Best time for 640K FFT length: 28.142 ms.
Best time for 768K FFT length: 34.530 ms.
Best time for 896K FFT length: 39.891 ms.
Best time for 1024K FFT length: 45.745 ms.
Best time for 1280K FFT length: 58.705 ms.
Best time for 1536K FFT length: 72.312 ms.
Best time for 1792K FFT length: 83.354 ms.
Best time for 2048K FFT length: 95.058 ms.
Best time for 2560K FFT length: 123.420 ms.
Best time for 3072K FFT length: 157.165 ms.
Best time for 3584K FFT length: 182.897 ms.
Best time for 4096K FFT length: 212.733 ms.
Best time for 58 bit trial factors: 12.394 ms.
Best time for 59 bit trial factors: 12.347 ms.
Best time for 60 bit trial factors: 12.269 ms.
Best time for 61 bit trial factors: 12.373 ms.
Best time for 62 bit trial factors: 13.325 ms.
Best time for 63 bit trial factors: 13.277 ms.
Best time for 64 bit trial factors: 16.332 ms.
Best time for 65 bit trial factors: 16.332 ms.
Best time for 66 bit trial factors: 16.229 ms.
Best time for 67 bit trial factors: 16.231 ms.
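For anyone who wants to compare two runs like the pair above without retyping the numbers, here is a minimal sketch in Python. It assumes the benchmark output keeps the "Best time for NNNK FFT length: X ms." line format shown in this post; the function names `parse_fft_times` and `compare` are made up for illustration and are not part of Prime95.

```python
import re

# Matches lines such as "Best time for 512K FFT length: 23.986 ms."
# Trial-factoring lines do not match and are ignored.
LINE_RE = re.compile(r"Best time for (\d+)K FFT length: ([\d.]+) ms\.")


def parse_fft_times(text):
    """Map FFT length (in K) to best time (in ms) for every matching line."""
    return {int(k): float(ms) for k, ms in LINE_RE.findall(text)}


def compare(old_text, new_text):
    """Percentage reduction in time for FFT lengths present in both runs."""
    old = parse_fft_times(old_text)
    new = parse_fft_times(new_text)
    common = sorted(old.keys() & new.keys())
    return {k: (old[k] - new[k]) / old[k] * 100.0 for k in common}


if __name__ == "__main__":
    # Two lines from each run above; paste the full output for a full table.
    v238 = (
        "Best time for 512K FFT length: 23.986 ms.\n"
        "Best time for 2048K FFT length: 101.813 ms.\n"
    )
    v2412 = (
        "Best time for 512K FFT length: 21.825 ms.\n"
        "Best time for 2048K FFT length: 95.058 ms.\n"
        "Best time for 2560K FFT length: 123.420 ms.\n"
    )
    for fft_k, pct in compare(v238, v2412).items():
        print(f"{fft_k}K: {pct:.1f}% less time per iteration")
```

FFT lengths that appear in only one run (2560K and up in the 24.12 output) are skipped, since there is nothing to compare them against.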
2005-06-09, 05:26   #4
RMAC9.5
Jun 2003

99₁₆ Posts
Interesting Article on the Pentium IV Replay "Feature"

George, I wanted to make sure that you were aware of an interesting article by Xbit Labs on the Pentium IV Replay feature. It might explain the weirdness experienced by PhilF and delta_t that you commented on in the Early Beta of Version 24.11 thread.
Link http://www.xbitlabs.com/articles/cpu...ay/replay.html
2005-06-09, 06:42   #5
ET_
Banned
"Luigi"
Aug 2002
Team Italia

129E₁₆ Posts

Quote:
Originally Posted by RMAC9.5
George, I wanted to make sure that you were aware of an interesting article by Xbit Labs on the Pentium IV Replay feature. It might explain the weirdness experienced by PhilF and delta_t that you commented on in the Early Beta of Version 24.11 thread.
Link http://www.xbitlabs.com/articles/cpu...ay/replay.html

Quite interesting!

Luigi
2005-06-09, 07:32   #6
Kaboom
Apr 2003
Milan, Italy

1C₁₆ Posts

No 64-bit clients, yet?
2005-06-09, 07:33   #7
Cruelty
May 2005

2×809 Posts
P3 1000 benchmarks

Going from 23.8 to 24.11 gives a significant gain (>20%); however, 24.12 offers no comparable increase over 24.11 (under 1%).
Attached Files
File Type: txt P3 1000.txt (3.7 KB, 200 views)
2005-06-09, 08:04   #8
Kaboom
Apr 2003
Milan, Italy

11100₂ Posts
Cool!

Cool!

About 10% gain.

Code:
[Thu Jun  9 10:00:33 2005]
Compare your results to other computers at http://www.mersenne.org/bench.htm
That web page also contains instructions on how your results can be included.

AMD Athlon(tm) 64 Processor 3800+
CPU speed: 2410.47 MHz
CPU features: RDTSC, CMOV, Prefetch, 3DNow!, MMX, SSE, SSE2
L1 cache size: 64 KB
L2 cache size: 512 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 32
L2 TLBS: 512
Prime95 32-bit version 24.12, RdtscTiming=1
Best time for 512K FFT length: 20.036 ms.
Best time for 640K FFT length: 26.049 ms.
Best time for 768K FFT length: 31.640 ms.
Best time for 896K FFT length: 37.771 ms.
Best time for 1024K FFT length: 42.064 ms.
Best time for 1280K FFT length: 53.510 ms.
Best time for 1536K FFT length: 65.706 ms.
Best time for 1792K FFT length: 79.678 ms.
Best time for 2048K FFT length: 88.757 ms.
Best time for 2560K FFT length: 120.701 ms.
Best time for 3072K FFT length: 146.729 ms.
Best time for 3584K FFT length: 178.234 ms.
Best time for 4096K FFT length: 200.030 ms.
Best time for 58 bit trial factors: 4.824 ms.
Best time for 59 bit trial factors: 4.819 ms.
Best time for 60 bit trial factors: 4.815 ms.
Best time for 61 bit trial factors: 4.828 ms.
Best time for 62 bit trial factors: 9.107 ms.
Best time for 63 bit trial factors: 9.123 ms.
Best time for 64 bit trial factors: 11.584 ms.
Best time for 65 bit trial factors: 11.513 ms.
Best time for 66 bit trial factors: 11.522 ms.
Best time for 67 bit trial factors: 11.479 ms.

Can't wait for a 64-bit Linux client!

Last fiddled with by Kaboom on 2005-06-09 at 08:05
2005-06-09, 12:05   #9
db597
Jan 2003

7×29 Posts

Quote:
Originally Posted by Prime95
I'll also need a 1MB L2 cache P4 and 512K L2 cache AMD64 machine, but these are very common.

I have two P4 Prescotts with 1MB, one always in Linux (office) and one always in Windows (home). Do let me know what benchmarks you want.
2005-06-09, 12:26   #10
db597
Jan 2003

7·29 Posts
Comparison between 24.11 and 24.12

The following were run on a 1MB cache P4, so we can see the new cache size tuning come into play:

Intel(R) Pentium(R) 4 CPU 2.80GHz
CPU speed: 3227.28 MHz
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2
L1 cache size: 16 KB
L2 cache size: 1024 KB
L1 cache line size: 64 bytes
L2 cache line size: 128 bytes
TLBS: 64
Prime95 version 24.11, RdtscTiming=1
Best time for 512K FFT length: 17.486 ms.
Best time for 640K FFT length: 21.240 ms.
Best time for 768K FFT length: 25.714 ms.
Best time for 896K FFT length: 30.632 ms.
Best time for 1024K FFT length: 34.529 ms.
Best time for 1280K FFT length: 45.018 ms.
Best time for 1536K FFT length: 54.450 ms.
Best time for 1792K FFT length: 65.444 ms.
Best time for 2048K FFT length: 73.603 ms.
Best time for 58 bit trial factors: 8.512 ms.
Best time for 59 bit trial factors: 8.567 ms.
Best time for 60 bit trial factors: 8.483 ms.
Best time for 61 bit trial factors: 8.549 ms.
Best time for 62 bit trial factors: 11.898 ms.
Best time for 63 bit trial factors: 11.961 ms.
Best time for 64 bit trial factors: 13.826 ms.
Best time for 65 bit trial factors: 13.817 ms.
Best time for 66 bit trial factors: 13.801 ms.
Best time for 67 bit trial factors: 13.754 ms.

--------------------------------------------------------

Intel(R) Pentium(R) 4 CPU 2.80GHz
CPU speed: 3227.31 MHz
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2
L1 cache size: 16 KB
L2 cache size: 1024 KB
L1 cache line size: 64 bytes
L2 cache line size: 128 bytes
TLBS: 64
Prime95 32-bit version 24.12, RdtscTiming=1
Best time for 512K FFT length: 16.216 ms.
Best time for 640K FFT length: 20.766 ms.
Best time for 768K FFT length: 25.187 ms.
Best time for 896K FFT length: 29.880 ms.
Best time for 1024K FFT length: 34.054 ms.
Best time for 1280K FFT length: 42.345 ms.
Best time for 1536K FFT length: 51.512 ms.
Best time for 1792K FFT length: 61.213 ms.
Best time for 2048K FFT length: 69.289 ms.
Best time for 2560K FFT length: 90.056 ms.
Best time for 3072K FFT length: 110.067 ms.
Best time for 3584K FFT length: 131.981 ms.
Best time for 4096K FFT length: 148.198 ms.
Best time for 58 bit trial factors: 8.525 ms.
Best time for 59 bit trial factors: 8.543 ms.
Best time for 60 bit trial factors: 8.531 ms.
Best time for 61 bit trial factors: 8.528 ms.
Best time for 62 bit trial factors: 11.948 ms.
Best time for 63 bit trial factors: 11.947 ms.
Best time for 64 bit trial factors: 13.613 ms.
Best time for 65 bit trial factors: 13.751 ms.
Best time for 66 bit trial factors: 13.720 ms.
Best time for 67 bit trial factors: 13.816 ms.

--------------------------------------------------------

From my analysis (best times in ms, 24.11 vs 24.12, with the percentage reduction in time in brackets):

512K : 17.486 vs 16.216 (+7.263%)
640K : 21.240 vs 20.766 (+2.232%)
768K : 25.714 vs 25.187 (+2.049%)
896K : 30.632 vs 29.880 (+2.455%)
1024K : 34.529 vs 34.054 (+1.376%)
1280K : 45.018 vs 42.345 (+5.938%)
1536K : 54.450 vs 51.512 (+5.396%)
1792K : 65.444 vs 61.213 (+6.465%)
2048K : 73.603 vs 69.289 (+5.861%)
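In case anyone wants to check the bracketed figures, they work out as (old time - new time) / old time. A minimal sketch in Python, with the times copied from the two runs above; the names `TIMES_MS`, `percent_faster` and `speedup_table` are just illustrative:

```python
# Best times in ms copied from the two benchmark runs above.
# Keys are FFT lengths in K; values are (version 24.11, version 24.12).
TIMES_MS = {
    512: (17.486, 16.216),
    640: (21.240, 20.766),
    768: (25.714, 25.187),
    896: (30.632, 29.880),
    1024: (34.529, 34.054),
    1280: (45.018, 42.345),
    1536: (54.450, 51.512),
    1792: (65.444, 61.213),
    2048: (73.603, 69.289),
}


def percent_faster(old_ms, new_ms):
    """Reduction in time per iteration, as a percentage of the old time."""
    return (old_ms - new_ms) / old_ms * 100.0


def speedup_table(times):
    """Map each FFT length to its percentage reduction, rounded to 3 places."""
    return {
        fft_k: round(percent_faster(old, new), 3)
        for fft_k, (old, new) in times.items()
    }


if __name__ == "__main__":
    for fft_k, pct in speedup_table(TIMES_MS).items():
        old, new = TIMES_MS[fft_k]
        print(f"{fft_k}K : {old:.3f} vs {new:.3f} (+{pct:.3f}%)")
```

Note that this measures the drop in time per iteration relative to 24.11. Measuring throughput instead (old time / new time - 1) would give slightly larger numbers.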

Conclusion:

1. There is no improvement for trial factoring.
2. Nice small improvements throughout the FFT range.

Is it safe to switch to this new version of the client yet?

Last fiddled with by db597 on 2005-06-09 at 12:35
2005-06-09, 17:34   #11
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

2²·1,789 Posts

Quote:
Originally Posted by db597
Is it safe to switch to this new version of the client yet?

Betas are always thought to be safe for use - they pass torture tests. However, your risk of running into a new bug is somewhat higher.

I've switched my P4s; my first double-check will complete on Monday.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.