mersenneforum.org  

Old 2005-06-17, 03:42   #1
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

3·11·227 Posts
Version 24.12 release candidate 1

Here we go again!

Windows: ftp://mersenne.org/gimps/p95v2412.zip
Windows 64-bit: ftp://mersenne.org/gimps/p64v2412.zip
WinNT service: ftp://mersenne.org/gimps/winnt2412.zip
Linux: ftp://mersenne.org/gimps/mprime2412.tar.gz or
Static Linux: ftp://mersenne.org/gimps/sprime2412.tar.gz

What's new since the beta 24.12?
1) I implemented my last optimization idea for another 1 or 2% speedup.
2) Using the data from the special benchmarks, the proper FFT implementation is chosen for each L2 cache size.
3) Lower factoring breakevens implemented.

Barring any serious bugs found, 24.12 will become the next official prime95 release.
Old 2005-06-17, 05:20   #2
Cruelty

May 2005

2³×7×29 Posts
Just a quick benchmark

AMD Athlon(tm) 64 Processor 3500+
CPU speed: 2520.33 MHz
CPU features: RDTSC, CMOV, Prefetch, 3DNow!, MMX, SSE, SSE2
L1 cache size: 64 KB
L2 cache size: 512 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 32
L2 TLBS: 512
Prime95 32-bit version 24.12, RdtscTiming=1
Best time for 512K FFT length: 18.450 ms.
Best time for 640K FFT length: 24.010 ms.
Best time for 768K FFT length: 29.121 ms.
Best time for 896K FFT length: 35.137 ms.
Best time for 1024K FFT length: 38.830 ms.
Best time for 1280K FFT length: 49.727 ms.
Best time for 1536K FFT length: 60.783 ms.
Best time for 1792K FFT length: 73.629 ms.
Best time for 2048K FFT length: 81.896 ms.
Best time for 2560K FFT length: 108.686 ms.
Best time for 3072K FFT length: 132.641 ms.
Best time for 3584K FFT length: 160.269 ms.
Best time for 4096K FFT length: 178.565 ms.
Best time for 58 bit trial factors: 4.579 ms.
Best time for 59 bit trial factors: 4.592 ms.
Best time for 60 bit trial factors: 4.570 ms.
Best time for 61 bit trial factors: 4.599 ms.
Best time for 62 bit trial factors: 8.632 ms.
Best time for 63 bit trial factors: 8.630 ms.
Best time for 64 bit trial factors: 10.985 ms.
Best time for 65 bit trial factors: 10.909 ms.
Best time for 66 bit trial factors: 10.932 ms.
Best time for 67 bit trial factors: 10.876 ms.

This version is 1-4% faster than the previous 24.12 release.
Old 2005-06-17, 08:01   #3
Cruelty

May 2005

3130₈ Posts

I have benchmarked my mobile P3 1000 using 24.12 rc1, and it is still slower (0.5-2%) than version 24.11. Should I switch to 24.12, or can I continue to use 24.11?
Old 2005-06-17, 12:13   #4
db597

Jan 2003

11001011₂ Posts

The latest 24.12 shows some speedup. Here are the benchmarks on a P4 Prescott with 1MB cache. From the old 24.12 (before cache optimisation):

Intel(R) Pentium(R) 4 CPU 2.80GHz
CPU speed: 3227.21 MHz
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2
L1 cache size: 16 KB
L2 cache size: 1024 KB
L1 cache line size: 64 bytes
L2 cache line size: 128 bytes
TLBS: 64
Prime95 32-bit version 24.12, RdtscTiming=1
Best time for 512K FFT length: 16.255 ms.
Best time for 640K FFT length: 21.133 ms.
Best time for 768K FFT length: 25.275 ms.
Best time for 896K FFT length: 29.966 ms.
Best time for 1024K FFT length: 34.241 ms.
Best time for 1280K FFT length: 42.462 ms.
Best time for 1536K FFT length: 51.886 ms.
Best time for 1792K FFT length: 61.309 ms.
Best time for 2048K FFT length: 69.049 ms.
Best time for 2560K FFT length: 90.215 ms.
Best time for 3072K FFT length: 110.053 ms.
Best time for 3584K FFT length: 131.908 ms.
Best time for 4096K FFT length: 148.553 ms.
Best time for 58 bit trial factors: 8.511 ms.
Best time for 59 bit trial factors: 8.574 ms.
Best time for 60 bit trial factors: 8.538 ms.
Best time for 61 bit trial factors: 8.514 ms.
Best time for 62 bit trial factors: 11.889 ms.
Best time for 63 bit trial factors: 11.895 ms.
Best time for 64 bit trial factors: 13.589 ms.
Best time for 65 bit trial factors: 13.655 ms.
Best time for 66 bit trial factors: 13.772 ms.
Best time for 67 bit trial factors: 13.862 ms.

-----------------------------------------------------------

Official 24.12 client after cache optimisation (note: both clients report the version number as 24.12, which is a bit confusing):

Intel(R) Pentium(R) 4 CPU 2.80GHz
CPU speed: 3227.22 MHz
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2
L1 cache size: 16 KB
L2 cache size: 1024 KB
L1 cache line size: 64 bytes
L2 cache line size: 128 bytes
TLBS: 64
Prime95 32-bit version 24.12, RdtscTiming=1
Best time for 512K FFT length: 15.754 ms.
Best time for 640K FFT length: 20.036 ms.
Best time for 768K FFT length: 24.443 ms.
Best time for 896K FFT length: 29.279 ms.
Best time for 1024K FFT length: 33.345 ms.
Best time for 1280K FFT length: 41.592 ms.
Best time for 1536K FFT length: 50.487 ms.
Best time for 1792K FFT length: 59.909 ms.
Best time for 2048K FFT length: 67.934 ms.
Best time for 2560K FFT length: 88.175 ms.
Best time for 3072K FFT length: 108.520 ms.
Best time for 3584K FFT length: 128.553 ms.
Best time for 4096K FFT length: 144.977 ms.
Best time for 58 bit trial factors: 8.476 ms.
Best time for 59 bit trial factors: 8.456 ms.
Best time for 60 bit trial factors: 8.495 ms.
Best time for 61 bit trial factors: 8.532 ms.
Best time for 62 bit trial factors: 11.919 ms.
Best time for 63 bit trial factors: 11.909 ms.
Best time for 64 bit trial factors: 13.656 ms.
Best time for 65 bit trial factors: 13.768 ms.
Best time for 66 bit trial factors: 13.815 ms.
Best time for 67 bit trial factors: 13.517 ms.

-----------------------------------------------------------

The percentage improvements for LL tests are as follows:

512K - 3.08%
640K - 5.19%
768K - 3.29%
896K - 2.29%
1024K - 2.62%
1280K - 2.05%
1536K - 2.70%
1792K - 2.28%
2048K - 1.61%
2560K - 2.26%
3072K - 1.39%
3584K - 2.54%
4096K - 2.41%

Average speed improvement over the unoptimised 24.12 client: 2.59%. Well done, George!
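
The percentages above can be reproduced from the two benchmark runs. Here is a minimal sketch in Python, assuming the improvement is measured as the reduction in per-iteration time relative to the old build, (old - new) / old. The dictionary and function names are illustrative only and are not part of Prime95.

```python
# Best times in ms, keyed by FFT length in K, copied from the two
# P4 Prescott runs posted above (old 24.12 vs. cache-optimised 24.12).
OLD_MS = {512: 16.255, 640: 21.133, 768: 25.275, 896: 29.966,
          1024: 34.241, 1280: 42.462, 1536: 51.886, 1792: 61.309,
          2048: 69.049, 2560: 90.215, 3072: 110.053, 3584: 131.908,
          4096: 148.553}
NEW_MS = {512: 15.754, 640: 20.036, 768: 24.443, 896: 29.279,
          1024: 33.345, 1280: 41.592, 1536: 50.487, 1792: 59.909,
          2048: 67.934, 2560: 88.175, 3072: 108.520, 3584: 128.553,
          4096: 144.977}


def improvement_pct(old_ms, new_ms):
    """Reduction in time per iteration, as a percentage of the old time."""
    return (old_ms - new_ms) / old_ms * 100.0


def improvements(old, new):
    """Per-FFT-length improvement for every length present in `old`."""
    return {fft_k: improvement_pct(old[fft_k], new[fft_k]) for fft_k in old}


if __name__ == "__main__":
    per_fft = improvements(OLD_MS, NEW_MS)
    for fft_k, pct in per_fft.items():
        print(f"{fft_k}K - {pct:.2f}%")
    # Unweighted mean of the per-length percentages.
    mean = sum(per_fft.values()) / len(per_fft)
    print(f"Average - {mean:.2f}%")
```

The average here is the plain mean of the thirteen percentages, which is how the 2.59% figure comes out. Weighting by FFT length or by run time would give a slightly different number.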
Old 2005-06-17, 16:04   #5
akruppa

"Nancy"
Aug 2002
Alexandria

2,467 Posts

Quote:
Originally Posted by Prime95
Barring any serious bugs found, 24.12 will become the next official prime95 release.

Bad news, I'm afraid.

With the statically linked version, FullBench=1, AllBench=1 on an Opteron 150 (1MB cache):

Code:
Timing 10 iterations at 7168K FFT length.  Best time: 389.042 ms.
Timing 10 iterations at 7168K FFT length.  Best time: 391.583 ms.
Timing 10 iterations at 8192K FFT length.  Best time: 457.464 ms.
Timing 10 iterations at 8192K FFT length.  Best time: 439.674 ms.
Timing 10 iterations at 8192K all-complex FFT length.  Segmentation fault

I'll run it again in gdb; maybe it'll tell me where it crashes.

Alex

Edit: Nope, gdb has no idea where it is.

Last fiddled with by akruppa on 2005-06-17 at 16:10
Old 2005-06-17, 17:45   #6
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

3×11×227 Posts

There is a typo in the table entry for 8M all-complex FFT for AMD machines with 1M of L2 cache. I've fixed it and will upload a new exe sometime soon (after I look into the 24M FFT troubles).
Old 2005-06-18, 05:36   #7
HiddenWarrior

Jun 2003
Russia, Novosibirsk

2×107 Posts

Timings from my 2 machines:

Intel(R) Pentium(R) III processor
CPU speed: 1002.06 MHz
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE
L1 cache size: 16 KB
L2 cache size: 256 KB
L1 cache line size: 32 bytes
L2 cache line size: 32 bytes
TLBS: 64
Prime95 32-bit version 24.12, RdtscTiming=1
Best time for 512K FFT length: 92.041 ms.
Best time for 640K FFT length: 123.092 ms.
Best time for 768K FFT length: 149.546 ms.
Best time for 896K FFT length: 186.173 ms.
Best time for 1024K FFT length: 212.590 ms.
Best time for 1280K FFT length: 280.567 ms.
Best time for 1536K FFT length: 340.946 ms.
Best time for 1792K FFT length: 404.854 ms.
Best time for 2048K FFT length: 452.005 ms.
Best time for 2560K FFT length: 599.777 ms.
Best time for 3072K FFT length: 731.378 ms.
Best time for 3584K FFT length: 878.770 ms.
Best time for 4096K FFT length: 1002.969 ms.
Best time for 58 bit trial factors: 14.261 ms.
Best time for 59 bit trial factors: 14.265 ms.
Best time for 60 bit trial factors: 14.191 ms.
Best time for 61 bit trial factors: 14.240 ms.
Best time for 62 bit trial factors: 26.324 ms.
Best time for 63 bit trial factors: 26.420 ms.
Best time for 64 bit trial factors: 59.905 ms.
Best time for 65 bit trial factors: 60.912 ms.
Best time for 66 bit trial factors: 61.941 ms.
Best time for 67 bit trial factors: 62.574 ms.

Intel(R) Pentium(R) 4 CPU 1.50GHz
CPU speed: 1530.24 MHz
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2
L1 cache size: 8 KB
L2 cache size: 256 KB
L1 cache line size: 64 bytes
L2 cache line size: 128 bytes
TLBS: 64
Prime95 32-bit version 24.12, RdtscTiming=1
Best time for 512K FFT length: 28.553 ms.
Best time for 640K FFT length: 36.761 ms.
Best time for 768K FFT length: 45.440 ms.
Best time for 896K FFT length: 54.076 ms.
Best time for 1024K FFT length: 60.343 ms.
Best time for 1280K FFT length: 79.738 ms.
Best time for 1536K FFT length: 102.075 ms.
Best time for 1792K FFT length: 121.463 ms.
Best time for 2048K FFT length: 135.990 ms.
Best time for 2560K FFT length: 183.085 ms.
Best time for 3072K FFT length: 225.757 ms.
Best time for 3584K FFT length: 271.189 ms.
Best time for 4096K FFT length: 305.406 ms.
Best time for 58 bit trial factors: 20.353 ms.
Best time for 59 bit trial factors: 20.532 ms.
Best time for 60 bit trial factors: 20.232 ms.
Best time for 61 bit trial factors: 20.324 ms.
Best time for 62 bit trial factors: 20.743 ms.
Best time for 63 bit trial factors: 20.738 ms.
Best time for 64 bit trial factors: 25.466 ms.
Best time for 65 bit trial factors: 25.295 ms.
Best time for 66 bit trial factors: 25.357 ms.
Best time for 67 bit trial factors: 25.414 ms.
Old 2005-06-19, 01:07   #8
delta_t

Nov 2002
Anchorage, AK

3×7×17 Posts

FullBench=1, AllBench=1

Intel(R) Pentium(R) M processor 2.10GHz
CPU speed: 2093.10 MHz
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2
L1 cache size: 32 KB
L2 cache size: 2048 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 128
Prime95 32-bit version 24.12, RdtscTiming=1
Attached Files
File Type: txt results.txt (9.5 KB, 141 views)
Old 2005-06-19, 15:52   #9
PrimeCruncher

Sep 2003
Borg HQ, Delta Quadrant

2·3³·13 Posts

Quote:
Originally Posted by Prime95
3) Lower factoring breakevens implemented.

So... Prime95 will do less TF now? Aside from this, are there any factoring performance improvements for Celerons/Athlons? Should I upgrade to 24 or continue using 23?
Old 2005-06-19, 16:21   #10
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

1110101000011₂ Posts

Quote:
Originally Posted by PrimeCruncher
So... Prime95 will do less TF now? Aside from this, are there any factoring performance improvements for Celerons/Athlons? Should I upgrade to 24 or continue using 23?

Yes, prime95 will do less TF now. There are no factoring performance improvements for any CPU type. Yes, you should upgrade to v24 at your leisure.
Old 2005-06-19, 19:38   #11
TheJudger

"Oliver"
Mar 2005
Germany

111110 Posts

*hehe* your FFT code is quite impressive.

Do you have measurements like <actual throughput per clock> / <theoretical throughput per clock>?

btw.: some timings on my doublecheck machine (dual Xeon 2.2GHz FSB400, Dual-DDR 200, _REALLY_ bandwidth limited in most cases ;))

24.11(beta) -> 24.12(beta) -> 24.12-rc1

internal benchmark of prime95 (one instance of prime):
768K FFT: 34.407 -> 33.732 (+2%) -> 32.951 (+2.4%)

running 2 instances of prime at the same time doing doublechecks in the 768K FFT range (I counted the number of iterations done in one day and calculated the per-iteration time; the output of mprime -d is quite inaccurate):

768K FFT, 2 instances of doublechecking:
42.0 -> 41.1 (+2.2%) -> 40.0 (+2.8%)
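
The per-iteration times and percentage gains above can be derived along these lines. This is a minimal sketch, assuming the times are in milliseconds and that the quoted gains are throughput gains (old time divided by new time, minus one), which is the reading that reproduces the +2.2% and roughly +2.8% figures. The function names and the example iteration count are hypothetical.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms in one day


def ms_per_iteration(iterations_per_day):
    """Average time per iteration, given a count of iterations done in 24 h."""
    return MS_PER_DAY / iterations_per_day


def throughput_gain_pct(old_ms, new_ms):
    """Extra iterations per unit time with the new build, in percent."""
    return (old_ms / new_ms - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical count, not from the post: 2,160,000 iterations in a day.
    print(f"{ms_per_iteration(2_160_000):.1f} ms per iteration")

    # Timing pairs taken from the figures quoted above.
    pairs = [(34.407, 33.732), (33.732, 32.951), (42.0, 41.1), (41.1, 40.0)]
    for old, new in pairs:
        print(f"{old} -> {new}: +{throughput_gain_pct(old, new):.2f}%")
```

Note that this differs from measuring the reduction in time, (old - new) / old, which gives slightly smaller percentages for the same pair of timings.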

When you develop your program, do you do timings only on single-CPU systems, or on SMP systems too?

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.