mersenneforum.org  

Old 2010-09-27, 19:41   #89
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

2²·1,873 Posts

Follow-up: you may wonder why gwnum doesn't use a 64K FFT with no prefetching at all. That is indeed a little bit faster if you are running on one core. However, if you are running on 4 cores, the prefetching might be useful.

I had to pick one, so I picked an FFT implementation that does some prefetching. The difference is minimal anyway.

Old 2010-09-27, 21:22   #90
Rhyled

May 2010

3²·7 Posts
TF benchmarks fine, Spam still there.

Quote:
Originally Posted by liqi View Post
trial factoring timing is quite different between 32-bit version and 64-bit version.
64-bit is much faster than 32-bit version.

please benchmark Prime95 64-bit version 26.2 and compare with Prime95 64-bit version 25.11, and check the results.
You are correct. The Trial Factoring benchmarks are almost identical between 25.11 and 26.2 (x64 versions). The FFT logic is ~20% faster in 26.2, which is sweet.

That just leaves the trivial spam issue, also present in 26.2 x64:

[Sep 27 16:58] Timing 10 iterations at 7168K FFT length. Best time: 37.430 ms., avg time: 38.174 ms.
[Sep 27 16:58] Setting affinity to run helper thread 1 on logical CPU #1
[Sep 27 16:58] Setting affinity to run helper thread 2 on logical CPU #2
[Sep 27 16:58] Setting affinity to run helper thread 3 on logical CPU #3
[Sep 27 16:58] Setting affinity to run helper thread 1 on logical CPU #1
[Sep 27 16:58] Setting affinity to run helper thread 2 on logical CPU #2
[Sep 27 16:58] Setting affinity to run helper thread 3 on logical CPU #3
[Sep 27 16:58] Setting affinity to run helper thread 1 on logical CPU #1
[Sep 27 16:58] Setting affinity to run helper thread 3 on logical CPU #3
[Sep 27 16:58] Setting affinity to run helper thread 2 on logical CPU #2
[Sep 27 16:58] Timing 10 iterations at 8192K FFT length. Best time: 37.667 ms., avg time: 38.574 ms

Results & Worker Window x64.zip

Old 2010-09-27, 22:10   #91
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2²·1,873 Posts

Quote:
Originally Posted by Rhyled View Post
That just leaves the trivial spam issue, also present in 26.2 x64:
See the second post in this thread

Old 2010-09-28, 01:05   #92
Rhyled

May 2010

63₁₀ Posts

Quote:
Originally Posted by Prime95 View Post
See the second post in this thread
My bad. I'll fire up LL-D checks on all 4 cores using 26.2 when my current tasks finish up in a week or so. I'm too chicken to move LL save files from one version to the other and jeopardize weeks of CPU time.

Old 2010-09-28, 03:01   #93
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006
Saskatchewan, Canada

2·3·773 Posts
i5-750

Code:
------ Stock Speed v25.9 -------
Intel(R) Core(TM) i5 CPU         750  @ 2.67GHz
CPU speed: 2664.80 MHz, 2 hyperthreaded cores
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2, SSE4
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 8 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 25.9, RdtscTiming=1
Best time for 768K FFT length: 11.669 ms.
Best time for 896K FFT length: 14.130 ms.
Best time for 1024K FFT length: 15.980 ms.
Best time for 1280K FFT length: 20.180 ms.
Best time for 1536K FFT length: 24.283 ms.
Best time for 1792K FFT length: 29.388 ms.
Best time for 2048K FFT length: 33.446 ms.
Best time for 2560K FFT length: 44.548 ms.
Best time for 3072K FFT length: 54.507 ms.
Best time for 3584K FFT length: 66.158 ms.
Best time for 4096K FFT length: 74.582 ms.
Best time for 5120K FFT length: 95.992 ms.
Best time for 6144K FFT length: 113.927 ms.
Best time for 7168K FFT length: 139.210 ms.
Best time for 8192K FFT length: 156.387 ms.
Code:
-------- OC'd to 3200 v25.9 --------
Intel(R) Core(TM) i5 CPU         750  @ 2.67GHz
CPU speed: 3199.96 MHz, 4 cores
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2, SSE4
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 8 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 25.9, RdtscTiming=1
Best time for 768K FFT length: 11.632 ms.
Best time for 896K FFT length: 14.085 ms.
Best time for 1024K FFT length: 15.852 ms.
Best time for 1280K FFT length: 19.948 ms.
Best time for 1536K FFT length: 24.047 ms.
Best time for 1792K FFT length: 29.116 ms.
Best time for 2048K FFT length: 32.797 ms.
Best time for 2560K FFT length: 43.661 ms.
Best time for 3072K FFT length: 53.214 ms.
Best time for 3584K FFT length: 64.602 ms.
Best time for 4096K FFT length: 72.811 ms.
Best time for 5120K FFT length: 93.271 ms.
Best time for 6144K FFT length: 111.274 ms.
Best time for 7168K FFT length: 135.486 ms.
Best time for 8192K FFT length: 152.231 ms.
Code:
 ------- OC'd 3200 v26.2 --------
Intel(R) Core(TM) i5 CPU         750  @ 2.67GHz
CPU speed: 3199.98 MHz, 4 cores
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2, SSE4
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 8 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 26.2, RdtscTiming=1
Best time for 768K FFT length: 9.618 ms., avg: 9.705 ms.
Best time for 896K FFT length: 11.512 ms., avg: 11.564 ms.
Best time for 1024K FFT length: 13.013 ms., avg: 13.059 ms.
Best time for 1280K FFT length: 16.935 ms., avg: 16.966 ms.
Best time for 1536K FFT length: 20.770 ms., avg: 20.853 ms.
Best time for 1792K FFT length: 24.599 ms., avg: 24.633 ms.
Best time for 2048K FFT length: 27.789 ms., avg: 27.832 ms.
Best time for 2560K FFT length: 35.266 ms., avg: 35.304 ms.
Best time for 3072K FFT length: 43.486 ms., avg: 43.548 ms.
Best time for 3584K FFT length: 51.484 ms., avg: 52.263 ms.
Best time for 4096K FFT length: 58.154 ms., avg: 58.224 ms.
Best time for 5120K FFT length: 74.697 ms., avg: 74.782 ms.
Best time for 6144K FFT length: 93.713 ms., avg: 93.796 ms.
Best time for 7168K FFT length: 112.050 ms., avg: 112.175 ms.
Best time for 8192K FFT length: 126.525 ms., avg: 127.009 ms.
Just comparing the 2560K FFT used in the current LL range (best times in ms):

Stock / v25.9: 44.548
OC 3.2 / v25.9: 43.661 - 2% speedup (not as much as I would have expected)
OC 3.2 / v26.2: 35.266 - 20% speedup. Woot Woot!!!

Also, current assignments that were in the 3072K FFT range moved to the 2688K FFT, and the iteration time dropped from 0.056 to 0.041 sec, a 27% speedup.
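
For anyone who wants to check the percentages, here is a minimal Python sketch. It assumes "speedup" means the fractional reduction in time, which is my reading and not something stated in the post; the helper name `speedup_percent` is made up for illustration. The numbers are the 2560K best times and the two iteration times quoted above.

```python
# Best times for the 2560K FFT, copied from the three benchmark runs above (ms).
best_ms = {
    "stock v25.9": 44.548,
    "OC 3.2 v25.9": 43.661,
    "OC 3.2 v26.2": 35.266,
}


def speedup_percent(old: float, new: float) -> float:
    """Percentage reduction in time going from `old` to `new` (assumed definition)."""
    return (old - new) / old * 100.0


# Overclock alone, same program version.
oc_gain = speedup_percent(best_ms["stock v25.9"], best_ms["OC 3.2 v25.9"])
# Version change alone, same clock speed.
v26_gain = speedup_percent(best_ms["OC 3.2 v25.9"], best_ms["OC 3.2 v26.2"])
# Iteration time after assignments moved from the 3072K to the 2688K FFT.
fft_gain = speedup_percent(0.056, 0.041)

print(f"overclock only: {oc_gain:.1f}%")
print(f"v25.9 -> v26.2: {v26_gain:.1f}%")
print(f"3072K -> 2688K: {fft_gain:.1f}%")
```

Measured against the overclocked v25.9 run, the v26.2 gain comes out a little under 20%; measured against the stock run it is a little over 20%, so the rounded figure holds either way.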

Old 2010-09-28, 03:20   #94
delta_t

Nov 2002
Anchorage, AK

3·7·17 Posts

Quote:
Originally Posted by Prime95 View Post
Can I trouble you to try 26.2 again? This time go to the Test/Primenet... dialog box. Go to Connection... Select Verbose debug output. Send the prime.log file after it acts up again.

Thanks.
I did as you asked above. Here are the results below (username changed):

Code:
[Mon Sep 27 20:12:17 2010 - ver 26.2]
Updating computer information on the server
PrimeNet error 9: Access denied
Untrusted program versions currently excluded by PrimeNet
Updating computer information on the server
URL: http://v5.mersenne.org/v5server/?v=0.95&px=GIMPS&t=uc&g=cb584594ca003ead473e294184935e6d&hg=01a3f44f920082762303beb0d6a21694&wg=&a=Mac+OS+X,Prime95,v26.2,build+1&c=Genuine+Intel(R)+CPU++++++++++++1400++@+1.83GHz&f=Prefetch,SSE,SSE2&L1=32&L2=2048&np=2&hp=1&m=2048&s=100&h=24&r=1000&u=MyUsernameChanged&cn=orig_mini183&ss=58292&sh=6B8C8D1F6AAC37131B8019869392FD65
== Info: About to connect() to v5.mersenne.org port 80 (#0)
== Info:   Trying 71.6.220.163... 
== Info: connected
== Info: Connected to v5.mersenne.org (71.6.220.163) port 80 (#0)
=> Send header: GET /v5server/?v=0.95&px=GIMPS&t=uc&g=cb584594ca003ead473e294184935e6d&hg=01a3f44f920082762303beb0d6a21694&wg=&a=Mac+OS+X,Prime95,v26.2,build+1&c=Genuine+Intel(R)+CPU++++++++++++1400++@+1.83GHz&f=Prefetch,SSE,SSE2&L1=32&L2=2048&np=2&hp=1&m=2048&s=100&h=24&r=1000&u=MyUsernameChanged&cn=orig_mini183&ss=58292&sh=6B8C8D1F6AAC37131B8019869392FD65 HTTP/1.1
Host: v5.mersenne.org
Accept: */*

<= Recv header: HTTP/1.1 200 OK
<= Recv header: Server: Microsoft-IIS/5.0
<= Recv header: Date: Tue, 28 Sep 2010 03:14:13 GMT
<= Recv header: MicrosoftOfficeWebServer: 5.0_Pub
<= Recv header: X-Powered-By: ASP.NET
<= Recv header: Connection: close
<= Recv header: X-Powered-By: PHP/5.2.5
<= Recv header: Content-type: text/html
<= Recv header: 
<= Recv data: pnErrorResult=9
pnErrorDetail=Untrusted program versions currently excluded by PrimeNet
==END==

== Info: Closing connection #0
RESPONSE:
pnErrorResult=9
pnErrorDetail=Untrusted program versions currently excluded by PrimeNet
==END==

PrimeNet error 9: Access denied
Untrusted program versions currently excluded by PrimeNet
[Mon Sep 27 20:14:48 2010 - ver 26.2]
Updating computer information on the server
URL: http://v5.mersenne.org/v5server/?v=0.95&px=GIMPS&t=uc&g=cb584594ca003ead473e294184935e6d&hg=01a3f44f920082762303beb0d6a21694&wg=&a=Mac+OS+X,Prime95,v26.2,build+1&c=Genuine+Intel(R)+CPU++++++++++++1400++@+1.83GHz&f=Prefetch,SSE,SSE2&L1=32&L2=2048&np=2&hp=1&m=2048&s=1833&h=24&r=1000&u=MyUsernameChanged&cn=orig_mini183&ss=59365&sh=638A12AF748BF06D570EAAAE90200B78
== Info: About to connect() to v5.mersenne.org port 80 (#0)
== Info:   Trying 71.6.220.163... 
== Info: connected
== Info: Connected to v5.mersenne.org (71.6.220.163) port 80 (#0)
=> Send header: GET /v5server/?v=0.95&px=GIMPS&t=uc&g=cb584594ca003ead473e294184935e6d&hg=01a3f44f920082762303beb0d6a21694&wg=&a=Mac+OS+X,Prime95,v26.2,build+1&c=Genuine+Intel(R)+CPU++++++++++++1400++@+1.83GHz&f=Prefetch,SSE,SSE2&L1=32&L2=2048&np=2&hp=1&m=2048&s=1833&h=24&r=1000&u=MyUsernameChanged&cn=orig_mini183&ss=59365&sh=638A12AF748BF06D570EAAAE90200B78 HTTP/1.1
Host: v5.mersenne.org
Accept: */*

<= Recv header: HTTP/1.1 200 OK
<= Recv header: Server: Microsoft-IIS/5.0
<= Recv header: Date: Tue, 28 Sep 2010 03:14:48 GMT
<= Recv header: MicrosoftOfficeWebServer: 5.0_Pub
<= Recv header: X-Powered-By: ASP.NET
<= Recv header: Connection: close
<= Recv header: X-Powered-By: PHP/5.2.5
<= Recv header: Content-type: text/html
<= Recv header: 
<= Recv data: pnErrorResult=9
pnErrorDetail=Untrusted program versions currently excluded by PrimeNet
==END==

== Info: Closing connection #0
RESPONSE:
pnErrorResult=9
pnErrorDetail=Untrusted program versions currently excluded by PrimeNet
==END==

PrimeNet error 9: Access denied
Untrusted program versions currently excluded by PrimeNet
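
For anyone scripting against a log like the one above, here is a minimal Python sketch of pulling the fields out of the server reply and the request URL. It assumes, based only on this log, that the reply body is a list of key=value lines terminated by ==END==. `parse_primenet_response` is a made-up helper name, not part of Prime95, and the URL is trimmed to a few of the parameters shown in the log. The `a` parameter appears to carry the program and version string, but that is a guess from its contents.

```python
from urllib.parse import parse_qs, urlsplit


def parse_primenet_response(body: str) -> dict[str, str]:
    """Collect key=value lines up to the ==END== marker (format inferred from the log)."""
    fields: dict[str, str] = {}
    for line in body.splitlines():
        line = line.strip()
        if line == "==END==":
            break
        if "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value
    return fields


# Reply body exactly as it appears in the verbose log.
sample = """pnErrorResult=9
pnErrorDetail=Untrusted program versions currently excluded by PrimeNet
==END==
"""

reply = parse_primenet_response(sample)
print(reply["pnErrorResult"], "-", reply["pnErrorDetail"])

# Request URL from the log, trimmed to four of its parameters for brevity.
url = ("http://v5.mersenne.org/v5server/"
       "?v=0.95&px=GIMPS&t=uc&a=Mac+OS+X,Prime95,v26.2,build+1")
query = parse_qs(urlsplit(url).query)
print(query["a"][0])  # '+' in the query string decodes to a space
```

Decoding the query string makes it easier to see exactly which program string the server was given when it returned error 9.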

Old 2010-09-28, 13:52   #95
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2²×1,873 Posts

Quote:
Originally Posted by delta_t View Post
I did as you asked above. Here are the results below
Thanks! I copied the HTTP request into my browser and it failed with the same error - so far so good. Next, I appended the debug switch and the HTTP request then worked as expected. Ugh.

The good news for you is that the 32-bit Mac version of 26.2 should now be recognized. The bad news is that I am no closer to finding the root cause of the problem.

Old 2010-09-28, 16:38   #96
henryzz
Just call me Henry

"David"
Sep 2007
Cambridge (GMT/BST)

2×5×587 Posts

Quote:
Originally Posted by Prime95 View Post
This is normal, albeit confusing. Gwnum consists of building block macros that are optimized for each architecture. These are then put together along with loops and prefetching code to make the FFT.

1) You are probably using a 32-bit executable. The difference between Pentium 4 optimized building blocks and Core 2 optimized building blocks is minimal -- no extra registers available.
2) The Pentium 4 prefetch instruction loads 128 bytes, a Core 2 prefetches 64 bytes. Thus a Core 2 optimized FFT has twice as many prefetch instructions.
3) A Core 2 chip has lots of L2 cache. A 64K FFT probably keeps most of its data in cache, making prefetch instructions of little to no value. Thus, a Pentium4 optimized FFT might be a little faster because it wastes less time executing useless prefetch instructions.

Anyhow, the FFT that is selected came from me doing actual timings of Pentium 4 and Core 2 optimized FFTs. The Pentium 4 FFT was a hair faster.

Perhaps I should change the FFT description to "Using Pentium4-optimized-even-though-this-is-a-Core2-CPU type-3 FFT"
You are right about me using a 32-bit version. LLR hasn't progressed to 64-bit yet, although PFGW has just done so.
More clarity would be helpful, as people will wonder why the FFT doesn't match the CPU.
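
To put rough numbers on points 2 and 3 of the quoted explanation, here is a back-of-the-envelope Python sketch. The 128-byte and 64-byte prefetch sizes come from the quote. Everything else is an assumption for illustration: that a 64K FFT holds 65,536 elements of 8 bytes each, and that every byte of the data is prefetched exactly once per pass. The real gwnum prefetch schedule may differ.

```python
FFT_LENGTH = 64 * 1024   # 64K FFT
BYTES_PER_ELEMENT = 8    # assumption: one double-precision value per element


def prefetches_per_pass(bytes_per_prefetch: int) -> int:
    """Prefetch instructions needed to touch the whole FFT data set once."""
    data_bytes = FFT_LENGTH * BYTES_PER_ELEMENT
    return data_bytes // bytes_per_prefetch


data_kb = FFT_LENGTH * BYTES_PER_ELEMENT // 1024
p4 = prefetches_per_pass(128)    # Pentium 4 prefetch size, per the quote
core2 = prefetches_per_pass(64)  # Core 2 prefetch size, per the quote

print(f"FFT data: {data_kb} KB")
print(f"Pentium 4 style: {p4} prefetches per pass")
print(f"Core 2 style:    {core2} prefetches per pass")
```

Under these assumptions the data set is about half a megabyte, so on a chip whose L2 cache is larger than that, the doubled prefetch count in the Core 2 code path is mostly extra instructions with little to fetch.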

Old 2010-09-28, 16:39   #97
delta_t

Nov 2002
Anchorage, AK

3·7·17 Posts

Quote:
Originally Posted by Prime95 View Post
Thanks! I copied the HTTP request into my browser and it failed with the same error - so far so good. Next, I appended the debug switch and the HTTP request then worked as expected. Ugh.

The good news for you is that the 32-bit Mac version of 26.2 should now be recognized. The bad news is that I am no closer to finding the root cause of the problem.
Great, thanks. I'll update to 26.2 next time I'm over at that machine.
I'll keep the 32-bit Mac version on it for a little while longer, until I get some free time to swap in a new Core2 and upgrade to Snow Leopard, so if you need additional debugging, give a yell.

Old 2010-09-28, 17:59   #98
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006
Saskatchewan, Canada

4638₁₀ Posts

Quote:
Originally Posted by Rhyled View Post
I find it hard to believe that trial factoring is actually slower in 26.2, but that's what the benchmark says (up to 10%, depending on the factor size)
I just upgraded a 2.8 GHz PIV to 26.2, and according to both the benchmark and actual results, factoring is about 1% slower.

Old 2010-09-28, 21:54   #99
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2²·1,873 Posts

Quote:
Originally Posted by petrw1 View Post
I just upgraded a 2.8 GHz PIV to 26.2, and according to both the benchmark and actual results, factoring is about 1% slower.
One percent is within the benchmarking margin of error. The trial factoring code is identical in the two versions.
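
As a rough illustration of how much benchmark timings move on their own, here is a minimal Python sketch using best/avg pairs already posted in this thread (a few of petrw1's v26.2 rows and Rhyled's two 26.2 x64 lines). Treating the gap between best and average time as a proxy for run-to-run noise is my own simplification, not a formal margin of error, and `spread_percent` is a made-up helper name.

```python
# (best ms, avg ms) pairs copied from benchmark output earlier in the thread.
runs = {
    "petrw1 768K": (9.618, 9.705),
    "petrw1 2560K": (35.266, 35.304),
    "petrw1 3584K": (51.484, 52.263),
    "Rhyled 7168K": (37.430, 38.174),
    "Rhyled 8192K": (37.667, 38.574),
}


def spread_percent(best: float, avg: float) -> float:
    """How far the average time sits above the best time, in percent."""
    return (avg - best) / best * 100.0


spreads = {name: spread_percent(best, avg) for name, (best, avg) in runs.items()}

for name, pct in spreads.items():
    print(f"{name}: avg is {pct:.2f}% above best")

# Runs whose own best-to-average gap is already 1% or more.
noisy = [name for name, pct in spreads.items() if pct >= 1.0]
print("spread of 1% or more:", ", ".join(noisy))
```

Several of these runs differ from themselves by more than 1% between best and average, so a 1% gap between two versions is hard to distinguish from timing jitter.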

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.