mersenneforum.org  

Old 2007-01-25, 13:31   #34
rx7350
 
Feb 2006
AR, US

90₁₆ Posts

The TAT utility can be downloaded from overclock.net.
Old 2007-01-25, 15:23   #35
garo
 
Aug 2002
Termonfeckin, IE

5314₈ Posts

On my Core2Duos (both laptop and desktop) CoreTemp, TAT and RMClock (on the lappy only) report the same temps. I think that all three programs are accurate because they all read the temp from the digital diode.

BTW, my temps are:
T5600 Laptop Core2Duo@1.86GHz
Full load 2x896K FFTs (crunching not torture testing) - 70-72C
One core fully loaded, the other 60% load (using Throttle=1) - 64-66C

I have not yet succeeded in getting RMClock to lower the voltage on this guy. I'm thinking that 1.15 V would be stable and probably decrease temps by about 5C.

Core2Duo E6400 with stock cooler. Again fully loaded with 2x896K FFTs crunching:
@2133MHz - 59-61C
@2400MHz - the same.

Edit: I just realized this is a quad-core thread, but the first para is still relevant.

Last fiddled with by garo on 2007-01-25 at 15:24
Old 2007-02-23, 22:26   #36
dsouza123
 
Sep 2002

662₁₀ Posts

Maybe the newer 680i motherboards will work better.

Testing two high-end systems, a Quad versus an Oct (dual Quad),
with the Quad using an Asus Striker Extreme, an OC-friendly 680i MB,
the Quad won most of the benchmarks.
The full specs on both systems along with the results are provided.

PCMark05, ScienceMark 2, Sandra, Cinebench
http://www.theinquirer.net/default.aspx?article=37821

How these benchmarks correlate with Prime95 performance is unknown.

Unfortunately Prime95 isn't one of the benchmarks used,
but the author states that if other free benchmarks are available
there are a couple of days (this weekend) to run them.

Any ideas on settings for running multiple copies of Prime95
on these systems?
Old 2007-02-24, 07:54   #37
Cruelty
 
May 2005

1628₁₀ Posts

Quote:
Originally Posted by dsouza123
Any ideas on settings for running multiple copies of Prime95 on these systems?

I would like to see 3 x large FFT torture test + 1 benchmark compared with a "benchmark-only" run (each instance running with a different affinity).
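
To make "different affinity" concrete: the operating system pins a process to cores through a bitmask with one bit per core. A minimal sketch, assuming four cores numbered 0 to 3; `affinity_masks` is a hypothetical helper for illustration, not something Prime95 provides.

```python
# Hypothetical helper: build one single-core affinity bitmask per instance.
# Assumes cores are numbered 0..n_cores-1, with bit k standing for core k.
def affinity_masks(n_cores):
    """Return a list of bitmasks, each with only one core's bit set."""
    return [1 << core for core in range(n_cores)]


masks = affinity_masks(4)
for core, mask in enumerate(masks):
    # Three torture-test instances plus one benchmark instance would each
    # take one of these masks so that no two instances share a core.
    print(f"instance {core}: core {core}, mask 0x{mask:X}")
```

How the mask is actually applied depends on the OS and on the Prime95 version, so treat the numbers above only as the values to hand over.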
Old 2007-02-24, 08:15   #38
S485122
 
"Jacob"
Sep 2006
Brussels, Belgium

1,753 Posts

In my experience the NVidia chipset is no good for running multiple memory intensive applications concurrently.

Here are benchmarks done with all parameters equal except the motherboard (QX6700 at stock speed, 6400-C4 memory...):

P965 chipset (Asus P5B-E)
One core used : 31
Two cores on different dies (0 and 2 for instance) : 33
Two cores on the same die : 34
Three cores : 37
Four cores : 42

NVIDIA 650 SLI (Asus P5N-E SLI)
One core used : 34
Two cores on different dies (0 and 2 for instance) : 41
Two cores on the same die : 39
Three cores : 58
Four cores : 74

Even running one instance of Prime95 the NVIDIA chipset is slower than the P965; it also uses more power under load (cf. the AnandTech and X-bit labs reviews).

Once you use all cores of the Quad Core processor the timings are about 75% slower !!!
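
The percentages can be checked directly from the two lists. A small sketch that only reuses the timings quoted above (same units on both boards); the dictionary labels and `percent_slower` are names chosen here for illustration.

```python
# Timings copied from the P965 and NVIDIA 650i lists above.
p965 = {
    "1 core": 31,
    "2 cores, different dies": 33,
    "2 cores, same die": 34,
    "3 cores": 37,
    "4 cores": 42,
}
nvidia_650i = {
    "1 core": 34,
    "2 cores, different dies": 41,
    "2 cores, same die": 39,
    "3 cores": 58,
    "4 cores": 74,
}


def percent_slower(baseline, other):
    """How much longer 'other' takes than 'baseline', in percent."""
    return (other / baseline - 1.0) * 100.0


slowdown = {case: percent_slower(p965[case], nvidia_650i[case]) for case in p965}
for case, pct in slowdown.items():
    print(f"{case:26s} {pct:5.1f}% slower on the 650i")
```

With all four cores busy this comes out at roughly 76%, in line with the "about 75%" figure.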

I wrote about the problem to AnandTech and X-Bits. The benchmarks those reviews use are testing a single (sometimes multithreaded) application. We want to use each core independently. With such usage the memory bus becomes a real bottleneck. This was not a problem with the dual Core Pentium D. I suppose that the memory bus can cope with the demands of Prime95 on a Core 2 Duo, but that is the limit.

Because of this I sidelined the P5N-E SLI motherboard I bought. It has other drawbacks: for instance, you have to reset the CMOS each time a BIOS setting proves unworkable, and, as I said before, it uses more electricity...

There is no big difference between the 650i and the 680i (AnandTech called the 650i a 680i killer since it is less expensive for similar performance), so I would not expect the 680i to do much better.
Old 2007-02-25, 15:19   #39
rx7350
 
Feb 2006
AR, US

2⁴×3² Posts

Were you running four LL tests, and what size FFTs were being used? Correct me if I'm wrong, but it appears that a QX6700 or a Q6600 on an Intel chipset gets decent iteration times, and that's running at stock frequency. I wonder about oc'ing those bad boys (preferably the QX6700, since it has an unlocked multiplier), and the effect on iteration times.
Old 2007-02-25, 18:10   #40
S485122
 
"Jacob"
Sep 2006
Brussels, Belgium

1,753 Posts

The timings I gave were from running one to four LL tests on 27M exponents, corresponding to 1536K FFTs. Run one instance and the iteration times are good. Run two and you have a performance loss compared to the single instance. According to George Woltman it is better to measure the iteration times by actually running tests than by using the benchmark. The latter measures BEST times.

I now run two QX6700.

XP x64, 64-bit Prime95 v24.14 on ASUS P5B-VM, 4 GB 6400C4 memory.
Core 0 : LL test 37.6M exponent, 2048K FFT, 47,5 ms average iteration time.
Core 1 : Factorisation of 41M exponents from 62 to 68 bits in 13,5 hours.
Core 2 : LL test 37.6M exponent, 2048K FFT, 52,5 ms average iteration time.
Core 3 : P-1 with 3,5 GB of memory

The core LL testing on the same die as the P-1 factoring suffers from the memory needs of that test. The other one benefits from the cache the factoring process does not need.

XP, Prime95 v24.14 on ASUS P5B-E Plus, 2 GB 8500C4 memory.
Core 0 : LL test 37.6M exponent, 2048K FFT, 53,5 ms average iteration time.
Core 1 : LL test 37.7M exponent, 2048K FFT, 53,5 ms average iteration time.
Core 2 : LL test 37.7M exponent, 2048K FFT, 53,5 ms average iteration time.
Core 3 : LL test 37.7M exponent, 2048K FFT, 53,5 ms average iteration time.

The timings are worse despite the faster memory, because all instances have the same memory demands.

But even 53,5 / 4 is still a lot better than 40 / 2, especially considering the fact that I use only one MB and one power supply. I chose the Quad because of the energy efficiency of the whole system.
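
The comparison divides the per-test iteration time by the number of tests a machine runs at once. A rough sketch, assuming the times are milliseconds per iteration and that "40 / 2" stands for a dual-core machine running two tests at 40 ms each; both function names are made up for this illustration.

```python
def effective_ms_per_iteration(iteration_ms, n_tests):
    """Machine-wide cost of one iteration when n_tests run side by side."""
    return iteration_ms / n_tests


def days_per_ll_test(exponent, iteration_ms):
    """Approximate wall time of one LL test.

    An LL test of exponent p needs about p iterations (p - 2 exactly),
    so the difference is ignored here.
    """
    seconds = exponent * iteration_ms / 1000.0
    return seconds / 86400.0


quad = effective_ms_per_iteration(53.5, 4)   # four tests on the quad
dual = effective_ms_per_iteration(40.0, 2)   # assumed dual-core figure
days = days_per_ll_test(37_600_000, 53.5)

print(f"quad: {quad:.3f} ms per iteration across the machine")
print(f"dual: {dual:.3f} ms per iteration across the machine")
print(f"one 37.6M exponent at 53.5 ms: about {days:.1f} days")
```

So the quad still gets through about one and a half times as many iterations per machine, even though each individual test is slower.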

As for overclocking: I tried lowering the multiplier and increasing the FSB speed, and got worse results for the same CPU speed. On the motherboards I have (and on the ASUS P5N-E SLI) adjusting the memory latencies is very tricky, since many parameters are documented neither by the memory manufacturer nor by the MB manufacturer. When one switches to manual latency settings, the defaults used for those undocumented values are no good; the machine will not even POST. And the thing I would want to overclock first is the memory. I cannot even run it at its rated spec: on both machines C5 is used, and it would be a very lengthy and boring process to find out the 8 different settings for which there is no documentation.

As far as I understand it, the problem is not between the CPU and the memory controller on the northbridge but between the memory controller and the memory. One can not get sufficiently fast memory. This is even more true with the NVIDIA chipset.

One should have one memory controller per two cores all other things being equal.

Last fiddled with by S485122 on 2007-02-25 at 18:26
Old 2007-12-21, 21:41   #41
ADBjester
 
Aug 2002

2·3·5 Posts
More quad core information

As posted to the hogranch mailing list:

My system specs:

eVGA CK-NF68 nVidia 680i motherboard
QX6700
4 GB Corsair Dominator XMS PC2-8500C5D
Vista Ultimate x64
Previous cooling: CoolIt Freezone (insufficient for P95 use)
Current cooling: homebrew watercooling loop (including CPU/video)
Areca 1210 PCI-X RAID controller
BFG nVidia 8800 GTX watercooled edition (not overclocked)
Creative SB X-Fi Elite Pro

Overclock:

13X multiplier taking CPU to 3.47 GHz, voltage @ 1.4563 V
Memory bus unlocked from FSB, running at 1000 MHz @ voltage 2.2 V
FSB at stock clock of 1066 MHz
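
For anyone checking the arithmetic: the 3.47 GHz figure follows from the multiplier and the FSB base clock. A short sketch, assuming the usual quad-pumped Core 2 FSB, where "1066 MHz" is the effective rate and the base clock is a quarter of it.

```python
# "1066 MHz" FSB is nominally 1066.67 MT/s (3200 / 3), quad-pumped.
FSB_EFFECTIVE_MHZ = 3200 / 3
FSB_PUMP_FACTOR = 4      # assumption: four transfers per base clock cycle
MULTIPLIER = 13          # value from the overclock settings above

base_clock_mhz = FSB_EFFECTIVE_MHZ / FSB_PUMP_FACTOR
cpu_clock_ghz = base_clock_mhz * MULTIPLIER / 1000

print(f"base clock: {base_clock_mhz:.2f} MHz")
print(f"CPU clock:  {cpu_clock_ghz:.2f} GHz")
```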

The above overclock is stable in overnight P95 torture testing. All other overclocks proved unstable, including all FSB alterations -- I have a hardware RAID controller that turned out to be rather finicky about the FSB remaining at stock speed, so FSB alterations are off the table. I couldn't quite stretch the RAM bus to 1:1 1066 MHz, so I have to run "Unlinked" and set the RAM bus speed asynchronously. 1000 MHz worked fine, but anything higher started throwing up P95 errors during torture testing, even with higher RAM voltages.

At this overclock with watercooling, with all four cores running an LL primality test at 2560K FFT size, I maintain 58C-64C core temperatures, depending on ambient air temp, which can vary in my basement. (The TEC-cooled Freezone couldn't keep it below 70C.)

However, iteration times stink when I crank up all four cores.

All exponents are close to 41507900 -- all four were requested within moments of each other as part of a new setup.

With just one core (core 1) cranking, I can get it as low as 0.050.

With two cores (one per die, cores 1 and 3 working, cores 0 and 2 idle), the time is about 0.063.

With two cores cranking on a single die (cores 2 and 3 working, cores 0 and 1 idle), the time is about 0.068 -- i.e. it doesn't (much) matter whether the cores are on the same die or not.

These times, BTW, are probably not quite as precise as they could be, as I didn't make an effort to eliminate every extraneous process like an idle Skype or my email client -- I'm approximating real world usage here, not dedicated Mersenne hunting, so I'm not shocked or surprised by that 7% variance.

Adding a third core to the equation produces mixed results. With cores 1, 2, and 3 cranking with core 0 idle, core 1 maintains a steady 0.074 iteration time (which I could probably live with), but the two cores that are on the same die together really contend with each other and results there are 0.094 average for core 3 and 0.118 for core 2.

With all four cores cranking, the iteration times vary from .107 to .132 depending on the core -- double or worse than the timings of just two cores.

I appear to be better off running P95 on just two cores, or perhaps 3. What good are four cores if they are cut to half speed by bandwidth bottlenecks when all are in use? I suppose we'll have to wait for Nehalem to rid ourselves of this bottleneck.
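
Adding up the per-core rates makes the point. A rough sketch using only the timings quoted above, assuming they are seconds per iteration, that both cores in the two-core runs hit the quoted time, and that the four-core range can be bracketed by its best and worst values.

```python
# Per-core iteration times (assumed seconds) from the runs described above.
timings = {
    "1 core": [0.050],
    "2 cores, one per die": [0.063, 0.063],
    "2 cores, same die": [0.068, 0.068],
    "3 cores": [0.074, 0.094, 0.118],
    "4 cores, best case": [0.107] * 4,
    "4 cores, worst case": [0.132] * 4,
}


def iterations_per_second(times):
    """Total iterations per second summed over all busy cores."""
    return sum(1.0 / t for t in times)


throughput = {case: iterations_per_second(t) for case, t in timings.items()}
for case, rate in throughput.items():
    print(f"{case:22s} {rate:5.1f} iterations/s in total")
```

On these numbers the whole machine does roughly 30 to 37 iterations per second whether two, three, or four cores are busy, so the extra cores add little.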

But given the above, is this really RAM bandwidth, or is it an L2 cache issue after all? If it were RAM bandwidth, why would the three-core test allow the core whose die twin is idle to work faster than the two cores that are on the same die (and thus sharing the L2 cache of that die)? If it were a problem with bandwidth in general, or contention for the FSB, wouldn't it affect all three cores equally?

Jeff Woods
Reading, PA
Old 2007-12-22, 03:56   #42
ADBjester
 
Aug 2002

2·3·5 Posts

As a followup to the above, I read S485122's post about nVidia vs. Intel chipsets and his performance tests on identical systems. Since I seem to be suffering a loss of more than half my potential performance on the nVidia chipset, I've bitten the bullet and ordered an ASUS Maximus Formula mobo based on the Intel X38 chipset, which is well known for its overclockability.

I'll receive it next Friday at the earliest (holidays), and I'll have to spend some time draining the watercooling and ripping it apart to move the waterblock over, and then another week or two pushing the overclock as hard as I can with various overnight tests before I'll have more results to post here.

I'll try to post preliminary results on a standard clock as soon as I get it plumbed -- call it 2 weeks, since I'll need to find time to do the build.

Jeff
Old 2007-12-24, 01:11   #43
db597
 
Jan 2003

11001011₂ Posts

Quote:
Originally Posted by ADBjester
But given the above, is this really RAM bandwidth, or is it an L2 cache issue after all? If it were RAM bandwidth, why would the three-core test allow the core whose die twin is idle to work faster than the two cores that are on the same die (and thus sharing the L2 cache of that die)? If it were a problem with bandwidth in general, or contention for the FSB, wouldn't it affect all three cores equally?

Having the whole L2 cache to work with cushions the impact of the memory bottleneck. So I believe this is still a memory issue: when going from 2 to 3 cores, two of the cores start having to share their L2 cache, so a lot more strain is put on the memory. In theory, things should improve a little with Penryn's 6MB-per-die cache.
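
A back-of-the-envelope size check shows why the cache can only cushion the problem. This sketch assumes one double-precision value (8 bytes) per FFT element and ignores any extra tables, so it is a lower bound; the 4 MB per die is the QX6700's L2 and the 6 MB is the Penryn figure mentioned above.

```python
BYTES_PER_DOUBLE = 8  # assumption: one 8-byte double per FFT element


def fft_data_mib(fft_length_k):
    """Size in MiB of the FFT data for a length given in K elements."""
    return fft_length_k * 1024 * BYTES_PER_DOUBLE / 2**20


L2_PER_DIE_MIB = {"QX6700": 4, "Penryn": 6}

for length_k in (1536, 2048, 2560):
    data = fft_data_mib(length_k)
    for cpu, cache in L2_PER_DIE_MIB.items():
        print(f"{length_k}K FFT: {data:4.0f} MiB of data, "
              f"{data / cache:.1f}x the {cache} MiB L2 per die on {cpu}")
```

Even with a die to itself, one test's data is several times larger than the L2, and two tests on one die each get a smaller share of it.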
Old 2007-12-24, 01:19   #44
db597
 
Jan 2003

7×29 Posts

Quote:
Originally Posted by ADBjester
I'll receive it next Friday at the earliest (holidays), and I'll have to spend some time draining the watercooling and ripping it apart to move the waterblock over, and then another week or two pushing the overclock as hard as I can with various overnight tests before I'll have more results to post here.

Given the memory bottleneck, for a similar cost, going for DDR3 @ 1600MHz might make a bigger difference than watercooling and super overclocking your rig. Not to mention fewer worries about stability, topping up the water periodically, electricity bills etc.
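
For scale, theoretical peak memory bandwidth is just transfer rate times bus width. A sketch assuming 64-bit (8-byte) channels in dual-channel mode and taking the "1000 MHz" RAM setting above as 1000 MT/s; real-world gains depend on the chipset and FSB and will be smaller than the peak ratio.

```python
def peak_bandwidth_mb_s(transfers_mt_s, bus_width_bytes=8, channels=2):
    """Theoretical peak bandwidth in MB/s for the given transfer rate."""
    return transfers_mt_s * bus_width_bytes * channels


ddr2_1000 = peak_bandwidth_mb_s(1000)   # assumed current setting
ddr3_1600 = peak_bandwidth_mb_s(1600)   # suggested upgrade

print(f"DDR2 @ 1000: {ddr2_1000} MB/s peak")
print(f"DDR3 @ 1600: {ddr3_1600} MB/s peak")
print(f"ratio: {ddr3_1600 / ddr2_1000:.2f}x")
```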

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.