 2007-01-12, 16:29 #1 sgrupp     Dec 2003 19 Posts Quad Core and P95
Now that the quad core (QX6700) systems are coming out, does anyone have experience running Prime95 on them? Clearly throughput per dollar will be a challenge in these expensive systems, but what are people seeing as regards:
1) Memory contention when all 4 cores are running P95. Are iteration times still OK?
2) HEAT. Can an air cooling solution handle all 4 cores doing LL testing?
3) Anyone running a water cooled overclocked rig like Dell's latest?
4) Power. Any notion of what power consumption is on a QX6700 running 4 LL tests?
 2007-01-12, 17:20 #2 dsouza123     Sep 2002 2×331 Posts
The thermal envelope for the quad core QX6700 is 130 W, which is double the dual core E6700's 65 W. Both run at 2.66 GHz. The new quad core Q6600 runs at 2.4 GHz with an 80 W envelope.
Last fiddled with by dsouza123 on 2007-01-12 at 17:21
 2007-01-12, 18:31 #3 S485122     "Jacob" Sep 2006 Brussels, Belgium 11011010000₂ Posts
I am busy with a quad core at the moment.

1. Iteration times are consistent: running one instance of the benchmark or running 4 has no significant impact. Although I obtained more of the minimum times for a given FFT size when running only one instance of the benchmark, the differences are well within the standard deviation of the benchmark figures. The standard deviation for the FFT tests is about 2%, and for the factoring tests it is 1%; the maximum difference can go up to 20%.
There is one point I intend to post about: the Level 2 cache sizes are not recognised:
Code:
L1 cache size: 32 KB
L2 cache size: unknown
L1 cache line size: 64 bytes
L2 cache line size: unknown
And when running 4 LL tests at the same time, the iteration times are much higher than the results of the benchmarks. There have been some posts about this problem. I will investigate further.

2. Air cooling is fine; it has to dissipate about as much heat as a Pentium IV D830 or D840 processor. You have to use a good cooler though. I use a Zalman 9700, and core temperatures measured by the thermal diodes are well within specs.

3. I use no water cooling. I do not overclock for now because the machine is still being broken in.

4. I did not measure the power consumption, but it should be equivalent to 2 Core 2 Duo E6700 processors or one D830 processor. The overall consumption of the system is not twice that of an E6700 system because of all the elements in common. I would say count some 70 watts more than an E6700 system.
 2007-01-12, 18:39 #4 sgrupp     Dec 2003 19 Posts
What kind of iteration times are you getting on the quad core? With a Core 2 Duo 2.66 processor and a 36M LL test, I am getting .05 sec/iteration with each of the 2 processes.

On the unrecognized L2 cache problem, undoc.txt has this to say:
You can explicitly specify the L2 cache size although this shouldn't be necessary since the program uses the CPUID instruction to determine the L2 cache size. In local.ini enter:
CpuL2CacheSize=128 or 256 or 512
CpuL2CacheLineSize=32 or 64 or 128
CpuL2SetAssociative=4 or 8
Quote:
 Originally Posted by S485122 Iteration times are consistent : running one instance of the benchmark or running 4 has no significant impact.
The benchmark is a poor tool to use, since it reports the BEST iteration time.

You really need to run 4 LL tests, which will report the more important average iteration time.
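
To illustrate the point about best versus average times, here is a minimal Python sketch. The function name `summarize` and the sample timings are made up for illustration; they are not Prime95 output or measurements from this thread.

```python
from statistics import mean


def summarize(iteration_times):
    """Return (best, average) iteration time in seconds.

    A benchmark that keeps only the minimum reports `best`;
    a long-running LL test effectively reports `average`.
    """
    return min(iteration_times), mean(iteration_times)


# Hypothetical timings in seconds per iteration, invented for this example.
samples = [0.033, 0.041, 0.043, 0.042, 0.044]

best, average = summarize(samples)
print(f"best    = {best:.4f} s")
print(f"average = {average:.4f} s")
print(f"average is {100 * (average / best - 1):.0f}% slower than best")
```

A single fast sample is enough to make the best time look good, which is why the two figures can diverge under contention.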

 2007-01-12, 19:34 #6 sgrupp     Dec 2003 19₁₀ Posts
What is your guess on memory contention or bus limitations for 4 LL tests running on a quad core with a total of 8M of L2 cache, George? A significant issue or not?
 2007-01-12, 19:40 #7 sgrupp     Dec 2003 19 Posts
And a second question: is the right value for L2 cache size 2048 (i.e. 2 MB for each core)?
CpuL2CacheSize=2048 or CpuL2CacheSize=8192 ?
Quote:
 Originally Posted by sgrupp What is your guess on memory contention or bus limitations for 4 LL tests running on a quad core with a total of 8M of L2 cache, George? A significant issue or not?
I am curious how NFS will perform on a quad-core. I am certain that
cache and memory contention will be a major problem. Running 4 instances
may even be slower (in aggregate output) than running 2 instances.

If I provide code and data for a Windows system, can someone run an
NFS benchmark???
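
The claim that 4 instances could be slower in aggregate than 2 can be checked with simple arithmetic once per-instance timings are known. A rough Python sketch follows; the function names and the timings passed in the usage lines are hypothetical, and it assumes every instance does the same amount of work per iteration.

```python
def aggregate_throughput(instances, seconds_per_iteration):
    """Total iterations per second across all concurrently running instances."""
    return instances / seconds_per_iteration


def four_beats_two(t2, t4):
    """True if 4 instances at t4 s/iteration out-produce 2 instances at t2 s/iteration."""
    return aggregate_throughput(4, t4) > aggregate_throughput(2, t2)


# Hypothetical per-instance timings (seconds per iteration), for illustration only.
print(four_beats_two(t2=0.05, t4=0.09))
print(four_beats_two(t2=0.05, t4=0.11))
```

In other words, running 4 instances only pays off while each one takes less than twice as long per iteration as it does when only 2 are running.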

Quote:
 Originally Posted by sgrupp And a second question: is the right value for L2 cache size 2048 (i.e. 2 MB for each core)? CpuL2CacheSize=2048 or CpuL2CacheSize=8192 ?
This is not exactly answering the question, but:

According to Wikipedia (the German Wikipedia seems to be more informative than the English one), the QX6700 ("Core 2 Extreme Kentsfield") consists of two dual-core dies, not one quad-core die.
Each die has one 4096 KB (4 MB) L2 cache shared by both of its cores, if I understand correctly.

Quote:
 Originally Posted by sgrupp What kind of iteration times are you getting on the quad core? With a Core 2 Duo 2.66 processor and a 36M LL test, I am getting .05 sec/iteration with each of the 2 processes. On the unrecognized L2 cache problem, undoc.txt has this to say: You can explicitly specify the L2 cache size although this shouldn't be necessary since the program uses the CPUID instruction to determine the L2 cache size. In local.ini enter: CpuL2CacheSize=128 or 256 or 512 CpuL2CacheLineSize=32 or 64 or 128 CpuL2SetAssociative=4 or 8
On the quad core, testing 4 different 27M numbers concurrently, I get 0.042 s iteration times. On the benchmark it is 0.33 s. The difference is HUGE. Yesterday I already tried to set the missing L2 values; the results were no better. I followed the indications provided by George in the thread "L2 cache unknown with new CPUs".

CPUID says the following about the cache on the QX6700 :
L1 data cache : 4 x 32 Kbytes, 8 way set associative, 64 bytes line size
L1 code (or instruction) cache : 4 x 32 Kbytes, 8 way set associative, 64 bytes line size
L2 : 2 x 4096 KBytes, 16 way set associative, 64 bytes line size

A QX6700 is the same as two E6700s on one board through one socket, so the shared cache issue is the same as on an E6700. But it could be memory contention: are 4 instances of Prime95 too much for the memory bus? I could try to fiddle with the memory settings of my board, but I want to wait a bit first.

Or is it something to do with
Quote:
 Originally Posted by Prime95 I had the same problem in 64-bit Windows (dual core Pentium 4). It turns out to be some weird Windows problem reading the time stamp counter. I "fixed" it by adding the /usepmtimer to the boot.ini file

 2007-01-12, 22:56 #11 Cruelty     May 2005 1628₁₀ Posts
Maybe it is the FSB that's holding back this CPU? I would lower the multiplier to x8 and increase the FSB from 266 to 333 MHz to see if that helps. Also, setting an affinity for every instance would be advisable on a quad core, since all the communication between the cores happens through the FSB.
BTW: does running 2 or 3 instances instead of 4 make any difference?
Last fiddled with by Cruelty on 2007-01-12 at 23:00
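
The arithmetic behind the multiplier suggestion can be sketched in a few lines of Python. The stock multiplier of 10 is an assumption derived from the 2.66 GHz clock and 266 MHz FSB quoted in this thread, and `core_clock_mhz` is a name invented for the example.

```python
def core_clock_mhz(fsb_mhz, multiplier):
    """Core clock is the FSB base clock times the CPU multiplier."""
    return fsb_mhz * multiplier


# Assumed stock setting: 266 MHz x 10 (derived from the 2.66 GHz figure above).
stock = core_clock_mhz(266, 10)
# Proposed setting from the post: 333 MHz x 8.
proposed = core_clock_mhz(333, 8)

print(f"stock:    {stock} MHz core clock")
print(f"proposed: {proposed} MHz core clock")
print(f"FSB is {100 * (333 / 266 - 1):.0f}% faster at nearly the same core clock")
```

The point of the change is that the cores run at essentially the same speed while the bus gets faster, so any improvement in iteration times would point at the FSB rather than the cores.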

