#1 |
Aug 2002
2×7×617 Posts |
http://www.cpuid.com/K8/index.php
After reading this, I can see why Prime95/mprime might be hard to optimize for the K8. Pay close attention to the cache design, especially how the K8's exclusive L1/L2 relationship means an L1 miss that hits in the L2 involves writing the evicted L1 line back to the L2. The P4's inclusive design is more efficient in this regard. Also look at the L1 latency: the P4 has a very small L1, but it has a very low latency.

Finally, look at the section labeled "Floating points calculations: x87, SSE and SSE2". I barely understand it, but it looks like Intel wins by a large margin in this category. The second half of the paper shows great promise for the extra registers and so on, but I don't know how those will affect our work.

I guess when you compare the K8 to the P4, you get two totally different ways of doing things that eventually accomplish the same goal. It just looks like the P4 way happens to be more efficient for SIMD work like Prime95. I don't know if a new client, written from the ground up for the K8, would do better than the P4 client we have now. I'm thinking probably not. As it is, this paper explains why a P4 is faster in Prime95 than an equally clocked K8. Just making them equal at the same clock speed looks tough. I do hope that the extra registers and 64-bit support will help general scientific computing in the future, though.

One other thing I picked up from this article: if you are going to buy a K8 and you have a choice between a smaller-cache model that is clocked higher and a larger-cache model that is clocked lower, go for the higher-clocked model. The cache part of the paper makes it very clear that the L2 size isn't that important in the overall design. (I learned this after I specifically bought a 1MB L2 3200+. Doh!)

Personally, I tend to agree more with the K8 design than the P4 design. Yes, the P4 is very fast for some tasks, but overall it looks like the K7/K8 is a better "general purpose" CPU. And we all know that the P4 design is starting to show its limits, and we haven't even officially hit 4GHz yet. In fact, Intel plans to scrap the P4 design pretty soon.

Please read this paper and let us know your thoughts.

Edit: I mirrored the zip file they linked at the bottom of that article since it had a bad URL.
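For anyone who wants to see the inclusive vs. exclusive difference in numbers, here is a rough sketch of how much distinct data an L1 + L2 pair can hold under each policy. The function name and the cache sizes are illustrative assumptions, not figures taken from the article, and the sketch says nothing about latency or about the extra traffic an exclusive design generates.

```python
def unique_capacity_kb(l1_kb, l2_kb, policy):
    """Distinct data (in KB) that an L1 + L2 pair can hold under a policy."""
    if policy == "inclusive":
        # Every L1 line is also kept in the L2, so the L1 adds no capacity.
        return l2_kb
    if policy == "exclusive":
        # A line lives in the L1 or the L2 but never both, so the sizes add.
        return l1_kb + l2_kb
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    L1_KB = 64  # hypothetical L1 data cache size
    for l2_kb in (512, 1024):  # hypothetical L2 sizes
        for policy in ("inclusive", "exclusive"):
            total = unique_capacity_kb(L1_KB, l2_kb, policy)
            print(f"L1={L1_KB}KB L2={l2_kb}KB {policy}: {total}KB distinct")
```

The price of the extra capacity in the exclusive case is that lines have to be moved between the two levels, which is the cost the cache section of the article is pointing at.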
#2 | |
Sep 2003
Borg HQ, Delta Quadrant
1010111110₂ Posts |
Quote:
#3 |
Jun 2003
Ottawa, Canada
3×17×23 Posts |
Very cool article.
#4 | |
P90 years forever!
Aug 2002
Yeehaw, FL
2·4,127 Posts |
Quote:
#5 | |
Aug 2002
2·7·617 Posts |
Quote:
#6 |
Jul 2004
Nowhere
1451₈ Posts |
Interesting, lol. Poor Intel, but I bet you could overclock those 3.8s up to maybe 4 to 4.2.
Also, Alienware seems to offer a 4 GHz system using the Intel P4, probably overclocked. The funniest part of it all is the fact that they water-cool it, meaning those things are kicking out more heat than the older ones :) ... probably?
#7 | |
Aug 2002
2·7·617 Posts |
Quote:
http://www.theinquirer.net/?article=19110
#8 |
Aug 2002
2·7·617 Posts |
#9 |
Aug 2002
3×5²×7 Posts |
The real reason the AMD64 and the Pentium M lag behind the Pentium 4 is that they both use 80-bit Wallace trees for floating-point operations, whereas the Pentium 4 uses 128-bit Wallace trees.
#10 |
"GIMFS"
Sep 2002
Oeiras, Portugal
62E₁₆ Posts |
Absolutely...
That's why the SSE2 implementation in the P4 is superior to the one in the Athlon 64. Let's hope that the forthcoming Win64 will give AMD a push.
#11 | |
Apr 2003
Berlin, Germany
192 Posts |
Quote:
The Pentium 4 has other advantages, but not a 128-bit Wallace tree. The same idea came up in another forum (sudhian?) but is plain wrong, since the P4 still does the 2 calculations in an SSE2 vector operation one after another. That's why the throughput is 2 cycles for such ops and not 1 (which it would be with a wider Wallace tree). But there is hope: