mersenneforum.org Upcoming Prime95 monsters (processors)

2006-06-05, 14:30   #89
R.D. Silverman

Nov 2003

7460₁₀ Posts

Quote:
 Originally Posted by TheJudger
 kind of...
 1024k FFT, 15 min test time:
 Dualcore Opteron 170 clocked @3.2GHz:
 http://img153.imageshack.us/my.php?i...20015044gz.png
 lower bound: 5*4000 iterations/(15m*60) = 22.2 iterations per second <=> 45ms per iteration
 upper bound: 6*4000 iterations/(15m*60) = 26.7 iterations per second <=> 37.5ms per iteration
 http://img242.imageshack.us/my.php?i...71313250sz.png
 Conroe E6600 clocked @2.7GHz
 lower bound: 9*4000 iterations/(15m*60) = 40.0 iterations per second <=> 25ms per iteration
 upper bound: 10*4000 iterations/(15m*60) = 44.4 iterations per second <=> 22.5ms per iteration

Nice.

If I provide code (source if you like) and data, could you run both a
single-thread and double-thread benchmark of the lattice sieve on this
machine?
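
The bounds in the quoted benchmark can be reproduced with a few lines of Python. This is only a sketch of the arithmetic. It assumes the screenshots report progress in blocks of 4000 iterations, so that N completed blocks give the lower bound and N+1 the upper bound. The function name and parameters are made up for illustration.

```python
def iteration_bounds(completed_blocks, iters_per_block=4000, minutes=15):
    """Bound the iteration rate from a count of completed progress blocks.

    Assumption: the run finished `completed_blocks` full blocks and had not
    yet finished the next one, so the true count lies between N and N+1 blocks.
    Returns (low_rate, high_rate, slow_ms, fast_ms), where the rates are in
    iterations per second and the times are milliseconds per iteration.
    """
    seconds = minutes * 60
    low_rate = completed_blocks * iters_per_block / seconds
    high_rate = (completed_blocks + 1) * iters_per_block / seconds
    return low_rate, high_rate, 1000 / low_rate, 1000 / high_rate


if __name__ == "__main__":
    # Block counts taken from the quoted post: 5 for the Opteron, 9 for Conroe.
    for name, blocks in [("Opteron 170 @3.2GHz", 5), ("Conroe E6600 @2.7GHz", 9)]:
        low, high, slow_ms, fast_ms = iteration_bounds(blocks)
        print(f"{name}: {low:.1f} to {high:.1f} it/s, "
              f"{fast_ms:.1f} to {slow_ms:.1f} ms per iteration")
```

Because the block count is so coarse, the two intervals are wide; a longer test run would narrow them.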

2006-06-05, 15:23   #90
TheJudger

"Oliver"
Mar 2005
Germany

11×101 Posts

Quote:
 Originally Posted by R.D. Silverman If I provide code (source if you like) and data could you run both a single-thread and double-thread benchmark of the lattice sieve on this machine?
If I owned such a machine: sure.
But sadly it's not my machine :(

I found these screenshots in a German hardware forum.

2006-06-08, 16:52   #91
dsouza123

Sep 2002

1010010110₂ Posts

There are now two companies making FPGA based Opteron coprocessors.
http://www.eetimes.com/news/semi/sho...leID=188702712

The coprocessors plug directly into an empty CPU socket and can be dynamically reconfigured, thus permitting users to change logic configurations to better match the algorithms that need acceleration.

DRC Computer Corp. and XtremeData Inc., are delivering programmable solutions that can accelerate time-critical algorithms. These coprocessors leverage the flexibility of Xilinx and Altera FPGAs, respectively, so that they can be configured to accelerate graphics, XML, floating point, video transcoding and other applications.

Both the DRC and XtremeData solutions are modules that combine an FPGA with static RAM, flash memory (XtremeData only), and interface logic to support 8- or 16-bit HyperTransport interfaces.

DRC offers three versions of its module: the DRC100-L60ES and L60, which are based on the 60k logic cell LX60 Virtex 4 FPGA, and the DRC110-L160, which is based on the 152k logic cell LX160 FPGA.

The XD1000 from XtremeData employs Altera's largest Stratix II FPGA, the EP2S180...the company has several enhanced versions of XD1000 planned for future release. To develop the hardware-based algorithms XtremeData leverages Altera's SOPC Builder and C2H (C-language to hardware) tools as well as Altera's soft intellectual property blocks such as the NIOS processor core.

A full development system with a dual-socket motherboard and one XD1000 module sells for about $15,000 in small quantities; the XD1000 module sells for $6,500 apiece.

2006-10-11, 12:30   #93
Dresdenboy

Apr 2003
Berlin, Germany

19² Posts

AMD presented more details on MPF:
http://www.thechannelinsider.com/pri...ls/191008.aspx

A photo of the beast:
http://news.com.com/2300-1006_3-6124...4500&subj=news

Most interesting for Prime95 should be these features (many are already known, but several details were not):
- 128 bit SSE paths
- 2x128 bit L1D bandwidth
- 32B instruction fetch window (since Prime95 uses loads of long SSE2 instructions)
- 128 bit L2/NB bandwidth
- 36 dedicated 128 bit ops in the FPU scheduler (vs. only 18 128 bit ops before)
- FMISC unit can execute SSE MOV (128 bit/cycle)
- an AMD slide mentioned a max of 2 128 bit SSE ops + 1 SSE MOV + 2 SSE loads/cycle (as memory operands)
- generally 2 SSE loads/cycle (if there is no bottleneck like the one in K8, then this should actually quadruple the load bandwidth during blocks of MOVPDs at the beginning of most FFT butterfly macros)
- L3 cache and separate 64 bit memory channels might help reduce latency for the multi megabyte Prime95 workloads, especially for multiple instances (not a multithreaded variant), which work on different working sets

2006-10-11, 22:52   #94
dsouza123

Sep 2002

1010010110₂ Posts

Other AMD features (reductions):
The L1 cache drops from 128KB (64KB data and 64KB code) to 64KB (32 and 32), and the L2 drops from 1024KB to 512KB.

The 64KB L1 is a surprising change: the Athlon/Opteron chips have had 128KB since the beginning. The 512KB L2 is within the range of previous L2 amounts, which went from 256KB to 512KB to, more recently, 1024KB.

2006-10-12, 07:47   #95
Dresdenboy

Apr 2003
Berlin, Germany

551₈ Posts

Quote:
 Originally Posted by dsouza123 Other AMD features (reductions): The L1 cache drops from 128KB (64KB data and 64KB code) to 64KB (32 and 32), and the L2 drops from 1024KB to 512KB. The 64KB L1 is a surprising change, the Athlon/Opteron chips have had 128KB since the beginning, the 512KB is within the range of previous L2 amounts from 256KB to 512KB to more recently 1024KB.
That is still being discussed on some forums out there. But months ago someone from AMD already stated that the L1 caches will still be 2 x 64 kB per core. The die plots and die photos support this as well, since the size of the L1 caches relative to the core didn't change much; any difference is mostly due to different SRAM cells. And many knowledgeable people (e.g. Hans de Vries, who discovered 64bit in a Prescott die photo) didn't see a cache reduction.

The confusion might be caused by an AMD slide showing the cache infrastructure, where only 64 kB of L1 per core is shown. But this is actually the infrastructure for the data cache. See here:
About these 64kB they say: "keeps most critical data", "2 128 bit data paths" (L1D+L1I will have four 128 bit data paths in Barcelona), "2 loads per cycle" (same as for K8 L1D).

2006-10-13, 07:09   #96
Dresdenboy

Apr 2003
Berlin, Germany

19² Posts

Confirmation for 128 kB L1 from Johan (from Anandtech):
Quote:
 At our last phone call, Damon Muzny repeated this at least 3 times that the figure with Data cache might have confused a lot of people: but the L1 is still 64KB D + 64 KB I, just like it was before.
http://www.aceshardware.com/forums/r...8309&forumid=1

2007-05-16, 07:06   #97
Dresdenboy

Apr 2003
Berlin, Germany

19² Posts

Optimization Manual

The "Software Optimization Guide for AMD Family 10h Processors" is available now:
http://www.amd.com/us-en/assets/cont...docs/40546.pdf

Besides all the stuff already known, there is some information that was new even to me, such as that the L3 cache is bandwidth adaptive: it runs at lower latency and lower bandwidth when there is little traffic, and increases bandwidth (while also increasing latency) once cache traffic reaches some threshold. Most SSEn instructions are now decoded more efficiently, allowing more of them to reside in the scheduler, so that it can exploit ILP better.

I've got an idea for finding out how Prime95 might run on K10 compared to K8. The availability of this manual makes it possible to run some simulations, which should come closer to reality in the labs than any SWAG.
