#1 |
I ♥ BOINC!
Oct 2002
Glendale, AZ. (USA)
3·7·53 Posts |
http://www.internetnews.com/ent-news...le.php/3551236
The fabless semiconductor company plans to demonstrate its CSX600 dual-chip board, running at 50 gigaflops on a mere 25 watts... estimated pricing would start in the low five figures. ... the forthcoming Cell processor will be theoretically capable of running as fast as 256 gigaflops. It runs in a PCIe slot :D Will it run DC software? |
#2 |
Jul 2004
Nowhere
809 Posts |
"The ClearSpeed chip features 96 processors running at 'only 250 MHz' each"
... Prime95 runs on a single core and doesn't spread itself across multi-CPU systems, so we would have to keep 96 instances open.... |
#3 |
Oct 2004
1021₈ Posts |
Re: clearspeed
Hehehe, I wondered how long it would be before someone mentioned this..... This company's prototype chip was discussed on mersenneforum circa 2003. I have recently spoken with the company myself. They are based in Bristol, UK, quite near me :-) and are listed on the AIM stock market in the UK. They also have a smaller US office.

Since the CS301 they have gone from 200 to 250 MHz, from single to double precision (after feedback from the test userbase), and from 64 to 96 processing elements per chip, and they have put the memory controller on the chip.

You would NOT run 96 instances of Prime95 on this. For one thing, it is a VLIW architecture, not x86, although there is a C compilation SDK. But you could use it to speed up the DFT/FFT operations Prime95 uses to multiply big numbers (e.g. compile Mlucas or Glucas with modifications to use it). They have already implemented acceleration for FFT (via FFTW) and certain math libraries, e.g. BLAS. Number theorists here may like to know that they have already made it support both MATLAB and Mathematica applications.

The CSX600 chip delivers 25 Gflops DGEMM (sustained), compared to, say, an Opteron giving circa 3.8 Gflops. What is VERY nice is that it achieves this performance using only about 5-10 WATTS!!!!! of power.

Around November, at the supercomputer conference, I believe you will see one of the large HPC vendors make an announcement about these chips :-) ClearSpeed will also be at the show (again).

It is possible to buy chips to embed in your own products, but you would need to design your own external logic (something I am also considering). However, the more practical option initially is to buy a board. The board contains TWO CSX600s (50 Gflops DGEMM) and is PCI-X. What is NOT on the ClearSpeed website yet is that they will announce a PCI Express version in Q1, probably available in Q2. Obviously you can put more than one card in a chassis. The chips contain two bridge ports to talk to other chips in a chain without glue logic.
On the initial board this is interfaced by a Xilinx XC2VP30 FF896 FPGA (Virtex-II Pro). The board power consumption is around 25 watts. EACH chip has its own connection to its own external DDR2 memory module (supporting ECC if you want) of up to 4 GB. There is more memory on chip: 6 KB per PE, and a further 128 shared by all PEs on the chip. NOTE for priming: unlike the CELL chip, these FPUs are DOUBLE PRECISION (i.e. 64 bits accurate).

The company has been in a somewhat "stealth mode" until recently. You cannot buy these products today, but they do exist. In a few months the PCI-X board will be available for purchase. The board price was said to be somewhere "under $10K". I have obtained clearer pricing details from ClearSpeed on both chips and boards, but I think I should not advertise these until they make a formal announcement. What I can say is: just because it fits on a PCI card, don't expect it to be cheap :-) At the prices I was given, it appears better value to buy chips. If I were making a board I would replace the Xilinx Virtex-II Pro with a Virtex-4 FX, which is faster, newer and cheaper, so it would keep the board price down. They are quite happy for other companies to make boards, as their core business is chipmaking.

The CSX600 has a 64-bit address space (48 bits physical). Chip bandwidth: TWO ports at 3.2 Gbyte/s chip-to-chip bandwidth, 3.2 Gbyte/s EXTERNAL memory bandwidth to the DDR2, and 96 Gbyte/s internal aggregate bandwidth within the chip.

FOLLOWING RECENT DISCUSSIONS OF "CELL" I THINK THE CSX600 IS A MUCH MORE APPROPRIATE ARCHITECTURE FOR THE THINGS WE DO HERE ON MERSENNEFORUM. The chip is built on a 130nm process, so I would prefer them to go to 90 or 65nm to improve it :-(

Of interest for maths programs such as factoring is the GMP math library. ClearSpeed has so far ported several libraries for math, molecular chemistry, etc., but NOT GMP (because nobody they were working with had asked for it). SO I ASKED FOR IT (LOL). They will think about it.
I suspect a petition from mersenneforum math users would show that there are plenty of people who use GMP! So they ought to get GMP supported if they want to sell more chips :-) I believe they intend to sell the SDK (e.g. to compile your own app/library in C for VLIW) separately and charge extra for it (maybe we can get one for George). We have agreed to get in touch again once they have products available for sale. I may also speak with their technical people about protocols/requirements for interfacing the chips on a custom board.

For comparison, the two-chip board delivers about 50 Gflops DGEMM and, for the Linpack benchmark, somewhere over 30 Gflops (sustained). That is better Linpack performance than the 12-processor Orion DT-12 machine, which sustains 14 Gflops Linpack (on a peak of 28 Gflops). Four boards (30x4=120 Gflops?) would surpass the performance of the Orion DS-96 cluster (110 Gflops sustained). Orion are very good on power consumption (220W for the DT-12, 1500W peak for the DS-96), but four CSX600 boards would eat just 100W, which is even better. And in both scenarios the price would be lower, and it would not be limited by a "mere" Gigabit Ethernet interconnect.

I would be interested for other mersenneforum participants to discuss this architecture further. I think it could certainly be good for trial factoring and probably LL too. Because it costs more than a PC, the price/performance ratio may not be VASTLY superior, BUT the power consumption definitely is. I know I don't have the electrical infrastructure to run 100 PCs in my house, but I could certainly run a smaller farm using these accelerator cards. And a reduced power bill helps offset the high price of the chips/boards.
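
To put the figures quoted above side by side, here is a rough Python sketch that converts them into Gflops per watt. The numbers are the ones given in this post (sustained Linpack and board or system power). The four-board entry assumes perfectly linear scaling, which the post itself marks with a question mark, and the DS-96 power figure is a peak value, so treat the result as back-of-envelope only.

```python
# Sustained Linpack Gflops and power in watts, as quoted in the post.
# Assumption: four boards scale linearly (30 x 4 = 120 Gflops, 25 x 4 = 100 W).
systems = {
    "CSX600 board (2 chips)": (30.0, 25.0),
    "4 x CSX600 boards": (120.0, 100.0),
    "Orion DT-12": (14.0, 220.0),
    "Orion DS-96": (110.0, 1500.0),  # 1500 W is the quoted peak power
}


def gflops_per_watt(gflops, watts):
    """Energy efficiency as sustained Gflops divided by power draw."""
    return gflops / watts


efficiency = {
    name: gflops_per_watt(gflops, watts)
    for name, (gflops, watts) in systems.items()
}

# Print the most efficient system first.
for name, value in sorted(efficiency.items(), key=lambda item: -item[1]):
    print(f"{name:24s} {value:6.3f} Gflops/W")
```

On these quoted figures the board comes out at roughly 1.2 Gflops/W against well under 0.1 Gflops/W for either Orion system.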
If you wondered why I didn't post this earlier: I was thinking of buying myself some shares in the company before the news of these chips spread :-) But hopefully it will remain a secret within the realm of an obscure math community :-) hehehehe. Sorry for posting the techie stuff pretty randomly, I have more ..... |
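
For readers wondering what "FFT for multiplying big numbers" means in the post above, here is a minimal sketch in plain Python. It is not Prime95's transform (which is far more heavily optimised) and it uses nothing from ClearSpeed's SDK; the function names `fft`, `to_limbs` and `fft_multiply` and the 8-bit limb size are assumptions chosen for illustration. The idea is to treat each integer as a polynomial in base 2^8, multiply the polynomials pointwise in the frequency domain, and transform back.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def to_limbs(x, bits=8):
    """Split a non-negative integer into little-endian limbs of `bits` bits."""
    mask = (1 << bits) - 1
    limbs = []
    while x:
        limbs.append(x & mask)
        x >>= bits
    return limbs or [0]


def fft_multiply(a, b, bits=8):
    """Multiply two non-negative integers via FFT-based convolution."""
    la, lb = to_limbs(a, bits), to_limbs(b, bits)
    size = 1
    while size < len(la) + len(lb):
        size *= 2
    fa = fft(la + [0] * (size - len(la)))
    fb = fft(lb + [0] * (size - len(lb)))
    product = fft([x * y for x, y in zip(fa, fb)], invert=True)
    result = 0
    for i, coeff in enumerate(product):
        # The inverse transform needs a 1/size scale; rounding removes float error.
        # Python's big integers absorb the carries when the terms are summed.
        result += int(round(coeff.real / size)) << (bits * i)
    return result


a, b = 2**127 - 1, 2**89 - 1
print(fft_multiply(a, b) == a * b)
```

Small limbs keep the floating-point rounding error far below 0.5, so the rounding step recovers the exact convolution. This is also why double-precision hardware matters for this kind of work: with less precise floats the usable limb size and transform length shrink quickly.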
#4 |
I ♥ BOINC!
Oct 2002
Glendale, AZ. (USA)
1113₁₀ Posts |
Very interesting read!
I just hope we can use the darn thing with DC projects :) |
#5 |
Oct 2004
23² Posts |
Ironbits said "I hope we can use it with DC projects"
Well, they have already made the CSX600 work with the GROMACS library. In case you didn't know, GROMACS is the library currently used by Folding@home (about 10x faster than the one they started with). A board seems to accelerate GROMACS by at least 10x over a regular PC without the card. AND the guy in charge of Folding@home is on ClearSpeed's advisory panel and appears in the company report. So there is a very strong possibility that F@H either already supports the CSX600 or would need only a couple of minor changes to do so. The library would need to check for the presence of the board and use it if it is there. So whilst lotsa wannabe PC "experts" dedicate their overclocked rigs to FOLDING, with one of these boards I could quickly blow away their achievements in the stats. Of course, I would rather spend my CPU time looking for primes than folding. |
#6 |
Sep 2002
Austin, TX
1000110001₂ Posts |
Why are these chips so much better at power management than our desktop PCs? Is their fabrication process that much better, or is there more to it than that?
|
#7 | |
Apr 2003
Berlin, Germany
101101001₂ Posts |
It is very likely (and the big players have said as much) that future desktop processors will also add some simpler cores for special tasks. |
|
#8 |
Oct 2004
23² Posts |
The ClearSpeed chip is manufactured by IBM using a 130nm process.
I believe it contains 128 million transistors (I have also heard another, similar figure). That transistor budget isn't wasted on large amounts of cache. Additionally, I think the instruction set and decoding are much simpler than the Pentium 4's: VLIW versus CISC. Also, some of the control logic is shared and does not need to be repeated for each core. Intel spends extra transistors on optimisations like out-of-order execution, branch prediction, etc. to squeeze out every last drop of performance, but at a high cost in logic gates (and a long pipeline).

I imagine ClearSpeed could reduce their power consumption FURTHER by moving to 90 or 65nm process geometries, as their partner IBM should have some experience with that by now (but 130nm kept the market-entry costs low for them).

One major difference in heat and power consumption comes from the clock rate of 250 MHz (but note each PE can do 2 FPU ops per cycle, plus other stuff) as opposed to the 3.0-3.8 GHz of a typical Intel P4. The upside of this is that the CSX600 chips can be operated without any heatsink or fan. |
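
As a rough sanity check on the numbers in this thread, the theoretical peak follows from processing elements × clock × operations per cycle. The sketch below assumes that the "2 FPU ops per cycle" mentioned above are both double-precision floating-point operations that count towards the peak, which the thread does not confirm; the function name `peak_gflops` is just for illustration.

```python
def peak_gflops(processing_elements, clock_mhz, flops_per_cycle):
    """Theoretical peak in Gflops = PEs x clock (Hz) x flops per cycle / 1e9."""
    return processing_elements * clock_mhz * 1e6 * flops_per_cycle / 1e9


# Figures quoted in the thread: 96 PEs, 250 MHz, 2 FPU ops per cycle per PE.
csx600_peak = peak_gflops(96, 250, 2)

# Sustained DGEMM figure quoted earlier in the thread for one CSX600 chip.
dgemm_sustained = 25.0
utilisation = dgemm_sustained / csx600_peak

print(f"CSX600 peak: {csx600_peak:.1f} Gflops")      # 48.0 Gflops
print(f"DGEMM as a share of peak: {utilisation:.0%}")  # 52%
```

Under that assumption a single chip peaks at 48 Gflops, so the quoted 25 Gflops sustained DGEMM would be a little over half of peak, reached at a clock more than ten times lower than the P4's.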
#9 |
Apr 2003
Berlin, Germany
551₈ Posts |
Some additional thoughts:
The very low clock speed allows the manufacturer to use different types of transistors, with much less leakage and much better behaviour with regard to power consumption in general. Modern CPUs usually contain many different variants of transistors, depending on their function and location (e.g. whether they are timing-critical or not). Lower-clocked designs allow power-saving transistors to be used everywhere. |
#10 |
Jul 2004
Potsdam, Germany
3·277 Posts |
Seems like ClearSpeed has hit a big one:
http://www.tmcnet.com/usubmit/2005/nov/1204858.htm (some more info here: http://www.google.com/translate?u=ht...&hl=en&ie=UTF8 ) |
#11 |
Bemusing Prompter
"Danny"
Dec 2002
California
5×499 Posts |
Previously, it was agreed that ClearSpeed was only suitable for trial factoring at best. Are the new ClearSpeed processors able to do L-L tests yet?
|
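
For anyone new to the two workloads being compared in this thread, here is a minimal Python sketch of both. It only illustrates the mathematics, not how Prime95 or any ClearSpeed port implements them; the function names and the `max_k` limit are assumptions for illustration. The Lucas-Lehmer (L-L) test is a chain of p - 2 dependent squarings modulo 2^p - 1, which is where large FFT multiplies come in. Trial factoring only needs small modular exponentiations on candidates of the form q = 2kp + 1 with q ≡ ±1 (mod 8), and each candidate is independent of the others.

```python
def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for an odd prime exponent p."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        # Each step depends on the previous one, so the work is serial.
        s = (s * s - 2) % m
    return s == 0


def trial_factor(p, max_k=10_000):
    """Look for a factor q = 2*k*p + 1 of 2**p - 1, for an odd prime p.

    Returns the first factor found with k <= max_k, or None.
    """
    m = (1 << p) - 1
    for k in range(1, max_k + 1):
        q = 2 * k * p + 1
        if q * q > m:
            break  # no proper factor below sqrt(m) was found
        # Any factor of 2**p - 1 must be congruent to 1 or 7 modulo 8.
        if q % 8 in (1, 7) and pow(2, p, q) == 1:
            return q
    return None


print([p for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31) if lucas_lehmer(p)])
# [3, 5, 7, 13, 17, 19, 31]
print(trial_factor(11), trial_factor(23), trial_factor(13))
# 23 47 None
```

The independence of the trial-factoring candidates is one plausible reason that workload was considered the easier fit for a chip with many simple processing elements, while the L-L test stands or falls on how well the big multiply can be spread across them.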