2005-09-24, 04:19   #1
IronBits
Oct 2002
Glendale, AZ. (USA)

3·7·53 Posts
ClearSpeed Claims Fastest Chip Crown
The fabless semiconductor company plans to demonstrate its CSX600 dual-chip board, running at 50 Gigaflops and a mere 25 watts...

... estimate of pricing would start in the low five figures.
... forthcoming Cell processor will be theoretically capable of running as fast as 256 gigaflops.
Runs in a PCIe slot :D
Will it run DC software?
2005-09-24, 06:02   #2
moo
Jul 2004

809 Posts

" the ClearSpeed chip features 96 processors running at "only 250 MHz" each"
... Prime95 runs on a single core and doesn't run on multi-CPU systems, so we would have to keep 96 instances open....
2005-09-24, 22:23   #3
Peter Nelson
Oct 2004

1021₈ Posts

Re: clearspeed

Hehehe I wondered how long before someone mentioned this.....

This company's prototype version chip was discussed on mersenneforum circa 2003.

I have recently spoken with the company myself.

They are based in Bristol, UK, quite near me :-) and are listed on the AIM stock market in the UK. They also have a smaller US office.

Since the CS301 they have gone from 200 to 250 MHz, from single to double precision (after feedback from the test user base), and from 64 to 96 processing elements per chip, and have added an on-board memory controller to the chip.

You would NOT run 96 instances of prime95 on this.

For one thing it's a VLIW, not x86, architecture, though there is a C compilation SDK. But you could use it to speed up the DFT/FFT operations Prime95 uses to multiply big numbers (e.g. compile Mlucas or Glucas with modifications to use it).
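To make the "FFT for multiplying big numbers" idea concrete, here is a minimal pure-Python sketch. It is not how Prime95, Mlucas or Glucas do it (they use far more refined transforms and word sizes); the function names `fft` and `fft_multiply` are made up for this illustration, and it simply treats the decimal digits of each number as a signal, convolves them with a floating-point FFT and rounds the result back to integers.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)  # twiddle factor
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def fft_multiply(x, y):
    """Multiply two non-negative integers via FFT convolution of their decimal digits."""
    xd = [int(c) for c in reversed(str(x))]  # least significant digit first
    yd = [int(c) for c in reversed(str(y))]
    n = 1
    while n < len(xd) + len(yd):             # room for the full product
        n *= 2
    fa = fft(xd + [0] * (n - len(xd)))
    fb = fft(yd + [0] * (n - len(yd)))
    prod = fft([p * q for p, q in zip(fa, fb)], invert=True)
    # The inverse transform needs a 1/n scale; rounding removes float error.
    coeffs = [round(c.real / n) for c in prod]
    # Recombine the coefficients, which also performs the carries.
    return sum(c * 10 ** i for i, c in enumerate(coeffs))

print(fft_multiply(123456789, 987654321) == 123456789 * 987654321)
```

The rounding step is the part that depends on floating-point accuracy: the longer the numbers, the more precision the transform needs for the rounded coefficients to be right.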

They have already implemented acceleration for FFT (via FFTW) and for certain math libraries, e.g. BLAS.

Number theorists here may like to know that it already supports both MATLAB and Mathematica applications.

The CSX600 chip delivers 25 Gflops DGEMM (sustained), compared to, say, an Opteron giving circa 3.8 Gflops.

What is VERY nice is that it achieves this performance using only about 5-10 WATTS of power!

Around November, at the supercomputing conference, I believe you will see one of the large HPC vendors make an announcement re these chips :-)
ClearSpeed will also be at the show (again).

It is possible to buy chips to embed in your own products, but you would need to design your own external logic (something I am also considering).

However, the more practical option initially is to buy a board.

The board contains TWO CSX600 chips (50 Gflops DGEMM) and is PCI-X.

What is NOT on the ClearSpeed website yet is that they will announce a PCI Express version in Q1, probably available in Q2.

Obviously you can put more than one card in a chassis.

The chips contain two bridge ports to talk to other chips in a chain without glue logic. On the initial board this is interfaced by a Xilinx XC2VP30 FF896 FPGA (Virtex-II Pro). The board's power consumption is around 25 W.

EACH chip has its own connection to its own external DDR2 memory module (supporting ECC if you want) of up to 4 GB.

On chip there is more memory: 6 KB per PE, and a further 128 KB shared by all PEs on the chip.

NOTE for priming: unlike the Cell chip, these FPUs are DOUBLE PRECISION (i.e. 64-bit floating point).

The company was in a somewhat "stealth mode" until recently.
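A small sketch of why the precision matters for this kind of work. An IEEE 754 single has a 24-bit significand and a double has 53 bits, so a single cannot even hold every integer above 2^24 exactly, which leaves little headroom for FFT-based big-number arithmetic. The helper names below are invented for the example; Python's own `float` is a 64-bit double, and `struct` is used to round a value to 32 bits.

```python
import struct

def to_float32(x):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def exact_in_single(n):
    """True if integer n survives a round trip through single precision."""
    return int(to_float32(float(n))) == n

def exact_in_double(n):
    """True if integer n survives a round trip through double precision."""
    return int(float(n)) == n

for n in (2**24, 2**24 + 1, 2**53, 2**53 + 1):
    print(n, exact_in_single(n), exact_in_double(n))
```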

You cannot buy these products today but they do exist.

In a few months the PCI-X board will be available for purchase.

Board price was said to be somewhere "under $10K".

I have obtained clearer pricing details from ClearSpeed on both chips and boards, but think I should not advertise these until they make a formal announcement.

What I can say is: just because it fits on a PCI card, don't expect it to be cheap :-)

At the prices I was given, it appears better value to buy chips. If I were making a board I would replace the Xilinx Virtex-II Pro with a Virtex-4 FX, which is faster, newer and cheaper, so it would keep the board price down. They are quite happy for other companies to make boards, as their core business is chipmaking.

The CSX600 has a 64-bit address space (48-bit physical).

Bandwidth:
Chip-to-chip: TWO ports at 3.2 Gbyte/s
External memory (to the DDR2): 3.2 Gbyte/s
Internal aggregate within the chip: 96 Gbyte/s

The chip is built on a 130nm process, so I would prefer them to go to 90 or 65nm to improve it :-(

Of interest to maths programs like factoring is the GMP math library.

ClearSpeed has so far ported several libraries for math, chemical molecules, etc. but NOT GMP (because nobody they were working with had asked for it).

They will think about it. I suspect a petition from mersenneforum math users would show that there are plenty of people who use GMP! So they ought to get GMP supported if they want to sell more chips :-)

I believe they intend to sell the SDK (e.g. to compile your own app/library in C for VLIW) separately and charge extra for it (maybe we can get one for George).

We have agreed to get in touch again once they have products available for sale. I may also speak with their technical people about protocols/requirements for interfacing the chips on a custom board.

For comparison, the two-chip board delivers about 50 Gflops DGEMM and, for the Linpack benchmark, somewhere over 30 Gflops (sustained). That gives it better Linpack performance than Orion's 12-processor DT-12 machine, at 14 Gflops Linpack sustained (on a peak of 28 Gflops).

Four boards (30x4=120 Gflops?) would surpass the performance of the Orion DS-96 cluster (110 Gflops sustained). Orion are very good on power consumption (220 W for the DT-12, 1500 W peak for the DS-96), but four CSX600 boards would eat just 100 W, which is even better.

And in both scenarios the price would be lower, and it would not be limited by a "mere" Gigabit Ethernet interconnect.
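The comparison above boils down to sustained Gflops per watt, which is easy to tabulate. This sketch uses only the figures quoted in this post (Linpack sustained, and the power numbers as stated); note that the board figures leave out the host PC's own power draw, so the real gap would be somewhat smaller. The dictionary and function names are just for illustration.

```python
# (sustained Linpack Gflops, watts), as quoted in the post above
systems = {
    "CSX600 board (2 chips)": (30.0, 25.0),
    "4 x CSX600 boards": (120.0, 100.0),
    "Orion DT-12": (14.0, 220.0),
    "Orion DS-96": (110.0, 1500.0),
}

def gflops_per_watt(gflops, watts):
    """Sustained throughput divided by power draw."""
    return gflops / watts

for name, (gflops, watts) in systems.items():
    print(f"{name:24s} {gflops_per_watt(gflops, watts):6.3f} Gflops/W")
```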

I would be interested for other mersenneforum participants to discuss this architecture further.

I think it could certainly be good for trial factoring and probably LL too.

Because it costs more than a PC, the price/performance ratio may not be VASTLY superior, BUT the power consumption definitely is. I know I don't have the electrical infrastructure to run 100 PCs in my house, but it could certainly cope with a smaller farm using these accelerator cards. And the reduced power bill helps offset the high price of the chips/boards.

If you wondered why I didn't post this earlier I was thinking of buying myself some shares/stock in the company before the news spread of these chips :-)
But hopefully it will remain secret in the realm of an obscure math community :-) hehehehe

Sorry for posting the techie stuff pretty randomly, I have more .....
2005-09-25, 18:07   #4
IronBits
Oct 2002
Glendale, AZ. (USA)

1113₁₀ Posts

Very interesting read!
I just hope we can use the darn thing with DC projects :)
2005-09-25, 22:05   #5
Peter Nelson
Oct 2004

23² Posts

IronBits said "I hope we can use it with DC projects"

Well, they have already made the CSX600 work with the GROMACS library.

In case you didn't know, GROMACS is the library currently used by Folding@home (about 10x faster than the one they started with).

A board seems to accelerate GROMACS by at least 10x over a regular PC without the card.

AND the guy in charge of Folding@home is on ClearSpeed's advisory panel and appears in the company report.

Therefore there is a very strong possibility that F@H either already supports the CSX600 or would require only a couple of minor changes to do so.

The library would need to check for the presence of the board and use it if it is there.

So whilst lots of wannabe PC "experts" dedicate their overclocked rigs to FOLDING, with one of these boards I could quickly blow away their achievements in the stats.

Of course I would rather spend my CPU time looking for primes than folding.
2005-09-27, 06:12   #6
E_tron
Sep 2002
Austin, TX

1000110001₂ Posts

why are these chips so much better at power management than our desktop PCs? Is their fabrication process that much better or is it more than that?
2005-09-27, 09:29   #7
Dresdenboy
Apr 2003
Berlin, Germany

101101001₂ Posts

Originally Posted by E_tron
why are these chips so much better at power management than our desktop PCs? Is their fabrication process that much better or is it more than that?
This is what happens if you go from a fast and narrow architecture to a slow and wide one (see GPUs, for example). A lot of the power consumed by desktop PC processors goes into extracting some parallelism, managing tens or even over a hundred instructions in flight, etc. They are general-purpose processors, and having to be good at most kinds of tasks alone causes a lot of heat. More specialized processors can do without all the machinery required to execute standard application code. Cell is more of a hybrid, applying both of these principles.

It is very likely (and the big players have also communicated this) that future desktop processors will also add some simpler cores for special tasks.
2005-09-27, 14:21   #8
Peter Nelson
Oct 2004

23² Posts

The ClearSpeed chip is manufactured by IBM using a 130nm process.

I believe it contains 128 million transistors (I have also heard another, similar figure).

That transistor budget isn't wasted on large amounts of cache.

Additionally, I think the instruction set and decoding are much simpler than the Pentium 4's: VLIW versus CISC. Also, some of the control logic is shared and does not need to be repeated for each core.

Intel spends extra transistors on optimisations like out-of-order instruction execution, branch prediction, etc. to squeeze out every last drop of performance, but at a high cost in logic gates (and a long pipeline).

I imagine ClearSpeed could reduce their power consumption FURTHER by moving to 90 or 65nm process geometries, as their partner IBM should have some experience in that stuff by now (but 130nm kept the market entry costs low for them).

One major difference in heat and power consumption comes from the clock rate of 250 MHz (but note each PE can do 2 FPU ops per cycle, and other stuff) as opposed to the 3.0-3.8 GHz of a typical Intel P4. The upside of this is that the CSX600 chips can be operated without any heatsink or fan.
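As a back-of-envelope check on how a 250 MHz part reaches these numbers, the sketch below multiplies out the figures given in this thread: 96 PEs per chip, 2 floating-point ops per PE per cycle, 250 MHz. It assumes "2 FPU ops per cycle" means two flops per PE per clock, which is my reading rather than a quoted spec, and the function name is invented for the example.

```python
def peak_gflops(pes, flops_per_cycle, clock_mhz):
    """Theoretical peak = processing elements x flops per cycle x clock rate."""
    return pes * flops_per_cycle * clock_mhz / 1000.0

csx600_peak = peak_gflops(pes=96, flops_per_cycle=2, clock_mhz=250)
csx600_dgemm = 25.0  # sustained DGEMM Gflops per chip, as quoted earlier in the thread

print(f"peak per chip:  {csx600_peak:.1f} Gflops")   # 48.0
print(f"peak per board: {2 * csx600_peak:.1f} Gflops")
print(f"DGEMM fraction of peak: {csx600_dgemm / csx600_peak:.0%}")
```

So the width (96 PEs) makes up for the low clock, and the quoted sustained DGEMM figure sits at roughly half of that theoretical peak.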
2005-09-27, 14:48   #9
Dresdenboy
Apr 2003
Berlin, Germany

551₈ Posts

Some additional thoughts:
The very low clock speed allows the manufacturer to use different types of transistors with much less leakage and much better behaviour with regard to power consumption in general. Modern CPUs usually contain many different transistor variants, chosen according to their job and location (e.g. whether they are timing-critical or not). Lower-clocked designs allow power-saving transistors to be used everywhere.
2005-11-15, 22:32   #10
Mystwalker
Jul 2004
Potsdam, Germany

3·277 Posts

Seems like ClearSpeed has hit a big one:

(some more info here: )
2005-11-15, 23:19   #11
Bemusing Prompter
ixfd64
Dec 2002

5×499 Posts

Previously, it was agreed that ClearSpeed was only suitable for trial factoring at best. Are the new ClearSpeed processors able to do L-L tests yet?
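For readers less familiar with the two workloads being contrasted here, a rough Python sketch of each follows. These are textbook versions, not what Prime95 actually runs, and the function names and the `max_k` limit are made up for the example. Trial factoring only needs modular exponentiation on comparatively small integers, while the Lucas-Lehmer test repeatedly squares a p-bit number, which is where FFT multiplication and double-precision floating point come in for large p.

```python
def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for an odd prime p."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m   # one big squaring per iteration
    return s == 0

def trial_factor(p, max_k=100000):
    """Look for a factor q = 2*k*p + 1 of 2**p - 1, or return None.

    Any factor of a Mersenne number with prime exponent p has this form
    and is congruent to 1 or 7 mod 8, so other candidates are skipped.
    """
    for k in range(1, max_k + 1):
        q = 2 * k * p + 1
        if q % 8 in (1, 7) and pow(2, p, q) == 1:
            return q
    return None

print(lucas_lehmer(13), lucas_lehmer(11))  # M13 is prime, M11 is not
print(trial_factor(11))                    # 23 divides 2**11 - 1 = 2047
```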
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
