Beowulf cluster on the cheap, breaking $100/GFlop

News release, for those not technically inclined: http://www.calvin.edu/news/releases/.../microwulf.htm
More technical: http://www.calvin.edu/~adams/research/microwulf/
From the link above: "providing over 26 Gflops of measured performance, for less than $2500"
Price/Performance: http://www.calvin.edu/~adams/research/microwulf/PPR/
Power/Performance: http://www.calvin.edu/~adams/research/microwulf/power/
Other, related links: http://www.clustermonkey.net//content/view/211/1/

I heard about this on /., and it's really just four motherboards networked together. Also, you could build the same system at Newegg for $1k less than the price listed there. And 26.25 GFLOPS isn't all that much any more: according to SiSoftware Sandra, an Intel Core 2 Quad QX6700 does 33.538 GFLOPS, and it costs just under $1k at Newegg. That's about $30/GFLOPS.
http://hardware.slashdot.org/article.../08/31/0235242 (the Slashdot discussion about it)
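A quick check of the $/GFLOPS arithmetic, using only the figures quoted so far. The helper name `dollars_per_gflops` is just for illustration. Since Microwulf's price is stated as "less than $2500", its ratio is an upper bound.

```python
# Price/performance check using the figures quoted in this thread.
# Microwulf's "less than $2500" is an upper bound, so its ratio is an upper bound too.

def dollars_per_gflops(price_usd: float, gflops: float) -> float:
    """Return cost in US dollars per GFLOPS of performance."""
    return price_usd / gflops

microwulf = dollars_per_gflops(2500, 26.25)   # whole 4-node cluster, measured
qx6700 = dollars_per_gflops(1000, 33.538)     # one CPU, Sandra figure, ~$1k

print(f"Microwulf: ${microwulf:.2f}/GFLOPS")  # under $100, as the title says
print(f"QX6700:    ${qx6700:.2f}/GFLOPS")
```

Note that the comparison sets the price of a complete cluster against the price of a bare CPU, so the QX6700 figure flatters the single chip somewhat.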
2007-08-31, 19:11 #3 Bundu
Pretty amazing, where can I buy one? :-) However... would this system run p95 any better than running it on several quad-core PCs?
2007-08-31, 19:57 #4 jflin
It's just a bunch of motherboards whacked together, so roughly equal to 4 normal desktop machines.
Quote:
 Originally Posted by Bundu Pretty amazing, where can I buy one :-) however.... would this system run p95 any better than running it on several quad core PC's?
Quote:
 Originally Posted by jflin It's just a buncha motherboards whacked together, so ~= 4 normal desktop machines
No, it wouldn't do any better than running on several PCs, unless you have it multithreaded across multiple CPUs, but I don't think that would help very much, considering how slow the inter-CPU communication is.
It has 4 AMD Athlon 64 X2 3800+ CPUs, each of which gets a .1023 s iteration time per core on current first-time-test exponents. That's 10.928 GFLOPS per CPU, so with perfect communication the four CPUs would give 43.712 GFLOPS. Compared with the measured 26.25 GFLOPS, about 17.45 GFLOPS is lost to communication. I think that's largely because it runs over an ordinary gigabit LAN.
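The scaling estimate above, worked through. This takes the post's per-CPU figure and the cluster's measured figure at face value, even though the 26.25 GFLOPS comes from the Microwulf page rather than from Prime95, so the two numbers are not from the same benchmark. The shortfall comes out to about 17.46 GFLOPS, so the 17.45 in the post differs only by rounding.

```python
# Rough scaling estimate, taking the thread's figures at face value.
# Caveat: the per-CPU figure and the cluster figure come from different benchmarks.

PER_CPU_GFLOPS = 10.928   # figure quoted in the post, per dual-core CPU
N_CPUS = 4                # Microwulf has four Athlon 64 X2 3800+ CPUs
MEASURED_GFLOPS = 26.25   # measured cluster performance quoted earlier

ideal = PER_CPU_GFLOPS * N_CPUS        # perfect scaling, no communication cost
lost = ideal - MEASURED_GFLOPS         # shortfall attributed to the interconnect
efficiency = MEASURED_GFLOPS / ideal   # fraction of the ideal actually achieved

print(f"ideal {ideal:.3f} GFLOPS, lost {lost:.2f} GFLOPS, efficiency {efficiency:.1%}")
```

On these numbers the cluster reaches roughly 60% of perfect scaling.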

Quote:
 Originally Posted by Mini-Geek No, it wouldn't do any better than running on several PC's, unless you have it multithreaded across multiple CPU's, but I don't think that would help very much, considering the inter-CPU communication is so slow. It has 4 AMD Athlon 64 X2 3800+'s, which each get .1023 iteration time on the current size of first-time tests per core per CPU. It is 10.928 GFLOPS CPU. It would be 43.712 GFLOPS if they could communicate perfectly. 17.45 GFLOPS is lost on the communication. I think that's largely because it's over a normal gigabit LAN.
Keep in mind that LL testing, like ECM factorization and NFS sieving, is perfectly parallelizable. By that I mean the communication costs are completely negligible. A Beowulf cluster doesn't make much economic sense for them because it contains an expensive network interconnect which could be replaced by something much, much slower and cheaper. There was a time when I used sneakernet (i.e. carrying floppies around) as the interconnect on a parallel NFS sieving "cluster" (in reality, 5 machines which were not connected to the lab network). It worked perfectly well because the required bandwidth was a few megabytes per day.
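A minimal sketch of what "perfectly parallelizable" means for LL testing, assuming each machine is handed whole exponents of its own. The round-robin scheduler and the function names are hypothetical. Each machine runs the standard Lucas-Lehmer test locally, and the only communication is reporting results back at the end.

```python
# Sketch of "perfectly parallel" work: each hypothetical machine tests its own
# exponents with no communication; only the final results are gathered.

def lucas_lehmer(p: int) -> bool:
    """Return True if 2**p - 1 is prime (p an odd prime; p == 2 special-cased)."""
    if p == 2:
        return True
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

def assign_round_robin(exponents, n_machines):
    """Deal exponents out like cards: machine i gets every n-th one."""
    return {i: exponents[i::n_machines] for i in range(n_machines)}

def run_machine(work):
    """Runs entirely locally; needs nothing from any other machine."""
    return {p: lucas_lehmer(p) for p in work}

exponents = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
plan = assign_round_robin(exponents, 4)
results = {}
for machine, work in plan.items():
    results.update(run_machine(work))   # the only "communication": reporting back

print(sorted(p for p, is_prime in results.items() if is_prime))
```

Because no machine ever waits on another, a slow (or floppy-based) link between them costs essentially nothing.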

Very few problems in real life are perfectly parallelizable. Some are fairly well parallelizable and it is this class of problem for which a Beowulf cluster makes sense.

Paul

Quote:
 Originally Posted by xilman Keep in mind that LL-testing, like ECM factorization and NFS sieving, is perfectly parallelizable. By that I mean the communications costs are completely negligible. A Beowulf cluster doesn't make so much economic sense for them because it contains an expensive network interconnect which could be replaced by something much much slower and cheaper. There was a time when I used sneakernet (i.e. carrying floppies around) for the interconnect on a parallel NFS sieving "cluster" (in reality, 5 machines which were not connected to the lab network). Worked perfectly well because the required bandwidth was a few megabytes per day. Very few problems in real life are perfectly parallelizable. Some are fairly well parallelizable and it is this class of problem for which a Beowulf cluster makes sense. Paul
Communication costs are negligible if each core is running its own number; I recognize that. I'm just saying that, with each core running its own number, this cluster is no better than four separate machines doing the same. I'm also saying that if you wanted all cores of all CPUs to run one number together, it's losing a lot.
LL tests are very "parallelizable" in the sense that one machine need not know anything about what another machine is doing, if each is testing a different number. But they aren't very parallelizable, at least not without very fast communication, in the sense that four machines stuck together (as this cluster effectively is) will not perform well when working together on one number.
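A back-of-the-envelope model of the second case, where one number is spread across several nodes. Every LL iteration squares the previous residue, so the nodes must exchange FFT data on every iteration. All parameters here are assumptions for illustration, not measurements: the FFT length is hypothetical, gigabit Ethernet is taken at its ideal 125 MB/s, each squaring is assumed to need two all-to-all exchanges, latency is ignored, and both cores of a node are assumed to split one number perfectly.

```python
# Toy model of splitting ONE LL test across n networked nodes.
# All parameters are assumptions for illustration, not measurements.
# Latency is ignored, so the model is optimistic about communication.

FFT_LENGTH = 2_097_152      # hypothetical FFT length, in doubles
BYTES_PER_WORD = 8          # one double-precision value
LINK_BYTES_PER_S = 125e6    # gigabit Ethernet at its ideal payload rate
EXCHANGES_PER_ITER = 2      # assumed all-to-all exchanges per squaring
T_ONE_NODE = 0.1023 / 2     # s/iter on one dual-core node (assumes perfect 2-core scaling)

def iter_time(n_nodes: int) -> float:
    """Estimated seconds per LL iteration when one number is spread over n_nodes."""
    compute = T_ONE_NODE / n_nodes
    if n_nodes == 1:
        return compute
    # In an all-to-all exchange each node ships (n-1)/n of its 1/n share of the
    # data; on a switched full-duplex network all nodes send simultaneously.
    per_node_bytes = FFT_LENGTH * BYTES_PER_WORD * (n_nodes - 1) / n_nodes**2
    comm = EXCHANGES_PER_ITER * per_node_bytes / LINK_BYTES_PER_S
    return compute + comm

for n in (1, 2, 4):
    print(f"{n} node(s): {iter_time(n) * 1000:.1f} ms per iteration")
```

Under these assumptions, spreading one number over four nodes comes out slower per iteration than leaving it on a single node, while four nodes running four separate numbers get four times the throughput.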

2007-09-03, 22:04 #8 lpmurray
I believe 200 GFlops can be achieved by using 8 Intel Xeon Quad-Core E5335 CPUs (2 GHz, 1333 MHz FSB, LGA771, 8 MB cache), adding $2000 to the price tag (http://www.ewiz.com/detail.php?p=XEON5335BX&c=pw), and 4 Asus DSBF-DE dual-LGA771 Xeon motherboards, adding another $1000 (http://www.ewiz.com/detail.php?p=MB-DSBF-DE&c=pw), for a total of 200 GFlops for $5500, which comes out to $27.50 per GFlop. By year's end, with Intel's 45nm process, you could see a 25-50% bump in speed with the same or lower power requirements, which means we could see 250-300 GFlops for around $5000. If nothing else, when the new chips come out these processors should drop by about 50%, which should shave off about $1300 and bring it down to about $21 per GFlop.

2007-09-06, 08:25 #9 ixfd64
I've heard that the Xeon X5355 (less than $1,000) can achieve up to 60 GFLOP/s.
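A sanity check of the Xeon figures against theoretical peak, computed as sockets × cores × clock × flops per cycle. The key assumption, not stated in the thread, is 4 double-precision flops per cycle per core for these Core-microarchitecture Xeons (one 2-wide SSE add plus one 2-wide SSE multiply). The X5355 clock of 2.66 GHz is also assumed. The prices and the $1300 saving are lpmurray's.

```python
# Theoretical peak = sockets * cores/socket * clock (GHz) * flops/cycle/core.
# Assumption: 4 double-precision flops per cycle per core (2-wide SSE add + 2-wide SSE mul).

def peak_gflops(sockets, cores, ghz, flops_per_cycle=4):
    """Theoretical peak GFLOPS under the flops-per-cycle assumption above."""
    return sockets * cores * ghz * flops_per_cycle

e5335_build = peak_gflops(sockets=8, cores=4, ghz=2.0)   # lpmurray's 8-CPU proposal
x5355_chip = peak_gflops(sockets=1, cores=4, ghz=2.66)   # one X5355, clock assumed 2.66 GHz

claimed_gflops, price = 200, 5500
print(f"E5335 build peak: {e5335_build:.0f} GFLOPS; "
      f"the claimed 200 is {claimed_gflops / e5335_build:.0%} of peak")
print(f"${price / claimed_gflops:.2f}/GFLOP as proposed, "
      f"${(price - 1300) / claimed_gflops:.2f}/GFLOP after the hoped-for price cut")
print(f"X5355 peak: {x5355_chip:.1f} GFLOPS")
```

Under this assumption the 200 GFlops estimate would need about 78% of theoretical peak, and a single X5355 tops out near 42.6 double-precision GFLOPS. The 60 GFLOP/s figure would then presumably be a single-precision or otherwise different measure; that is an inference, not something the thread states.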

