mersenneforum.org  

Old 2012-08-23, 18:43   #1
debrouxl
 
 
Sep 2009

3D1₁₆ Posts
ARM-based servers...

I've stumbled across the Baserock Slab ( http://www.baserock.com/servers/specifications ) and MiTAC GFX series ( http://www.mitac.com/Business/GFX_servers.html ), two ARM-based servers announced in the past few weeks.

* the Baserock Slab is 8 x (quad-core ARMv7-A @ 1.33 GHz + 2 GB ECC DDR3 + 30-120 GB SSD) + 2 x 10 Gbps SFP+ Ethernet + 4 x 1 Gbps "classical" Ethernet in a half-depth 1U rack unit. That's nothing to sneer at, especially with a 260W PSU.
* the MiTAC GFX is 64 quad-core ARMv7-A @ 1.6 GHz + 32 HDDs in a 4U rack unit. Not sure about the amount of RAM, since the indicated 16 GB seems low for a 256-core system - perhaps it's 16 GB for each of the 8 "compute modules"?

The performance per watt of ARM-based gear is clearly significantly higher than that of x86_64-based gear...
Future 32-bit and 64-bit ARM cores will improve, but so will x86_64 cores, so the ratio might not change that much.


How would people around here estimate the crunching abilities of those platforms?
High-end GPUs are probably too far above x86_64 CPUs at TF (trial factoring) on Mersenne numbers for these ARM servers to dethrone them; but I think that servers like the Baserock Slab could prove good NFS machines, if memory bandwidth approaches that of x86_64 machines (and that might be a big "if"):
* with 512 MB of RAM per core, and 1 GB already announced for the next few months (maybe they'll raise the amount of RAM per core further later, I don't know), 15e wouldn't be a problem;
* 5 Gbps internal + 2x10 Gbps external network interconnect could prove attractive for MPI post-processing.
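
A quick back-of-the-envelope check of the RAM-per-core figures above, as a minimal Python sketch. The helper name `ram_per_core_mb` is made up for illustration, and the "16 GB per compute module" reading of the MiTAC spec is only the guess made above, not a confirmed figure.

```python
def ram_per_core_mb(total_ram_gb: float, cores: int) -> float:
    """Return RAM per core in MB, taking 1 GB as 1024 MB."""
    return total_ram_gb * 1024 / cores


# Baserock Slab: each node is one quad-core SoC with 2 GB of RAM.
slab = ram_per_core_mb(2, 4)

# MiTAC GFX: 64 quad-core SoCs = 256 cores.
# Reading 1: the quoted 16 GB is for the whole chassis.
mitac_whole_chassis = ram_per_core_mb(16, 64 * 4)
# Reading 2 (a guess): 16 GB for each of the 8 compute modules.
mitac_per_module = ram_per_core_mb(16 * 8, 64 * 4)

print(f"Slab:                  {slab:.0f} MB/core")
print(f"MiTAC (16 GB total):   {mitac_whole_chassis:.0f} MB/core")
print(f"MiTAC (16 GB x 8):     {mitac_per_module:.0f} MB/core")
```

Under the second reading the MiTAC box would land at the same 512 MB per core as the Slab, which is part of why that reading looks more plausible.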
Old 2012-08-23, 19:01   #2
fivemack
(loop (#_fork))
 
 
Feb 2006
Cambridge, England

14262₈ Posts

For crunching heavy-duty FP, I am not convinced that currently-available ARM processors are flops-per-watt competitive with Ivy Bridge.

http://fullshovel.wordpress.com/2012...a-vs-c-on-arm/ runs scimark; yes, I appreciate this is a series of toy-sized benchmarks, but the Pandaboard has a pretty awful memory controller and so I'd expect it to do relatively better on things running out of cache. On one of the two cores on a Pandaboard ES, the matrix-multiply does 150MIPS; on one of the four on a Sandy Bridge it does 1770MIPS. A pandaboard running flat-out uses about six watts; I think one active core on an SNB can get by with less than sixty. The test with the best ratio gets 240MIPS on 1xARM and 1150 on 1xSNB.
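
To make the per-watt comparison explicit, here is a rough Python sketch that uses only the figures quoted above. The 6 W and 60 W values are the estimates from this post, not measurements: 6 W is whole-board power while the score is for one core, and 60 W is an upper bound, so the SNB results are lower bounds. `per_watt` is a hypothetical helper name.

```python
def per_watt(score: float, watts: float) -> float:
    """Benchmark score divided by power draw."""
    return score / watts


PANDABOARD_W = 6.0  # whole board running flat out (estimate from the post)
SNB_CORE_W = 60.0   # one active SNB core, "less than sixty" (upper bound)

# Matrix multiply: 150 on one ARM core, 1770 on one SNB core.
matmul_arm = per_watt(150, PANDABOARD_W)
matmul_snb = per_watt(1770, SNB_CORE_W)

# The test with the best ratio for ARM: 240 vs 1150.
best_arm = per_watt(240, PANDABOARD_W)
best_snb = per_watt(1150, SNB_CORE_W)

print(f"matmul:     ARM {matmul_arm:.1f}/W, SNB at least {matmul_snb:.1f}/W")
print(f"best ratio: ARM {best_arm:.1f}/W, SNB at least {best_snb:.1f}/W")
```

With these rough power guesses the result flips between tests, so the comparison is very sensitive to how many watts the single SNB core really draws.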

http://www.phoronix.com/scan.php?pag..._cluster&num=4 does something similar; running an embarrassingly parallel benchmark over 12 cores on six pandaboards, he gets 53 Mops at 30.4 watts. In http://www.phoronix.com/scan.php?pag...cluster&num=11 he runs a slightly different benchmark on four threads of one i7/3770K and gets 277 Mops at 107 watts.

ARM's selling point if you're not fully loading the machines is irrefutable. But if you are, a single i7/3770K - which will run happily from a 260W PSU even if you put a dual-port 10GbE PCIe card in it - offers performance comparable to the whole baserock slab.

And the ARM server machines (the other one you might want to stumble across is http://www.boston.co.uk/solutions/viridis/default.aspx ) are at present boutique items designed to give software developers a time-to-market advantage, and so are a lot more expensive than straight IVB boxes. The Boston Viridis FAQ gives an implied price of $3000 for a single card with four quad-core ARMs on it (i.e. comparable performance to one dual-core IVB), though I'll admit that that system has an exciting between-cards interconnect for which you'd have to pay five hundred dollars for an Infiniband QDR HCA and another $500-per-port for the switch.

I have just spent £103.28 buying myself an Odroid-X (Exynos 4412 so quad 1.4GHz Cortex-A9, 1G memory, though only 100Mbps ethernet - effectively a Galaxy S3 without the display) from http://www.hardkernel.com/renewal_20...=G133999328931 to see if I can get gnfs-lasieve4I15e running. This will inevitably cause an Exynos 5250 devboard to be released before my Odroid-X turns up from Gyeonggi, Korea: consider this a public service.

To get more than 4GB total memory you will have to wait for Cortex-A15-based chips (e.g. the Exynos 5250, OMAP 543x, Tegra 4) because the memory controller for the A9 only has 32-bit physical addresses. 4GB on a package-on-package (the cellphone chips, and therefore the cheap devboards) is unlikely to show up before 2013.
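
The 4GB ceiling follows directly from the address width, as this small Python sketch shows. The 40-bit figure for the Cortex-A15 (its Large Physical Address Extension) is not stated in this thread and is added here as an assumption for comparison; `max_addressable_gib` is a made-up helper name.

```python
def max_addressable_gib(address_bits: int) -> int:
    """Largest byte-addressable physical memory, in GiB, for a given address width."""
    return 2 ** address_bits // 2 ** 30


a9_limit = max_addressable_gib(32)   # Cortex-A9: 32-bit physical addresses
a15_limit = max_addressable_gib(40)  # Cortex-A15 with LPAE: 40 bits (assumption, see above)

print(f"32-bit physical addresses: {a9_limit} GiB")
print(f"40-bit physical addresses: {a15_limit} GiB")
```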

Last fiddled with by fivemack on 2012-08-23 at 19:19
Old 2012-08-23, 19:45   #3
debrouxl
 
 
Sep 2009

977 Posts

Thanks for your input.

Quote:
But if you are, a single i7/3770K - which will run happily from a 260W PSU even if you put a dual-port 10GbE PCIe card in it - offers performance comparable to the whole baserock slab.
ACK.

Quote:
though I'll admit that that system has an exciting between-cards interconnect for which you'd have to pay five hundred dollars for an Infiniband QDR HCA and another $500-per-port for the switch.
That's pretty expensive indeed... maybe, in the mid-term, they'll have no choice but to lower their price tags, due to non-IB interconnects such as the one in the Boston Viridis?

Quote:
I have just spent £103.28 buying myself an Odroid-X (Exynos 4412 so quad 1.4GHz Cortex-A9, 1G memory, though only 100Mbps ethernet - effectively a Galaxy S3 without the display) ... to see if I can get gnfs-lasieve4I15e running.
Good
I might get my hands on one such system in the next few months as well.

Quote:
To get more than 4GB total memory you will have to wait for Cortex-A15-based chips
Yup, in fact I knew that, but I failed to mention it explicitly in the "later".
Cortex-A15 chips will bring large-RAM support to the 32-bit ARM architecture, and then 64-bit ARM chips (probably not before 2014, sadly) won't have that 4 GB limit at all.
Old 2012-08-24, 14:04   #4
ldesnogu
 
 
Jan 2008
France

2⁴×3×11 Posts

Quote:
Originally Posted by fivemack
http://www.phoronix.com/scan.php?pag..._cluster&num=4 does something similar; running an embarrassingly parallel benchmark over 12 cores on six pandaboards, he gets 53 Mops at 30.4 watts. In http://www.phoronix.com/scan.php?pag...cluster&num=11 he runs a slightly different benchmark on four threads of one i7/3770K and gets 277 Mops at 107 watts.
The problem with that setup is that the ARM cluster is made of 6 full boards. That setup is said to idle at 15-16W and observed peak power is 31W.

For the example you give, let's say idle is 15W; subtracting that from the 31W peak leaves about 16W of power consumption attributable to the benchmark, for 55.2 Mop/s. So 3.45 Mop/s/W.

The Ivy Bridge system idles at 41W and draws 107W on the benchmark. So 277.9 Mop/s for 66W, i.e. 4.21 Mop/s/W.

Of course, this assumes that idling is really idling on both platforms.
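
The idle-subtracted figures above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in this post: `mops_per_watt` is a hypothetical helper, and the inputs are the numbers quoted here (55.2 Mop/s at 31W peak with 15W idle for the ARM cluster, 277.9 Mop/s at 107W with 41W idle for the Ivy Bridge box).

```python
def mops_per_watt(mops: float, load_w: float, idle_w: float = 0.0) -> float:
    """Throughput per watt; pass idle_w to count only the power drawn above idle."""
    return mops / (load_w - idle_w)


# Six-Pandaboard ARM cluster.
arm_gross = mops_per_watt(55.2, 31.0)
arm_net = mops_per_watt(55.2, 31.0, idle_w=15.0)

# Single i7/3770K (Ivy Bridge) system.
ivb_gross = mops_per_watt(277.9, 107.0)
ivb_net = mops_per_watt(277.9, 107.0, idle_w=41.0)

print(f"ARM: {arm_gross:.2f} gross, {arm_net:.2f} net of idle (Mop/s/W)")
print(f"IVB: {ivb_gross:.2f} gross, {ivb_net:.2f} net of idle (Mop/s/W)")
print(f"IVB/ARM ratio: {ivb_gross / arm_gross:.2f} gross, {ivb_net / arm_net:.2f} net")
```

Subtracting idle power narrows the gap between the two platforms but does not reverse it with these numbers.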

Anyway I think that for many FP intensive tasks IVB would be more power efficient. Perhaps with ARMv8 and proper FP SIMD support will things change.
Old 2012-08-24, 15:05   #5
fivemack
(loop (#_fork))
 
 
Feb 2006
Cambridge, England

2·29·109 Posts

Quote:
Originally Posted by ldesnogu
The problem with that setup is that the ARM cluster is made of 6 full boards. That setup is said to idle at 15-16W and observed peak power is 31W.
There are a couple of startups who've tried to attack the problem of the high power consumption of idling x86 systems - Seamicro's SM10000-XE Sandy Bridge machine piles up low-voltage Xeons, avoids duplicating motherboard peripherals, and 'reduces the power consumed by the CPU by consolidating and powering down unused functions', though its web page only gives an 'average power consumption' figure and that's 3.5kW for 64 quad-cores.

I've not got a good handle on the power consumption of DRAM, though I've heard disconcertingly high figures on the order of one watt per gigabyte at idle - that gives a slightly unfair advantage to the unfortunately memory-constrained ARM systems.

Quote:
Anyway I think that for many FP intensive tasks IVB would be more power efficient. Perhaps with ARMv8 and proper FP SIMD support will things change.
Yes, ARMv8 has double-precision SIMD, but it's roughly SSE2-level: operations on only 128 bits at a time.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.