20120103, 13:33  #1 
Mar 2003
Melbourne
515_{10} Posts 
AVX CPU LL vs CUDA LL
Which one comes out on top?
I have a feeling that 27.x Prime95 code with AVX on a 6-core i7 3930K is beating CUDA LL on a GTX 580 in these metrics:

1) GHz-days per initial cost*
2) GHz-days per ongoing cost**
3) Raw GHz-days/day throughput

Of course, a GTX 580 still beats a single AVX 3930K core for latency. Anyone have authoritative stats between the two? Bonus points if you can prove, either way, the effective LL throughput of doing TF only on the GPUs/CPUs with mfaktc vs LL on the AVX 6-core CPU.

Craig

*I'm thinking initial CPU costs include only the CPU, cooler, motherboard, and RAM. Initial GPU cost is the cost of the GPU.
**I'm thinking ongoing cost is equivalent to power cost. Total power of the CPU setup is the 'wall' power measurement with all 6 cores at 100% LL load. One can get GPU power consumption from any number of sites, including wiki pages.
20120103, 13:40  #2 
Mar 2003
Melbourne
515_{10} Posts 
I guess what I'm getting at is this: hypothetically, if I were to hand over a credit card* for you to buy equipment and to pay the power bill for that equipment, what would you buy to maximize its contribution to the project and make the most of the funds you had access to?

Or, to phrase it in a more down-to-earth way: given what equipment I have at my disposal, what do I do to maximize my contribution? (BTW, I've removed the FX-8120 from my farm and replaced it with a Core i7 3930K @ 4.2GHz; it's _much_ _much_ better than the FX-8120.)

Craig

*Don't get any ideas: I'm not going to hand over any of my credit cards :)
20120103, 14:12  #3 
Oct 2011
Maryland
2×5×29 Posts 
I would be curious to see how the 3930K stacks up against a 2500K on the metrics you mentioned. Between the expensive chip, the expensive board, and the much higher base wattage (lower OC headroom), I would imagine that the 2500K is still probably the way to go.

I imagine you are correct in thinking that processor LL is superior to GPU LL, if LL is all you're interested in doing. But I would be interested in seeing statistics too.
20120103, 21:34  #4 
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 89<O<88
3·29·83 Posts 
Is the FX still doing anything at all?

(Also, I have no wall-measurement device, so unfortunately I can't contribute (2600).)

Last fiddled with by Dubslow on 20120103 at 21:34
20120103, 21:51  #5 
Dec 2009
Peine, Germany
513_{8} Posts 
breakeven point
Here we have a GTX 580 running a 54M exponent at 8.6ms per iteration.
3930K: 130W; GTX 580: 315W; power ratio ≈ 2.4.
3930K runs 6 jobs on its 6 cores; the GTX 580 runs 1 job.
8.6 ms × 2.4 × 6 ≈ 125 ms @ 54M exponent.

This implies the computing-power-per-energy breakeven point is at 125 ms: a single 3930K LL test could take up to 125 ms per iteration and still be "as powerful as the GPU" per watt...

Last fiddled with by Brain on 20120103 at 21:56 Reason: +energy
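The breakeven arithmetic in the post above can be sketched in a few lines of Python. The wattage and iteration-time figures are the post's own assumptions, not measured values:

```python
# Back-of-envelope energy breakeven from the post's assumed figures.
gpu_ms_per_iter = 8.6    # GTX 580 iteration time at a 54M exponent
gpu_watts = 315.0        # GTX 580 power draw (assumed)
cpu_watts = 130.0        # 3930K power draw (assumed)
cpu_jobs = 6             # one LL test per core

power_ratio = gpu_watts / cpu_watts            # ~2.42
# A CPU iteration may be this slow and still match the GPU's work per joule:
breakeven_ms = gpu_ms_per_iter * power_ratio * cpu_jobs
print(f"power ratio: {power_ratio:.2f}, breakeven: {breakeven_ms:.0f} ms/iter")
```

With these inputs the breakeven comes out at about 125 ms per iteration, matching the post.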
20120103, 23:39  #6 
Dec 2011
2×3^{2} Posts 
It also depends on how fast you clock the 3930k. Tom's Hardware's test of the chip last month gives us a power draw of around 165W for the processor at 4.2GHz.
http://www.tomshardware.com/reviews/...rk,30902.html

James has iteration times here for a 3930K at 4.9GHz, which would draw over 180W according to the above chart:
http://mersennearies.sili.net/bench...z&l2=&orderby=

A report of 9.3 ms/iteration for a 54M exponent on a GTX 580 is here (no mention of what clock speed the card is operating at):
http://www.mersenneforum.org/showpos...7&postcount=61

A 54M exponent uses around a what, 2900K FFT size? A similar exponent on the benchmarked 3930K overclocked to 4.9GHz would take a little under 23 ms per iteration. If the 3930K can keep that up across all 6 cores, it would have roughly 2.4 times the throughput of a GTX 580. Iteration times on the 3930K would have to increase to more than 55 ms before the two reached parity.
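The throughput comparison above follows directly from the reported iteration times. A small sketch, using the post's 23 ms and 9.3 ms figures as assumptions:

```python
# Aggregate throughput: six slower CPU jobs vs one faster GPU job.
cpu_ms = 23.0   # 3930K @ 4.9GHz, ms per iteration at ~54M (assumed)
gpu_ms = 9.3    # GTX 580, ms per iteration at 54M (reported)
cores = 6       # independent LL tests on the 3930K

cpu_iters_per_ms = cores / cpu_ms
gpu_iters_per_ms = 1 / gpu_ms
ratio = cpu_iters_per_ms / gpu_iters_per_ms   # ~2.43x in the CPU's favor
parity_ms = cores * gpu_ms                    # CPU iteration time at parity
print(f"CPU/GPU throughput ratio: {ratio:.2f}, parity at {parity_ms:.1f} ms/iter")
```

This reproduces the post's "roughly 2.4 times the throughput" and the ~55 ms parity point.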
20120103, 23:58  #7 
Dec 2011
2·3^{2} Posts 
Making some assumptions about TF rate for a GTX 580...
1 hour to TF from 71 to 72 bits; the probability of finding a factor at this depth is about 1/72, so it takes an average of 72 hours of computation to clear one 54M exponent on the GTX. At this rate, TF on the GTX clears exponents more than 3x faster than running LL tests on the GTX (9.3 ms/iteration × 54M iterations × 2 tests = 279 hours).

For the 3930K: 23 ms/iteration × 54M iterations = 1242 × 10^6 ms = 345 hours per core; divided by 6 cores = 57.5 hours, × 2 LL checks = an average throughput of one exponent cleared every 115 hours.
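The hours-to-clear figures above can be checked with a quick calculation. The TF rate and iteration times are the post's assumptions:

```python
# Average hours to clear one 54M exponent under the post's assumptions.
ITERS = 54e6                # ~iterations per LL test at a 54M exponent
tf_hours = 1 / (1 / 72)     # 1 h per attempt, 1/72 chance of a factor = 72 h

gpu_ll_hours = 9.3e-3 * ITERS * 2 / 3600      # two LL checks on the GTX 580
cpu_ll_hours = 23e-3 * ITERS * 2 / 3600 / 6   # six 3930K cores sharing the load
print(f"GTX 580 TF: {tf_hours:.0f} h, GTX 580 LL: {gpu_ll_hours:.0f} h, "
      f"3930K LL: {cpu_ll_hours:.0f} h")
```

This gives 72, 279, and 115 hours respectively, matching the post's numbers.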
20120104, 00:42  #8 
Dec 2011
2×3^{2} Posts 
One more thing to consider when calculating power for your graphics card is that when it performs trial factoring work, it requires not only the dedicated card power supply, but also a portion of the power needed to operate the CPU as well. A card performing LL work needs only minimal CPU resources.
3930K @ 4.9GHz: 180W × 115 hours = 20.7 kWh to clear an exponent.
GTX 580: 315W + 100W (2 cores of a 2500K @ 4.6GHz drawing ~200W) = 415W × 72 hours = ~29.9 kWh to clear an exponent.

Some very rough approximations of system costs: from the Tom's Hardware article, $880 for the 3930K system plus maybe $120 for a decent power supply makes $1000 total. It would probably be most economical to run two 580s on a 2500K system: around $500 for the processor, motherboard, and memory, around $200 for a 900+W power supply, and the cards themselves are starting to dip into the $450 range, giving an approximate cost of $1600. A system with three 580s running off a 3930K would cost nearly $2500.

The GPU system costs 1.6 times as much, but is 3.2 times faster. However, in the process it will end up drawing 4.6 times the power. So the 3930K system is more power efficient, but slower.
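The 1.6x/3.2x/4.6x ratios above follow from the figures in this post. A sketch, treating the dual-580 build (two cards plus a ~200W 2500K host) as the GPU system:

```python
# Cost / speed / power ratios for the two builds described (assumed figures).
cpu_sys = {"cost": 1000, "watts": 180, "hours_per_exp": 115}   # 3930K build
gpu_sys = {"cost": 1600,
           "watts": 2 * 315 + 200,      # two GTX 580s + 2500K host
           "hours_per_exp": 72 / 2}     # two cards TF'ing in parallel

cost_ratio = gpu_sys["cost"] / cpu_sys["cost"]                     # 1.6x
speed_ratio = cpu_sys["hours_per_exp"] / gpu_sys["hours_per_exp"]  # ~3.2x
power_ratio = gpu_sys["watts"] / cpu_sys["watts"]                  # ~4.6x
kwh_cpu = cpu_sys["watts"] * cpu_sys["hours_per_exp"] / 1000       # 20.7 kWh
kwh_gpu = gpu_sys["watts"] * gpu_sys["hours_per_exp"] / 1000       # ~29.9 kWh
print(cost_ratio, round(speed_ratio, 1), round(power_ratio, 1),
      round(kwh_cpu, 1), round(kwh_gpu, 1))
```

Per exponent cleared, the GPU build uses ~1.4x the energy of the 3930K build (29.9 vs 20.7 kWh), which is where the efficiency-vs-speed trade-off comes from.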
20120104, 01:07  #9 
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 89<O<88
3×29×83 Posts 
Hmm. Interesting. Another page of that TH article,
http://www.tomshardware.com/reviews/...rk,30906.html
shows the 8-core Zambezi pulling equal with the Intel parts, often above the 2600. Same on the next page, too. Perhaps a re-evaluation is in order? Did TH get some new drivers or something?

Heh, maybe not. That's only in the multithreaded benchmarks. In the single-threaded stuff, Zambezi still falls below the Deneb.

Last fiddled with by Dubslow on 20120104 at 01:14
20120104, 06:36  #10  
Dec 2009
Peine, Germany
331_{10} Posts 
Quote:
Every CPU LL result costs me 5€. Better than smoking... 

20120104, 16:57  #11 
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
3^{3}·181 Posts 
It's like specific energy (energy per unit mass). Here it is energy per unit of LL work done. It is this ratio that should be compared between CPU and GPU, not energy alone.
Last fiddled with by pinhodecarlos on 20120104 at 16:58 