#1
Dec 2014
377₈ Posts
The ASRock EPC612D4I is a Mini-ITX board with an LGA 2011-3 socket, so it has 4 memory channels.
(Everyone knows Prime95 loves memory channels.) I initially planned to use an i7-5820K (6 cores / 12 threads) but decided against it: the approved memory for the EPC is ECC, the i7 does not support ECC, and the i7 is not on the approved CPU list. So I used a Xeon E5-1620 v3 (3.5 GHz). (The E5-1xxx parts do not support dual-CPU configurations, so they can be clocked higher.) I think 6 cores and 4 memory channels would be a good fit, but the 6-core Xeon was a budget buster. For memory I used a part from the approved list, the Kingston KVR21SE15D8/8HA (DDR4-2133 ECC SODIMM). Here are the benchmark results:
Quote:
with the H110 and i5-6500 combinations. The C612 chipset and Xeon combination gets about the same performance, so the extra price is not worth it. I was also wondering whether SODIMM memory has the same bandwidth as regular DIMMs. It seems to be about the same. The next board I am looking at is the ASUS H110T/CSM. It costs a little more but does not need the Pico PSU. (Some of these boards need 19V power, but this one takes 12V or 19V.) Also, being low profile means better airflow in a server case, and they can be rotated 90 degrees relative to my other boards.
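For a rough sense of the memory-channel argument, here is a small sketch of theoretical peak DRAM bandwidth, computed as channels × transfer rate × 8 bytes per transfer (a 64-bit data bus per channel; ECC check bits are not counted). It assumes DDR4-2133 on both platforms and a dual-channel configuration for the i5-6500; the function name is just for illustration, and real Prime95 throughput depends on much more than this peak figure.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_sec, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s (decimal), 64-bit data bus per channel."""
    return channels * mega_transfers_per_sec * 1e6 * bus_bytes / 1e9


quad = peak_bandwidth_gbs(4, 2133)  # LGA 2011-3 Xeon, 4 channels of DDR4-2133
dual = peak_bandwidth_gbs(2, 2133)  # assumed dual-channel Skylake (i5-6500), DDR4-2133
print(f"quad channel: {quad:.1f} GB/s, dual channel: {dual:.1f} GB/s")
```

A DDR4 SODIMM has the same 64-bit data path per channel as a full-size DIMM, so at the same speed the peak figure is identical for either form factor.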

#2
Feb 2016
UK
1B8₁₆ Posts
In your situation, you have a moderate amount of CPU power with a ton of memory bandwidth. So overall, your CPU is running pretty close to its maximum potential, but that potential isn't as high as some others'.
I recently got what was sold as an E5-2683v3. No, I wouldn't buy one new, but there are some really cheap ones on eBay, cheaper than a 6600K.
Genuine Intel(R) CPU @ 2.00GHz
CPU speed: 1861.88 MHz, 14 cores
CPU features: Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 35 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 28.7, RdtscTiming=1
Timings for 4096K FFT length (1 cpu, 1 worker): 26.05 ms. Throughput: 38.38 iter/sec.
Timings for 4096K FFT length (2 cpus, 2 workers): 26.99, 26.94 ms. Throughput: 74.18 iter/sec.
Timings for 4096K FFT length (3 cpus, 3 workers): 29.02, 28.05, 28.09 ms. Throughput: 105.71 iter/sec.
Timings for 4096K FFT length (4 cpus, 4 workers): 29.58, 29.41, 29.21, 29.18 ms. Throughput: 136.32 iter/sec.
Timings for 4096K FFT length (5 cpus, 5 workers): 30.57, 30.17, 30.04, 30.05, 29.81 ms. Throughput: 165.96 iter/sec.
Timings for 4096K FFT length (6 cpus, 6 workers): 31.30, 31.84, 31.02, 30.95, 30.77, 30.76 ms. Throughput: 192.92 iter/sec.
Timings for 4096K FFT length (7 cpus, 7 workers): 32.53, 32.44, 32.44, 32.44, 32.24, 32.34, 32.29 ms. Throughput: 216.12 iter/sec.
Timings for 4096K FFT length (8 cpus, 8 workers): 34.91, 33.97, 33.91, 33.89, 33.75, 33.84, 33.86, 33.57 ms. Throughput: 235.57 iter/sec.
Timings for 4096K FFT length (9 cpus, 9 workers): 36.00, 35.81, 35.78, 35.73, 35.71, 35.70, 35.76, 35.64, 35.53 ms. Throughput: 251.82 iter/sec.
Timings for 4096K FFT length (10 cpus, 10 workers): 38.31, 37.83, 37.87, 37.76, 37.66, 37.77, 37.77, 37.75, 37.66, 37.51 ms. Throughput: 264.64 iter/sec.
Timings for 4096K FFT length (11 cpus, 11 workers): 41.20, 40.45, 40.54, 40.43, 40.28, 40.40, 40.54, 40.40, 40.41, 40.20, 40.32 ms. Throughput: 271.82 iter/sec.
Timings for 4096K FFT length (12 cpus, 12 workers): 43.45, 43.37, 43.20, 43.15, 42.98, 43.17, 43.25, 43.07, 43.12, 42.96, 43.32, 42.88 ms. Throughput: 278.03 iter/sec.
Timings for 4096K FFT length (13 cpus, 13 workers): 47.06, 46.63, 46.46, 46.44, 46.11, 46.20, 46.29, 46.31, 46.12, 46.05, 46.38, 46.48, 46.00 ms. Throughput: 280.49 iter/sec.
Timings for 4096K FFT length (14 cpus, 14 workers): 50.55, 50.42, 50.01, 51.65, 49.51, 49.57, 49.87, 49.93, 49.65, 49.37, 49.67, 49.74, 49.73, 50.00 ms. Throughput: 280.16 iter/sec.
Above is a quick partial copy and paste from an earlier run. It might be more interesting to try fewer workers... All cores were capped at 2.3 GHz, with HT disabled. I'm running non-ECC, single-rank, quad-channel RAM at 2133 (no RAM overclocking possible). I've heard elsewhere that ECC may carry some performance penalty, but I don't have data either way. Even if it does, it would not significantly affect things here. I'm not aware of any differences between SODIMMs and full-size DIMMs of similar specifications, other than the obvious physical ones. If anyone is considering building a farm for prime-finding work, I think the cheap but higher-end v3 Xeons on eBay are worth considering. Specifically, aim for the v3 models, as the FMA instructions help a lot compared with even cheaper parts such as the E5-2670. There were also some cheaper 12-core models. They're all low clocked, but the number of cores makes up for it, and for multi-threaded tasks they offer nice throughput. If these had been around at the time, I would have skipped the 6600K and 6700K boxes I built earlier, as the Xeons would give ballpark 50% more performance for a similar overall cost.
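To put numbers on how the 1-to-14-worker run above scales, the sketch below takes the reported throughputs and computes parallel efficiency (throughput divided by N times the 1-worker throughput) plus the marginal gain from each added worker. It only restates the posted figures; attributing the flattening curve to memory bandwidth is the thread's interpretation, not something this arithmetic proves.

```python
# Throughput (iter/sec) reported above for 1..14 workers, 4096K FFT, cores capped at 2.3 GHz.
REPORTED = [38.38, 74.18, 105.71, 136.32, 165.96, 192.92, 216.12,
            235.57, 251.82, 264.64, 271.82, 278.03, 280.49, 280.16]


def scaling_table(throughputs):
    """Return (workers, iter/sec, efficiency vs. linear, gain from the last worker added)."""
    base = throughputs[0]
    rows = []
    previous = 0.0
    for workers, rate in enumerate(throughputs, start=1):
        efficiency = rate / (workers * base)  # 1.0 means perfectly linear scaling
        rows.append((workers, rate, efficiency, rate - previous))
        previous = rate
    return rows


for workers, rate, efficiency, gain in scaling_table(REPORTED):
    print(f"{workers:2d} workers: {rate:7.2f} iter/sec, "
          f"efficiency {efficiency:6.1%}, marginal {gain:+6.2f}")
```

In this data the last two workers add almost nothing, which is consistent with the suggestion to try fewer workers.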

#3
Jun 2003
2·2,687 Posts
With 35 MB of L3 cache, a single 4M FFT would fit in it completely, so 1 worker with 14 threads (1w14t) might give even higher throughput.
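The cache argument is simple arithmetic. This sketch assumes each FFT element is one 8-byte double and counts only the main data array; Prime95 also keeps auxiliary data (weights, twiddle tables), so the real footprint is somewhat larger and the headroom is thinner than shown. The helper name is hypothetical.

```python
FFT_LENGTH = 4096 * 1024      # "4096K" FFT length = 4,194,304 elements
BYTES_PER_ELEMENT = 8         # assumption: one double-precision value per FFT element
L3_BYTES = 35 * 2**20         # 35 MB L3 as reported by Prime95 (treated as MiB here)

fft_bytes = FFT_LENGTH * BYTES_PER_ELEMENT


def fits_in_l3(workers):
    """True if `workers` independent 4096K FFTs fit in L3 (data array only)."""
    return workers * fft_bytes <= L3_BYTES


print(f"one FFT: {fft_bytes / 2**20:.0f} MiB of {L3_BYTES / 2**20:.0f} MiB L3")
print("1 worker fits:", fits_in_l3(1), "| 14 workers fit:", fits_in_l3(14))
```

One 32 MiB FFT fits under the 35 MiB L3, while even two independent workers would not, which is the intuition behind running one multithreaded worker.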

#4
Serpentine Vermin Jar
Jul 2014
2²×7²×17 Posts
Quote:
Stick with 1 or 2 workers... I use 1 worker per CPU, using all of its cores, and that works well for me. Run the benchmark again with the option for multiple cores per worker enabled, and you should see that throughput is best with 1 worker, 14 threads. I'm not really sure why the benchmark values for "14 CPUs/14 workers" are so far removed from what I see in real-world use, where memory contention between all those workers drags things to a crawl, but that's definitely been my experience.

#5
Aug 2002
2·3·29 Posts
Please post the 4096K FFT benchmark results for 1 worker with 4 threads (E5-1620) or 14 threads (E5-2683).
Use the following in prime.txt to cut to the chase:
Code:
MinBenchFFT=4096
MaxBenchFFT=4096
BenchTime=30
BenchMultithreads=1
BenchHyperthreads=0
OnlyBenchThroughput=1
OnlyBenchMaxCPUs=1
OnlyBench5678=0
BenchAllComplex=0

Last fiddled with by xtreme2k on 2016-06-14 at 12:44
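If you switch between benchmark setups often, a small script can apply those settings without duplicating keys. This is a hypothetical helper, not part of Prime95, and it assumes prime.txt is plain `Key=value` lines with optional `[Section]` headers, where top-level settings must come before the first section. It overwrites keys that already exist and puts missing ones at the top of the file.

```python
# Hypothetical helper, not part of Prime95: merge the benchmark options into prime.txt text.
BENCH_SETTINGS = {
    "MinBenchFFT": "4096",
    "MaxBenchFFT": "4096",
    "BenchTime": "30",
    "BenchMultithreads": "1",
    "BenchHyperthreads": "0",
    "OnlyBenchThroughput": "1",
    "OnlyBenchMaxCPUs": "1",
    "OnlyBench5678": "0",
    "BenchAllComplex": "0",
}


def merge_settings(text, settings):
    """Overwrite existing Key=value lines and prepend any keys that are missing."""
    seen = set()
    merged = []
    for line in text.splitlines():
        key, sep, _ = line.partition("=")
        key = key.strip()
        if sep and key in settings:
            merged.append(f"{key}={settings[key]}")  # replace the old value
            seen.add(key)
        else:
            merged.append(line)  # keep unrelated lines and [section] headers as-is
    # Missing keys go at the very top so they land before any [section] header.
    missing = [f"{key}={value}" for key, value in settings.items() if key not in seen]
    return "\n".join(missing + merged) + "\n"
```

To use it, read prime.txt, pass its text through `merge_settings(text, BENCH_SETTINGS)`, and write the result back. Do this with Prime95 closed, since it may rewrite prime.txt itself.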

#6
Dec 2014
255₁₀ Posts
This is the output with those lines added to prime.txt:
Quote:

#7
Aug 2002
256₈ Posts
My understanding is that 188 iter/sec is a good speed for a Haswell-E quad core.
Having looked at some reviews, the board is amazing. I must admit DDR4 ECC SODIMMs are not an easy find. There are plenty of upgrade opportunities for the S2011-3 platform as well. If you can source some cheap 2683v3/2675v3/2673v3 or v4 parts, the sky is the limit!

#8
Feb 2016
UK
1B8₁₆ Posts
Just had a chance to do the testing as well.
Code:
[Sat Jun 18 10:46:08 2016]
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Genuine Intel(R) CPU @ 2.00GHz
CPU speed: 1861.06 MHz, 14 cores
CPU features: Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 35 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 28.9, RdtscTiming=1
Timings for 4096K FFT length (14 cpus, 1 worker): 3.06 ms. Throughput: 327.04 iter/sec.
Timings for 4096K FFT length (14 cpus, 2 workers): 7.05, 6.92 ms. Throughput: 286.37 iter/sec.
Timings for 4096K FFT length (14 cpus, 7 workers): 24.36, 24.41, 24.23, 24.29, 24.26, 24.28, 24.32 ms. Throughput: 288.00 iter/sec.
Timings for 4096K FFT length (14 cpus, 14 workers): 49.28, 49.22, 49.10, 49.18, 48.86, 48.91, 49.32, 48.79, 49.16, 48.94, 49.06, 48.97, 48.87, 48.79 ms. Throughput: 285.53 iter/sec.

Code:
Timings for 4096K FFT length (14 cpus, 1 worker): 5.97 ms. Throughput: 167.46 iter/sec.
Timings for 4096K FFT length (14 cpus, 2 workers): 6.39, 6.40 ms. Throughput: 312.71 iter/sec.
Timings for 4096K FFT length (14 cpus, 7 workers): 24.80, 24.84, 24.81, 24.78, 24.77, 24.70, 24.69 ms. Throughput: 282.58 iter/sec.
Timings for 4096K FFT length (14 cpus, 14 workers): 49.90, 49.91, 49.96, 50.04, 49.69, 49.78, 50.25, 49.76, 49.97, 49.72, 49.92, 49.85, 49.78, 49.67 ms. Throughput: 280.72 iter/sec.
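The reported throughput appears to be the sum of each worker's rate, 1000 / (ms per iteration); that formula is inferred from the numbers in this thread, where it reproduces the posted values to within the rounding of the displayed timings, not taken from Prime95's source. The sketch below uses it to pick the best worker count for each of the two runs above (function names are illustrative).

```python
def throughput(timings_ms):
    """Sum of per-worker rates: a worker at t ms/iteration does 1000/t iterations per second."""
    return sum(1000.0 / t for t in timings_ms)


# Per-worker timings (ms) from the two 14-thread runs above, keyed by worker count.
RUN_1 = {
    1: [3.06],
    2: [7.05, 6.92],
    7: [24.36, 24.41, 24.23, 24.29, 24.26, 24.28, 24.32],
    14: [49.28, 49.22, 49.10, 49.18, 48.86, 48.91, 49.32,
         48.79, 49.16, 48.94, 49.06, 48.97, 48.87, 48.79],
}
RUN_2 = {
    1: [5.97],
    2: [6.39, 6.40],
    7: [24.80, 24.84, 24.81, 24.78, 24.77, 24.70, 24.69],
    14: [49.90, 49.91, 49.96, 50.04, 49.69, 49.78, 50.25,
         49.76, 49.97, 49.72, 49.92, 49.85, 49.78, 49.67],
}


def best_worker_count(run):
    """Return (workers, iter/sec) for the configuration with the highest throughput."""
    return max(((w, throughput(t)) for w, t in run.items()), key=lambda pair: pair[1])


for name, run in (("run 1", RUN_1), ("run 2", RUN_2)):
    workers, rate = best_worker_count(run)
    print(f"{name}: best = {workers} worker(s), {rate:.1f} iter/sec")
```

The two runs disagree on the winner (1 worker in the first, 2 workers in the second), so it may be worth repeating the benchmark before settling on a configuration.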