#1 |
"Mihai Preda"
Apr 2015
2×11×61 Posts |
PrimeNet measures the amount of compute work done in GHz-days (e.g. https://www.mersenne.org/report_top_500/ ).
Dividing GHz-days by days to get compute throughput (compute per unit of time) produces "GHz" as a unit of compute power. E.g. a computer that does 100 GHz-days in 24 hours has a compute power of 100 GHz.
Another common unit of compute power is GFLOPS, e.g. as used in https://en.wikipedia.org/wiki/TOP500 and https://en.wikipedia.org/wiki/FLOPS
What is the correspondence between the two units of compute power, "PrimeNet GHz" and GFLOPS? E.g. for the example computer that does 100 GHz-days per day, how many GFLOPS would that be?
#2 | |
"Forget I exist"
Jul 2009
Dumbassville
2⁶·131 Posts |
#3 |
Aug 2002
North San Diego County
2·11·31 Posts |
IIRC, for Prime95 purposes, 1 GHz-day = the work done in 24 hours by a single core of a 1 GHz Core 2 processor.
Edit2: Code:
// In Primenet v4 we used a 90 MHz Pentium CPU as the benchmark machine
// for calculating CPU credit. The official unit of measure became the
// P-90 CPU year. In 2007, not many people own a plain Pentium CPU, so we
// adopted a new benchmark machine - a single core of a 2.4 GHz Core 2 Duo.
// Our official unit of measure became the C2GHD (Core 2 GHz Day). That is,
// the amount of work produced by the single core of a hypothetical
// 1 GHz Core 2 Duo machine. A 2.4 GHz should be able to produce 4.8 C2GHD
// per day.
//
// To compare P-90 CPU years to C2GHDs, we need to factor in both the
// the raw speed improvements of modern chips and the architectural
// improvements of modern chips. Examining prime95 version 24.14 benchmarks
// for 640K to 2048K FFTs from a P100, PII-400, P4-2000, and a C2D-2400
// and compensating for speed differences, we get the following architectural
// multipliers:
//
// One core of a C2D = 1.68 P4.
// A P4 = 3.44 PIIs
// A PII = 1.12 Pentium
//
// Thus, a P-90 CPU year = 365 days * 1 C2GHD *
//     (90MHz / 1000MHz) / 1.68 / 3.44 / 1.12
//     = 5.075 C2GHDs
Last fiddled with by sdbardwick on 2017-12-01 at 00:23
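The conversion in the quoted comment can be retraced in a few lines. This is only a sketch: the multipliers are the ones quoted above, while the function and constant names are made up for illustration and are not taken from the Prime95 source.

```python
# Architectural multipliers as quoted in the source comment above.
C2D_PER_P4 = 1.68        # one Core 2 core = 1.68 P4
P4_PER_PII = 3.44        # one P4 = 3.44 PII
PII_PER_PENTIUM = 1.12   # one PII = 1.12 Pentium


def p90_years_to_c2ghd(p90_years: float) -> float:
    """Convert P-90 CPU years to Core 2 GHz-days (C2GHD)."""
    clock_ratio = 90 / 1000  # 90 MHz Pentium vs. the 1 GHz reference core
    arch_ratio = C2D_PER_P4 * P4_PER_PII * PII_PER_PENTIUM
    return p90_years * 365 * clock_ratio / arch_ratio


print(round(p90_years_to_c2ghd(1), 3))  # 5.075
```

The clock ratio and the three architectural ratios are kept separate so that each term of the quoted formula stays visible.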
#4 |
"Forget I exist"
Jul 2009
Dumbassville
2⁶×131 Posts |
I may be wrong, but I believe GHz is a measure of clock frequency, while FLOPS is a measure of the number of a specific type of operation per second. Some operations can be done twice per clock cycle, some more often, so there's not really a good conversion between the two; that is why the type of operation matters.
Last fiddled with by science_man_88 on 2017-12-01 at 00:42 |
#5 |
Einyen
Dec 2003
Denmark
BD7₁₆ Posts |
Long ago I calculated 1 GHz-day to be ~171 TeraFLOP (not FLOPS).
Post #7 and #25 in this old thread: http://www.mersenneforum.org/showthread.php?t=10235
Now we can calculate it more easily from here: https://www.mersenne.org/primenet/
Under "Aggregate Computing Power" - "Today, last 24 hours" we can see that:
123137 GHz-days = 246.275 TFLOP/s * 86400 s = 21278160 TFLOP
=> 1 GHz-day = 21278160 TFLOP / 123137 = 172.8 TFLOP
We can test "last 7 days" and "last 30 days" to see if we get the same result:
994150 GHz-days = 284.043 TFLOP/s * 86400 s/day * 7 days => 1 GHz-day = 172.8 TFLOP
4395208 GHz-days = 293.014 TFLOP/s * 86400 s/day * 30 days => 1 GHz-day = 172.8 TFLOP
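The three checks above can be repeated with a short script. A minimal sketch, using only the figures quoted in this post; the names are illustrative.

```python
SECONDS_PER_DAY = 86_400

# (GHz-days, TFLOP/s, days) as quoted from the PrimeNet status page above
SAMPLES = [
    (123_137, 246.275, 1),
    (994_150, 284.043, 7),
    (4_395_208, 293.014, 30),
]


def tflop_per_ghz_day(ghz_days: float, tflops: float, days: float) -> float:
    """Total TFLOP over the period divided by the GHz-days credited."""
    return tflops * SECONDS_PER_DAY * days / ghz_days


for sample in SAMPLES:
    ratio = tflop_per_ghz_day(*sample)
    # TFLOP per GHz-day -> GFLOP per second per GHz
    gflops_per_ghz = ratio * 1000 / SECONDS_PER_DAY
    print(f"{ratio:.1f} TFLOP per GHz-day, {gflops_per_ghz:.3f} GFLOPS per GHz")
```

All three rows come out at about 172.8 TFLOP per GHz-day, which is 2 GFLOPS per GHz. That suggests the server's TFLOP/s figure is derived from the GHz-days total by a fixed factor rather than measured independently.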
#6 | |
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3·29·83 Posts |
So a 100 EGHz CPU (or GPU?) is doing 200 GFLOPS according to the PrimeNet definition of EGHz and FLOP. I'm not sure how these line up with an actual Core 2 or with the "standard" definitions of FLOP.
Last fiddled with by Dubslow on 2017-12-01 at 01:25
#7 | |
"Mihai Preda"
Apr 2015
2×11×61 Posts |
Now, an LL / PRP test at a 76M exponent is credited with about 216 GHz-days, which comes to 37.3 PFLOP (i.e. 216 * 24 * 60 * 60 * 2 GFLOP). I would be curious to compare this with the actual number of FP64 operations done by the FFTs at that size.
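The arithmetic in this post can be wrapped in a small helper. It is a sketch that assumes the factor of 2 GFLOPS per GHz used in the post; the function and constant names are only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
GFLOPS_PER_GHZ = 2  # assumption: PrimeNet's apparent GHz-to-GFLOPS factor


def ghz_days_to_pflop(ghz_days: float) -> float:
    """Total floating-point operations, in PFLOP, behind a GHz-days credit."""
    gflop = ghz_days * SECONDS_PER_DAY * GFLOPS_PER_GHZ
    return gflop / 1e6  # 1 PFLOP = 1e6 GFLOP


def ghz_days_per_day_to_gflops(ghz_days_per_day: float) -> float:
    """Sustained rate in GFLOPS for a machine earning this credit per day."""
    return ghz_days_per_day * GFLOPS_PER_GHZ


print(round(ghz_days_to_pflop(216), 1))   # 37.3
print(ghz_days_per_day_to_gflops(100))    # 200
```

Under the same assumption, the machine from the first post that earns 100 GHz-days per day would be rated at 200 GFLOPS.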
#8 |
"Victor de Hollander"
Aug 2011
the Netherlands
2230₈ Posts |
A Core 2 Duo core has 128-bit SSE registers for SIMD. It can split a SIMD register to do 2 FP64 (double-precision) adds in parallel. In addition, it can do 2 FP64 multiplications (mul) at the same time. So in theory a C2D core can do 4 DP FLOPs (2 adds + 2 muls) per clock cycle.
However, in practice getting even half of that is already extremely difficult, even though Prime95 is very efficient. You hit latency and memory bandwidth limits pretty fast.
AVX introduced bigger 256-bit SIMD registers (Sandy Bridge, Ivy Bridge), so a core can theoretically do 8 DP FLOPs per clock cycle.
AVX2 extends the instruction set introduced with AVX and arrived together with FMA3 (Haswell, Broadwell, Skylake-S, Kaby Lake-S). This increases the peak to 16 DP FLOPs per cycle when using FMA.
AVX-512 extends the SIMD registers once again, to 512 bits (with a multitude of subsets, which differ between the Knights Landing co-processor and the normal CPUs), for 32 DP FLOPs per cycle.
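The per-cycle counts above follow one pattern, which can be written down as a small model. This is a simplified sketch: it assumes two floating-point pipes per core (one add plus one mul without FMA, two FMA pipes with it), which reproduces the numbers in this post but is not a description of any particular chip's scheduler. All names are illustrative.

```python
# (name, SIMD register width in bits, FMA available) as described in the post
GENERATIONS = [
    ("SSE2 (Core 2)", 128, False),
    ("AVX (Sandy/Ivy Bridge)", 256, False),
    ("AVX2 + FMA3 (Haswell and later)", 256, True),
    ("AVX-512", 512, True),
]


def peak_dp_flops_per_cycle(simd_bits: int, fma: bool) -> int:
    """Theoretical FP64 operations per clock cycle for one core."""
    lanes = simd_bits // 64            # FP64 values per SIMD register
    pipes = 2                          # assumption: two FP pipes per core
    flops_per_lane = 2 if fma else 1   # an FMA counts as one mul plus one add
    return lanes * pipes * flops_per_lane


def peak_gflops(simd_bits: int, fma: bool, ghz: float, cores: int = 1) -> float:
    """Theoretical peak in GFLOPS at a given clock and core count."""
    return peak_dp_flops_per_cycle(simd_bits, fma) * ghz * cores


for name, bits, fma in GENERATIONS:
    print(f"{name}: {peak_dp_flops_per_cycle(bits, fma)} DP FLOPs/cycle")
```

These are upper bounds only; as the post says, real code is usually limited by latency and memory bandwidth well before reaching them.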
#9 |
"Victor de Hollander"
Aug 2011
the Netherlands
2³·3·7² Posts |
Intel Core i5 2500k @4GHz
DDR3-2133 (Dual Channel)
Theoretical:
8 DP/clock * 4 GHz = 32 GFLOP/s per core
32 GFLOP/s * 4 cores = 128 GFLOP/s total
LINPACK with Intel pre-compiled binaries (read: ultra-optimised for highest throughput / benchmark results).
1 core: Code:
Performance Summary (GFlops)
Size    LDA     Align.  Average   Maximal
1000    1000    4       23.9988   24.2828
2000    2000    4       25.9523   26.4666
5000    5008    4       27.9580   28.1283
10000   10000   4       28.9317   28.9918
15000   15000   4       29.4065   29.4075
18000   18008   4       29.4349   29.4349
20000   20016   4       29.3201   29.3201
22000   22008   4       29.4509   29.4509
25000   25000   4       29.5640   29.5640
26000   26000   4       29.6370   29.6370
27000   27000   4       29.4683   29.4683
30000   30000   1       29.6019   29.6019
35000   35000   1       29.6650   29.6650
40000   40000   1       29.5893   29.5893
Residual checks PASSED
End of tests
4 cores: Code:
Performance Summary (GFlops)
Size    LDA     Align.  Average    Maximal
1000    1000    4       71.0218    73.5734
2000    2000    4       84.0847    85.3924
5000    5008    4       88.9042    93.5976
10000   10000   4       100.6101   100.8921
15000   15000   4       99.3512    104.1565
18000   18008   4       106.5735   106.9429
20000   20016   4       106.7802   107.0437
22000   22008   4       112.7814   112.8765
25000   25000   4       112.9538   112.9650
26000   26000   4       113.1638   113.2873
27000   27000   4       113.0612   113.0612
30000   30000   1       113.7686   113.7686
35000   35000   1       112.7552   112.7552
40000   40000   1       112.7320   112.7320
Residual checks PASSED
End of tests
Prime95 benchmark for this machine (somebody can probably convert these to GHz-days) Code:
Prime95 64-bit version 29.3, RdtscTiming=1
Timings for 1024K FFT length (4 cores, 4 workers): 6.01, 6.19, 5.98, 6.10 ms. Throughput: 659.13 iter/sec.
Timings for 2048K FFT length (4 cores, 4 workers): 12.72, 13.06, 12.67, 12.91 ms. Throughput: 311.53 iter/sec.
Timings for 4096K FFT length (4 cores, 4 workers): 27.12, 27.71, 26.93, 27.32 ms. Throughput: 146.71 iter/sec.
Timings for 6144K FFT length (4 cores, 4 workers): 40.92, 41.86, 40.62, 41.34 ms. Throughput: 97.13 iter/sec.
Timings for 8192K FFT length (4 cores, 4 workers): 57.47, 59.16, 56.94, 58.09 ms. Throughput: 69.08 iter/sec.
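Two quick sanity checks on the numbers in this post, as a sketch. The first compares the best LINPACK averages with the theoretical peak; the second rebuilds Prime95's combined throughput from the per-worker timings, assuming the workers run in parallel and each iteration time is in milliseconds. The helper names are made up for illustration.

```python
def peak_gflops(flops_per_cycle: float, ghz: float, cores: int) -> float:
    """Theoretical peak: FLOPs per cycle times clock times core count."""
    return flops_per_cycle * ghz * cores


def throughput_iter_per_sec(worker_ms: list[float]) -> float:
    """Combined iterations per second for workers running in parallel."""
    return sum(1000.0 / ms for ms in worker_ms)


# Best "Average" rows from the two LINPACK runs above, in GFLOPS
best_1core, best_4core = 29.6650, 113.7686

# i5-2500K at 4 GHz with AVX: 8 DP FLOPs per cycle per core, as in the post
print(f"1 core : {best_1core / peak_gflops(8, 4.0, 1):.1%} of peak")
print(f"4 cores: {best_4core / peak_gflops(8, 4.0, 4):.1%} of peak")

# 1024K FFT timings per worker; Prime95 reported 659.13 iter/sec
print(f"{throughput_iter_per_sec([6.01, 6.19, 5.98, 6.10]):.1f} iter/sec")
```

LINPACK reaches roughly 93% of peak on one core and 89% on four, and the rebuilt throughput agrees with Prime95's reported figure to within rounding of the timings.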
#10 |
Romulan Interpreter
Jun 2011
Thailand
5×17×109 Posts |
#11 |
"Vasiliy"
Apr 2017
Ukraine
2²·3·5 Posts |
From post #9:
112349 seconds to test exponent M35000011 on 4 cores
43.75 GHz-days = 12 PFLOP
So 1 GHz-day = 0.27 PFLOP = 274 TFLOP
I made measurements on my AMD:
760208 seconds to test exponent M35000011 on 1 core
43.75 GHz-days = 16.7 PFLOP
So 1 GHz-day = 0.38 PFLOP = 382 TFLOP
Maybe my calculations are wrong.
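The first figure can be retraced from the 2048K FFT throughput in post #9 (311.53 iter/sec), assuming an LL test of exponent p needs about p iterations. The sketch below then shows which sustained GFLOPS the quoted PFLOP totals imply. How those totals were obtained is not stated in the post, so they are taken as given, and all names are illustrative.

```python
EXPONENT = 35_000_011
CREDIT_GHZ_DAYS = 43.75  # credit quoted above for M35000011


def ll_test_seconds(exponent: int, iter_per_sec: float) -> float:
    """Run time, assuming an LL test of exponent p takes about p iterations."""
    return exponent / iter_per_sec


def implied_gflops(total_pflop: float, seconds: float) -> float:
    """Sustained GFLOPS needed to perform total_pflop in the given time."""
    return total_pflop * 1e6 / seconds


def tflop_per_ghz_day(total_pflop: float, ghz_days: float) -> float:
    return total_pflop * 1000 / ghz_days


intel_seconds = ll_test_seconds(EXPONENT, 311.53)
print(round(intel_seconds))                            # 112349
print(round(tflop_per_ghz_day(12, CREDIT_GHZ_DAYS)))   # 274

# Sustained rates implied by the quoted totals and run times
print(f"Intel: {implied_gflops(12, intel_seconds):.1f} GFLOPS")
print(f"AMD  : {implied_gflops(16.7, 760_208):.1f} GFLOPS")
```

The implied Intel rate is close to the 4-core LINPACK results in post #9, so the totals appear to assume that Prime95 sustains LINPACK-level FLOPS. That would explain why this estimate comes out higher than the 172.8 TFLOP per GHz-day derived from the PrimeNet statistics.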