
2020-11-20, 18:37   #1
Aug 2020

72₁₆ Posts
mprime: (slightly) worse performance when utilizing benchmark results

I ran some benchmarks on Google Colab, hoping they would improve throughput by helping mprime choose the best FFT implementation. It turns out that the implementation that performs best in the benchmark isn't the best one in real work. All of the runs below happened on the same runtime (i.e., the same CPU).

Without the benchmark files, clm=1 is chosen:
[Work thread Nov 20 18:12] Using FMA3 FFT length 3M, Pass1=768, Pass2=4K, clm=1, 2 threads
[Work thread Nov 20 18:12] M56051509 stage 1 is 49.99% complete.
[Work thread Nov 20 18:15] M56051509 stage 1 is 50.32% complete. Time: 156.665 sec.
[Work thread Nov 20 18:17] M56051509 stage 1 is 50.66% complete. Time: 156.890 sec.
[Work thread Nov 20 18:20] M56051509 stage 1 is 50.99% complete. Time: 156.673 sec.
[Work thread Nov 20 18:22] M56051509 stage 1 is 51.33% complete. Time: 156.623 sec.
[Work thread Nov 20 18:25] M56051509 stage 1 is 51.66% complete. Time: 156.503 sec.
With the benchmark files, clm=2 is chosen:
[Work thread Nov 20 17:58] Using FMA3 FFT length 3M, Pass1=768, Pass2=4K, clm=2, 2 threads
[Work thread Nov 20 17:58] M56051509 stage 1 is 48.27% complete.
[Work thread Nov 20 18:00] M56051509 stage 1 is 48.60% complete. Time: 158.061 sec.
[Work thread Nov 20 18:03] M56051509 stage 1 is 48.93% complete. Time: 157.612 sec.
[Work thread Nov 20 18:06] M56051509 stage 1 is 49.27% complete. Time: 157.592 sec.
[Work thread Nov 20 18:08] M56051509 stage 1 is 49.60% complete. Time: 157.035 sec.
[Work thread Nov 20 18:11] M56051509 stage 1 is 49.93% complete. Time: 157.060 sec.
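To put a number on "slightly", here is a minimal sketch that pulls the interval times out of the two logs above and compares their means. The regex and the helper name `interval_times` are my own assumptions, based only on the log lines quoted here.

```python
import re
from statistics import mean

# Interval lines copied from the two logs above.
LOG_CLM1 = """\
[Work thread Nov 20 18:15] M56051509 stage 1 is 50.32% complete. Time: 156.665 sec.
[Work thread Nov 20 18:17] M56051509 stage 1 is 50.66% complete. Time: 156.890 sec.
[Work thread Nov 20 18:20] M56051509 stage 1 is 50.99% complete. Time: 156.673 sec.
[Work thread Nov 20 18:22] M56051509 stage 1 is 51.33% complete. Time: 156.623 sec.
[Work thread Nov 20 18:25] M56051509 stage 1 is 51.66% complete. Time: 156.503 sec.
"""
LOG_CLM2 = """\
[Work thread Nov 20 18:00] M56051509 stage 1 is 48.60% complete. Time: 158.061 sec.
[Work thread Nov 20 18:03] M56051509 stage 1 is 48.93% complete. Time: 157.612 sec.
[Work thread Nov 20 18:06] M56051509 stage 1 is 49.27% complete. Time: 157.592 sec.
[Work thread Nov 20 18:08] M56051509 stage 1 is 49.60% complete. Time: 157.035 sec.
[Work thread Nov 20 18:11] M56051509 stage 1 is 49.93% complete. Time: 157.060 sec.
"""

# Assumed pattern: matches the "Time: N sec." suffix seen in the quoted lines.
TIME_RE = re.compile(r"Time:\s*([\d.]+)\s*sec")


def interval_times(log_text):
    """Return every 'Time: N sec.' value found in a log excerpt, in seconds."""
    return [float(m.group(1)) for m in TIME_RE.finditer(log_text)]


t1 = interval_times(LOG_CLM1)
t2 = interval_times(LOG_CLM2)
slowdown = (mean(t2) - mean(t1)) / mean(t1)

print(f"clm=1 mean: {mean(t1):.3f} s")
print(f"clm=2 mean: {mean(t2):.3f} s")
print(f"clm=2 is slower by {slowdown:.2%}")
```

On these ten intervals the gap comes out at roughly half a percent.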
Relevant lines in results.bench.txt:
FFTlen=3072K, Type=3, Arch=4, Pass1=768, Pass2=4096, clm=2 (1 core, 1 worker): 18.51 ms. Throughput: 54.03 iter/sec.
FFTlen=3072K, Type=3, Arch=4, Pass1=768, Pass2=4096, clm=2 (1 core hyperthreaded, 1 worker): 16.71 ms. Throughput: 59.85 iter/sec.
FFTlen=3072K, Type=3, Arch=4, Pass1=768, Pass2=4096, clm=1 (1 core, 1 worker): 19.32 ms. Throughput: 51.76 iter/sec.
FFTlen=3072K, Type=3, Arch=4, Pass1=768, Pass2=4096, clm=1 (1 core hyperthreaded, 1 worker): 16.79 ms. Throughput: 59.55 iter/sec.
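For comparing the benchmark lines themselves, a rough sketch that groups them by core configuration and reports which clm value has the higher throughput and by what margin. The regex is inferred from the four lines quoted above and may not cover other layouts in the file; `throughput_by_config` is a made-up helper, and this is not meant to describe how mprime itself picks an FFT.

```python
import re

# The four lines quoted above from results.bench.txt.
BENCH_LINES = """\
FFTlen=3072K, Type=3, Arch=4, Pass1=768, Pass2=4096, clm=2 (1 core, 1 worker): 18.51 ms. Throughput: 54.03 iter/sec.
FFTlen=3072K, Type=3, Arch=4, Pass1=768, Pass2=4096, clm=2 (1 core hyperthreaded, 1 worker): 16.71 ms. Throughput: 59.85 iter/sec.
FFTlen=3072K, Type=3, Arch=4, Pass1=768, Pass2=4096, clm=1 (1 core, 1 worker): 19.32 ms. Throughput: 51.76 iter/sec.
FFTlen=3072K, Type=3, Arch=4, Pass1=768, Pass2=4096, clm=1 (1 core hyperthreaded, 1 worker): 16.79 ms. Throughput: 59.55 iter/sec.
"""

# Assumed line layout, inferred only from the quoted lines.
LINE_RE = re.compile(
    r"FFTlen=(?P<fft>\d+K), .*?clm=(?P<clm>\d+) "
    r"\((?P<config>[^)]*)\): (?P<ms>[\d.]+) ms\. "
    r"Throughput: (?P<tput>[\d.]+) iter/sec"
)


def throughput_by_config(text):
    """Map each core configuration to {clm: throughput in iter/sec}."""
    table = {}
    for m in LINE_RE.finditer(text):
        table.setdefault(m["config"], {})[int(m["clm"])] = float(m["tput"])
    return table


for config, by_clm in throughput_by_config(BENCH_LINES).items():
    winner = max(by_clm, key=by_clm.get)   # clm with the highest throughput
    slowest = min(by_clm.values())         # with two entries, the other one
    margin = by_clm[winner] / slowest - 1
    print(f"{config}: clm={winner} ahead by {margin:.2%}")
```

In the hyperthreaded case the two clm values are within about half a percent of each other, which is the same order as the difference seen in real work.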
Attached Files
File Type: txt gwnum.txt (39.5 KB, 32 views)
File Type: txt results.bench.txt (100.0 KB, 33 views)
Ensigm
2020-11-20, 18:44   #2
Aug 2020

2·3·19 Posts

Information about the CPU (output of !lscpu):
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  2
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               79
Model name:          Intel(R) Xeon(R) CPU @ 2.20GHz
Stepping:            0
CPU MHz:             2200.000
BogoMIPS:            4400.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            56320K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat md_clear arch_capabilities
Ensigm
2020-11-21, 14:30   #3
Aug 2020

2×3×19 Posts

The benchmark result does not seem to be reproducible, and the difference may just be jitter. Another benchmark on the same CPU model (though not the same physical CPU) shows Pass1=768, Pass2=4096, clm=1 as faster than Pass1=768, Pass2=4096, clm=2, which is consistent with the real-work performance.
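One rough way to tell jitter from a real difference is to compare the gap between two means with the run-to-run spread. A minimal sketch, using the interval times from the work logs in post #1 as the repeated measurements; the function name `differs_beyond_noise` and the threshold `k=2` are my own assumptions, a rule of thumb rather than a formal test.

```python
from math import sqrt
from statistics import mean, stdev

# Interval times (seconds) from the work logs in post #1.
CLM1 = [156.665, 156.890, 156.673, 156.623, 156.503]
CLM2 = [158.061, 157.612, 157.592, 157.035, 157.060]


def differs_beyond_noise(a, b, k=2.0):
    """True if the gap between the means exceeds k combined standard errors."""
    # Standard error of the difference of two means (unequal variances).
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return abs(mean(a) - mean(b)) > k * se


print("clm=1 vs clm=2 differ beyond noise:", differs_beyond_noise(CLM1, CLM2))
```

The benchmark file gives a single figure per configuration, so applying the same check to benchmark results would need several benchmark runs per setting. Consecutive intervals within one run are not independent samples either, so the outcome is indicative only.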
Ensigm

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.