mersenneforum.org


Ensigm 2020-11-20 18:37

mprime: (slightly) worse performance when utilizing benchmark results
 
I ran some benchmarks on Google Colab, hoping they would improve throughput by helping mprime choose the best FFT implementation. It turns out that the implementation that performs best in the benchmark isn't the best one in real work. All of this happened on the same runtime (i.e., the same CPU).

Without the benchmark files, [C]clm=1[/C] is chosen:
[QUOTE][Work thread Nov 20 18:12] Using FMA3 FFT length 3M, Pass1=768, Pass2=4K, clm=1, 2 threads
[Work thread Nov 20 18:12] M56051509 stage 1 is 49.99% complete.
[Work thread Nov 20 18:15] M56051509 stage 1 is 50.32% complete. Time: 156.665 sec.
[Work thread Nov 20 18:17] M56051509 stage 1 is 50.66% complete. Time: 156.890 sec.
[Work thread Nov 20 18:20] M56051509 stage 1 is 50.99% complete. Time: 156.673 sec.
[Work thread Nov 20 18:22] M56051509 stage 1 is 51.33% complete. Time: 156.623 sec.
[Work thread Nov 20 18:25] M56051509 stage 1 is 51.66% complete. Time: 156.503 sec.
[/QUOTE]
With the benchmark files, [C]clm=2[/C] is chosen:
[QUOTE][Work thread Nov 20 17:58] Using FMA3 FFT length 3M, Pass1=768, Pass2=4K, clm=2, 2 threads
[Work thread Nov 20 17:58] M56051509 stage 1 is 48.27% complete.
[Work thread Nov 20 18:00] M56051509 stage 1 is 48.60% complete. Time: 158.061 sec.
[Work thread Nov 20 18:03] M56051509 stage 1 is 48.93% complete. Time: 157.612 sec.
[Work thread Nov 20 18:06] M56051509 stage 1 is 49.27% complete. Time: 157.592 sec.
[Work thread Nov 20 18:08] M56051509 stage 1 is 49.60% complete. Time: 157.035 sec.
[Work thread Nov 20 18:11] M56051509 stage 1 is 49.93% complete. Time: 157.060 sec.
[/QUOTE]
Relevant lines in [I]results.bench.txt[/I]:
[QUOTE]FFTlen=3072K, Type=3, Arch=4, Pass1=768, Pass2=4096, clm=2 (1 core, 1 worker): 18.51 ms. Throughput: 54.03 iter/sec.
FFTlen=3072K, Type=3, Arch=4, Pass1=768, Pass2=4096, clm=2 (1 core hyperthreaded, 1 worker): 16.71 ms. Throughput: 59.85 iter/sec.
FFTlen=3072K, Type=3, Arch=4, Pass1=768, Pass2=4096, clm=1 (1 core, 1 worker): 19.32 ms. Throughput: 51.76 iter/sec.
FFTlen=3072K, Type=3, Arch=4, Pass1=768, Pass2=4096, clm=1 (1 core hyperthreaded, 1 worker): 16.79 ms. Throughput: 59.55 iter/sec.[/QUOTE]
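
To put the benchmark numbers in proportion, here is a rough Python sketch that parses the four quoted lines and reports how much faster clm=2 is than clm=1 in each configuration. The regular expression and the function names are my own assumptions based only on the lines shown here; this is not mprime's parser, and the real file may contain other line shapes. I am also assuming that the 2-thread real run corresponds most closely to the "1 core hyperthreaded" lines.

```python
import re

# The four lines quoted from results.bench.txt in this post.
BENCH_LINES = """\
FFTlen=3072K, Type=3, Arch=4, Pass1=768, Pass2=4096, clm=2 (1 core, 1 worker): 18.51 ms. Throughput: 54.03 iter/sec.
FFTlen=3072K, Type=3, Arch=4, Pass1=768, Pass2=4096, clm=2 (1 core hyperthreaded, 1 worker): 16.71 ms. Throughput: 59.85 iter/sec.
FFTlen=3072K, Type=3, Arch=4, Pass1=768, Pass2=4096, clm=1 (1 core, 1 worker): 19.32 ms. Throughput: 51.76 iter/sec.
FFTlen=3072K, Type=3, Arch=4, Pass1=768, Pass2=4096, clm=1 (1 core hyperthreaded, 1 worker): 16.79 ms. Throughput: 59.55 iter/sec.
""".splitlines()

# Hypothetical pattern inferred from the quoted lines only.
PATTERN = re.compile(
    r"clm=(?P<clm>\d+) \((?P<config>[^)]*)\): (?P<ms>[\d.]+) ms\. "
    r"Throughput: (?P<ips>[\d.]+) iter/sec\."
)


def parse_bench(lines):
    """Return {(config, clm): (ms_per_iter, iter_per_sec)} for matching lines."""
    results = {}
    for line in lines:
        m = PATTERN.search(line)
        if m:
            key = (m["config"], int(m["clm"]))
            results[key] = (float(m["ms"]), float(m["ips"]))
    return results


def clm2_advantage_percent(results, config):
    """Percent by which clm=2 beats clm=1 in per-iteration time."""
    ms_clm1 = results[(config, 1)][0]
    ms_clm2 = results[(config, 2)][0]
    return (ms_clm1 - ms_clm2) / ms_clm1 * 100.0


results = parse_bench(BENCH_LINES)
for config in ("1 core, 1 worker", "1 core hyperthreaded, 1 worker"):
    pct = clm2_advantage_percent(results, config)
    print(f"{config}: clm=2 faster by {pct:.2f}%")
```

With the hyperthreaded lines the benchmark advantage of clm=2 is only 0.08 ms per iteration, well under 1%, so a small amount of timing noise would be enough to flip the ranking.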

Ensigm 2020-11-20 18:44

Information about the CPU (output of [C]!lscpu[/C]):
[CODE]Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU @ 2.20GHz
Stepping: 0
CPU MHz: 2200.000
BogoMIPS: 4400.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 56320K
NUMA node0 CPU(s): 0,1
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat md_clear arch_capabilities[/CODE]

Ensigm 2020-11-21 14:30

The benchmark result seems to be irreproducible and may just be due to timing jitter. Another benchmark on the same CPU model (though not the same physical CPU) shows [C]Pass1=768, Pass2=4096, clm=1[/C] as faster than [C]Pass1=768, Pass2=4096, clm=2[/C], which is consistent with real-work performance.
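
One rough way to judge whether a timing difference is more than jitter is to compare the gap between the means of two runs with the spread inside each run. The sketch below does that for the real-work interval times quoted in the first post; the function names are made up for illustration, and the comparison is a heuristic under the assumption that the intervals are comparable, not a proper significance test.

```python
from statistics import mean, stdev

# Seconds per reporting interval, copied from the two work-thread logs
# in the first post of this thread.
CLM1_SEC = [156.665, 156.890, 156.673, 156.623, 156.503]
CLM2_SEC = [158.061, 157.612, 157.592, 157.035, 157.060]


def summarize(samples):
    """Return (mean, sample standard deviation) of a list of timings."""
    return mean(samples), stdev(samples)


def gap_vs_spread(a, b):
    """Return (gap between means, larger of the two in-run spreads).

    A gap well below the spread suggests the difference could be jitter.
    """
    mean_a, sd_a = summarize(a)
    mean_b, sd_b = summarize(b)
    return abs(mean_a - mean_b), max(sd_a, sd_b)


gap, spread = gap_vs_spread(CLM1_SEC, CLM2_SEC)
print(f"clm=1 mean: {mean(CLM1_SEC):.3f} s, clm=2 mean: {mean(CLM2_SEC):.3f} s")
print(f"gap between means: {gap:.3f} s, largest in-run spread: {spread:.3f} s")
```

The benchmark lines quoted in this thread give only one figure per configuration, so applying the same check to the benchmark itself would need several repeated benchmark runs.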


All times are UTC.
