2020-11-20, 18:37   #1
Ensigm
 
Aug 2020

2·3·19 Posts
mprime: (slightly) worse performance when utilizing benchmark results

I ran some benchmarks on Google Colab, hoping they would improve throughput by helping mprime choose the best FFT implementation. It turns out that the implementation that performs best in the benchmark isn't the best one in actual work. All of these runs happened on the same runtime (i.e., the same CPU).

Without the benchmark files, clm=1 is chosen.
Quote:
[Work thread Nov 20 18:12] Using FMA3 FFT length 3M, Pass1=768, Pass2=4K, clm=1, 2 threads
[Work thread Nov 20 18:12] M56051509 stage 1 is 49.99% complete.
[Work thread Nov 20 18:15] M56051509 stage 1 is 50.32% complete. Time: 156.665 sec.
[Work thread Nov 20 18:17] M56051509 stage 1 is 50.66% complete. Time: 156.890 sec.
[Work thread Nov 20 18:20] M56051509 stage 1 is 50.99% complete. Time: 156.673 sec.
[Work thread Nov 20 18:22] M56051509 stage 1 is 51.33% complete. Time: 156.623 sec.
[Work thread Nov 20 18:25] M56051509 stage 1 is 51.66% complete. Time: 156.503 sec.
With the benchmark files, clm=2 is chosen.
Quote:
[Work thread Nov 20 17:58] Using FMA3 FFT length 3M, Pass1=768, Pass2=4K, clm=2, 2 threads
[Work thread Nov 20 17:58] M56051509 stage 1 is 48.27% complete.
[Work thread Nov 20 18:00] M56051509 stage 1 is 48.60% complete. Time: 158.061 sec.
[Work thread Nov 20 18:03] M56051509 stage 1 is 48.93% complete. Time: 157.612 sec.
[Work thread Nov 20 18:06] M56051509 stage 1 is 49.27% complete. Time: 157.592 sec.
[Work thread Nov 20 18:08] M56051509 stage 1 is 49.60% complete. Time: 157.035 sec.
[Work thread Nov 20 18:11] M56051509 stage 1 is 49.93% complete. Time: 157.060 sec.
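To put a number on the difference, here is a quick Python sketch (my own arithmetic, not anything mprime produces) that averages the five interval times from each log excerpt above. The variable names are just for illustration.

```python
from statistics import mean

# Per-interval times in seconds, copied from the two log excerpts above.
clm1_times = [156.665, 156.890, 156.673, 156.623, 156.503]  # without benchmark files
clm2_times = [158.061, 157.612, 157.592, 157.035, 157.060]  # with benchmark files

clm1_mean = mean(clm1_times)
clm2_mean = mean(clm2_times)

# Relative slowdown of clm=2 compared with clm=1, in percent.
slowdown_pct = (clm2_mean - clm1_mean) / clm1_mean * 100

print(f"clm=1 mean: {clm1_mean:.3f} sec")
print(f"clm=2 mean: {clm2_mean:.3f} sec")
print(f"clm=2 slower by: {slowdown_pct:.2f}%")
```

With only five samples per setting this is a rough estimate, but it puts clm=2 at roughly half a percent slower in the real run.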
Relevant lines in results.bench.txt:
Quote:
FFTlen=3072K, Type=3, Arch=4, Pass1=768, Pass2=4096, clm=2 (1 core, 1 worker): 18.51 ms. Throughput: 54.03 iter/sec.
FFTlen=3072K, Type=3, Arch=4, Pass1=768, Pass2=4096, clm=2 (1 core hyperthreaded, 1 worker): 16.71 ms. Throughput: 59.85 iter/sec.
FFTlen=3072K, Type=3, Arch=4, Pass1=768, Pass2=4096, clm=1 (1 core, 1 worker): 19.32 ms. Throughput: 51.76 iter/sec.
FFTlen=3072K, Type=3, Arch=4, Pass1=768, Pass2=4096, clm=1 (1 core hyperthreaded, 1 worker): 16.79 ms. Throughput: 59.55 iter/sec.
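For comparison, the benchmark margin between clm=1 and clm=2 can be pulled out of those four lines with a small Python sketch. The regular expression is only written against the lines quoted above, so treat it as an assumption about the file format rather than a general parser for results.bench.txt.

```python
import re

# The four benchmark lines quoted above, verbatim.
bench_lines = """\
FFTlen=3072K, Type=3, Arch=4, Pass1=768, Pass2=4096, clm=2 (1 core, 1 worker): 18.51 ms. Throughput: 54.03 iter/sec.
FFTlen=3072K, Type=3, Arch=4, Pass1=768, Pass2=4096, clm=2 (1 core hyperthreaded, 1 worker): 16.71 ms. Throughput: 59.85 iter/sec.
FFTlen=3072K, Type=3, Arch=4, Pass1=768, Pass2=4096, clm=1 (1 core, 1 worker): 19.32 ms. Throughput: 51.76 iter/sec.
FFTlen=3072K, Type=3, Arch=4, Pass1=768, Pass2=4096, clm=1 (1 core hyperthreaded, 1 worker): 16.79 ms. Throughput: 59.55 iter/sec.
""".splitlines()

# Captures the clm value, the core configuration, and the time in ms.
pattern = re.compile(r"clm=(\d+) \((.+?), 1 worker\): ([\d.]+) ms\.")

timings = {}  # (configuration, clm) -> milliseconds per iteration
for line in bench_lines:
    match = pattern.search(line)
    if match:
        clm, config, ms = int(match.group(1)), match.group(2), float(match.group(3))
        timings[(config, clm)] = ms

gains = {}  # configuration -> percent by which clm=2 beats clm=1 in the benchmark
for config in ("1 core", "1 core hyperthreaded"):
    t1, t2 = timings[(config, 1)], timings[(config, 2)]
    gains[config] = (t1 - t2) / t1 * 100
    print(f"{config}: clm=2 faster by {gains[config]:.2f}%")
```

If the 2-thread worker in the logs corresponds to the "1 core hyperthreaded" configuration (an assumption on my part), the benchmarked advantage of clm=2 is only 0.08 ms per iteration, which is about the same size as the difference I see in the opposite direction in the real run.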
Attached Files
File Type: txt gwnum.txt (39.5 KB, 41 views)
File Type: txt results.bench.txt (100.0 KB, 42 views)