2021-01-10, 17:03   #64
pvn
 
Nov 2020

22 Posts

Hi Ernst, thanks for looking at this, and apologies for the delay on my end.

Quote:
Do you recall which precise radix set you saw the warning at in your case? To see it for 4-threads implies radix0/2 is not divisible by 4, which is only true for a handful of small leading radices: radix0 = 12,20,28,36,44,52,60. That's no problem, it just means that in using the self-tests to create the mlucas.cfg file for your particular -cpu [lo:hi] choice, the above suboptimality will likely cause a different FFT-radix-combo at the given FFT length to run best, which will be reflected in the corresponding mlucas.cfg file entry.
Does this mean that the self-test run is taking longer because it's... weeding out the unsuitable radices? I think that makes sense given what I see in the resulting cfg files: at any given FFT length, the msec/iter roughly scales with the number of cores used, even when the 4-core self-test takes unexpectedly long overall.


Also, it seems worth noting that all of the radices that actually get saved in mlucas.cfg when running -cpu 0:3 are evenly divisible by NTHREADS*2 (in this case NTHREADS=4, so by 8).
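For what it's worth, that observation and Ernst's condition look like the same test: for an even radix0, radix0/2 is a multiple of NTHREADS exactly when radix0 is a multiple of 2*NTHREADS. Here's a minimal Python sketch of that check. The helper names are made up (this is not Mlucas code), and the candidate list of leading radices (multiples of 4 from 8 to 64) is just an assumption for illustration. With NTHREADS=4 it reproduces the list from the quote.

```python
# Hypothetical helpers, NOT Mlucas code: they mirror the condition behind
# "radix0/2 not exactly divisible by NTHREADS - This will hurt performance."

def hurts_performance(radix0: int, nthreads: int) -> bool:
    """True if radix0/2 is not an exact multiple of nthreads.

    For even radix0, (radix0/2) % nthreads == 0 is equivalent to
    radix0 % (2*nthreads) == 0; an odd radix0 fails both forms.
    """
    return radix0 % (2 * nthreads) != 0


def leading_radices_that_warn(candidates, nthreads: int) -> list:
    """Return the candidate leading radices that would trigger the warning."""
    return [r for r in candidates if hurts_performance(r, nthreads)]


if __name__ == "__main__":
    # Assumed candidate set: multiples of 4 from 8 to 64 (illustrative only,
    # not Mlucas's actual table of supported leading radices).
    candidates = range(8, 65, 4)
    print(leading_radices_that_warn(candidates, 4))  # [12, 20, 28, 36, 44, 52, 60]
```

Changing the second argument to 2 or 8 shows how the set of "bad" leading radices shifts with the thread count.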


Here's some of the output for the radix sets that gave the "This will hurt performance" message (these runs seem to take about 50% more time than the other runs at the same FFT size):

M43765019: using FFT length 2304K = 2359296 8-byte floats, initial residue shift count = 29224505
this gives an average 18.550033145480686 bits per digit
Using complex FFT radices 36 32 32 32
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.

M48515021: using FFT length 2560K = 2621440 8-byte floats, initial residue shift count = 31467905
this gives an average 18.507011795043944 bits per digit
Using complex FFT radices 20 16 16 16 16
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.

M53254447: using FFT length 2816K = 2883584 8-byte floats, initial residue shift count = 35280290
this gives an average 18.468144850297406 bits per digit
Using complex FFT radices 44 32 32 32
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.

M53254447: using FFT length 2816K = 2883584 8-byte floats, initial residue shift count = 23722047
this gives an average 18.468144850297406 bits per digit
Using complex FFT radices 44 8 16 16 16
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.

M62705077: using FFT length 3328K = 3407872 8-byte floats, initial residue shift count = 61480382
this gives an average 18.400068136361931 bits per digit
Using complex FFT radices 52 32 32 32
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.

M67417873: using FFT length 3584K = 3670016 8-byte floats, initial residue shift count = 63290971
this gives an average 18.369912556239537 bits per digit
Using complex FFT radices 28 16 16 16 16
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.

M72123137: using FFT length 3840K = 3932160 8-byte floats, initial residue shift count = 65799790
this gives an average 18.341862233479819 bits per digit
Using complex FFT radices 60 32 32 32
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.

M86198291: using FFT length 4608K = 4718592 8-byte floats, initial residue shift count = 21266494
this gives an average 18.267799165513779 bits per digit
Using complex FFT radices 36 16 16 16 16
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.

M95551873: using FFT length 5120K = 5242880 8-byte floats, initial residue shift count = 93620243
this gives an average 18.225073432922365 bits per digit
Using complex FFT radices 20 16 16 16 32
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.

M95551873: using FFT length 5120K = 5242880 8-byte floats, initial residue shift count = 43929528
this gives an average 18.225073432922365 bits per digit
Using complex FFT radices 20 32 16 16 16
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.

M104884309: using FFT length 5632K = 5767168 8-byte floats, initial residue shift count = 24783492
this gives an average 18.186449397693981 bits per digit
Using complex FFT radices 44 16 16 16 16
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.

M123493333: using FFT length 6656K = 6815744 8-byte floats, initial residue shift count = 30371346
this gives an average 18.118833835308369 bits per digit
Using complex FFT radices 52 16 16 16 16
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.

M132772789: using FFT length 7168K = 7340032 8-byte floats, initial residue shift count = 24638813
this gives an average 18.088856969560897 bits per digit
Using complex FFT radices 28 16 16 16 32
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.

M132772789: using FFT length 7168K = 7340032 8-byte floats, initial residue shift count = 92450206
this gives an average 18.088856969560897 bits per digit
Using complex FFT radices 28 32 16 16 16
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.

M142037359: using FFT length 7680K = 7864320 8-byte floats, initial residue shift count = 90349695
this gives an average 18.060984166463218 bits per digit
Using complex FFT radices 60 16 16 16 16
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.
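To sanity-check entries like the ones above, here's a small Python sketch (a hypothetical helper, not part of Mlucas) that parses each entry and checks three things: the FFT length in K times 1024 matches the float count, the product of the complex radices times 2 equals that float count, and the reported bits per digit equals p divided by the FFT length. It also flags whether the leading radix fails the 2*NTHREADS test. The regex assumes the exact line format shown in the output above; SAMPLE is two entries copied from it.

```python
import math
import re

# Assumes the exact log format shown above; not an official Mlucas parser.
ENTRY = re.compile(
    r"M(?P<p>\d+): using FFT length (?P<k>\d+)K = (?P<n>\d+) 8-byte floats.*?"
    r"this gives an average (?P<bpd>[\d.]+) bits per digit\s+"
    r"Using complex FFT radices (?P<radices>[\d ]+)",
    re.S,
)


def check_log(text: str, nthreads: int = 4) -> list:
    """Parse each self-test entry and check its arithmetic."""
    results = []
    for m in ENTRY.finditer(text):
        p, n = int(m["p"]), int(m["n"])
        radices = [int(r) for r in m["radices"].split()]
        results.append({
            "p": p,
            "radices": radices,
            # n real floats = 2 * (product of complex radices); n = K * 1024
            "length_ok": int(m["k"]) * 1024 == n and math.prod(radices) * 2 == n,
            # average bits per digit = exponent p / FFT length n
            "bpd_ok": math.isclose(p / n, float(m["bpd"]), rel_tol=1e-9),
            # warning condition: radix0/2 not a multiple of nthreads
            "warns": radices[0] % (2 * nthreads) != 0,
        })
    return results


SAMPLE = """\
M43765019: using FFT length 2304K = 2359296 8-byte floats, initial residue shift count = 29224505
this gives an average 18.550033145480686 bits per digit
Using complex FFT radices 36 32 32 32
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.

M53254447: using FFT length 2816K = 2883584 8-byte floats, initial residue shift count = 23722047
this gives an average 18.468144850297406 bits per digit
Using complex FFT radices 44 8 16 16 16
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.
"""

if __name__ == "__main__":
    for entry in check_log(SAMPLE):
        print(entry)
```

Every leading radix in the output above (20, 28, 36, 44, 52, 60) is divisible by 4 but not by 8, which is consistent with the warning firing for NTHREADS=4.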

Last fiddled with by pvn on 2021-01-10 at 17:06