2020-12-11, 21:41   #63
ewmayer
Sep 2002
República de California

1162610 Posts

@pvn: Sorry for the belated reply. That warning message is more common at larger thread counts; it's basically telling you that part of the FFT code needs half the leading radix (the leftmost one in the "Using complex FFT radices" info-print) to be divisible by #threads in order to run optimally. Example from a DC (double-check) that my last functioning bought-cheap-used Android phone is currently doing:

Using complex FFT radices 192 32 16 16

The leading radix here is radix0 = 192, thus radix0/2 = 96 = 32*3. Sticking to power-of-2 thread counts (which the other main part of my 2-phases-per-iteration FFT code needs to run optimally) we'd be fine for #threads = 2,4,8,16,32, but 64 would give you the warning you saw.
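A minimal sketch of that divisibility check, in Python for brevity. The function name `triggers_warning` is hypothetical and this only illustrates the rule as described here; it is not the actual Mlucas code:

```python
def triggers_warning(radix0: int, nthreads: int) -> bool:
    """Return True when radix0/2 is NOT divisible by nthreads (the suboptimal case)."""
    return (radix0 // 2) % nthreads != 0

radix0 = 192                      # leading radix from the info-print above
candidates = (2, 4, 8, 16, 32, 64)

ok = [t for t in candidates if not triggers_warning(radix0, t)]
bad = [t for t in candidates if triggers_warning(radix0, t)]

print("fine for #threads =", ok)   # radix0/2 = 96 = 32*3
print("warning for #threads =", bad)
```

Note that 192 itself is divisible by 64; it is the halved value, 96, that is not.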

Do you recall the precise radix set at which you saw the warning in your case? Seeing it for 4 threads implies radix0/2 is not divisible by 4, which is only true for a handful of small leading radices: radix0 = 12,20,28,36,44,52,60. The warning itself is no problem; it just means that when you use the self-tests to create the mlucas.cfg file for your particular -cpu [lo:hi] choice, the above suboptimality will likely cause a different FFT-radix combo to run best at the given FFT length, and that will be reflected in the corresponding mlucas.cfg file entry.
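The 4-thread case can be checked the same way. This Python sketch only verifies the arithmetic for the radices listed above; the comparison range of multiples of 8 is an illustrative assumption, not a claim about which leading radices the code actually supports:

```python
listed = [12, 20, 28, 36, 44, 52, 60]   # leading radices named in this post

# radix0/2 not divisible by 4 is the same as radix0 not divisible by 8
flagged = [r for r in listed if (r // 2) % 4 != 0]

# For contrast: multiples of 8 in the same size range pass the check
# (hypothetical comparison values, chosen only to show the pattern).
passing = [r for r in range(8, 65, 8) if (r // 2) % 4 == 0]

print("would warn at 4 threads:", flagged)
print("would not warn at 4 threads:", passing)
```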

I've always gotten quite good multithreaded scaling on my Arm devices (Odroid mini-PC and Android phone) up to 4 threads - did you run separate self-tests for -cpu 0, -cpu 0:1 and -cpu 0:3 and compare the resulting mlucas.cfg files?

On the Graviton instance you're using, what does /proc/cpuinfo show in terms of #cores?
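If it's easier than eyeballing the output, a small Python sketch like the following counts the "processor" entries in /proc/cpuinfo on Linux. The helper name `count_processors` is hypothetical, and note the count is of logical processors as the kernel reports them, which need not equal physical cores on hardware with SMT:

```python
from pathlib import Path

def count_processors(cpuinfo_text: str) -> int:
    """Count lines whose key (the text before ':') is exactly 'processor'."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":")[0].strip() == "processor"
    )

path = Path("/proc/cpuinfo")
if path.exists():                 # only present on Linux-style systems
    print(count_processors(path.read_text()))
```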