Old 2021-01-19, 13:46   #74
tdulcet
"Teal Dulcet"
Jun 2018

2·3·5 Posts

Quote:
Originally Posted by ewmayer View Post
@tdulcet - How about I add support in v19.1 for the -radset flag to take either an index into the big table, or an actual set of comma-separated FFT radices?
That would be very helpful to automate this!

Quote:
Originally Posted by ewmayer View Post
Edit: Why make people wait - here is a modified version of Mlucas.c which supports the above-described -radset argument.
Wow, thanks for doing it so quickly! This will be very helpful. I committed and pushed the changes I described in my previous post to GitHub here, which basically implement steps 1, 2 and part of 4. I will now get started on step 3 and the rest of step 4 using your new version of Mlucas.c.

In my previous post, using an example 8c/16t system, I said the script would multiply the 4x2t msec/iter times by 1.5 before comparing them to the 8x1t times, following the instructions in the Mlucas README. After more testing, I was getting unexpected results with that formula, (CPU cores / workers) - 0.5, so the script now multiplies the times by CPU cores / workers instead, which is 2 for this example. This should become irrelevant once I implement step 3.
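
To make the arithmetic concrete, here is a minimal Python sketch of the two scaling factors for the 8c/16t example. The function names are hypothetical and are not taken from the actual script or from Mlucas; the sketch only restates the two formulas above.

```python
def old_factor(cpu_cores, workers):
    """Earlier formula: (CPU cores / workers) - 0.5."""
    return cpu_cores / workers - 0.5


def new_factor(cpu_cores, workers):
    """Current formula: CPU cores / workers."""
    return cpu_cores / workers


def adjusted_time(msec_per_iter, cpu_cores, workers):
    """Scale a measured msec/iter time so that runs with different
    worker counts can be compared against each other."""
    return msec_per_iter * new_factor(cpu_cores, workers)


# 8 cores, 4 workers with 2 threads each (the 4x2t case).
print(old_factor(8, 4))  # factor used previously
print(new_factor(8, 4))  # factor used now
# 8 cores, 8 workers with 1 thread each (the 8x1t case) is left unscaled.
print(new_factor(8, 8))
```

With either formula the 8x1t times are the baseline; only the multiplier applied to the 4x2t times changes.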

I thought I should note that some systems, like the Intel Xeon Phi, can have more than two CPU threads per CPU core. The Mlucas README does not mention this case, but my script should handle it correctly for Intel and AMD x86 systems. For example, on a 64 core/256 thread Intel Xeon Phi system it would try these combinations (showing only the first -cpu argument of each combination for brevity):
Code:
#   Workers/Runs  Threads          -cpu arguments
1   1             64, 1 per core   0:63
2   2             32, 1 per core   0:31
3   4             16, 1 per core   0:15
4   8             8, 1 per core    0:7
5   16            4, 1 per core    0:3
6   32            2, 1 per core    0:1
7   64            1, 1 per core    0
8   1             128, 2 per core  0:63,64:127
9   2             64, 2 per core   0:31,64:95
10  4             32, 2 per core   0:15,64:79
11  8             16, 2 per core   0:7,64:71
12  16            8, 2 per core    0:3,64:67
13  32            4, 2 per core    0:1,64:65
14  64            2, 2 per core    0,64
15  1             256, 4 per core  0:63,64:127,128:191,192:255
16  2             128, 4 per core  0:31,64:95,128:159,192:223
17  4             64, 4 per core   0:15,64:79,128:143,192:207
18  8             32, 4 per core   0:7,64:71,128:135,192:199
19  16            16, 4 per core   0:3,64:67,128:131,192:195
20  32            8, 4 per core    0:1,64:65,128:129,192:193
21  64            4, 4 per core    0,64,128,192
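
The pattern in the table above can be reproduced with a short Python sketch. This is not the actual script, only an illustration under two assumptions read off the table: worker counts are the powers of two up to the core count, and hardware thread k of core c is logical CPU c + k * cores. The names cpu_argument and combinations are hypothetical.

```python
def cpu_argument(cores, workers, threads_per_core, worker=0):
    """Build the -cpu argument for one worker.

    Assumes hardware thread k of core c is logical CPU c + k * cores,
    which matches the numbering in the table above.
    """
    cores_per_worker = cores // workers
    first = worker * cores_per_worker
    last = first + cores_per_worker - 1
    parts = []
    for sibling in range(threads_per_core):
        offset = sibling * cores
        lo, hi = first + offset, last + offset
        # A single CPU is written as "N", a range as "lo:hi".
        parts.append(str(lo) if lo == hi else f"{lo}:{hi}")
    return ",".join(parts)


def combinations(cores, max_threads_per_core):
    """List (workers, threads, threads per core, first -cpu argument)
    for every combination, in the same order as the table."""
    rows = []
    threads_per_core = 1
    while threads_per_core <= max_threads_per_core:
        workers = 1
        while workers <= cores:
            threads = (cores // workers) * threads_per_core
            rows.append((workers, threads, threads_per_core,
                         cpu_argument(cores, workers, threads_per_core)))
            workers *= 2
        threads_per_core *= 2
    return rows


if __name__ == "__main__":
    for number, row in enumerate(combinations(64, 4), start=1):
        workers, threads, per_core, arg = row
        print(number, workers, f"{threads}, {per_core} per core", arg)
```

Passing a nonzero worker index to cpu_argument gives the arguments for the remaining workers of a combination, which the table omits for brevity.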

Last fiddled with by tdulcet on 2021-01-19 at 13:52