2016-10-12, 00:19 #1 frmky

msieve on KNL

I've been playing with msieve linear algebra on Knights Landing CPUs. Specifically, each compute node has one Intel(R) Xeon Phi(TM) CPU 7250 @ 1.40GHz. This processor has 68 cores in 34 tiles (two cores per tile), and each core runs 4 threads, for a total of 272 threads per node.

I compiled msieve with MPI support using icc with the -xMIC-AVX512 option, and it worked fine. I also tried disabling the ASM instructions and using only the C code, to see whether the compiler would vectorize with AVX-512, but the resulting binary was slightly slower.

After trying different parameters, I get by far the best performance with one MPI process per tile and 8 threads per process (2 cores × 4 threads). So on one compute node, the best layout is a 2x17 MPI grid with 8 threads per process. Here is a table of estimated runtimes on a 42.1M matrix:

Code:
cores  nodes  time (hrs)
   68      1         444
  136      2         233
  272      4         131
  544      8          83
 1088     16          46
 2176     32          33

The last entry uses a 32x34 MPI grid, which is the largest I can use without recompiling and rebuilding the matrix. Would explicit use of AVX-512 speed up the matmul?
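To put the scaling in numbers, here is a small C sketch that computes speedup and parallel efficiency relative to the one-node run, assuming ideal scaling is proportional to core count. The names `knl_timing`, `knl_runs`, `speedup` and `efficiency` are illustrative only and are not part of msieve.

```c
#include <stddef.h>

/* One row of the timing table: total cores and estimated LA time in hours. */
typedef struct {
    int cores;
    double hours;
} knl_timing;

/* Estimated runtimes for the 42.1M matrix, copied from the table above. */
const knl_timing knl_runs[] = {
    {68, 444.0},  {136, 233.0}, {272, 131.0},
    {544, 83.0},  {1088, 46.0}, {2176, 33.0},
};

/* Speedup of row i over the single-node baseline in row 0. */
double speedup(const knl_timing *t, size_t i) {
    return t[0].hours / t[i].hours;
}

/* Parallel efficiency: measured speedup divided by the ideal speedup,
   taking the ideal as the ratio of core counts (linear scaling). */
double efficiency(const knl_timing *t, size_t i) {
    double ideal = (double)t[i].cores / (double)t[0].cores;
    return speedup(t, i) / ideal;
}
```

With these estimates, two nodes run at about 95% efficiency, while the 32-node run gives roughly a 13.5× speedup, or about 42% efficiency.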
2016-10-12, 02:33 #2 jasonp

Probably; the scatter-gather instructions could be useful. Using 512-bit vectors explicitly in block Lanczos may or may not be faster, since the vector-vector operations would need far more memory for precomputations.
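One way to see why the precomputation memory grows so quickly: suppose the vector-vector products use a table-lookup scheme in which each 8-bit chunk of a B-bit word indexes a 256-entry table of B-bit rows. That layout is an assumption for illustration, not a description of msieve's actual code, and `vv_table_bytes` is a hypothetical helper. Under that assumption the table size is 4·B² bytes, so going from 64-bit to 512-bit blocks multiplies it by 64.

```c
#include <stdint.h>

/* Bytes of lookup tables needed to multiply B-bit words by a B x B bit
   matrix when each word is consumed chunk_bits bits at a time.
   Assumed layout (illustrative only): one table per chunk position,
   2^chunk_bits entries per table, each entry one B-bit row. */
uint64_t vv_table_bytes(uint32_t block_bits, uint32_t chunk_bits) {
    uint64_t tables = block_bits / chunk_bits;     /* chunk positions per word */
    uint64_t entries = (uint64_t)1 << chunk_bits;  /* 256 for 8-bit chunks */
    uint64_t entry_bytes = block_bits / 8;         /* one B-bit row */
    return tables * entries * entry_bytes;
}
```

For B = 64 this comes to 16 KiB; for B = 512 it is 1 MiB, on the order of the entire 1 MB L2 cache that each KNL tile shares between its two cores.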
2016-11-05, 23:03 #3 frmky

It turns out KNL doesn't like a nearly symmetric grid. In the table above, I had run 544 cores as a 16x17 grid, but an 8x34 grid runs nearly 10% faster. For the same reason I have removed the 2176-core run, which used a 32x34 grid.

Code:
cores  nodes  time (hrs)
   68      1         444
  136      2         233
  272      4         131
  544      8          76
 1088     16          46
 2176     32          ??

Currently msieve has a maximum MPI grid dimension of 35. Is increasing this simply a matter of changing the value in include/common.h, or are there possible overflows or other gotchas to watch out for?

BTW, the last half of the 2,1285- linear algebra was run on the KNL nodes, so it works correctly.
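Since the grid shape turned out to matter, it helps to list which shapes are even possible under a given dimension limit. The following sketch enumerates every rows × cols factorization of the MPI process count with both dimensions at or below a limit; the function `grid_shapes` is hypothetical and not part of msieve. With one process per tile there are 34 processes per node, so 32 nodes give 1088 processes, and under the current limit of 35 the only shapes are 32x34 and 34x32, both close to square.

```c
#include <stddef.h>

/* Enumerate every rows x cols factorization of nprocs with both
   dimensions <= max_dim, in increasing order of rows.
   Stores at most cap shapes in rows_out/cols_out and returns the
   total number of valid shapes found. */
size_t grid_shapes(int nprocs, int max_dim,
                   int rows_out[], int cols_out[], size_t cap) {
    size_t n = 0;
    for (int r = 1; r <= max_dim; r++) {
        if (nprocs % r != 0)
            continue;                 /* r must divide the process count */
        int c = nprocs / r;
        if (c > max_dim)
            continue;                 /* column count exceeds the limit */
        if (n < cap) {
            rows_out[n] = r;
            cols_out[n] = c;
        }
        n++;
    }
    return n;
}
```

For the 8-node case (272 processes) this yields 8x34, 16x17, 17x16 and 34x8 under a limit of 35. For 1088 processes, raising the limit to 68 would add 16x68, 17x64, 64x17 and 68x16 to the two near-square options.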
2016-11-06, 11:45 #4 jasonp

I saw, that was awesome. The maximum grid size is just a definition in the code, but it also controls the size of a binary file, so once you change the definition, the new build will be binary-incompatible with previous savefiles. (Just change MAX_MPI_GRID_DIM in common.h.)
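To illustrate the compatibility issue, here is a hypothetical savefile header (not msieve's real layout) whose arrays are dimensioned by the grid limit. The macro `DEFINE_SAVE_HEADER` and both struct names are made up for this sketch, which assumes the header is written to disk as a raw struct. Because the array length is a compile-time constant, the on-disk record size changes whenever the limit does.

```c
#include <stdint.h>

/* Hypothetical savefile header, NOT msieve's actual format. Per-row and
   per-column arrays are sized by the grid limit, so the struct size (and
   therefore the on-disk size, if written as a raw struct) depends on it. */
#define DEFINE_SAVE_HEADER(name, max_dim)   \
    typedef struct {                        \
        uint32_t grid_rows;                 \
        uint32_t grid_cols;                 \
        uint32_t row_start[max_dim];        \
        uint32_t col_start[max_dim];        \
    } name;

DEFINE_SAVE_HEADER(save_header_35, 35)  /* built with a limit of 35 */
DEFINE_SAVE_HEADER(save_header_64, 64)  /* rebuilt with a limit of 64 */
```

A build with a limit of 64 would expect a 520-byte header, whereas an old savefile written with a limit of 35 has only 288 bytes there, so every field after it would be read from the wrong offset.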

