Thread: Intel Xeon PHI?
2020-11-29, 23:22   #133
ewmayer

Quote:
 Originally Posted by ewmayer
 So if the 53.0C for the KNL is to be believed - and the fact that a similar run using 'only' 32 cores gives a cooler 44.0C indicates so - that water cooling is working very well indeed.
Spoke too soon - I neglected to mention that the 53.0C reading was taken with the case side panel on the CPU side of the mobo removed. I put the panel back on last night and the temp quickly rose by over 10C, into the 65-70C range. When I rechecked just now it read 70C, but with an added ALARM (CRIT) at the end of the sensors output line. I'm not sure precisely what temp triggers that, because I first saw 70C last night without the alarm message. It probably rose a few degrees higher at some point in the last 15 hours and tripped the alarm. It seems to be a "once tripped, the alarm message persists" deal: I took the side panel back off and the temp quickly dropped back to ~60C, but the message still shows. I looked at the manpage to see if the 'sensors' command has a 'clear alarm' option, but didn't find one.
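
For anyone wanting to see which threshold sits behind an ALARM (CRIT) flag: on Linux the 'sensors' readings come from the kernel hwmon sysfs files, which report temperatures in millidegrees C. Here is a minimal Python sketch that lists each temperature channel together with its critical threshold and alarm flag, where the driver provides them. It assumes the standard /sys/class/hwmon layout (temp*_input, temp*_crit, temp*_crit_alarm); the function names are my own, and not every driver exposes all three files, so missing ones come back as None. Whether the alarm flag clears itself once the temperature drops depends on the driver and is not something this sketch answers.

```python
from pathlib import Path


def _read_int(path):
    """Return the integer stored in a sysfs file, or None if absent or unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def _milli_to_c(value):
    """hwmon reports temperatures in millidegrees Celsius; convert to degrees."""
    return None if value is None else value / 1000.0


def read_hwmon_temps(root="/sys/class/hwmon"):
    """List every temperature channel under root with its crit threshold and alarm flag.

    The directory layout is an assumption (standard Linux hwmon sysfs);
    files a driver does not provide are reported as None.
    """
    rows = []
    for input_file in sorted(Path(root).glob("hwmon*/temp*_input")):
        chip_dir = input_file.parent
        channel = input_file.name[: -len("_input")]  # e.g. "temp1"
        alarm = _read_int(chip_dir / f"{channel}_crit_alarm")
        rows.append({
            "chip": chip_dir.name,
            "channel": channel,
            "temp_c": _milli_to_c(_read_int(input_file)),
            "crit_c": _milli_to_c(_read_int(chip_dir / f"{channel}_crit")),
            "crit_alarm": None if alarm is None else bool(alarm),
        })
    return rows


if __name__ == "__main__":
    for row in read_hwmon_temps():
        print(row)
```

Comparing temp_c against crit_c for the channel that carries the alarm should show how close a 70C reading is to the trip point on a given board.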

Will look into replacing the side panel in question with a fine-perforated metal-mesh one, similar to the one on top of the case covering the 2 water-cooler vent fans.

Here are some Mlucas avx-512 build timings at 64M-FFT on the KNL - more below on why that large FFT length is of special interest ATM - all at the same FFT length, 1-thread-per-core (I found no benefit from any combination of hyperthreading I tried), #threads from 1-64. Parallel scaling is good through 16 threads but then falls off a cliff beyond that:
Code:
64M FFT, 1-thread-per-core, #threads from 1-64:              #thread:	|| scaling (vs 1-thr):
65536  msec/iter = 1765.36  radices =  16 16 16 16 16 32	 1	1.00
65536  msec/iter =  943.43  radices =  16 16 16 16 16 32	 2	.936
65536  msec/iter =  496.24  radices =  16 16 16 16 16 32	 4	.889
65536  msec/iter =  259.18  radices =  16 16 16 16 16 32	 8	.851
65536  msec/iter =  125.93  radices =  16 16 16 16 16 32	16	.876
65536  msec/iter =   85.70  radices = 256 16 16 16 32  	32	.644
65536  msec/iter =   69.06  radices = 256 16 16 16 32  	64	.399
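
The scaling column in the table above is consistent with the usual parallel-efficiency definition, i.e. the 1-thread time divided by (#threads x the n-thread time). Here is a short Python sketch of that arithmetic using the msec/iter figures from the table. The function and variable names are mine, and that this is the formula behind the column is an inference from the numbers, not something stated explicitly.

```python
# msec/iter at 64M FFT, keyed by thread count (values copied from the table).
TIMINGS_MS = {
    1: 1765.36,
    2: 943.43,
    4: 496.24,
    8: 259.18,
    16: 125.93,
    32: 85.70,
    64: 69.06,
}


def parallel_scaling(timings):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread timing.

    speedup    = t(1) / t(n)
    efficiency = speedup / n   (1.0 would be perfect linear scaling)
    """
    t1 = timings[1]
    return {n: (t1 / t, t1 / (n * t)) for n, t in sorted(timings.items())}


if __name__ == "__main__":
    print("threads  msec/iter  speedup  efficiency")
    for n, (speedup, eff) in parallel_scaling(TIMINGS_MS).items():
        print(f"{n:7d}  {TIMINGS_MS[n]:9.2f}  {speedup:7.2f}  {eff:10.3f}")
```

Read as speedup rather than efficiency, the 64-thread run is only about 25-26x faster than the 1-thread run, versus about 14x at 16 threads.
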
The actual runtimes for a production run, once things settle down after a few minutes, are 5-10% faster - I'm getting ~64ms/iter at 64 threads for the 64M-FFT run described below. Here are results of several supplemental timing tests - these are just representative examples, I did many more experiments - illustrating the ineffectiveness of hyperthreading and the total-throughput boost from running multiple jobs, each using 16 or 32 threads on nonoverlapping sets of cores: