2017-12-28, 07:40   #60
ewmayer

Update: GP2 did an Mlucas 17.0 avx-512 build on a c5.18xlarge, getting the timing down to 2.9 ms/iter; we continue to play with core/thread-count params to try to reduce that further. Note this was not on the default instance of this type: in his words, it was a "pre-LTS (beta) version of Amazon Linux 2, which has gcc 7.2.1", versus the gcc 6.4 of the default c5 instances, on which his timing for an avx-512 build at the same FFT length was 4 ms/iter. The latter is weird, because that matches the timing ATH got using an avx2 build on (IIRC) the same instance type.

But timing weirdness seems to be the name of the game for lots of threads running on fairly exotic hardware. In my own avx2-build timing tests on David Stanfill's Xeon, the best I could get was 4.8 ms/iter on the unloaded system, but just for giggles I redid all the timings with my big nice'd F30 run crunching away in the background, and voila! The LL-test timings dropped by 20%. Both jobs (the new-prime verify at full priority, the F30 test nice'd) have now been running side by side for around 10 hours, confirming that the speedup-under-background-load is not a chimera; even with the LL-DC consuming most of the FLOPS, the F30 run is still proceeding at ~1/3 of its normal speed. IIRC Serge Batalov ran into a similar phenomenon with his Mlucas verify of a then-new M-prime a few years back: he got permission to do it on a high-end server of the company he worked for at the time, meaning he had to put up with whatever 'real work' his colleagues were running on the system competing for cycles. And as with my run, his run was fastest when other jobs were also running. It's like the NBA player who can only shoot straight when he has a defender in his face. :)
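For anyone wanting to reproduce the two-job setup described above, a minimal sketch: the background run is launched via `nice` at the lowest scheduling priority while the verify job runs at the default priority. The command names here are placeholders, not the actual Mlucas invocations used for the LL-DC and F30 runs.

```shell
# Placeholder commands standing in for the actual Mlucas runs:
# the F30 test goes into the background at nice level 19 (lowest
# priority), while the LL-DC verify runs in the foreground at the
# default nice level 0 and so gets first claim on the cores.
nice -n 19 sh -c 'echo "background (nice 19) finished"' &
sh -c 'echo "foreground (default priority) finished"'
wait   # reap the background job before exiting
```

With the real workloads, both processes are CPU-bound, so the kernel scheduler gives the foreground job the lion's share of cycles while the nice'd job soaks up whatever is left over.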

Last fiddled with by ewmayer on 2017-12-28 at 07:55