Old 2017-12-28, 06:35   #58
ewmayer
2ω=0
 
Sep 2002
República de California

10110101101100₂ Posts

Quote:
Originally Posted by Madpoo View Post
Wouldn't hurt to have someone run it with mlucas just as an extra validation. I'm not running any Linux distros at the moment and too lazy to set one up, otherwise I'd totally give that a shot.
ATH fired up an Mlucas run on an Amazon C5.18xlarge instance this morning, but hit avx-512 build issues with the 17.1 code and was forced to use an avx2 build as a result (he got ~4 ms/iter before moving to a much cheaper C4 instance, since that also supports avx2). I sent him a patched v17.1 tarball just now, hoping to see appreciably better timings from use of avx-512, though even if so, I'm not sure he'd do the verify using that build due to the expense of running on C5.

I did some many-threaded timing tests using avx2 on David Stanfill's 32-core Xeon and the GIMPS KNL (the 2 machines I am currently using for side-by-side primality tests of F30 at FFT lengths 60M and 64M, respectively) ... getting just a smidge under 4 ms/iter there. Alas, the KNL offers no faster alternative because:

[1] It is severely underclocked relative to the Xeon to keep the massive die from melting;

[2] Even though the KNL has 64 physical cores and 256 logical ones vs the Xeon's 32/64, I cannot take advantage of the higher core count to make up for the lower clock speed because the parallel speedups peter out between 16 and 32 threads at the relatively small FFT length needed for the new-prime candidate. (OTOH, the large FFT lengths needed by my F30 runs take much better advantage of the high core/thread counts.)

Worst case we'll get an Mlucas verify within 72 hours - if Andreas gets some nice speedups from an avx-512 build on C5, that estimate will drop, even if I have to Paypal-bribe him the C5 run costs to get him to use that for the verify. :)
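
For a rough sense of how a per-iteration timing maps to wall-clock time, here is a minimal sketch. It assumes a Lucas-Lehmer test of M(p), which needs p-2 modular squarings, and a constant ms/iter. The function name, the `iters_done` parameter and the exponent used in the demo are illustrative placeholders, not values from this thread.

```python
def ll_hours_remaining(exponent, ms_per_iter, iters_done=0):
    """Estimate hours left in a Lucas-Lehmer test of M(exponent).

    Assumes p - 2 squarings in total and a steady per-iteration time.
    iters_done accounts for a run that is already underway.
    """
    total_iters = exponent - 2
    remaining = max(total_iters - iters_done, 0)
    seconds = remaining * ms_per_iter / 1000.0
    return seconds / 3600.0


if __name__ == "__main__":
    p = 75_000_000  # placeholder exponent, NOT the actual candidate
    for ms in (4.0, 3.0, 2.0):
        print(f"{ms:.1f} ms/iter -> {ll_hours_remaining(p, ms):.1f} hours")
```

Since the estimate is linear in ms/iter, halving the timing via a faster build halves the remaining time, and a run already in progress only has its outstanding iterations left to pay for.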