2021-04-21, 02:36 | #1 |
"Ben"
Feb 2007
3,617 Posts |
AVX-PM1/PP1
There are a couple of new features now available in yafu 2.0: parallel P-1 and P+1 using AVX512 (similar to AVX-ECM). These are still a work in progress, but I've tested the basic functionality on a couple of systems now.
The "parallel" part comes from testing up to 8 inputs simultaneously to the same B1/B2 bounds. To do this, put the inputs one-per-line in a file named either pm1_work.ini (for P-1) or pp1_work.ini (for P+1). Then call the function "vpm1" or "vpp1".

Examples:

    ./yafu "vpm1" -B1pm1 10000000 -v
    ./yafu "vpp1" -B1pp1 10000000 -v

Because all of the inputs get loaded into the same vector, they should ideally be of similar size. The code will use a vector length that accommodates the largest input.

Stage 1 is run in parallel using the vector AVX512 bignum library in yafu, then stage 2 is run using yafu's internal gmp-ecm library code (one input at a time).

I've measured speedups in stage 1 of up to about 3.5x for P-1 and up to about 4.5x for P+1 on a system with AVX-512F. On my laptop with AVX-512IFMA, the stage 1 speedups are up to about 4.5x for P-1 and 5.5x for P+1.

Still a work in progress, but if anyone tests it out I'd welcome any feedback.

Planned todo's:
* make input filenames configurable
* write output to factor.log
* read/write stage 1 savefiles
* accommodate special forms (Mersenne/pseudo-Mersenne/Mersenne+2)
* accommodate external gmp-ecm for stage 2
* ?

[edit] I should maybe also mention that these functions apply to smallish inputs only, up to a few thousand bits or so. Beyond that the AVX-512 library runs out of gas compared to gmp-ecm (or prime95, of course, for big Mersennes). On the long-term todo list is to add Karatsuba and other subquadratic methods to the library, but that's not happening soon.

Last fiddled with by bsquared on 2021-04-21 at 02:50. Reason: todo's
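For anyone with more than 8 inputs, preparing the work file can be scripted. The following is only a minimal sketch, not part of yafu: `batch_by_size` and `write_work_file` are hypothetical helper names, and it assumes the work file takes plain decimal integers, one per line. It sorts the inputs by bit length so that each group of up to 8 holds similarly sized numbers, which is the situation described above where the vector length is set by the largest input.

```python
from pathlib import Path

LANES = 8  # the post states that up to 8 inputs are tested simultaneously


def batch_by_size(inputs, lanes=LANES):
    """Sort inputs by bit length, then split them into groups of at most `lanes`.

    Grouping neighbours in size keeps each batch close to the size of its
    largest member, which is what determines the vector length used.
    """
    ordered = sorted(inputs, key=lambda n: (n.bit_length(), n))
    return [ordered[i:i + lanes] for i in range(0, len(ordered), lanes)]


def write_work_file(batch, path="pm1_work.ini"):
    """Write one decimal input per line (assumed format) and return the path."""
    Path(path).write_text("".join(f"{n}\n" for n in batch))
    return path
```

A possible workflow would be to call `write_work_file(batch)` for one batch, run the `vpm1` command shown above, and repeat for the next batch. Passing `path="pp1_work.ini"` would prepare a P+1 run instead.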