r372 has new AVX512 code for SIQS, plus a couple of tweaks to run more efficiently on large numbers of threads. RSA-100 factors in about 6 minutes (368 seconds, log below) on a 256-thread KNL 7210. Even without the AVX512 code the KNL is pretty fast (about 9 minutes).

I've also tested on a Skylake 7800X, but the performance gains are smaller there: maybe 10%, and only on larger inputs (80 digits or more).

This is Linux-only at this point. I have not yet been able to find a Windows compiler that supports AVX512, so I haven't been able to test there. I know the Intel compiler does, but I don't have a Windows version of it.

Code:

starting SIQS on c100: 1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139
==== sieve params ====
n = 100 digits, 330 bits
factor base: 115520 primes (max prime = 3204611)
single large prime cutoff: 464668595 (145 * pmax)
double large prime range from 44 to 52 bits
double large prime range from 10269531661321 to 3988944358351073
allocating 9 large prime slices of factor base
buckets hold 2048 elements
large prime hashtables have 1179648 bytes
using KNL-AVX2 enabled 32k sieve core
sieve interval: 8 blocks of size 32768
polynomial A has ~ 13 factors
using multiplier of 1
using SPV correction of 19 bits, starting at offset 30
trial factoring cutoff at 100 bits
==== sieving in progress (256 threads): 115584 relations needed ====
==== Press ctrl-c to abort and save state ====
119777 rels found: 29067 full + 90710 from 1703942 partial, (4709.08 rels/sec)
sieving required 6419308 total polynomials (4705 'A' polynomials)
trial division touched 258498542 sieve locations out of 3365566152704
squfof: 0 failures, 1386863 attempts, 121153992 outside range, 6726584 prp, 1281923 useful
total reports = 258498542, total surviving reports = 129718525
total blocks sieved = 102712976,avg surviving reports per block = 1.26
large prime scan failures = 0
QS elapsed time = 368.4496 seconds.
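
Several figures in the log can be cross-checked against each other with plain integer arithmetic. The following is a minimal sketch in Python, not part of the program itself: every value is copied from the log above, and the variable names are made up for illustration. The factor of 2 in the sieve-location check is an assumption (that the 8-block interval is sieved on both the positive and negative side of each polynomial); it is simply what makes the reported totals agree.

```python
# Values copied verbatim from the SIQS log; names are illustrative only.
n = 1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139
pmax = 3204611                     # largest factor base prime
lp_cutoff = 464668595              # single large prime cutoff
dlp_min = 10269531661321           # double large prime range, lower end
dlp_max = 3988944358351073         # double large prime range, upper end
polys = 6419308                    # total polynomials sieved
blocks_per_side = 8                # "sieve interval: 8 blocks"
block_size = 32768                 # "of size 32768"
locations = 3365566152704          # total sieve locations reported
surviving = 129718525              # total surviving reports
blocks_sieved = 102712976          # total blocks sieved

checks = {
    "n has 100 digits": len(str(n)) == 100,
    "n has 330 bits": n.bit_length() == 330,
    "cutoff is 145 * pmax": 145 * pmax == lp_cutoff,
    "DLP lower bound is pmax^2": pmax ** 2 == dlp_min,
    "DLP range is 44 to 52 bits": (dlp_min.bit_length(), dlp_max.bit_length()) == (44, 52),
    # Assumption: both sides of the interval are sieved, hence the factor of 2.
    "locations = polys * 2 * blocks * size": polys * 2 * blocks_per_side * block_size == locations,
    "avg surviving reports per block is 1.26": f"{surviving / blocks_sieved:.2f}" == "1.26",
}

for name, ok in checks.items():
    print(f"{'ok ' if ok else 'BAD'} {name}")
```

Note that the "total blocks sieved" line (102712976) is slightly higher than polys * 2 * 8 = 102708928; the log does not say why, so the sketch leaves that difference unexplained.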