Okay, I think I can replace those routines with calls to gwtobinary32 and binary32togw and use some array arithmetic to achieve the mod calculation...
Completed both to n=400,000 and continuing.

I have got the time down to 5:40 minutes (for a buggy program). This means the array arithmetic is taking 2:10 minutes, and I only have 30-40 seconds to play with. I really doubt I can beat George's generic modular reduction.
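For anyone wondering what a special-form reduction could look like, here is a minimal Python sketch. It assumes the modulus is a near-square number N = (2^n+1)^2-2 = 2^(2n) + 2^(n+1) - 1, and it uses Python big integers rather than gwnum arrays, so it only illustrates the idea and is not the code being timed above. The function name reduce_near_square is made up for this example.

```python
def reduce_near_square(x, n):
    """Reduce x >= 0 modulo N = (2^n + 1)^2 - 2 using only shifts, masks and adds.

    Relies on 2^(2n) == 1 - 2^(n+1) (mod N). Illustrative only: plain Python
    integers, not gwnum data.
    """
    k = 2 * n
    N = (1 << k) + (1 << (n + 1)) - 1
    mask = (1 << k) - 1
    sign = 1
    while x >> k:                          # anything at or above bit 2n needs folding
        hi, lo = x >> k, x & mask
        x = lo + hi - (hi << (n + 1))      # replace hi * 2^(2n) by hi * (1 - 2^(n+1))
        if x < 0:                          # keep x non-negative, remember the sign
            x, sign = -x, -sign
    x *= sign                              # now |x| < 2^(2n) < N
    return x + N if x < 0 else x


if __name__ == "__main__":
    n = 5
    N = ((1 << n) + 1) ** 2 - 2            # 1087
    x = 123456789
    print(reduce_near_square(x, n) == x % N)
```

Each pass through the loop strictly shrinks the value, so it terminates; whether anything like this can beat a tuned generic FFT-based reduction is a separate question.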

Add (2^442042+1)^2-2 as a new prime!

Completed to 475,000. Nothing new, but continuing.

Completed to 505,000. Nothing new, but continuing.

I have a command line siever that works, but it is slower than MultiSieve for base 2. A couple of reasons it is slower are that I use generic logic for base 2 and that my hash map code is poorly written. Fortunately, I think the non-base-2 code is much faster than MultiSieve. The negative with MultiSieve is that it misses some factors, but that isn't a big enough issue to use the new code (yet).

Here is a Windows build of my sieving code. I (ahem) borrowed parts of the discrete log code from srsieve, so I won't release the source until I clean it up and remove references to that. This code runs about 10% faster than MultiSieve (for p < 1e9), unless you are logging factors. Logging slows it down a bit based upon my testing, but as all factors are double-checked, that shouldn't be a big deal. I've stated before that MultiSieve misses 10% to 15% of the factors due to some bug, and since cksieve is also faster, there is no reason to continue using MultiSieve for sieving near-square primes. One core should be able to sieve about 150G for a range of 500,000 n per day. Unlike MultiSieve, this code is base agnostic, so speed shouldn't suffer when sieving a larger base. There is multithreading code in the source, but I know it doesn't work, and I don't know whether I'm going to try to fix it.
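To show roughly how a discrete log sieve finds factors of near-square numbers, here is a small Python sketch. It is not the cksieve or srsieve code; the function names are invented for this example, and it assumes base 2 and the form (2^n+1)^2-2. A prime p divides (2^n+1)^2-2 exactly when 2^n = r - 1 (mod p) for some r with r^2 = 2 (mod p), so each p needs a square root of 2 and then a baby-step giant-step search backed by a hash map. The square root is brute-forced here, which is only sensible for tiny p.

```python
from math import isqrt


def sqrt2_mod(p):
    """All r in [0, p) with r*r == 2 (mod p). Brute force, demo only."""
    return [r for r in range(p) if (r * r - 2) % p == 0]


def discrete_logs(base, target, p, n_max):
    """All n in [0, n_max) with base^n == target (mod p), by baby-step giant-step."""
    m = isqrt(n_max) + 1                   # m*m > n_max, so n = i*m + j covers the range
    baby = {}                              # hash map: base^j mod p -> list of j
    value = 1
    for j in range(m):
        baby.setdefault(value, []).append(j)
        value = value * base % p
    step = pow(base, -m, p)                # modular inverse power (Python 3.8+)
    hits = []
    gamma = target % p
    for i in range(m):
        for j in baby.get(gamma, ()):
            n = i * m + j
            if n < n_max:
                hits.append(n)
        gamma = gamma * step % p           # gamma = target * base^(-m*(i+1))
    return sorted(hits)


def near_square_hits(p, n_max):
    """All n in [0, n_max) for which the odd prime p divides (2^n + 1)^2 - 2."""
    hits = set()
    for r in sqrt2_mod(p):                 # r runs over both square roots of 2
        hits.update(discrete_logs(2, r - 1, p, n_max))
    return sorted(hits)


if __name__ == "__main__":
    print(near_square_hits(41, 50))
```

For p = 41 the result includes n = 4 and n = 6, matching (2^4+1)^2-2 = 287 = 7*41 and (2^6+1)^2-2 = 4223 = 41*103. A real siever would compute the square root properly, skip primes for which 2 is not a quadratic residue, and reuse work across many p.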
