Quote:
Originally Posted by SPWorley
Prime95, however, is very interesting but also fairly opaque. Despite the many comments in factor32.asm and factor64.asm, it's hard to decipher its approach! It seems as if it computes an (approximate!) floating point reciprocal to q, then uses that to compute how many multiples of q to subtract off. So this may be doing classic division and subtraction to compute its mods, boosted by using a floating point initial approximation step.

There are actually many different algorithms in factor32.asm  each written for a different target CPU. Also, it has been a long time since I last looked at the code, so I've forgotten quite a lot.
To compute x mod y, you take the approximate reciprocal accurate to 53bits and compute (top bits of x) * (approx. recip) to get 32 bits of the quotient (almost always accurate, Knuth guarantees it is accurate to within 2). Multiply q by y, subtract from x, repeat til done.
Also, note that you can compute the reciprocal quickly without division. Just take the last reciprocal you computed (it will be real close) and use Newton's method to compute the new reciprocal.