a)
I have also done some investigations into optimal multipliers.
One way was simply to factor many, many numbers and record the multipliers that led to a factorization. It turned out that some multipliers have a much higher chance of factoring a number than others. After this learning phase, when looking for factors we iterate over the good multipliers (those with the largest number of factorizations) first. For moderate inputs (i.e. 'n' below 40 bits), using the list of good multipliers gives a reasonable speedup (~10%). The list of good multipliers shows some kind of structure for lower multipliers (say, below 8000), but above a certain limit this structure seems to be gone.
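The learning step above can be sketched roughly as follows. This is only an illustration under assumptions: the Fermat-style test with multiplier, the step bound, and all parameter values below are my hypothetical choices, not the code actually used.

```python
import math
import random
from collections import Counter

def fermat_with_multiplier(n, k, max_steps=64):
    """Try to factor n by running a few Fermat steps on k*n.
    Returns a nontrivial factor of n, or None."""
    kn = k * n
    a = math.isqrt(kn)
    if a * a < kn:
        a += 1                      # a = ceil(sqrt(k*n))
    for _ in range(max_steps):
        b2 = a * a - kn
        b = math.isqrt(b2)
        if b * b == b2:             # a^2 - k*n is a perfect square
            f = math.gcd(a - b, n)
            if 1 < f < n:
                return f
        a += 1
    return None

def learn_multipliers(samples=2000, k_max=255, bits=30, seed=1):
    """Factor many random odd n and count, for each multiplier k,
    how often it was the first to produce a factorization.
    Returns the multipliers sorted by success count, best first."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(samples):
        n = rng.randrange(2 ** (bits - 1), 2 ** bits) | 1
        for k in range(1, k_max + 1):
            if fermat_with_multiplier(n, k) is not None:
                wins[k] += 1
                break
    return [k for k, _ in wins.most_common()]
```

The learned list would then simply replace the natural order 1, 2, 3, ... when iterating over multipliers.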
b)
There are only c * n^(1/3) multipliers, so for numbers n < 2^54 we can easily create them before factoring and store them in an array of size 1,000,000. This is what I have done in a). For numbers above 2^52 the Pollard-Rho-Brent algorithm is faster anyway.
c) Another way to avoid the cost of multiplying the k_i is to use a Gray-code-style iteration over the multipliers: in each step only one factor is exchanged.
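A minimal sketch of such a Gray-code enumeration, assuming the multipliers are products of subsets of a fixed set of small primes (my assumption for illustration): each step performs exactly one multiplication or one exact division instead of recomputing the whole product.

```python
def gray_code_products(primes):
    """Enumerate the products of all subsets of `primes` in Gray-code
    order, so consecutive products differ by exactly one factor."""
    n = len(primes)
    product = 1                  # empty subset
    in_set = [False] * n
    yield product
    for i in range(1, 1 << n):
        # In the standard Gray code G(i) = i ^ (i >> 1), the bit that
        # flips between step i-1 and i is the lowest set bit of i.
        j = (i & -i).bit_length() - 1
        if in_set[j]:
            product //= primes[j]   # remove the factor (exact division)
        else:
            product *= primes[j]    # add the factor
        in_set[j] = not in_set[j]
        yield product
```

For `primes = [3, 5, 7]` this visits all eight subset products, each reached from its predecessor by a single multiply or divide.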
d)
For n above 40 bits, choosing a fixed multiplier still works best. Without applying some mod 2^k arguments to improve the chosen 'a' (which is D^2 in your notation), multipliers like 720 and 5040 work best.
Til and I apply some modifications after choosing the 'a': we increase it until there is a solution to a^2 - 4kn = b^2 mod 2^k. These generalized mod 2^k arguments improve the behavior of odd multipliers. Lehman also gives a special case of our generalization.
In our (Til's and my) unpublished versions, 315 (odd!) was the best multiplier. The approach in a) was only faster below 40 bits.
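The adjustment of 'a' can be sketched as below, under the assumption that the mod 2^k test means checking whether a^2 - 4kn is a quadratic residue modulo a fixed power of two; the modulus exponent m is a hypothetical choice, and the precise condition used in the unpublished versions may differ.

```python
import math

def adjust_a(n, k, m=6):
    """Return the smallest a >= ceil(sqrt(4*k*n)) such that
    a^2 - 4*k*n is a square modulo 2^m. Only such a can possibly
    satisfy a^2 - 4*k*n = b^2 over the integers, so other values
    of a can be skipped without taking a square root."""
    M = 1 << m
    squares_mod = {(x * x) % M for x in range(M)}
    fourkn = 4 * k * n
    a = math.isqrt(fourkn)
    if a * a < fourkn:
        a += 1                  # a = ceil(sqrt(4*k*n))
    # 4*k*n is 0 mod 4, and every residue c with c odd or c = 0 mod 4
    # is a difference of two squares mod 2^m, so this loop terminates
    # within at most 2^m increments.
    while (a * a - fourkn) % M not in squares_mod:
        a += 1
    return a
```

For example, with n = 91 and k = 1 the start value a = 20 already passes the test, and indeed 20^2 - 364 = 36 = 6^2 gives the factor gcd(20 - 6, 91) = 7.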
e) Given the observations above, there seems to be no gain in choosing special (smooth, ???) multipliers for numbers above 40 bits. But maybe my learning phase was too short, or it does not discover the real structure of the multipliers. A working algorithm demonstrating that a special order of multipliers pays off would be nice.
