mersenneforum.org SQUFOF implementation
 Register FAQ Search Today's Posts Mark Forums Read

 2010-04-10, 23:14 #1 alpertron     Aug 2002 Buenos Aires, Argentina 22×3×113 Posts SQUFOF implementation Since there are not many places on the Internet where you can find a SQUFOF implementation, I've written mine in Java based on the algorithm explained in the Gower and Wagstaff paper. This will be used in my factorization applet. Code:  /* Implementation of algorithm explained in Gower and Wagstaff paper */ static int SQUFOF(long N, int queue[]) { double sqrt; int B, Q, Q1, P, P1, L, S; int i, r, s, t, q, u; int queueHead, queueTail, queueIndex; /* Step 1: Initialize */ if ((N & 3) == 1) { N <<= 1; } sqrt = Math.sqrt(N); S = (int)sqrt; if ((long)(S+1)*(long)(S+1)<=N) { S++; } if ((long)S*(long)S > N) { S--; } if ((long)S*(long)S == N) { return S; } Q1 = 1; P = S; Q = (int)N - P*P; L = (int)(2*Math.sqrt(2*sqrt)); B = L << 1; queueHead = 0; queueTail = 0; /* Step 2: Cycle forward to find a proper square form */ for (i=0; i<=B; i++) { q = (S+P)/Q; P1 = q*Q-P; if (Q <= L) { if ((Q & 1) == 0) { queue[queueHead++] = Q >> 1; queue[queueHead++] = P % (Q >> 1); if (queueHead == 100) { queueHead = 0; } } else if (Q+Q<=L) { queue[queueHead++] = Q; queue[queueHead++] = P % Q; if (queueHead == 100) { queueHead = 0; } } } t = Q1+q*(P-P1); Q1 = Q; Q = t; P = P1; if ((i & 1) == 0 && ((Q & 7) < 2 || (Q & 7) == 4)) { r = (int)Math.sqrt(Q); if (r*r == Q) { queueIndex = queueTail; for (;;) { if (queueIndex == queueHead) { /* Step 3: Compute inverse square root of the square form */ Q1 = r; u = (S-P)%r; u += (u >> 31) & r; P = S-u; Q = (int)((N-(long)P*(long)P)/Q1); /* Step 4: Cycle in the reverse direction to find a factor of N */ for (;;) { q = (S+P)/Q; P1 = q*Q-P; if (P == P1) { break; } t = Q1 +q*(P-P1); Q1 = Q; Q = t; P = P1; } /* Step 5: Get the factor of N */ if ((Q & 1) == 0) { return Q >> 1; } return Q; } s = queue[queueIndex++]; t = queue[queueIndex++]; if (queueIndex == 100) { queueIndex = 0; } if ((P-t)%s == 0) { break; } } if (r > 1) { queueTail = queueIndex; } if (r == 1) { queueIndex = queueTail; for (;;) { if (queueIndex == queueHead) { break; } if (queue[queueIndex] == 1) { return 0; } queueIndex += 2; if (queueIndex == 100) { queueIndex = 0; } } } } } } return 0; } The queue (an integer array with 100 elements) is created by the caller so it can be reused. Notice that in Step 2 a square root must be computed every two cycles of the loop. This can probably be optimized. The paper says that step 4 can be optimized but I will need to investigate it. Maybe some forum participants who programmed SQUFOF before (Bob and others) can give me some hints in order to optimize it. Last fiddled with by alpertron on 2010-04-10 at 23:16
 2010-04-11, 01:45 #2 bsquared     "Ben" Feb 2007 343810 Posts Are you using multipliers? I didn't see that you were, but maybe I missed where they were implemented or maybe they are already in N when this routine is called. Racing several small multipliers (by keeping a small amount of state memory for each multipler and running smallish batches of iterations on each) is a good optimization. Msieve and yafu both have implementations which do this. The step 4 optimization I believe is called fast return. Search for papers by Steven McMath on this subject - he researched it extensively and also published some code, I think... yes, it's called "Parallel integer factorization using quadratic forms". I never got around to implementing fast return, so if you are successful I'd be very interested in seeing it.
 2010-04-11, 02:01 #3 alpertron     Aug 2002 Buenos Aires, Argentina 22×3×113 Posts I have not implemented multipliers in the code. First I wanted to be sure that the code worked. There were two errors in the algorithm (section 3.3 of the Gower and Wagstaff paper) that I corrected in the implementation. From what I read in http://hal.inria.fr/docs/00/45/16/04...draft_02_1.pdf it appears that the "fast return" is really faster when factoring numbers greater than 20 digits, but I think that in this case SIQS will be faster.
 2010-04-11, 11:57 #4 jasonp Tribal Bullet     Oct 2004 1101110100012 Posts What corrections did you have to make? It's been a long time, but the SQUFOF code in msieve was taken directly from Gower's description, and I didn't understand enough of the underlying math to make corrections. Also, it was quite helpful in C to handle a special case of q=1 before doing the divisions at the beginning of the even and odd cycles. It probably wouldn't help in Java though. Last fiddled with by jasonp on 2010-04-11 at 11:59
2010-04-11, 12:16   #5
R.D. Silverman

Nov 2003

22·5·373 Posts

Quote:
 Originally Posted by alpertron Since there are not many places on the Internet where you can find a SQUFOF implementation, I've written mine in Java based on the algorithm explained in the Gower and Wagstaff paper. This will be used in my factorization applet. Code:  /* Implementation of algorithm explained in Gower and Wagstaff paper */ static int SQUFOF(long N, int queue[]) { double sqrt; int B, Q, Q1, P, P1, L, S; int i, r, s, t, q, u; int queueHead, queueTail, queueIndex; /* Step 1: Initialize */ if ((N & 3) == 1) { N <<= 1; } sqrt = Math.sqrt(N); S = (int)sqrt; if ((long)(S+1)*(long)(S+1)<=N) { S++; } if ((long)S*(long)S > N) { S--; } if ((long)S*(long)S == N) { return S; } Q1 = 1; P = S; Q = (int)N - P*P; L = (int)(2*Math.sqrt(2*sqrt)); B = L << 1; queueHead = 0; queueTail = 0; /* Step 2: Cycle forward to find a proper square form */ for (i=0; i<=B; i++) { q = (S+P)/Q; P1 = q*Q-P; if (Q <= L) { if ((Q & 1) == 0) { queue[queueHead++] = Q >> 1; queue[queueHead++] = P % (Q >> 1); if (queueHead == 100) { queueHead = 0; } } else if (Q+Q<=L) { queue[queueHead++] = Q; queue[queueHead++] = P % Q; if (queueHead == 100) { queueHead = 0; } } } t = Q1+q*(P-P1); Q1 = Q; Q = t; P = P1; if ((i & 1) == 0 && ((Q & 7) < 2 || (Q & 7) == 4)) { r = (int)Math.sqrt(Q); if (r*r == Q) { queueIndex = queueTail; for (;;) { if (queueIndex == queueHead) { /* Step 3: Compute inverse square root of the square form */ Q1 = r; u = (S-P)%r; u += (u >> 31) & r; P = S-u; Q = (int)((N-(long)P*(long)P)/Q1); /* Step 4: Cycle in the reverse direction to find a factor of N */ for (;;) { q = (S+P)/Q; P1 = q*Q-P; if (P == P1) { break; } t = Q1 +q*(P-P1); Q1 = Q; Q = t; P = P1; } /* Step 5: Get the factor of N */ if ((Q & 1) == 0) { return Q >> 1; } return Q; } s = queue[queueIndex++]; t = queue[queueIndex++]; if (queueIndex == 100) { queueIndex = 0; } if ((P-t)%s == 0) { break; } } if (r > 1) { queueTail = queueIndex; } if (r == 1) { queueIndex = queueTail; for (;;) { if (queueIndex == queueHead) { break; } if (queue[queueIndex] == 1) { return 0; } queueIndex += 2; if (queueIndex == 100) { queueIndex = 0; } } } } } } return 0; } The queue (an integer array with 100 elements) is created by the caller so it can be reused. Notice that in Step 2 a square root must be computed every two cycles of the loop. This can probably be optimized. The paper says that step 4 can be optimized but I will need to investigate it. Maybe some forum participants who programmed SQUFOF before (Bob and others) can give me some hints in order to optimize it.

Code:
squfof(n,fac1,fac2)
int n[];
int *fac1, *fac2;

{   /* start of squfof: factor n as fac1*fac2  faster in FP?????*/

int temp[MPDIM], temp1[MPDIM];
register int iq,ll,l2,p,pnext,q,qlast,r,s,t,i;
int jter,iter;

if (SQUFOF) squfof_calls++;
#define   DEBUG_QSC   0
if (DEBUG_QSC) { (void) printf("trying qs on: "); write_num(n,numstr); fflush(stdout); }
jter = small_qs(n,fac1,fac2);
if (DEBUG_QSC)
{
(void) printf("qs returns %d\n",jter);
(void) printf("tried qs on "); write_num(n, numstr);
(void) printf("fac1, fac2 = %d %d, squfof returning %d\n", *fac1, *fac2, jter);
fflush(stdout);
}         /* -1 should be very rare; QS found no non-trivial GCD's   */
if (jter != -1) return(jter);

/*   Notes: could replace calls to the MP routines [on Pentium] by using*/
/*  __int64  for temp,temp1, convert n to __int64 etc. But the MP stuff */
/*  is not inside a loop. It is not worth doing                         */

qlast = 1;
mpsqrt(n,temp);

s = LAST(temp);
p = s;
mult(temp,temp,temp1);
subt(n,temp1,temp);                    /* temp = n - floor(sqrt(n))^2   */
if (SIZE(temp) <= SINGLE && LAST(temp) == 0)
{                                   /* Here n is a square            */
*fac1 = s;
*fac2 = s;
return(1);
}

q = LAST(temp);              /* q = excess of n over next smaller square */
ll = 1 + 2*(int)sqrt((double)(p+p));
l2 = ll/2;
qpoint = 0;

/*   In the loop below, we need to check if q is a square right before   */
/*  the end of the loop.  Is there a faster way? The current way is      */
/*   EXPENSIVE! (many branches and double prec sqrt)                     */

for (jter=0; jter < 800000; jter++)      /* I see no way to speed this   */
{                                     /*  main loop                   */
iq = (s + p)/q;
pnext = iq*q - p;
if (q <= ll)
{
if ((q & 1) == 0) enqu(q/2,&jter);
else if (q <= l2) enqu(q,&jter);
if (jter < 0)
{                        /* queue overflow: try pollard rho   */
if (Q_OVERFLOW) overflow_count++;    /* should be rare         */
return(pollardrho2(n,fac1,fac2));   /* also try squfof(3N)?   */
}
}
t = qlast + iq*(p - pnext);
qlast = q;
q = t;
p = pnext;                          /* check for square; even iter   */
if (jter & 1) continue;             /* jter is odd:omit square test  */
//   if (t = (q & 3) > 1) continue;    /* skip t = 2,3 mod 4            */
//   if (q & 7 == 5) continue;         /* skip q = 5 mod 8              */
//   if (t == 0 && (q & 15) > 5) continue;   /* skip 8, 12 mod 16       */
//   if (s3465[q % 3465] == 'n') continue;   /* skip QNR's mod 3465     */
r = (int)sqrt((double)q);                 /* r = floor(sqrt(q))      */
if (q != r*r) continue;
if (qpoint == 0) goto gotit;
for (i=0; i<qpoint-1; i+=2)      /* treat queue as list for simplicity*/
{
if (r == qqueue[i]) goto contin;
if (r == qqueue[i+1]) goto contin;
}
if (r == qqueue[qpoint-1]) continue;
goto gotit;
contin:;
}   /* end of main loop */

void sqfprep()
{
int i;
for (i=0; i<3465; i++) s3465[i] = 'n';
for (i=0; i<1733; i++) s3465[(i*i) % 3465] = 'r';
}

void enqu(q,iter)
int q;
int *iter;
{
qqueue[qpoint] = q;
if (++qpoint > 50) *iter = -1;
}

2010-04-11, 13:40   #6
alpertron

Aug 2002
Buenos Aires, Argentina

22·3·113 Posts

Quote:
 Originally Posted by jasonp What corrections did you have to make? It's been a long time, but the SQUFOF code in msieve was taken directly from Gower's description, and I didn't understand enough of the underlying math to make corrections. Also, it was quite helpful in C to handle a special case of q=1 before doing the divisions at the beginning of the even and odd cycles. It probably wouldn't help in Java though.
In Step 3 the paper says: $Q \leftarrow (N-P*P)/Q'$ , but it must be $Q \leftarrow (D-P*P)/Q'$

In Step 4a the paper says Set $q \leftarrow \lfloor (S+P)/Q'\rfloor$ but it must be $q \leftarrow \lfloor (S+P)/Q\rfloor$

The last one should be easy to catch because even the example provided does not work using the step 4a as written on the paper.

Last fiddled with by alpertron on 2010-04-11 at 14:39

 2010-04-11, 14:37 #7 alpertron     Aug 2002 Buenos Aires, Argentina 25148 Posts Acording to the paper, the implementation of the queue as a list as written by Bob above is slower than the queue because some proper square forms could be skipped.
 2010-04-11, 16:49 #8 bsquared     "Ben" Feb 2007 343810 Posts comparisons I went ahead and did some comparisons. Bob, I had to port your code so that it works outside your lattice siever, in my test routine. If anything it should be faster, since it uses native 64 bit integers rather than your multiple precision routines, when necessary. I also got rid of the calls to pollard rho, and your mini-qs, so that the comparison is between squfof routines only. Here is your modified code: Code: int qqueue[100]; int qpoint; void enqu(int q,int *iter) { qqueue[qpoint] = q; if (++qpoint > 100) *iter = -1; } int squfof_rds(int64 n,int *fac1,int *fac2) { /* start of squfof: factor n as fac1*fac2 faster in FP?????*/ int64 temp, temp1; register int iq,ll,l2,p,pnext,q,qlast,r,s,t,i; int jter,iter; qlast = 1; s = (int)sqrt(n); p = s; temp1 = s * s; temp = n - temp1; /* temp = n - floor(sqrt(n))^2 */ if (temp == 0) { /* Here n is a square */ *fac1 = s; *fac2 = s; return(1); } q = (int)temp; /* q = excess of n over next smaller square */ ll = 1 + 2*(int)sqrt((double)(p+p)); l2 = ll/2; qpoint = 0; /* In the loop below, we need to check if q is a square right before */ /* the end of the loop. Is there a faster way? The current way is */ /* EXPENSIVE! (many branches and double prec sqrt) */ for (jter=0; jter < 800000; jter++) /* I see no way to speed this */ { /* main loop */ iq = (s + p)/q; pnext = iq*q - p; if (q <= ll) { if ((q & 1) == 0) enqu(q/2,&jter); else if (q <= l2) enqu(q,&jter); if (jter < 0) { return 0; } } t = qlast + iq*(p - pnext); qlast = q; q = t; p = pnext; /* check for square; even iter */ if (jter & 1) continue; /* jter is odd:omit square test */ r = (int)sqrt((double)q); /* r = floor(sqrt(q)) */ if (q != r*r) continue; if (qpoint == 0) goto gotit; for (i=0; i
 2010-04-11, 17:12 #9 jasonp Tribal Bullet     Oct 2004 DD116 Posts Actually the pseudocode in Gower's thesis did not look familiar, I probably used McMath's papers as pointed to in the SQUFOF wiki page.
 2010-04-12, 02:14 #10 alpertron     Aug 2002 Buenos Aires, Argentina 135610 Posts By unrolling both loops so there are two copies for each, the code is now 15% faster, according to my measurements.
2010-04-12, 15:22   #11
R.D. Silverman

Nov 2003

746010 Posts

Quote:
 Originally Posted by bsquared I went ahead and did some comparisons. Bob, I had to port your code so that it works outside your lattice siever, in my test routine. If anything it should be faster, since it uses native 64 bit integers rather than your multiple precision routines, when necessary. I also got rid of the calls to pollard rho, and your mini-qs, so that the comparison is between squfof routines only. Here is your modified code: Code: int qqueue[100]; int qpoint; void enqu(int q,int *iter) { qqueue[qpoint] = q; if (++qpoint > 100) *iter = -1; } int squfof_rds(int64 n,int *fac1,int *fac2) { /* start of squfof: factor n as fac1*fac2 faster in FP?????*/ int64 temp, temp1; register int iq,ll,l2,p,pnext,q,qlast,r,s,t,i; int jter,iter; qlast = 1; s = (int)sqrt(n); p = s; temp1 = s * s; temp = n - temp1; /* temp = n - floor(sqrt(n))^2 */ if (temp == 0) { /* Here n is a square */ *fac1 = s; *fac2 = s; return(1); } q = (int)temp; /* q = excess of n over next smaller square */ ll = 1 + 2*(int)sqrt((double)(p+p)); l2 = ll/2; qpoint = 0; /* In the loop below, we need to check if q is a square right before */ /* the end of the loop. Is there a faster way? The current way is */ /* EXPENSIVE! (many branches and double prec sqrt) */ for (jter=0; jter < 800000; jter++) /* I see no way to speed this */ { /* main loop */ iq = (s + p)/q; pnext = iq*q - p; if (q <= ll) { if ((q & 1) == 0) enqu(q/2,&jter); else if (q <= l2) enqu(q,&jter); if (jter < 0) { return 0; } } t = qlast + iq*(p - pnext); qlast = q; q = t; p = pnext; /* check for square; even iter */ if (jter & 1) continue; /* jter is odd:omit square test */ r = (int)sqrt((double)q); /* r = floor(sqrt(q)) */ if (q != r*r) continue; if (qpoint == 0) goto gotit; for (i=0; i
I never bothered to do more than trivial optimizations to the squfof code.

(1) NFS spends so little time inside this routine, that optimizing it will not
have much effect.

(2) MPQS is a lot faster for numbers of this size.

 Similar Threads Thread Thread Starter Forum Replies Last Post Maximus Software 9 2012-01-30 22:41 rdotson Hardware 12 2006-03-26 22:58 Leith Miscellaneous Math 4 2005-01-18 23:14 flava Programming 12 2004-10-26 03:51 Chris Card Factoring 9 2004-09-24 14:19

All times are UTC. The time now is 20:19.

Tue May 11 20:19:11 UTC 2021 up 33 days, 15 hrs, 1 user, load averages: 4.51, 2.93, 2.32