![]() |
![]() |
#1 |
"Bob Silverman"
Nov 2003
North of Boston
750610 Posts |
![]()
Hello everybody,
I want to extend my thanks to everyone who participated in my little "challenge". Some nice speedups were obtained. Kudos to those who helped ![]() come from going to assembler; this clearly makes it less portable. (but thanks to xenon for nice work and thanks to Alex for improvements to the C code ![]() ![]() [Yes, we are all aware that going completely to assembler would yield faster code] I am a bit surprised that noone responded to the challenge to improve the lattice reduction code. ![]() both routines because the algorithms are * almost* identical. I reworked Arjen's conditional shift & subtract code for the lattice reduction into something closer to the modinverse code. I give that code here: /************************************************************************/ /* */ /* New routine to do lattice basis reduction via Euclidean algorithm */ /* */ /************************************************************************/ new_latred(q,r, v1, v2) int q,r, v1[], v2[]; { /* start of new_latred */ int a11, a12, a21, a22, t, rem; double norm_row1, tmp_norm, dot_prod; int accum1, accum2, t1,t2; int tmp_a11, tmp_a12; a11 = q; a21 = r; a12 = 0; a22 = 1; //printf("| %d %d |\n", a11, a12); //printf("| %d %d |\n\n", a21, a22); /* we take advantage of the fact that most partial quotients are small */ /* by doing the division by successive conditional subtraction. */ norm_row1 = (double)a11 * (double)a11 + (double)a12 * (double)a12; while(1) { accum1 = a21; accum2 = a22; rem = a11 - a21; if (rem >= a21) { accum1 += a21; accum2 += a22; rem -= a21; if (rem >= a21) { accum1 += a21; accum2 += a22; rem -= a21; if (rem >= a21) { accum1 += a21; accum2 += a22; rem -= a21; if (rem >= a21) { accum1 += a21; accum2 += a22; rem -= a21; if (rem >= a21) { accum1 += a21; accum2 += a22; rem -= a21; if (rem >= a21) { t = a11/a21; accum1 = t * a21; accum2 = t * a22; } } } } } } tmp_a11 = a11 - accum1; // update row 1 tmp_a12 = a12 - accum2; tmp_norm = (double)tmp_a11 * (double)tmp_a11 + (double)tmp_a12 * (double)tmp_a12; if (tmp_norm > norm_row1) break; // we are done t1 = tmp_a11; // swap rows t2 = tmp_a12; a11 = a21; a12 = a22; a21 = t1; a22 = t2; norm_row1 = tmp_norm; // printf("| %d %d |\n", a11, a12); // printf("| %d %d |\n\n", a21, a22); } /* Need to do a single Gram-Schmidt row reduction if possible */ dot_prod = (double)a11 * a21 + (double)a12 * a22; t = round_double(dot_prod/norm_row1); //printf("G-S t value = %d\n",t); v1[0] = a11 - t * a21;; v1[1] = a12 - t * a22; v2[0] = a21; v2[1] = a22; //printf("final result:\n"); //printf("| %d %d |\n", v1[0], v1[1]); //printf("| %d %d |\n\n", v2[0], v2[1]); return(0); } /* end of new_latred */ -------------------------------------------------------------------- I then compared the two routines for 3 different values of p as r varied over 10000 different, consecutive values. Here are the time RATIOS for the two routines (old routine/new routine) . I give values for 20 different r values for each prime. r: 2.153262 r: 2.536551 r: 2.333333 r: 2.467522 r: 2.265208 r: 2.054390 r: 2.488469 r: 2.240045 r: 2.071930 r: 2.285150 r: 2.020040 r: 2.409091 r: 2.093275 r: 2.271066 r: 2.068036 r: 2.264728 r: 2.444172 r: 2.147155 r: 2.220281 ----------------- r: 1.811325 r: 2.209565 r: 1.966776 r: 2.392782 r: 1.994787 r: 1.534354 r: 2.500306 r: 2.003802 r: 2.159395 r: 1.934477 r: 2.259038 r: 1.589700 r: 2.106504 r: 1.920269 r: 1.893444 r: 2.049665 r: 1.839205 r: 1.827455 r: 1.873463 ----------------- r: 2.010581 r: 2.088218 r: 2.262241 r: 2.306903 r: 2.110598 r: 1.971844 r: 2.464969 r: 1.773949 r: 2.348313 r: 2.564947 r: 2.322762 r: 2.154452 r: 2.184591 r: 2.193857 r: 2.186851 r: 1.842859 r: 2.255534 r: 2.199179 r: 3.877670 =============================================== The new routine is more than twice as fast as the old one!!!!! ![]() ![]() I see some additional improvements that might be made. Rather than just proceeding with the final Gram-Schmidt reduction, we can (perhaps?) avoid the final floating point division by checking that t > 1/2 before the division simply by comparing the two norms. i.e. 2*dot_prod > norm_row1. Whether this check gains anything depends on how often t is non-zero. We can also do some more subtracts in the inner loop before succumbing to doing the division etc. Would anyone like to take a crack at the improved version? ![]() ![]() |
![]() |
![]() |
![]() |
#2 |
Sep 2002
2×331 Posts |
![]()
For the lattice basis reduction:
What values of p and range of r ? How can the results be verified ? My reasoning behind modinv versus latred: The mathematics behind modinv routine was very understandable, familar, easily verifiable. The code was understandable and short. The code for lattice basis reduction was harder to follow, and contained conditional compiler code and was twice as long as the modinv. You provided very helpful, detailed information on the range of the parameters for modinv. (I did redo the modiv routine in Delphi and MASM32, but your using C/C++ and xenon handled the x86 Assembly.) Is there any unoptimized code to look at ? The 2X+ speed increase is already quite good. Code in x86 assembly or C or both ? |
![]() |
![]() |
![]() |
#3 | |
"Bob Silverman"
Nov 2003
North of Boston
2×33×139 Posts |
![]() Quote:
For the lattice basis reduction: What values of p and range of r ? Single precision. Same range of p and r as in modinv. How can the results be verified ? The matrix determinant is p. It remains invariant. Each reduction step is equivalent to multiplying by a uni-modular marix, and hence the determinant does not change. The coefficients in the matrix get smaller, until the min is reached, then further "reductions" result in one row becoming very small while the other grows big again. The condition on the L2 norm in the code says when to stop. My reasoning behind modinv versus latred: The mathematics behind modinv routine was very understandable, familar, easily verifiable. But the math for the lattice reduction is the same: Euclidean Algorithm [reduction follws with a Gram-Schmidt orthogonalization at the end if it is useful] The code was understandable and short. Thank you for the compliment. Code in x86 assembly or C or both ? C is strongly preferred, but of course both are acceptable. Note: I already tried the test I suggested that avoids the final floating div at the expense of an extra branch. It does give a slight speedup. |
|
![]() |
![]() |
![]() |
#4 |
Sep 2002
2×331 Posts |
![]()
The newer code is easier to follow.
I'll try in assembly, have some implementation ideas but no algorithmic ones. modinv used a, modulus with modulus the prime new_latred has q , r so is r for a and q for modulus ? |
![]() |
![]() |
![]() |
#5 | |
"Bob Silverman"
Nov 2003
North of Boston
2·33·139 Posts |
![]() Quote:
algorithmically, but their meaning is different. r stands for 'root' It is the root of a polynomial (which itself is irrelevant to the algorithm) taken mod p. Arjen's old code basically tried to replace the division operation that computes t at each iteration with a series of trial shifts and subtracts... Better variable names would have aided the clarity...... |
|
![]() |
![]() |
![]() |
#6 | |
Jul 2004
5×7 Posts |
![]() Quote:
|
|
![]() |
![]() |
![]() |
#7 | |
"Bob Silverman"
Nov 2003
North of Boston
165228 Posts |
![]() Quote:
r is indeed in [1,p-1]. q is typically 15 to 27 bits; choose some at random. BTW, I have already improved the code with: dot_prod = (double)a11 * (double)a21 + (double)a12 * (double)a22; if ((dot_prod + dot_prod) < norm_row1) { v1[0] = a11; v1[1] = a12; v2[0] = a21; v2[1] = a22; return(0); } t = round_double(dot_prod/norm_row1); etc. The if test avoids quite a few divisions at low cost. I tried adding a couple of more conditional subtracts in the main loop, but saw little difference. Thanks again for the help!!!! |
|
![]() |
![]() |
![]() |
#8 |
Jul 2004
3510 Posts |
![]()
The code doesn't work with r=1. Perhaps you mean r in [2, p-1].
Before looking at timings, I want to get the result correct. While modular inverse has one correct answer, lattice reduction can have more than one. You didn't specify exactly what you want. I assume shorter vectors are better. But I found that your code doesn't produce the shortest vectors when r is small. I get shorter vector by computing t = nearest_int(dot_product/norm) in every iteration (this is slow). About verifying the results, any other method besides computing determinant? Computing determinant isn't "strict" enough to catch all errors. i.e. if r changes, the determinant doesn't change. |
![]() |
![]() |
![]() |
#9 | |
Sep 2002
2·331 Posts |
![]() Quote:
If the same input for testing is used by everyone it wil be easier to verify accuracy and compare the efficiency of improvements. ======================================================== A few results with somewhat random q and r ( each q is prime and r is not). Are these results from Delphi conversion correct ? (a11 == q a21 == r) | 134217689 0| a11 a12 | 92342327 1| a21 a22 No need for G-S t | 92342327 1| v1[0] v1[1] | 41875362 -92342328| v2[0] v2[1] | 524309 0| a11 a12 | 10003 1| a21 a22 G-S t value = 2 | 1697 105| v1[0] v1[1] | 4153 -52| v2[0] v2[1] The MASM32 version gives completely different answers. | 524309 0| a11 a12 | 10003 1| a21 a22 G-S t value = 0 | 759 -262| v1[0] v1[1] | 179 629| v2[0] v2[1] haven't added the if ((dot_prod + dot_prod) < norm_row1) code to the MASM32 version yet. |
|
![]() |
![]() |
![]() |
#10 | |
"Bob Silverman"
Nov 2003
North of Boston
2·33·139 Posts |
![]() Quote:
with the shortest possible vector. But this comes at the expense of the other vector being quite large. To determine if you have a correct final answer consider the original and final row vectors (say) V1, V2 and V1F , V2F if e * V1 + f * V2 = e1 * V1F + f1 * V2F has a solution (e1, f1) for all (e,f), then the final basis is valid. |
|
![]() |
![]() |
![]() |
#11 | |
Jul 2004
1000112 Posts |
![]() Quote:
Code:
Example: q=1299721, r=30 | 30 1 | |1441 -43276 | | 1 -43324 | | 30 1 | option 1 option 2 |
|
![]() |
![]() |
![]() |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Anyone know enough about coding to do this? | jrafanelli | Software | 2 | 2018-01-11 15:16 |
Python Coding Help? | kelzo | Programming | 3 | 2016-11-27 05:16 |
Two possible challenges | devarajkandadai | Miscellaneous Math | 1 | 2013-11-09 23:30 |
coding midlet for TF | starrynte | Programming | 1 | 2008-12-30 22:31 |
The RSA Challenges - RIP | BotXXX | Factoring | 16 | 2007-05-27 03:37 |