mersenneforum.org  

Go Back   mersenneforum.org > Extra Stuff > Programming

Reply
 
Thread Tools
Old 2005-08-05, 16:50   #1
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

26×113 Posts
Talking Coding Challenges

Hello everybody,

I want to extend my thanks to everyone who participated in my little
"challenge". Some nice speedups were obtained. Kudos to those who
helped . It is unfortunate that the fastest code produced had to
come from going to assembler; this clearly makes it less portable. (but thanks
to xenon for nice work and thanks to Alex for improvements to the C code
)

[Yes, we are all aware that going completely to assembler would yield
faster code]

I am a bit surprised that noone responded to the challenge to improve the
lattice reduction code. It is not by coincidence that I presented
both routines because the algorithms are * almost* identical.

I reworked Arjen's conditional shift & subtract code for the lattice reduction
into something closer to the modinverse code. I give that code here:

/************************************************************************/
/* */
/* New routine to do lattice basis reduction via Euclidean algorithm */
/* */
/************************************************************************/

new_latred(q,r, v1, v2)
int q,r, v1[], v2[];

{ /* start of new_latred */
int a11, a12, a21, a22, t, rem;
double norm_row1, tmp_norm, dot_prod;
int accum1, accum2, t1,t2;
int tmp_a11, tmp_a12;

a11 = q;
a21 = r;
a12 = 0;
a22 = 1;

//printf("| %d %d |\n", a11, a12);
//printf("| %d %d |\n\n", a21, a22);

/* we take advantage of the fact that most partial quotients are small */
/* by doing the division by successive conditional subtraction. */

norm_row1 = (double)a11 * (double)a11 + (double)a12 * (double)a12;

while(1)
{
accum1 = a21;
accum2 = a22;
rem = a11 - a21;
if (rem >= a21)
{
accum1 += a21;
accum2 += a22;
rem -= a21;
if (rem >= a21)
{
accum1 += a21;
accum2 += a22;
rem -= a21;
if (rem >= a21)
{
accum1 += a21;
accum2 += a22;
rem -= a21;
if (rem >= a21)
{
accum1 += a21;
accum2 += a22;
rem -= a21;
if (rem >= a21)
{
accum1 += a21;
accum2 += a22;
rem -= a21;
if (rem >= a21)
{
t = a11/a21;
accum1 = t * a21;
accum2 = t * a22;
}
}
}
}
}
}
tmp_a11 = a11 - accum1; // update row 1
tmp_a12 = a12 - accum2;
tmp_norm = (double)tmp_a11 * (double)tmp_a11 + (double)tmp_a12 * (double)tmp_a12;
if (tmp_norm > norm_row1) break; // we are done

t1 = tmp_a11; // swap rows
t2 = tmp_a12;
a11 = a21;
a12 = a22;
a21 = t1;
a22 = t2;
norm_row1 = tmp_norm;

// printf("| %d %d |\n", a11, a12);
// printf("| %d %d |\n\n", a21, a22);
}

/* Need to do a single Gram-Schmidt row reduction if possible */

dot_prod = (double)a11 * a21 + (double)a12 * a22;
t = round_double(dot_prod/norm_row1);

//printf("G-S t value = %d\n",t);

v1[0] = a11 - t * a21;;
v1[1] = a12 - t * a22;
v2[0] = a21;
v2[1] = a22;

//printf("final result:\n");
//printf("| %d %d |\n", v1[0], v1[1]);
//printf("| %d %d |\n\n", v2[0], v2[1]);

return(0);

} /* end of new_latred */
--------------------------------------------------------------------

I then compared the two routines for 3 different values of p as r
varied over 10000 different, consecutive values. Here are the time
RATIOS for the two routines (old routine/new routine) . I give values
for 20 different r values for each prime.

r: 2.153262
r: 2.536551
r: 2.333333
r: 2.467522
r: 2.265208
r: 2.054390
r: 2.488469
r: 2.240045
r: 2.071930
r: 2.285150
r: 2.020040
r: 2.409091
r: 2.093275
r: 2.271066
r: 2.068036
r: 2.264728
r: 2.444172
r: 2.147155
r: 2.220281
-----------------
r: 1.811325
r: 2.209565
r: 1.966776
r: 2.392782
r: 1.994787
r: 1.534354
r: 2.500306
r: 2.003802
r: 2.159395
r: 1.934477
r: 2.259038
r: 1.589700
r: 2.106504
r: 1.920269
r: 1.893444
r: 2.049665
r: 1.839205
r: 1.827455
r: 1.873463
-----------------
r: 2.010581
r: 2.088218
r: 2.262241
r: 2.306903
r: 2.110598
r: 1.971844
r: 2.464969
r: 1.773949
r: 2.348313
r: 2.564947
r: 2.322762
r: 2.154452
r: 2.184591
r: 2.193857
r: 2.186851
r: 1.842859
r: 2.255534
r: 2.199179
r: 3.877670


===============================================

The new routine is more than twice as fast as the old one!!!!!

I see some additional improvements that might be made.
Rather than just proceeding with the final Gram-Schmidt reduction,
we can (perhaps?) avoid the final floating point division by checking
that t > 1/2 before the division simply by comparing the two norms.
i.e. 2*dot_prod > norm_row1. Whether this check gains anything
depends on how often t is non-zero.

We can also do some more subtracts in the inner loop before succumbing
to doing the division etc.

Would anyone like to take a crack at the improved version?
R.D. Silverman is offline   Reply With Quote
Old 2005-08-05, 18:42   #2
dsouza123
 
dsouza123's Avatar
 
Sep 2002

2×331 Posts
Default

For the lattice basis reduction:
What values of p and range of r ?
How can the results be verified ?

My reasoning behind modinv versus latred:
The mathematics behind modinv routine
was very understandable, familar, easily verifiable.
The code was understandable and short.

The code for lattice basis reduction was harder to follow,
and contained conditional compiler code
and was twice as long as the modinv.

You provided very helpful, detailed information
on the range of the parameters for modinv.

(I did redo the modiv routine in Delphi and MASM32,
but your using C/C++ and xenon handled the x86 Assembly.)

Is there any unoptimized code to look at ?

The 2X+ speed increase is already quite good.

Code in x86 assembly or C or both ?
dsouza123 is offline   Reply With Quote
Old 2005-08-05, 19:14   #3
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

26×113 Posts
Thumbs up

Quote:
Originally Posted by dsouza123
For the lattice basis reduction:
What values of p and range of r ?

<snip>


For the lattice basis reduction:
What values of p and range of r ?


Single precision. Same range of p and r as in modinv.

How can the results be verified ?

The matrix determinant is p. It remains invariant.
Each reduction step is equivalent to multiplying by a
uni-modular marix, and hence the determinant does not change.
The coefficients in the matrix get smaller, until the min is reached,
then further "reductions" result in one row becoming very small
while the other grows big again. The condition on the L2 norm
in the code says when to stop.


My reasoning behind modinv versus latred:
The mathematics behind modinv routine
was very understandable, familar, easily verifiable.


But the math for the lattice reduction is the same: Euclidean Algorithm
[reduction follws with a Gram-Schmidt orthogonalization at the end
if it is useful]

The code was understandable and short.

Thank you for the compliment.

Code in x86 assembly or C or both ?

C is strongly preferred, but of course both are acceptable.

Note: I already tried the test I suggested that avoids the final
floating div at the expense of an extra branch. It does give a slight
speedup.
R.D. Silverman is offline   Reply With Quote
Old 2005-08-05, 20:43   #4
dsouza123
 
dsouza123's Avatar
 
Sep 2002

29616 Posts
Default

The newer code is easier to follow.
I'll try in assembly,
have some implementation ideas but no algorithmic ones.

modinv used a, modulus with modulus the prime
new_latred has q , r
so is r for a and q for modulus ?
dsouza123 is offline   Reply With Quote
Old 2005-08-05, 20:52   #5
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

161008 Posts
Thumbs up

Quote:
Originally Posted by dsouza123
The newer code is easier to follow.
I'll try in assembly,
have some implementation ideas but no algorithmic ones.

modinv used a, modulus with modulus the prime
new_latred has q , r
so is r for a and q for modulus ?
Sort of. p is a prime in both cases. a and r play similar roles
algorithmically, but their meaning is different. r stands for 'root'

It is the root of a polynomial (which itself is irrelevant to the
algorithm) taken mod p.

Arjen's old code basically tried to replace the division operation
that computes t at each iteration with a series of trial shifts and
subtracts... Better variable names would have aided the clarity......
R.D. Silverman is offline   Reply With Quote
Old 2005-08-06, 05:04   #6
xenon
 
Jul 2004

1000112 Posts
Default

Quote:
Originally Posted by R.D. Silverman
Sort of. p is a prime in both cases. a and r play similar roles algorithmically, but their meaning is different. r stands for 'root'
Please give a few p values for benchmarking. Can you confirm that r is guaranteed to be in the interval [1, p-1]? Well, I see that your code uses "q" and we talk much of "p", I assume we mean the same thing.
xenon is offline   Reply With Quote
Old 2005-08-06, 12:52   #7
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

26×113 Posts
Thumbs up

Quote:
Originally Posted by xenon
Please give a few p values for benchmarking. Can you confirm that r is guaranteed to be in the interval [1, p-1]? Well, I see that your code uses "q" and we talk much of "p", I assume we mean the same thing.
'q' arises from its standard usage in NFS literature as 'special-q' prime.
r is indeed in [1,p-1]. q is typically 15 to 27 bits; choose some at random.

BTW, I have already improved the code with:

dot_prod = (double)a11 * (double)a21 + (double)a12 * (double)a22;

if ((dot_prod + dot_prod) < norm_row1)
{
v1[0] = a11;
v1[1] = a12;
v2[0] = a21;
v2[1] = a22;
return(0);
}

t = round_double(dot_prod/norm_row1); etc.

The if test avoids quite a few divisions at low cost.

I tried adding a couple of more conditional subtracts in the main loop,
but saw little difference.

Thanks again for the help!!!!
R.D. Silverman is offline   Reply With Quote
Old 2005-08-06, 13:34   #8
xenon
 
Jul 2004

2316 Posts
Default

The code doesn't work with r=1. Perhaps you mean r in [2, p-1].

Before looking at timings, I want to get the result correct. While modular inverse has one correct answer, lattice reduction can have more than one. You didn't specify exactly what you want. I assume shorter vectors are better. But I found that your code doesn't produce the shortest vectors when r is small. I get shorter vector by computing t = nearest_int(dot_product/norm) in every iteration (this is slow).

About verifying the results, any other method besides computing determinant? Computing determinant isn't "strict" enough to catch all errors. i.e. if r changes, the determinant doesn't change.
xenon is offline   Reply With Quote
Old 2005-08-06, 18:05   #9
dsouza123
 
dsouza123's Avatar
 
Sep 2002

12268 Posts
Default

Quote:
Originally Posted by R.D. Silverman
I then compared the two routines for 3 different values of p as r
varied over 10000 different, consecutive values. Here are the time
RATIOS for the two routines (old routine/new routine) . I give values
for 20 different r values for each prime.
What were the 3 different values of p ( aka q ) along with its starting r ?

If the same input for testing is used by everyone it wil be easier
to verify accuracy and compare the efficiency of improvements.

========================================================
A few results with somewhat random q and r ( each q is prime and r is not).
Are these results from Delphi conversion correct ? (a11 == q a21 == r)

| 134217689 0| a11 a12
| 92342327 1| a21 a22
No need for G-S t
| 92342327 1| v1[0] v1[1]
| 41875362 -92342328| v2[0] v2[1]

| 524309 0| a11 a12
| 10003 1| a21 a22
G-S t value = 2
| 1697 105| v1[0] v1[1]
| 4153 -52| v2[0] v2[1]

The MASM32 version gives completely different answers.

| 524309 0| a11 a12
| 10003 1| a21 a22
G-S t value = 0
| 759 -262| v1[0] v1[1]
| 179 629| v2[0] v2[1]

haven't added the if ((dot_prod + dot_prod) < norm_row1) code
to the MASM32 version yet.
dsouza123 is offline   Reply With Quote
Old 2005-08-06, 20:38   #10
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

26×113 Posts
Default

Quote:
Originally Posted by xenon
The code doesn't work with r=1. Perhaps you mean r in [2, p-1].

Before looking at timings, I want to get the result correct. While modular inverse has one correct answer, lattice reduction can have more than one. You didn't specify exactly what you want. I assume shorter vectors are better. But I found that your code doesn't produce the shortest vectors when r is small. I get shorter vector by computing t = nearest_int(dot_product/norm) in every iteration (this is slow).

About verifying the results, any other method besides computing determinant? Computing determinant isn't "strict" enough to catch all errors. i.e. if r changes, the determinant doesn't change.
If G-S is carried forward completely, it will always produce a basis
with the shortest possible vector. But this comes at the expense of
the other vector being quite large.

To determine if you have a correct final answer consider the original
and final row vectors (say) V1, V2 and V1F , V2F

if e * V1 + f * V2 = e1 * V1F + f1 * V2F has a solution (e1, f1) for
all (e,f), then the final basis is valid.
R.D. Silverman is offline   Reply With Quote
Old 2005-08-07, 00:38   #11
xenon
 
Jul 2004

5·7 Posts
Default

Quote:
Originally Posted by R.D. Silverman
But this comes at the expense of the other vector being quite large.
Code:
Example: q=1299721, r=30

|  30          1   |      |1441     -43276  |
|   1     -43324   |      |  30         1   |

      option 1                  option 2
The second one is shorter. But your code produces the first one. Am I correct?
xenon is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Anyone know enough about coding to do this? jrafanelli Software 2 2018-01-11 15:16
Python Coding Help? kelzo Programming 3 2016-11-27 05:16
Two possible challenges devarajkandadai Miscellaneous Math 1 2013-11-09 23:30
coding midlet for TF starrynte Programming 1 2008-12-30 22:31
The RSA Challenges - RIP BotXXX Factoring 16 2007-05-27 03:37

All times are UTC. The time now is 02:00.

Thu Nov 26 02:00:42 UTC 2020 up 76 days, 23:11, 3 users, load averages: 1.17, 1.21, 1.25

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.