2007-02-06, 14:12  #1  
Jun 2005
lehigh.edu
1024_{10} Posts 
Distribution of Parallel Linear Algebra (Thorsten's report)
Tucked away inside Thorsten's report of a new DL record (mod a
160-digit prime) is a description of what appears to be the first public announcement of the latest development in the type of linear algebra used in NFS factorizations. Here's the NMBRTHRY paragraph: Quote:
that each job may not need the complete matrix. Each part seems to have gotten (the usual?) parallel linear algebra; after which there presumably would be some final assembly. One hopes that more explicit details of the distribution will appear. bdodson 

2007-02-06, 14:27  #2  
"Bob Silverman"
Nov 2003
North of Boston
2^{3}×3×311 Posts 
Quote:
Note that this was a DL computation. The matrix must be solved modulo the order of the group. One typically does this by solving it modulo small primes, then pasting the results together with the CRT. Solving the system modulo one of the small primes can be done completely independently of the others. This is a natural parallel partition of the problem. One then brings everything together with the CRT. Such a partition does not seem to exist for solving it over GF(2). I wish it did. 
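A minimal sketch of the partition Bob describes, assuming (as a toy case) a group order N that splits into distinct small primes: a solution of A·x ≡ b (mod N) is exactly a simultaneous solution mod each prime factor, so the factors can be handled completely independently and the pieces pasted back together with the CRT. All matrices and numbers here are invented for illustration, and real NFS code would use block Lanczos or Wiedemann rather than dense Gaussian elimination.

```python
def solve_mod_p(A, b, p):
    """Solve A.x = b over GF(p) by Gaussian elimination (A assumed invertible mod p)."""
    n = len(A)
    M = [[A[i][j] % p for j in range(n)] + [b[i] % p] for i in range(n)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] % p)  # find a usable pivot row
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], -1, p)                          # scale pivot row to 1
        M[c] = [v * inv % p for v in M[c]]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c]
                M[r] = [(M[r][j] - f * M[c][j]) % p for j in range(n + 1)]
    return [M[i][n] for i in range(n)]

def crt(pairs):
    """Paste (residue, modulus) pairs into one residue mod the product of the moduli."""
    r, m = pairs[0]
    for r2, m2 in pairs[1:]:
        r += m * ((r2 - r) * pow(m, -1, m2) % m2)
        m *= m2
    return r % m

primes = [3, 7, 11, 13]
N = 3 * 7 * 11 * 13            # stand-in "smooth" group order
A = [[3, 1], [5, 2]]           # toy matrix; det = 1, so invertible mod every prime
b = [10, 17]
per_prime = {p: solve_mod_p(A, b, p) for p in primes}          # independent jobs
x = [crt([(per_prime[p][i], p) for p in primes]) for i in range(len(b))]
assert all(sum(A[i][j] * x[j] for j in range(2)) % N == b[i] % N for i in range(2))
```

Each `solve_mod_p` call touches only single-precision numbers and could run on a different machine; only the cheap CRT paste at the end needs all the results.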

2007-02-06, 14:52  #3  
Oct 2006
vomit_frame_pointer
2^{3}×3^{2}×5 Posts 
Quote:
So many factorization algorithms depend on solving x^2 = y^2 mod n, which ties us to a characteristic-2 computation in the matrix stage. How about jumping to the congruence x^k = y^k mod n (k large and smooth)? Two huge problems with this, of course: (i) sieving, or whatever approach one uses to collect relations, could be inefficient, but we'd already be at the point where the matrix was the worse bottleneck anyway; (ii) the linear algebra is now over Z/(k), which ain't a field, but this could be a minor issue. Anyone think it might go in this direction some day? 

2007-02-06, 14:57  #4  
"Bob Silverman"
Nov 2003
North of Boston
16450_{8} Posts 
Quote:
Hint: Ask yourself what the requirements are on the factors of n (and on n itself) for a nontrivial solution to the congruence x^3 = y^3 mod n to even exist. Further hint: such a solution does not exist for most n and almost all k. 

2007-02-06, 18:45  #5  
Jun 2005
lehigh.edu
10000000000_{2} Posts 
Quote:
Quote:
section of Thorsten's post on solving logs for small primes, which was "done at the end of the matrix step". I stand by my description; let's discuss again in six months. Bruce 

2007-02-07, 18:06  #6  
Oct 2006
vomit_frame_pointer
2^{3}·3^{2}·5 Posts 
Quote:
For a prime exponent k, x^k = x and y^k = y (mod k), so x^(k-1) + x^(k-2)·y + ... + y^(k-1) = (x^k - y^k)/(x - y) will be congruent to 1 whenever x and y are incongruent mod k. Hence you can only split off factors which are congruent to 1 modulo k. Yech. 
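The identity above is easy to check numerically; a small brute-force sketch (purely illustrative):

```python
# For prime k and x !≡ y (mod k), Fermat's little theorem gives x^k ≡ x and
# y^k ≡ y, so the cofactor (x^k - y^k)/(x - y) = x^(k-1) + x^(k-2)*y + ... + y^(k-1)
# reduces to (x - y)/(x - y) = 1 (mod k).

def cofactor_mod_k(x, y, k):
    # sum of x^(k-1-i) * y^i for i = 0 .. k-1, reduced mod k
    return sum(pow(x, k - 1 - i, k) * pow(y, i, k) for i in range(k)) % k

# exhaustive check for a few small prime exponents
for k in (3, 5, 7, 11, 13):
    for x in range(k):
        for y in range(k):
            if x != y:
                assert cofactor_mod_k(x, y, k) == 1
```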

2007-02-07, 18:13  #7  
"Bob Silverman"
Nov 2003
North of Boston
2^{3}×3×311 Posts 
Quote:
Bingo! The exponent 2 works because all primes (except 2) are 1 mod 2. 

2007-02-07, 21:03  #8  
Jun 2005
lehigh.edu
2^{10} Posts 
Quote:
there's a reference to Joux-Lercier, Math. Comp., Nov. 2002 (from a revision in Oct. 2000). From that article ("improvements in gnfs for DL .."), Thorsten reports 3 "small differences", including an improvement in the linear algebra. I'm still somewhat puzzled by Bob's suggestion that DL gnfs is more inherently parallel than gnfs for integer factorization. One possible reduction uses the prime factors of p-1; but Thorsten takes the prime p in the "usual form" with q = (p-1)/2 also prime. The linear algebra is then mod q (not 2, indeed).

JL describe an initial structured Gauss step, which sounds as though it may correspond to merging large primes during filtering (JL don't use large primes). I'm still seeing a central computation on a single large matrix: parallel Lanczos for JL, "Wiedemann" for Thorsten. The result of this "matrix step" does supply logs for some small primes ("good" ones in JL, "many" in Th). We haven't yet gotten to the particular number whose log we're supposed to be computing; the entire matrix step is a "precomputation" (JL) before getting to the particular log, but the postcomputation is then relatively routine, and not so intensive. So that's 14 cpu-years for the matrix, and "a few hours per individual log" (Th).

JL report an additional sieving step (after the matrix solution) to write x = A/B; Th has y = (d/n) mod p. In either case, A, B or d, n are supposed to be products of small "good" primes or medium primes ("larger" in Th). The logs of these primes are, in fact, reduced to logs of primes below 2^32. Is this the step Bob wants to distribute? If so, it's just this final postcomputation, not the main matrix "precomputation" (independent of, and before, getting to the specific number whose log is being computed). I do know for a fact (I was there to hear Contini's version) that block Lanczos mod 2 was adaptable to mod q. 
So far as I can tell (pending further enlightenment, please), the other stuff being discussed doesn't touch the main step of the computation: the "linear algebra" before getting to the specific log being computed. The central question (to me) remains those 8 jobs, 12-24 cpus each, in the main matrix computation. I'd like to be able to say that mod q vs mod 2 doesn't matter, and that it's incidental that those 8 jobs are on a single cluster, instead of being done separately in 8 different countries, across several continents. I'll admit that doesn't appear to fit Thorsten's description of a "small difference", unless it's included in the "Wiedemann" that's currently part of the gnfs suite of Franke-Bahr-Kleinjung. Anyone here who's seen current code? bdodson 
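To make the "relatively routine" postcomputation above concrete, here is a toy sketch with every number invented for illustration: once the matrix step has delivered the logs of the factor-base primes, the log of anything smooth over that base is just a sum of known logs, reduced mod p-1. Here p = 23 is a safe prime (p = 2q+1 with q = 11) and g = 5 a generator; the two base logs were found by hand, standing in for the matrix step's output.

```python
p, g = 23, 5                       # toy safe prime p = 2q+1 (q = 11), generator 5
base_logs = {2: 2, 3: 16}          # pretend matrix-step output: g^2 = 2, g^16 = 3 (mod 23)

def smooth_log(n, logs, p):
    """log_g(n) mod p-1, for n that factors completely over the factor base."""
    total = 0
    for prime, lg in logs.items():
        while n % prime == 0:      # strip out each base prime, accumulating its log
            n //= prime
            total += lg
    assert n == 1, "not smooth over the factor base"
    return total % (p - 1)

lg = smooth_log(12, base_logs, p)  # 12 = 2^2 * 3, so lg = 2 + 2 + 16 = 20
assert pow(g, lg, p) == 12         # verify g^lg really is 12 mod p
```

In the real computation the target is first rewritten (by the extra sieving step JL describe) as a ratio of smooth numbers, but the final combination of known logs has exactly this shape.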

2007-02-08, 17:32  #9  
"Bob Silverman"
Nov 2003
North of Boston
1110100101000_{2} Posts 
Quote:
for fairly large q we can do the following instead. Select a set of 32-bit primes p_i such that product(p_i) > q. Reduce the matrix mod each p_i, then solve it mod p_i for each p_i. This can be done totally independently for each p_i. Then find the null space mod q by pasting together the solutions for the p_i via the Chinese Remainder Theorem.

This reduces the problem of solving a big matrix modulo a (moderately large) multiprecision integer to one of solving multiple matrices modulo single-precision primes. One solves these smaller-coefficient matrices totally independently. Each of the reduced matrices can of course also be solved in parallel. We have changed the problem from solving one matrix with big numbers on many processors in parallel to solving multiple small-number matrices, each still solved in parallel, but with fewer processors per matrix. Processor utilization *decreases* for large numbers of processors working on a parallel linear algebra problem. Breaking a large number of processors into smaller clusters, each cluster working independently, should increase per-processor utilization. 
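A toy illustration of the null-space pasting, with everything invented for the example: A is a 2x3 integer matrix whose kernel is one-dimensional with small integer entries. Mod each small prime we get a kernel vector from the cross product of the rows, normalize its first coordinate to 1 so the per-prime vectors are compatible, CRT each coordinate, lift to the symmetric range to recover the small integer entry, and only then reduce mod q. A real implementation would have to handle a zero first coordinate, sparse matrices, and much larger dimensions; this only shows the mechanics of the paste.

```python
def kernel_mod_p(r1, r2, p):
    """Kernel vector of the 2x3 matrix [r1; r2] over GF(p), via the cross
    product of the rows, normalized so the first coordinate is 1."""
    v = [(r1[1] * r2[2] - r1[2] * r2[1]) % p,
         (r1[2] * r2[0] - r1[0] * r2[2]) % p,
         (r1[0] * r2[1] - r1[1] * r2[0]) % p]
    inv = pow(v[0], -1, p)            # assumes first coordinate is nonzero mod p
    return [c * inv % p for c in v]

def crt(pairs):
    """Paste (residue, modulus) pairs; returns (residue, product of moduli)."""
    r, m = pairs[0]
    for r2, m2 in pairs[1:]:
        r += m * ((r2 - r) * pow(m, -1, m2) % m2)
        m *= m2
    return r % m, m

q = 101                               # stand-in for the large prime factor of p-1
r1, r2 = [1, 2, 3], [4, 5, 6]         # integer kernel: (1, -2, 1)
primes = [7, 11, 13]                  # product 1001 > q, as required
vecs = [kernel_mod_p(r1, r2, p) for p in primes]   # independent per-prime jobs
v = []
for i in range(3):
    r, m = crt([(w[i], p) for w, p in zip(vecs, primes)])
    if r > m // 2:                    # symmetric lift recovers the signed integer entry
        r -= m
    v.append(r % q)
assert all(sum(row[i] * v[i] for i in range(3)) % q == 0 for row in (r1, r2))
```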

2007-02-08, 19:35  #10  
Aug 2004
New Zealand
2^{2}·3·19 Posts 
Quote:
All this was for runs I did with MPQS in the C110 range. 

2007-02-09, 06:34  #11  
Jun 2005
lehigh.edu
2^{10} Posts 
Quote:
ahead of the curve on deciphering/anticipating posts from Franke-Bahr-Kleinjung, I'm now inclined towards the view that Thorsten's description wasn't referring to a new feature at all. (That would fit the "small difference ... in the linear algebra" in his post.) Rather, we seem to be getting, at long last, new details about how the matrix for RSA-200 was done. Anyone care to wager on whether there was a version of these eight "chunks" that we have to thank for RSA-200 being factored?

On Bob's post: thanks for a clear description of the classical linear algebra for mod p DL. I recognize the Chinese Remaindering from Peter's ecm/fft output. Does anyone know whether the method described is still applied post-Lanczos (much less post-Wiedemann)? I certainly didn't hear it in Scott's description (which appears to have been the first use of Peter's Lanczos applied to the mod p DL matrix; if I recall, this would have been in 1995, before Crypto '95); and it's not referred to in Thorsten's reference to Joux-Lercier's paper (revised, 2000), which otherwise seems fairly complete in its detail. Bruce 

Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Has anyone tried linear algebra on a Threadripper yet?  fivemack  Hardware  3  2017-10-03 03:11 
restarting nfs linear algebra  cubaq  YAFU  2  2017-04-02 11:35 
Linear algebra at 600%  CRGreathouse  Msieve  8  2009-08-05 07:25 
Linear algebra crashes  10metreh  Msieve  3  2009-02-02 08:34 
Linear algebra proof  Damian  Math  8  2007-02-12 22:25 