2011-12-12, 22:34  #1
"Ed Hall"
Dec 2009
Adirondack Mtns
2×1,637 Posts 
Using Several Instances of Aliqueit for a large gnfs job
For those who like the automation of Aliqueit and would like to work on larger numbers, but not wait quite so long for completion, I have been playing with a way to use several machines on the same gnfs job via Aliqueit. I'm not sure of the results, though, and am asking for comments on the following:
I recently ran a c135 on a 64-bit linux machine, with help from several 32-bit linux machines. What I did was to start the 64-bit machine and, once the poly selection was done and relations were being added, I copied the ggnfs_###... directory and the ###.elf file to the other machines. I then got rid of the spairs files on those other machines. (I did some other things due to the dual thread work by the 64-bit machine, but those items will be addressed if and when I create a "howto" about this effort.)
Next, I waited for the first completion of a q range on the 64-bit machine to see what percentage was completed. I used this info to calculate how high the q should get if left alone to 100%. I then used this value to adjust each of the other machines to work above this value and separate from each other, by editing the test.job.T0 and test.job.resume files. Then I started Aliqueit with the e switch on each 32-bit machine.
As each machine completed a cycle and created test.dat, I removed it to a holding area where I concatenated all of them into spairs.add. I then placed spairs.add into the 64-bit ggnfs_##... directory.
All seemed to work well, but the 64-bit machine needed to go back out several times for 1000000 more relations. It ended up needing to get to 113.9% of the original estimate. Is there a way I can tell if this is due to the added relations being no good, or if this is just due to an underestimate in the 64-bit machine's factmsieve.py script?
I realize that I can just run gnfs-lasieve with the same results, but I'm exploring whether this might be an easier way for those that may not want to go the gnfs-lasieve route... Thanks for any comments...
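The arithmetic behind "how high the q should get" and the non-overlapping helper ranges can be sketched roughly as below. This is only an illustration: the function names, the starting q, the percentage, and the span per helper are all hypothetical, and the extrapolation assumes the relation yield per unit of q stays constant, which tends to underestimate the real top.

```python
def projected_top_q(q_start, q_now, percent_done):
    """Extrapolate the q reached at 100%, assuming a constant relation yield.

    All three inputs are read off the main machine by hand; nothing here
    parses real Aliqueit or factmsieve.py output.
    """
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return q_start + round((q_now - q_start) * 100 / percent_done)


def helper_ranges(top_q, n_helpers, span):
    """Stack n_helpers non-overlapping [start, end) q ranges above top_q."""
    return [(top_q + i * span, top_q + (i + 1) * span) for i in range(n_helpers)]


# Hypothetical figures: sieving began at q = 20M, and the first 1M range
# brought the job to 5% of the estimated relations.
top = projected_top_q(q_start=20_000_000, q_now=21_000_000, percent_done=5)
ranges = helper_ranges(top, n_helpers=3, span=2_000_000)

print("projected top q:", top)
for machine, (start, end) in enumerate(ranges, start=2):
    print(f"machine {machine}: q from {start} to {end}")
```

The start of each range is the value that would then be edited by hand into that helper's job files.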
2011-12-12, 22:52  #2
"Ben"
Feb 2007
6314_{8} Posts 
If the relations were "no good" then you would have either had scads of error messages during filtering or perhaps a much higher than usual proportion of duplicate relations (30% or more, say). If you saw neither of these situations, then the relations were probably fine and the 113% is just due to a low initial guess or something.
As long as you are doing things manually on linux anyway, that doesn't sound any easier than just working directly with gnfs-lasieve*. Although I suppose it avoids figuring out starting Q and min rels figures.
2011-12-12, 23:36  #3
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
3·13·233 Posts 

2011-12-13, 03:12  #4
"Ed Hall"
Dec 2009
Adirondack Mtns
2·1,637 Posts 
Thanks Guys,
I saw no error messages, and these are the duplicate removals:
Code:
Mon Dec 12 06:30:50 2011 found 3884965 hash collisions in 23001181 relations
Mon Dec 12 06:31:29 2011 added 36 free relations
Mon Dec 12 06:31:29 2011 commencing duplicate removal, pass 2
Mon Dec 12 06:33:09 2011 found 3653999 duplicates and 19347218 unique
I'm going to evaluate the ease of both methods and "maybe" write some steps to take to add machines. That way I'll know where to remind myself how and maybe someone else can find it useful.
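For comparison with the "30% or more" rule of thumb mentioned earlier in the thread, the duplicate rate can be pulled out of the quoted log with a few lines of Python. This is a minimal sketch that only matches the one "found N duplicates and M unique" line shown above; the function name is made up for illustration.

```python
import re

# The duplicate-removal summary line quoted in the post above.
LOG_LINE = "Mon Dec 12 06:33:09 2011 found 3653999 duplicates and 19347218 unique"


def duplicate_rate(line):
    """Return duplicates / (duplicates + unique) from a duplicate-removal line."""
    match = re.search(r"found (\d+) duplicates and (\d+) unique", line)
    if match is None:
        raise ValueError("not a duplicate-removal summary line")
    duplicates, unique = (int(group) for group in match.groups())
    return duplicates / (duplicates + unique)


rate = duplicate_rate(LOG_LINE)
print(f"duplicate rate: {rate:.1%}")
```

On these figures the rate comes out a little under 16%, well short of the 30% level suggested as a warning sign.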
2011-12-13, 03:33  #5
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
3·13·233 Posts 
Seriously speaking, there's also a possibility that you estimated the necessary Q-range on the assumption that the relation yield is constant. But it isn't, and it is not easy to guesstimate. Generally, it goes down as Q goes up, but the question is how fast.
One way (frequently used before launching large projects) is a dense set of spot-checking runs (with many starting Qs and a span of 2000 or 1000), followed by a spline fit (or, better yet, with normalization by the number of reported special_q's), and a guesstimate from experience with similar runs of what the redundancy is going to be.
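A rough sketch of turning such spot checks into a Q-range estimate follows. It swaps the spline for plain piecewise-linear interpolation to stay within the standard library, the sample yields are invented for illustration, and the relation target is assumed to already include whatever allowance for redundancy experience suggests.

```python
from bisect import bisect_right


def yield_at(samples, q):
    """Linearly interpolate relations per unit of q from sorted (q, yield) samples."""
    qs = [point[0] for point in samples]
    if not qs[0] <= q <= qs[-1]:
        raise ValueError("q lies outside the sampled range")
    i = min(bisect_right(qs, q), len(qs) - 1)
    (q0, y0), (q1, y1) = samples[i - 1], samples[i]
    return y0 + (y1 - y0) * (q - q0) / (q1 - q0)


def q_needed(samples, q_start, target_rels, step=10_000):
    """Return the q at which the estimated cumulative relations reach the target.

    Returns None if the spot checks do not extend far enough.
    """
    total, q = 0.0, q_start
    while q + step <= samples[-1][0]:
        total += yield_at(samples, q + step / 2) * step  # midpoint rule
        q += step
        if total >= target_rels:
            return q
    return None


# Hypothetical spot checks: relations per unit of q, falling as q rises.
spot_checks = [(20_000_000, 2.0), (30_000_000, 1.6), (40_000_000, 1.2)]
print(q_needed(spot_checks, q_start=20_000_000, target_rels=20_000_000))
```

With a constant yield of 2.0 the same target would be reached at q = 30M, so the falling yield in this made-up data pushes the estimate out by a bit over 1M.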
2011-12-13, 04:58  #6
"Ed Hall"
Dec 2009
Adirondack Mtns
2·1,637 Posts 
Quote:
For example, let's say q is going up by 1M each time and relations are growing at a rate of 5% for each 1M. And, it started at 20M. 100% (in a perfect world) would place the top at 40M. So, let's start machine 2 at 40M. I'm hoping that the relations turned up by machine 2 will offset the 40M top of machine 1 downward more so than the diminishing relations will affect the overall count. The trickier part is figuring out the starting points for machines 3, 4, 5, etc. I don't want any overlap there either, but the further away from the machine 1 range, the less return.
Last fiddled with by EdH on 2011-12-13 at 05:00 Reason: removal of a word for clarity...
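To put numbers on that hope, here is a toy model of the two-machine case. It assumes both machines finish a 1M range in the same time, and that machine 2, working above 40M, contributes only 4% per range instead of 5%; that 4% is a made-up figure standing in for the lower yield at higher q, not a measurement.

```python
def cycles_to_target(main_percent, helper_percent, target_percent=100):
    """Count 1M cycles until the combined percentage reaches the target."""
    done, cycles = 0, 0
    while done < target_percent:
        done += main_percent + helper_percent
        cycles += 1
    return cycles


Q_START_MAIN = 20_000_000    # machine 1 starts here (from the example above)
Q_START_HELPER = 40_000_000  # machine 2 starts at the projected top
STEP = 1_000_000             # q advances by 1M per cycle

cycles = cycles_to_target(main_percent=5, helper_percent=4)
main_top = Q_START_MAIN + cycles * STEP
helper_top = Q_START_HELPER + cycles * STEP

print("cycles needed:", cycles)
print("machine 1 stops near q =", main_top)
print("machine 2 covers q from", Q_START_HELPER, "to", helper_top)
```

Under these assumptions machine 1 stops around 32M rather than 40M, so the ranges stay well apart. A third machine would need to start at or above the point where machine 2 is expected to stop, which is where the diminishing return starts to bite.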

2011-12-13, 18:58  #7
Sep 2010
Portland, OR
7·53 Posts 
Note that factmsieve.py already has some support for running multiple threads on each of multiple machines. It doesn't hook directly into aliqueit, but as bsquared said, if you're doing a bunch of manual work anyway it may be simpler just to throw the number to factmsieve and then pass the answer back to aliqueit.
