2008-11-18, 19:49  #1
Jun 2005
lehigh.edu
2^{10} Posts 
Filtering on large NFS jobs, particularly 2^908+1
Quote:
initial filtering bound on large primes (set for memory use), to 720M (for the last part). Here that's Code:
...
Fri Oct 3 22:24:35 2008 found 80406920 duplicates and 301728045 unique relations
Fri Oct 3 22:24:35 2008 memory use: 2195.0 MB
Fri Oct 3 22:24:36 2008 reading rational ideals above 339804160
Fri Oct 3 22:24:36 2008 reading algebraic ideals above 339804160
...
Fri Oct 3 23:11:31 2008 301728045 relations and about 108858095 large ideals
...
Sat Oct 4 18:20:06 2008 removing 1161225 relations and 761225 ideals in 400000 cliques
Sat Oct 4 18:20:07 2008 commencing in-memory singleton removal
Sat Oct 4 18:20:12 2008 begin with 50364234 relations and 31834525 unique ideals
Sat Oct 4 18:20:41 2008 reduce to 50341956 relations and 31050679 ideals in 5 passes
Sat Oct 4 18:20:41 2008 max relations containing the same ideal: 17
Sat Oct 4 18:21:10 2008 removing 869078 relations and 589859 ideals in 279219 cliques
...
Sat Oct 4 18:21:46 2008 reduce to 49463363 relations and 30451234 ideals in 5 passes
Sat Oct 4 18:21:46 2008 max relations containing the same ideal: 17
Sat Oct 4 18:21:51 2008 dataset too sparse, retrying
Sat Oct 4 18:21:52 2008 reading rational ideals above 720000
Sat Oct 4 18:21:52 2008 reading algebraic ideals above 720000
Sat Oct 4 18:21:52 2008 commencing singleton removal, final pass
Sat Oct 4 22:59:33 2008 keeping 145552576 ideals with weight <= 25, new excess is 12962109
Sat Oct 4 23:03:59 2008 memory use: 5161.8 MB
Sat Oct 4 23:04:07 2008 commencing in-memory singleton removal
Sat Oct 4 23:04:42 2008 begin with 209403509 relations and 145552576 ...

The first filter bound reports removing enough relations/ideals to drop the number of relns from 301M to 50M. I can see that dropping the filter bound will raise the number of ideals; so there were 30.4M above 339804160, and then 145552576 ideals above 720K. But I'm not clear on why the number of relns didn't stay at 50M, and instead jumped way up to 209M. That sounds like c. 250M relns removed, then some 150M relns put back.
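As a rough check of the "c. 250M removed, then some 150M put back" estimate, the relation counts in the log can simply be subtracted. This is a minimal sketch: the variable names are illustrative, the numbers are copied from the log, and it only does the arithmetic without explaining why the retry keeps more relations.

```python
# Relation counts copied from the filtering log in this post.
unique_after_dedup = 301_728_045  # "found ... 301728045 unique relations"
first_pass_start = 50_364_234     # "begin with 50364234 relations" (bound 339804160)
first_pass_end = 49_463_363       # last "reduce to" before "dataset too sparse"
second_pass_start = 209_403_509   # "begin with 209403509 relations" (bound 720000)

def removed_then_restored():
    """Return (relations dropped by the first logged in-memory pass,
    extra relations present when the retry starts)."""
    removed = unique_after_dedup - first_pass_start
    restored = second_pass_start - first_pass_end
    return removed, restored

removed, restored = removed_then_restored()
print(f"dropped before the first in-memory pass: {removed / 1e6:.1f}M")
print(f"extra relations at the start of the retry: {restored / 1e6:.1f}M")
```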
Our current number is 2,908+ C268, on which we're sieving with 32-bit large primes on both the algebraic and the rational sides. We got a 25.729M^2 matrix with 324.779M non-duplicate relns, and I've been waiting since the 5^421 factorization report for an update. The number of non-dup relations went up to 414.150M, and the filter got past 720K at 26 hours (at which point my question ...). It's now another 20 hours since then, and the max relns has just dropped from 20 to 19. Meanwhile, another 100M range of q's has finished, and I'm hoping that one more will get us up past Greg's target of 450M non-dup.

Bruce

30M-500M first; added 500M-800M; with 800M-900M during duplicate removal and filtering the new relns.

Last fiddled with by bdodson on 2008-11-18 at 19:56 Reason: that's 100M ...

2008-11-18, 21:36  #2
Tribal Bullet
Oct 2004
3×1,163 Posts 
Quote:
The goal here is to get a dataset that has the correct amount of excess and for which the maximum ideal weight is about 20, or slightly more. If the max weight is much less, then you are probably hiding usable relation sets from the merge phase (which can do spanning-tree-based merges up to weight 20), and the initial parameters should have been set up differently. The clique processing restarts when the max ideal weight is less than 18.
Quote:


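The restart rule described in this post can be illustrated against the log lines from post #1. This is only a sketch: the threshold of 18 is taken from the post, while the function names and the regular expression are illustrative assumptions and are not part of msieve.

```python
import re

RESTART_BELOW = 18  # from the post: clique processing restarts below this weight

def max_ideal_weights(log_text):
    """Return every 'max relations containing the same ideal' value in a log."""
    pattern = r"max relations containing the same ideal: (\d+)"
    return [int(value) for value in re.findall(pattern, log_text)]

def would_restart(max_weight, threshold=RESTART_BELOW):
    """Apply the rule as stated: restart when the max weight is under the threshold."""
    return max_weight < threshold

sample = """\
Sat Oct 4 18:21:46 2008 max relations containing the same ideal: 17
Sat Oct 4 18:21:51 2008 dataset too sparse, retrying
"""
for weight in max_ideal_weights(sample):
    print(weight, "restart" if would_restart(weight) else "keep")
```

With the logged value of 17 the rule says "restart", which is consistent with the "dataset too sparse, retrying" line that follows it in the log.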
2008-11-18, 23:34  #3
(loop (#_fork))
Feb 2006
Cambridge, England
6,323 Posts 
Bruce: would you mind elaborating a little on sieving yields for 2,908+?
I decipher your paragraph as

AR 30M-500M got 324.779M non-duplicate and a 25.729M^2 matrix
AR 30M-800M got 414.150M non-duplicate, matrix size not yet known
AR 30M-900M is expected to not quite reach 450M non-duplicate

which suggests either an incredibly low raw rate of relations (200k relations on each side in a million-Q range?) or an absolutely cataclysmic rate of duplication. What's your factor-base size, and what sort of raw yield of relations per million-Q range are you seeing on each side? I'm wondering if you're past the crossover point for gnfs-lasieve4I16e.

I'm generally unhappy with yields below 1M relations per side in a 1MQ range: the furthest-up range that I sieved for 5,421 was 339.5M-340M with gnfs-lasieve4I15e

total yield: 541558, q=340000019 (0.68690 sec/rel)

and 3221,73 R 80M-81M is

total yield: 1532391, q=81000001 (0.43947 sec/rel)

with an already rather worrying duplication proportion of 11.1% among the first 95 million relations. Whilst 2^908+1 S274 is a very big number, I'm surprised that it's big enough for the yields to have dropped back by almost another order of magnitude from an S253; does 4x^6+1 have truly dismal root properties?
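To compare the quoted yields against the "1M relations per side in a 1MQ range" rule of thumb, each figure can be scaled to a 1M-wide special-q range. A minimal sketch; the helper name is illustrative, and the range endpoints are read off the post rather than taken from any siever output.

```python
def yield_per_million_q(total_yield, q_start, q_end):
    """Scale a sieving yield to relations per 1M-wide special-q range."""
    return total_yield * 1_000_000 / (q_end - q_start)

# Figures quoted in the post (one side each).
runs = {
    "5,421 339.5M-340M (15e)": (541_558, 339_500_000, 340_000_000),
    "second quoted job, R 80M-81M": (1_532_391, 80_000_000, 81_000_000),
}
for name, (rels, lo, hi) in runs.items():
    print(f"{name}: {yield_per_million_q(rels, lo, hi):,.0f} relations per 1MQ")
```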
2008-11-19, 02:08  #4
Jun 2005
lehigh.edu
10000000000_{2} Posts 
Quote:
Quote:
Quote:
Perhaps this is a diagnostic step, to see that there are enough relns, but for these large datasets it seems a bit expensive. Or perhaps I misread your reply? Quote:
Code:
Tue Nov 18 15:44:56 2008 found 25001658 cycles, need 22508330
Tue Nov 18 15:45:24 2008 weight of 22508330 cycles is about 1575923426 (70.02/cycle)
progress than I was hoping for; looks like I'll be sieving further than I expected/hoped. On the duplicates, I got Code:
p908m300m400.err:Found 72271509 unique, 2875993 duplicate, and 0
p908m400m500.err:Found 65544866 unique, 1000514 duplicate, and 0
p908m500m600.err:Found 60542443 unique, 1572584 duplicate, and 0
p908m600m700.err:Found 56633905 unique, 1259063 duplicate, and 0
p908m700m800.err:Found 53454220 unique, 1038601 duplicate, and 0
of 414.150-324.779 = 89.371M did see quite a bit of a hit from duplicates. This is with rlim = alim: 90000000.

Bruce
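The per-file duplicate rates, and the gap between what the newest files report and what survived merging, can be tabulated from the lines above. A sketch under one assumption: the file names are read as 100M-wide special-q ranges (for example p908m500m600 as 500M-600M), which the post does not state explicitly.

```python
# (assumed q range, unique, duplicate) copied from the .err lines in this post.
ranges = [
    ("300M-400M", 72_271_509, 2_875_993),
    ("400M-500M", 65_544_866, 1_000_514),
    ("500M-600M", 60_542_443, 1_572_584),
    ("600M-700M", 56_633_905, 1_259_063),
    ("700M-800M", 53_454_220, 1_038_601),
]

def dup_rate(unique, duplicate):
    """Fraction of a file's raw relations that were duplicates within that file."""
    return duplicate / (unique + duplicate)

for name, uniq, dup in ranges:
    print(f"{name}: {uniq / 1e6:.3f}M unique, {100 * dup_rate(uniq, dup):.2f}% dup")

# The three ranges added after the 324.779M-relation run.
new_ranges_unique = sum(uniq for _, uniq, _ in ranges[2:])
new_unique_millions = round(414.150 - 324.779, 3)
print("500M-800M unique within files:", new_ranges_unique)
print("new unique after merging (millions):", new_unique_millions)
```

The within-file duplicate rates are small; the large loss only shows up when the new ranges are merged with the existing relations.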

2008-11-19, 02:44  #5
Tribal Bullet
Oct 2004
3×1,163 Posts 
No, you have it correct. In this case the dataset was sparse enough that you could have skipped the second clique removal pass, but there are other factorizations where the max ideal weight was 17 (for example) and the largest relation set after merging also had 17 relations, meaning that you could have gotten away with a smaller matrix if you had a higher max frequency going into the merge phase. Perhaps it would be better to run the merge phase anyway (it only takes a few minutes) and find out if merging could benefit from a rerun of the clique removal.
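The check suggested here (run the merge phase, then see whether it was limited by the max ideal weight going in) could be expressed as a simple comparison. This is a hypothetical sketch: the function and its inputs are illustrative and do not correspond to any msieve option or output field.

```python
def merge_may_be_capped(max_ideal_weight_in, largest_relation_set):
    """Heuristic from the post: if the largest merged relation set is as big
    as the max ideal weight allowed into the merge phase, a higher max
    frequency might have produced a smaller matrix."""
    return largest_relation_set >= max_ideal_weight_in

# The example from the post: max ideal weight 17, largest relation set 17.
print(merge_may_be_capped(17, 17))  # True
print(merge_may_be_capped(20, 17))  # False
```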

2008-11-19, 17:59  #6
(loop (#_fork))
Feb 2006
Cambridge, England
6,323 Posts 
Those yields look really alarmingly low to me, to the point that I wonder if you're using lasieve14e: with lasieve15e I get (in a 1k interval at 500M on the algebraic and rational sides respectively)
total yield: 837, q=500001001 (1.95886 sec/rel)
total yield: 763, q=500001001 (2.43182 sec/rel)

For the 90M .. 90M+1k range I also tried using 16e, which is much slower per relation but gets more than twice as many relations per Q:

total yield: 1492, q=90001003 (1.31825 sec/rel) with 15e
total yield: 3163, q=90001003 (1.93834 sec/rel) with 16e
total yield: 1748, q=500001001 (2.59062 sec/rel) for 500M .. 500M+1k using 16e
Code:
n: 1523730872006476065948363676129458676149278444031460349472566973967334168144122839831601974268424799167249117433966350288390769855219490595979807225811317497929238722439950754301281289518517474506321086318238010563399750224881258591495613213552981037427691447296080033
c6: 4
c0: 1
Y1: 1
Y0: 2854495385411919762116571938898990272765493248
type: snfs
skew: 1
rlambda: 2.6
alambda: 2.6
alim: 90000000
rlim: 90000000
lpbr: 32
lpba: 32
mfbr: 64
mfba: 64
Last fiddled with by fivemack on 2008-11-19 at 18:02
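As a sanity check on the polynomial pair in this job file, the algebraic polynomial 4x^6+1 evaluated at the rational root should reproduce 2^908+1. A minimal sketch using only the coefficients shown above; it does not check that the listed n divides 2^908+1.

```python
# Coefficients copied from the job file in this post.
c6, c0 = 4, 1
Y1 = 1
Y0 = 2854495385411919762116571938898990272765493248

def algebraic(x):
    """Evaluate the algebraic polynomial c6*x^6 + c0."""
    return c6 * x**6 + c0

# Common root up to sign; the sign does not matter because the degree is even.
m = Y0 // Y1

print(Y0 == 2**151)
print(algebraic(m) == 2**908 + 1)
print(algebraic(-m) == 2**908 + 1)
```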
2008-11-19, 18:32  #7
Oct 2004
Austria
7·353 Posts 
Quote:


2008-11-19, 19:49  #8
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
21703_{8} Posts 
Quote:
Serge

P.S. Sorry, I don't do Windows and cannot answer that. The GGNFS project may later have everything updated, when Chris releases the new version (which was pre-announced in the Yahoo group).

Last fiddled with by Batalov on 2008-11-19 at 20:19 Reason: patched binaries

2008-11-19, 20:50  #10
Jun 2005
lehigh.edu
400_{16} Posts 
Quote:
binaries; just 15e. Root properties should be reflected in the scores(?); we have Code:
size score = 3.818099e-13, Murphy alpha = 1.946683, combined = 2.189247e-13
purpose, extra work sieving, attempting to keep the matrix from getting out of range. Greg's .poly says skew 0.7937, but everything else matches. In fact, I did 30M-100M with alim:=100M, rlim:=110M (for skew < 1); but that hardly made a dent. If we survive P908, next up would be M919, for which it's clear that we'll need something more like 150M. About Quote:
is "Found 163201489 unique, 7429080 duplicate". That was 174500816 = 170630568+3870248 raw from the three 100M ranges, then another 7429080 duplicates for the entire 300M range, which should have been 163.2M "raw uniq", of which only the 89.371M was new uniq out of 30M-800M. I'm not panicking here; just settling in to head towards 1500M or so.

By contrast, M857 is seriously broken, according to a report from Batalov with another instance of the "empty col" error, for which it seems that Jason and he have a patch. It's not a corrupt file. I'm wondering whether I can get past the difficulty (the heavy cols are too heavy, so that removing them leaves things too sparse?) with some additional oversieving, or whether to wait for an update. I'm not in a hurry; 2p908 has my attention for the moment.

Bruce

PS - I did some extended checking with the 5p389 sieving, which appears to confirm our previous data that for these snfs's doing both rational and algebraic just gives even more duplicates: adding another small alg range gives way more raw relns, but not after duplicates. It was always better to add a much larger rational range, after allowing for duplicates.
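The accounting in the first paragraph of this post can be laid out step by step. A sketch using only the figures quoted in the thread; the variable names are illustrative, and the one-relation difference from the reported 163201489 is left as it appears in the source.

```python
within_file_unique = 170_630_568  # "unique" summed over the three 100M ranges
within_file_dup = 3_870_248       # "duplicate" summed over the same files
cross_file_dup = 7_429_080        # further duplicates across the whole 300M range
new_unique_millions = 89.371      # 414.150M - 324.779M

raw = within_file_unique + within_file_dup
unique_300m = within_file_unique - cross_file_dup
new_fraction = new_unique_millions * 1e6 / unique_300m

print("raw relations from 500M-800M:", raw)
print("unique within the 300M range:", unique_300m)
print(f"share that was new against 30M-500M: {100 * new_fraction:.1f}%")
```

On these figures roughly half of the relations that were unique within the 500M-800M range were already present in the 30M-500M set.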

2008-11-19, 22:43  #11
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
5×1,831 Posts 
Digression: the short "empty col" story
Quote:
M857 may be far too large for such tweaking. (For reference, my lucky filtering run was something like 8th... but in my case they only took a few hours each. For monstrous projects, like Bruce's or Tom's, it's days.) Jason says he will take a solid (no-luck-involved) fix on this in 1.39. And then M857 could be the first pancake.

Serge

___
*meaning, most likely you will never be affected. Specifically, GNFS projects will definitely not be affected (as per Jasonp).
