mersenneforum.org I want to factorize 2^1277-1
 Register FAQ Search Today's Posts Mark Forums Read

2018-04-24, 17:08   #23
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

11×461 Posts

Quote:
 Originally Posted by chris2be8 I *think* that a multithreaded siever could share the factor base among all threads instead of each process having it's own copy. Whether that cuts memory use enough is another matter. Chris
This is indeed what CADO does. A single 4-threaded instance of the I=17 siever will use the same memory as 1-threaded; I haven't tried more than 4 threads.

I had in mind that large-memory folks would use I=17 on the smallish Q values, while others would use I = 16 over a much larger range (say, Q=2G to Q=12G). Since this is SNFS and a poly is obvious, I may test yields this weekend for the sake of this thread and my curiosity.

 2018-04-25, 09:42 #24 fivemack (loop (#_fork))     Feb 2006 Cambridge, England 22×32×179 Posts Because I am a curious person: With 33-bit large primes (3 on rational side), 16e siever, and the polynomial 8x^7-1, the yield is 29 relations in 2031 seconds for Q=4e8 .. 4e8+1e3 Which is not zero, but is obviously not enough; suggests that a few thousand CPU-years might actually suffice. (the septic is better than the sextic here, though the rational side is still generally a good deal larger than the algebraic; I haven't tried degree-8 algebraic) I will leave further investigation to VBCurtis this weekend Last fiddled with by fivemack on 2018-04-25 at 09:42
 2018-04-26, 06:22 #25 VBCurtis     "Curtis" Feb 2005 Riverside, CA 11·461 Posts I got curious, too, so I tested the same poly as fivemack: 16f siever, prime Q only, lambda = 3.8, alim=rlim=536M, 39-bit LP: mfbr/a = 112 (3 large primes on each side) Q=400M 4062 relations 1.64 sec/rel Q=4000M 1757 relations 2.15 sec/rel At an average 2.5 sec/relation, 40G relations would take 100 gigathreadseconds, or roughly 3100 CPU-years. Sieve performance at large Q is unknown in CADO, and a factor of two or three is fairly likely. Still, 10k CPU-years for sieving should be a generous estimate. Changing mfba to 78 and alambda to 2.8 cut time to 1.0 sec/rel, but yield was 2000 or so at Q=400M. Over the 16f siever's Q-range of 50M to 4.3G (I believe Q is limited to 32 bits), an average yield that may be close to 3.0 means we can get roughly 12e9 raw relations, while 40e9 are likely needed. CADO has been used for Q up to 10G (see https://eprint.iacr.org/2012/369.pdf, where RSA-704 was factored with I=15!!!!). I'll test some more mbfr/mfba and alim/rlim settings this weekend, and report back. If anyone has advice for test-sieving with the CADO siever, please let me know! Edit to add: 16f memory use was around 1.7GB for one thread. Last fiddled with by VBCurtis on 2018-04-26 at 06:39
 2018-04-26, 06:55 #26 henryzz Just call me Henry     "David" Sep 2007 Cambridge (GMT/BST) 2·2,969 Posts It sounds to me like it would be necessary to use CADO for this factorization with either I >16 or very high Qs. Did the max q only increase from 2^31 to 2^32 between lasieve4 and lasieve5? Is it higher when composite q are considered?
 2018-04-26, 18:22 #27 chris2be8     Sep 2009 2×7×157 Posts Does 16f support -j 16 ? And if it does how much memory does it use and how much does it raise yield? Chris
 2018-04-27, 05:09 #28 VBCurtis     "Curtis" Feb 2005 Riverside, CA 11×461 Posts I tried that in past experiments, and -J 16 increases yield 40%, as expected. However, it is quite unstable; something like 25% of the test-sieves crashed (dQ=1000 per test). One candidate had no crashes, another had many, so it's possible that it is stable with certain polys. This poly did not crash in a 30-min test, and memory use was the same as 16f without the -J 16 flag. So, it's worth testing further. sec/rel for the short time I tested was slightly better than regular 16f. Edit: Naturally, I post after one test, and the next test with -J 16 crashes immediately. Last fiddled with by VBCurtis on 2018-04-27 at 05:15
 2018-05-06, 03:04 #29 VBCurtis     "Curtis" Feb 2005 Riverside, CA 11×461 Posts I did some more test-sieving, enough to figure out that lambda isn't quite what I thought it was: it seems that lambda is the multiple of the lim (in bits) that a cofactor is allowed to be, rather than a multiple of the LP bound as I thought it was. As evidence, I tested 39LP with MFB of 112, 113, 114 all with lambda = 3.8, and all were within 1% of each other for time and relation count was identical (114 was 4063 rels, 112 was 4062). When I bumped lambda to 4.0 with mfb=114, time per relation doubled while yield improved by 9-10%. I also tried 4LP on the r side, by setting mfbr to 125 and lambda to 4.2; yield increased a fair bit, but at 10 seconds per relation (cough) vs 2.5-3 for most of my other tests. I also tried mfbr at 135 or so, too high for the siever (a single special Q took over 30 minutes, with about 60 relations in the .dat when I killed the process, and lots of errors about cofactors). rlim=800m, alim=1072M, 39LP, lambda 3.9 on both sides, mfbr=113, mfba = 112 produced 2.52 sec/rel and 5103 relations for a 1kQ test at 400M. This is 25% better yield in 50% more sec/rel than my original estimate a few posts ago. I ran out of curiosity before testing a range of Q values.
 2018-05-14, 01:09 #30 kosta   Jan 2013 1110002 Posts Octic is not supported in cado.
 2021-08-28, 17:43 #31 firejuggler     "Vincent" Apr 2010 Over the rainbow 2,683 Posts I don't know much about gpu-ecm but does this https://www.mersenneforum.org/showthread.php?t=27103 could help the ecm phase?
 2021-08-28, 18:14 #32 Batalov     "Serge" Mar 2008 Phi(4,2^7658614+1)/2 59×163 Posts Irrelevant fodder moved to "I am experimenting with heuristic sieves."
2021-08-28, 20:52   #33
SethTro

"Seth"
Apr 2019

6018 Posts

Quote:
 Originally Posted by firejuggler I don't know much about gpu-ecm but does this https://www.mersenneforum.org/showthread.php?t=27103 could help the ecm phase?
Because VBCurtis was curious and having fun I also had some fun.

CGBN based ECM is possible helpful but only reduces overall ecm time by 1.5-2x so still many GPU/CPUs years left for t80.

Using the advice from paul's params.html t80 is 265K curves @ B1=25e9 for t80. Assuming we want to do 30% (50% of a t80 - 20% already done). that's 80K curves.

Using a tuned version for 1536 BITS my GPU (1080ti). processes 1792 curves @ B1=1e5 in 17.3 seconds.
Using a tuned version for 1312 BITS takes 16.4 seconds.
Using a tuned version for 1408 BITS takes 16.0 seconds.
Using a tuned version for 1280 BITS takes 12.0 seconds. (CGBN seems to need 4 carry bits, but maybe I could squeeze it down to 2)

Scaling is very linear so I project CGBN based ECM could theoretically finish the 80K curves in 80000/1792 * 25e9/1e5 * 12 seconds = 4.24 1080ti GPU-years!

Stage 2 on my system uses B2=2.5e15, I timed 2e5e12 at 362 seconds and used 4.2x per 10 larger B2, so an estimated 26788 seconds. Naively this is 80,000 * 26788 = 68 CPU core years.

This is a 16 CPU/GPU ratio so it's conceivable that a single machine could finish 30% of t80 in 4 years (or <2 years with a top end current gen GPU).

I heard 7% chance of a factor. 30% of 7% is 2.1% chance of factoring. At ~400 watts * 4 years = $1600 or an amortized$75,000 to factor.

Last fiddled with by SethTro on 2021-08-28 at 20:53

 Similar Threads Thread Thread Starter Forum Replies Last Post Godzilla Miscellaneous Math 28 2017-10-31 18:14 aaa120 Factoring 14 2008-12-07 13:14

All times are UTC. The time now is 21:57.

Fri Dec 3 21:57:57 UTC 2021 up 133 days, 16:26, 0 users, load averages: 1.94, 1.55, 1.41