mersenneforum.org Help factoring M127^k-1
 Register FAQ Search Today's Posts Mark Forums Read

2018-05-10, 04:42   #67
kosta

Jan 2013

23×7 Posts
nice paper

Nice paper by Silverman and Wagstaff, I'm reading it. The main contribution is to calculate B2 from given B1, it seems.

Quote:
 Originally Posted by henryzz Originally Posted by RDS: If one finds a factor D of a much larger number, then the next expected factor (ordered by size) is D^e. i.e. the size of the next factor is e*size(D). This may be derived from the Erdos-Kac theorem by considering order statistics of a normal random variable.
OK, so the exact factor is e, not 2. Close enough for me Sadly, that makes it even more unlikely to find the next factor even after 25 digits.

The paper has an interesting numerical example on page 448: they calculate the probability for a large number to have a factor between say 10^5 and 10^10 - it found to be very nearly 0.5. I think it means that the mathematical expectation of the length of the next factor is e*size(D) , but the median is closer to 2*size(D).

Finally, regarding the thread from 2016, its nice to know fivemack has implemented a full bayesian analysis, people by all means should use it, but the thread is missing the punchline - we all have been running way too many curves. BTW, it is the ultimate reason why everyone occasionally finds long factors with low B1's - because we should have moved to a higher B1 long before!!! The opposite, to find a short factor at a higher B1, is much rarer. Thus, the table of "Optimal Parameters" should not be spaced at 5 digits, but at say
10, 15, 20, 30, 45, and 70 digits only (based on table 6 and related explanations from the paper). The alternative, which some of you seemed to prefer, is to increase B1 very gradually and update bayesian probabilities after each curve. Or, to keep with tradition, keep the table as it is, with digits at step of 5, and the same B1, but run way less curves, I estimate roughly 60% fewer in the middle of the table where it matters. Fivemack could produce the correct number of curves from the exact bayesian approach, i am sure.

Note added: ostensibly this was already done by Silverman and Wagstaff, in table 7, but i dont understand that table yet. Is supposed to represent failure of one curve or of the prescribed number of curves?

Last fiddled with by kosta on 2018-05-10 at 05:15

 2018-05-10, 05:27 #68 VBCurtis     "Curtis" Feb 2005 Riverside, CA 10100100111102 Posts Kosta- (edited to remove my first statement, which wasn't very relevant) You also think "expected" means "gonna be that size". Logic like yours is what makes rookie Mersenne hunters think they should skip a big range after a found prime, because the *expected* next one is something like 50% bigger exponent. Both your logic and the prime-hunter logic miss the nature of the distributions entirely. Primes, like factors, are randomly distributed. No factor from 10^10 to 10^20 does nothing at all to tell us if the next factor is 21 digits or 31 or 41. Fivemack's Bayesian tool does use "no factor from these curves" info to decide what curves to run next, but that tool is far far from skipping from t45 curves directly to t70 curves like you suggested. What the tool does tell us roughly matches one of your suggestions- do half (or less) of a "T-level" at a given B1 and then move up to the next level. I do this myself, and think it's quite a bit more efficient; but I don't do (much) less ECM effort- I just spend more time at large bounds and less at small-to-medium bounds. For example, I might run 300 curves at 8M, 800 at 20M, 2000 at 60M, etc. Edit: the Prime Number Theorem gives us a good estimate for the expected factors from 10^5 to 10^10: 1/6 + 1/7 + 1/8 + 1/9 + 1/10. That's a bit over a half, roughly 0.65 expected factors, because more than one factor is possible in that range. Last fiddled with by VBCurtis on 2018-05-10 at 05:32
 2018-05-10, 05:53 #69 kosta   Jan 2013 23×7 Posts @VBCurtis If you are running 1/3 of the curves, at it seems you are actually doing, then it is nearly optimal. I have no problem with that.
 2018-05-10, 08:00 #70 kosta   Jan 2013 23×7 Posts What we are missing here is what the economists call a utility function. Someplace in that paper, Silverman and Wagstaff point out that running curves to 1-1/e probability of success is optimal in the sense of probability of success divided by computational effort. Thats ok at the level of one number. However, for a planetary scale effort like yoyo@home the cost function is the CPU-years or Teraflop-weeks or Megawatt-hours (these are equivalent up to a multiplicative constant), but the objective function? It is not obvious! You could define it: A) as the total quantity of complete factorizations in the e.g. Cunnnigham tables as a whole, B) total quantity of factors found C) or the total volume of digits of the factors found. I do not know how to analyse A). B) is no good as it gives no credit for longer factors. I'm afraid even C) would favor moving on to another target after running t30-t40 digits, i am pretty sure, but would be nice to confirm it by some more precise calculation.
 2018-05-10, 08:35 #71 pinhodecarlos     "Carlos Pinho" Oct 2011 Milton Keynes, UK 22×1,259 Posts I don't think yoyo people care about the objective function, they just care about points per task processed, not to minimise the problem.
 2018-05-10, 12:47 #72 ATH Einyen     Dec 2003 Denmark 7·11·43 Posts There is also a good ECM discussion here: http://www.mersenneforum.org/showthread.php?t=22609 In post #36-#48 I ran some test on constructed numbers with known factors. For example running the expected curves at the 35 digit level found a 30 digit factor 99.9% of the time, so skipping digit levels means if there is a missed lower factor it will be found very quickly at higher B1/B2. I am wondering if the built in increment option in GMP-ECM might not actually be a good way to find factors optimally, but I do not have the skills to analyze it correctly and find the optimal factor: -I f increment B1 by f*sqrt(B1) on each run
2018-05-10, 15:44   #73
kosta

Jan 2013

23×7 Posts

Quote:
 Originally Posted by ATH There is also a good ECM discussion here: http://www.mersenneforum.org/showthread.php?t=22609
That thread has a very fine post by Dubslow at #35, which is totally in agreement with what I'm trying to say here: the recommended curves at 5 digit steps are too much! But until now noone has been able to analyse the situation completely at a global level. It is necessary to keep track of probabilities of the factors remaining of all lengths, not just the length targeted at that B1. The bayesian approach is sufficient to accomplish this.

The ONLY thing missing is finding a consensus on the global objective function.

Note added: hmm, maybe we can postpone a discussion about global optimization, maybe "find me the next factor asap" as a objective which is ok with all of us?
https://en.wikipedia.org/wiki/Bayesian_search_theory

Last fiddled with by kosta on 2018-05-10 at 16:23

 2018-05-11, 07:24 #74 kosta   Jan 2013 23×7 Posts Mathematicians: 1 - Amateurs: 0 OK, i got some private schooling from Silverman, that and a little bit more reading of their paper concentrated my mind. I was in error, so sorry to everyone for the commotion. I made a little Q&A to make it simple, but to fully understand the issues one needs to read the paper of Silverman and Wagstaff. Q1: Is the table titled "Optimal parameters for ECM" actually optimal? A1: Yes, within a few percent. Q2: Is running all the recommended curves to the probability of success 1-1/e optimal? A2: Yes, it is optimal, IF you run at fixed B1. Q3: Why are the steps of that table at 5 digit increments? A3: The amount of work is increases roughly by a factor of 10 at each increment. That is enough to make sure that any loss of efficiency from a fixed B1 schedule is less than a few percent. Much closer AS WELL AS much wider spacing will be seriously wasteful. Silverman and Wagstaff, in section 5, tells you that even after running full t40 the most likely (and least costly too) place to find a factor is still at 41 digits - not at 45 nor at 70. The expected length of the next factor is e*40 =~ 108 digits. Q4: Is that schedule Kosta proposed with target digits spaced 1.5 times apart, i.e. at 15,20,30,45 and 70 digits better than the default? A4: Definitely not. IT IS wasteful to look for factors of 70 digits before more easy places to find them have been exhausted. (I was totally wrong about that) Q5: I want to gain that last few percent of efficiency. What must I do? A5: You must move from a fixed schedule to one that increased B1 gradually to the optimal value after each curve. New optimal B1 can be calculated after each curve using the bayesian estimation technique as in section 5 of that paper. Q6: Is there an easier way to optimally track the most likely place to find the next factor? A6: Theoretically, YES! The "-I f" switch available in GMP-ECM should accomplish the desired thing. It increases B1 each time by sqrt(B1). Empyrically it does fit the approximately quadratic increase of B1 relative to the number of curves that one can observe in the "Optimal parameters for ECM" table. But that does not correspond to Lenstra's asymptotic complexity analysis, in which both number of curves and B1 go like $L^(1/sqrt(2)$ so i am not sure whats the bottom line on that "-I" switch. Guidance from professionals would be very appreciated.
 2018-05-12, 03:58 #75 kosta   Jan 2013 23·7 Posts yoyo@home found a 67-digit factor of C243_M127_k24. Many thanks! Code: Using B1=260000000, B2=3178559884516, polynomial Dickson(30), sigma=0:11839839710820506480 Step 2 took 328547ms ********** Factor found in step 2: 8801385944588148587153855343468735536920009142571596149280058509737 Found prime factor of 67 digits: 8801385944588148587153855343468735536920009142571596149280058509737 Composite cofactor 94410615373218704797201510125651849712559525623382519147033843966374414757318643595714477201887765034604269728700950434464990003242159016268314282962617716190928317011619566657 has 176 digits
 2018-05-12, 05:22 #76 kosta   Jan 2013 23·7 Posts Anyone want to take that cofactor for NFS?
 2018-05-20, 16:40 #77 swellman     Jun 2012 22×53×7 Posts A bit of topic drift but C167_M31_k33 was factored by NFS@Home. Unfortunately previous ECM to t50 failed to find a single factor. Code: prp47 factor: 23282791356882039371488041461863853073033302989 prp50 factor: 49673385765434323132723414845619731999899898099073 prp71 factor: 73682911400163260264427017516156388364001743821334826866765033435391533

All times are UTC. The time now is 14:50.

Thu May 19 14:50:29 UTC 2022 up 35 days, 12:51, 0 users, load averages: 1.29, 1.52, 1.57