20180510, 04:42  #67  
Jan 2013
2^{3}×7 Posts 
nice paper
Nice paper by Silverman and Wagstaff, I'm reading it. The main contribution is to calculate B2 from given B1, it seems.
Quote:
The paper has an interesting numerical example on page 448: they calculate the probability for a large number to have a factor between say 10^5 and 10^10  it found to be very nearly 0.5. I think it means that the mathematical expectation of the length of the next factor is e*size(D) , but the median is closer to 2*size(D). Finally, regarding the thread from 2016, its nice to know fivemack has implemented a full bayesian analysis, people by all means should use it, but the thread is missing the punchline  we all have been running way too many curves. BTW, it is the ultimate reason why everyone occasionally finds long factors with low B1's  because we should have moved to a higher B1 long before!!! The opposite, to find a short factor at a higher B1, is much rarer. Thus, the table of "Optimal Parameters" should not be spaced at 5 digits, but at say 10, 15, 20, 30, 45, and 70 digits only (based on table 6 and related explanations from the paper). The alternative, which some of you seemed to prefer, is to increase B1 very gradually and update bayesian probabilities after each curve. Or, to keep with tradition, keep the table as it is, with digits at step of 5, and the same B1, but run way less curves, I estimate roughly 60% fewer in the middle of the table where it matters. Fivemack could produce the correct number of curves from the exact bayesian approach, i am sure. Note added: ostensibly this was already done by Silverman and Wagstaff, in table 7, but i dont understand that table yet. Is supposed to represent failure of one curve or of the prescribed number of curves? Last fiddled with by kosta on 20180510 at 05:15 

20180510, 05:27  #68 
"Curtis"
Feb 2005
Riverside, CA
1010010011110_{2} Posts 
Kosta
(edited to remove my first statement, which wasn't very relevant) You also think "expected" means "gonna be that size". Logic like yours is what makes rookie Mersenne hunters think they should skip a big range after a found prime, because the *expected* next one is something like 50% bigger exponent. Both your logic and the primehunter logic miss the nature of the distributions entirely. Primes, like factors, are randomly distributed. No factor from 10^10 to 10^20 does nothing at all to tell us if the next factor is 21 digits or 31 or 41. Fivemack's Bayesian tool does use "no factor from these curves" info to decide what curves to run next, but that tool is far far from skipping from t45 curves directly to t70 curves like you suggested. What the tool does tell us roughly matches one of your suggestions do half (or less) of a "Tlevel" at a given B1 and then move up to the next level. I do this myself, and think it's quite a bit more efficient; but I don't do (much) less ECM effort I just spend more time at large bounds and less at smalltomedium bounds. For example, I might run 300 curves at 8M, 800 at 20M, 2000 at 60M, etc. Edit: the Prime Number Theorem gives us a good estimate for the expected factors from 10^5 to 10^10: 1/6 + 1/7 + 1/8 + 1/9 + 1/10. That's a bit over a half, roughly 0.65 expected factors, because more than one factor is possible in that range. Last fiddled with by VBCurtis on 20180510 at 05:32 
20180510, 05:53  #69 
Jan 2013
2^{3}×7 Posts 
@VBCurtis
If you are running 1/3 of the curves, at it seems you are actually doing, then it is nearly optimal. I have no problem with that. 
20180510, 08:00  #70 
Jan 2013
2^{3}×7 Posts 
What we are missing here is what the economists call a utility function. Someplace in that paper, Silverman and Wagstaff point out that running curves to 11/e probability of success is optimal in the sense of probability of success divided by computational effort. Thats ok at the level of one number.
However, for a planetary scale effort like yoyo@home the cost function is the CPUyears or Teraflopweeks or Megawatthours (these are equivalent up to a multiplicative constant), but the objective function? It is not obvious! You could define it: A) as the total quantity of complete factorizations in the e.g. Cunnnigham tables as a whole, B) total quantity of factors found C) or the total volume of digits of the factors found. I do not know how to analyse A). B) is no good as it gives no credit for longer factors. I'm afraid even C) would favor moving on to another target after running t30t40 digits, i am pretty sure, but would be nice to confirm it by some more precise calculation. 
20180510, 08:35  #71 
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
2^{2}×1,259 Posts 
I don't think yoyo people care about the objective function, they just care about points per task processed, not to minimise the problem.

20180510, 12:47  #72 
Einyen
Dec 2003
Denmark
7·11·43 Posts 
There is also a good ECM discussion here: http://www.mersenneforum.org/showthread.php?t=22609
In post #36#48 I ran some test on constructed numbers with known factors. For example running the expected curves at the 35 digit level found a 30 digit factor 99.9% of the time, so skipping digit levels means if there is a missed lower factor it will be found very quickly at higher B1/B2. I am wondering if the built in increment option in GMPECM might not actually be a good way to find factors optimally, but I do not have the skills to analyze it correctly and find the optimal factor: I f increment B1 by f*sqrt(B1) on each run 
20180510, 15:44  #73  
Jan 2013
2^{3}×7 Posts 
Quote:
The ONLY thing missing is finding a consensus on the global objective function. Note added: hmm, maybe we can postpone a discussion about global optimization, maybe "find me the next factor asap" as a objective which is ok with all of us? https://en.wikipedia.org/wiki/Bayesian_search_theory Last fiddled with by kosta on 20180510 at 16:23 

20180511, 07:24  #74 
Jan 2013
2^{3}×7 Posts 
Mathematicians: 1  Amateurs: 0
OK, i got some private schooling from Silverman, that and a little bit more reading of their paper concentrated my mind. I was in error, so sorry to everyone for the commotion. I made a little Q&A to make it simple, but to fully understand the issues one needs to read the paper of Silverman and Wagstaff. Q1: Is the table titled "Optimal parameters for ECM" actually optimal? A1: Yes, within a few percent. Q2: Is running all the recommended curves to the probability of success 11/e optimal? A2: Yes, it is optimal, IF you run at fixed B1. Q3: Why are the steps of that table at 5 digit increments? A3: The amount of work is increases roughly by a factor of 10 at each increment. That is enough to make sure that any loss of efficiency from a fixed B1 schedule is less than a few percent. Much closer AS WELL AS much wider spacing will be seriously wasteful. Silverman and Wagstaff, in section 5, tells you that even after running full t40 the most likely (and least costly too) place to find a factor is still at 41 digits  not at 45 nor at 70. The expected length of the next factor is e*40 =~ 108 digits. Q4: Is that schedule Kosta proposed with target digits spaced 1.5 times apart, i.e. at 15,20,30,45 and 70 digits better than the default? A4: Definitely not. IT IS wasteful to look for factors of 70 digits before more easy places to find them have been exhausted. (I was totally wrong about that) Q5: I want to gain that last few percent of efficiency. What must I do? A5: You must move from a fixed schedule to one that increased B1 gradually to the optimal value after each curve. New optimal B1 can be calculated after each curve using the bayesian estimation technique as in section 5 of that paper. Q6: Is there an easier way to optimally track the most likely place to find the next factor? A6: Theoretically, YES! The "I f" switch available in GMPECM should accomplish the desired thing. It increases B1 each time by sqrt(B1). Empyrically it does fit the approximately quadratic increase of B1 relative to the number of curves that one can observe in the "Optimal parameters for ECM" table. But that does not correspond to Lenstra's asymptotic complexity analysis, in which both number of curves and B1 go like $L^(1/sqrt(2)$ so i am not sure whats the bottom line on that "I" switch. Guidance from professionals would be very appreciated. 
20180512, 03:58  #75 
Jan 2013
2^{3}·7 Posts 
yoyo@home found a 67digit factor of C243_M127_k24.
Many thanks! Code:
Using B1=260000000, B2=3178559884516, polynomial Dickson(30), sigma=0:11839839710820506480 Step 2 took 328547ms ********** Factor found in step 2: 8801385944588148587153855343468735536920009142571596149280058509737 Found prime factor of 67 digits: 8801385944588148587153855343468735536920009142571596149280058509737 Composite cofactor 94410615373218704797201510125651849712559525623382519147033843966374414757318643595714477201887765034604269728700950434464990003242159016268314282962617716190928317011619566657 has 176 digits 
20180512, 05:22  #76 
Jan 2013
2^{3}·7 Posts 
Anyone want to take that cofactor for NFS?

20180520, 16:40  #77 
Jun 2012
2^{2}×5^{3}×7 Posts 
A bit of topic drift but C167_M31_k33 was factored by NFS@Home. Unfortunately previous ECM to t50 failed to find a single factor.
Code:
prp47 factor: 23282791356882039371488041461863853073033302989 prp50 factor: 49673385765434323132723414845619731999899898099073 prp71 factor: 73682911400163260264427017516156388364001743821334826866765033435391533 
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Stop Reporting M127  jinydu  Information & Answers  7  20080907 06:45 