mersenneforum.org > Prime Search Projects > Conjectures 'R Us
Old 2008-01-04, 16:45   #23
gd_barnes
 

Quote:
Originally Posted by VBCurtis View Post
You're welcome for the analysis. The k=5 or multi-k sieve calculation of when to break off pieces is quite a bit different, though; I simplified and rounded down a great deal for this project because of the "wasted sieve" effect of finding a prime earlier. For Conjectures 'R Us, we should test rather earlier than optimal because of the sizable shrink in project time every time a prime is found and its k removed.

Fair warning: remainder of post is off-topic for this forum, just sieving discussion.

The k=5/multi-k analysis is done by a marginal analysis. I did some trial and error testing, comparing the marginal speedup in the sieve from breaking off a range versus the lost factors from not leaving that range in. I learned that while the sieve in principle has speed inversely proportional to the square root of the n-range, in practice breaking off a small range produces almost no speedup, at the cost of finding fewer factors per p tested (fewer candidates in the sieve means fewer factors per p-range). Extensive experimentation showed me that for the "big sieve" that sheep is running, a range should be broken off when factors-found time is equal to 2.2 to 2.3 times the LLR time for that range!!! This is how I arrived at 250T as optimal for 1250k LLR tests, and why I ignored all of your opinions that I had sieved enough at 60T. I'd done the calculations, and I am quite sure of what I am doing.
The actual calculation consisted of calculating the time to sieve the next 10T block with the lowest range (I used 50k range-pieces) left in, and the time to sieve with it removed. I then noted the expected factors found for each case. If the time saved from sieving with the range removed was not enough to LLR the difference in expected factors, the small range stayed in.

The upshot of this is that when sieving a large range and planning to test the entire sieve, break off pieces when factors-found time is *double* LLR time for a range, until that time is also equal to the LLR time for the 70% point of the entire sieve. At that point, the sieve is complete. For my big sieve, this will happen at 800T or so.

-Curtis
Brilliant! You are truly the sieving GOD! I think you should start doing our team sieves!

It's interesting that you mention ignoring my suggestions on k=5 to go ahead and break off the pieces. I think I remarked that I didn't claim to know the math involved, but that we had so many LLRers ready that I thought the timing of when to do so made sense at that point. But I'm sure you were dead-on about the efficiency of when to break off the individual pieces. It's just that n=470K-4M is such a HUGE range! I had suspected that you had to be doing some sort of marginal analysis. Like in calculus...the 'instantaneous rate of change', so to speak.

Now that you mention it, it does make a lot of sense to sieve until the time per factor found is 2X (or more) the LLR time of the highest-n candidate of each breakoff piece. Leaving a small n-range in the sieve adds so little time to the overall sieve time. I did make an assumption here: I assumed the 2X was referring to the highest-n candidate of each range, not the 70% point of each range. You just referred to the 'LLR time for a range'. Not that it makes much difference, but since we're splitting hairs here...

Technically, this would mean that I should continue sieving the team drive for Sierp base 16 for n=30K-100K just a small amount further (I think). But I am leaving out 8 k's...5 for future individual testing and 3 already reserved, which I had included in the sieve, so it should be very close (I previously thought it was over-sieved for the range). Regardless, I shall start it now (it just finished to P=600G) because I think it will help jump-start the project. Also, if a # of k's go down relatively quickly, it should be sieved more than sufficiently.

If you can and have the time, would you be able to send me in a PM (or even post here) your timing results and how you arrived at them? If not, no big deal. I'm certainly not questioning the tests, but I think it would help me understand how to design such tests myself in the future for different sieving scenarios.

You know what you need to do? Create some software with various parameters that will spit out the optimum way to sieve any range for any effort. Although the complexity would be horrendous because you'd have to allow for different speeds of machines sieving vs. LLRing, whether it's a conjectures effort, multiple-k sieve, yada, yada, yada... Actually, just creating some software that does the calculation on a very simple single-k sieve (no conjectures effort) for a wide n-range, similar to k=5 at RPS, would be a great start. That would be cool!
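Just to make the idea concrete, here's a rough sketch of the kind of calculation such a tool would do for a single decision. This is purely illustrative, not Curtis's actual procedure; every time and count below is a placeholder you'd have to measure from your own sieve and LLR output.

Code:
# Marginal break-off test: should the lowest n-range be dropped from the
# sieve before sieving the next block of p (say the next 10T)?
# All inputs are estimates you measure yourself; none are real numbers
# from this project.

def should_break_off(sieve_time_with, sieve_time_without,
                     factors_expected_with, factors_expected_without,
                     llr_time_low_range):
    """sieve_time_with / sieve_time_without: time to sieve the next p-block
    with the low n-range left in vs. removed.
    factors_expected_with / factors_expected_without: expected factors found
    in that p-block in each case.
    llr_time_low_range: average LLR time for a candidate in the low range."""
    time_saved = sieve_time_with - sieve_time_without
    factors_lost = factors_expected_with - factors_expected_without
    # Break off the low range only if the sieving time saved is worth more
    # than the LLR work the lost factors would have removed.
    return time_saved > factors_lost * llr_time_low_range

# Made-up example: removing the range saves 3 hours of sieving but costs
# 5 expected factors at 1 hour of LLR each, so the range stays in.
print(should_break_off(100.0, 97.0, 40.0, 35.0, 1.0))   # False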


Gary
Old 2008-01-05, 01:00   #24
VBCurtis
 

The short answer:
I have zero coding experience, so will not be attempting any such software. However, I do plan eventually to make my explanations concise enough to function as a recipe for optimal sieve-depth.

I said "LLR time for a range" assuming the range was relatively small enough that LLR time is constant for the entire range. Since it's a marginal analysis, in principle we're discussing breaking off a *small* piece. Consider a single k/n pair, for instance... we break off ranges to save overhead of manipulating and breaking constantly. In practice, use the average LLR time for the range, which often is the 50% point (unless there is an FFT jump in the middle of the range- then to be anally accurate you'd find the 50% point before FFT jump, 50% point of the large FFT, and do a weighted average... yea, right).

The 70% number comes from LLR time scaling roughly with the square of n, which it does via a linear scaling of iterations and a lumpy-linear scaling of per-iteration cost due to FFT jumps. Within a single FFT size, LLR timings scale linearly. For large ranges with many FFT jumps, we ignore the lumpiness and use 70%. For smaller ranges, it really doesn't matter, since the results will all be so similar.
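To put a number on it: assuming a clean n^2 law and ignoring the lumpiness, the single n whose test time equals the average over a range [n1, n2] can be worked out directly. A rough sketch (the percentages are only as good as the n^2 assumption):

Code:
# Where the "one representative n" for a range sits, assuming LLR time ~ n^2.
# The FFT lumpiness makes real growth a bit faster than n^2, which pushes
# the comparison point a bit higher than this simple formula suggests.

def effective_n(n1, n2):
    """Return the n whose n^2 equals the average of n^2 over [n1, n2]."""
    mean_n_squared = (n1 * n1 + n1 * n2 + n2 * n2) / 3.0
    return mean_n_squared ** 0.5

for n1, n2 in [(30_000, 100_000), (470_000, 4_000_000)]:
    n_eff = effective_n(n1, n2)
    print(f"{n1}-{n2}: effective n ~ {n_eff:,.0f} ({100 * n_eff / n2:.0f}% of the top)")

For typical ranges this lands in the 60-70% region of the top of the range, and the FFT jumps nudge it higher.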
-Curtis
Old 2008-01-05, 02:48   #25
gd_barnes
 

Quote:
Originally Posted by VBCurtis View Post
The short answer:
I have zero coding experience, so will not be attempting any such software. However, I do plan eventually to make my explanations concise enough to function as a recipe for optimal sieve-depth.

I said "LLR time for a range" assuming the range was relatively small enough that LLR time is constant for the entire range. Since it's a marginal analysis, in principle we're discussing breaking off a *small* piece. Consider a single k/n pair, for instance... we break off ranges to save overhead of manipulating and breaking constantly. In practice, use the average LLR time for the range, which often is the 50% point (unless there is an FFT jump in the middle of the range- then to be anally accurate you'd find the 50% point before FFT jump, 50% point of the large FFT, and do a weighted average... yea, right).

The 70% number comes from LLR time scaling roughly with the square of n, which it does via a linear scaling of iterations and a lumpy-linear scaling of per-iteration cost due to FFT jumps. Within a single FFT size, LLR timings scale linearly. For large ranges with many FFT jumps, we ignore the lumpiness and use 70%. For smaller ranges, it really doesn't matter, since the results will all be so similar.
-Curtis

This makes perfect sense now. So when breaking off HUGE pieces like I did for the team drive, I SHOULD use the 70% timing point because there are many fftlen jumps from n=25K-100K for base 16. (In which case the team drive was definitely sieved far enough, and too far for the lower range of n=25K-50K.) But for a smaller range, such as the small pieces of k=5, the 50% point makes sense.

And by extension...

If you went to anal level #2, you could LLR every n=5K range and average all of those. I actually did that a long time ago, after you told me about the 70% rule, to convince myself that it was accurate for a relatively large n-range.

And if you went to anal level #3, you could create an automated process that LLR's a single candidate, sieves a very small amount to a new optimum point, LLR's another single candidate, sieves a little again to a new optimum point, etc., etc. such that the overall testing time spent is as little as mathematically possible.
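In pseudo-Python, the loop would look something like this. It is purely a toy to show the idea: the sieve and LLR objects are made-up stand-ins for the real tools, and the constant file-juggling overhead Curtis mentioned is exactly why nobody works at this granularity.

Code:
# Toy sketch of the fully interleaved approach: sieve just enough to stay at
# break-even, then LLR the lowest remaining candidate, and repeat.
# `sieve`, `llr_test` and `avg_llr_seconds` are hypothetical stand-ins.

def run_interleaved(candidates, sieve, llr_test, avg_llr_seconds):
    """candidates: sorted list of remaining n values.
    sieve.seconds_per_factor(): current estimated cost of one more factor.
    sieve.remove_one_block(candidates): sieve a small p-block, dropping any
        newly factored n from `candidates`.
    llr_test(n): primality-test one candidate; True if prime.
    avg_llr_seconds(candidates): average LLR time over what is left."""
    primes = []
    while candidates:
        # Sieve only while a factor is still cheaper than an average test.
        while candidates and sieve.seconds_per_factor() < avg_llr_seconds(candidates):
            sieve.remove_one_block(candidates)
        if candidates:
            n = candidates.pop(0)          # lowest remaining exponent
            if llr_test(n):
                primes.append(n)
    return primes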

I suspect that HUGE projects such as GIMPS or maybe Riesel Sieve or SOB actually do something similar to 'anal level #3'. It's just that they have others do the LLRing while they do the sieving. Do you know anything about how they decide how to sieve those projects?


Gary
Old 2008-01-05, 03:42   #26
axn
 

Quote:
Originally Posted by gd_barnes View Post
I suspect that HUGE projects such as GIMPS or maybe Riesel Sieve or SOB actually do something similar to 'anal level #3'. It's just that they have others do the LLRing while they do the sieving. Do you know anything about how they decide how to sieve those projects?
GIMPS doesn't do "sieving". They do trial factoring for individual candidates.

The other projects -- SoB, RS -- are in it for the long haul, so they don't do any breaking off. They just keep on sieving the whole range forever and ever. Maybe once you guys have figured out an optimal sieving/LLR sequence, you can let them know.

Old 2008-01-05, 04:24   #27
gd_barnes
 

Quote:
Originally Posted by axn1 View Post
GIMPS doesn't do "sieving". They do trial factoring for individual candidates.

The other projects -- SoB, RS -- are in it for the long haul, so they don't do any breaking off. They just keep on sieving the whole range forever and ever. Maybe once you guys have figured out an optimal sieving/LLR sequence, you can let them know.
Thanks for the info, Axn...interesting and unexpected, to say the least.

The only conclusion that I could draw about GIMPS is that they must have some extremely fast way of trial factoring that is faster than sr1sieve or an equivalent sieve run over multiple candidates. I always thought that sieving over the largest n-range possible (that you know will be tested) is the most efficient way to go if you're in it for the long run.

And on SoB and RS, I strongly suspect that they are constrained by what people volunteer to do. If 25% of the CPU cycles that people volunteer are for sieving, then there will be some way over-sieved files out there but you certainly can't 'force' sieving people to primality test if they either can't or have poor machines for doing so (or simply don't want to). Or maybe it's the reverse...perhaps only 1-2% of CPU cycles are volunteered for sieving and hence the files aren't sieved quite far enough. But I'm guessing that the people who run the projects 'encourage' people to go one way or another depending on whether they think the files need to be sieved more or less for current testing.

All three are amazing and outstanding efforts but don't quite fit my prime-searching tolerance. I'd go nuts waiting for months or years between primes (or never)!


Gary
Old 2008-01-05, 05:42   #28
kar_bon
 

Mersenne numbers M(p) can only have factors of the form 2*k*p+1 with k > 0, and M(p) can only be prime if p is prime, so the conditions are slightly different and make trial factoring (rather than 'sieving') very effective.
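As a bare-bones illustration of that factor form (nothing like the optimized code Prime95 actually uses, which adds the q = +/-1 mod 8 condition, sieves the k values by small primes, and works at a much lower level):

Code:
# Every factor q of M(p) = 2^p - 1 (p an odd prime) has the form q = 2*k*p + 1,
# so trial factoring only has to try one candidate divisor per k.

def trial_factor_mersenne(p, k_limit):
    """Look for a factor q = 2*k*p + 1 of 2^p - 1 with 1 <= k <= k_limit."""
    for k in range(1, k_limit + 1):
        q = 2 * k * p + 1
        # q divides 2^p - 1 exactly when 2^p == 1 (mod q).
        if pow(2, p, q) == 1:
            return q
    return None

print(trial_factor_mersenne(11, 100))     # 23 = 2*1*11 + 1 divides 2^11 - 1
print(trial_factor_mersenne(29, 10_000))  # 233 = 2*4*29 + 1 divides 2^29 - 1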
Old 2008-01-05, 05:44   #29
mdettweiler

Quote:
Originally Posted by gd_barnes View Post
The only conclusion that I could draw about GIMPS is that they must have some extremely fast way of trial factoring that is faster than sr1sieve or an equivalent sieve run over multiple candidates. I always thought that sieving over the largest n-range possible (that you know will be tested) is the most efficient way to go if you're in it for the long run.
I've heard that for some reason, trial factoring individual candidates is faster for Mersenne numbers, whereas for Riesel/Proth numbers it's faster to do larger n-ranges. Why, I have no idea.

Quote:
And on SoB and RS, I strongly suspect that they are constrained by what people volunteer to do. If 25% of the CPU cycles that people volunteer are for sieving, then there will be some way over-sieved files out there but you certainly can't 'force' sieving people to primality test if they either can't or have poor machines for doing so (or simply don't want to). Or maybe it's the reverse...perhaps only 1-2% of CPU cycles are volunteered for sieving and hence the files aren't sieved quite far enough. But I'm guessing that the people who run the projects 'encourage' people to go one way or another depending on whether they think the files need to be sieved more or less for current testing.
Actually, on RS, they've got more than double the power on sieving than they do on LLRing! It's mainly because of the BOINC side of the project--there, the sieve app is the default (because they've still got a relatively sporadic but serious bug to work out with the LLR wrapper for BOINC), and the majority of the participants either haven't bothered to switch to LLR, or choose not to. Thus, they are waaaaaaaay past optimal depth for the n-range they are LLRing in right now (and for a few M up after that!), though the sieving is still useful for the higher ranges (their dat file goes up to 20M). The project admins are hoping to get the bug ironed out soon, though, so they can set LLR as the default.

As for SOB, I've never crunched for them, so I don't know the details about them, but from the looks of it they've struck a good balance--the majority of the project is running primality testing (they use their own software, which does Proth tests like LLR does for k*2^n+1 numbers), but there's also considerable power being thrown at sieving, especially since their sieving is combined with PSP's, so PSP's collaboration with PrimeGrid benefits SOB too. So, I'd say that, as an outsider, they seem to have a pretty good balance between sieving and primality testing. (I don't know any percentage figures, though, like you were discussing.) FYI, the PSP/SOB combined dat goes up to 50M, so the optimal sieve depth for the whole dat file will definitely be much higher than for Riesel Sieve.
Old 2008-01-05, 19:04   #30
VBCurtis
 

Quote:
Originally Posted by Anonymous View Post
I've heard that for some reason, trial factoring individual candidates is faster for Mersenne numbers, whereas for Riesel/Proth numbers it's faster to do larger n-ranges. Why, I have no idea.
Karsten's answer above is the reason sieving doesn't work for mersennes- each candidate with power p can only have a factor of the form 2*k*p+1. Since all the p's are prime (all mersenne candidates are prime powers), no number can factor more than one candidate. A sieve is thus useless, since each possible factor can only be tested against a single candidate.

The *fast* trial-factoring in that project is due to only iterating over the k-value in the formula above. For p in the 30 million range, you only have to trial-divide by 1 number every 60+ million! That's how they get factor depths up near 70 bits (10^21).
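Just as rough arithmetic (the p and bit depths below are example values only):

Code:
# With q = 2*k*p + 1, candidate factors of M(p) are spaced 2*p apart, so
# trial factoring to B bits means at most about 2^B / (2*p) values of k,
# before the mod-8 and small-prime filters cut that down much further.
p = 30_000_000
for bits in (60, 65, 70):
    k_max = (2 ** bits) / (2 * p)
    print(f"to 2^{bits}: about {k_max:.2e} candidate k values")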

Finally, for a search like this project's, we need at some point to take double-checks into account. I don't know how to estimate the chance of error on a single test, but let's say it's 1/1000. When a first-time test takes ~50 times longer than a double-check range, there is a higher chance to find a prime per time expended via double-check versus first-time testing.

However, this is mitigated by first-time testing being well into top-5000 range by then, so minimizing project length is not the only priority. I'll hit upon this again once you folks are up in the 300-500k range for n's.
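One way to frame that comparison (just a sketch; the error rate and the prime probabilities are all estimates that would have to be measured or guessed, none of them pinned down here):

Code:
# A double-check only yields a new prime if the first test was wrong, so
# compare expected primes per CPU-second.  All inputs are estimates you
# supply; none are measured values from this project.

def prefer_double_check(error_rate, p_prime_dc, t_dc, p_prime_ft, t_ft):
    """True if re-testing old candidates is expected to find primes faster
    per unit time than first-time tests at the current search height.
    p_prime_dc, p_prime_ft: chance a candidate is prime in the double-check
        range and in the first-time range; t_dc, t_ft: test times."""
    return (error_rate * p_prime_dc) / t_dc > p_prime_ft / t_ft

Which side wins depends heavily on the real error rate and on how much rarer primes are at the higher n.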

-Curtis
Old 2008-01-05, 23:18   #31
mdettweiler

Quote:
Originally Posted by VBCurtis View Post
Karsten's answer above is the reason sieving doesn't work for mersennes- each candidate with power p can only have a factor of the form 2*k*p+1. Since all the p's are prime (all mersenne candidates are prime powers), no number can factor more than one candidate. A sieve is thus useless, since each possible factor can only be tested against a single candidate.

The *fast* trial-factoring in that project is due to only iterating over the k-value in the formula above. For p in the 30 million range, you only have to trial-divide by 1 number every 60+ million! That's how they get factor depths up near 70 bits (10^21).

Finally, for a search like this project's, we need at some point to take double-checks into account. I don't know how to estimate the chance of error on a single test, but let's say it's 1/1000. When a first-time test takes ~50 times longer than a double-check range, there is a higher chance to find a prime per time expended via double-check versus first-time testing.

However, this is mitigated by first-time testing being well into top-5000 range by then, so minimizing project length is not the only priority. I'll hit upon this again once you folks are up in the 300-500k range for n's.

-Curtis
Doh! I didn't notice Karsten's post because it appears to have been posted while I was composing my earlier post. Thanks for the detailed answer!
Old 2008-01-06, 00:26   #32
gd_barnes
 

Quote:
Originally Posted by VBCurtis View Post
Karsten's answer above is the reason sieving doesn't work for mersennes- each candidate with power p can only have a factor of the form 2*k*p+1. Since all the p's are prime (all mersenne candidates are prime powers), no number can factor more than one candidate. A sieve is thus useless, since each possible factor can only be tested against a single candidate.

The *fast* trial-factoring in that project is due to only iterating over the k-value in the formula above. For p in the 30 million range, you only have to trial-divide by 1 number every 60+ million! That's how they get factor depths up near 70 bits (10^21).

Finally, for a search like this project's, we need at some point to take double-checks into account. I don't know how to estimate the chance of error on a single test, but let's say it's 1/1000. When a first-time test takes ~50 times longer than a double-check range, there is a higher chance to find a prime per time expended via double-check versus first-time testing.

However, this is mitigated by first-time testing being well into top-5000 range by then, so minimizing project length is not the only priority. I'll hit upon this again once you folks are up in the 300-500k range for n's.

-Curtis
Curtis to the rescue again! And Karsten to the rescue with the initial explanation. I knew the exponent had to be prime, but the limitation on factors is new to me. Very good! Checking to 70 bits...wow! That explains how the chances of finding a Mersenne prime are greater than for any other prime form in the same n-range. With a possible factor only every 60M+ in p, it doesn't surprise me that they get such amazing factoring depths.

Yes, we'll have to talk about double checks when we get up a little higher. Every conjectures project at some point has to do double checks on stubborn k's. This is why I'm asking everyone for results files.


Gary
Old 2008-01-22, 03:09   #33
gd_barnes
 
More sieved files available for single-k searching

Anon posted sieved files for Sierp base 6 for k=154797, 157473, and 166753 for n=30K-100K in a thread here. There is now a link to them on the Sierp base 6 reservations page.

I have also posted links to files on the Sierp base 16 reservations page for k=2908, 6663, and 10183 for n=100K-200K.

The Sierp base 16 files were run in conjunction with team drive #1, which now has links in that thread for files up to n=120K. I will add them up to n=200K as ranges are completed.


Gary