#1
"University student"
May 2021
Beijing, China
2²·67 Posts
I tried several different bounds here:
https://www.mersenne.ca/prob.php?exp...00&b2=22000000

My result: assuming that v30.8 is 2.5x faster for wavefront exponents, and that 1.1 tests are saved if a factor is found, the "PrimeNet" bounds on mersenne.ca are almost optimal. However, we can reduce these bounds a bit, since not everyone has enough memory (>30 GB) to do P-1 at peak speed. The b1=450000&b2=22000000 in the link above should not be far from optimal.

Last fiddled with by Zhangrc on 2021-12-14 at 15:37
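To see the shape of the trade-off being tuned here, a minimal Python sketch of the usual bound-selection reasoning follows (not from the thread itself): among candidate bounds, pick the pair whose expected GHz-days saved, minus the GHz-days spent, is largest. The candidate costs and factor probabilities below are made-up placeholder values, not prob.php output; only the selection logic is the point.

[CODE]
# Simplified sketch of P-1 bound selection: among candidate (B1, B2)
# pairs, pick the one maximizing expected GHz-days saved minus spent.
# All numbers are illustrative placeholders, not mersenne.ca output.

PRP_COST = 450.0   # assumed GHz-days for one wavefront PRP test
TESTS_SAVED = 1.1  # primality tests avoided per factor found

# (label, P-1 cost in GHz-days, probability of finding a factor)
CANDIDATES = [
    ("b1=300K, b2=15M", 11.0, 0.025),
    ("b1=450K, b2=22M", 15.0, 0.034),
    ("b1=700K, b2=40M", 22.0, 0.040),
]

def net_saving(cost, prob, tests_saved=TESTS_SAVED):
    """Expected GHz-days of testing avoided, minus the P-1 effort itself."""
    return prob * tests_saved * PRP_COST - cost

for label, cost, prob in CANDIDATES:
    print(f"{label}: net {net_saving(cost, prob):+.2f} GHz-days")

best = max(CANDIDATES, key=lambda c: net_saving(c[1], c[2]))
print("best of these candidates:", best[0])
[/CODE]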
#2
Oct 2021
U. S. / New York, NY
149 Posts
That sounds like a big assumption considering Mr. Woltman's previous comments that wavefront P-1 "will not benefit much" (see post #18 in https://www.mersenneforum.org/showth...861#post593861). Is 2.5x purely a spitball / an extrapolation or did you get that number through empirical testing? If the latter, how much RAM do you have allocated?
Quote:
Incidentally, in the extreme low-end case (not enough RAM allocated for stage 2 to start), B1 seems to be selected such that stage 1 takes almost as long to run as both stages would take together for a user with enough RAM allocated to run them. I've seen e.g. curtisc turn in pre-PRP P-1 results of B2 = B1 = 1.2M. Unfortunately, this still tends to produce factor chances of < 2%.

Last fiddled with by axn on 2021-12-15 at 12:41 Reason: Reference to Post #18 in original thread
#3
P90 years forever!
Aug 2002
Yeehaw, FL
2³·1,019 Posts
Quote:
Quote:
In other words, whether we set tests_saved to 1.0 or to 2.0, more P-1 will be done on these exponents in the future. Why double the amount of P-1 effort today when that effort will almost certainly be redone in the future?

My second takeaway is that once 30.8 is fully ready, GIMPS would benefit greatly from owners of machines with lots of RAM switching to P-1.
#4
Oct 2021
U. S. / New York, NY
149 Posts
Quote:
My vote between 1.0 and 1.1 is the former, perhaps just because it's a whole number and might cause less confusion for people who run Prime95 casually and don't always have a firm grasp on what the work window is printing (this was me for my first few years of GIMPS membership).

If Kriesel's analysis is correct (I have no reason to believe it isn't), the empirically optimal number, assuming more granularity than tenths, would be ~1.0477. At that point, it's pretty much a coin flip whether to round up or down. I'll contend that some of the factors pushing the raw number up from 1.000 are to some degree transitory, so 1.0 should be a better choice for the long term. (The last few people bootlegging FTC LL or doing unproofed PRP will eventually either upgrade or stop testing, for one example. Increasing storage drive sizes should eventually bring up the average proof power, for another.)
#5
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
5²·211 Posts
Quote:
When GPUs started TFing many times faster than PCs, the consensus was: let's TF a few bits deeper and save many more expensive LL/DC tests. Granted, 1 PRP now replaces 2 tests (LL & DC).

Why aren't we using the same reasoning here? A P-1 that used to take 5 hours now takes 1 (or so). So after the full rollout of 30.8, even if the number of P-1ers doesn't change, they'll be doing 5 times as many P-1s in the same time. Wouldn't they get way ahead of the PRP wavefront? And if so, aren't we better off to stay just ahead and P-1 deeper and save more PRPs? Granted, deeper P-1 in the future with 1 TB machines will get more factors, but aren't factors more beneficial before the PRP is done?

OK, now that I've spent 10 minutes one-finger typing on my mobile, it just occurred to me that the average PC today won't have enough RAM to do P-1 much faster at the PRP wavefront even with 30.8. Oh well, someone can slap me now.
#6
"University student"
May 2021
Beijing, China
2²×67 Posts
Quote:
I allocate 12 GB of memory; I can't use more because I only have 16 GB. That's usually enough for wavefront exponents, but for 30.8 it's always beneficial to allocate more RAM.

Last fiddled with by Zhangrc on 2021-12-15 at 03:53
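For reference, the stage 2 memory allowance corresponds to the Memory entry in Prime95's local.txt, given in megabytes (it can also be set from the program's options dialog). A minimal sketch of such an entry, using the 12 GB figure mentioned above; check readme.txt for the exact syntax in your version:

[CODE]
Memory=12288
[/CODE]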
#7
Oct 2021
U. S. / New York, NY
95₁₆ Posts
Quote:
We can assume some work line Pfactor=N/A,1,2,[exponent],-1,[TF depth],1. Loading this into Prime95 30.7 might produce bounds that take five hours to run. We suppose 30.8 could run the same bounds twice as fast on the same machine (for the sake of the example, because it probably can't for wavefront exponents in actuality). Then a 30.8 installation wouldn't calculate those bounds for that work line at all; it would calculate something appropriately larger, independent of anything in the line itself needing to be changed.

In simpler terms, larger P-1 bounds are always built into any boost to P-1 throughput (assuming Mr. Woltman doesn't make a serious mistake when revising the cost calculator, which there's no reason to believe he would). We could go with your initial assumption that 30.8's P-1 is drastically faster even at the PRP wavefront, and setting tests_saved=5 (for example) because of it still wouldn't accomplish anything besides wasting a load of cycles.

Could you pinpoint exactly where? I downloaded the latest 30.8 tarball and Ctrl+F'ed "2.5" in its undoc.txt with no hits. Do you happen to be talking about the default Pm1CostFudge value? If so, that's just (approximately) the factor by which the new stage 2 cost calculator tends to undershoot; it doesn't indicate anything about the speed of the new P-1 in the abstract.
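To make the work line concrete, a filled-in entry of that form might look like the following; the exponent and TF depth are made-up illustrative values, not an actual assignment (a real line needs a prime exponent from a real assignment). The fields are: assignment ID (N/A = none), then k, b, n, c describing k·b^n + c (here 2^p − 1), the TF bit depth already completed, and finally the tests_saved value under discussion.

[CODE]
Pfactor=N/A,1,2,108000041,-1,77,1
[/CODE]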
#8
Jun 2003
2²·3²·151 Posts
#9
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
5275₁₀ Posts
Quote:
I understand the Cost Calculator needs to keep up with P-1 improvements. At the risk of oversimplifying, let me try with actual numbers.

prob.php tells me the suggested P-1 bounds take about 15 GHz-days in the 108M range. A PRP test in that same range takes about 450 GHz-days. That is 30 to 1. Interestingly (with a little rounding), the success rate is also about 1/30. So 450 GHz-days of P-1 should do 30 P-1 runs and save, on average, 1 PRP test.

(I hope I didn't mess this up. I guess it assumes 450 GHz-days of each take approximately the same clock time. It may not.)

So if, at a point in time at the leading edge, the available P-1 GHz-days is 1/30 of the PRP GHz-days, then P-1 should just keep up with PRP. However, if either due to personal choice or due to the increased speed of 30.8 we find P-1 getting too far ahead of PRP, then would it make sense for P-1 to choose bigger B1/B2 and save more PRP tests instead? On the contrary, if P-1 falls behind, it would choose lower B1/B2.

Or is this simply what you mean by:

Quote:
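Wayne's back-of-the-envelope check, written out as a small Python sketch; the 15, 450 and 1-in-30 figures are the rough numbers quoted above, not authoritative project statistics.

[CODE]
# Capacity check using the rough figures quoted above: ~15 GHz-days per
# wavefront P-1 run, ~450 GHz-days per PRP test, and roughly a 1-in-30
# chance that a P-1 run finds a factor. Illustrative only.

P1_COST = 15.0        # GHz-days per P-1 run (assumed)
PRP_COST = 450.0      # GHz-days per PRP test (assumed)
FACTOR_PROB = 1 / 30  # chance a single P-1 run finds a factor (assumed)

# Each exponent gets one P-1 run before its PRP test, so for P-1 to keep
# pace with PRP it needs roughly this share of the total GHz-days:
p1_share = P1_COST / PRP_COST
print(f"P-1 needs about 1/{1 / p1_share:.0f} of the PRP GHz-days to keep up")

# And for every PRP test's worth of effort spent on P-1 instead:
runs = PRP_COST / P1_COST
print(f"{runs:.0f} P-1 runs save about {runs * FACTOR_PROB:.1f} PRP test(s)")
[/CODE]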
#10
Jun 2003
2²×3²×151 Posts
Quote:
First things first. When P-1 stage 2 becomes faster, the software's calculation of optimal P-1 bounds changes. It changes in a way that increases the bounds, so the amount of time the software spends on P-1 wouldn't necessarily drop drastically. In fact, paradoxically, it might increase (whether it does or not is a different thing, but in principle this could happen). So we need more data to understand the impact of 30.8 on wavefront P-1.

Second. The optimal crossover point for "TF vs PRP" or "P-1 vs PRP" is based on the relative time it takes to run both types of computation on the _same_ processor. We do 3-4 bits of extra TF on GPUs not because GPUs are faster than CPUs, but because GPUs do better at TF relative to PRP. A GPU might be 100x faster than a CPU at TF but only 10x faster at PRP, so the GPU's crossover point for TF vs PRP will be a few bits higher than a CPU's. If a GPU were 100x faster at TF but also 100x faster at PRP, then we wouldn't do extra TF bits with the GPU (no matter how much GPU power we have).

Similarly, the optimal P-1 bound is (or should be) independent of how many dedicated P-1 crunchers there are. We assume that, if there were no P-1 work available, they would switch over to PRP (not a 100% accurate assumption, but the only feasible way to model this). If we get a surplus of dedicated P-1 crunchers who refuse to do anything else, c'est la vie. I guess they have the option to manually change tests_saved and do whatever they wish, but the project shouldn't waste resources by using a sub-optimal parameter. After all, the original point of P-1 was to speed up the clearing of exponents.
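A toy model of the relative-speed point above, as a Python sketch: assume each extra TF bit level doubles the TF cost and finds a factor with probability roughly 1/bit (both standard rules of thumb), and that a factor saves about one primality test. The 100x / 10x speedups are the hypothetical figures from the post, and the base costs are made-up; only the relative behaviour matters.

[CODE]
from math import log2

def optimal_depth(tf_speedup, prp_speedup, start_bit=74,
                  cpu_prp_cost=450.0, cpu_first_bit_cost=2.0):
    """Deepest TF bit level still worth taking on this device (toy model)."""
    prp_cost = cpu_prp_cost / prp_speedup
    bit = start_bit
    bit_cost = cpu_first_bit_cost / tf_speedup  # cost of TF'ing bit start_bit+1
    # Take the next bit while its cost is below its expected saving,
    # i.e. below (chance of a factor in that bit) * (one primality test).
    while bit_cost < (1.0 / (bit + 1)) * prp_cost:
        bit += 1
        bit_cost *= 2.0
    return bit

cpu = optimal_depth(tf_speedup=1, prp_speedup=1)
gpu = optimal_depth(tf_speedup=100, prp_speedup=10)    # TF-lopsided GPU
flat = optimal_depth(tf_speedup=100, prp_speedup=100)  # uniformly faster GPU

print(f"CPU break-even depth:          {cpu} bits")
print(f"GPU (100x TF, 10x PRP) depth:  {gpu} bits "
      f"(+{gpu - cpu}, ~log2(10) = {log2(10):.1f})")
print(f"GPU (100x TF, 100x PRP) depth: {flat} bits (no extra bits)")
[/CODE]

In this model only the TF:PRP speed ratio moves the crossover, which is exactly the point being made above.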
#11
Oct 2021
U. S. / New York, NY
225₈ Posts
Quote:
For similarly-sized exponents, a given PC can complete X PRP tests in some amount of time, or it can find Y P-1 factors in the same amount of time. Y is obviously dependent upon the P-1 bounds used. If you let it do its thing, Prime95 optimizes to have A) Y > AX (where A is the tests_saved value passed), then B) the highest Y value possible. You seem to suggest that ignoring this optimization and accepting a lower value of Y (or even accepting Y < X) will become a good idea if P-1 gets far ahead of the PRP wavefront, but in that case more benefit would be had from some P-1 users simply switching to primality testing. Since large B1 and B2 values quickly run into diminishing returns with respect to the cycles needed (yes, even with 30.8; "large" is just higher for B2), P-1 past Prime95's optimized bounds will not "save more PRP tests" than just, well, running the full PRPs.

You brought up GPU TF earlier, so we can analogously apply your logic there. GPUs are very efficient for TF, but they can run primality tests as well, so there is still an optimization puzzle: GIMPS / GPU72 must select a TF threshold such that, in the time it would take a given GPU to complete one primality test, the same GPU will find more than one factor (on average). For most consumer GPUs, this seems to be ((Prime95 TF threshold) + 4). With that threshold, GPU72 is currently very far ahead of even high-category PRP (I believe they're currently pushing around 120M or even higher). Does it then make sense that GPU72 should go to ((Prime95 threshold) + 5) at the PRP wavefront, even though that wouldn't be optimal*, just because the threshold that is optimal is easily being handled? No; anyone doing GPU TF who wants the PRP wavefront to advance more quickly should simply switch to GPU PRP.

* Some recent Nvidia models have such crippled FP64 throughput that this extra level actually can be optimal. I have such a one. However, I don't believe enough TFers own these to recommend the extra level universally.