mersenneforum.org TF Level policy

2018-12-07, 00:46   #34
GP2

Sep 2003

2·1,289 Posts

Quote:
 Originally Posted by petrw1 Could someone do a database query to see what percent of P-1 factors are within the limits of the current TF level?
That would tell you how many factors TF would find that were missed by P−1, if P−1 were run first. That's the opposite of what we've been discussing.

Like I said earlier, for any given Mersenne factor, you can figure out what B1 and B2 bounds would have found that factor by P−1 testing. Some large factors are easy to find by P−1; some small ones are hard or impossible; it all depends on the individual factor. Formulate a precise question and you might get a precise answer.
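GP2's point can be made concrete with a small sketch (my own illustration, not any GIMPS code): given a known factor q of 2^p − 1, work out the smallest B1/B2 that P−1 would have needed. Stage 1 finds q when every prime-power factor of q−1 is ≤ B1; stage 2 additionally allows one extra prime in (B1, B2]. Since Mersenne factors have the form q = 2kp + 1, the factor p of q−1 comes for free.

```python
# Sketch (illustration only): minimal P-1 bounds that reveal a known
# factor q of 2^p - 1.
from collections import Counter

def prime_factorization(n):
    """Trial division; fine for the small q-1 values used here."""
    c = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            c[d] += 1
            n //= d
        d += 1
    if n > 1:
        c[n] += 1
    return c

def required_bounds(q, p):
    c = prime_factorization(q - 1)
    c[p] -= 1                      # the exponent p divides q - 1 for free
    c = +c                         # drop zero counts
    r = max(c)                     # largest remaining prime
    if c[r] == 1:
        b2 = r                     # stage 2 can pick up one big prime...
        del c[r]
        b1 = max((s ** e for s, e in c.items()), default=1)
    else:                          # ...but not a prime to a higher power
        b1 = b2 = max(s ** e for s, e in c.items())
    return b1, max(b1, b2)         # B2 below B1 would be meaningless

# M(29) = 233 * 1103 * 2089:
print(required_bounds(233, 29))    # (8, 8):  232  = 2^3 * 29
print(required_bounds(1103, 29))   # (2, 19): 1102 = 2 * 19 * 29
print(required_bounds(2089, 29))   # (9, 9):  2088 = 2^3 * 3^2 * 29
```

This shows the dependence on the individual factor: 233 needs only B1 = 8, while a factor whose q−1 contains two large primes is out of P−1's practical reach regardless of size.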

2018-12-07, 12:42   #35
Mark Rose

"/X\(‘-‘)/X\"
Jan 2013
Ͳօɾօղէօ

2²·5·139 Posts

Quote:
 Originally Posted by axn No. Larger TF leads to smaller P-1 bounds. Don't know why it behaves that way. You can play around with James's calculator: https://www.mersenne.ca/prob.php
I'm probably wrong, but I believe that's because, with more TF done, P-1 is less likely to find a factor, so the P-1 bounds are adjusted lower to keep the time spent on P-1 more productive than spending it on LL/PRP.
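That intuition can be checked with a toy model (my own sketch, not prime95's actual optimizer, and all constants are assumptions). Assume the chance of a factor in (2^(b−1), 2^b] is roughly 1/b; since factors have the form q = 2kp + 1, only about b − log2(2p) bits must be B1-smooth, with smoothness probability crudely approximated by the Dickman function u^(−u); stage-1 cost is ~1.44·B1 squarings versus ~p for a primality test. Maximizing expected savings then yields a smaller B1 as the TF level rises, because deeper TF removes exactly the factors P-1 is most likely to catch:

```python
# Toy model of why deeper TF pushes the chosen P-1 bounds down.
import math

def rho(u):                              # crude Dickman-rho stand-in
    return 1.0 if u <= 1 else u ** -u

def p1_chance(b1, tf_bits, p=90_000_000, max_bits=120):
    free_bits = math.log2(2 * p)         # p divides q - 1 for free
    return sum(rho((b - free_bits) * math.log(2) / math.log(b1)) / b
               for b in range(tf_bits + 1, max_bits + 1))

def expected_saving(b1, tf_bits, p=90_000_000, tests_saved=2):
    # stage-1 time, in units of one primality test, is ~1.44*B1/p
    return tests_saved * p1_chance(b1, tf_bits, p) - 1.44 * b1 / p

def best_b1(tf_bits):
    grid = [int(10 ** (5 + i / 50)) for i in range(101)]   # 1e5 .. 1e7
    return max(grid, key=lambda b1: expected_saving(b1, tf_bits))

for tf in (66, 70, 74, 76):
    print(tf, best_b1(tf))
```

The mechanism is visible in the structure: raising the TF level deletes the low-b terms of the probability sum, which are the ones that respond most to a bigger B1, so the cost term wins earlier and the optimum shifts down.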

2018-12-07, 16:27   #36
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

D5F16 Posts

P-1 bounds determination

As far as I can determine, it's not primenet doing the B1, B2, d, e or NRP determination and dictating to the applications; it's most applications optimizing the bounds and other parameters (unless specified by the user), and the applications afterward telling primenet in the results record what parameters were selected and used.

Unless the user specifies otherwise, the applications mprime, prime95, and CUDAPm1 (but not gpuowl v5.0's PRP-1) try to optimize the probable savings in total computing time for the exponent. They compute the probability of finding a P-1 factor over combinations of many B1 values and several B2 values, given: the prior TF level (number of bits trial factored to); the number of future primality tests potentially saved (typically 1 or 2); available memory resource limits (system or gpu); and probably the system's or gpu's performance characteristics / benchmark results. Alternatively, the user dictates the P-1 bounds in the worktodo line (or command line, as applicable).

From experiments with prime95 on somewhat larger exponents, it appears the optimization calculation also occurs during prime95 Test Status output generation, which shows considerable lag for P-1 work compared to other computation types. It appears there's no caching of the previously computed optimal P-1 bounds. In my experience, prime95 status output without a stack of P-1 work assignments is essentially instantaneous, while the example attached takes 5 seconds, even immediately after a preceding one. With larger P-1 exponents or more P-1 assignments (deeper work caching, or more complete dedication of a system to P-1 work than the 1/4 in my example), I think that 5 seconds will increase.
prime95.log:
Code:
Got assignment [aid redacted]: P-1 M89787821
Sending expected completion date for M89787821: Dec 05 2018
...
[Thu Dec 06 09:17:24 2018 - ver 29.4]
Sending result to server: UID: Kriesel/emu, M89787821 completed P-1, B1=730000, B2=14782500, E=12, Wg4: 123E2311, AID: redacted
PrimeNet success code with additional info:
CPU credit is 7.3113 GHz-days.

The prime95 worktodo.txt record for a primenet-given P-1 assignment contains no B1 or B2 specification.
Code:
Pfactor=[aid],1,2,89794319,-1,76,2

George's description of the optimization process is in the P-1 Factoring section of https://www.mersenne.org/various/math.php. It's there to read in the source code also.

CUDAPm1 example: worktodo entry from a manual assignment:
Code:
PFactor=[aid],1,2,292000031,-1,81,2

Program output:
Code:
CUDAPm1 v0.20
------- DEVICE 1 -------
name                GeForce GTX 480
Compatibility       2.0
clockRate (MHz)     1401
memClockRate (MHz)  1848
totalGlobalMem      zu
totalConstMem       zu
l2CacheSize         786432
sharedMemPerBlock   zu
regsPerBlock        32768
warpSize            32
memPitch            zu
maxThreadsPerBlock  1024
maxThreadsPerMP     1536
multiProcessorCount 15
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      65535,65535,65535
textureAlignment    zu
deviceOverlap       1
CUDA reports 1426M of 1536M GPU memory free.
Index 91
Using threads: norm1 256, mult 128, norm2 32.
Using up to 1408M GPU memory.
Selected B1=1830000, B2=9607500, 2.39% chance of finding a factor
Starting stage 1 P-1, M292000031, B1 = 1830000, B2 = 9607500, fft length = 16384K

Aaron Haviland has recently rewritten part of CUDAPm1's bounds selection code in v0.22 (https://www.mersenneforum.org/showpo...&postcount=646), building on his earlier 2014 fork (https://www.mersenneforum.org/showpo...&postcount=592).

GPUOwL's PRP-1 implementation takes a somewhat different approach, and requires user selection of B1. It defaults to B2=p but allows another B2 to be user-specified.
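The stage-1 arithmetic behind the bounds being discussed is compact enough to demonstrate. This is a minimal sketch of the idea only (prime95 and CUDAPm1 use FFT multiplication and many refinements): stage 1 computes 3^E mod M with E the product of p and every prime power ≤ B1, then takes gcd(3^E − 1, M); this is why stage-1 running time grows linearly with B1.

```python
# Minimal stage-1 P-1 sketch for M = 2^p - 1 (illustration only).
import math

def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [i for i, flag in enumerate(sieve) if flag]

def p1_stage1(p, b1):
    m = (1 << p) - 1
    e = p                       # factors are q = 2*k*p + 1, so p | q - 1
    for r in primes_up_to(b1):
        pk = r
        while pk * r <= b1:     # largest power of r not exceeding B1
            pk *= r
        e *= pk
    return math.gcd(pow(3, e, m) - 1, m)

# M(29) = 233 * 1103 * 2089.  Since 233 - 1 = 2^3 * 29, the tiny bound
# B1 = 8 already pulls out 233 (1103 would need B2 >= 19 in a stage 2):
g = p1_stage1(29, 8)
print(g)
```

Running this yields a nontrivial divisor of M(29) that 233 divides, found with B1 = 8, illustrating how cheap a "lucky" P-1 factor can be.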
See https://www.mersenneforum.org/showth...=22204&page=70, posts 765-767, for Preda's description of gpuowl v5.0 P-1 handling. (See posts 694-706 for his earlier B1-only development: https://www.mersenneforum.org/showth...=22204&page=64.) Code authors are welcome to weigh in re any errors, omissions, nuances, etc.

Last fiddled with by kriesel on 2018-12-07 at 16:47
2018-12-08, 06:32   #37
penlu

Jul 2018

2²×7 Posts

Personally, regarding TF vs. P-1: I find with my hardware that, in terms of maximizing d(probability of finding a factor)/dt, I should not TF to a higher level than around 74 bits. For exponents near 90M, a given one of my cards takes about half an hour to run through 73-74 bits, with success probability ~1.35%. That same card can do a P-1 with about a 3.6% probability of success (using whatever bounds the software defaults to) in an hour and a half, thrice the time. Going to 75 bits would be too much. So if I want to maximize factors found per unit time in a range near 90M that has already been TF'd to 74 bits or more, I should do P-1 work.

In that sense, it's possible 76 bits is too high... On the other hand, my cards have a lot of memory, which probably pushes the TF/P-1 boundary down somewhat. But d(probability of success under default parameters given available memory)/d(available memory) is not that big -- I don't know enough yet about what the memory requirements are, or how p(success) varies with B1 and B2, to say.

In terms of optimal work reduction, I think how many factors TF might find that P-1 would miss matters less than the per-time probability of finding a factor. You could treat this as a multi-armed bandit problem where each action is a pair (factoring method, device) with some time cost and some factor-probability reward. It's somewhat complicated by the fact that failure to find a factor for a given exponent also returns a small amount of information ("no factors under 2^75"), which influences the future factor-probability estimate for a given (method, device) on that exponent. (Not that this makes the allocation problem easier, but at least there is a framework one could use to analyze it...)

Of course, optimal work reduction isn't the only metric; one might be interested in, e.g., maximizing coverage in a given range, in which case the best strategy might be different, though this modeling approach would probably still find it. One might also be interested in maximizing the rate of Mersenne prime yield, which might also involve admitting "LL" as an action. Hopefully the current "economic cross-over point" analysis matches whatever this would come up with.

Last fiddled with by penlu on 2018-12-08 at 07:17
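penlu's bandit framing can be sketched in a few lines. This is a toy simulation (the arm labels, times, and probabilities below are hypothetical, loosely echoing the numbers in the post, and epsilon-greedy deliberately ignores the "no factors under 2^75" information-update subtlety penlu raises): each arm is a (method, device) pair with an hourly cost, and the agent learns which arm yields the most factors per hour.

```python
# Toy epsilon-greedy bandit over (factoring method, device) arms.
import random

# Hypothetical arms: (label, hours per attempt, true success probability)
ARMS = [
    ("TF 73->74", 0.5, 0.0135),
    ("TF 74->75", 1.0, 0.0130),
    ("P-1",       1.5, 0.0360),
]

def epsilon_greedy(n_steps=50_000, eps=0.1, seed=1):
    random.seed(seed)
    tries, hits = [0, 0, 0], [0, 0, 0]
    for step in range(n_steps):
        if step < len(ARMS):          # try every arm once to start
            i = step
        elif random.random() < eps:   # explore
            i = random.randrange(len(ARMS))
        else:                         # exploit: best estimated factors/hour
            i = max(range(len(ARMS)),
                    key=lambda j: (hits[j] / tries[j]) / ARMS[j][1])
        tries[i] += 1
        hits[i] += random.random() < ARMS[i][2]
    return tries, hits

tries, hits = epsilon_greedy()
for (label, hrs, _), t, h in zip(ARMS, tries, hits):
    print(f"{label}: {t} attempts, {h / (t * hrs):.4f} factors/hour")
```

With the numbers above, TF through 74 bits and P-1 have similar factors-per-hour rates while TF to 75 lags, so the agent needs many samples to separate the leaders, which is itself an honest illustration of how close the "economic cross-over point" is.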
2018-12-09, 22:27   #38
chalsall
If I May

"Chris Halsall"
Sep 2002

2³·1,103 Posts

Quote:
 Originally Posted by penlu Hopefully current "economic cross-over point" analysis matches whatever this would come up with.
We try our best (or as my girlfriend likes to remind me constantly, I'm trying...).

You are in a somewhat unique situation, in that you are willing and able to target your "firepower" optimally. Few are as focused on the optimal deployment of their cycles. Primenet and GPU72 are somewhat constrained in what they can assign for optimal throughput, because each user tends to fetch only a single type of work for each piece of their kit.

To put it on the table: we are currently over-powered in (GPU) TF'ing and (CPU and GPU) P-1'ing; we are years ahead of the LL'ers.

What will come soon is the time to LLTF to 77 "bits". But possibly only after a P-1 run.

Any advice anyone has with regards to how to optimally manage this would be most welcome.
