Great Internet Mersenne Prime Search > PrimeNet > GPU to 72

2018-12-07, 00:46   #34 — GP2 (joined Sep 2003)
Originally Posted by petrw1:
Could someone do a database query to see what percent of P-1 factors are within the limits of the current TF level?
That would tell you how many factors TF would find that were missed by P−1, if P−1 were run first. That's the opposite of what we've been discussing.

Like I said earlier, for any given Mersenne factor, you can figure out what B1 and B2 bounds would have found that factor by P−1 testing. Some large factors are easy to find by P−1, some small ones are hard or impossible to find by P−1, it all depends on the individual factor. Formulate a precise question and you might get a precise answer.
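To make the point above concrete: every prime factor q of 2^p − 1 has the form q = 2kp + 1, and since the GIMPS programs include the factor 2p in the stage-1 exponent for free, P−1 finds q exactly when k is smooth enough: all prime-power factors of k at most B1, except that one prime (to the first power) may instead lie in (B1, B2]. The sketch below (my own illustration, not part of any GIMPS tool) reads the minimal bounds off the factorization of k:

```python
# Hypothetical helper (my own illustration, not a GIMPS tool): given a known
# prime factor q of M(p) = 2^p - 1, report the smallest (B1, B2) at which a
# standard two-stage P-1 test would have found it.

def factorize(n):
    """Trial-division factorization: returns {prime: exponent}."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def minimal_p1_bounds(q, p):
    # q = 2*k*p + 1; with the factor 2p free in stage 1, P-1 finds q when all
    # prime-power factors of k are <= B1, except that one prime (to the first
    # power) may instead lie in (B1, B2] (covered by stage 2).
    k = (q - 1) // (2 * p)
    f = factorize(k)
    if not f:
        return 2, 2                              # k = 1: any bounds suffice
    powers = {pr: pr ** e for pr, e in f.items()}
    pmax = max(powers)                           # largest prime factor of k
    if f[pmax] == 1:                             # first power: stage 2 can get it
        rest = [v for pr, v in powers.items() if pr != pmax]
        b1 = max(rest) if rest else 2
        return b1, max(b1, pmax)
    m = max(powers.values())                     # a prime power: stage 1 must cover it
    return m, m

# 2^29 - 1 = 233 * 1103 * 2089
print(minimal_p1_bounds(1103, 29))   # k = 19         -> B1 = 2, B2 = 19
print(minimal_p1_bounds(2089, 29))   # k = 36 = 4 * 9 -> B1 = 9, B2 = 9
```

This is exactly why some small factors are hard for P−1 (their k has a large prime-power part) while some large factors are easy (their k is very smooth).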
2018-12-07, 12:42   #35 — Mark Rose (joined Jan 2013)
Originally Posted by axn:
No. Larger TF leads to smaller P-1 bounds. Don't know why it behaves that way. You can play around with James's calculator:
I'm probably wrong, but I believe that's because with more TF done, P-1 is less likely to find a remaining factor, so the P-1 bounds are adjusted lower to keep the time spent on P-1 more productive than spending it on LL/PRP.
2018-12-07, 16:27   #36 — kriesel (US midwest, joined Mar 2017)

P-1 bounds determination

As far as I can determine, it's not PrimeNet doing the B1, B2, d, e, or NRP determination and dictating it to the applications. Rather, most applications optimize the bounds and other parameters themselves, unless they are specified by the user, and afterward report to PrimeNet in the results record which parameters were selected and used.

Unless the user specifies otherwise, the applications mprime, prime95, and CUDAPm1 (but not gpuowl v5.0's PRP-1) try to optimize the probable savings in total computing time for the exponent, based on the computed probability of finding a P-1 factor over combinations of many B1 values and several B2 values, given:
  • the prior TF level (number of bits trial factored to),
  • the number of future primality tests potentially saved (typically 1 or 2),
  • available memory limits (system or GPU),
  • and probably the system's or GPU's performance characteristics / benchmark results.
The mprime, prime95, and CUDAPm1 programs try many combinations of B1 and B2 values while seeking that optimum. Alternatively, the user can dictate the P-1 bounds in the worktodo line (or on the command line, as applicable).
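The bounds search described above can be sketched schematically. The probability and cost models below are crude stand-ins of my own (the real programs use proper smoothness-probability estimates and measured timings); only the shape of the search over B1/B2 combinations is meant to be illustrative:

```python
# Schematic sketch (my own stand-in models, NOT prime95's actual formulas) of
# the bounds search: for each candidate (B1, B2) pair, weigh the expected
# primality-test time saved against the cost of the P-1 run itself.

def p1_success_prob(b1, b2, tf_bits):
    # Stand-in model: deeper prior TF has already removed the easy factors,
    # so a higher TF level lowers P-1's expected yield.
    return 0.03 * (b1 / 1e6) ** 0.3 * (b2 / b1) ** 0.1 * (70.0 / tf_bits)

def p1_cost_in_tests(b1, b2, exponent):
    # Stage 1 is ~1.44*B1 squarings; a primality test is ~exponent squarings.
    # Stage 2 work is modeled as much cheaper per unit of B2 range.
    return (1.44 * b1 + 0.06 * (b2 - b1)) / exponent

def best_bounds(exponent, tf_bits, tests_saved=2):
    best = None
    for b1 in range(100_000, 3_000_001, 100_000):    # many B1 candidates
        for mult in (10, 20, 30, 40):                # several B2 candidates
            b2 = b1 * mult
            saving = (p1_success_prob(b1, b2, tf_bits) * tests_saved
                      - p1_cost_in_tests(b1, b2, exponent))
            if best is None or saving > best[0]:
                best = (saving, b1, b2)
    return best

saving, b1, b2 = best_bounds(89_787_821, 76)
print(f"B1={b1}, B2={b2}, expected saving={saving:.3f} tests")
```

Note that lowering tf_bits (less prior trial factoring) raises the selected bounds in this model, consistent with axn's observation quoted in post #35 that larger TF leads to smaller P-1 bounds.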

From experiments with prime95 at somewhat larger exponents, it appears that this optimization calculation also occurs during prime95's Test > Status output generation, which shows considerable lag for P-1 work compared to other computation types. There appears to be no caching of the previously computed optimal P-1 bounds. In my experience, prime95 status output without a stack of P-1 work assignments is essentially instantaneous, while the example attached takes 5 seconds, even immediately after a preceding one. With larger P-1 exponents, or more P-1 assignments (deeper work caching, or a system more fully dedicated to P-1 work than the 1/4 in my example), I think that 5 seconds will increase.

Got assignment [aid redacted]: P-1 M89787821
Sending expected completion date for M89787821: Dec 05 2018
 [Thu Dec 06 09:17:24 2018 - ver 29.4]
Sending result to server: UID: Kriesel/emu, M89787821 completed P-1, B1=730000, B2=14782500, E=12, Wg4: 123E2311, AID: redacted

PrimeNet success code with additional info:
CPU credit is 7.3113 GHz-days.
The prime95 worktodo.txt record for a primenet-given P-1 assignment contains no B1 or B2 specification.
George's description of the optimization process is in the P-1 Factoring section of
It can also be read in the source code.

CUDAPm1 example:
worktodo entry from manual assignment:
program output:
CUDAPm1 v0.20
------- DEVICE 1 -------
name                GeForce GTX 480
Compatibility       2.0
clockRate (MHz)     1401
memClockRate (MHz)  1848
totalGlobalMem      zu
totalConstMem       zu
l2CacheSize         786432
sharedMemPerBlock   zu
regsPerBlock        32768
warpSize            32
memPitch            zu
maxThreadsPerBlock  1024
maxThreadsPerMP     1536
multiProcessorCount 15
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      65535,65535,65535
textureAlignment    zu
deviceOverlap       1

CUDA reports 1426M of 1536M GPU memory free.
Index 91
Using threads: norm1 256, mult 128, norm2 32.
Using up to 1408M GPU memory.
Selected B1=1830000, B2=9607500, 2.39% chance of finding a factor
  Starting stage 1 P-1, M292000031, B1 = 1830000, B2 = 9607500, fft length = 16384K
Aaron Haviland has recently rewritten part of CUDAPm1's bounds selection code in v0.22, building on his earlier 2014 fork.

GPUOwL's PRP-1 implementation takes a somewhat different approach, and requires user selection of B1. It defaults to B2 = p but allows another B2 to be specified by the user. See posts 765-767 for Preda's description of gpuowl v5.0's P-1 handling, and posts 694-706 for his earlier B1-only development.

(Code authors are welcome to weigh in re any errors, omissions, nuances etc.)
Attached: p-1 status 5 seconds to generate.png (21.4 KB)

Last fiddled with by kriesel on 2018-12-07 at 16:47
2018-12-08, 06:32   #37 — penlu (joined Jul 2018)

Personally, regarding TF vs. P-1: I find with my hardware that, in terms of maximizing d(probability of finding a factor)/dt, I should not TF to a higher level than around 74 bits. For exponents near 90M, one of my cards takes about half an hour to run through 73-74 bits, with success probability ~1.35%. That same card can do a P-1 with about a 3.6% probability of success (using whatever bounds the software defaults to) in an hour and a half, thrice the time; going to 75 bits would be too much. So if I want to maximize factors found per unit time in a range near 90M that has already been TF'd to 74 bits or more, I should do P-1 work. In that sense, it's possible 76 bits is too high. On the other hand, my cards have a lot of memory, which probably pushes the TF/P-1 boundary down somewhat. But d(probability of success under default parameters given available memory)/d(available memory) is also not that large, and I don't yet know enough about the memory requirements, or how p(success) varies with B1 and B2, to say.
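The arithmetic in the paragraph above can be laid out explicitly (a quick sketch using the post's own figures; the ~1/bits TF yield per level and the doubling of TF time per bit level are the usual rules of thumb):

```python
# Rate comparison from the figures above (exponents near 90M): TF from 73 to
# 74 bits takes ~0.5 h with success probability ~1/74 ~= 1.35%; TF time
# roughly doubles per bit level; one P-1 run takes ~1.5 h at ~3.6%.

p1_prob, p1_time = 0.036, 1.5          # from the post
rates = {"P-1": p1_prob / p1_time}     # success probability per hour
t = 0.5                                # hours for the 73->74 bit level
for bits in (74, 75, 76):
    rates[f"TF to {bits}"] = (1.0 / bits) / t
    t *= 2.0                           # each further bit level costs ~2x

for name, r in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{name:9s} {100 * r:.2f}%/hour")
# TF to 74 comes out ahead; P-1 beats TF at 75 bits and beyond, matching
# the conclusion in the text.
```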

In terms of optimal work reduction, I think the per-time probability of finding a factor matters more than how many factors TF might find that P-1 would miss. You could treat this as a multi-armed bandit problem in which each action is a pair (factoring method, device) with some time cost and some factor-probability reward. It's somewhat complicated by the fact that failing to find a factor for a given exponent also returns a small amount of information ("no factors under 2^75"), which influences the future factor-probability estimate for a given (method, device) on that exponent. (Not that this makes the allocation problem easier, but it is at least a framework one could use to analyze it...)
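One way to make the bandit framing concrete (entirely my own sketch, not an existing GIMPS tool; the per-arm probabilities and timings are invented, loosely based on the figures in post #37): an epsilon-greedy allocator over (method, device) arms, where an arm's estimated value is factors found per hour of work.

```python
import random

# Sketch of the multi-armed bandit framing: each arm is a (method, device)
# pair; pulling an arm yields reward 1 if a factor was found, 0 otherwise,
# and we track reward per hour spent. Epsilon-greedy is the simplest policy;
# the "true" success probabilities below are invented for the demonstration.

ARMS = {                         # (true success prob, hours per attempt)
    ("TF-to-74", "gpu0"): (0.0135, 0.5),
    ("P-1",      "gpu0"): (0.036,  1.5),
    ("TF-to-75", "gpu1"): (0.0133, 1.0),
}

def run(n_pulls=20000, eps=0.1, seed=1):
    rng = random.Random(seed)
    stats = {a: [0.0, 0.0] for a in ARMS}       # [factors found, hours spent]
    for _ in range(n_pulls):
        if rng.random() < eps or any(h == 0 for _, h in stats.values()):
            arm = rng.choice(list(ARMS))        # explore (or finish first pass)
        else:                                   # exploit: best observed rate
            arm = max(stats, key=lambda a: stats[a][0] / stats[a][1])
        prob, hours = ARMS[arm]
        stats[arm][0] += (rng.random() < prob)  # factor found this attempt?
        stats[arm][1] += hours
    return stats

stats = run()
best = max(stats, key=lambda a: stats[a][0] / stats[a][1])
```

A real allocator would also have to fold in the "no factors under 2^75" side-information the post mentions, which shifts each exponent's remaining-factor probability after every unsuccessful attempt; this sketch ignores that.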

Of course, optimal work reduction isn't the only metric; one might instead want to maximize coverage of a given range, in which case the best strategy might differ, though this modeling approach would probably still find it. One might also want to maximize the rate of Mersenne prime discovery, which might also mean admitting "LL" as an action. Hopefully current "economic cross-over point" analysis matches whatever this would come up with.

Last fiddled with by penlu on 2018-12-08 at 07:17
2018-12-09, 22:27   #38 — chalsall ("Chris Halsall", joined Sep 2002)

Originally Posted by penlu:
Hopefully current "economic cross-over point" analysis matches whatever this would come up with.
We try our best (or as my girlfriend likes to remind me constantly, I'm trying...).

You are in a somewhat unique situation, in that you are willing and able to target your "firepower" optimally. Few are as focused on the optimal deployment of cycles. PrimeNet and GPU72 are somewhat constrained in what they can assign for optimal throughput, because each user tends to fetch only a single type of work for each piece of their kit.

To put it on the table: we are currently over-powered with (GPU) TF'ing and (CPU and GPU) P-1'ing; we are years ahead of the LL'ers.

What will come soon is the time to LLTF to 77 "bits". But possibly only after a P-1 run.

Any advice anyone has with regard to how to optimally manage this would be most welcome.
chalsall is offline   Reply With Quote
