#1 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×3,677 Posts |
Hi, I'm looking for a few intrepid volunteers, or more than a few, to take some scattered but strategically placed exponents up to the full GPUto72 individual goal bit levels, or toward them, well distributed from 100 million to the mersenne.org upper limit of a billion. You can think of this as creating full-TF islands above the prevailing water line of bulk TF effort.
The main purpose is to provide some already-TF-complete candidates for P-1 factoring software testing. Whoever does the TF gets the computing credit and credit for any factors found along the way. Reservation of exponents is highly recommended, as is reasonably prompt completion.

Consider the exponents from 100 million to 101 million as a bin. Strategic TF would focus on the first 1000, that is, 100,000,000 to 100,001,000, and aim for completing TF on at least two exponents that have no P-1 or primality test done and no factor found by TF, so in need of P-1 and perhaps primality testing. The known location within the bin makes them easy to find via https://www.mersenne.org/manual_gpu_assignment/ or https://www.mersenne.org/report_exponent/

It would be good to have multiple closely spaced bins, each containing an island of two or more full-depth-TF exponents without a P-1 result or primality result. Having multiple closely spaced bins allows using one set of well spaced bins to test CUDAPm1, another well spaced set for prime95, another for gpuOwL PRP-1 or any future Preda P-1, etc. Also, occasionally software has an issue in a small range of exponents but is ok on either side of the trouble spot. (I've seen CUDAPm1 have trouble with one gpu but not another, even sometimes the same model, at 84M, 128M, and 171M.) Having more than one fully TF completed exponent per island is insurance against finding a stage 1 factor and so being unable to test stage 2, and provides a spare in case a nearby island is a trouble spot for one of the applications. At the same time, staggering the bins a bit between applications or versions means a slightly wider distribution of exponents tested.

For example, and from now on giving bin identifications as millions (for example 100 instead of 100 million):
Code:
P95   owl   CUDAPm1
100   101   102  103
120   121   122  123
150   151   152  153
200   201   202  203
250   251   252  253
300   301   302  303
350   351   352  353
400   401   402  403
500   501   452  453
600   601
700   701
800   801
900   901
After running CUDAPm1 on a given gpu model on several widely spaced exponents (usually chosen about 50M or 100M apart so they plot nicely), I often find, as in CUDAPm1 v0.20, that some exponents cannot be run successfully to completion on a given gpu, or on any gpu. Then I start a binary search to see what the limits are. Closer spaced islands would be useful for that later. When I need to TF-qualify the exponents myself, it really slows down the testing of P-1 limits, since I'm using the same gpus for both.

I'm getting ready to start the testing and limit mapping of several gpu models on CUDAPm1 v0.22, and have started testing in prime95. Any helpful TF island building would be appreciated. The end result is a tabulation of run times, plots of run time scaling, and documentation of limits, NRP trends, software issues encountered, etc., as in https://www.mersenneforum.org/showthread.php?t=23389

Users like mikr and rudimeier have already done some of this deeper TF at the front of a million bin. That is very useful when I am prequalifying a few high exponents for P-1 software testing on gpus, since I can finish them myself to gputo72 factoring goal levels. Thank you to the pioneers who already did some of this, for example to or near primenet goal bit levels several years ago. The higher exponents will represent a considerable amount of total work per exponent.
A few examples of computing effort per exponent to reach full GPUto72 TF depth:
https://www.mersenne.ca/exponent/101000117 : 114 GHzD to go to full gputo72 bit level (76)
https://www.mersenne.ca/exponent/171000043 : 346 GHzD to go to full gputo72 bit level (78)
https://www.mersenne.ca/exponent/371000039 : 1.2 THzD to go to full gputo72 bit level (81)
https://www.mersenne.ca/exponent/919000001 : 8.4 THzD to go to full gputo72 bit level (85)
https://www.mersenne.ca/exponent/999000061 : 15.6 THzD to go to full gputo72 bit level (86)
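For a rough sense of how those figures scale, here is a minimal sketch (mine, not the mersenne.ca credit formula) that extrapolates remaining TF effort from one of the data points above, using the approximation that TF work from 2^b0 to 2^b1 on exponent p is proportional to (2^b1 - 2^b0) / p. The assumed starting bit levels (73 for the reference, 75 for the example) are placeholders; check mersenne.ca for actual current bit levels.
Code:
# Rough sketch only, not the mersenne.ca credit formula.  Assumes TF work for
# exponent p over a bit range scales as (2^bit_to - 2^bit_from) / p, and
# calibrates against one figure quoted above (starting bit 73 for M101000117
# is an assumption).

def tf_effort_units(p, bit_from, bit_to):
    """Relative trial-factoring work for exponent p over the given bit range."""
    return (2.0 ** bit_to - 2.0 ** bit_from) / p

REF_P, REF_FROM, REF_TO, REF_GHZD = 101_000_117, 73, 76, 114.0

def estimate_ghzd(p, bit_from, bit_to):
    """Scale the reference GHz-days figure to another exponent / bit range."""
    scale = tf_effort_units(p, bit_from, bit_to) / tf_effort_units(REF_P, REF_FROM, REF_TO)
    return REF_GHZD * scale

# Example: taking 371,000,039 from an assumed 2^75 to the gputo72 target of 2^81
# lands near the 1.2 THzD quoted above.
print(f"{estimate_ghzd(371_000_039, 75, 81):,.0f} GHz-days (rough)")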
#2 |
Jun 2005
USA, IL
193 Posts |
I can volunteer some TF, but I'm not sure about what bit levels any particular range should be taken to. Are the 'full GPUto72 individual goal bit levels' posted somewhere?
#3 | |
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
5²×211 Posts |
Quote:
https://www.mersenne.ca/status/tf/0/0/1/0 Click on any line to drill down for finer limits. |
#4 |
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
5·7·311 Posts |
James H had posted a chart a while back (based upon performance data). Also, Chris mentioned that GPUs should do about 3 bits deeper than Prime95's default.
See this post of mine and James' response: https://mersenneforum.org/showthread.php?p=389094 and https://mersenneforum.org/showthread.php?p=490542 |
#5 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1CBA₁₆ Posts |
Quote:
https://www.mersenne.ca/exponent/101000117 after https://www.mersenne.org/report_expo...1001000&full=1

One must be careful about using mersenne.ca for current status, since it lags a bit until it syncs overnight from mersenne.org.

If you're just after the TF level to go up to, going by the red curve for first LL on charts like https://www.mersenne.ca/cudalucas.ph...=100&mmax=1000 is not bad. You can get to those by clicking on any gpu in the list at https://www.mersenne.ca/cudalucas.php; the low and high exponent limits are 50M to 300M by default but can be adjusted as shown in the URL above. Or, I suppose I could add a target TF column. Lots of choices.
Code:
TFH   P95   owl   CUDAPm1
76    100   101   102  103
77    120   121   122  123
77    150   151   152  153
79    200   201   202  203
79    250   251   252  253
80    300   301   302  303
81    350   351   352  353
81    400   401   402  403
82    500   501   452  453
83    600   601
84    700   701
85    800   801
85    900   901
Last fiddled with by kriesel on 2019-01-08 at 06:29
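To turn the table above into actual work, a small script along these lines can emit mfaktc-style worktodo entries once candidate exponents and their current bit levels have been looked up (and reserved) on mersenne.org / mersenne.ca. This is only a sketch of mine; the Factor=exponent,from,to syntax is the same form used in the worktodo lines later in this thread, and the example exponent and current bit level are taken from that list.
Code:
# Sketch: expand the bin / target-bit table above and emit Factor= worktodo lines.
# Candidate exponents and their current TF bit levels still have to be looked up
# and reserved manually (mersenne.org / mersenne.ca).

TABLE = [  # (target TF bit level, bins in millions), copied from the table above
    (76, [100, 101, 102, 103]),
    (77, [120, 121, 122, 123]),
    (77, [150, 151, 152, 153]),
    (79, [200, 201, 202, 203]),
    (79, [250, 251, 252, 253]),
    (80, [300, 301, 302, 303]),
    (81, [350, 351, 352, 353]),
    (81, [400, 401, 402, 403]),
    (82, [500, 501, 452, 453]),
    (83, [600, 601]),
    (84, [700, 701]),
    (85, [800, 801]),
    (85, [900, 901]),
]
TARGET_BITS = {b: tfh for tfh, bins in TABLE for b in bins}

def factor_line(exponent, current_bits):
    """Worktodo line taking exponent to its bin's target, or None if already deep enough."""
    target = TARGET_BITS[exponent // 1_000_000]
    return None if current_bits >= target else f"Factor={exponent},{current_bits},{target}"

print(factor_line(153_000_277, 75))   # -> Factor=153000277,75,77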
#6 |
Jun 2005
USA, IL
193 Posts |
Thanks for the links and list, everyone. Yes, that makes sense.
Will it help your efforts more to work on any specific bins first, like smallest to largest, or just anything as available?
#7 |
Romulan Interpreter
"name field"
Jun 2011
Thailand
10273₁₀ Posts |
Make a worktodo file (a list of exponents with bit levels, which I can summarily edit and paste to my rig) and pass it to me.
#8 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1CBA₁₆ Posts |
Quote:
I usually go from small to large in the first part of testing, because it gives a quick feel for scaling and more rapidly and efficiently explores limits. TF in the same order seems like it would work well along with that.

Examples of a description: "entire left cudapm1 column"; "gpu column up to 401"; "p95 column 400 to 900"; an actual exponent list would work too.

I am running testing of the different applications on different gear in parallel: prime95 on cpus, gpuowl on AMD gpus, CUDAPm1 on NVIDIA gpus. So there is no particular priority between columns.
#9 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
16272₈ Posts |
Quote:
Code:
Factor=153000277,75,77
Factor=153000349,75,77
Factor=203000101,73,79
Factor=203000117,73,79
Factor=253000937,76,79
Factor=303000119,70,80
Factor=353000047,77,81
Factor=403000067,71,81
Factor=453000013,78,82
Factor=253000079,74,79
Factor=303000227,70,80
Factor=353000101,72,81
Factor=403000069,71,81
Factor=453000029,71,82
(Please reserve them to avoid duplications especially at 70 or 71 bit starting points)

Last fiddled with by kriesel on 2019-01-09 at 12:27
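Since duplicated effort is the concern, here is a small sketch of mine (not an official GIMPS tool) that checks a combined worktodo file for the same exponent appearing more than once. It assumes the plain three-field Factor=exponent,from,to form used above; lines with an assignment ID prefix would need an extra field handled, and the file name is just an example.
Code:
# Flag exponents that appear in more than one Factor= line of a worktodo file.
# Assumes the three-field form used above: Factor=exponent,bit_from,bit_to.

from collections import defaultdict

def parse_factor(line):
    """Return (exponent, bit_from, bit_to) for a Factor= line, else None."""
    line = line.strip()
    if not line.startswith("Factor="):
        return None
    exponent, bit_from, bit_to = (int(x) for x in line[len("Factor="):].split(","))
    return exponent, bit_from, bit_to

def find_duplicates(lines):
    seen = defaultdict(list)
    for entry in filter(None, (parse_factor(l) for l in lines)):
        seen[entry[0]].append(entry)
    return {p: e for p, e in seen.items() if len(e) > 1}

with open("worktodo.txt") as f:             # example file name
    for p, entries in find_duplicates(f).items():
        print(f"M{p} is listed {len(entries)} times: {entries}")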
#10 | |
If I May
"Chris Halsall"
Sep 2002
Barbados
2²×5×7×79 Posts |
Quote:
GPU72's targets are guided by James' "economic cross-over" analysis, which has been peer reviewed by many very knowledgeable people. The exact "optimal" TF'ing depth is a function of the range (candidate size) and the particular card's abilities (specifically, the "compute version"). For example, an RTX 2080 Ti (c.v. 7.5) should TF deeper than a GTX 580 (c.v. 2.0).

Please keep in mind that James' analysis is based on comparing what will "clear" a candidate faster (using statistical heuristics) ***using the same kit*** running either mfaktc or a CUDA LL'er. Note that some users TF (slightly) beyond the optimal economic cross-over point because they just like finding factors, or can't be bothered to switch between the different software.
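For intuition only, here is a toy sketch of that cross-over logic as I understand it (not James's actual model): keep TFing one more bit level while the expected primality-test time saved, on the same hardware, exceeds the TF time for that level. The probability heuristic, the number of tests saved, and every timing number below are assumptions or placeholders, not measured values.
Code:
# Toy model of the economic cross-over, not James's analysis.  All constants
# and rates are placeholders to be replaced with measurements from your own card.

PROB_FACTOR = lambda bit: 1.0 / bit   # rough odds of a factor in (2^(bit-1), 2^bit)
TESTS_SAVED = 2.0                     # e.g. a first primality test plus a double check

def tf_seconds(exponent, bit, rate):
    """Time for one bit level; candidate count scales roughly as 2^bit / exponent."""
    return (2.0 ** bit / exponent) / rate

def crossover_bit(exponent, start_bit, tf_rate, primality_seconds):
    """Deepest bit level still worth doing under this simple expected-value rule."""
    bit = start_bit
    while tf_seconds(exponent, bit + 1, tf_rate) < \
          PROB_FACTOR(bit + 1) * TESTS_SAVED * primality_seconds:
        bit += 1
    return bit

# Invented example numbers; with these it stops at 76 for a ~101M exponent.
print(crossover_bit(101_000_117, start_bit=73, tf_rate=3.0e10, primality_seconds=1.2e6))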
#11 |
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
5·7·311 Posts |
I suggest that you use James H's worktodo.txt balancer. Try to make each chunk posted as close as possible to the same GHz-days. Here is what it looks like as balanced as it can be:
Code:
[Worker #1]
Factor=353000101,72,81
Factor=453000013,78,82
Factor=453000029,71,82
[Worker #2]
Factor=153000349,75,77
Factor=253000937,76,79
Factor=203000117,73,79
Factor=303000227,70,80
Factor=403000069,71,81
Factor=353000047,77,81
[Worker #3]
Factor=153000277,75,77
Factor=253000079,74,79
Factor=203000101,73,79
Factor=303000119,70,80
Factor=403000067,71,81

• Worker #1 = 5,572.802 GHz-days
• Worker #2 = 4,489.192 GHz-days
• Worker #3 = 3,233.924 GHz-days

No one has to take a whole chunk; it can just be reprocessed for the next batch.
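For rebalancing later batches, a simple greedy largest-job-first heuristic gets the per-worker GHz-days totals reasonably even. This is only a sketch of mine, not James's balancer; the per-line GHz-days values in the example are made up and would in practice be looked up or estimated (e.g. from mersenne.ca).
Code:
# Greedy balancer sketch: assign the biggest Factor= jobs first, always to the
# least-loaded worker, so per-worker GHz-days totals come out roughly even.

import heapq

def balance(entries, workers=3):
    """entries: list of (factor_line, ghz_days); returns (total, index, lines) per worker."""
    buckets = [(0.0, i, []) for i in range(workers)]          # (running total, worker index, lines)
    heapq.heapify(buckets)
    for line, ghzd in sorted(entries, key=lambda e: -e[1]):   # largest jobs first
        total, i, lines = heapq.heappop(buckets)              # least-loaded worker so far
        lines.append(line)
        heapq.heappush(buckets, (total + ghzd, i, lines))
    return sorted(buckets, key=lambda b: b[1])                # back in worker order

example = [("Factor=353000101,72,81", 1200.0),   # GHz-days values here are made up
           ("Factor=453000013,78,82", 2400.0),
           ("Factor=453000029,71,82", 2000.0),
           ("Factor=153000277,75,77", 90.0)]
for total, i, lines in balance(example, workers=2):
    print(f"[Worker #{i + 1}]  {total:,.1f} GHz-days")
    print("\n".join(lines))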