mersenneforum.org > Great Internet Mersenne Prime Search > PrimeNet > GPU to 72
2018-11-28, 16:48   #12
kriesel

Quote:
Originally Posted by chalsall View Post
Again, it's a little complicated...

In an ideal world we would TF to the exact "economic cross-over" point (e.g. 76.253 "bits" for a GTX 580 at 90M). But mfaktX (and Prime95/mprime) only do integer depths. Further, mfaktX and Prime95/mprime do TF'ing differently (the former does lowest up; the latter uses "classes" spread over the entire bit range).

What we end up trying to do is what makes the most sense, based on the available compute.

Keep in mind also that James' analysis is based on the *same* hardware doing either TF'ing or LL'ing. In reality we have some who like to do TF'ing, and others who like to do LL/DC'ing. Thus, sometimes a participant/card will TF to a higher level than makes optimal sense (but rarely more than 0.5 of a bit level).
My question was: in GPU TF, what's more likely to make optimal sense, round to nearest, round down, or something else?
In the absence of an answer from someone I think likely to know (which might include heinrich, Prime95, chalsall), I'm leaning toward round down. P-1 can pick up the slack, since its selection of B1 and B2 is based on the rounded-down TF limit. (And we don't have an app that runs in negative time to un-TF exponents that were taken further than optimal. ;)

As someone with a variety of CPU and GPU types, running many of the available computation types on a wide range of exponents (but no CPU TF; that would be a waste in my fleet), I get the complexity and difficulty of trying to optimize throughput overall. Just measuring and tabulating GHz-days/day versus exponent, computation type, computation-type-specific variables (TF bit level, B1, B2), and computing hardware type would be a large undertaking, and some apps compute and display that value while others don't. Then interpreting the data and coming up with a near-optimal course for the probability of finding a prime (which is not the same as maximizing GHz-days/day) seems daunting.
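The trade-off behind "round down" can be sketched as a marginal-cost comparison per bit level. A minimal Python sketch; the 2^b cost doubling and the roughly 1/b chance of a factor per bit level are standard GIMPS heuristics, but the scale constants here are made up for illustration:

```python
# Sketch: pick a TF depth by comparing each bit level's marginal cost to its
# expected savings, then round DOWN to the last economic integer level.
# Assumptions (illustrative only): TF cost for bit level b alone scales like
# 2^b / p; the chance of a factor in level b is roughly 1/b; a found factor
# saves two primality tests (first test + double check).

def tf_cost(p: int, b: int, scale: float = 3.6e-14) -> float:
    """Rough GHz-days to TF exponent p through bit level b alone (toy scale)."""
    return scale * 2**b / p

def optimal_tf_depth(p: int, test_cost: float,
                     b_start: int = 65, b_max: int = 90) -> int:
    """Deepest integer bit level whose marginal cost still beats its expected
    savings; stopping before the first uneconomic level is the round-down."""
    b = b_start
    while b < b_max and tf_cost(p, b + 1) < 2 * test_cost / (b + 1):
        b += 1
    return b

print(optimal_tf_depth(90_000_000, test_cost=350.0))  # -> 74 with these toy numbers
```

With real per-hardware constants the crossover lands at a fractional level like 76.25; the loop above simply stops at the integer below it.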

2018-11-28, 17:55   #13
Mark Rose

Quote:
Originally Posted by kriesel View Post
My question was: in GPU TF, what's more likely to make optimal sense, round to nearest, round down, or something else?
In the absence of an answer from someone I think likely to know (which might include heinrich, Prime95, chalsall), I'm leaning toward round down. P-1 can pick up the slack, since its selection of B1 and B2 is based on the rounded-down TF limit. (And we don't have an app that runs in negative time to un-TF exponents that were taken further than optimal. ;)

As someone with a variety of CPU and GPU types, running many of the available computation types on a wide range of exponents (but no CPU TF; that would be a waste in my fleet), I get the complexity and difficulty of trying to optimize throughput overall. Just measuring and tabulating GHz-days/day versus exponent, computation type, computation-type-specific variables (TF bit level, B1, B2), and computing hardware type would be a large undertaking, and some apps compute and display that value while others don't. Then interpreting the data and coming up with a near-optimal course for the probability of finding a prime (which is not the same as maximizing GHz-days/day) seems daunting.
Again, as Chris mentioned, it depends on the individual GPU and user preference. If a user isn't using their GPUs for LL or P-1, then it makes sense to TF as evenly as possible across the exponents likely to be assigned for LL or P-1 in the near future. Is the near future a week or a month? Hard to say.

When it comes to rounding up or down, down is likely the way to go. Each additional bit level takes roughly twice as long to TF as the one before it, so rounding up doubles the marginal cost for what is at most a fractional-bit gain.

Personally I TF exponents needing DC, since I want more TF done before my CPUs do the DC. It would probably be faster to do the actual LL work, but I'm lazy and haven't run cudaLucas with a script to manage work fetching and results.

The one feature I wish GPU72 had is the ability to tell it which GPU I'm using and for it to only give me "Let GPU72 decide" work based on James' crossovers. This would give the GPUs with better TF/LL crossover points the higher TF work. The difference in crossover between a GTX 580 and an RTX 2080 is several bit levels.
2018-11-28, 20:33   #14
kriesel

Quote:
Originally Posted by snme2pm1 View Post
So I might argue that such work in 118M space, and other similar spaces, is relevant now for a smaller community that might like to engage with same.
Having a sprinkling of selected exponents taken to TF levels well ahead of the general breadth-first approach has proven very useful for some testing purposes. Rudimeier and others have taken some exponents near million-value boundaries (exponent mod 10^6 < ~200) much higher.
These are useful as pre-qualification of those exponents for the P-1 application testing I've been doing, exploring the limits of CUDAPm1. See for example https://www.mersenne.org/report_expo...2000100&full=1

Also, running some PRP or LL test & double check well ahead of the wavefront with multiple applications offers a chance of detecting software issues that are fft length or exponent dependent with plenty of time to determine the issue and work on a fix.
There's now at least one LL DC in every line through 113M in https://www.mersenne.org/primenet/, which is about 3 years ahead of the leading edge of the mass primality-test assignment wave, now at ~90M. (PRP coverage is much thinner; there's no PRP DC above 84M up to at least 999M; see https://www.mersenneforum.org/showpo...81&postcount=6)
2018-11-28, 21:40   #15
kriesel

Quote:
Originally Posted by Mark Rose View Post
Again, as Chris mentioned, it depends on the individual GPU and user preference. If a user isn't using their GPUs for LL or P-1, then it makes sense to TF as evenly as possible across the exponents likely to be assigned for LL or P-1 in the near future. Is the near future a week or a month?
Or a day. As long as an adequate buffer of well-TF'd exponents is maintained as input to P-1, LL, or PRP, for the most part it does not matter how thoroughly TF'd the exponents are that the wavefront won't reach for a month, a decade, or several. The exceptions are the rare software-test exponents, which can be manually searched for when setting up the tests, and cases where TF management is fully manual and long run times help keep assignment and result volume compact per THz-D while avoiding the GPUs going idle.

Quote:
Originally Posted by Mark Rose View Post
The one feature I wish GPU72 had is the ability to tell it which GPU I'm using and for it to only give me "Let GPU72 decide" work based on James' crossovers. This would give the GPUs with the better TF/LL crossover points higher TF work. The difference in crossover for a GTX 580 and a RTX 2080 is several bit levels.
The crossover is relevant if you intend to do both the TF and the primality test on the GPU. One can argue that would be a misuse of the RTX 2080. A GTX has a TF/LL throughput ratio around 15; the RTX, being so much faster at TF but not at LL, has a TF/LL throughput ratio around 43, and ought to do TF so the GTX or the CPU can do PRP, LL, or P-1. I've computed TF/LL throughput ratios for my hardware fleet and found older-model CPUs at 0.7-1.25 and older GPUs at 11.3 to 15.6 (except an IGP that came in at 22.5 because of low DP performance). These figures are all gross oversimplifications, since both TF and LL throughput values are functions, not constants.

It's hard to even formulate a coherent objective function to optimize, given a very heterogeneous pool of computing hardware, a set of interrelated computation types, and many constraints. Some may say all GPUs should do all TF and CPUs should do no TF. (I think I'd find that approach dull.) I think the lowest TF/LL-ratio resources get assigned primality tests and P-1, and the highest TF/LL-ratio resources get assigned TF, until a balance of throughputs of the various types is found, if one is doing all steps on the same exponents.
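That balancing idea can be sketched as a simple greedy split by TF/LL ratio. The ratios below are the ones quoted in this thread; the throughput numbers and the balance target are illustrative assumptions:

```python
# Split a mixed fleet between TF and primality testing by TF/LL throughput
# ratio: devices relatively best at TF do TF, the rest do LL/PRP/P-1,
# assigning greedily until the two pools' throughputs roughly balance.

def split_fleet(devices, tf_per_test=1.0):
    """devices: list of (name, tf_ghzd_per_day, ll_ghzd_per_day).
    tf_per_test: target ratio of TF throughput to testing throughput.
    Returns (tf_pool, test_pool) as lists of device names."""
    by_ratio = sorted(devices, key=lambda d: d[1] / d[2], reverse=True)
    tf_pool, test_pool = [], []
    tf_rate = test_rate = 0.0
    for name, tf, ll in by_ratio:
        # Assign to TF while TF throughput still lags the target balance.
        if tf_rate <= tf_per_test * test_rate:
            tf_pool.append(name)
            tf_rate += tf
        else:
            test_pool.append(name)
            test_rate += ll
    return tf_pool, test_pool

fleet = [("RTX 2080", 430.0, 10.0),   # TF/LL ~ 43 (ratio from this thread)
         ("GTX 580", 150.0, 10.0),    # TF/LL ~ 15
         ("old CPU", 7.0, 10.0)]      # TF/LL ~ 0.7
print(split_fleet(fleet))  # -> (['RTX 2080'], ['GTX 580', 'old CPU'])
```

The greedy order means the highest-ratio devices always land in the TF pool first, matching the intuition above that the RTX does TF while the GTX and CPU test.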

Having a PrimeNet interface and autopilot capability could help total GPU throughput. There are probably some users who run prime95 but not GPU apps because the latter aren't automatic.

2018-12-01, 02:19   #16
snme2pm1

Quote:
Originally Posted by kriesel View Post
I'm leaning toward round down. P-1 can pick up the slack based on the rounded-down value of TF limit in its selection of B1 and B2.
Whilst I haven't studied the Pollard algorithm in depth, my limited grasp for now is that it is not necessarily destined to find the factors missed through a lapse of reasonable due diligence in TF work; am I wrong?
2018-12-01, 05:35   #17
GP2

Quote:
Originally Posted by snme2pm1 View Post
Whilst I haven't studied the Pollard algorithm in depth, my limited grasp for now is that it is not necessarily destined to find the factors missed through a lapse of reasonable due diligence in TF work; am I wrong?
All factors of 2^p−1 are of the form 2kp+1 for some k, and the P−1 algorithm can only find such a factor if k is smooth, i.e., if the prime factors of k itself are such that all of them are less than B2 and all but one of them are less than B1.

(Obviously the set of factors of k for a particular factor 2kp+1 of 2^p−1 is completely different from the set of factors of 2^p−1 itself.)

So the P−1 algorithm can miss some smallish factors and find some largish factors, whereas TF finds factors strictly on the basis of their size, and there's less overlap than you might think between the factors findable by each method.

P−1 won't really bail you out if TF was poorly done with a bad machine.
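The smoothness condition above can be written down directly. A minimal Python sketch; it ignores the stage-1 prime-power subtlety (a repeated prime q^e dividing k actually needs q^e ≤ B1):

```python
def p1_can_find(k_prime_factors, B1, B2):
    """Given the prime factors of k (with multiplicity) for a candidate
    factor 2*k*p + 1 of 2^p - 1, decide whether P-1 with bounds B1 <= B2
    would find it: every prime factor of k must be <= B2, and all but the
    largest one must be <= B1 (largest handled by stage 2)."""
    fs = sorted(k_prime_factors)
    return fs[-1] <= B2 and all(q <= B1 for q in fs[:-1])

# Toy k = 2^2 * 3 * 5 * 101 * 100003: stage 1 clears everything up to 101,
# stage 2 picks up the single large prime 100003.
print(p1_can_find([2, 2, 3, 5, 101, 100003], B1=1000, B2=200000))  # -> True
print(p1_can_find([2, 2, 3, 5, 101, 100003], B1=1000, B2=50000))   # -> False
```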
2018-12-01, 09:09   #18
LaurV

Well, a practical example: say your p is at the current front, around 90M, and you TF to 75 bits and miss a factor, due to a bad machine, odd luck, or whatever. There are 2^75/180M, about 2×10^14, factor candidates there. About half of them cannot be factors (for example, k = 2 mod 4). Say you have about 10^13 possible candidates that you sieve and test; there are about 6 trillion primes among them, and any of them could be a factor. (I rounded in your favor.) In this range, say you run P-1 with B1=10M and B2=300M (about 10 times higher than the current values we use for this range). About one in a hundred and sixty thousand of these candidates is 10M-power-smooth, and about another one in thirty thousand has its largest prime factor below B2. All in all, you may have a chance of 0.0025%, plus or minus something, of finding a factor by P-1 that was missed by TF.


All numbers are from my butt; I didn't use a calculator or any advanced tools, just common sense (a "feeling" for these numbers after a lot of experience playing with GIMPS). But that is the ballpark we are in, anyhow.
2018-12-06, 01:32   #19
GP2

Quote:
Originally Posted by LaurV View Post
Well, a practical example: say your p is at the current front, around 90M, and you TF to 75 bits and miss a factor, due to a bad machine, odd luck, or whatever. There are 2^75/180M, about 2×10^14, factor candidates there. About half of them cannot be factors (for example, k = 2 mod 4). Say you have about 10^13 possible candidates that you sieve and test; there are about 6 trillion primes among them, and any of them could be a factor. (I rounded in your favor.) In this range, say you run P-1 with B1=10M and B2=300M (about 10 times higher than the current values we use for this range). About one in a hundred and sixty thousand of these candidates is 10M-power-smooth, and about another one in thirty thousand has its largest prime factor below B2. All in all, you may have a chance of 0.0025%, plus or minus something, of finding a factor by P-1 that was missed by TF.
I think your estimate is completely wrong, actually.

I took all the known factors (as of December 1) that meet your conditions: 75-bit factors of exponents in the 90M–91M range. There are 125 of them.

Slightly more than half of the currently-known 75-bit factors in this exponent range are findable using P−1 with the B1 and B2 you specified.

This is undoubtedly distorted because there has only been very limited TF to 75 bits in this range, so factors found by P−1 are probably over-represented versus factors found by TF. And also you picked exaggerated values for B1 and B2. But that doesn't matter. Just look at the data. Some of the factors are findable by P−1 using really ridiculously low B1 and B2... and it's a lot more than 0.0025% of them.

Whenever you have exponentially increasing difficulty, then you will have a large trivially-easy group and a large completely-impossible group, and a small sweet spot in the middle where you can find things with some effort. But the criteria that determine difficulty are different for TF than for P−1, so there will always be some factors that are very hard for TF but are very easy for P−1, and vice versa.

We should really choose a bit length like 65 bits, where we can be sure that every factor of this size is known (thanks to complete TF and also user TJAOI), so there is no selection bias. And then I think you'd still find that maybe up to 20% of the factors of a given bit length found by TF in a given exponent range are also findable by P−1 with the typical B1 and B2 values used by Primenet for that exponent range.
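A back-of-envelope smoothness estimate points the same way. A sketch using the exact closed form ρ(u) = 1 − ln u of Dickman's function for 1 ≤ u ≤ 2 (a standard approximation; everything else here is illustrative):

```python
import math

def dickman_rho(u: float) -> float:
    """Dickman's rho for 0 <= u <= 2: the asymptotic probability that a
    random integer n is n^(1/u)-smooth. rho(u) = 1 for u <= 1 and
    1 - ln(u) for 1 < u <= 2."""
    if u <= 1:
        return 1.0
    if u <= 2:
        return 1.0 - math.log(u)
    raise ValueError("only u <= 2 is handled in this sketch")

# For a 75-bit candidate q = 2*k*p + 1 with p ~ 90M, k is ~ 10^14.
# With B1 = 10^7, u = log(k)/log(B1) = 14/7 = 2, so a random k of that
# size is B1-smooth with probability about rho(2) = 1 - ln 2, roughly 0.31,
# nowhere near one in a hundred and sixty thousand.
print(round(dickman_rho(14 / 7), 3))  # -> 0.307
```

(This glosses over smooth versus power-smooth and the stage-2 refinement, but the order of magnitude is what matters here.)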

The following table shows the fields p, f, bit length, B1, and B2; the B1 and B2 values needed to find each factor with P−1 can be verified at mersenne.ca.

Code:
90001337,30275491157952984566993,75,27527,244877
90003091,27580695051547817445769,75,18149,575251
90004207,34105614724956800044441,75,17,61917248641
90011927,24692165690915304167479,75,24907,1835635517
90013493,37379225930359178211089,75,859,30214091639
90016897,37279878084531290742769,75,161743,1367789
90018223,30135935323533343270553,75,67219,419789
90020351,32121394306793985297151,75,431,204419063
90022831,32169364698025037074481,75,3347,190654069
90024941,34710660219198891405431,75,3511,4186691
90026549,22651977832386106845959,75,19867,24517
90034291,23794721359432517352583,75,607,132178369
90034409,30578635633880094849161,75,48023,2130209
90035839,37275245925627483902231,75,59,701702606723
90036091,21970575928536907737983,75,60773,111467
90036179,34807929506691838748903,75,1163,13053311
90040529,31784738502709835530847,75,751,6918347
90041383,23491020791516758138129,75,3,5435237111917
90042629,31629650497580602452961,75,13,6254882089
90043507,28337144191051389713177,75,11383,3455866487
90047663,24097400094846645123049,75,1753,313783
90050803,30975976957801844310913,75,8233,12089423
90052051,28861285992967411012481,75,239,34253
90061571,26211616433883023394961,75,17,23777865599
90061693,36686298667228450938193,75,5897,4220239
90062767,31426249583890764670127,75,263,663378703903
90065617,23085145955421470950559,75,128157374169887,128157374169887
90066673,33770446918217188124401,75,13,1144534253
90070867,34021689523742863765783,75,3,20984519472097
90074161,28662609592468589374529,75,1201,179996759
90074807,27200067254870418199769,75,2129,106165771
90076403,23222441039888242506799,75,43,111028537253
90086989,19616785394967460455521,75,5,1360961333923
90090529,20432166952246868820041,75,4297,101500129
90092983,34383907550131776874927,75,13,1630979139133
90093151,35139954951901562385641,75,23,423956791217
90094969,19558475561072849583463,75,131,39456074549
90095849,20334232711579886881601,75,1129,1372991
90100201,27963683013401722290983,75,1997,4283
90101353,23178295088429391067153,75,3,5359310727283
90102743,25720680603130397485823,75,13,10979210103229
90103477,19344642535218094203617,75,5347,7499
90114347,36353490287781684744673,75,47,8128125443
90115843,36838911275947845397409,75,61,209423697803
90117569,34556576308089107507311,75,349,36624740717
90118949,37409755409195259329287,75,19,3641362364351
90120749,22166173221494404329431,75,7,71708693549
90121459,23714259238860463509703,75,14419,390493
90131891,28496659569941492034839,75,158083111614409,158083111614409
90141127,21039128352619567738673,75,173,84321567977
90141269,34690632245071860114191,75,103,53376892031
90142421,19812699259451178744721,75,30517,4287097
90144179,29813064197700479208209,75,2437,107365889
90149651,22265018841347167697881,75,363437,5663027
90150631,35245101095404158356257,75,48847,83372123
90153013,36137577284545792762591,75,14929,130639
90158807,33343086134850656038961,75,236729,19527929
90162803,26260150014933188168921,75,284387,3657649
90164801,26394575149129829178857,75,139589,23831083
90170033,19782397216081836930073,75,15679,30713
90176059,32279051954330934005257,75,223,66882641167
90188047,29302081806277673160769,75,13,80499817
90189863,21430652709358856941009,75,3,4950356022217
90194723,21607328992446123657959,75,3517,10139291
90197719,34771047234186645032879,75,163,51413453869
90201773,20906787871288193055127,75,617843,5683949
90224671,36966116392160450253041,75,76543,100313
90225743,31311050552132744827639,75,1794719,10742323
90228647,28629910989595726382903,75,372131,38757613
90236819,36736775257229282064913,75,79,1602411757
90240209,26378280401930018845897,75,58897,136319
90243193,20501931601427179977511,75,5,7572845079269
90244949,31348065105309662033249,75,2,10855200710911
90246391,27312496571803352865433,75,7243,1741012261
90248201,23360788510093359122839,75,3,43141743641873
90251617,30478587149629142344903,75,67,44214038537
90255751,30816261664831676831599,75,887,240280727
90258367,26319405133113697433183,75,25603,52201
90261497,19902882946225105086569,75,1237,289376357
90264641,25436083960862831146607,75,17,8288075022599
90265039,19860596701068447730519,75,197,1399598927
90266431,20748280961854176610601,75,6551,125941
90271949,32174383420963682333671,75,17443,681106427
90283153,21071824681658771129983,75,101,7266862333
90287209,19791394593381752283497,75,426739,1088293
90289709,29050985469174247085119,75,71707,190339
90289933,22735974000730158897737,75,153817,1538609
90295097,21850106226467353734433,75,383,2049647
90301931,20313881338512870213263,75,6247,30058517
90302341,31615993367490175356761,75,65761,77339
90303803,37340129698475764263967,75,11981,1462519
90304237,36551719702887556102289,75,3527,98254259
90305447,28444008285058980009367,75,3,17498641942421
90306347,18989540267375199840847,75,585107,5445239
90308021,37206244850926875158431,75,17,89758756837
90308993,25846622933337040303463,75,31,659451830251
90311707,28289617785002206595713,75,443,32203
90323029,23342926586503468755617,75,71,8749941139
90328193,19867594653018597110279,75,109974494081923,109974494081923
90332633,36210287561275510310647,75,639833,11601841
90334411,31159445186129425734511,75,176237,7248959
90347119,29561056405002839372417,75,2,2556204953963
90348067,28406348181925693102831,75,23929,48767
90440143,23991331494887961879943,75,3,4912463667511
90503909,19804045823685690437737,75,953,46218101
90504539,27273508321056276073967,75,1181,29376559
90505357,21224517933807247369991,75,5077,4619089591
90506123,18926107751685058568183,75,433,3135990437
90668371,33743761966549016231081,75,1291,13370963
90673123,27569674045941640318759,75,7069,35801
90679157,29924157598336182932383,75,2927,354539491
90679219,33952994657583975848929,75,3203,1217705149
90682259,29660402175788739276809,75,50131,308809
90682567,36769630201356491264039,75,6091,597263
90684271,28385632387745954602943,75,154753,1011340817
90692821,35742861897608570358569,75,8377,135431
90693371,21212559431297250732497,75,11,1328938427701
90701713,20053913238312793986247,75,3,36849567251857
90719351,34360578429649274048663,75,37,393718171501
90723697,25737172880635507752199,75,2179,7232865097
90724243,26935176168132377274263,75,7,3029496056633
90725827,20912357778960786109847,75,1063,15488550089
90726341,30626793924517436076967,75,3301,396371947
90731723,27760869132060529502767,75,109,514674047
90732647,26075768070729698451583,75,67,714903585953

2018-12-06, 01:46   #20
chalsall

Quote:
Originally Posted by GP2 View Post
I think your estimate is completely wrong, actually.
Sigh... Why don't you put your CPUs where your mouth is?

Please run a few P-1 jobs in 89M where factors are not known. Many are available.

2018-12-06, 02:01   #21
GP2

Quote:
Originally Posted by chalsall View Post
Sigh... Why don't you put your CPUs where your mouth is.

Please run a few P-1 jobs in 89M where factors are not known. Many are available.
I'm not sure what point you're trying to make.

When a factor of a Mersenne number is known, it's of the form 2kp+1. So you simply find the factors of k itself to determine, retroactively, which B1 and B2 would have sufficed to find that particular factor of the Mersenne number by the P−1 method. Namely, the B1 and B2 values required are the second-largest and the largest factors of k, respectively.

Thus you can determine that a significant fraction of factors found by TF are also findable by P−1. It's demonstrable by math; you don't need to run any jobs to prove it.
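The retroactive computation above is mechanical. A sketch using the row for exponent 90100201 from the table in post #19; it reports the largest and second-largest distinct prime factors of k, ignoring the stage-1 prime-power subtlety:

```python
def required_bounds(p: int, f: int, trial_limit: int = 10**5):
    """For a known factor f = 2*k*p + 1 of 2^p - 1, factor k by trial
    division and return (B1, B2) = (second-largest, largest) distinct
    prime factor of k, i.e. the bounds P-1 would have needed. Works only
    when k's largest prime factor is below trial_limit."""
    k, r = divmod(f - 1, 2 * p)
    assert r == 0, "f is not of the form 2kp+1"
    primes = []
    d, rem = 2, k
    while d <= trial_limit and rem > 1:
        if rem % d == 0:
            primes.append(d)          # record each distinct prime once
            while rem % d == 0:
                rem //= d
        d += 1 if d == 2 else 2       # 2, then odd trial divisors only
    assert rem == 1, "k has a prime factor above trial_limit"
    return primes[-2], primes[-1]

# Table row: p=90100201, f=27963683013401722290983, listed B1=1997, B2=4283.
print(required_bounds(90100201, 27963683013401722290983))
```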

2018-12-06, 02:09   #22
chalsall

Quote:
Originally Posted by GP2 View Post
I'm not sure what point you're trying to make.
You yourself demonstrated that there is little cross-over between factors found by TF'ing or P-1'ing.

We have now effectively finished TF'ing 89M to 76 bits, and are close to finishing 90M to same.

Are you now telling us that we're wasting our time TF'ing?