mersenneforum.org
2019-05-11, 18:16   #12
kriesel

Why don't we save interim residues on the primenet server?

This is often asked in the context of wanting to continue a run that someone else abandoned before completion. It's not unusual for a participant to quit when their assigned exponents are anywhere from 2% to 98% of the way through a primality test.

Full-length residues saved to the primenet server at some interval, perhaps every 20 million iterations, are sometimes proposed as a means of minimizing the throughput lost to abandoned, uncompleted tests. Implementing this for the combined output of GIMPS would place a considerable load on the server's resources and require considerable additional expenditure, which is not in the Mersenne Research, Inc. budget. For users with slow internet connections, the individual load could also be a considerable fraction of available bandwidth, and transfer times could stall the application and reduce total throughput. https://www.mersenneforum.org/showpo...&postcount=118
Detailed analysis and discussion at https://www.mersenneforum.org/showpo...&postcount=124
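As rough back-of-envelope sizing, consider the following sketch. All figures here are assumptions for illustration, not actual PrimeNet statistics.
Code:
# Rough sizing of the "save full residues on the server" proposal.
# Assumed figures: a wavefront exponent near 100M bits, a save every
# 20M iterations, and an assumed count of simultaneously active tests.
exponent = 100_000_000
residue_mb = exponent / 8 / 1e6            # one full residue ~ 12.5 MB
saves_per_test = exponent // 20_000_000    # ~5 interim saves per test
active_tests = 50_000                      # assumption, illustration only
total_gb = residue_mb * saves_per_test * active_tests / 1000
print(f"{residue_mb:.1f} MB per save, ~{total_gb:,.0f} GB across {active_tests:,} tests")
# -> 12.5 MB per save, ~3,125 GB across 50,000 tests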

However, it is feasible to save smaller interim residues, such as 64-bit or 2048-bit ones, and this is currently being done. Recent versions of prime95 automatically save 64-bit residues at iteration 500,000 and at every multiple of 5,000,000. The 2048-bit residues are generated at the end of PRP tests (possibly only type-1 and type-5 PRP tests), per posts 606-609 of https://www.mersenneforum.org/showth...048#post494079
The stored interim 64-bit residues from different runs of the same exponent can be compared to see whether the runs match along the way, and where one or the other diverges.
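As a trivial illustration of such a comparison, here is a minimal sketch. The dict-of-hex-strings input format is hypothetical; real checkpoint lists would come from the server's stored reports.
Code:
# Compare interim 64-bit residues from two runs of the same exponent and
# report the first checkpoint at which they disagree.
def first_divergence(run_a, run_b):
    """run_a, run_b: dicts mapping iteration count -> res64 hex string."""
    for it in sorted(set(run_a) & set(run_b)):
        if run_a[it].lower() != run_b[it].lower():
            return it            # runs diverge at this iteration count
    return None                  # runs match at every common checkpoint

run1 = {500_000: "0x05c21ef8e9eac8b2", 5_000_000: "0x1a2b3c4d5e6f7a8b"}
run2 = {500_000: "0x05c21ef8e9eac8b2", 5_000_000: "0xffff00001111beef"}
print(first_divergence(run1, run2))   # -> 5000000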


Top of this reference thread: https://www.mersenneforum.org/showth...736#post510736
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-02-20 at 20:35

2019-05-19, 15:58   #13
kriesel

Why don't we skip double checking of PRP tests protected by the very reliable Gerbicz check?

George Woltman gave a few reasons at https://www.mersenneforum.org/showpo...68&postcount=3.
An example of a bad PRP result is listed at https://www.mersenne.org/report_expo...9078529&full=1, which George identified as the product of a software bug affecting a single bit outside the block of computations protected by the Gerbicz error check.
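For intuition, here is a minimal sketch of the Gerbicz check's core invariant, with a tiny exponent. This is not prime95's or gpuowl's implementation; production code checks far less often and rolls back to the last verified state on a mismatch rather than asserting.
Code:
# Sketch of the Gerbicz error check (GEC) guarding a base-3 Fermat PRP test
# of N = 2^p - 1. The PRP iteration is repeated squaring r <- r^2 mod N,
# starting from r = 3. Every B iterations the current residue is folded into
# a running product d. Invariant: d_new == 3 * d_old^(2^B) (mod N), because
# squaring each earlier checkpoint B more times advances it by one block.
p, B = 11213, 400           # tiny example; real tests use p ~ 10^8
N = (1 << p) - 1

r = d = 3
for block in range(3):      # a few blocks rather than the full p - 2 iterations
    d_old = d
    for _ in range(B):
        r = r * r % N       # one PRP squaring: the step an error could corrupt
    d = d * r % N           # fold this checkpoint into the accumulator
    assert d == 3 * pow(d_old, 1 << B, N) % N, "error detected in this block"
print("all blocks verified")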

However, a method of generating an independently verifiable proof of correct completion of a PRP test has been developed; it replaces PRP double checking at a great savings in checking effort. https://www.mersenneforum.org/showth...ewpost&t=25638
This has been implemented in Gpuowl, mprime/prime95, and on the PrimeNet server, and is planned for Mlucas as well.



Top of this reference thread: https://www.mersenneforum.org/showth...736#post510736
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-02-02 at 19:28 Reason: updated statement of PRP proof/cert implementation status

2019-05-19, 17:11   #14
kriesel

Why don't we self test the applications, immediately before starting each primality test?

Why don't we self-test the applications immediately before starting each primality test, at the same fft length about to be used, for a current wavefront test and for any 100Mdigit or larger exponent? Perhaps also upon resuming an exponent?

(part of this was first posted as https://www.mersenneforum.org/showpo...0&postcount=10)
Quote:
I think it would be a plus if future releases of primality testing software performed a brief self test before beginning each primality test, and if found unreliable, AT THAT TIME, refused to proceed with a primality test, instead providing the user with recommendations for improving reliability. Perhaps a fast small block of PRP/Gerbicz check, even if what's being run is LL; on the same exponent/fft length, to test more closely what's about to be run.
Hardware reliability changes with time, temperature, and other factors. A self test at the same fft size verifies that fft transforms and multiplications can be done reliably. If the self test were a couple of blocks of PRP/GC, it could also serve as a useful small increment of a cat 4 PRP double check.

Users might find the checks annoying or regard them as lost throughput. Running LL on 100Mdigit exponents would be disincentivized, since it would also involve working on a 100Mdigit PRP DC so that there is an fft length match. One might as well run PRP for 100Mdigit exponents, avoiding both the side self test and the commitment to a 100Mdigit DC. Increasing adoption of PRP and reducing LL for 100Mdigit exponents is a good thing.

There are some application-specific or interface-specific reasons.
There is no GIMPS PRP code for CUDA or Gerbicz check code for CUDA.
There is no provision for self test of fft lengths larger than 8192K in CUDALucas.


Top of this reference thread: https://www.mersenneforum.org/showth...736#post510736
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-02-20 at 19:34

2019-05-20, 02:04   #15
kriesel

Why don't we occasionally manually submit progress reports for long-duration manual primality tests?

There's currently no way to do that.
This is a CUDALucas console output line:
Code:
|  May 19  20:00:49  |  M49602851  30050000  0x05c21ef8e9eac8b2  |  2688K  0.15625   2.0879  104.39s  |     11:15:47  60.58%  |
https://www.mersenne.org/manual_result/ does not understand it:
Code:
Done processing:
* Parsed 1 lines.
* Found 0 datestamps.

GHz-days: Qty 0, Work all, Submitted -, Accepted -, Average 0.000

  • Did not understand 1 lines.
  • Recognized, but ignored 0/0 of the remaining lines.
  • Skipped 0 lines already in the database.
  • Accepted 0 lines.
There's no way to report progress of a gpu-based manual primality test, a lengthy P-1 factoring run, or a long TF run, so from the primenet server's point of view progress remains at 0.0%, and such assignments sometimes expire prematurely. It would be useful if the manual results processing script accepted progress reports in CUDALucas console output form, as in the example above, even if it accepted only iteration counts that are multiples of 1M or 10M. See also https://www.mersenneforum.org/showthread.php?t=24262
Accepting gpuowl progress records would also be very useful.
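A hypothetical sketch of what such acceptance could look like, parsing the CUDALucas console line shown above. Field meanings are inferred from that one example; this is not the actual PrimeNet results parser. The post suggests multiples of 1M or 10M; the demo call uses a finer granularity so the sample line passes.
Code:
import re

PATTERN = re.compile(
    r"\|\s*(?P<stamp>\w+\s+\d+\s+[\d:]+)\s*\|"       # date and time
    r"\s*M(?P<exponent>\d+)\s+(?P<iteration>\d+)"    # exponent and iteration
    r"\s+(?P<res64>0x[0-9a-fA-F]{16})"               # interim 64-bit residue
    r".*?(?P<percent>[\d.]+)%")                      # percent complete

def parse_progress(line, granularity=1_000_000):
    m = PATTERN.search(line)
    if m is None:
        return None                       # "Did not understand" this line
    it = int(m.group("iteration"))
    if it % granularity:
        return None                       # accept only round iteration counts
    return {"exponent": int(m.group("exponent")), "iteration": it,
            "res64": m.group("res64"), "percent": float(m.group("percent"))}

line = ("|  May 19  20:00:49  |  M49602851  30050000  0x05c21ef8e9eac8b2  |"
        "  2688K  0.15625   2.0879  104.39s  |     11:15:47  60.58%  |")
print(parse_progress(line, granularity=50_000))
# {'exponent': 49602851, 'iteration': 30050000,
#  'res64': '0x05c21ef8e9eac8b2', 'percent': 60.58}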


Top of this reference thread: https://www.mersenneforum.org/showth...736#post510736
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-06-27 at 14:30

2019-10-02, 05:09   #16
LaurV

Why don't we extend B1 or B2 of an existing no-factor P-1 run?

Quote:
Originally Posted by kriesel View Post
Neither GpuOwl nor CUDAPm1 have yet implemented B1 extension from an existing save file. Consequently a run to a higher B1 for the same exponent currently requires starting over, repeating a lot of computation.

Neither GpuOwl nor CUDAPm1 have yet implemented B2 extension from an existing savefile. Consequently a run to a higher B2 for the same exponent currently requires starting over, repeating a lot of computation.
B2 extension is trivial; you only need to save the residue at the end of stage 1 and the last B2 value (or range). Each stage 2 "chunk" (or cluster) does not use the results of the former chunks; it uses only the result from the end of stage 1 and the current chunk, and the chunks advance until B2 is reached. So, with a save file from the end of stage 1 (when B1 was reached), you could technically run "stage 2 from B2_start_x to B2_end_x" independently on x computers in parallel.

Extending B1 is a bit trickier, because you need to compute the additional small primes (and prime powers) that fit under the new B1, and do the exponentiation required to fold them into the new product (b^E). There is a piece of pari/gp P-1 code I posted some time ago which does B1 extension, but it is slow: first, because it is PARI; second, because it only uses "chunks" of 2 primes (i.e., no stage 2 extensions). It can, however, save intermediate files and extend B1.

Also, once you extend B1, you must redo stage 2 "from scratch": whatever stage 2 you did before, to the same B2 (or more, or less), is void.
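To make the stage 1 part of that recipe concrete, here is a minimal sketch (Python, using sympy's primerange). The helper names are illustrative; production code works on FFT-form residues and would compute only the delta exponent rather than both full products.
Code:
import math
from sympy import primerange

# Stage 1 computes x = 3^E(B1) mod N, where E(B1) is the product of
# q^floor(log_q B1) over all primes q <= B1. (GIMPS also folds in the
# exponent p, since factors of 2^p - 1 have the form 2kp + 1; omitted here.)
# Extending B1_old -> B1_new just raises the saved stage 1 residue to the
# ratio E(B1_new) / E(B1_old), which is always an exact integer.
def stage1_exponent(B1):
    E = 1
    for q in primerange(2, B1 + 1):
        qk = q
        while qk * q <= B1:     # highest power of q not exceeding B1
            qk *= q
        E *= qk
    return E

def extend_stage1(x_saved, B1_old, B1_new, N):
    """Advance a saved residue x_saved = 3^E(B1_old) mod N to bound B1_new."""
    delta = stage1_exponent(B1_new) // stage1_exponent(B1_old)
    return pow(x_saved, delta, N)

N = (1 << 1277) - 1                       # M1277: composite, no factor known
x = pow(3, stage1_exponent(10_000), N)    # original run to B1 = 10,000
x = extend_stage1(x, 10_000, 50_000, N)   # extend to B1 = 50,000, no restart
print(math.gcd(x - 1, N) > 1)             # still no factor found -> False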

(Kriesel:) Mostly, though, we don't do P-1 bounds extensions because:
  • The code to do so does not exist in our available GIMPS production software.
  • The work type is not defined in the PrimeNet API.
  • P-1 extension assignments don't exist on the server web interface for manual work assignments.
  • P-1 is a lesser development priority right now.
  • P-1 is a smaller fraction of the work on an exponent than primality testing.
  • CUDAPm1 is still labeled alpha software and does not implement bounds extension.
  • Gpuowl P-1 is relatively new and does not implement bounds extension.
  • Often the user wanting to increase bounds is not the user who ran the previous bounds, and does not have access to the files from previous runs.
  • Some software may not even save those files after completing a run.
  • There's no need to extend if the bounds were both adequate on an earlier run.
  • It's more efficient to do P-1 once, with adequate bounds the first time.

Top of this reference thread: https://www.mersenneforum.org/showth...736#post510736
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-03-02 at 19:53 Reason: Add title, list of reasons for status quo

2020-06-19, 19:41   #17
kriesel

Why don't we do proofs and certificates instead of double checks and triple and higher?

Update:
We can and do. Everyone who can upgrade to PRP, GEC, and proof generation for first primality tests (prime95/mprime v30.3 or later; gpuowl ~v6.11-316 or later; Mlucas v20 coming at some point, meanwhile use v19.1 for PRP/GEC without proof generation) should do so as soon as possible, and stop performing LL first tests.


Original post:
Because until recently we didn't know it was possible to generate proofs of PRP tests for these huge Mersenne numbers at considerably less effort than a repeat PRP or LL test. The development of new code to do proofs and verifications, followed by widespread deployment of client applications that produce proofs and of server infrastructure to accept proofs and perform verifications, will take around a year or more to complete.
Gpuowl is closest to being ready to provide proofs. Prime95 and Mlucas haven't begun to get this added yet as of mid June 2020. Still to be done:
  • separate verifier code to write
  • server modification for storing new data types
  • manual result handling modification
  • extension of the Primenet API to accommodate it for prime95

Some threads regarding this recent development are

Announcement The Next Big Development for GIMPS
(Layperson's and informal discussion here)

Technical VDF (Verifiable Delay Function) and PRP
(Leave this one for the number theorists and crack programmers)

Technical background: Efficient Proth/PRP Test Proof Scheme
(Also a math/number-theory thread, let's leave this one for theorists too)

This is an exciting development. It eliminates almost all confirmation effort on future PRP tests, so it will (eventually) substantially increase testing throughput. It is a high priority for design and implementation right now; other possible gpuowl enhancements are likely to wait until this is at least ready for some final testing.
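For intuition only, here is the core halving identity of a Pietrzak-style verifiable delay function proof, of which the deployed GIMPS scheme is a non-interactive adaptation; details of the actual scheme differ.
Code:
% One halving round of a Pietrzak-style proof that y = x^(2^T) (mod N).
% The prover supplies the midpoint mu; r is a random challenge (derived by
% hashing in the non-interactive setting).
\[
  \text{Claim: } y \equiv x^{2^{T}} \pmod{N}, \qquad
  \text{prover sends } \mu = x^{2^{T/2}} .
\]
\[
  \text{Reduced claim: } (x^{r}\mu)^{2^{T/2}} \equiv \mu^{r} y \pmod{N},
  \text{ which holds for honest } \mu, y \text{ since }
  (x^{r}\mu)^{2^{T/2}} = x^{r\,2^{T/2}}\, x^{2^{T}} = \mu^{r} y .
\]
% Each round halves T, so about log2(T) rounds reduce verification to a few
% exponentiations, instead of repeating all T squarings of the original test.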


Top of this reference thread: https://www.mersenneforum.org/showth...736#post510736
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-02-16 at 16:41

2020-06-27, 14:49   #18
kriesel

Why don't we run gpu P-1 factoring's gcds on the gpus?

The software doesn't exist.

Currently CUDAPm1 stalls the gpu it runs on for the duration of a stage 1 or stage 2 gcd, which runs on one core of the system cpu.
Earlier versions of gpuowl that performed P-1 also stalled the gpu while running a P-1 stage's gcd on a cpu core. At some point, Mihai reprogrammed it so that a separate thread runs the gcd on one cpu core while the gpu speculatively begins stage 2 of the P-1 factoring in parallel with the stage 1 gcd, or the next worktodo assignment in parallel with the stage 2 gcd when one is available.
In all cases, these gcds are performed by the GMP library.
(About 98% of the time, a P-1 factoring stage won't find a factor, so continuing is a good bet, and preferable to leaving the gpu idle during the gcd computation.)
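In outline, the overlap looks like the following sketch, reduced to generic Python threading. Gpuowl's actual code is C++/OpenCL; the names here are illustrative, not its interfaces.
Code:
import math, threading

# Overlap a P-1 stage gcd with further GPU work: the gcd runs in a thread on
# one CPU core while the GPU speculatively begins the next piece of work.
# start_next / wait_next stand in for launching and draining real GPU kernels.
def overlapped_gcd(x_stage1, N, start_next, wait_next):
    result = {}
    worker = threading.Thread(
        target=lambda: result.update(factor=math.gcd(x_stage1 - 1, N)))
    worker.start()      # expensive GMP-style gcd now running on a CPU core
    start_next()        # GPU speculatively begins stage 2 (or the next task)
    worker.join()       # ~98% of the time the gcd finds nothing...
    if result["factor"] not in (1, N):
        return result["factor"]   # ...otherwise the speculative work is moot
    wait_next()         # no factor: the head start on the next work was free
    return None

# usage: overlapped_gcd(x, N, start_stage2, wait_stage2) with real GPU
# launch/drain callbacks; dummies like (lambda: None) work for a dry run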

It was more efficient use of programmer time to implement it that way quickly, using an existing library routine.

On a fast cpu the impact is small. On slow cpus hosting fast gpus it is not.

Borrowing a cpu core for the gcd has the undesirable effect of stopping a worker in mprime or prime95 for the duration, and may also slow mlucas, unless hyperthreading is available and effective.

To my knowledge, no one has yet written a gpu-based gcd routine for GIMPS-size inputs.
For gpu use for gcd in other contexts see http://www.cs.hiroshima-u.ac.jp/cs/_...apdcm15gcd.pdf (RSA) and https://domino.mpi-inf.mpg.de/intran...FILE/paper.pdf (polynomials).
If one were written for the large inputs of current and future GIMPS work, it could still be difficult to share between CUDAPm1 and gpuowl, since gpuowl is OpenCL-based while CUDAPm1 is CUDA-based, and the available data structures probably differ significantly.


Top of this reference thread: https://www.mersenneforum.org/showth...736#post510736
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-03-02 at 19:55

2020-12-16, 17:25   #19
kriesel

Why don't we use 2 instead of 3 as the base for PRP or P-1 computations?

Mersenne numbers M_p = 2^p - 1 with p prime are all base-2 Fermat pseudoprimes (or actually prime): 2 has order p modulo M_p, and p divides M_p - 1, so a base-2 Fermat PRP test reports every one of them as probably prime, whether actually prime or composite; base-2 P-1 is similarly degenerate and yields only a trivial result. Using 3 as the base costs no more computing time and provides useful information; using 2 as the base provides none. That's a summary of my understanding of this thread as it relates to base choice.
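A minimal demonstration (Python), using the composite M11 = 2047 = 23 × 89 and the prime M13 = 8191:
Code:
# Every M_p = 2^p - 1 with p prime passes the base-2 Fermat test:
# 2^p ≡ 1 (mod M_p), so 2 has order p mod M_p, and p divides
# M_p - 1 = 2^p - 2 (Fermat's little theorem), hence 2^(M_p - 1) ≡ 1.
# Base 3 has no such degeneracy, so it distinguishes prime from composite.
for p in (11, 13):               # M11 composite (23 * 89); M13 prime
    N = (1 << p) - 1
    print(p,
          pow(2, N - 1, N) == 1,   # base 2: always "probably prime"
          pow(3, N - 1, N) == 1)   # base 3: the correct verdict
# 11 True False
# 13 True True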


Top of this reference thread: https://www.mersenneforum.org/showth...736#post510736
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-12-17 at 16:41