2023-01-07, 01:10  #34  
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
17651_{8} Posts 
Save files would be a requirement for significant end-user adoption, provided other concerns are addressed, IMO.
So too would be input from a worktodo file containing a queue of work, and output to a results file, plain text in both cases. For examples from other programs, see https://www.mersenneforum.org/showpo...8&postcount=22 and https://www.mersenneforum.org/showpo...9&postcount=33.

A few comparisons, before leaving the subject of this thread (no source or documentation for any of the three):
GP_LLT 209 KB (not far from the average size of a virus)
proof.exe 149 KB (ditto)
MPOC.exe 4 MB, highly compressible to a 22 KB zip

Some old friends, and their exe sizes: Code:
mfaktc 1.5   1.8 MB            source available
mfakto       1.6 MB            "
CLLucas      293 KB            source available
CUDALucas    350 KB - 1200 KB  "
CUDAPm1      530 KB            "
gpuowl       1.8 MB            "
Mlucas       30 MB             distributed as source only
Prime95      40 MB             source available, with a small exception related to security strings

Windows thread creation is variously stated as ~40 microseconds each, linux ~10, and the value of keeping a thread around for a long time is discussed at https://stackoverflow.com/questions/...atingathread

8 Mi threads/iteration x 40 usec/thread / 64 cores ~ 5.24 seconds estimated elapsed thread-creation overhead per iteration for GP_LLT @ ~OBD. Probably longer, because Xeon Phi cores are clocked slow. That is ~552 years for thread-creation overhead alone! Quote:
Ok, I think that about does it. It's been fun.

Last fiddled with by kriesel on 2023-01-07 at 01:42 
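As a sanity check of the thread-creation arithmetic above, here is a short Python sketch. The 8 Mi threads/iteration, 40 us/thread, 64-core, and ~1-billion-digit (OBD) figures are the post's own assumptions, not measured values.

```python
# Back-of-envelope check of the thread-creation overhead estimate above.
# All figures are the post's assumptions, not measurements.
import math

THREADS_PER_ITER = 8 * 1024**2      # 8 Mi threads per iteration
CREATE_COST_S    = 40e-6            # ~40 microseconds per thread (Windows)
CORES            = 64               # cores creating threads in parallel

per_iter = THREADS_PER_ITER * CREATE_COST_S / CORES
print(f"per-iteration overhead: {per_iter:.2f} s")        # ~5.24 s

# An OBD Mersenne number has exponent p ~ 1e9 / log10(2) ~ 3.32e9,
# and an LL/PRP test needs roughly p iterations.
obd_iters = 10**9 / math.log10(2)
years = per_iter * obd_iters / (365.25 * 24 * 3600)
print(f"total thread-creation overhead: {years:.0f} years")   # ~552 years
```

The per-iteration figure alone dwarfs the useful work, which is why long-lived worker threads (or a thread pool) are the standard design.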

2023-01-09, 11:36  #35 
"Jacob"
Sep 2006
Brussels, Belgium
1967_{10} Posts 
Took a snapshot of a virtual machine: CPU i9-10920X downclocked to 2200 MHz, 2666 MHz DDR4 memory.
Started it and disabled the network, then started a test of M62818631 with GP_LLT 1.0.0.5.

Factoring is extremely slow: it uses one core only. Going from k=1 to k=24000 (2*24000*62818631 is less than 42 bits!) took more than 20 minutes. Prime95 goes from k=1 to k=140023 (26- to 42-bit factors) in no time that I could measure, let's say 1 second. For factoring, the program is more than a thousand times slower than Prime95 (which is CPU based; graphics cards bring trial factoring speed to another height).

Then I started the primality test of that number: the ETA was somewhere in 2028, and iteration times were on the order of 2.7 seconds (compared with 0.0026 seconds for Prime95). Again a thousand times slower than Prime95.

After some 120 iterations the file size was already 8 MB; at that rate the whole run would grow the file to about 4 TB. The output files are not readable as such. The Proof program asks to choose a 0-length rdl file (but probably uses the snp file). The MPC program measures something and gives the output as percentages.

The conclusion is "Bravo!" for the programming, but there is some work to do before this is useful for current GIMPS work. Reverted the virtual machine to the situation before the snapshot.

Last fiddled with by S485122 on 2023-01-09 at 11:55 Reason: ETA kept jumping around, used iteration timings instead. 
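For reference, the iteration being timed here is just s -> s^2 - 2 (mod M_p). A minimal Python sketch of the full Lucas-Lehmer test (illustrative only, not GP_LLT's code; real programs replace the big-integer squaring with FFT-based methods):

```python
# Minimal Lucas-Lehmer test: the arithmetic each timed iteration performs.
# Not GP_LLT's actual code; production programs use FFT-based squaring.
def lucas_lehmer(p: int) -> bool:
    """Return True iff M_p = 2^p - 1 is prime, for an odd prime p."""
    m = (1 << p) - 1
    s = 4                            # the seed S == 4 mentioned in the thread
    for _ in range(p - 2):           # p - 2 squarings: s -> s^2 - 2 (mod M_p)
        s = (s * s - 2) % m
    return s == 0

# Small sanity checks: M7 = 127 is prime, M11 = 2047 = 23 * 89 is not.
print(lucas_lehmer(7), lucas_lehmer(11))   # True False
```

At p = 62818631 that loop runs ~62.8 million times, which is why per-iteration cost (2.7 s vs. 0.0026 s) dominates the ETA.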
2023-01-09, 14:10  #36  
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1111110101001_{2} Posts 
That low GP_LLT version # is funny; https://mersenneforum.org/showpost.p...27&postcount=8 indicates v2.2 on the zip file.

Somewhere he uses the term "divide" in describing its trial factoring. Perhaps it is so slow from doing long division rather than modpow.

FYI, for benchmarking purposes, set TimeStamp=2 in prime95's prime.txt for 1-second worker-window timing resolution. One could run several bits deeper in prime95 TF to get some semblance of timing resolution.

I wonder whether it finds low factors within its theoretical & practical reach; https://www.mersenne.ca/exponent/10000103 has a small factor. Or how it handles multiple small factors; https://www.mersenne.ca/exponent/10000303 has 3 small factors.

Did you happen to extract any resultant res64/iteration-count pairs from any file? It would be interesting to see whether it's at least producing correct res64 values, especially after iteration #25 or so (exponent dependent), when the mod 2^p - 1 begins to have effect. For p=10000019, prime95 yields FA16B251080501DC for the 100th-iteration res64 (which it numbers 102) on a seed-4 LL start. Other apps should match. (It's not like PRP, where there are many types, and prime95's interim residues don't match Mlucas or gpuowl for type 1 interim residues, but do match at the final output.)

You mention seeing file growth to 8 MB in 120 iterations on ~62M. What's growing, and did it appear to be linear with iteration count? (Prime95 does some file space preallocation which generates space growth early that is not linear with iteration count.)

Did you see any evidence of the ability to stop and resume a primality test from a save file? User-set save intervals?

"Some work to do" indeed, to raise it from academic exercise to a useful computation rate. A day well spent in a good university math library or on this forum, before beginning to code, could have helped greatly in spending the programming effort to better effect. I hope the OP found the journey educational. 
Maybe after a break he'll try again with a more efficient choice of algorithms and design. I'm putting together some interim and final residues for some exponents, spaced about 10:1 in exponent and iteration count, both LL and PRP type 1, from multiple applications. The final residues will be a subset due to run time or other limitations on available software and hardware for even the top 3 performing GIMPS primality test programs. It will likely become a developer's corner reference thread post. 
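The modpow approach to trial factoring mentioned above can be sketched in a few lines of Python (illustrative only, not any of the discussed programs' code). Candidate factors of M_p have the form q = 2kp + 1 with q = 1 or 7 (mod 8), and q divides M_p exactly when 2^p = 1 (mod q):

```python
# Trial factoring via modular exponentiation rather than long division.
# One pow(2, p, q) per candidate decides divisibility exactly.
def tf_small_factors(p: int, k_max: int) -> list[int]:
    factors = []
    for k in range(1, k_max + 1):
        q = 2 * k * p + 1
        if q % 8 in (1, 7) and pow(2, p, q) == 1:
            factors.append(q)
    return factors

# Known example: M29 = 536870911 = 233 * 1103 * 2089 (k = 4, 19, 36).
print(tf_small_factors(29, 40))   # [233, 1103, 2089]
```

Real TF programs add sieving of the k values and run the modpows in parallel, but the per-candidate cost is one modpow, not a multi-million-bit division.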

2023-01-09, 15:20  #37 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
17651_{8} Posts 

2023-01-09, 18:02  #38 
"Jacob"
Sep 2006
Brussels, Belgium
7×281 Posts 

2023-01-10, 00:18  #39 
Jan 2023
Alberta, Canada
12_{10} Posts 
Merci
Thanks very much for the feedback, Jacob. I appreciate it. Truth be told, though, it's a smidge faster than I thought it would be.

You're right that the trial factoring sucks. I just threw it in there to prevent someone from doing an entire LLT for no reason (i.e. when there is a discoverable small factor).

Also, the "whole run" of M62818631 would be 960 MB. The "snp" file (short for "snapshot") is just the last successful iteration (in full), plus some other metrics, for restart purposes. The "rdl" file (short for ResiDue List) will contain exactly (P-1) 16-byte records (960 MB) when the LLT is complete: one for each iteration, numbered from zero (S == 4).

So on a slowed-down i9, your ETA was 2028; on mine, a battered i5 with only four cores, it was Feb 2040. That sounds about right. Still curious as ever, though, to see what it could do on a 128-core system. It was built to fully exploit multiple cores, so I've been looking at those systems. Surprisingly affordable they are now.

Also correct when you point out that the rdl file can sometimes be zero length. Until the candidate is proven prime, the rdl file only contains "blocks" of iterations, a "block" being 32,768 iterations. If you've performed fewer than a block, the remainder is stored in the snp file. It was just efficient to do it that way. More than you wanted to know, I'm sure. And thanks for the "Bravo".

Do you think I should replace the grammar-school squaring with a parallelized Karatsuba? Karatsuba gets kind of klunky, and expensive (bytes), when you parallelize it. 
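A quick arithmetic check of the file sizes described in this post, using only figures quoted in the thread (the 4 TB estimate earlier was an extrapolation from early file growth; the layout described here gives the smaller total):

```python
# snp/rdl layout per the post: one 16-byte residue record per iteration,
# (P - 1) records total, flushed to rdl in blocks of 32,768 iterations,
# with any partial block held in the snp file.
P           = 62818631
REC_BYTES   = 16
BLOCK_ITERS = 32768

rdl_total = (P - 1) * REC_BYTES
print(f"complete rdl file: {rdl_total / 2**20:.1f} MiB")   # ~958.5 MiB ("960 Meg")

done        = 120                        # iterations completed in Jacob's test
full_blocks = done // BLOCK_ITERS        # whole blocks already written to rdl
in_snp      = done % BLOCK_ITERS         # iterations still held in the snp file
print(full_blocks, in_snp)               # 0 120 -> rdl file still zero length
```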
2023-01-10, 00:47  #40 
Undefined
"The unspeakable one"
Jun 2006
My evil lair
5·1,373 Posts 

2023-01-10, 03:49  #41 
6809 > 6502
"""""""""""""""""""
Aug 2003
10101110001001_{2} Posts 
Maybe study up on FFTs and try to implement one. If you want to speed your program up, that is a much better choice. It will take time to learn and implement, but if you make a serious attempt at it, you will get plenty of help with your code. This assumes that you share your code when you ask for help.
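To make "implement FFT" concrete, here is a toy Python sketch that squares an integer by convolving its base-256 digits with a complex FFT, then carrying. This is only the underlying idea; the real GIMPS programs use an irrational-base discrete weighted transform working directly mod 2^p - 1.

```python
# Toy FFT-based integer squaring: polynomial multiplication via the
# convolution theorem. Illustrative only, not production GIMPS code.
import cmath

def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT (inverse leaves out the 1/n)."""
    n = len(a)
    if n == 1:
        return a
    even = fft(a[0::2], invert)
    odd  = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def fft_square(x: int) -> int:
    digits = []                       # base-256 digits, least significant first
    t = x
    while t:
        digits.append(t & 0xFF)
        t >>= 8
    n = 1
    while n < 2 * len(digits):        # room for the product's coefficients
        n *= 2
    f = fft([complex(d) for d in digits] + [0j] * (n - len(digits)))
    g = fft([v * v for v in f], invert=True)      # pointwise square
    result, carry = 0, 0
    for i, v in enumerate(g):
        coeff = round(v.real / n) + carry          # un-normalize and round
        carry, digit = divmod(coeff, 256)          # propagate carries
        result += digit << (8 * i)
    return result

x = 123456789123456789
print(fft_square(x) == x * x)   # True
```

Grammar-school squaring is O(n^2) in the digit count; the FFT route is O(n log n), which is the entire speed gap being discussed in this thread.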

2023-01-10, 06:20  #42 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5·1,621 Posts 
Learn from the masters. Multi-level FFTs; multiple sizes and shapes. Irrational-base discrete weighted transforms avoid the need for zero padding. No grammar school or Karatsuba to be found.
Download the software, read the sources, documentation, and reference info, and try the programs out.

gpuowl: 3-level FFT; two power-of-two levels for speed and one in the middle, up to 13-smooth, for flexibility in total size. Usually a 13-smooth length is not faster than the next higher 7-smooth fft length. From the help output, height:middle:tail alternatives for equal length: Code:
FFT 3.25M [ 5.11M - 63.93M]  256:13:512  512:13:256
FFT 3.50M [ 5.51M - 68.67M]  1K:7:256  256:14:512  256:7:1K  512:14:256  512:7:512

Mlucas: 3- to 5-level FFT typically, up to 31-smooth IIRC. No single level is as big as 1024. Most are power-of-two, which typically are the most efficient. It benchmarks the several alternate layouts per length and records what is best for the hardware present.

Learn non-power-of-two FFTs. They allow tailoring the overall length for a closer, more efficient fit to the task size.

Adopt type 1 PRP. It enables FAR SUPERIOR error detection, and correction by rewinding to the last known-good save file: a 100.000000% effective full-test check for PRP with the Gerbicz error check, vs. LL's limited ~50% detection rate for erroneous full residues. In practice, an LL first test alone is not trustworthy: LL tests are typically in error 2% of the time per test, so the cost is 1 first test + 1 DC + a 4% probability of a third test for verification, ~2.04 tests total. PRP also enables proof file generation, and a power-9 proof file allows verification for ~0.2% of the effort of a first test. PRP effort for test + verification is ~1.002 tests total; 2.04/1.002 ~ 2.036 times faster than LL + LL-DC + sometimes LL-TC etc. It's even more cost effective to ignore an existing LL first test and perform a PRP with proof generation on the same exponent, saving 1.02 - 1.002 = 0.018 test equivalent.

Last fiddled with by kriesel on 2023-01-10 at 06:24 
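The LL-vs-PRP cost arithmetic above, spelled out in a few lines (using the post's assumed figures of a ~2% LL error rate per test and a ~0.2%-of-a-test proof verification):

```python
# Cost comparison in "test equivalents", per the figures in this post.
p_err = 0.02                        # assumed LL error rate per test

# LL path: first test + double-check + a third test when the two disagree.
ll_cost = 1 + 1 + (2 * p_err) * 1   # ~2.04 test equivalents

# PRP path: one Gerbicz-checked test + cheap proof verification.
prp_cost = 1 + 0.002                # ~1.002 test equivalents

print(f"LL/PRP cost ratio: {ll_cost / prp_cost:.3f}")   # ~2.036
```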
2023-01-10, 23:04  #43 
Jan 2023
Alberta, Canada
2^{2}·3 Posts 
PRP
"PRP" stands for "Probable Prime", right?
Will the EFF accept as a "discovery" that some number is "probably prime"? I believe the answer is no.

You are right that I don't know much about multi-level FFTs. But they're sequential algorithms, right? Or has someone parallelized them? Dunno. I just think that in an increasingly parallel-processed world, you have to rethink the sequential algorithms, especially recursive ones, towards a more uniform parallel computational framework. Because Moore's Law is waning, it's being replaced by parallel processors/processing, and the software has to change to keep up. End of diatribe.

BTW, Jacob was also right about the version of GP_LLT. Microsoft automatically puts a "version" resource in its compiled executables, which I didn't maintain; it defaults to whatever he said it was. I could replace it with a new one, but then I'd have to suffer another four pages of accusations and name-calling. Nah. Pass.

As for the source code, I have no idea what I might do with it. Maybe just bite the bullet, buy the damn supercomputer, and win both remaining EFF awards with it. Except the hardware isn't quite there yet. Or I might sell it, lock, stock, and barrel. Dunno. Or I might publish. Hard to say. 
2023-01-10, 23:19  #44 
6809 > 6502
"""""""""""""""""""
Aug 2003
3·5·743 Posts 
But with the error checking and the ability to verify a PRP run (which eliminates the need for a DC), PRP is now the best tool to scan exponents for primality. If a number passes, it will then be confirmed on several different hardware architectures with several different programs doing the LL test. The PRP test, as it is now, also allows us to test and verify more numbers more quickly.
Some processes that are sequential can also be parallelized. 
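The Gerbicz error check mentioned above can be sketched at toy scale (hypothetical Python, not any GIMPS program's implementation). For the PRP sequence x_{n+1} = x_n^2 (mod M_p) with checkpoints every B squarings, the running product d of checkpoint values satisfies d_new = d_old^(2^B) * x_0 (mod M_p), so one extra exponentiation verifies a whole block of B squarings:

```python
# PRP base-3 test with a toy Gerbicz block check.
def prp_with_gerbicz(p: int, B: int = 16) -> int:
    m = (1 << p) - 1
    x0 = x = 3
    d = x0                            # running product of checkpoints (x_0)
    for i in range(1, p + 1):         # p squarings: x -> x^2 (mod M_p)
        x = x * x % m
        if i % B == 0:
            # The block identity: d * x_{iB} must equal d^(2^B) * x_0.
            expected = pow(d, 1 << B, m) * x0 % m
            d = d * x % m
            assert d == expected, "hardware/software error detected"
    return x

# If M_p is prime, then 3^(2^p) = 9 (mod M_p) by Fermat's little theorem.
print(prp_with_gerbicz(31) == 9)   # True: M31 = 2147483647 is prime
```

The check costs one B-bit exponentiation per B squarings, a negligible overhead, yet catches an erroneous squaring anywhere in the block, which is the basis of the near-100% error detection claimed earlier in the thread.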