![]() |
![]() |
#34 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
176518 Posts |
![]()
Save files would be a requirement for significant end user adoption providing other concerns are addressed, IMO.
So too would be input from a worktodo file containing a queue of work, and output to a results file, plain text in both cases. For examples for other programs, see https://www.mersenneforum.org/showpo...8&postcount=22 and https://www.mersenneforum.org/showpo...9&postcount=33. A few comparisons, before leaving the subject: subjects of this thread: GP_LLT 209KB (not far from the average size of a virus) proof.exe 149 KB (ditto) MPOC.exe 4MB highly compressible 22K zip size (no source or documentation for any of them) Some old friends, and their exe sizes: Code:
mfaktc 1.5-1.8MB source available mfakto 1.6MB " CLLucas 293KB source available CUDALucas 350KB-1200KB " CUDAPm1 530KB " gpuowl 1.8MB " Mlucas 30MB distributed as source only Prime95 40MB source available with a small exception related to security strings Windows thread creation is variously stated as ~40 microseconds each; linux 10, and the value of keeping a thread around for a long time is discussed at https://stackoverflow.com/questions/...ating-a-thread 8Mi threads / iteration x 40usec/thread / 64 cores ~ 5.24 seconds estimated elapsed thread creation overhead per iteration for GP_LLT @ ~OBD. Probably longer because Xeon Phi cores are clocked slow. ~552. years for thread creation overhead! Quote:
Ok I think that about does it. It's been fun. Last fiddled with by kriesel on 2023-01-07 at 01:42 |
|
![]() |
![]() |
![]() |
#35 |
"Jacob"
Sep 2006
Brussels, Belgium
196710 Posts |
![]()
Took a snapshot of a virtual machine. CPU i9-10920X down-clocked to 2200 MHz, 2666 MHz DDR4 memory.
Started it, disabled network. Started test on M62818631 with GP_LLT 1.0.0.5. Factoring is extremely slow : it uses one core only. Going from k=1 to k=24000 (2*24000*62818631 is less than 42 bits !) took more than 20 minutes. Prime95 goes from k=1 to k=140023 (26 to 42 bits factors) in no time that I could measure, lets say 1 second. For factoring the program is more than a thousand times slower than Prime95 (which is CPU based, graphic cards bring trial factoring speed to another height.) Then I started the test of that number : the ETA was somewhere in 2028, iteration times where on the order of 2,7 seconds (compared with 0,0026 seconds for Prime95.) Again a thousand times slower than Prime95. After some 120 iterations the files size was already 8 MB. The whole run would increase the file to about 4 TB. The output files are not readable as such. The Proof program asks to choose a o0 length rdl file (but probably uses the nsp file). The MPC program measures something and gives the output as percentages. The conclusion is "Bravo !" for programming, but there is some work to do to start being useful for current GIMPS work. Reverted the virtual machine to the situation before the snapshot. Last fiddled with by S485122 on 2023-01-09 at 11:55 Reason: ETA kept jumping around, used it timings instead. |
![]() |
![]() |
![]() |
#36 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
11111101010012 Posts |
![]() Quote:
That low GP_LLT version # is funny; https://mersenneforum.org/showpost.p...27&postcount=8 indicates v2.2 on the zip file. Somewhere he uses the term divide in describing its trial factoring. Perhaps it is so slow from doing long division rather than modpow. FYI, for benchmarking purposes, set TimeStamp=2 in prime95's prime.txt for 1 second worker window timing resolution. One could run several bits deeper in prime95 TF to get some semblance of timing resolution. I wonder if it finds low factors within its theoretical & practical reach. https://www.mersenne.ca/exponent/10000103 has a small factor. Or how it handles multiple small factors. https://www.mersenne.ca/exponent/10000303 has 3 small factors. Did you happen to extract any resultant res64/iteration-count pairs from any file? It would be interesting to see whether it's at least producing correct res64 values. Especially after iteration #25 or so (exponent dependent), when the mod 2p-1 begins to have effect. For p=10000019, prime95 yields FA16B251080501DC for the 100th iteration res64 (which it numbers 102) on a seed 4 LL start. Other apps should match. (It's not like PRP, where there are many types, and prime95's interim residues don't match mlucas or gpuowl for type 1 interim residues, but do match at the final output.) You mention seeing file growth to 8MB in 120 iterations on ~62M. What's growing, and did it appear to be linear with iteration count? (Prime95 does some file space allocation or preallocation which generates space growth early that is not linear with iteration count.) Did you see any evidence of the ability to stop and resume a primality test from a save file? User set save intervals? "Some work to do" indeed for raising it from academic exercise to useful computation rate. A day well spent in a good university math library or on this forum, before beginning to code, could have helped greatly in spending the programming effort to better effect. I hope the OP found the journey educational. Maybe after a break he'll try again with a more efficient choice of algorithms and design. I'm putting together some interim and final residues for some exponents, spaced about 10:1 in exponent and iteration count, both LL and PRP type 1, from multiple applications. The final residues will be a subset due to run time or other limitations on available software and hardware for even the top 3 performing GIMPS primality test programs. It will likely become a developer's corner reference thread post. |
|
![]() |
![]() |
![]() |
#37 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
176518 Posts |
![]() |
![]() |
![]() |
![]() |
#38 |
"Jacob"
Sep 2006
Brussels, Belgium
7×281 Posts |
![]() |
![]() |
![]() |
![]() |
#39 |
Jan 2023
Alberta, Canada
1210 Posts |
![]()
Thanks very much for the feedback, Jacob. I appreciate it. Truth be told though, it's a smidge faster than I thought it would be..
You're right that the Trial Factoring sucks. I just threw it in there to prevent someone from doing an entire LLT for no reason (i.e. a discoverable small factor). Also, the "whole run" of M62818631 would be 960 Meg. The "snp" file (short for "snapshot") is just the last successful iteration (in full), plus some other metrics, for restart purposes. The "rdl" file (short for ResiDue List), will contain exactly (P-1) 16 byte records (960 Meg) when the LLT is complete; one for each iteration, numbered from zero (S == 4). So on a slowed down I9, your ETA was 2028; on mine, a battered I5 with only four cores, it was Feb, 2040. That sounds about right.. Still curious as ever though, to see what it could do on a 128-core system. It was built to fully exploit multiple cores, so I've been looking at those systems. Surprisingly affordable they are now. Also correct when you point out that the rdl file can sometimes be zero length. Until the candidate is proven prime, the rdl file only contains "blocks" of iterations, a "block" being 32,768 iterations. If you've performed less than a block, the remainder is stored in the snp file. It was just efficient to do it that way. More than you wanted to know, I'm sure. And thanks for the Bravo. Do you think I should replace the grammar school squaring with a parallelized Karatsuba? Karatsuba gets kind of klunky, and expensive (bytes), when you parallelize it. |
![]() |
![]() |
![]() |
#40 |
Undefined
"The unspeakable one"
Jun 2006
My evil lair
5·1,373 Posts |
![]() |
![]() |
![]() |
![]() |
#41 |
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
101011100010012 Posts |
![]()
Maybe study up on FFT and try to implement that. If you want to speed your program up, that is a much better choice. It will take to to learn and implement. But, if you make a serious attempt at it, you will get plenty of help with your code. This assumes that you share your code when you ask for help.
|
![]() |
![]() |
![]() |
#42 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5·1,621 Posts |
![]()
Learn from the masters. Multilevel ffts; multiple sizes and shapes. Irrational base discrete weighted transforms to avoid need for zero padding. No grammar school or karatsuba to be found.
Download the software and read sources and documentation, and reference info. Try the programs out. gpuowl: 3 level fft; 2 power-of-two for speed and one in the middle up to 13-smooth for flexibility on total size. Usually 13-smooth is not faster than the next higher 7-smooth fft length. from the help output, height:middle:tail alternatives for equal length, Code:
FFT 3.25M [ 5.11M - 63.93M] 256:13:512 512:13:256 FFT 3.50M [ 5.51M - 68.67M] 1K:7:256 256:14:512 256:7:1K 512:14:256 512:7:512 mlucas: 3 to 5 level fft typically. Up to 31-smooth IIRC. No single level uses as big as 1024. Most are power-of-two, which typically are the most efficient. It includes benchmarking the several alternate layouts per length and records what is best for the hardware present. Learn non-power-of-two ffts. These allow tailoring the overall length for a closer more efficient fit to the task size. Adopt type 1 PRP. It enables FAR SUPERIOR error detection and correction by rewind to last known good save file. 100.000000% good full test for PRP with Gerbicz error check, vs LL's limited 50% detection rate of erroneous full residues. In practice, an LL first test is useless. These are typically in error 2% of the time per test. So cost is 1 first test + 1 DC + 4% probability of a third test for verification, 2.04 total. PRP also enables proof file generation, and a power 9 proof file allows verification for ~0.2% of the effort of a first test. PRP effort for test + verification ~1.002 tests, total; 2.04/1.002 ~ 2.036 times faster than LL + LLDC + sometimes LLTC etc. It's more cost effective to ignore an LL first test and perform a PRP with proof generation on the same exponent, saving 1.02-1.002 = .018 test equivalent. Last fiddled with by kriesel on 2023-01-10 at 06:24 |
![]() |
![]() |
![]() |
#43 |
Jan 2023
Alberta, Canada
22·3 Posts |
![]()
"PRP" stands for "Probable Prime", right?
Will the EFF accept as a "discovery" that some number is "probably prime"? I believe the answer is no. You are right that I don't know much about "Multilevel ffts". But they're sequential algorithms, right? Or has someone parallelized them? Dunno. I just think that in an increasingly parallel-processed world, you have to rethink the sequential algorithms, especially recursive ones, towards a more uniform parallel computational framework. Because Moore's Law is waning, and it's being replaced by parallel processors/processing, and the software has to change to keep up. End of diatribe. BTW, Jacob was also right about the version of GP_LLT. Microsoft automatically puts a "version" resource in its compiled executables, which I didn't maintain. It defaults to whatever he said it was. I could replace it with a new one, but then I'd have to suffer another four pages of accusations and name-calling. Nah. Pass. As for source code, I have no idea what I might do with it. Maybe just bite the bullet, buy the damn supercomputer, and win both remaining EFF awards with it. Except the hardware isn't quite there yet.. Or I might sell it, lock, stock, and barrel. Dunno. Or I might publish. Hard to say. |
![]() |
![]() |
![]() |
#44 |
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
3·5·743 Posts |
![]()
But, with the error checking and the ability to verify the PRP run (which eliminates the need for a DC), PRP is now the best tool to scan exponents for primality. If a number passes, then it will be confirmed by several different hardware architectures with several different programs, doing the LL test. The PRP test, as it is now, also us to test and verify more numbers quicker.
Some processes that are sequential can also be parallelized. |
![]() |
![]() |
![]() |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Any way to pipe GMP mpz_t into PFGW? | hansl | Software | 3 | 2019-04-09 19:44 |
Is there any program... | mart_r | Software | 2 | 2009-11-15 20:06 |
So you think you can program | rogue | Lounge | 5 | 2009-10-02 15:02 |
Program | Primeinator | Information & Answers | 5 | 2009-07-16 21:42 |
which program? | drakkar67 | Prime Sierpinski Project | 14 | 2005-11-29 06:25 |