mersenneforum.org FastECPP software and 50000 digit primality proof (reposted from NMBRTHRY)

2022-05-24, 23:57   #122
Batalov

"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

23200_8 Posts

That's the same as what Andreas wrote: paths are different on various architectures (likely gated only by the number of threads/workers). I'd sent him the complete .cert1 and the corresponding log file so he could more or less restore the interesting state (I had a state where the process would reproducibly go for ~300 core-hours alone; the highest prime had, iirc, 8-digit size). And he was able to get to that state. If/when he started from the top, his process didn't get stalled (different path).

One path that I could suggest: Step 1, grok the structure of .cert1 (is it an array of arrays?), and then Step 2, truncate the .cert1 that is in some sort of a stalemate and run it for a while with a different (smaller?) number of workers; you might get a different path. Later, the successful bypass can be continued on the larger cluster. It is similar to what one could do with Primo, especially when it simply fails in Step 1 (you can play with the config file; various parameters affect exit criteria...).

Another path (if this is in step [0]) might be to advance Andreas' own idea where he relaxed the acceptance criteria; relax them maybe even to a totally silly degree, so that the proof advances to step [1] with even a 2- or 4-bit gain; it doesn't matter, it will be out of the trap. Then use the standard binary. I think this will essentially imitate a backtrack.
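For the truncation idea in Step 2, here is a hypothetical helper, assuming (purely for illustration; the real .cert1 layout may well differ, as the post itself asks) that the certificate is a series of newline-terminated records:

```c
#include <string.h>

/* Hypothetical sketch for the "truncate the .cert1" idea: treat the
 * certificate as newline-terminated records (an ASSUMPTION, not the
 * documented .cert1 format), drop the last k records in place, and
 * return how many records remain. */
static int drop_last_records(char *buf, int k)
{
    int n = 0;
    /* count newline-terminated records */
    for (char *p = buf; (p = strchr(p, '\n')) != NULL; p++)
        n++;
    if (k >= n) {
        buf[0] = '\0';
        return 0;
    }
    int keep = n - k;
    char *p = buf;
    for (int i = 0; i < keep; i++)
        p = strchr(p, '\n') + 1;  /* skip past one kept record */
    *p = '\0';                    /* cut off the trailing k records */
    return keep;
}
```

One would then rerun from the shortened certificate with a different worker count, as suggested above.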
2022-05-25, 16:50   #123
frmky

Jul 2003
So Cal

2^2·599 Posts

Increasing the list in cm_nt_next_prime() unfortunately didn't work. It ran for over a day (rather than stopping after a few hours), increasing its memory use until it was killed by the OOM killer.

Andreas confirmed that the path taken depends on the number of workers. When I manually backtracked, I restarted with the same number of workers, which explains why I got the same result. Yesterday I deleted a few steps at the end of cert1, then ran it overnight using the single-threaded version. It followed a new path, so I'm back on track. The next time this happens, I'll try backing up just a step or two and changing the number of workers. That said, it appears the code does need proper backtracking.
2022-05-26, 01:31   #124
Batalov

"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

2^7×7×11 Posts

Good! Just an idea borrowed from molecular modeling -- I also think some imitation of simulated annealing might help. Simply put, early in a step cycle do not accept small gains, but then allow them more and more permissively as a function of time. Then restart the clock at each step.
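A minimal sketch of the kind of time-dependent acceptance rule meant here; the function name, the linear schedule, and the 20-bit starting threshold are all illustrative assumptions, not anything from the CM code:

```c
#include <assert.h>

/* Illustrative acceptance rule in the spirit of simulated annealing:
 * early in a step, demand a large gain (in bits); as wall time within
 * the step grows, relax the requirement down to a single bit.
 * All names and the linear schedule are assumptions for this sketch. */
static int accept_gain(int gain_bits, double elapsed, double step_budget)
{
    const int strict = 20;  /* bits demanded at t = 0 (assumed value) */
    const int lax    = 1;   /* bits demanded at t >= step_budget      */
    double frac = elapsed / step_budget;
    if (frac > 1.0)
        frac = 1.0;
    /* linearly interpolate the threshold from strict down to lax */
    int threshold = strict - (int)((strict - lax) * frac);
    return gain_bits >= threshold;
}
```

"Restart the clock at each step" corresponds to passing `elapsed` measured from the start of the current step, not of the whole run.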
2022-05-26, 12:13   #125
jwaltos

Apr 2012
Oh oh.

111001101_2 Posts

I've read the last several pages of this thread with interest. I can only imagine some of the expletives as some of you were coding through. It's possible this is how the term "coded" (as in "++", medically speaking) originated. My impression is that this methodology is inextricably "black-boxish." Isn't there a better conceptual path to break trail on theoretically, or is there a theoretical impasse... rather, what are the impasses? Regarding the last post and simulated annealing, protein-folding models and tactical strategy vis-à-vis Go can also be considered for geometric visualization and logical progression, if germane to the line of questioning. As always, consideration of the complexity class(es) to which these inquiries belong is a usual starting point.

Last fiddled with by jwaltos on 2022-05-26 at 12:18 Reason: NP complete
2022-05-27, 18:35   #126
ryanp

Jun 2012
Boulder, CO

18E_16 Posts

Feature request: it would be nice to have some kind of -l logfile flag, which would print timestamped log messages to the given file (similar to msieve). Perhaps this could include somewhat more detail about what's happening than what is normally printed to stdout.
2022-05-28, 09:26   #127
xilman
Bamboozled!

"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

83·137 Posts

I am not sure whether anyone else has noticed this one and/or posted a workaround, so here it is in case it proves useful. When mpirun fires up a number of processes, it will kill them if the shell from which it starts them is itself terminated. This occurs, for instance, when you log out of an interactive session, and happens regardless of whether the mpirun is put in the background. By far the easiest way of avoiding this feature is to run
Code:
setsid mpirun -np ...
2022-05-28, 19:25   #128
Luminescence

Oct 2021
Germany

2^4·7 Posts

Quote:
 Originally Posted by paulunderwood
I have made a number of changes to gw_prp.c (attached):
- mpz strings are now base 16 for the system call to gw_prp
- added bound check for A in the calculation of the discriminant D
- used mpz instead of giants for loops and bit tests
- moved squareness test to after the Fermat PRP test

Important: Correspondingly, the hack to lib/nt.c is:
Code:
int cm_nt_is_prime (mpz_t a)
{
   char str[100000];

   if (mpz_sizeinbase (a, 2) < 26000) {
      return (mpz_probab_prime_p (a, 0) > 0);
   }
   strcpy (str, "/home/paul/Downloads/p95/gw_prp "); // note space in the string
   strcat (str, mpz_get_str (NULL, 16, a));
   return (system (str));
}
You have to set your own path(s) in the hack!!
I transplanted your code into nt.c, swapped
Code:
mpz_probab_prime_p (a, 0) > 0
out for
Code:
mpz_millerrabin (a, 0) > 0
and learned how to get it to compile with autotools. It did not like gwnum.a and told me I can't compile shared libraries with it.

ECPP-MPI is a bit faster (very roughly 10%), though I am not sure if it's caused by the transplant or static compiling... but it's definitely a much cleaner way of doing it, and the internal clock seems sane again.
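In case it helps readers follow the swap: GMP's mpz_probab_prime_p does some trial division before its probabilistic rounds, while mpz_millerrabin goes straight to Miller-Rabin. A self-contained 64-bit toy version of a single Miller-Rabin round (my own illustration, not GMP's implementation; relies on the common `__uint128_t` compiler extension):

```c
#include <stdint.h>

/* 64-bit modular multiplication via 128-bit intermediates */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n)
{
    return (uint64_t)((__uint128_t)a * b % n);
}

/* modular exponentiation by repeated squaring */
static uint64_t powmod(uint64_t b, uint64_t e, uint64_t n)
{
    uint64_t r = 1;
    b %= n;
    while (e) {
        if (e & 1)
            r = mulmod(r, b, n);
        b = mulmod(b, b, n);
        e >>= 1;
    }
    return r;
}

/* One Miller-Rabin round: write n-1 = d * 2^s with d odd, then check
 * whether base^d == 1 or base^(d*2^i) == n-1 for some 0 <= i < s. */
static int miller_rabin_round(uint64_t n, uint64_t base)
{
    if (n < 3 || (n & 1) == 0)
        return n == 2;
    uint64_t d = n - 1;
    int s = 0;
    while ((d & 1) == 0) {
        d >>= 1;
        s++;
    }
    uint64_t x = powmod(base, d, n);
    if (x == 1 || x == n - 1)
        return 1;           /* probable prime to this base */
    for (int i = 1; i < s; i++) {
        x = mulmod(x, x, n);
        if (x == n - 1)
            return 1;
    }
    return 0;               /* base is a witness: n is composite */
}
```

Note that 2047 = 23·89 passes a base-2 round (it is the smallest strong pseudoprime to base 2), which is why multiple bases, or a Fermat+Lucas combination as paulunderwood mentions below, are used in practice.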

2022-05-28, 21:24   #129
R. Gerbicz

"Robert Gerbicz"
Oct 2005
Hungary

11^2×13 Posts

For those who have big guns:

I believe this could be faster for those runs where the current code spends a lot of time on trial division compared to the total running time [big N value with a lot of cores]. OK, you/we could also lower the B value (and with that you'll sieve out fewer numbers). Here is another way: keep the same sieve limit, but reduce the memory a lot; with this we can avoid the cuts at m values for much longer -- in fact, in this code there is no such cut. It should be faster, in theory and on paper; could we see some comparison? A few steps are enough.

Put the 4 files into the lib folder and run "make install" (I have not included paul's trivial prp file).

PS: Furthermore, the rewritten primorial computation really is somewhat slower, but that is intentional, to also save memory. The new single-threaded "ecpp" code also works, but that is not that interesting. There could be a bug: in the last run I "lost" the communication at a 15k-digit number, but it was good at 1-2k-digit numbers; really strange. More later.
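For illustration, here is a generic sketch of the batch-sieve style of trial division the post alludes to, where memory scales with the batch length rather than with the sieve bound. This is textbook code under my own assumptions, not the attached files:

```c
#include <stdint.h>

/* naive primality check, adequate for generating small sieve primes */
static int is_small_prime(uint64_t p)
{
    if (p < 2) return 0;
    for (uint64_t q = 2; q * q <= p; q++)
        if (p % q == 0) return 0;
    return 1;
}

/* Strip all prime factors p <= bound from the consecutive integers
 * m0, m0+1, ..., m0+len-1, one sieve pass per prime. Memory use is
 * O(len) residuals, independent of the sieve bound, because primes
 * are generated on the fly instead of being stored. */
static void batch_remove_small_factors(uint64_t *rem, uint64_t m0,
                                       int len, uint64_t bound)
{
    for (int i = 0; i < len; i++)
        rem[i] = m0 + i;          /* unfactored part of each m */
    for (uint64_t p = 2; p <= bound; p++) {
        if (!is_small_prime(p))
            continue;
        /* first multiple of p at or above m0 */
        uint64_t start = ((m0 + p - 1) / p) * p;
        for (uint64_t m = start; m < m0 + (uint64_t)len; m += p)
            while (rem[m - m0] % p == 0)
                rem[m - m0] /= p;
    }
}
```

An entry left at 1 is bound-smooth; in fastECPP one would instead keep the remaining cofactor of each m for a primality test.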
Attached Files
 files.zip (31.6 KB, 28 views)

Last fiddled with by R. Gerbicz on 2022-05-28 at 21:25

2022-05-29, 01:49   #130
frmky

Jul 2003
So Cal

2^2×599 Posts

Quote:
 Originally Posted by R. Gerbicz but reducing the memory a lot
This is great! It solves the issue I was having getting it to run on the 32GB Arm system. Thanks!

2022-05-29, 03:57   #131
paulunderwood

Sep 2002
Database er0rr

3^2×467 Posts

Quote:
 Originally Posted by Luminescence
I transplanted your code into nt.c, swapped
Code:
mpz_probab_prime_p (a, 0) > 0
out for
Code:
mpz_millerrabin (a, 0) > 0
and learned how to get it to compile with autotools. It did not like gwnum.a and told me I can't compile shared libraries with it. ECPP-MPI is a bit faster (very roughly 10%), though I am not sure if it's caused by the transplant or static compiling... but it's definitely a much cleaner way of doing it and the internal clock seems sane again.
I am not sure about the chances of hitting upon a strong pseudoprime at these bit levels compared to failing a Fermat+Lucas test.

How did you overcome the gwnum.a problem?

What is "10% faster"? Compared to what?

2022-05-29, 07:50   #132
Luminescence

Oct 2021
Germany

2^4×7 Posts

Quote:
 Originally Posted by paulunderwood How did you overcome the the gwnum.a problem?
With the direct implementation, each step was very roughly 10% faster compared to calling the external executable. Sometimes more, sometimes less.

I got the entire thing to compile by adding the necessary headers to nt.c and putting the gwnum headers into the cm lib folder. Somehow gwthreads.h also ended up in there; I'm not sure if it's required. I also put them into /usr/local/include, though I am not sure that's actually necessary.

Then I grabbed the gwnum.a file, renamed it to libgwnum.a and put it in my /usr/local/lib folder.

Open a terminal in the top-level directory and run make distclean, automake, and then ./configure --enable_mpi --enable-shared=no. It should (hopefully) compile.

