mersenneforum.org A possible bug in LLR/PFGW while using GWNUM (no bug in P95)

2014-12-18, 04:32   #23
Batalov
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

10049₁₀ Posts

:-) Well, on non-AVX CPUs everything works fine. PRP, N-1, N+1, anything. In all three programs. The problem only rears its ugly head on AVX CPUs.

EDIT: Ah, yes, true, I see it. My bad. --> This line:

Code:
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2, SSE4.1, SSE4.2

confused me... Where is the AVX feature? I read up to that line, didn't see AVX there, and then it miraculously runs an AVX FFT!

Last fiddled with by Batalov on 2014-12-18 at 05:02
2014-12-18, 04:41   #24
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

3×11×13×19 Posts

Quote:
 Originally Posted by Batalov Well, on non-AVX CPUs, everything works fine. PRP, N-1, N+1, anything. In all three programs. The problem only rears its ugly head on AVX CPUs.

Quote:
 Special modular reduction using all-complex AVX FFT length 240K,
So an N+1 Lucas PRP test works in AVX. There is something about PFGW's N-1 test....

2014-12-18, 04:55   #25
Batalov
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

13·773 Posts

New idea. Instead of dumping residues with InterimResidues=1, I added the line Debug=1 to llr.ini. (In the code I saw that it writes residues for 50 bits and then goes on without printing.)

Surprise, surprise: Debug=1 obviously does some more stuff. All residues at bit 48 now match (with seven different FFT sizes), while before they never matched beyond bit 28. I see the tunnel at the end of this light (use google translate if you want, but basically it is a dyslexic joke).
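[Editorial sketch, not from the thread: the residue comparison Batalov describes can be mechanized by diffing the "interim residue" lines of two runs' logs. The sample lines below stand in for real LLR output (they are taken from the residues quoted later in the thread); on real runs one would grep the two log files instead of writing them by hand.]

```shell
#!/bin/sh
# Toy version of comparing interim residues from two runs: the first line
# that diff prints is the first bit at which the computations diverge.
cat > res_a.txt <<'EOF'
interim residue C6B7FE823B8F81B4 at bit 2975459
interim residue 9DCC1DE4BF21C1B1 at bit 2975460
interim residue 9DCC1DE4BF21C1B0 at bit 2975461
EOF
cat > res_b.txt <<'EOF'
interim residue C6B7FE823B8F81B4 at bit 2975459
interim residue 9DCC1DE4BF21C1B1 at bit 2975460
interim residue 8E973DD8F58B11B0 at bit 2975461
EOF
# Pipe through head so the pipeline exits 0 even when the files differ.
diff res_a.txt res_b.txt | head -4
```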
2014-12-18, 06:08   #26
rogue
"Mark"
Apr 2003
Between here and the

2·7²·71 Posts

David Broadhurst reported issues with the N+1 primality test back in March. Unfortunately I don't know enough about the N+1 test or about pfgw's implementation to debug it.
2014-12-18, 08:01   #27
Batalov
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

13·773 Posts

Here is a test that can be run without changing the latest llr64 binary. The results are not yet very good, but optimistic.

Code:
#!/bin/tcsh
foreach i (`seq 0 6`)
  mkdir $i
  nohup llr64 -W$i -d -oDebug=1 -oFermatBase=5 -oFFT_Increment=$i -q"1024*3^1877301+1" >& $i/out &
end

The output will show that these AVX FFT sizes will be used:

Code:
0/out:Using all-complex AVX FFT length 240K, Pass1=1280, Pass2=192, a = 5
1/out:Using all-complex AVX FFT length 256K, Pass1=256, Pass2=1K, a = 5
2/out:Using all-complex AVX FFT length 288K, Pass1=384, Pass2=768, a = 5
3/out:Using all-complex AVX FFT length 320K, Pass1=256, Pass2=1280, a = 5
4/out:Using zero-padded AVX FFT length 336K, Pass1=448, Pass2=768, a = 5
5/out:Using all-complex AVX FFT length 384K, Pass1=384, Pass2=1K, a = 5
6/out:Using all-complex AVX FFT length 400K, Pass1=1280, Pass2=320, a = 5

Amazingly, two of them, 3/ and 4/ (highlighted), will after all finish with "prime", while the rest will produce different RES64s. In parallel, a non-AVX process will return "prime".

Now I am rerunning 0/ and the non-AVX run and comparing residues every 1000 bits. After half the run they still match. Soon I will find the explosive iteration that sends the computation off track.

EDIT: The divergence between non-AVX (which ends in "P" with any FFT size) and the debugged AVX 240K run is in the last few hundred bits. I will start tomorrow from the interim save file at bit 2975000 (with ~600 bits to go) and will check each iteration (with full savefiles).

Last fiddled with by Batalov on 2014-12-18 at 11:29
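[Editorial sketch, not from the thread: once the seven FFT_Increment runs above finish, the verdicts can be collected in one pass. The sample "out" files written here are stand-ins for real llr64 output; on a real run only the final for-loop is needed.]

```shell
#!/bin/sh
# Collect the verdict line from each run directory of the tcsh loop above.
mkdir -p 0 3
echo '1024*3^1877301+1 is not prime.  RES64: 9B1788BFABF267B2.' > 0/out
echo '1024*3^1877301+1 is prime! (895704 decimal digits)'       > 3/out
for i in 0 1 2 3 4 5 6; do
  if [ -f "$i/out" ]; then
    # Print the directory number and whether that run said prime or not.
    printf '%s: %s\n' "$i" "$(grep -Eo 'is (not )?prime' "$i/out")"
  fi
done
```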
2014-12-18, 16:51   #28
paulunderwood
Sep 2002
Database er0rr

2·3³·83 Posts

The bug persists in PFGW 3.7.8. (Sorry to Serge if he has already pointed this out.)

Code:
./pfgw64 -t -i -V -q"1024*3^1877301+1"
PFGW Version 3.7.8.64BIT.20141125.x86_Dev [GWNUM 28.5]
CPU Information (From Woltman v26 library code)
Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
CPU speed: 3723.08 MHz, 4 cores
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2, SSE4.1, SSE4.2
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 8 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Primality testing 1024*3^1877301+1 [N-1, Brillhart-Lehmer-Selfridge]
Running N-1 test using base 5
Special modular reduction using all-complex FMA3 FFT length 256K, Pass1=128, Pass2=2K on 1024*3^1877301+1
1024*3^1877301+1 is composite (3498.3155s+0.0102s)

It does compute "F" quickly, however.

Last fiddled with by paulunderwood on 2014-12-18 at 16:52
2014-12-18, 19:35   #29
paulunderwood

Sep 2002
Database er0rr

2·3³·83 Posts

Quote:
 Originally Posted by paulunderwood ps. I am running your number through a GMP implementation of my algorithm with the GWNUM 27.11 output from pfgw64.
It says "Likely prime with a=3", which means, for jacobiSymbol(3^2-4,n)==-1, (x+2)^(n+1)==2*3+5 (mod n, x^2-3*x+1). This implies Serge's problem number is 11-PRP and passes the test x^(n+1)==1 (mod n, x^2-(27/11)*x+1).

2014-12-18, 20:40   #30
Jean Penné

May 2004
FRANCE

2661₁₆ Posts

Quote:
 Originally Posted by Batalov New idea. Instead of dumping residues with InterimResidues=1, I added the line Debug=1 to llr.ini. (In the code I saw that it writes residues for 50 bits and then goes on without printing.) Surprise, surprise: Debug=1 obviously does some more stuff. All residues at bit 48 now match (with seven different FFT sizes), while before they never matched beyond bit 28. I see the tunnel at the end of this light (use google translate if you want, but basically it is a dyslexic joke).
I can describe exactly what the -oDebug=1 option does in LLR 3.8.13:

1) - gwstartnextfft is always set to zero (disabled) before beginning the iterations.
2) - Interim residues are displayed for either the first 50 or 30 iterations, or at least the first 2...
That's all...

I think that only the first action could have an effect on the final result...
So, if setting this option really suppresses the errors (but is it really true?), we now have a better view of the origin of the problem...
Regards,
Jean

2014-12-18, 21:09   #31
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

3·11·13·19 Posts

Quote:
 Originally Posted by Jean Penné 1) - gwstartnextfft is always set to zero (disabled) before beginning the iterations. I think that only the first action could have an effect on the final result... So, if setting this option really suppresses the errors (but is it really true?), we now have a better view of the origin of the problem...
But, InterimResidues=1 should also set gwstartnextfft to zero.

2014-12-18, 21:40   #32
Batalov
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

2741₁₆ Posts

It doesn't fully get rid of the errors. I cannot yet put a finger on it, but the interim residues between a successful run (e.g. non-AVX 256K FFT) and a 'composite run' (e.g. AVX 256K FFT*) match until the next to last:

Code:
1024*3^1877301+1 interim residue B98276A5EE1E7D7F at bit 2975455
1024*3^1877301+1 interim residue 6B973585DBC04EDB at bit 2975456
1024*3^1877301+1 interim residue F145ED62830786D7 at bit 2975457
1024*3^1877301+1 interim residue 39D98E3611EAC80C at bit 2975458
1024*3^1877301+1 interim residue C6B7FE823B8F81B4 at bit 2975459
1024*3^1877301+1 interim residue 9DCC1DE4BF21C1B1 at bit 2975460
1024*3^1877301+1 interim residue 9DCC1DE4BF21C1B0 at bit 2975461
1024*3^1877301+1 may be prime, trying to compute gcd's
5^((N-1)/3)-1 is coprime to N!
1024*3^1877301+1 is prime! (895704 decimal digits)  Time : 20273.926 sec.
_____________________
1024*3^1877301+1 interim residue B98276A5EE1E7D7F at bit 2975455
1024*3^1877301+1 interim residue 6B973585DBC04EDB at bit 2975456
1024*3^1877301+1 interim residue F145ED62830786D7 at bit 2975457
1024*3^1877301+1 interim residue 39D98E3611EAC80C at bit 2975458
1024*3^1877301+1 interim residue C6B7FE823B8F81B4 at bit 2975459
1024*3^1877301+1 interim residue 9DCC1DE4BF21C1B1 at bit 2975460
1024*3^1877301+1 interim residue 8E973DD8F58B11B0 at bit 2975461
1024*3^1877301+1 is not prime.  RES64: 9B1788BFABF267B2.  OLD64: 12E45BE92298D671  Time : 20219.416 sec.

The interim files for the last 500 bits are different (23 bytes out of the whole file are different, and this situation lingers for the whole stretch of the last 500 bits). Inconceivable {one would expect that a bit-state differing in one iteration even by a few bits should scramble the whole number over just a few more iterations}, but true:

Code:
> cmp -l ../BBa1/z3083805.2975460 z3083805.2975460
372225 100 200
372226 10 304
372227 30 307
372228 3 23
372241 130 230
372242 10 304
372243 30 307
372244 3 23
372253 300 24
372254 310 213
372255 220 201
372257 160 260
372258 265 161
372259 35 315
372260 3 23
744180 326 163
744181 12 351
744182 357 74
744183 315 317
744195 114 314
744196 66 53
744197 55 234
744198 324 336
> cmp -l ../BBa1/z3083805.2975460 z3083805.2975460 | wc -l
23

I don't understand the bit-state enough to draw any conclusions from this, just stating the facts.**

And, lastly, with -oDebug=1 -oFermatBase=5 -oFFT_Increment=3 (or 4), the full AVX run ends well. I will try to dump the bit-state for these runs more fully (currently I have done a large-jump (e.g. InterimFiles=10000), small-step (e.g. InterimFiles=1, starting from the last of the large ones) approach, so that the disk won't fill up).

____________
*A minor note about speed (now that I have seven different FFT sizes as shown above): the AVX 256K FFT gives faster iteration times than the default AVX 240K FFT. The default non-AVX FFT is 256K.

____________
**EDIT, Dec 18th. I have reached a better understanding of the structure of the savefiles. A savefile = a header + a dumped giant gwnum (!) structure. A giant structure for a base-2 gwnum is a bit-array, but a giant gwnum structure for a non-base-2 gwnum is a sparse limb-array (~2:1 sparse). TL;DR version: two non-identical giants can be an identical number. Corollary 1: the savefiles are different, but the full-size residue (represented by them) is the same! Corollary 2: one can restart an interrupted run with different (AVX / non-AVX; different FFT size) settings. The result will be the same!

____________
**EDIT, Dec 19th. Oh, no. I was wrong: the above is how RES64 is calculated (gwtogiant), but writing a save file is different. To save time, the gwnum structure is written. A gwnum is not normalized at all times, so yes, the same number can be represented by different states of gwnum.

Last fiddled with by Batalov on 2014-12-19 at 21:16
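[Editorial sketch, not GWNUM code: a toy illustration of the point above that two different limb arrays can encode the same integer when carries have not yet been propagated. The base B=2^16 and the limb values are arbitrary; GWNUM's balanced FFT digits are more involved, but the principle is the same.]

```shell
#!/bin/sh
# Two two-limb representations of the same integer in base B=65536.
B=65536
# "Normalized" limbs of 70000: 70000 = 4464 + 1*B
a0=4464;  a1=1
# Un-normalized limbs of the same number: the carry has not been propagated
b0=70000; b1=0
va=$(( a0 + a1 * B ))
vb=$(( b0 + b1 * B ))
# Different arrays, identical value: comparing the raw limb dumps (as cmp -l
# does on savefiles) shows differences even though the number is the same.
echo "$va $vb"
```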
2014-12-18, 22:10   #33
Batalov
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

13×773 Posts

A heretic thought: could there be a bug in gwtogiant()? One that shows up rarely (like in this number)? This is based on the 18-25 byte difference in savefiles that goes on and on and on (and doesn't hurt the run). So it could be in the representation of the number (before writing the RES64 on screen and the full residue to file) and not in the FFT routines. That in turn would mean that the very last step or the gcd could be affected too. (And would solve the riddle?)

