#23 |
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
10049₁₀ Posts |
:-)
Well, on non-AVX CPUs, everything works fine. PRP, N-1, N+1, anything. In all three programs. The problem only rears its ugly head on AVX CPUs.
EDIT: Ah, yes, true, I see it. My bad. This line confused me:
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2, SSE4.1, SSE4.2
Where is the AVX feature? I read up to that line and didn't notice that it then miraculously runs an AVX FFT!
Last fiddled with by Batalov on 2014-12-18 at 05:02 |
#24 | ||
P90 years forever!
Aug 2002
Yeehaw, FL
3×11×13×19 Posts |
Quote:
Read Paul's info carefully: Quote:
|
||
#25 |
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
13·773 Posts |
New idea. Instead of dumping residues with InterimResidues=1, I added the line Debug=1 to llr.ini. (In the code I saw that it writes residues for 50 bits and then goes on without printing.)
Surprise, surprise. Debug=1 obviously does some more stuff. All residues at bit 48 match (with seven different FFT sizes), whereas before they never matched beyond bit 28. I see the tunnel at the end of this light (use Google Translate if you want, but basically it is a dyslexic joke). |
#26 |
"Mark"
Apr 2003
Between here and the
2·7²·71 Posts |
David Broadhurst reported issues with the N+1 primality test back in March. Unfortunately I don't know enough about the N+1 test or about pfgw's implementation to debug it.
|
#27 |
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
13·773 Posts |
Here is a test that can be run without changing the latest llr64 binary.
The results are not yet very good, but I am optimistic.
Code:
#!/bin/tcsh
foreach i (`seq 0 6`)
  mkdir $i
  nohup llr64 -W$i -d -oDebug=1 -oFermatBase=5 -oFFT_Increment=$i -q"1024*3^1877301+1" >& $i/out &
end
Code:
0/out:Using all-complex AVX FFT length 240K, Pass1=1280, Pass2=192, a = 5
1/out:Using all-complex AVX FFT length 256K, Pass1=256, Pass2=1K, a = 5
2/out:Using all-complex AVX FFT length 288K, Pass1=384, Pass2=768, a = 5
3/out:Using all-complex AVX FFT length 320K, Pass1=256, Pass2=1280, a = 5
4/out:Using zero-padded AVX FFT length 336K, Pass1=448, Pass2=768, a = 5
5/out:Using all-complex AVX FFT length 384K, Pass1=384, Pass2=1K, a = 5
6/out:Using all-complex AVX FFT length 400K, Pass1=1280, Pass2=320, a = 5
In parallel, a non-AVX process will return "prime".
Now I am rerunning 0/ and the non-AVX run, comparing residues every 1000 bits. Halfway through the run they still match. Soon I will find the explosive iteration that sends the computation off track.
EDIT: The divergence between non-AVX (which ends in "P" with any FFT size) and the debugged AVX 240K run is in the last few hundred bits. I will start tomorrow with the interim save file at bit 2975000 (with ~600 bits to go) and will check each iteration (with full savefiles).
Last fiddled with by Batalov on 2014-12-18 at 11:29 |
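The residue comparison described above can be automated. The following is a minimal Python sketch, assuming each run's output contains lines of the form "... interim residue <16 hex digits> at bit <n>", which is the format seen in the LLR logs quoted elsewhere in this thread. The function names `parse_residues` and `first_divergence` are hypothetical helpers, not part of LLR.

```python
import re

# Matches lines such as:
#   1024*3^1877301+1 interim residue B98276A5EE1E7D7F at bit 2975455
RESIDUE_RE = re.compile(r"interim residue ([0-9A-F]{16}) at bit (\d+)")


def parse_residues(log_text):
    """Return {bit: residue} for every interim-residue line in a log."""
    return {int(bit): res for res, bit in RESIDUE_RE.findall(log_text)}


def first_divergence(log_a, log_b):
    """Return the lowest bit reported by both logs where the residues differ.

    Returns None if every bit present in both logs has matching residues.
    """
    a, b = parse_residues(log_a), parse_residues(log_b)
    for bit in sorted(a.keys() & b.keys()):
        if a[bit] != b[bit]:
            return bit
    return None
```

Feeding it the text of two output files (for example 0/out and the non-AVX output) would report the first bit at which the runs disagree, provided both runs print residues at the same bits.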
#28 |
Sep 2002
Database er0rr
2·3³·83 Posts |
The bug persists in PFGW 3.7.8. (Sorry to Serge if he has already pointed this out.)
Code:
./pfgw64 -t -i -V -q"1024*3^1877301+1"
PFGW Version 3.7.8.64BIT.20141125.x86_Dev [GWNUM 28.5]
CPU Information (From Woltman v26 library code)
Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
CPU speed: 3723.08 MHz, 4 cores
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2, SSE4.1, SSE4.2
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 8 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Primality testing 1024*3^1877301+1 [N-1, Brillhart-Lehmer-Selfridge]
Running N-1 test using base 5
Special modular reduction using all-complex FMA3 FFT length 256K, Pass1=128, Pass2=2K on 1024*3^1877301+1
1024*3^1877301+1 is composite (3498.3155s+0.0102s)
Last fiddled with by paulunderwood on 2014-12-18 at 16:52 |
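For readers unfamiliar with the N-1 test named in the log, here is a toy Python sketch of the Pocklington-style check for numbers of the form k*3^n+1, using plain integer arithmetic instead of gwnum FFT multiplication. It assumes k < 3^n, so that the factored part 3^n exceeds sqrt(N). The function name `n_minus_1_test` is hypothetical and this is not PFGW's or LLR's actual code.

```python
from math import gcd


def n_minus_1_test(k, n, base):
    """Pocklington-style N-1 test for N = k*3^n + 1 with a single base.

    Returns "prime", "composite", or "unknown" (meaning: try another base).
    """
    F = 3 ** n                      # fully factored part of N-1
    N = k * F + 1
    if k >= F:
        raise ValueError("need k < 3^n so that 3^n exceeds sqrt(N)")
    if not 1 < base < N:
        raise ValueError("base must satisfy 1 < base < N")
    # Fermat condition: base^(N-1) must be 1 mod N if N is prime.
    if pow(base, N - 1, N) != 1:
        return "composite"
    # 3 is the only prime factor of F, so a single gcd check suffices.
    g = gcd(pow(base, (N - 1) // 3, N) - 1, N)
    if g == 1:
        return "prime"
    if g == N:
        return "unknown"            # base^((N-1)/3) == 1; this base proves nothing
    return "composite"              # g is a proper factor of N
```

For example, `n_minus_1_test(2, 4, 2)` examines N = 2*3^4+1 = 163 and `n_minus_1_test(2, 3, 2)` examines N = 2*3^3+1 = 55. A single wrong multiplication anywhere in the exponentiation makes the Fermat condition fail, so an arithmetic bug would turn a true prime into a reported composite.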
#29 | |
Sep 2002
Database er0rr
2·3³·83 Posts |
Quote:
|
#30 | |
May 2004
FRANCE
266₁₆ Posts |
Quote:
1) gwstartnextfft is always set to zero (disabled) before beginning the iterations.
2) Interim residues are displayed after either the first 50, the first 30, or at least the first 2 iterations.
That's all...
I think that only the first action could have an effect on the final result...
So, if setting this option really suppresses the errors (but is it really true?), we now have a better view of the origin of the problem...
Regards,
Jean |
|
#31 | |
P90 years forever!
Aug 2002
Yeehaw, FL
3·11·13·19 Posts |
Quote:
|
|
#32 |
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
2741₁₆ Posts |
It doesn't fully get rid of errors.
I cannot yet put a finger on it, but the interim residues of a successful run (e.g. non-AVX 256K FFT) and a 'composite' run (e.g. AVX 256K FFT*) match until the next to last:
Code:
1024*3^1877301+1 interim residue B98276A5EE1E7D7F at bit 2975455
1024*3^1877301+1 interim residue 6B973585DBC04EDB at bit 2975456
1024*3^1877301+1 interim residue F145ED62830786D7 at bit 2975457
1024*3^1877301+1 interim residue 39D98E3611EAC80C at bit 2975458
1024*3^1877301+1 interim residue C6B7FE823B8F81B4 at bit 2975459
1024*3^1877301+1 interim residue 9DCC1DE4BF21C1B1 at bit 2975460
1024*3^1877301+1 interim residue 9DCC1DE4BF21C1B0 at bit 2975461
1024*3^1877301+1 may be prime, trying to compute gcd's
5^((N-1)/3)-1 is coprime to N!
1024*3^1877301+1 is prime! (895704 decimal digits) Time : 20273.926 sec.
_____________________
1024*3^1877301+1 interim residue B98276A5EE1E7D7F at bit 2975455
1024*3^1877301+1 interim residue 6B973585DBC04EDB at bit 2975456
1024*3^1877301+1 interim residue F145ED62830786D7 at bit 2975457
1024*3^1877301+1 interim residue 39D98E3611EAC80C at bit 2975458
1024*3^1877301+1 interim residue C6B7FE823B8F81B4 at bit 2975459
1024*3^1877301+1 interim residue 9DCC1DE4BF21C1B1 at bit 2975460
1024*3^1877301+1 interim residue 8E973DD8F58B11B0 at bit 2975461
1024*3^1877301+1 is not prime. RES64: 9B1788BFABF267B2. OLD64: 12E45BE92298D671 Time : 20219.416 sec.
Code:
> cmp -l ../BBa1/z3083805.2975460 z3083805.2975460
372225 100 200
372226  10 304
372227  30 307
372228   3  23
372241 130 230
372242  10 304
372243  30 307
372244   3  23
372253 300  24
372254 310 213
372255 220 201
372257 160 260
372258 265 161
372259  35 315
372260   3  23
744180 326 163
744181  12 351
744182 357  74
744183 315 317
744195 114 314
744196  66  53
744197  55 234
744198 324 336
> cmp -l ../BBa1/z3083805.2975460 z3083805.2975460 | wc -l
23
And, lastly, with -oDebug=1 -oFermatBase=5 -oFFT_Increment=3 (or 4), the full AVX run ends well.
I will try to dump the bit state for these runs more fully (currently I have done large jumps (e.g. InterimFiles=10000) followed by small steps (e.g. InterimFiles=1, starting from the last of the large ones), so that the disk won't fill up).
____________
*A minor note about speed (now that I have seven different FFT sizes as shown above): AVX 256K FFT gives faster iteration times than the default AVX 240K FFT. The default non-AVX is 256K FFT.
____________
**EDIT. Dec/18th. I have reached a better understanding of the structure of the savefiles.
A savefile = a header + a dumped giant.
A giant structure for a base-2 gwnum is a bit-array, but a TL;DR version: two non-identical giants can be an identical number.
Corollary 1: Savefiles are different, but the full-size residue (represented by them) is the same!
Corollary 2: One can restart an interrupted run with different (AVX / non-AVX; different FFT size) settings. The result will be the same!
____________
**EDIT. Dec/19th. Oh, no. I was wrong: the above is how RES64 is calculated (gwtogiant), but writing a save file is different. To save time, the gwnum structure is written. A gwnum is not normalized at all times, so yes - the same number can be represented by different states of gwnum.
Last fiddled with by Batalov on 2014-12-19 at 21:16 |
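The point that two different stored states can represent the same number can be illustrated with a toy model. The Python sketch below uses a little-endian vector of integer digits whose carries have not been propagated. This is only an analogy: the real gwnum layout (floating-point FFT words with variable digit sizes) is different, and the names `value` and `normalize` are hypothetical.

```python
def value(digits, base):
    """Evaluate a little-endian digit vector; digits may lie outside [0, base)."""
    return sum(d * base ** i for i, d in enumerate(digits))


def normalize(digits, base):
    """Propagate carries so that every digit lies in [0, base).

    Assumes the represented number is non-negative.
    """
    out, carry = [], 0
    for d in digits:
        carry, r = divmod(d + carry, base)   # floor division also handles negative digits
        out.append(r)
    if carry < 0:
        raise ValueError("digit vector represents a negative number")
    while carry > 0:
        carry, r = divmod(carry, base)
        out.append(r)
    return out


# Three different digit vectors in base 16, all representing 309.
a = [5, 3, 1]      # normalized
b = [21, 2, 1]     # low digit holds an unpropagated carry
c = [-11, 4, 1]    # balanced-style digit with a borrow pending
```

Dumping `a`, `b`, and `c` to disk would give three files that differ byte for byte, yet `value` agrees on all of them and `normalize` maps `b` and `c` back to `a`. In this picture, a byte-level `cmp` between savefiles from different runs does not by itself show that the underlying residues differ.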
#33 |
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
13×773 Posts |
A heretical thought: could there be a bug in gwtogiant()? One that shows up only rarely (as with this number).
This is based on the 18-25 byte difference in savefiles that goes on and on and on (and doesn't hurt the run). So the problem could be in the representation of the number (before writing the RES64 on screen and the full residue to file) and not in the FFT routines. That in turn would mean that the very last step or the gcd could be affected too. (And would that solve the riddle?) |