mersenneforum.org  

Old 2014-12-18, 04:32   #23
Batalov
 
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

10049₁₀ Posts

:-)

Well, on non-AVX CPUs, everything works fine. PRP, N-1, N+1, anything. In all three programs.

The problem only rears its ugly head on AVX CPUs.

EDIT: Ah, yes, true, I see it. My bad.

--> This line:
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2, SSE4.1, SSE4.2
confused me... Where is the AVX feature? I read up to that line, didn't see it, and then miraculously it runs an AVX FFT!

Last fiddled with by Batalov on 2014-12-18 at 05:02
Old 2014-12-18, 04:41   #24
Prime95
P90 years forever!
 
 
Aug 2002
Yeehaw, FL

3×11×13×19 Posts

Quote:
Originally Posted by Batalov View Post
Well, on non-AVX CPUs, everything works fine. PRP, N-1, N+1, anything. In all three programs.

The problem only rears its ugly head on AVX CPUs.

Read Paul's info carefully:

Quote:
Special modular reduction using all-complex AVX FFT length 240K,
So an N+1 Lucas PRP test works in AVX. There is something about PFGW's N-1 test....
Old 2014-12-18, 04:55   #25
Batalov
 
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

13·773 Posts

New idea. Instead of dumping residues with InterimResidues=1, I added the line Debug=1 to llr.ini. (In the code I saw that it writes residues for 50 bits and then goes on without printing.)

Surprise, surprise. Debug=1 obviously does some more stuff. All residues at bit 48 match (with seven different FFT sizes), whereas before they never matched beyond bit 28.

I see the tunnel at the end of this light (use Google Translate if you want, but basically it is a dyslexic joke).
Old 2014-12-18, 06:08   #26
rogue
 
 
"Mark"
Apr 2003
Between here and the

2·7²·71 Posts

David Broadhurst reported issues with the N+1 primality test back in March. Unfortunately I don't know enough about the N+1 test or about pfgw's implementation to debug it.
Old 2014-12-18, 08:01   #27
Batalov
 
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

13·773 Posts

Here is a test that can be run without changing the latest llr64 binary.
The results are not yet very good, but optimistic.

Code:
#!/bin/tcsh

foreach i (`seq 0 6`)
   mkdir $i
   nohup llr64 -W$i -d -oDebug=1 -oFermatBase=5 -oFFT_Increment=$i -q"1024*3^1877301+1" >& $i/out &
   end
The output will show that these AVX FFT sizes will be used:
Code:
0/out:Using all-complex AVX FFT length 240K, Pass1=1280, Pass2=192, a = 5
1/out:Using all-complex AVX FFT length 256K, Pass1=256, Pass2=1K, a = 5
2/out:Using all-complex AVX FFT length 288K, Pass1=384, Pass2=768, a = 5
3/out:Using all-complex AVX FFT length 320K, Pass1=256, Pass2=1280, a = 5
4/out:Using zero-padded AVX FFT length 336K, Pass1=448, Pass2=768, a = 5
5/out:Using all-complex AVX FFT length 384K, Pass1=384, Pass2=1K, a = 5
6/out:Using all-complex AVX FFT length 400K, Pass1=1280, Pass2=320, a = 5
Amazingly, two of them, 3/ and 4/, will after all finish with "prime", while the rest will produce different RES64s.
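A side observation on the listing above, offered as a small sketch rather than anything taken from LLR or gwnum: in each of the seven lines the reported FFT length equals Pass1 × Pass2 when K is read as 1024. The helper names `to_int` and `parse` are made up for this illustration.

```python
import re

# The seven lines are copied verbatim from the output listed above.
LINES = """\
0/out:Using all-complex AVX FFT length 240K, Pass1=1280, Pass2=192, a = 5
1/out:Using all-complex AVX FFT length 256K, Pass1=256, Pass2=1K, a = 5
2/out:Using all-complex AVX FFT length 288K, Pass1=384, Pass2=768, a = 5
3/out:Using all-complex AVX FFT length 320K, Pass1=256, Pass2=1280, a = 5
4/out:Using zero-padded AVX FFT length 336K, Pass1=448, Pass2=768, a = 5
5/out:Using all-complex AVX FFT length 384K, Pass1=384, Pass2=1K, a = 5
6/out:Using all-complex AVX FFT length 400K, Pass1=1280, Pass2=320, a = 5
""".splitlines()

PATTERN = re.compile(r"FFT length (\w+), Pass1=(\w+), Pass2=(\w+)")


def to_int(token):
    """Convert '240K' or '1280' to an integer, reading K as 1024."""
    return int(token[:-1]) * 1024 if token.endswith("K") else int(token)


def parse(line):
    """Return (fft_length, pass1, pass2) as integers for one output line."""
    match = PATTERN.search(line)
    return tuple(to_int(group) for group in match.groups())


for line in LINES:
    length, pass1, pass2 = parse(line)
    # Print the run directory, the FFT length, and whether the product matches.
    print(line.split("/")[0], length, pass1 * pass2 == length)
```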

In parallel, a non-AVX process will return "prime". Now I am rerunning 0/ and the non-AVX run, comparing residues every 1000 bits. Halfway through the run they still match. Soon I will find the explosive iteration that sends the computation off track.

EDIT: The divergence between nonAVX (which ends in "P" with any FFT size) and the debugged run AVX 240K is in the last few hundred bits. I will start tomorrow with the interim save file at bit 2975000 (with ~600 bits to go) and will check each iteration (with full savefiles).

Last fiddled with by Batalov on 2014-12-18 at 11:29
Old 2014-12-18, 16:51   #28
paulunderwood
 
 
Sep 2002
Database er0rr

2·3³·83 Posts

The bug persists in PFGW 3.7.8. (Sorry to Serge if he has already pointed this out.)

Code:
./pfgw64 -t -i -V -q"1024*3^1877301+1"
PFGW Version 3.7.8.64BIT.20141125.x86_Dev [GWNUM 28.5]


CPU Information (From Woltman v26 library code)
Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
CPU speed: 3723.08 MHz, 4 cores
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2, SSE4.1, SSE4.2
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 8 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
                                    
Primality testing 1024*3^1877301+1 [N-1, Brillhart-Lehmer-Selfridge]                                    
Running N-1 test using base 5                                                  
Special modular reduction using all-complex FMA3 FFT length 256K, Pass1=128, Pass2=2K on 1024*3^1877301+1                                    
1024*3^1877301+1 is composite (3498.3155s+0.0102s)
It does compute "F" quickly, however.


Last fiddled with by paulunderwood on 2014-12-18 at 16:52
Old 2014-12-18, 19:35   #29
paulunderwood
 
 
Sep 2002
Database er0rr

2·3³·83 Posts

Quote:
Originally Posted by paulunderwood View Post
ps. I am running your number through a GMP implementation of my algorithm with the GWNUM 27.11 output from pfgw64.
It says "Likely prime with a=3", which means, for jacobiSymbol(3^2-4,n)==-1, (x+2)^(n+1)==2*3+5 (mod n, x^2-3*x+1). This implies Serge's problem number is 11-PRP and passes the test x^(n+1)==1 (mod n, x^2-(27/11)*x+1).
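For readers who want to see the stated relation in action, here is a minimal Python sketch that checks (x+2)^(n+1) == 2*a+5 (mod n, x^2-a*x+1) for small odd n with jacobiSymbol(a^2-4,n)==-1. It is not the GMP implementation mentioned above and not PFGW code; the function names (`jacobi`, `mul`, `power`, `satisfies_relation`) are invented for this illustration. For a prime n the relation must hold, because x^n is then the other root a-x, so (x+2)(a-x+2) = 1+2a+4.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def mul(u, v, a, n):
    """Multiply u0+u1*x by v0+v1*x modulo (n, x^2 - a*x + 1)."""
    u0, u1 = u
    v0, v1 = v
    hi = u1 * v1  # coefficient of x^2, reduced via x^2 = a*x - 1
    return ((u0 * v0 - hi) % n, (u0 * v1 + u1 * v0 + a * hi) % n)


def power(base, exponent, a, n):
    """Square-and-multiply exponentiation in the quotient ring."""
    result = (1, 0)
    while exponent:
        if exponent & 1:
            result = mul(result, base, a, n)
        base = mul(base, base, a, n)
        exponent >>= 1
    return result


def satisfies_relation(n, a):
    """True if (x+2)^(n+1) == 2*a+5 (mod n, x^2 - a*x + 1)."""
    if jacobi(a * a - 4, n) != -1:
        raise ValueError("this sketch requires jacobi(a^2-4, n) == -1")
    return power((2 % n, 1), n + 1, a, n) == ((2 * a + 5) % n, 0)


# With a=3 the right-hand side is 2*3+5 = 11, as in the post above.
for n in (7, 13, 17):
    print(n, satisfies_relation(n, 3))
```

Python's built-in integers are far too slow for a number of 895704 decimal digits; the sketch is only meant to make the congruence concrete on toy inputs.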

Old 2014-12-18, 20:40   #30
Jean Penné
 
 
May 2004
FRANCE

266₁₆ Posts

Quote:
Originally Posted by Batalov View Post
New idea. Instead of dumping residues with InterimResidues=1, I added the line Debug=1 to llr.ini. (In the code I saw that it writes residues for 50 bits and then goes on without printing.)

Surprise, surprise. Debug=1 obviously does some more stuff. All residues at bit 48 match (with seven different FFT sizes), whereas before they never matched beyond bit 28.

I see the tunnel at the end of this light (use Google Translate if you want, but basically it is a dyslexic joke).
I can describe exactly what the -oDebug=1 option does in llr 3.8.13:

1) - gwstartnextfft is always set to zero (disabled) before beginning the iterations.
2) - Interim residues are displayed after either the first 50, the first 30, or at least the first 2 iterations...
That's all...

I think that only the first action could have an effect on the final result...
So, if setting this option really suppresses the errors (but is it really true?), we now have a better view of the origin of the problem...
Regards,
Jean
Old 2014-12-18, 21:09   #31
Prime95
P90 years forever!
 
 
Aug 2002
Yeehaw, FL

3·11·13·19 Posts

Quote:
Originally Posted by Jean Penné View Post
1) - gwstartnextfft is always set to zero (disabled) before beginning the iterations.

I think that only the first action could have an effect on the final result...
So, if setting this option really suppresses the errors (but is it really true?), we now have a better view of the origin of the problem...
But, InterimResidues=1 should also set gwstartnextfft to zero.
Old 2014-12-18, 21:40   #32
Batalov
 
Batalov's Avatar
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

2741₁₆ Posts

It doesn't fully get rid of errors.

I cannot yet put a finger on it, but the interim residues between a successful run (e.g. non-AVX 256K FFT) and 'composite run' (e.g. AVX 256K FFT*) match until the next to last:
Code:
1024*3^1877301+1 interim residue B98276A5EE1E7D7F at bit 2975455
1024*3^1877301+1 interim residue 6B973585DBC04EDB at bit 2975456
1024*3^1877301+1 interim residue F145ED62830786D7 at bit 2975457
1024*3^1877301+1 interim residue 39D98E3611EAC80C at bit 2975458
1024*3^1877301+1 interim residue C6B7FE823B8F81B4 at bit 2975459
1024*3^1877301+1 interim residue 9DCC1DE4BF21C1B1 at bit 2975460
1024*3^1877301+1 interim residue 9DCC1DE4BF21C1B0 at bit 2975461
1024*3^1877301+1 may be prime, trying to compute gcd's
5^((N-1)/3)-1 is coprime to N!
1024*3^1877301+1 is prime! (895704 decimal digits)  Time : 20273.926 sec.
_____________________
1024*3^1877301+1 interim residue B98276A5EE1E7D7F at bit 2975455
1024*3^1877301+1 interim residue 6B973585DBC04EDB at bit 2975456
1024*3^1877301+1 interim residue F145ED62830786D7 at bit 2975457
1024*3^1877301+1 interim residue 39D98E3611EAC80C at bit 2975458
1024*3^1877301+1 interim residue C6B7FE823B8F81B4 at bit 2975459
1024*3^1877301+1 interim residue 9DCC1DE4BF21C1B1 at bit 2975460
1024*3^1877301+1 interim residue 8E973DD8F58B11B0 at bit 2975461
1024*3^1877301+1 is not prime.  RES64: 9B1788BFABF267B2.  OLD64: 12E45BE92298D671  Time : 20219.416 sec.
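Comparing two such logs by eye gets tedious, so here is a minimal Python sketch that finds the first bit at which the interim residues of two runs disagree. It assumes only the "interim residue ... at bit ..." line format shown above; the names `residues`, `first_divergence`, `GOOD_TAIL` and `BAD_TAIL` are made up for this example, and the sample text is the last two interim lines of each listing.

```python
import re

RESIDUE = re.compile(r"interim residue ([0-9A-F]{16}) at bit (\d+)")


def residues(log_text):
    """Map bit index -> RES64 string for every interim-residue line."""
    return {int(bit): res for res, bit in RESIDUE.findall(log_text)}


def first_divergence(log_a, log_b):
    """Return the lowest bit index present in both logs whose residues differ,
    or None if every shared bit index agrees."""
    a, b = residues(log_a), residues(log_b)
    for bit in sorted(a.keys() & b.keys()):
        if a[bit] != b[bit]:
            return bit
    return None


# Tails of the two listings above (successful run first, 'composite' run second).
GOOD_TAIL = """\
1024*3^1877301+1 interim residue 9DCC1DE4BF21C1B1 at bit 2975460
1024*3^1877301+1 interim residue 9DCC1DE4BF21C1B0 at bit 2975461
"""
BAD_TAIL = """\
1024*3^1877301+1 interim residue 9DCC1DE4BF21C1B1 at bit 2975460
1024*3^1877301+1 interim residue 8E973DD8F58B11B0 at bit 2975461
"""

print(first_divergence(GOOD_TAIL, BAD_TAIL))
```

For full logs one would read the files into strings first; only bit indices present in both logs are compared, so runs with different residue intervals can still be matched up.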
The interim files for the last 500 bits are different: 23 bytes out of the whole file differ, and this situation lingers for the whole stretch of the last 500 bits. Inconceivable (one would expect that a bit-state differing by even a few bits in one iteration would scramble the whole number within just a few more iterations), but true:
Code:
> cmp -l ../BBa1/z3083805.2975460 z3083805.2975460
372225 100 200
372226  10 304
372227  30 307
372228   3  23
372241 130 230
372242  10 304
372243  30 307
372244   3  23
372253 300  24
372254 310 213
372255 220 201
372257 160 260
372258 265 161
372259  35 315
372260   3  23
744180 326 163
744181  12 351
744182 357  74
744183 315 317
744195 114 314
744196  66  53
744197  55 234
744198 324 336

> cmp -l ../BBa1/z3083805.2975460 z3083805.2975460 | wc -l
23
I don't understand the bit-state enough to draw any conclusions from this, just stating the facts.**
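For anyone without `cmp` at hand, the same kind of listing can be produced with a few lines of Python. This is a rough sketch, not a replacement for the tool: it reports 1-based offsets and octal byte values like `cmp -l`, but unlike `cmp` it silently stops at the end of the shorter input. The function names are hypothetical.

```python
from pathlib import Path


def cmp_l(data_a, data_b):
    """Return (1-based offset, byte_a, byte_b) for each differing position."""
    return [
        (index + 1, x, y)
        for index, (x, y) in enumerate(zip(data_a, data_b))
        if x != y
    ]


def format_cmp_l(diffs):
    """Render differences as 'offset octal_a octal_b' lines."""
    return [f"{offset} {x:3o} {y:3o}" for offset, x, y in diffs]


def cmp_l_files(path_a, path_b):
    """Convenience wrapper that reads both files fully into memory."""
    return cmp_l(Path(path_a).read_bytes(), Path(path_b).read_bytes())


# Tiny in-memory demonstration: only the third byte differs.
for line in format_cmp_l(cmp_l(b"abc", b"abd")):
    print(line)
```

Calling `len(cmp_l_files(a, b))` on two save files gives the same count as piping `cmp -l` through `wc -l`, provided the files have equal length.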

And, lastly, with -oDebug=1 -oFermatBase=5 -oFFT_Increment=3 (or 4), the full AVX run ends well. I will try to dump the bit-state for these runs more fully. Currently I use large jumps (e.g. InterimFiles=10000) followed by small steps (e.g. InterimFiles=1, starting from the last of the large ones), so that the disk won't fill up.
____________
*A minor note about speed (now that I have seven different FFT sizes as shown above): AVX 256K FFT gives faster iteration times than the default AVX 240K FFT. The default non-AVX is 256K FFT.
____________
**EDIT. Dec/18th. I have reached a better understanding of the structure of the savefiles.
A savefile = a header + a dumped giant gwnum (!) structure.
A giant structure for a base-2 gwnum is a bit-array, but a giant gwnum structure for non-base-2 gwnum is a sparse limb-array (~2:1 sparse).
TL;DR version: two non-identical giants can be an identical number.
Corollary 1: Savefiles are different, but the full-size residue (represented by them) is the same!
Corollary 2: One can restart an interrupted run with different (AVX / non-AVX; different FFT size) settings. The result will be the same!

____________
**EDIT. Dec/19th. Oh, no. I was wrong: the above is how RES64 is calculated (gwtogiant), but writing a save file is different. To save time, the gwnum structure itself is written. A gwnum is not normalized at all times, so yes, the same number can be represented by different states of a gwnum.
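The point that one number can have several un-normalized representations can be illustrated with a toy model. The Python sketch below uses plain little-endian digit lists in base 10 whose digits are allowed to fall outside 0..9; this is an assumption made purely for illustration and is not gwnum's actual in-memory layout. The names `value` and `normalize` are invented here.

```python
def value(digits, base):
    """Evaluate little-endian digits that need not lie in [0, base)."""
    return sum(d * base**i for i, d in enumerate(digits))


def normalize(digits, base):
    """Propagate carries so every digit lies in [0, base).

    Assumes the represented value is non-negative.
    """
    out, carry = [], 0
    for d in digits:
        carry, remainder = divmod(d + carry, base)
        out.append(remainder)
    while carry > 0:
        carry, remainder = divmod(carry, base)
        out.append(remainder)
    return out


# Three different digit vectors, one and the same integer.
a = [12, 3]   # 12 + 3*10
b = [2, 4]    #  2 + 4*10
c = [-8, 5]   # -8 + 5*10

print(value(a, 10), value(b, 10), value(c, 10))
print(normalize(a, 10), normalize(c, 10))
```

In this toy picture, comparing the raw digit lists (like comparing save files byte by byte) reports differences, while comparing normalized forms (like comparing full residues) reports equality.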

Last fiddled with by Batalov on 2014-12-19 at 21:16
Old 2014-12-18, 22:10   #33
Batalov
 
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

13×773 Posts

A heretical thought: could there be a bug in gwtogiant()? One that shows up rarely (as with this number).

This is based on the 18-25 byte difference in savefiles that goes on and on and on (and doesn't hurt the run). So it can be the representation of the number (before writing the RES64 onscreen and the full residue to file) and not the FFT routines.

That in turn would mean that the very last step or the gcd could be affected too. (And would solve the riddle?)

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
