mersenneforum.org

PFGW 4.0.3 (with gwnum v28.7) Released (https://www.mersenneforum.org/showthread.php?t=13969)

mdettweiler 2010-10-21 21:37

At last, the results are in for the comparison of Prime95 v25.11 vs. v26.3 for 289184*5^477336-1 on my Core 2 Duo. We have:

v25.11 started at [Thu Oct 21 14:29:35 2010], finished at [Thu Oct 21 15:30:55 2010] --> total 1:01:20 = 3680 sec.

v26.3 started at [Thu Oct 21 15:53:16 2010], finished at [Thu Oct 21 16:38:05 2010] --> total 0:44:49 = 2689 sec.

So it would seem that PFGW 3.4.0 (32-bit, as all of these were) is the only one of the three programs that slows down when moving to gwnum v26. Note, however, that the gwnum minor versions used in these tests do not all line up: I tested PFGW 3.3.6 vs. 3.4.0 (gwnum 26.2), LLR 3.8.1 vs. 3.8.2 (gwnum 26.2), and Prime95 25.11 vs. 26.3 (gwnum 26.3). For that reason, I will follow this up shortly with a rerun of 289184*5^477336-1 using PFGW 3.4.2. Stay tuned...
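For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the two elapsed times and the resulting speedup from the timestamps quoted above. The helper name `elapsed_seconds` is just an illustration and is not part of any of the programs tested.

```python
from datetime import datetime

# Timestamp layout of the log lines quoted above, e.g. "Thu Oct 21 14:29:35 2010".
FMT = "%a %b %d %H:%M:%S %Y"


def elapsed_seconds(start: str, finish: str) -> int:
    """Return the whole seconds between two log timestamps."""
    delta = datetime.strptime(finish, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


v25 = elapsed_seconds("Thu Oct 21 14:29:35 2010", "Thu Oct 21 15:30:55 2010")
v26 = elapsed_seconds("Thu Oct 21 15:53:16 2010", "Thu Oct 21 16:38:05 2010")

print(v25, v26)                        # 3680 2689
print(f"speedup: {v25 / v26:.2f}x")    # speedup: 1.37x
```

In other words, v26.3 needed roughly 27% less wall-clock time than v25.11 on this one candidate.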

mdettweiler 2010-10-21 21:50

Holy cow! It would seem that 3.4.2 actually chooses an entirely different FFT size for 289184*5^477336-1 than 3.4.1 does. (I understand 3.4.1 is the same as 3.4.0 on 32-bit; I deleted my copy of 3.4.0, stupid me, and could only get my hands on 3.4.1.) Behold:
[code]
$ ./pfgw341.exe -F -q289184*5^477336-1
PFGW Version 3.4.1.32BIT.20100927.Win_Dev [GWNUM 26.2]
Special modular reduction using zero-padded Core2 type-3 FFT length 128K, Pass1=128, Pass2=1K on 289184*5^477336-1
Special modular reduction using zero-padded Pentium4 type-1 FFT length 144K, Pass1=96, Pass2=1536 on 289184*5^477336-1
Special modular reduction using zero-padded Pentium4 type-3 FFT length 160K, Pass1=640, Pass2=256 on 289184*5^477336-1
Special modular reduction using zero-padded Pentium4 type-3 FFT length 192K, Pass1=256, Pass2=768 on 289184*5^477336-1
Special modular reduction using zero-padded Pentium4 type-3 FFT length 224K, Pass1=896, Pass2=256 on 289184*5^477336-1
Special modular reduction using zero-padded Pentium4 type-3 FFT length 240K, Pass1=320, Pass2=768 on 289184*5^477336-1

$ ./pfgw.exe -F -q289184*5^477336-1
PFGW Version 3.4.2.32BIT.20101019.Win_Dev [GWNUM 26.4]
Special modular reduction using Core2 type-3 FFT length 112K, Pass1=448, Pass2=256 on 289184*5^477336-1
[/code]
Not only is the size used different (112K vs. 128K), 3.4.2 omits the "zero-padded" nomenclature entirely. Whether this is just an output difference or a difference in the underlying logic I do not know.
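To compare such runs mechanically, the FFT lines can be pulled apart with a regular expression. This is only a sketch: the pattern is modelled purely on the lines quoted above, the names `FFT_LINE`, `to_int` and `parse_fft_line` are made up for illustration, and other PFGW builds may word their output differently.

```python
import re

# Pattern based only on the output lines quoted in this post (an assumption,
# not a documented PFGW format).
FFT_LINE = re.compile(
    r"using (?P<padded>zero-padded )?(?P<cpu>\w+) type-(?P<type>\d+) "
    r"FFT length (?P<length>\d+K?), Pass1=(?P<pass1>\w+), Pass2=(?P<pass2>\w+)"
)


def to_int(text):
    """Convert '128K' to 131072 and '448' to 448."""
    return int(text[:-1]) * 1024 if text.endswith("K") else int(text)


def parse_fft_line(line):
    """Return a dict describing the FFT choice, or None if the line does not match."""
    m = FFT_LINE.search(line)
    if m is None:
        return None
    return {
        "zero_padded": m.group("padded") is not None,
        "cpu": m.group("cpu"),
        "length": to_int(m.group("length")),
        "pass1": to_int(m.group("pass1")),
        "pass2": to_int(m.group("pass2")),
    }


old = parse_fft_line(
    "Special modular reduction using zero-padded Core2 type-3 FFT length 128K, "
    "Pass1=128, Pass2=1K on 289184*5^477336-1"
)
new = parse_fft_line(
    "Special modular reduction using Core2 type-3 FFT length 112K, "
    "Pass1=448, Pass2=256 on 289184*5^477336-1"
)
print(old["zero_padded"], old["length"])  # True 131072
print(new["zero_padded"], new["length"])  # False 114688
```

In every line quoted above, Pass1 times Pass2 equals the FFT length (for example 448 * 256 = 114688 = 112K), which makes a handy sanity check on the parse.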

This would seem to rule out 3.4.2 as a tool for nailing down this mystery. At this point, though, further digging seems rather unnecessary: whatever happened, it has apparently been fixed in 3.4.2. :huh: I'll just stick with 3.4.2 for all my testing, as it now seems consistent with the speedups I get from comparable LLR and Prime95 versions.

Thanks for taking the time to look into this (and for whatever you guys did to fix it)! :smile:

rogue 2010-10-22 00:07

I didn't do anything, but George might have. I find it interesting that the old version specified Pentium4 and the new one specified Core 2.

mdettweiler 2010-10-22 01:08

[QUOTE=rogue;234128]I didn't do anything, but George might have. I find it interesting that the old version specified Pentium4 and the new one specified Core 2.[/QUOTE]
Well, it gave 6 different potential FFT choices (1 Core 2, 5 P4) when I ran it with -F, but when I run the actual test with -V it uses the Core2 FFT.

BTW: why exactly would it give 6 FFT choices like that? Shouldn't it boil down to exactly one choice just like it would for the real test? (Or might this, whatever the cause, be the reason for the strange slowdown?)

rogue 2010-10-22 01:17

[QUOTE=mdettweiler;234130]Well, it gave 6 different potential FFT choices (1 Core 2, 5 P4) when I ran it with -F, but when I run the actual test with -V it uses the Core2 FFT.

BTW: why exactly would it give 6 FFT choices like that? Shouldn't it boil down to exactly one choice just like it would for the real test? (Or might this, whatever the cause, be the reason for the strange slowdown?)[/QUOTE]

That it listed 6 was a bug that I fixed in 3.4.1. Only the first one would be used under normal conditions.

rogue 2010-10-25 21:01

PFGW 3.4.3 Released
 
You can download the latest release for Windows, MacIntel, and Linux from here: [url]http://sourceforge.net/projects/openpfgw/[/url]

The updates are for 64-bit PFGW users. A bug was found and fixed in the factoring code. For Linux, the binary is now statically linked.

Prime95 2010-10-25 22:31

[QUOTE=mdettweiler;234119]Holy cow! It would seem that 3.4.2 actually chooses an entirely different FFT size for 289184*5^477336-1 than 3.4.1
...
Not only is the size used different (112K vs. 128K), 3.4.2 omits the "zero-padded" nomenclature entirely. Whether this is just an output difference or a difference in the underlying logic I do not know.[/QUOTE]

For those who like gory details, gwnum 26.4 can now propagate carries to the next 6 FFT data words, whereas 26.3 can only propagate to the next 4. Usually this makes no difference in FFT selection. But for larger k values, 26.4 may use the slightly faster irrational base discrete weighted transform (Richard Crandall's IBDWT) instead of a zero-padded FFT of the same size. In even rarer cases, 26.4 may use an IBDWT with a smaller FFT length.
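To get a feel for the sizes involved, here is a rough Python sketch that estimates how many bits 289184*5^477336-1 occupies and how that averages out per FFT data word at the two lengths reported earlier in the thread. It is plain arithmetic, not gwnum's selection logic: it ignores zero padding, carry propagation and everything else gwnum weighs when choosing a length, and the function name `bit_length_estimate` is hypothetical.

```python
import math


def bit_length_estimate(k, b, n):
    """Approximate log2(k * b**n), i.e. the size in bits of k*b^n-1."""
    return math.log2(k) + n * math.log2(b)


bits = bit_length_estimate(289184, 5, 477336)
print(f"about {bits:,.0f} bits")

# FFT lengths taken from the PFGW output quoted earlier in the thread.
for label, words in (("112K", 112 * 1024), ("128K", 128 * 1024)):
    print(f"{label}: {bits / words:.2f} bits per FFT word on average")
```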

Batalov 2010-11-01 04:25

[QUOTE=rogue;234367]You can d/l the latest release for Windows, MacIntel, and Linux from here: [URL]http://sourceforge.net/projects/openpfgw/[/URL]

The updates are for 64-bit PFGW users. A bug was found and fixed in the factoring code. For linux, the binary is now statically linked.[/QUOTE]
I have a small bug. Run pfgw64 (linux), kill it somewhere; then replace the input file (with something else), restart and it reports:

***WARNING! file sr_10.pfgw line 2378 does not match what is expected.
Expecting: 10001001*10^11441+1
File contained: 1001001*10^25534+1
Starting over at the beginning of the file

10001001*10^25535+1 is composite: RES64: [AD505C1D89295440] (24.7044s+0.0002s)
...

Starting over at the beginning of the file is, of course, the usual and in this case desired behavior. But it doesn't actually do so: it only says that it will, and instead continues from the middle of the file (i.e. the line counter is not zeroed). This seems to be new (something uninitialized in the 64-bit version?); it worked fine before.
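The behavior I would expect can be sketched in a few lines of Python. This is only an illustration of the intended logic, not PFGW's actual code; `resume_index`, `lines`, `saved_index` and `expected` are hypothetical stand-ins for the input file contents and the checkpoint data.

```python
def resume_index(lines, saved_index, expected):
    """Return the 0-based line index at which processing should resume.

    All names here are hypothetical; PFGW's real resume code may differ.
    """
    in_range = 0 <= saved_index < len(lines)
    if in_range and lines[saved_index].strip() == expected:
        return saved_index  # checkpoint still matches the file: carry on
    # Mismatch: the counter itself must be reset, not just the warning printed.
    print("Starting over at the beginning of the file")
    return 0


old_file = ["3*10^5+1", "7*10^6+1", "9*10^7+1"]
new_file = ["1*10^2+1", "5*10^3+1", "8*10^4+1"]

same = resume_index(old_file, 2, "9*10^7+1")      # same file: resume at line 2
replaced = resume_index(new_file, 2, "9*10^7+1")  # replaced file: back to line 0
print(same, replaced)
```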

rogue 2010-11-01 14:56

[QUOTE=Batalov;235156]I have a small bug. Run pfgw64 (linux), kill it somewhere; then replace the input file (with something else), restart and it reports:

***WARNING! file sr_10.pfgw line 2378 does not match what is expected.
Expecting: 10001001*10^11441+1
File contained: 1001001*10^25534+1
Starting over at the beginning of the file

10001001*10^25535+1 is composite: RES64: [AD505C1D89295440] (24.7044s+0.0002s)
...

Starting over at the beginning of the file is, of course, the usual and in this case desired behavior. But it doesn't actually do so: it only says that it will, and instead continues from the middle of the file (i.e. the line counter is not zeroed). This seems to be new (something uninitialized in the 64-bit version?); it worked fine before.[/QUOTE]

This was something I broke when trying to address a crash with ABC2 files. I'll have to look into another fix for that problem.

rogue 2010-11-04 21:39

PFGW 3.4.4 Released
 
You can download the latest release for Windows, MacIntel, and Linux from here: [url]http://sourceforge.net/projects/openpfgw/[/url]

This fixes a factoring problem on Win64 and the ABC resume problem. I believe that there is still an ABC2 crash, but I can't recall how to reproduce it. I had to revert the change that addressed it in order to correct the ABC resume problem.

Batalov 2010-11-26 08:38

Konyagin-Pomerance extension
 
In PFGW, the N-1 test implements the eponymous 1975 Brillhart-Lehmer-Selfridge algorithm, but would it be hard to extend it with the third-magnitude stage, the Konyagin-Pomerance extension (as in pages 176-178 of Crandall/Pomerance PN-ACP, Theorem 4.1.6)? Part (1) seems no different from the square test of the second-magnitude stage, and the same code would be called six times with minor variations, but part (2) needs a bit of implementation. There's a GP prototype [URL="http://tech.groups.yahoo.com/group/primeform/files/KP/KonPom.gp"]available[/URL]; porting it would need polroots() (for a cubic polynomial) and contfrac() to be rewritten.

Has this ever been requested before? Could I possibly help? (With the disclaimer that familiarizing myself with the code could take much more time than "just doing it" would for an experienced developer, i.e. Mark :rolleyes:)
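For context, the first stage of an N-1 proof (the Pocklington-style test that applies when the factored part F of N-1 exceeds sqrt(N)) can be sketched in Python as below. This is a toy illustration under my own assumptions, not PFGW's implementation and not the Konyagin-Pomerance extension itself; the function name and the `max_base` limit are made up for the example.

```python
from math import gcd


def n_minus_1_proof(n, primes, max_base=50):
    """Attempt an N-1 proof of n from distinct primes dividing n - 1.

    Returns True if n is proven prime, False if n is proven composite,
    and None if the test is inconclusive (F too small or no witness found).
    """
    f, rest = 1, n - 1
    for q in primes:
        if rest % q != 0:
            raise ValueError(f"{q} does not divide n - 1")
        while rest % q == 0:      # collect the full power of q into F
            rest //= q
            f *= q
    if f * f <= n:                # this stage needs F > sqrt(n)
        return None
    for q in primes:
        for a in range(2, min(max_base, n)):
            if pow(a, n - 1, n) != 1:
                return False      # Fermat test fails: n is composite
            g = gcd(pow(a, (n - 1) // q, n) - 1, n)
            if g == 1:
                break             # a is a witness for q
            if g != n:
                return False      # g is a nontrivial factor of n
        else:
            return None           # no witness for q below max_base
    return True


print(n_minus_1_proof(97, [2, 3]))     # True: 96 = 2^5 * 3 is fully factored
print(n_minus_1_proof(91, [2, 3, 5]))  # False: 91 = 7 * 13
print(n_minus_1_proof(97, [3]))        # None: F = 3 is below sqrt(97)
```

The later stages relax the F > sqrt(N) requirement to smaller factored parts, which is where the extra work in part (2) comes in.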

