- - **PFGW 4.0.3 (with gwnum v28.7) Released**
(*https://www.mersenneforum.org/showthread.php?t=13969*)

I've also tried comparing iteration timings with Prime95 v26.2 vs. 25.11 (both Windows 32-bit) to verify whether the problem is in gwnum, or just PFGW.
In both cases, I used the Advanced>Time option, and ran 1000 iterations of M38000000. 25.11: ~60 ms/iter. 26.2: ~50 ms/iter. So there's a significant speedup going to version 26.2. Of course, this is a much bigger FFT than that used on the base 5 numbers; so I also tried 1000 iterations of M1100000. 25.11: ~1.2 ms/iter. 26.2: ~1.5 ms/iter. It seems that version 26.2 is actually slower on this FFT. Note that Prime95 v26.2 used the Pentium 4 type-3 56K FFT for this number, whereas the base 5 numbers tested earlier were done with a Core2 type-3 128K FFT. However, there does seem to be a commonality in that in both cases, the v26 gwnum program tested slower on my CPU at these low FFTs.

[QUOTE=mdettweiler;231968]I've also tried comparing iteration timings with Prime95 v26.2 vs. 25.11 (both Windows 32-bit) to verify whether the problem is in gwnum, or just PFGW.[/QUOTE]
You can add "PRP=289184,5,477336,-1" to worktodo.txt to time the exact numbers in question. It is baffling to me why these smaller FFTs are slower on your Core 2 but not for anyone else. Maybe CPU-Z or one of the other programs that do a more thorough dump of CPU characteristics might shed some light.

[QUOTE=Prime95;231975]You can add "PRP=289184,5,477336,-1" to worktodo.txt to time the exact numbers in question.
It is baffling to me why these smaller FFTs are slower on your Core 2 but not for anyone else. Maybe CPU-Z or one of the other programs that do a more thorough dump of CPU characteristics might shed some light.[/QUOTE]
Okay, here's the iteration timings I got for that with Prime95 v25.11 and 26.2:
25.11: ~3.55 ms/iter. 26.2: ~3.45 ms/iter. Would you know--I get a speed boost with 26.2 after all. It would seem, then, that this is an issue in PFGW and not in gwnum.
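For a sense of what those ms/iter figures mean over a whole test, here is a rough sketch. It assumes a PRP test costs about one iteration per bit of the number being tested, and it ignores the small additive term c; the function names are made up for illustration and are not part of Prime95, LLR, or PFGW.

```python
import math

def approx_bits(k, b, n):
    """Approximate bit length of k*b^n+c (the small term c is ignored)."""
    return math.log2(k) + n * math.log2(b)

def estimated_seconds(k, b, n, ms_per_iter):
    """Rough runtime, ASSUMING about one iteration per bit of the number."""
    return approx_bits(k, b, n) * ms_per_iter / 1000.0

if __name__ == "__main__":
    # ms/iter figures are the ones reported above for 289184*5^477336-1
    for version, ms in (("25.11", 3.55), ("26.2", 3.45)):
        secs = estimated_seconds(289184, 5, 477336, ms)
        print(f"v{version}: about {secs / 60:.1f} minutes")
```

On this estimate a 0.1 ms/iter difference adds up to only a couple of minutes over the full test, so noisy per-iteration readings can easily hide it.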

[QUOTE=mdettweiler;232001]Okay, here's the iteration timings I got for that with Prime95 v25.11 and 26.2:
25.11: ~3.55 ms/iter. 26.2: ~3.45 ms/iter. Would you know--I get a speed boost with 26.2 after all. It would seem, then, that this is an issue in PFGW and not in gwnum.[/QUOTE]
Not necessarily. I've already shown the timings on Windows for the same build. Such a vast proportion of time is spent in gwnum that it is unlikely that PFGW could cause such a significant slow down. There is clearly something curious going on here though.

[QUOTE=rogue;232002]Not necessarily. I've already shown the timings on Windows for the same build. Such a vast proportion of time is spent in gwnum that it is unlikely that PFGW could cause such a significant slow down. There is clearly something curious going on here though.[/QUOTE]
Here's what I get comparing iteration times on 289184*5^477336-1 for LLR 3.8.1 and 3.8.2:
3.8.1: ~3.45 ms/iter. 3.8.2: ~3.40 ms/iter. Just like Prime95, LLR gets a speed increase with 3.8.2 as expected.

What I really should do, though, is run the test from start to finish on each version of both Prime95 and LLR. We've already seen that in such a test PFGW 3.3.6 is inexplicably faster than 3.4.0, but it would be interesting to see if the same holds true for Prime95 and LLR. Sure, the ms/iter. figures show the newer version to be faster in both such cases, but in each case the figures fluctuated rather wildly and I had to come up with a "gut estimate average" to post here. The potential for experimental error is, needless to say, rather large.

George, quick question: is there a way to make Prime95 print the exact wall-clock runtime at the end of a test, like PFGW and LLR do? As it is now, there's not really an easy way to directly measure this with Prime95.
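One way to replace a "gut estimate average" is to note down a handful of the per-iteration readings and summarize them. The sketch below uses only Python's standard library; the sample readings are invented placeholders, not measurements from this thread.

```python
import statistics

def summarize(readings_ms):
    """Summarize fluctuating ms/iter readings; the median resists outliers."""
    return {
        "median": statistics.median(readings_ms),
        "mean": statistics.fmean(readings_ms),
        "stdev": statistics.stdev(readings_ms),
    }

if __name__ == "__main__":
    # HYPOTHETICAL readings, for illustration only
    old_version = [3.41, 3.52, 3.44, 3.61, 3.45, 3.47]
    new_version = [3.38, 3.49, 3.36, 3.55, 3.40, 3.42]
    for label, readings in (("old", old_version), ("new", new_version)):
        s = summarize(readings)
        print(f"{label}: median {s['median']:.2f} ms, "
              f"mean {s['mean']:.2f} ms, stdev {s['stdev']:.2f} ms")
```

If the gap between the two medians is smaller than the spread within each set, the per-iteration figures alone probably cannot settle which version is faster, and a full start-to-finish run is the better comparison.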

[QUOTE=mdettweiler;232017]Here's what I get comparing iteration times on 289184*5^477336-1 for LLR 3.8.1 and 3.8.2:
3.8.1: ~3.45 ms/iter. 3.8.2: ~3.40 ms/iter. Just like Prime95, LLR gets a speed increase with 3.8.2 as expected. What I really should do, though, is run the test from start to finish on each version of both Prime95 and LLR. We've already seen that in such a test PFGW 3.3.6 is inexplicably faster than 3.4.0, but it would be interesting to see if the same holds true for Prime95 and LLR. Sure, the ms/iter. figures show the newer version to be faster in both such cases, but in each case the figures fluctuated rather wildly and I had to come up with a "gut estimate average" to post here. The potential for experimental error is, needless to say, rather large. George, quick question: is there a way to make Prime95 print the exact wall-clock runtime at the end of a test, like PFGW and LLR do? As it is now, there's not really an easy way to directly measure this with Prime95.[/QUOTE]
I'm curious. Is 3.4.0 slower for other n and other bases?

[QUOTE=mdettweiler;232017]
George, quick question: is there a way to make Prime95 print the exact wall-clock runtime at the end of a test, like PFGW and LLR do? As it is now, there's not really an easy way to directly measure this with Prime95.[/QUOTE]
The date/time is displayed at the start of every line output to the screen.
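Given two such timestamps, the runtime is just their difference. A minimal sketch follows; the timestamp layout used here is a made-up placeholder, not necessarily what Prime95 actually prints, so the format string would need adjusting to the real output.

```python
from datetime import datetime

# ASSUMED layout such as "Sep 26 10:15:30"; change to match the real output.
STAMP_FORMAT = "%b %d %H:%M:%S"

def elapsed_seconds(start_stamp, end_stamp, fmt=STAMP_FORMAT):
    """Seconds between two timestamps written in the same format."""
    start = datetime.strptime(start_stamp, fmt)
    end = datetime.strptime(end_stamp, fmt)
    return (end - start).total_seconds()

if __name__ == "__main__":
    # Hypothetical start and end stamps for one test
    print(elapsed_seconds("Sep 26 10:15:30", "Sep 26 11:19:14"), "seconds")
```

Because the assumed format carries no year, this only works for two stamps within the same calendar year, and the resolution is no better than what the timestamps themselves show.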

[QUOTE=rogue;232047]I'm curious. Is 3.4.0 slower for other n and other bases?[/QUOTE]
For base 2, 3.4.0 is faster:
[code]
PFGW Version 3.3.6.20100908.Win_Stable [GWNUM 25.14]
2071*2^270307-1 is composite: RES64: [816B1DBFBFC67D09] (123.9523s+0.0004s)
PFGW Version 3.4.0.32BIT.20100925.Win_Dev [GWNUM 26.2]
2071*2^270307-1 is composite: RES64: [816B1DBFBFC67D09] (108.1168s+0.0003s)
[/code]
Ditto for base 3:
[code]
PFGW Version 3.3.6.20100908.Win_Stable [GWNUM 25.14]
170979002*3^50000+1 is composite: RES64: [CBA9FAA11257431A] (15.1078s+0.0014s)
PFGW Version 3.4.0.32BIT.20100925.Win_Dev [GWNUM 26.2]
170979002*3^50000+1 is composite: RES64: [CBA9FAA11257431A] (12.1903s+0.0017s)
[/code]
And for another n on base 5, 3.4.0 is again faster:
[code]
PFGW Version 3.3.6.20100908.Win_Stable [GWNUM 25.14]
18656*5^65474-1 is composite: RES64: [BB2682E39AA9CB16] (42.5636s+0.0034s)
PFGW Version 3.4.0.32BIT.20100925.Win_Dev [GWNUM 26.2]
18656*5^65474-1 is composite: RES64: [BB2682E39AA9CB16] (37.9236s+0.0038s)
[/code]
The problem, it would seem, is localized to this particular range of n on base 5 (possibly this particular FFT size). Mark, did you by chance check which FFT 3.4.0 chose for the two (larger) base 5 tests on your CPU? Mine used "Core2 type-3 FFT length 128K"; maybe yours chose a different CPU architecture? (Grasping at straws here...)

[QUOTE=Prime95;232059]The date/time is displayed at the start of every line output to the screen.[/QUOTE]
But that only prints out at the [i]end[/i] of each test; what I'd need for it to do is to print the time at the beginning as well so I could subtract and get the runtime. I suppose what I could do is stick a minuscule test (n=100 or so) in the worktodo.txt file right before the base 5 test. That way, it prints out the time at the tiny test's completion (i.e., at the start of the base 5 test) and again at the end of the base 5 test. I'll try that later today.
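To put numbers on the comparisons above, the timings can be pulled out of the PFGW result lines and turned into a percentage. This is only a sketch: it assumes the two figures in parentheses are timing components that can simply be added together, and the helper names are invented for illustration.

```python
import re

# Matches the trailing "(123.9523s+0.0004s)" part of a PFGW result line.
TIMING = re.compile(r"\((\d+(?:\.\d+)?)s\+(\d+(?:\.\d+)?)s\)")

def total_seconds(result_line):
    """Sum the two timing figures (ASSUMED to be additive components)."""
    match = TIMING.search(result_line)
    if match is None:
        raise ValueError("no timing found in: " + result_line)
    return float(match.group(1)) + float(match.group(2))

def percent_less_time(old_line, new_line):
    """Positive when the new run took less time than the old one."""
    old = total_seconds(old_line)
    new = total_seconds(new_line)
    return 100.0 * (old - new) / old

if __name__ == "__main__":
    old = "2071*2^270307-1 is composite: RES64: [816B1DBFBFC67D09] (123.9523s+0.0004s)"
    new = "2071*2^270307-1 is composite: RES64: [816B1DBFBFC67D09] (108.1168s+0.0003s)"
    print(f"3.4.0 took {percent_less_time(old, new):.1f}% less time than 3.3.6")
```

A negative result would flag a slowdown, which is what the full-length run on 289184*5^477336-1 should show if 3.3.6 really is faster there.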

This is what 3.4.0 chose on Win64:
Special modular reduction using zero-padded Core2 type-3 FFT length 128K, Pass1=128, Pass2=1K on 289184*5^477336-1

[QUOTE=rogue;232065]This is what 3.4.0 chose on Win64:
Special modular reduction using zero-padded Core2 type-3 FFT length 128K, Pass1=128, Pass2=1K on 289184*5^477336-1[/QUOTE]
That's the same as what I got:
Special modular reduction using zero-padded Core2 type-3 FFT length 128K, Pass1=128, Pass2=1K on 289184*5^477336-1
Does the 32-bit version by chance give you something different?

[QUOTE=mdettweiler;232068]That's the same as what I got:
Special modular reduction using zero-padded Core2 type-3 FFT length 128K, Pass1=128, Pass2=1K on 289184*5^477336-1
Does the 32-bit version by chance give you something different?[/QUOTE]
Nope.
