#23 |
A Sunny Moo
Aug 2007
USA (GMT-5)
1869₁₆ Posts |
I've also tried comparing iteration timings with Prime95 v26.2 vs. 25.11 (both Windows 32-bit) to check whether the problem is in gwnum or just in PFGW.
In both cases, I used the Advanced > Time option and ran 1000 iterations of M38000000.
25.11: ~60 ms/iter.
26.2: ~50 ms/iter.
So there's a significant speedup going to version 26.2. Of course, this is a much bigger FFT than the one used on the base 5 numbers, so I also tried 1000 iterations of M1100000.
25.11: ~1.2 ms/iter.
26.2: ~1.5 ms/iter.
It seems that version 26.2 is actually slower on this FFT. Note that Prime95 v26.2 used the Pentium 4 type-3 56K FFT for this number, whereas the base 5 numbers tested earlier were done with a Core2 type-3 128K FFT. However, the two cases do have one thing in common: the v26 gwnum program tested slower on my CPU at these low FFT sizes. |
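To put the rough figures above on a common scale, here is a minimal Python sketch that converts each pair of ms/iter readings into a percent change in time per iteration. The numbers are the approximate values quoted in this post; the function name `percent_change` is just an illustrative helper, not anything from Prime95 or gwnum.

```python
# Approximate timings quoted in the post (ms per iteration).
timings = {
    "M38000000": {"25.11": 60.0, "26.2": 50.0},
    "M1100000": {"25.11": 1.2, "26.2": 1.5},
}


def percent_change(old_ms, new_ms):
    """Percent change in time per iteration; negative means the new version is faster."""
    return (new_ms - old_ms) / old_ms * 100.0


for exponent, t in timings.items():
    change = percent_change(t["25.11"], t["26.2"])
    print(f"{exponent}: {change:+.1f}% time per iteration going from 25.11 to 26.2")
```

Since the inputs are eyeballed averages, the resulting percentages are only as good as those estimates.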
#24 | |
P90 years forever!
Aug 2002
Yeehaw, FL
2²·13·157 Posts |
![]() Quote:
It is baffling to me why these smaller FFTs are slower on your Core 2 but not for anyone else. Maybe CPU-Z or one of the other programs that do a more thorough dump of CPU characteristics might shed some light. |
|
#25 | |
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts |
![]() Quote:
25.11: ~3.55 ms/iter.
26.2: ~3.45 ms/iter.
What do you know: I get a speed boost with 26.2 after all. It would seem, then, that this is an issue in PFGW and not in gwnum. |
|
#26 |
"Mark"
Apr 2003
Between here and the
2×3⁴×43 Posts |
Not necessarily. I've already shown the timings on Windows for the same build. Such a vast proportion of the time is spent in gwnum that it is unlikely that PFGW could cause such a significant slowdown. There is clearly something curious going on here, though.
Last fiddled with by rogue on 2010-09-30 at 02:42 |
#27 | |
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts |
![]() Quote:
3.8.1: ~3.45 ms/iter.
3.8.2: ~3.40 ms/iter.
Just like Prime95, LLR gets a speed increase with 3.8.2, as expected.
What I really should do, though, is run the test from start to finish on each version of both Prime95 and LLR. We've already seen that in such a test PFGW 3.3.6 is inexplicably faster than 3.4.0, but it would be interesting to see whether the same holds true for Prime95 and LLR. Sure, the ms/iter figures show the newer version to be faster in both cases, but the figures fluctuated rather wildly each time, and I had to come up with a "gut estimate average" to post here. The potential for experimental error is, needless to say, rather large.
George, quick question: is there a way to make Prime95 print the exact wall-clock runtime at the end of a test, like PFGW and LLR do? As it is now, there's not really an easy way to measure this directly with Prime95. |
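One way to replace a "gut estimate average" of fluctuating readings is to write down several of them and report the median along with the range. The sketch below does that in Python; the sample values are hypothetical placeholders, not measurements from this thread, and `summarize` is an illustrative helper name.

```python
import statistics


def summarize(samples_ms):
    """Return (median, min, max) for a list of per-iteration timings in ms."""
    return statistics.median(samples_ms), min(samples_ms), max(samples_ms)


# Hypothetical readings for illustration only, NOT measured data from the thread.
samples = [3.41, 3.52, 3.38, 3.47, 3.95, 3.44]

med, lo, hi = summarize(samples)
print(f"median {med:.3f} ms/iter, range {lo:.2f} to {hi:.2f} ms/iter")
```

The median is less sensitive than the mean to a single slow reading (such as the 3.95 above), which is the kind of outlier a busy system tends to produce.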
|
#28 | |
"Mark"
Apr 2003
Between here and the
15466₈ Posts |
![]() Quote:
|
|
#29 |
P90 years forever!
Aug 2002
Yeehaw, FL
2²·13·157 Posts |
The date/time is displayed at the start of every line output to the screen.
|
#30 | |
A Sunny Moo
Aug 2007
USA (GMT-5)
6249₁₀ Posts |
For base 2, 3.4.0 is faster:
Code:
PFGW Version 3.3.6.20100908.Win_Stable [GWNUM 25.14]
2071*2^270307-1 is composite: RES64: [816B1DBFBFC67D09] (123.9523s+0.0004s)
PFGW Version 3.4.0.32BIT.20100925.Win_Dev [GWNUM 26.2]
2071*2^270307-1 is composite: RES64: [816B1DBFBFC67D09] (108.1168s+0.0003s)
Code:
PFGW Version 3.3.6.20100908.Win_Stable [GWNUM 25.14]
170979002*3^50000+1 is composite: RES64: [CBA9FAA11257431A] (15.1078s+0.0014s)
PFGW Version 3.4.0.32BIT.20100925.Win_Dev [GWNUM 26.2]
170979002*3^50000+1 is composite: RES64: [CBA9FAA11257431A] (12.1903s+0.0017s)
Code:
PFGW Version 3.3.6.20100908.Win_Stable [GWNUM 25.14]
18656*5^65474-1 is composite: RES64: [BB2682E39AA9CB16] (42.5636s+0.0034s)
PFGW Version 3.4.0.32BIT.20100925.Win_Dev [GWNUM 26.2]
18656*5^65474-1 is composite: RES64: [BB2682E39AA9CB16] (37.9236s+0.0038s)
Mark, did you by chance check which FFT 3.4.0 chose for the two (larger) base 5 tests on your CPU? Mine used "Core2 type-3 FFT length 128K"; maybe yours chose a different CPU architecture? (Grasping at straws here...)
Quote:
I suppose what I could do is stick a minuscule test (n=100 or so) in the worktodo.txt file right before the base 5 test. That way, it prints out the time at the tiny test's completion (i.e., at the start of the base 5 test) and again at the end of the base 5 test. I'll try that later today. |
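For comparing PFGW result lines like the ones in the Code blocks above without reading them by eye, here is a small Python sketch that parses a result line, checks that both versions report the same RES64 residue, and prints the ratio of the run times. The regular expression is inferred only from the lines quoted in this post, and adding the two reported times together is an assumption about what they mean; neither comes from PFGW documentation.

```python
import re

# Pattern inferred from the result lines quoted in this post (an assumption,
# not an official description of PFGW's output format).
RESULT = re.compile(
    r"^(?P<expr>\S+) is (?P<status>.+?): RES64: \[(?P<res>[0-9A-F]{16})\] "
    r"\((?P<t1>[\d.]+)s\+(?P<t2>[\d.]+)s\)$"
)


def parse_result(line):
    """Return (expression, residue, total_seconds) for one PFGW result line."""
    m = RESULT.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized result line: {line!r}")
    # Assumption: the two reported times can simply be summed.
    return m["expr"], m["res"], float(m["t1"]) + float(m["t2"])


old = parse_result("2071*2^270307-1 is composite: RES64: [816B1DBFBFC67D09] (123.9523s+0.0004s)")
new = parse_result("2071*2^270307-1 is composite: RES64: [816B1DBFBFC67D09] (108.1168s+0.0003s)")

# Both versions should be testing the same number and agree on the residue.
assert old[:2] == new[:2], "expression or residue mismatch between versions"
print(f"{old[0]}: {old[2]:.4f}s -> {new[2]:.4f}s, ratio {old[2] / new[2]:.3f}")
```

A ratio above 1 means the second (newer) run finished faster; the residue check guards against accidentally comparing two different candidates.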
|
#31 |
"Mark"
Apr 2003
Between here and the
1B36₁₆ Posts |
This is what 3.4.0 chose on Win64:
Special modular reduction using zero-padded Core2 type-3 FFT length 128K, Pass1=128, Pass2=1K on 289184*5^477336-1 |
#32 | |
A Sunny Moo
Aug 2007
USA (GMT-5)
3·2,083 Posts |
![]() Quote:
Special modular reduction using zero-padded Core2 type-3 FFT length 128K, Pass1=128, Pass2=1K on 289184*5^477336-1
Does the 32-bit version by chance give you something different? |
|
#33 |
"Mark"
Apr 2003
Between here and the
2·3⁴·43 Posts |