2010-09-29, 21:06  #23 
A Sunny Moo
Aug 2007
USA (GMT-5)
14151_{8} Posts 
I've also tried comparing iteration timings with Prime95 v26.2 vs. 25.11 (both Windows 32-bit) to verify whether the problem is in gwnum or just in PFGW.
In both cases, I used the Advanced > Time option and ran 1000 iterations of M38000000. 25.11: ~60 ms/iter. 26.2: ~50 ms/iter. So there's a significant speedup going to version 26.2. Of course, this is a much bigger FFT than the one used on the base 5 numbers, so I also tried 1000 iterations of M1100000. 25.11: ~1.2 ms/iter. 26.2: ~1.5 ms/iter. It seems that version 26.2 is actually slower at this FFT size. Note that Prime95 v26.2 used the Pentium 4 type-3 56K FFT for this number, whereas the base 5 numbers tested earlier were done with a Core2 type-3 128K FFT. However, the two cases do have something in common: in both, the v26 gwnum program tested slower on my CPU at these small FFT sizes. 
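To put the approximate figures above on a common scale, here is a minimal Python sketch that turns each pair of ms/iter readings into a percentage change. The function name `percent_change` is just illustrative, and the inputs are the rounded estimates quoted in this post, so the results are only as good as those estimates.

```python
def percent_change(old_ms: float, new_ms: float) -> float:
    """Change in per-iteration time from the old to the new version, in percent.

    A negative result means the new version is faster.
    """
    return (new_ms - old_ms) / old_ms * 100.0


# Approximate ms/iter figures quoted above: (v25.11, v26.2)
timings = {
    "M38000000": (60.0, 50.0),
    "M1100000": (1.2, 1.5),
}

for exponent, (old_ms, new_ms) in timings.items():
    print(f"{exponent}: {percent_change(old_ms, new_ms):+.1f}%")
```

On these numbers the large exponent comes out roughly 17% faster on 26.2 and the small one roughly 25% slower.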
2010-09-29, 21:47  #24  
P90 years forever!
Aug 2002
Yeehaw, FL
11·673 Posts 
Quote:
It is baffling to me why these smaller FFTs are slower on your Core 2 but not for anyone else. Maybe CPU-Z or one of the other programs that do a more thorough dump of CPU characteristics might shed some light. 

2010-09-30, 02:26  #25  
A Sunny Moo
Aug 2007
USA (GMT-5)
6249_{10} Posts 
Quote:
25.11: ~3.55 ms/iter. 26.2: ~3.45 ms/iter. Would you know, I get a speed boost with 26.2 after all. It would seem, then, that this is an issue in PFGW and not in gwnum. 

2010-09-30, 02:41  #26 
"Mark"
Apr 2003
Between here and the
2^{2}·3·523 Posts 
Not necessarily. I've already shown the timings on Windows for the same build. Such a vast proportion of the time is spent in gwnum that it is unlikely that PFGW could cause such a significant slowdown. There is clearly something curious going on here, though.
Last fiddled with by rogue on 2010-09-30 at 02:42 
2010-09-30, 06:10  #27  
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts 
Quote:
3.8.1: ~3.45 ms/iter. 3.8.2: ~3.40 ms/iter. Just like Prime95, LLR gets a speed increase with 3.8.2, as expected.
What I really should do, though, is run the test from start to finish on each version of both Prime95 and LLR. We've already seen that in such a test PFGW 3.3.6 is inexplicably faster than 3.4.0, but it would be interesting to see whether the same holds true for Prime95 and LLR. Sure, the ms/iter figures show the newer version to be faster in both cases, but in each case the figures fluctuated rather wildly and I had to come up with a "gut estimate average" to post here. The potential for experimental error is, needless to say, rather large.
George, quick question: is there a way to make Prime95 print the exact wall-clock runtime at the end of a test, like PFGW and LLR do? As it is now, there's not really an easy way to measure this directly with Prime95. 
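One way to replace a "gut estimate average" with something repeatable is to jot down the individual ms/iter readings and summarize them. The sketch below does that in Python using the standard `statistics` module. The readings in it are hypothetical placeholders, not measurements from any of the programs discussed here, and `summarize` is an illustrative name.

```python
import statistics


def summarize(samples_ms):
    """Summarize noisy ms/iter readings.

    The median is less sensitive to the occasional outlier than the mean,
    and the standard deviation gives a rough idea of the fluctuation.
    """
    return {
        "median": statistics.median(samples_ms),
        "mean": statistics.fmean(samples_ms),
        "stdev": statistics.stdev(samples_ms),
    }


# HYPOTHETICAL readings for illustration only; these are not measured data.
readings = [3.41, 3.52, 3.38, 3.47, 3.90, 3.44]

for name, value in summarize(readings).items():
    print(f"{name}: {value:.3f} ms/iter")
```

If the spread within one version turns out to be larger than the gap between two versions, the per-iteration figures alone probably can't settle which version is faster, and a full start-to-finish run is the better comparison.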

2010-09-30, 12:39  #28  
"Mark"
Apr 2003
Between here and the
2^{2}·3·523 Posts 
Quote:


2010-09-30, 14:28  #29 
P90 years forever!
Aug 2002
Yeehaw, FL
11×673 Posts 
The date/time is displayed at the start of every line output to the screen.
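Given a timestamp on the line where a test starts and another on the line where it finishes, the elapsed time is just the difference between the two. A minimal Python sketch follows. The timestamp layout used here (`%Y-%m-%d %H:%M:%S`) is an assumption chosen for illustration and is not Prime95's actual output format, so the `fmt` argument would need adjusting to whatever the program really prints. The example timestamps are made up as well.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> float:
    """Seconds between two timestamps given as text.

    fmt is an ASSUMED layout, not Prime95's real one; pass the format
    that matches the actual screen output.
    """
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(end, fmt)
    return (t1 - t0).total_seconds()


# Made-up example timestamps, two minutes and six seconds apart.
print(elapsed_seconds("2010-09-30 14:28:05", "2010-09-30 14:30:11"))
```

The result can be no finer than the timestamps themselves: if the program only prints hours and minutes, a short test cannot be timed to the second this way.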

2010-09-30, 14:58  #30  
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts 
For base 2, 3.4.0 is faster:
Code:
PFGW Version 3.3.6.20100908.Win_Stable [GWNUM 25.14]
2071*2^2703071 is composite: RES64: [816B1DBFBFC67D09] (123.9523s+0.0004s)
PFGW Version 3.4.0.32BIT.20100925.Win_Dev [GWNUM 26.2]
2071*2^2703071 is composite: RES64: [816B1DBFBFC67D09] (108.1168s+0.0003s)
Code:
PFGW Version 3.3.6.20100908.Win_Stable [GWNUM 25.14]
170979002*3^50000+1 is composite: RES64: [CBA9FAA11257431A] (15.1078s+0.0014s)
PFGW Version 3.4.0.32BIT.20100925.Win_Dev [GWNUM 26.2]
170979002*3^50000+1 is composite: RES64: [CBA9FAA11257431A] (12.1903s+0.0017s)
Code:
PFGW Version 3.3.6.20100908.Win_Stable [GWNUM 25.14]
18656*5^654741 is composite: RES64: [BB2682E39AA9CB16] (42.5636s+0.0034s)
PFGW Version 3.4.0.32BIT.20100925.Win_Dev [GWNUM 26.2]
18656*5^654741 is composite: RES64: [BB2682E39AA9CB16] (37.9236s+0.0038s)
Mark, did you by chance check which FFT 3.4.0 chose for the two (larger) base 5 tests on your CPU? Mine used "Core2 type-3 FFT length 128K"; maybe yours chose a different CPU architecture? (Grasping at straws here...)
Quote:
I suppose what I could do is stick a minuscule test (n=100 or so) in the worktodo.txt file right before the base 5 test. That way, it prints the time at the tiny test's completion (i.e., at the start of the base 5 test) and again at the end of the base 5 test. I'll try that later today. 
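For comparing the PFGW result lines quoted earlier in this post without eyeballing them, a small Python sketch can pull the timing out of each line and compute the ratio. It assumes the first figure inside the parentheses is the main test time; that reading of the `(Xs+Ys)` suffix is an assumption on my part, not something stated in the thread. The function name `first_timing` is illustrative.

```python
import re

# Matches a trailing "(123.9523s+0.0004s)" style timing.
TIME_RE = re.compile(r"\((\d+(?:\.\d+)?)s\+(\d+(?:\.\d+)?)s\)")


def first_timing(line: str) -> float:
    """Return the first of the two timings in a PFGW result line, in seconds.

    ASSUMPTION: the first figure is the main test time.
    """
    match = TIME_RE.search(line)
    if match is None:
        raise ValueError(f"no timing found in: {line!r}")
    return float(match.group(1))


# Result lines copied from the base 2 comparison above.
old_line = "2071*2^2703071 is composite: RES64: [816B1DBFBFC67D09] (123.9523s+0.0004s)"
new_line = "2071*2^2703071 is composite: RES64: [816B1DBFBFC67D09] (108.1168s+0.0003s)"

old_s = first_timing(old_line)
new_s = first_timing(new_line)
print(f"3.4.0 took {new_s / old_s:.1%} of the 3.3.6 time")
```

The same two calls work on the base 3 and base 5 lines, which makes it easy to see whether the gain is similar across bases.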

2010-09-30, 15:15  #31 
"Mark"
Apr 2003
Between here and the
2^{2}×3×523 Posts 
This is what 3.4.0 chose on Win64:
Special modular reduction using zero-padded Core2 type-3 FFT length 128K, Pass1=128, Pass2=1K on 289184*5^4773361 
2010-09-30, 15:27  #32  
A Sunny Moo
Aug 2007
USA (GMT-5)
1869_{16} Posts 
Quote:
Special modular reduction using zero-padded Core2 type-3 FFT length 128K, Pass1=128, Pass2=1K on 289184*5^4773361
Does the 32-bit version by chance give you something different? 

2010-09-30, 15:45  #33 
"Mark"
Apr 2003
Between here and the
2^{2}·3·523 Posts 
