mersenneforum.org PFGW 4.0.4 (with gwnum v30.10) Released

2010-09-29, 21:06   #23
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

1869₁₆ Posts

I've also tried comparing iteration timings with Prime95 v26.2 vs. 25.11 (both Windows 32-bit) to determine whether the problem is in gwnum or just in PFGW. In both cases, I used the Advanced>Time option and ran 1000 iterations of M38000000:

25.11: ~60 ms/iter.
26.2: ~50 ms/iter.

So there's a significant speedup going to version 26.2. Of course, this is a much bigger FFT than the one used for the base 5 numbers, so I also tried 1000 iterations of M1100000:

25.11: ~1.2 ms/iter.
26.2: ~1.5 ms/iter.

It seems that version 26.2 is actually slower on this FFT. Note that Prime95 v26.2 used the Pentium 4 type-3 56K FFT for this number, whereas the base 5 numbers tested earlier were done with a Core2 type-3 128K FFT. However, there does seem to be a common thread: in both cases, the v26 gwnum code tested slower on my CPU at these small FFT sizes.
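To put those per-iteration figures in perspective, here is a rough sketch that turns them into a percent change and an estimated full-test time. It assumes a Lucas-Lehmer test of M(p) takes p - 2 iterations and that the eyeballed ~ms/iter values hold for the whole run; `percent_change` and `ll_test_hours` are illustrative helper names, not part of Prime95.

```python
def percent_change(old_ms: float, new_ms: float) -> float:
    """Percent change in per-iteration time; negative means the new build is faster."""
    return (new_ms - old_ms) / old_ms * 100.0


def ll_test_hours(p: int, ms_per_iter: float) -> float:
    """Rough wall-clock estimate for a Lucas-Lehmer test of M(p) = 2^p - 1.

    The test performs p - 2 iterations (one modular squaring each); this assumes
    every iteration costs the same as the short Advanced>Time sample.
    """
    return (p - 2) * ms_per_iter / 1000.0 / 3600.0


# (exponent, v25.11 ms/iter, v26.2 ms/iter), as eyeballed in the post above
timings = [(38_000_000, 60.0, 50.0), (1_100_000, 1.2, 1.5)]
for p, old, new in timings:
    print(f"M{p}: {percent_change(old, new):+.1f}% per iteration; "
          f"full test ~{ll_test_hours(p, old):.2f} h (25.11) vs "
          f"~{ll_test_hours(p, new):.2f} h (26.2)")
```

Expressing both cases as a signed percentage makes the contrast explicit: a double-digit gain on the large FFT and a double-digit loss on the small one.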
2010-09-29, 21:47   #24
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2²·13·157 Posts

Quote:
 Originally Posted by mdettweiler I've also tried comparing iteration timings with Prime95 v26.2 vs. 25.11 (both Windows 32-bit) to verify whether the problem is in gwnum, or just PFGW.
You can add "PRP=289184,5,477336,-1" to worktodo.txt to time the exact numbers in question.

It is baffling to me why these smaller FFTs are slower on your Core 2 but not on anyone else's. Maybe CPU-Z, or one of the other programs that gives a more thorough dump of CPU characteristics, might shed some light.
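For reference, the worktodo entry above encodes the number k*b^n+c as four comma-separated fields. A minimal sketch of reading such a line, assuming only the plain four-field `PRP=k,b,n,c` form used here. Real worktodo lines can also carry an assignment ID and optional trailing fields, which this hypothetical `parse_prp_line` helper does not handle.

```python
from typing import NamedTuple


class PrpEntry(NamedTuple):
    """One 'PRP=k,b,n,c' work item, i.e. the number k*b^n + c."""
    k: int
    b: int
    n: int
    c: int

    def expression(self) -> str:
        sign = "+" if self.c >= 0 else "-"
        return f"{self.k}*{self.b}^{self.n}{sign}{abs(self.c)}"

    def value(self) -> int:
        return self.k * self.b ** self.n + self.c


def parse_prp_line(line: str) -> PrpEntry:
    """Parse the four-field form only (no assignment ID, no optional fields)."""
    key, sep, rest = line.strip().partition("=")
    if key != "PRP" or not sep:
        raise ValueError(f"not a PRP line: {line!r}")
    fields = [int(field) for field in rest.split(",")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields (k,b,n,c), got {len(fields)}")
    return PrpEntry(*fields)


entry = parse_prp_line("PRP=289184,5,477336,-1")
print(entry.expression())  # 289184*5^477336-1
```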

2010-09-30, 02:26   #25
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
 Originally Posted by Prime95 You can add "PRP=289184,5,477336,-1" to worktodo.txt to time the exact numbers in question. It is baffling to me why these smaller FFTs are slower on your Core 2 but not for anyone else. Maybe CPU-Z or one of the other programs that do a more thorough dump of CPU characteristics might shed some light.
Okay, here are the iteration timings I got for that with Prime95 v25.11 and 26.2:

25.11: ~3.55 ms/iter.
26.2: ~3.45 ms/iter.

What do you know: I get a speed boost with 26.2 after all. It would seem, then, that this is an issue in PFGW and not in gwnum.
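A 0.1 ms/iter gap sounds small, so it helps to scale it to a whole test. This is a rough sketch that assumes a Fermat-style PRP test of N = k*b^n+c needs about one modular squaring per bit of N, so roughly n*log2(b) + log2(k) iterations. It also assumes the quoted ms/iter stays constant for the whole run. The helper names are hypothetical.

```python
import math


def approx_iterations(k: int, b: int, n: int) -> int:
    """About log2(k * b^n) squarings, assuming one squaring per bit of N."""
    return math.ceil(n * math.log2(b) + math.log2(k))


def runtime_minutes(iterations: int, ms_per_iter: float) -> float:
    return iterations * ms_per_iter / 1000.0 / 60.0


iters = approx_iterations(289184, 5, 477336)  # roughly 1.1 million
for version, ms in [("25.11", 3.55), ("26.2", 3.45)]:
    print(f"Prime95 {version}: ~{runtime_minutes(iters, ms):.1f} min for ~{iters:,} iterations")
```

Under these assumptions, the 0.1 ms/iter difference adds up to a bit under two minutes over the full test.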

2010-09-30, 02:41   #26
rogue

"Mark"
Apr 2003
Between here and the

2×3⁴×43 Posts

Quote:
 Originally Posted by mdettweiler Okay, here's the iteration timings I got for that with Prime95 v25.11 and 26.2: 25.11: ~3.55 ms/iter. 26.2: ~3.45 ms/iter. Would you know--I get a speed boost with 26.2 after all. It would seem, then, that this is an issue in PFGW and not in gwnum.
Not necessarily. I've already shown the timings on Windows for the same build. Such a large proportion of the time is spent in gwnum that it is unlikely PFGW could cause such a significant slowdown. There is clearly something curious going on here, though.

Last fiddled with by rogue on 2010-09-30 at 02:42

2010-09-30, 06:10   #27
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
 Originally Posted by rogue Not necessarily. I've already shown the timings on Windows for the same build. Such a vast proportion of time is spent in gwnum that it is unlikely that PFGW could cause such a significant slow down. There is clearly something curious going on here though.
Here's what I get comparing iteration times on 289184*5^477336-1 for LLR 3.8.1 and 3.8.2:

3.8.1: ~3.45 ms/iter.
3.8.2: ~3.40 ms/iter.

Just like Prime95, LLR gets a speed increase with 3.8.2 as expected.

What I really should do, though, is run the test from start to finish on each version of both Prime95 and LLR. We've already seen that in such a test PFGW 3.3.6 is inexplicably faster than 3.4.0, but it would be interesting to see whether the same holds true for Prime95 and LLR. Sure, the ms/iter figures show the newer version to be faster in both cases, but each time the figures fluctuated rather wildly and I had to come up with a "gut estimate average" to post here. The potential for experimental error is, needless to say, rather large.
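Instead of a "gut estimate average", you could write down the ms/iter readings for each version and summarize them. The sketch below uses Python's standard `statistics` module. `summarize` and `likely_different` are hypothetical helpers. The "two standard errors" rule is only a crude check that assumes the readings are independent, not a rigorous significance test.

```python
import statistics


def summarize(samples_ms: list[float]) -> dict[str, float]:
    """Summarize repeated ms/iter readings instead of eyeballing an average."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two readings")
    sd = statistics.stdev(samples_ms)  # sample standard deviation
    return {
        "mean": statistics.mean(samples_ms),
        "median": statistics.median(samples_ms),
        "stdev": sd,
        "stderr": sd / len(samples_ms) ** 0.5,  # uncertainty of the mean
    }


def likely_different(a: list[float], b: list[float], z: float = 2.0) -> bool:
    """Crude check: are the two means more than z combined standard errors apart?"""
    sa, sb = summarize(a), summarize(b)
    combined = (sa["stderr"] ** 2 + sb["stderr"] ** 2) ** 0.5
    return abs(sa["mean"] - sb["mean"]) > z * combined
```

For example, `likely_different(readings_381, readings_382)` returns `True` only if the LLR 3.8.1 and 3.8.2 means are more than about twice their combined uncertainty apart. Here `readings_381` and `readings_382` stand for lists of readings you collect yourself.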

George, quick question: is there a way to make Prime95 print the exact wall-clock runtime at the end of a test, like PFGW and LLR do? As it is now, there's not really an easy way to directly measure this with Prime95.

2010-09-30, 12:39   #28
rogue

"Mark"
Apr 2003
Between here and the

15466₈ Posts

Quote:
 Originally Posted by mdettweiler Here's what I get comparing iteration times on 289184*5^477336-1 for LLR 3.8.1 and 3.8.2: 3.8.1: ~3.45 ms/iter. 3.8.2: ~3.40 ms/iter. Just like Prime95, LLR gets a speed increase with 3.8.2 as expected. What I really should do, though, is run the test from start to finish on each version of both Prime95 and LLR. We've already seen that in such a test PFGW 3.3.6 is inexplicably faster than 3.4.0, but it would be interesting to see if the same holds true for Prime95 and LLR. Sure, the ms/iter. figures show the newer version to be faster in both such cases, but in each case the figures fluctuated rather wildly and I had to come up with a "gut estimate average" to post here. The potential for experimental error is, needless to say, rather large. George, quick question: is there a way to make Prime95 print the exact wall-clock runtime at the end of a test, like PFGW and LLR do? As it is now, there's not really an easy way to directly measure this with Prime95.
I'm curious. Is 3.4.0 slower for other n and other bases?

2010-09-30, 14:28   #29
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2²·13·157 Posts

Quote:
 Originally Posted by mdettweiler George, quick question: is there a way to make Prime95 print the exact wall-clock runtime at the end of a test, like PFGW and LLR do? As it is now, there's not really an easy way to directly measure this with Prime95.
The date/time is displayed at the start of every line output to the screen.

2010-09-30, 14:58   #30
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

6249₁₀ Posts

Quote:
 Originally Posted by rogue I'm curious. Is 3.4.0 slower for other n and other bases?
For base 2, 3.4.0 is faster:
Code:
PFGW Version 3.3.6.20100908.Win_Stable [GWNUM 25.14]
2071*2^270307-1 is composite: RES64: [816B1DBFBFC67D09] (123.9523s+0.0004s)

PFGW Version 3.4.0.32BIT.20100925.Win_Dev [GWNUM 26.2]
2071*2^270307-1 is composite: RES64: [816B1DBFBFC67D09] (108.1168s+0.0003s)
Ditto for base 3:
Code:
PFGW Version 3.3.6.20100908.Win_Stable [GWNUM 25.14]
170979002*3^50000+1 is composite: RES64: [CBA9FAA11257431A] (15.1078s+0.0014s)

PFGW Version 3.4.0.32BIT.20100925.Win_Dev [GWNUM 26.2]
170979002*3^50000+1 is composite: RES64: [CBA9FAA11257431A] (12.1903s+0.0017s)
And for another n on base 5, 3.4.0 is again faster:
Code:
PFGW Version 3.3.6.20100908.Win_Stable [GWNUM 25.14]
18656*5^65474-1 is composite: RES64: [BB2682E39AA9CB16] (42.5636s+0.0034s)

PFGW Version 3.4.0.32BIT.20100925.Win_Dev [GWNUM 26.2]
18656*5^65474-1 is composite: RES64: [BB2682E39AA9CB16] (37.9236s+0.0038s)
The problem, it would seem, is localized to this particular range of n on base 5 (possibly this particular FFT size).
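Comparisons like these are easy to tabulate from the result lines themselves. This sketch assumes the exact line shape shown above for composite results; PRP or prime results print differently and are not handled. It adds the two parenthesized figures together as the run time, without assuming what each part measures. It also refuses to compare if the two builds report different residues. The regex and helper names are illustrative.

```python
import re

# Shape of the composite-result lines quoted above, e.g.
# "2071*2^270307-1 is composite: RES64: [816B1DBFBFC67D09] (123.9523s+0.0004s)"
RESULT_RE = re.compile(
    r"^(\S+) is composite: RES64: \[([0-9A-F]{16})\] \(([\d.]+)s\+([\d.]+)s\)$"
)


def parse_result(line: str) -> tuple[str, str, float]:
    """Return (number, RES64, total seconds) for one PFGW result line."""
    m = RESULT_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized PFGW line: {line!r}")
    number, res64, t1, t2 = m.groups()
    return number, res64, float(t1) + float(t2)


def percent_saved(old_line: str, new_line: str) -> float:
    """Percent of run time saved by the newer build on the same number."""
    old_num, old_res, old_t = parse_result(old_line)
    new_num, new_res, new_t = parse_result(new_line)
    if (old_num, old_res) != (new_num, new_res):
        raise ValueError("builds disagree on the number or its RES64")
    return (old_t - new_t) / old_t * 100.0


old = "2071*2^270307-1 is composite: RES64: [816B1DBFBFC67D09] (123.9523s+0.0004s)"
new = "2071*2^270307-1 is composite: RES64: [816B1DBFBFC67D09] (108.1168s+0.0003s)"
print(f"3.4.0 saves {percent_saved(old, new):.1f}% on {parse_result(old)[0]}")
```

Matching RES64 values confirm that both builds computed the same residue, so the timing comparison is like-for-like.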

Mark, did you by chance check which FFT 3.4.0 chose for the two (larger) base 5 tests on your CPU? Mine used "Core2 type-3 FFT length 128K"; maybe yours chose a different CPU architecture? (Grasping at straws here...)
Quote:
 Originally Posted by Prime95 The date/time is displayed at the start of every line output to the screen.
But that only prints at the end of each test; what I'd need is for it to print the time at the beginning as well, so I could subtract to get the runtime.

I suppose what I could do is stick a minuscule test (n=100 or so) in the worktodo.txt file right before the base 5 test. That way, it prints the time at the tiny test's completion (i.e., at the start of the base 5 test) and again at the end of the base 5 test. I'll try that later today.
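Once the placeholder test and the real test have both finished, the runtime is the difference between their two timestamps. This minimal sketch assumes each screen line starts with a bracketed timestamp in the format given by `STAMP_FORMAT`. That format is a guess for illustration, so check it against what your Prime95 build actually prints. The result is only as precise as the printed timestamps (whole seconds here).

```python
from datetime import datetime

# Assumed prefix format, e.g. "[Thu Sep 30 15:00:00 2010]"; adjust to match real output.
STAMP_FORMAT = "[%a %b %d %H:%M:%S %Y]"


def stamp_of(line: str, fmt: str = STAMP_FORMAT) -> datetime:
    """Parse the bracketed timestamp at the very start of a screen/log line."""
    end = line.index("]") + 1
    return datetime.strptime(line[:end], fmt)


def elapsed_seconds(start_line: str, end_line: str, fmt: str = STAMP_FORMAT) -> float:
    """Wall-clock seconds between two timestamped lines (e.g. the tiny test's
    completion line and the base 5 test's completion line)."""
    return (stamp_of(end_line, fmt) - stamp_of(start_line, fmt)).total_seconds()
```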

2010-09-30, 15:15   #31
rogue

"Mark"
Apr 2003
Between here and the

1B36₁₆ Posts

This is what 3.4.0 chose on Win64:

Special modular reduction using zero-padded Core2 type-3 FFT length 128K, Pass1=128, Pass2=1K on 289184*5^477336-1
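To check whether two machines or builds picked the same FFT, you can pull the fields out of this line rather than compare it by eye. This sketch assumes the wording shown here and handles only plain and `K`-suffixed sizes; the field names are our own. For this particular line, Pass1 × Pass2 = 128 × 1K equals the 128K FFT length. The code prints that product as a consistency check rather than assuming it always holds.

```python
import re

FFT_RE = re.compile(
    r"using (?P<padded>zero-padded )?(?P<arch>.+?) type-(?P<type>\d+) "
    r"FFT length (?P<length>\d+K?), Pass1=(?P<pass1>\d+K?), Pass2=(?P<pass2>\d+K?)"
)


def size(token: str) -> int:
    """'128' -> 128, '1K' -> 1024, '128K' -> 131072."""
    return int(token[:-1]) * 1024 if token.endswith("K") else int(token)


def parse_fft(line: str) -> dict:
    m = FFT_RE.search(line)
    if m is None:
        raise ValueError("no FFT description found")
    return {
        "arch": m["arch"],
        "type": int(m["type"]),
        "length": size(m["length"]),
        "pass1": size(m["pass1"]),
        "pass2": size(m["pass2"]),
        "zero_padded": m["padded"] is not None,
    }


line = ("Special modular reduction using zero-padded Core2 type-3 FFT length 128K, "
        "Pass1=128, Pass2=1K on 289184*5^477336-1")
info = parse_fft(line)
print(info)
print("pass1 * pass2 == length:", info["pass1"] * info["pass2"] == info["length"])
```

Running it on the identical line quoted in the next reply would produce the same dictionary, which is the conclusion the thread reaches by eye.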
2010-09-30, 15:27   #32
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

3·2,083 Posts

Quote:
 Originally Posted by rogue This is what 3.4.0 chose on Win64: Special modular reduction using zero-padded Core2 type-3 FFT length 128K, Pass1=128, Pass2=1K on 289184*5^477336-1
That's the same as what I got:

Special modular reduction using zero-padded Core2 type-3 FFT length 128K, Pass1=128, Pass2=1K on 289184*5^477336-1

Does the 32-bit version by chance give you something different?

2010-09-30, 15:45   #33
rogue

"Mark"
Apr 2003
Between here and the

2·3⁴·43 Posts

Quote:
 Originally Posted by mdettweiler That's the same as what I got: Special modular reduction using zero-padded Core2 type-3 FFT length 128K, Pass1=128, Pass2=1K on 289184*5^477336-1 Does the 32-bit version by chance give you something different?
Nope.

All times are UTC.