2017-09-28, 15:41  #1
Nov 2010
5^{2} Posts 
Multithreaded PFGW
I have implemented a version of PFGW that supports multithreading, essentially along the same lines as LLR, i.e. by adding a gwset_num_threads() call between gwinit() and gwsetup(). The code works correctly, i.e. it gives the correct answers, but it does not always give a speedup. If anyone wants to play, the patch is here: https://pastebin.com/xw7ykJbM (note that the patch also works with the most recent gwnum, since it removes the use of the deprecated gwmap_to_fft_info() function). Multiple threads are enabled using the -T<number> command line argument.
For example, if I run a PRP test on the large factorial prime 208003!-1, going from 1 to 2 threads gives a 1.6x speedup. That is comparable to what I get using the LLR program to test 387*2^3322763+1, which is a Proth number of roughly the same digit length. Code:
../pfgw64 -q'208003!-1' -V -T2
PFGW Version 3.8.3.64BIT.20161203.Mac_Dev [GWNUM 29.2]
Generic modular reduction using generic reduction FMA3 FFT length 336K, Pass1=448, Pass2=768, clm=4, 2 threads on A 3374558-bit number

./llr64.3.8.20 -q387*2^3322763+1 -d -t2
Starting Proth prime test of 387*2^3322763+1
Using all-complex FMA3 FFT length 256K, Pass1=128, Pass2=2K, 2 threads, a = 5

Code:
../pfgw64 -q'387*2^3322763+1' -V -T2
PFGW Version 3.8.3.64BIT.20161203.Mac_Dev [GWNUM 29.2]
Special modular reduction using all-complex FMA3 FFT length 240K, Pass1=1280, Pass2=192, clm=2, 2 threads on 387*2^3322763+1

../pfgw64 -q'387*2^3322763+1' -V -t -T2
PFGW Version 3.8.3.64BIT.20161203.Mac_Dev [GWNUM 29.2]
Primality testing 387*2^3322763+1 [N-1, Brillhart-Lehmer-Selfridge]
Running N-1 test using base 5
Special modular reduction using all-complex FMA3 FFT length 240K, Pass1=1280, Pass2=192, clm=2, 2 threads on 387*2^3322763+1

Code:
../pfgw64 -q'387*2^3322763+1' -V -T2
PFGW Version 3.8.3.64BIT.20161203.Mac_Dev [GWNUM 29.2]
Generic modular reduction using all-complex FMA3 FFT length 240K, Pass1=1280, Pass2=192, clm=2, 2 threads on 387*2^3322763+1

Iain
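As a rough sanity check on the "roughly the same digit length" comparison above, the bit lengths of the two test numbers can be estimated without building either number. This is only a sketch: the helper names factorial_bits and proth_bits are made up for illustration, and the factorial estimate relies on double-precision log-gamma, so it could be off by one bit if log2(n!) happens to land very close to an integer.

```python
import math

def factorial_bits(n: int) -> int:
    """Estimated bit length of n! using log-gamma (lgamma(n + 1) == ln(n!)).

    For n > 2, n! - 1 has the same bit length as n!, because n! is then
    not a power of two.
    """
    log2_fact = math.lgamma(n + 1) / math.log(2)
    return math.floor(log2_fact) + 1

def proth_bits(k: int, n: int) -> int:
    """Exact bit length of k*2^n + 1 for k >= 1 and n >= 1.

    k*2^n is k shifted left by n bits; adding 1 only sets the lowest bit.
    """
    return k.bit_length() + n

fact_bits = factorial_bits(208003)      # size of 208003!-1
proth = proth_bits(387, 3322763)        # size of 387*2^3322763+1
print("factorial bits:", fact_bits)
print("Proth bits:    ", proth)
print("size ratio:    ", fact_bits / proth)
```

The two sizes differ by under two percent, so any gap in multithreaded scaling between the runs is unlikely to come from the size of the numbers alone.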
2017-09-28, 19:15  #2
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
3·7·479 Posts 
As soon as you are in 'Generic modular reduction using generic reduction' territory, by definition PFGW is doing things differently from LLR. Neither LLR nor P95 has an equivalent mode of use.

2017-09-28, 20:16  #3
"Mark"
Apr 2003
Between here and the
2^{3}·13·67 Posts 
The text "Special" or "Generic" is output depending on which gwnum call is used to set up the modular reduction: gwsetup() versus gwsetup_general_mod()/gwsetup_general_mod_64(). The text following "using" is the output produced by a call to gwmodulo_as_string().

2017-09-29, 08:27  #4
Nov 2010
31_{8} Posts 
That's why I'm confused: it doesn't seem to be as simple as generic/special reduction controlling whether the multithreaded FFT performs well or not, e.g.
So there is some other effect at work here, which I don't quite understand at this point. I wonder if George has any insight? 
2017-09-29, 11:28  #5
Jun 2003
2×2,719 Posts 
Iain, can you post the output (showing the FFT) from both the single-thread and 2-thread runs for the Proth number? Also the iteration timings and the CPU usage for both of these?
What kind of CPU is it? If it has hyper-threading (HT), are both threads of the 2-thread run somehow running on the same physical core?
Last fiddled with by axn on 2017-09-29 at 11:32 Reason: Clarified "output"
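Once per-iteration timings for both runs are available, they can be reduced to a speedup, a per-thread efficiency, and an implied parallel fraction under Amdahl's law. The sketch below is only illustrative: the function names are made up, and the timings of 8.0 ms and 5.0 ms are hypothetical values chosen to reproduce the 1.6x figure quoted in the first post, not measurements.

```python
def speedup(t_single: float, t_multi: float) -> float:
    """Speedup of the multithreaded run over the single-thread run."""
    return t_single / t_multi

def efficiency(s: float, threads: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    return s / threads

def amdahl_parallel_fraction(s: float, threads: int) -> float:
    """Parallel fraction p implied by Amdahl's law, S = 1 / ((1 - p) + p / N).

    Solving for p gives p = (1 - 1/S) / (1 - 1/N); requires threads > 1.
    """
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / threads)

# Hypothetical per-iteration timings in milliseconds (not measured values).
s = speedup(8.0, 5.0)
print("speedup:          ", s)
print("efficiency:       ", efficiency(s, 2))
print("parallel fraction:", amdahl_parallel_fraction(s, 2))
```

If both threads were sharing one physical core, the measured speedup would typically sit near 1.0 while total CPU usage still showed two busy logical CPUs, which is why the CPU usage figure is worth reporting alongside the timings.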
2018-03-16, 17:52  #6
"Mark"
Apr 2003
Between here and the
2^{3}·13·67 Posts 
I applied the change and didn't see any improvement either. Did you look at how LLR implemented it?
