mersenneforum.org gcwsieve

2007-07-22, 23:21   #34
geoff

Mar 2003
New Zealand

13×89 Posts

Quote:
 Originally Posted by jasong The 64-bit code works perfectly. When I unzipped the 32-bit version to the same directory and tried to run it, the OS claimed the file didn't exist, even though the 'ls' command listed it as being there.
OK, that is probably because you don't have 32-bit system libraries installed.

My main concern was to check that the 64-bit code works correctly, as it hasn't been tested before, so thanks for helping with that.

2007-07-23, 04:50   #35
jasong

"Jason Goatcher"
Mar 2005

5·701 Posts

Quote:
 Originally Posted by geoff OK, that is probably because you don't have 32-bit system libraries installed. My main concern was to check that the 64-bit code works correctly, as it hasn't been tested before, so thanks for helping with that.
So, I won't be able to run ANY 32-bit apps?

2007-07-24, 00:29   #36
geoff

Mar 2003
New Zealand

13×89 Posts

Quote:
 Originally Posted by jasong So, I won't be able to run ANY 32-bit apps?
If the problem is missing 32-bit libraries, then I guess you will only be able to run statically linked 32-bit apps. It should be simple enough to install the 32-bit libraries though, unless you are using a live CD or something like that.

2007-07-24, 03:03   #37
jasong

"Jason Goatcher"
Mar 2005

5·701 Posts

Quote:
 Originally Posted by geoff If the problem is missing 32-bit libraries, then I guess you will only be able to run statically linked 32-bit apps. It should be simple enough to install the 32-bit libraries though, unless you are using a live CD or something like that.
I've got more than a week before gcwsieve completes the range, so not a big concern.

2007-07-30, 23:49   #38
geoff

Mar 2003
New Zealand

13×89 Posts

In version 1.0.9 the 32-bit code was actually faster than the 64-bit code. But in version 1.0.10 the 64-bit code is faster again.

A quick test run on a primegrid range with a C2D @ 2.67GHz:
Code:
version 1.0.9 64-bit: 61 kp/s
version 1.0.11 32-bit: 83 kp/s
version 1.0.11 64-bit: 100 kp/s
edit: from the Cullen 2M sieve, p=1000e9

Last fiddled with by geoff on 2007-07-30 at 23:51

2007-08-01, 01:16   #39
rogue

"Mark"
Apr 2003
Between here and the

2·3·937 Posts

Version 1.0.11 on PowerPC64 (at 2.5 GHz): 457243 p/sec

Last fiddled with by rogue on 2007-08-01 at 01:17

2007-08-01, 02:01   #40
geoff

Mar 2003
New Zealand

13×89 Posts

Quote:
 Originally Posted by rogue Version 1.0.11 on PowerPC64 (at 2.5 GHz): 457243 p/sec
Which sieve file was that with? If it is with the current 5.0M < n < 7.5M file for this project then that is a good time, but not too surprising.

For comparison a 2.9GHz P4 does about 500 kp/s on that file at p=100e9.

There is room for improvement in the ppc64 code. Currently the ppc64 uses the same method as the non-SSE2 x86 machines, which process the candidates one at a time, while the SSE2 and x86-64 code does them 4 at a time.

2007-08-01, 02:32   #41
rogue

"Mark"
Apr 2003
Between here and the

15F6₁₆ Posts

That was with the above file (after fixing the input). It was at p=1000e9.

Are you saying it doesn't have the improvements that were done to sr2sieve and sr5sieve?

2007-08-01, 03:36   #42
geoff

Mar 2003
New Zealand

485₁₆ Posts

Quote:
 Originally Posted by rogue That was with the above file (after fixing the input). It was at p=1000e9. Are you saying it doesn't have the improvements that were done to sr2sieve and sr5sieve?
No, the main loop in gcwsieve doesn't benefit from those improvements because each new computation a*b (mod p) has new values of a and b.

The x86 and ppc64 main loop looks a bit like this:
Code:
for (i=0; i<n; i++)
{
  X[i] = X[i] * Y[i] (mod p)
  if (X[i] == Z[i])
    /* Found a factor */
}
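
As a rough illustration only, the scalar loop above could be written in compilable C along these lines. The names `mulmod` and `sieve_pass_scalar` are made up for this sketch, and the modular multiply goes through `unsigned __int128` (a GCC/Clang extension) purely for simplicity; gcwsieve's own modular multiplication is not shown in this thread and is probably quite different.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a*b mod p through a 128-bit intermediate
   (unsigned __int128 is a GCC/Clang extension, not standard C). */
uint64_t mulmod(uint64_t a, uint64_t b, uint64_t p)
{
    return (uint64_t)(((unsigned __int128)a * b) % p);
}

/* One pass of the scalar loop: X[i] = X[i]*Y[i] mod p, then compare with Z[i].
   Indices that match are stored in hits[] (at most max_hits of them).
   Returns the total number of matches found. */
size_t sieve_pass_scalar(uint64_t *X, const uint64_t *Y, const uint64_t *Z,
                         size_t n, uint64_t p,
                         size_t *hits, size_t max_hits)
{
    size_t found = 0;

    for (size_t i = 0; i < n; i++) {
        X[i] = mulmod(X[i], Y[i], p);
        if (X[i] == Z[i]) {          /* "Found a factor" in the pseudocode */
            if (found < max_hits)
                hits[found] = i;
            found++;
        }
    }
    return found;
}
```

Each iteration needs a fresh multiplication with new operands, which is the point made above about why the sr2sieve/sr5sieve improvements don't carry over to this loop.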
With the SSE2 and (from 1.0.10) the x86-64 versions it is vectorised a bit like this:
Code:
m = n/4
for (i = 0; i < m; i++)
{
  X[i+0*m] = X[i+0*m] * Y[i+0*m] (mod p)
  ...
  X[i+3*m] = X[i+3*m] * Y[i+3*m] (mod p)
  if (X[i+0*m] == Z[i+0*m] || .. || X[i+3*m] == Z[i+3*m])
    /* Found a factor */
}
The vectorisation can't be done automatically by the C compiler because the initial values of X[0], X[m], X[2*m], X[3*m] can't be inferred from the original loop. (They are computed separately with powmod.)
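
A plain C sketch of the four-stream layout described above might look like the following. It is not the SSE2 or x86-64 assembler from gcwsieve; it only shows the index pattern `i + k*m`. The function names are invented for the example, `n` is assumed to be a multiple of 4, and `powmod` is just a generic square-and-multiply helper: how gcwsieve actually derives the four starting values from it is not described in this thread, so the sketch takes `X` as already initialised.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a*b mod p through a 128-bit intermediate
   (unsigned __int128 is a GCC/Clang extension, not standard C). */
uint64_t mulmod(uint64_t a, uint64_t b, uint64_t p)
{
    return (uint64_t)(((unsigned __int128)a * b) % p);
}

/* Generic square-and-multiply: b^e mod p. */
uint64_t powmod(uint64_t b, uint64_t e, uint64_t p)
{
    uint64_t r = 1 % p;

    b %= p;
    while (e != 0) {
        if (e & 1)
            r = mulmod(r, b, p);
        b = mulmod(b, b, p);
        e >>= 1;
    }
    return r;
}

/* Four independent streams, each covering m = n/4 consecutive entries.
   Assumes n is a multiple of 4. Returns the number of matches found. */
size_t sieve_pass_4way(uint64_t *X, const uint64_t *Y, const uint64_t *Z,
                       size_t n, uint64_t p)
{
    size_t m = n / 4;
    size_t found = 0;

    for (size_t i = 0; i < m; i++) {
        /* The four multiplications do not depend on each other. */
        uint64_t x0 = mulmod(X[i + 0*m], Y[i + 0*m], p);
        uint64_t x1 = mulmod(X[i + 1*m], Y[i + 1*m], p);
        uint64_t x2 = mulmod(X[i + 2*m], Y[i + 2*m], p);
        uint64_t x3 = mulmod(X[i + 3*m], Y[i + 3*m], p);

        X[i + 0*m] = x0;
        X[i + 1*m] = x1;
        X[i + 2*m] = x2;
        X[i + 3*m] = x3;

        /* One combined test on the common path; sort out which stream
           matched only in the rare case that something did. */
        if (x0 == Z[i + 0*m] || x1 == Z[i + 1*m] ||
            x2 == Z[i + 2*m] || x3 == Z[i + 3*m]) {
            found += (x0 == Z[i + 0*m]) + (x1 == Z[i + 1*m])
                   + (x2 == Z[i + 2*m]) + (x3 == Z[i + 3*m]);
        }
    }
    return found;
}
```

Keeping the four products in separate local variables is what leaves room for a compiler or hand-written assembler to overlap them; the single combined comparison mirrors the `||` test in the pseudocode.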

2007-08-01, 04:23   #43
geoff

Mar 2003
New Zealand

13·89 Posts

Just to correct the previous post: Yes, the improvements to the ppc64 assembler are in gcwsieve 1.0.10. They may speed up some other parts of the code, but they don't help with the main loop; there is still room for improvement there.

2007-08-03, 01:24   #44
geoff

Mar 2003
New Zealand

13×89 Posts

gcwsieve 1.0.13

This version fixes a memory allocation bug that could cause the program to abort at the end of a sieve range, or a memory leak if there were multiple ranges queued up in the work file. No work needs to be repeated, as all results for the range would have been written to file before the abort.

The affected builds were:
Windows: versions 1.0.0 - 1.0.10.
OS X: versions 1.0.0 - 1.0.12.
The bug didn't affect the Linux builds.

Thanks rogue for finding it.