mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Prime Cullen Prime (https://www.mersenneforum.org/forumdisplay.php?f=79)
-   -   gcwsieve (https://www.mersenneforum.org/showthread.php?t=7788)

geoff 2007-07-22 23:21

[QUOTE=jasong;110891]The 64-bit code works perfectly. When I unzipped the 32-bit version to the same directory and tried to run it, the OS claimed the file didn't exist, even though the 'ls' command listed it as being there.
[/QUOTE]

OK, that is probably because you don't have 32-bit system libraries installed.

My main concern was to check that the 64-bit code works correctly, as it hasn't been tested before, so thanks for helping with that.

jasong 2007-07-23 04:50

[QUOTE=geoff;110941]OK, that is probably because you don't have 32-bit system libraries installed.

My main concern was to check that the 64-bit code works correctly, as it hasn't been tested before, so thanks for helping with that.[/QUOTE]

So, I won't be able to run ANY 32-bit apps?

geoff 2007-07-24 00:29

[QUOTE=jasong;110950]So, I won't be able to run ANY 32-bit apps?[/QUOTE]

If the problem is missing 32-bit libraries, then I guess you will only be able to run statically linked 32-bit apps. It should be simple enough to install the 32-bit libraries though, unless you are using a live CD or something like that.

jasong 2007-07-24 03:03

[QUOTE=geoff;111011]If the problem is missing 32-bit libraries, then I guess you will only be able to run statically linked 32-bit apps. It should be simple enough to install the 32-bit libraries though, unless you are using a live CD or something like that.[/QUOTE]
I've got more than a week before gcwsieve completes the range, so not a big concern.

geoff 2007-07-30 23:49

In version 1.0.9 the 32-bit code was actually faster than the 64-bit code. But in version 1.0.10 the 64-bit code is faster again.

A quick test run on a primegrid range with a C2D @ 2.67GHz:
[code]
version 1.0.9 64-bit: 61 kp/s
version 1.0.11 32-bit: 83 kp/s
version 1.0.11 64-bit: 100 kp/s
[/code]

edit: from the Cullen 2M sieve, p=1000e9

rogue 2007-08-01 01:16

Version 1.0.11 on PowerPC64 (at 2.5 GHz):

457243 p/sec

:shock:

geoff 2007-08-01 02:01

[QUOTE=rogue;111456]Version 1.0.11 on PowerPC64 (at 2.5 GHz):

457243 p/sec

:shock:[/QUOTE]
Which sieve file was that with? If it is with the current 5.0M < n < 7.5M file for this project then that is a good time, but not too surprising.

For comparison a 2.9GHz P4 does about 500 kp/s on that file at p=100e9.

There is room for improvement in the ppc64 code. Currently the ppc64 code uses the same method as the non-SSE2 x86 code, which processes the candidates one at a time, while the SSE2 and x86-64 code processes them 4 at a time.

rogue 2007-08-01 02:32

That was with the above file (after fixing the input). It was at p=1000e9.

Are you saying it doesn't have the improvements that were done to sr2sieve and sr5sieve?

geoff 2007-08-01 03:36

[QUOTE=rogue;111463]That was with the above file (after fixing the input). It was at p=1000e9.

Are you saying it doesn't have the improvements that were done to sr2sieve and sr5sieve?[/QUOTE]

No, the main loop in gcwsieve doesn't benefit from those improvements because each new computation a*b (mod p) has new values of a and b.

The x86 and ppc64 main loop looks a bit like this:
[code]
for (i = 0; i < n; i++) {
  X[i] = X[i] * Y[i] (mod p)
  if (X[i] == Z[i])
    /* Found a factor */
}
[/code]
With the SSE2 and (from 1.0.10) the x86-64 versions it is vectorised a bit like this:
[code]
m = n/4
for (i = 0; i < m; i++) {
  X[i+0*m] = X[i+0*m] * Y[i+0*m] (mod p)
  ...
  X[i+3*m] = X[i+3*m] * Y[i+3*m] (mod p)
  if (X[i+0*m] == Z[i+0*m] || ... || X[i+3*m] == Z[i+3*m])
    /* Found a factor */
}
[/code]
The vectorisation can't be done automatically by the C compiler because the initial values of X[0], X[m], X[2*m], X[3*m] can't be inferred from the original loop. (They are computed separately with powmod.)
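For readers who want to experiment, the two loop shapes above can be written out as plain C. This is only a rough sketch of the pseudocode as posted, not the gcwsieve source: the names mulmod, scan_scalar and scan_four are made up for illustration, mulmod is a slow portable double-and-add routine rather than the assembler the program uses, and the inner k loop merely stands in for the four streams that the SSE2 and x86-64 code would process together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: (a * b) mod p by double-and-add.
   Portable but slow; requires 0 < p < 2^63 so the additions cannot overflow. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t p)
{
    assert(p != 0 && p < (UINT64_C(1) << 63));
    uint64_t r = 0;
    a %= p;
    while (b) {
        if (b & 1) { r += a; if (r >= p) r -= p; }
        a += a; if (a >= p) a -= p;
        b >>= 1;
    }
    return r;
}

/* One candidate at a time, as in the first pseudocode loop.
   Returns the number of positions where X[i] == Z[i] after the update. */
static size_t scan_scalar(uint64_t *X, const uint64_t *Y, const uint64_t *Z,
                          size_t n, uint64_t p)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        X[i] = mulmod(X[i], Y[i], p);
        if (X[i] == Z[i])
            hits++;                      /* "found a factor" */
    }
    return hits;
}

/* Four streams of length m = n/4, as in the second pseudocode loop.
   The k loop is a scalar stand-in for four independent lanes. */
static size_t scan_four(uint64_t *X, const uint64_t *Y, const uint64_t *Z,
                        size_t n, uint64_t p)
{
    size_t m = n / 4, hits = 0;
    for (size_t i = 0; i < m; i++) {
        for (size_t k = 0; k < 4; k++) {
            size_t j = i + k * m;
            X[j] = mulmod(X[j], Y[j], p);
            hits += (X[j] == Z[j]);
        }
    }
    /* Leftover elements when n is not a multiple of 4 (an addition of
       this sketch; the pseudocode does not show how they are handled). */
    for (size_t j = 4 * m; j < n; j++) {
        X[j] = mulmod(X[j], Y[j], p);
        hits += (X[j] == Z[j]);
    }
    return hits;
}
```

Both functions update X in place and should report the same number of matches on the same input. The sketch does not model how the starting values of the four streams are produced, which is the part that is done separately with powmod.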

geoff 2007-08-01 04:23

Just to correct the previous post: yes, the improvements to the ppc64 assembler are in gcwsieve 1.0.10. They may speed up some other parts of the code, but they don't help with the main loop, where there is still room for improvement.

geoff 2007-08-03 01:24

gcwsieve 1.0.13
 
This version fixes a memory allocation bug that could cause the program to abort at the end of a sieve range, or to leak memory if there were multiple ranges queued up in the work file.

No work needs to be repeated, as all results for the range would have been written to file before the abort. The affected builds were:

Windows: versions 1.0.0 - 1.0.10.
OS X: versions 1.0.0 - 1.0.12.

The bug didn't affect the Linux builds. Thanks rogue for finding it.

