
geoff 2007-04-20 02:55

[QUOTE=em99010pepe;103893]2º have the ability to queue up jobs
3º save its progress on an output file[/QUOTE]

I have added the workfile and checkpoint features in version 1.0.7, and have also added default names for the input and factors file, so now if you start it without specifying the -p -P -i or -f flags you get this behaviour:

Input sieve read from `sieve.txt'
Factors written to `factors.txt'
Checkpoints written to `checkpoint.txt'
Ranges read from `work.txt'

work.txt is the same format as sr5sieve, one range in billions per line, e.g.:

To run multiple gcwsieve processes you will either need to create a separate directory for each process, or else specify the file names using the command line switches.
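For the separate-directory approach, here is a rough Python sketch of one way to set things up. It assumes a base directory that already holds sieve.txt and work.txt; the `proc0`, `proc1`, ... directory names and the `make_process_dirs` helper are made up for illustration and are not part of gcwsieve. Each line of work.txt is handed out as-is, so nothing is assumed about the range format.

```python
import shutil
from pathlib import Path


def make_process_dirs(base, n_procs):
    """Create one directory per process and deal out work.txt lines round-robin."""
    base = Path(base)
    # Each non-empty line of work.txt is treated as one opaque range;
    # its internal format is not parsed or assumed here.
    lines = (base / "work.txt").read_text().splitlines()
    ranges = [ln for ln in lines if ln.strip()]
    dirs = []
    for i in range(n_procs):
        d = base / f"proc{i}"  # hypothetical directory name
        d.mkdir(exist_ok=True)
        # Every process needs its own copy of the input sieve.
        shutil.copy(base / "sieve.txt", d / "sieve.txt")
        # Process i gets ranges i, i + n_procs, i + 2*n_procs, ...
        share = ranges[i::n_procs]
        (d / "work.txt").write_text("".join(r + "\n" for r in share))
        dirs.append(d)
    return dirs
```

Starting gcwsieve with no file switches inside each of those directories should then pick up the default file names listed above, with factors.txt and checkpoint.txt kept separate per process.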

Carlos: I have made the SSE2 code a little (4%) faster on my Northwood P4. Can you check whether it is still slower than version 1.0.4 on your AMD64? If it is, I will add the 1.0.4 code to the AMD build in the next version.

em99010pepe 2007-04-20 06:23


A quick test shows it is still slower (5-8 kp/s less) than version 1.0.4.


em99010pepe 2007-04-20 08:28


Just tested the latest version on my P4 3.0 GHz HT and got a decrease in sieve speed from 222 kp/s to 190 kp/s.


geoff 2007-04-21 03:14

[QUOTE=em99010pepe;104083]Just tested the latest version on my P4 3.0 GHz HT and got a decrease in sieve speed from 222 kp/s to 190 kp/s.[/QUOTE]

Thanks. For comparison at p=1500e9 with the 4352 term sieve file: on my 2.9GHz HT P4 I get 290 kp/s, or 505 kp/s by running two hyperthreads.

I'm not really sure what is going on with the speeds here. Some of the code is the same as is used in sr5sieve, but the main loop is different, so there are more possible complications. It may be that by trying to finely optimise for my own machines, I am making code that will not run fast on any others.

em99010pepe 2007-04-21 08:24

I always had this problem even when I was running sr5sieve. One of the reasons I stopped helping was because the sieve speed was decreasing each time a new version was released.

The previous sieve speeds were measured at p=1800e9.

For my AMD 64 3000+, the optimal cache settings seem to be L1=16 KB and L2=256 KB. I don't know what's going on either... the only way I can help is to test your clients on different machines.


geoff 2007-04-21 22:55

I have made versions 1.0.4a and 1.0.6a using the SSE2 routines from 1.0.4 and 1.0.6, but otherwise they are the same as version 1.0.7.

There is one other setting that you can use to try to speed up the sieve: the -d switch lets you set the maximum gap between exponents manually. Extra dummy terms will be added to fill any larger gaps. You can see the gap size chosen by running with the -v switch, then try something a bit smaller or larger. Unfortunately, the optimal gap size will change with each new sieve file.
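To picture what filling gaps with dummy terms means, here is a small Python sketch. It is only an illustration of the idea under my own assumptions, not gcwsieve's actual code: `fill_gaps` is a hypothetical helper that takes a list of exponents and a maximum gap, and inserts placeholder exponents until no two neighbours are further apart than that gap.

```python
def fill_gaps(exponents, max_gap):
    """Return (n, is_dummy) pairs in which consecutive n differ by at most max_gap."""
    terms = sorted(exponents)
    if not terms:
        return []
    out = [(terms[0], False)]
    prev = terms[0]
    for n in terms[1:]:
        # Step forward by max_gap, adding placeholders, until n is within reach.
        while n - prev > max_gap:
            prev += max_gap
            out.append((prev, True))  # dummy term, not a real candidate
        out.append((n, False))
        prev = n
    return out


filled = fill_gaps([10, 12, 20, 33], max_gap=4)
dummies = [n for n, is_dummy in filled if is_dummy]
print(filled)
print(len(dummies), "dummy terms added")
```

In this toy model a smaller maximum gap means more dummy terms to carry along, and a larger one means fewer terms but bigger steps between them, which gives some feel for why the best setting could shift from one sieve file to the next.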

Carlos: If you happen to remember which sr5sieve version was fastest for your machines I can make it available for download again. But as you can see here with gcwsieve, between three machines -- two of them Pentium 4s -- we already have three different fastest versions :-).

em99010pepe 2007-04-22 09:54


I really can't remember which sr5sieve version was fastest on my machines. Last November I moved all machines to another project...

About gcwsieve, I think the problem here is a memory one.
Version 1.0.4 with L1=16 KB and L2=256 KB is faster (by about 6 kp/s) than version 1.0.4a with the same cache settings. I noticed that the latter detects the cache sizes, whereas 1.0.4 uses L1=16 KB and L2=256 KB by default.


em99010pepe 2007-04-23 20:44


I don't know if this matters, but I tried sr1sieve because of this sieve project and got my fastest times with an L1 cache size of 32 KB and an L2 cache size of 512 KB, the actual values for my machine.


geoff 2007-06-10 22:18

gcwsieve 1.0.8
Compared to the previous version, this one is about 10% faster on my P4 and about 5% faster on my P3. There is no need to upgrade if it turns out to be slower on your machine.

On Windows the p/sec and sec/factor rates are now measured in CPU-seconds for consistency with the Unix versions (the same as recent versions of sr5sieve). To get a valid comparison with previous versions, run them on an otherwise idle CPU.
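For anyone wondering why the idle CPU matters, here is a short Python sketch of the difference between the two clocks. It is a generic illustration, not how gcwsieve itself takes its measurements; the `timed` and `busy_then_idle` names are made up, and the sleep stands in for time during which some other program has the CPU.

```python
import time


def timed(work):
    """Run work() and return (result, cpu_seconds, wall_seconds)."""
    cpu0, wall0 = time.process_time(), time.perf_counter()
    result = work()
    cpu1, wall1 = time.process_time(), time.perf_counter()
    return result, cpu1 - cpu0, wall1 - wall0


def busy_then_idle():
    total = sum(i * i for i in range(200_000))  # CPU-bound part
    time.sleep(0.2)  # waiting part: wall clock advances, CPU time does not
    return total


result, cpu_s, wall_s = timed(busy_then_idle)
print(f"cpu {cpu_s:.3f}s, wall {wall_s:.3f}s")
```

A rate divided by CPU-seconds ignores the waiting, while a rate divided by wall-clock seconds is dragged down by it. On an otherwise idle CPU the two clocks roughly agree, so a CPU-second figure can be set against a wall-clock figure from an older version.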

geoff 2007-06-18 22:40

gcwsieve 1.0.9
Some changes to the 32-bit SSE2 assembler (taken from sr5sieve 1.5.6) have had a big effect on P4 performance. Here are some speeds for my 2.9 GHz P4 at p=4200e9 using the 2947-term sieve file for the 2.5-5.0 million n range:
Version   Single Thread     2 Hyperthreads
-------   ---------------   ---------------
1.0.7     405 kp/s          700 kp/s
1.0.8     446 kp/s (+10%)   720 kp/s (+3%)
1.0.9     644 kp/s (+44%)   786 kp/s (+9%)
I don't know whether other SSE2 machines will benefit by as much, or at all.
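As a quick check of the arithmetic, the percentages in the table are each version's gain over the one before it. The Python snippet below recomputes them from the kp/s figures above; the `pct_change` helper and the variable names are just for illustration. It also works out how much two hyperthreads add over a single thread for each version, which is not in the table but follows from the same numbers.

```python
def pct_change(old, new):
    """Percentage change from old to new, rounded to the nearest whole percent."""
    return round(100 * (new - old) / old)


versions = ["1.0.7", "1.0.8", "1.0.9"]
single = {"1.0.7": 405, "1.0.8": 446, "1.0.9": 644}  # kp/s, one thread
dual = {"1.0.7": 700, "1.0.8": 720, "1.0.9": 786}    # kp/s, two hyperthreads

# Gain of each version over the previous one, per column.
single_gain = [pct_change(single[a], single[b]) for a, b in zip(versions, versions[1:])]
dual_gain = [pct_change(dual[a], dual[b]) for a, b in zip(versions, versions[1:])]

# Extra throughput from running two hyperthreads instead of one.
ht_gain = {v: pct_change(single[v], dual[v]) for v in versions}

print(single_gain, dual_gain, ht_gain)
```

By these figures the benefit of the second hyperthread falls from about 73% in 1.0.7 to about 22% in 1.0.9, so most of the 1.0.9 improvement shows up in the single-thread speed.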

My P4 is still better off LLR testing, but my P4/Celeron would now be more productive sieving, even just on the 2.5-5 million range. (It is the same speed as a full P4 for sieving, but only half speed for LLR testing which it is doing at the moment).

hhh 2007-06-19 09:43

Great news indeed.

Sieving the 5-25M range will be done up to 100G in a couple of days, and I think we can start a new sieve drive then. Is that OK with you, or would you suggest reopening the late 1.5-5M range?

Yours H.

All times are UTC.