#12
Mar 2003
New Zealand
1157 Posts
Input sieve read from `sieve.txt'
Factors written to `factors.txt'
Checkpoints written to `checkpoint.txt'
Ranges read from `work.txt'

work.txt uses the same format as sr5sieve, one range in billions per line, e.g.:
1500,1600
2000,2200

To run multiple gcwsieve processes you will need either to create a separate directory for each process or to specify the file names using the command-line switches.

Carlos: I have made the SSE2 code a little (4%) faster on my Northwood P4. Can you check whether it is still slower than version 1.0.4 on your AMD64? If it is, I will add the 1.0.4 code to the AMD build in the next version.
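The work.txt format described above (one `start,end` range per line, in billions) is simple to read. Here is a minimal Python sketch, assuming blank lines may be skipped and that each range is converted to absolute p bounds by multiplying by 10^9; the function name `read_work_ranges` is hypothetical and is not part of gcwsieve:

```python
def read_work_ranges(text):
    """Parse work.txt-style text: one 'start,end' range in billions per line.

    Returns a list of (p_min, p_max) tuples in absolute units.
    Hypothetical helper for illustration, not gcwsieve's own parser.
    """
    ranges = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        start_str, end_str = line.split(",")  # exactly one comma expected
        start, end = int(start_str), int(end_str)
        if start >= end:
            raise ValueError(f"line {line_no}: empty or reversed range {line!r}")
        # Convert from billions to absolute values of p.
        ranges.append((start * 10**9, end * 10**9))
    return ranges


print(read_work_ranges("1500,1600\n2000,2200\n"))
# -> [(1500000000000, 1600000000000), (2000000000000, 2200000000000)]
```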
|
#13
Sep 2004
2830 Posts
Geoff,
A quick test still shows it slower (5-8 kp/s less) than version 1.0.4. Carlos
#14
Sep 2004
2830 Posts
Geoff,
I just tested the latest version on my P4 3.0 GHz HT and the sieve speed dropped from 222 kp/s to 190 kp/s (about 14% slower). Carlos
#15
Mar 2003
New Zealand
1157 Posts
I'm not really sure what is going on with the speeds here. Some of the code is the same as in sr5sieve, but the main loop is different, so there are more possible complications. It may be that by trying to finely optimise for my own machines, I am making code that will not run fast on any others.
|
#16
Sep 2004
2830 Posts
I have always had this problem, even when I was running sr5sieve. One of the reasons I stopped helping was that the sieve speed decreased each time a new version was released.
The sieve speeds I posted earlier were measured at p=1800e9. For my AMD64 3000+ the optimal cache settings seem to be L1=16 KB and L2=256 KB. I also don't know what's going on; the only way I can help is by testing your clients on different machines. Carlos
#17
Mar 2003
New Zealand
1157 Posts
I have made versions 1.0.4a and 1.0.6a, which use the SSE2 routines from 1.0.4 and 1.0.6 but are otherwise the same as version 1.0.7.
There is one other setting you can use to try to speed up the sieve: the -d switch lets you set the maximum gap between exponents manually. Extra dummy terms will be added to fill any larger gaps. You can see the gap size chosen by running with the -v switch; then try something a bit smaller or larger. Unfortunately, the optimal gap size will change with each new sieve file.

Carlos: If you happen to remember which sr5sieve version was fastest on your machines, I can make it available for download again. But as you can see here with gcwsieve, between three machines, two of them Pentium 4s, we already have three different versions :-).
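The idea behind the -d switch, capping the gap between consecutive exponents by adding dummy terms, can be illustrated with a short sketch. This is a hypothetical Python illustration of the concept only, assuming the exponents are integers and that a dummy is placed every `max_gap` steps inside any gap that is too large; it is not taken from gcwsieve's source, and `fill_gaps` is an invented name:

```python
def fill_gaps(exponents, max_gap):
    """Return (all_exponents, dummies) where dummy exponents have been added
    so that no gap between consecutive values exceeds max_gap.

    Hypothetical illustration of the -d idea, not gcwsieve code.
    """
    if max_gap < 1:
        raise ValueError("max_gap must be at least 1")
    ns = sorted(set(exponents))
    if not ns:
        return [], []
    result, dummies = [ns[0]], []
    for prev, nxt in zip(ns, ns[1:]):
        cur = prev
        # Step forward by max_gap while the remaining gap is still too large.
        while nxt - cur > max_gap:
            cur += max_gap
            result.append(cur)
            dummies.append(cur)
        result.append(nxt)
    return result, dummies


all_n, dummy_n = fill_gaps([100, 103, 120], max_gap=5)
print(all_n)    # -> [100, 103, 108, 113, 118, 120]
print(dummy_n)  # -> [108, 113, 118]
```

Smaller values of `max_gap` add more dummy terms, as the example shows; which value is fastest has to be found by experiment, as noted above.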
#18
Sep 2004
2830 Posts
Geoff,
I really can't remember which sr5sieve version was fastest on my machines. Last November I moved all my machines to another project. About gcwsieve, I think the problem here is a memory one: version 1.0.4 with L1=16 KB and L2=256 KB is faster (by about 6 kp/s) than version 1.0.4a with the same cache settings. I noticed that the latter detects the cache sizes, while 1.0.4 uses L1=16 KB and L2=256 KB by default. Carlos
Last fiddled with by em99010pepe on 2007-04-22 at 09:56
#19
Sep 2004
2830 Posts
Geoff,
I don't know if this matters, but I tried sr1sieve for this sieve project and got my fastest times with an L1 cache size of 32 KB and an L2 cache size of 512 KB, the real values for my machine. Carlos
#20
Mar 2003
New Zealand
1157 Posts
Compared to the previous version, this one is about 10% faster on my P4 and about 5% faster on my P3. There is no need to upgrade if it turns out to be slower on your machine.
On Windows the p/sec and sec/factor rates are now measured in CPU-seconds, for consistency with the Unix versions (the same as recent versions of sr5sieve). To get a valid comparison with previous versions, run them on an otherwise idle CPU.
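The difference between a rate measured in CPU-seconds and one measured in wall-clock seconds can be seen in a small Python sketch using the standard `time.process_time()` and `time.perf_counter()` functions. The workload is a hypothetical stand-in (one modular exponentiation per item), not sieve code, and `measure_rates` is an invented name. On a busy machine the wall-clock rate drops while the CPU-time rate stays roughly constant, which is why older versions should be compared on an otherwise idle CPU:

```python
import time


def measure_rates(work_items, work):
    """Run `work` on each item and return
    (items per CPU-second, items per wall-clock second)."""
    cpu_start = time.process_time()   # CPU time used by this process
    wall_start = time.perf_counter()  # elapsed real time
    for item in work_items:
        work(item)
    cpu_elapsed = time.process_time() - cpu_start
    wall_elapsed = time.perf_counter() - wall_start
    n = len(work_items)
    # Guard against a zero elapsed time on very short runs.
    cpu_rate = n / cpu_elapsed if cpu_elapsed > 0 else float("inf")
    wall_rate = n / wall_elapsed if wall_elapsed > 0 else float("inf")
    return cpu_rate, wall_rate


def dummy_work(p):
    # Hypothetical stand-in for testing one candidate p.
    return pow(3, p - 1, p)


cpu_rate, wall_rate = measure_rates(range(3, 200003, 2), dummy_work)
print(f"{cpu_rate:.0f} items/CPU-sec, {wall_rate:.0f} items/sec (wall clock)")
```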
#21
Mar 2003
New Zealand
1157 Posts
Some changes to the 32-bit SSE2 assembler (taken from sr5sieve 1.5.6) have had a big effect on P4 performance. Here are some speeds for my 2.9 GHz P4 at p=4200e9, using the 2947-term sieve file for the 2.5-5.0 million n range:
Code:
Version   Single Thread     2 Hyperthreads
-------   -------------     --------------
1.0.7:    405 kp/s          700 kp/s
1.0.8:    446 kp/s (+10%)   720 kp/s (+3%)
1.0.9:    644 kp/s (+44%)   786 kp/s (+9%)

My P4 is still better off LLR testing, but my P4/Celeron would now be more productive sieving, even just on the 2.5-5 million range. (It is the same speed as a full P4 for sieving, but only half the speed for LLR testing, which it is doing at the moment.)
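Each percentage in the table is relative to the previous version (for example 644/446 ≈ 1.44), not to 1.0.7. A short Python check using the kp/s figures from the table:

```python
# kp/s figures from the table: (single thread, 2 hyperthreads)
rates = {
    "1.0.7": (405, 700),
    "1.0.8": (446, 720),
    "1.0.9": (644, 786),
}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


versions = list(rates)  # dicts keep insertion order
for prev, cur in zip(versions, versions[1:]):
    single = percent_change(rates[prev][0], rates[cur][0])
    ht = percent_change(rates[prev][1], rates[cur][1])
    print(f"{prev} -> {cur}: single {single:+.0f}%, hyperthreaded {ht:+.0f}%")
# -> 1.0.7 -> 1.0.8: single +10%, hyperthreaded +3%
# -> 1.0.8 -> 1.0.9: single +44%, hyperthreaded +9%
```

Measured against 1.0.7 instead, the single-thread gain of 1.0.9 is about 59% (644/405 ≈ 1.59).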
#22
Jun 2005
373 Posts
Great news indeed.
Sieving of the 5-25M range will reach 100G in a couple of days, and I think we can start a new sieve drive then. Is that OK with you, or would you suggest reopening the earlier 1.5-5M range? Yours, H.