mersenneforum.org gcwsieve

2007-04-08, 14:03   #1
em99010pepe

Sep 2004

2·5·283 Posts

gcwsieve

Geoff,

How do I feed the client with more ranges without stopping it? I like the srwork.txt thing you have on the other clients.
How do I turn sse2 on?

Best Regards,
Carlos

Last fiddled with by em99010pepe on 2007-04-08 at 14:09

2007-04-10, 04:26   #2
geoff

Mar 2003
New Zealand

13·89 Posts

Quote:
 Originally Posted by em99010pepe How do I feed the client with more ranges without stopping it? I like the srwork.txt thing you have on the other clients.
I haven't added that feature yet. Unlike with srsieve, it should usually be a little faster to start a new sieve job using the latest sieve file than to continue with the old file anyway. On Linux you can use the batch command to queue up jobs.

Quote:
 How do I turn sse2 on?
For the linux-x86_64 binary it is always on. For the 32-bit binaries it should be detected automatically: Running with the --verbose option will print "Using SSE2 code path" if it is being used. You can force it on with the --sse2 switch, or force it off with the --no-sse2 switch. If it is not being detected automatically then that is a bug.

I am not sure how much advantage machines other than the Pentium 4 will get from the SSE2 code, so it may be worthwhile trying both --sse2 and --no-sse2 to see which is faster.

2007-04-10, 20:43   #3
em99010pepe

Sep 2004

2·5·283 Posts

Thanks Geoff.

Carlos

2007-04-17, 17:20   #4
em99010pepe

Sep 2004

2·5·283 Posts

Geoff,

I want to bring more machines (adding at least 4 cores) to the sieve, but the client needs to:
1º run in hidden mode
2º have the ability to queue up jobs
3º save its progress on an output file

The way the client is started, with that list of flags, makes it very difficult to hide when using HideItX. I prefer the srsieve feature.... What is your opinion again? Could you implement the srsieve feature in gcwsieve?

By the way, gcwsieve 1.0.4 is faster than gcwsieve 1.0.6 on my AMD 64, but I still need to run more tests.

Carlos

Last fiddled with by em99010pepe on 2007-04-17 at 17:21

2007-04-18, 02:45   #5
Citrix

Jun 2003

633₁₆ Posts

Quote:
 Originally Posted by em99010pepe The way the client is started, with that list of flags, it's very difficult to hide when using HideItX. I prefer the srsieve feature.... Carlos
It is possible. Create a batch (.bat) file containing the command line and start that .bat file with HideItX.

2007-04-18, 05:21   #6
geoff

Mar 2003
New Zealand

13·89 Posts

Quote:
 Originally Posted by em99010pepe 1º hidden mode
If you create a file called `gcwsieve-command-line.txt' with one line consisting of the command line you want to use, then run gcwsieve without any command line arguments, it will read that file and run as if you had typed the first line at the console. Does this help?

Quote:
 2º have the ability to queue up jobs
I will add a work file facility sometime soon.

Quote:
 3º save its progress on an output file
You can do this by creating a copy of the input file (call it sieve.txt, say), changing the number on the first line to <pmin>, then running `gcwsieve -i sieve.txt -o sieve.txt -f factors.txt -s<minutes_between_saves> -P<pmax>...'. If you stop and restart with exactly the same command line each time, it will continue from the last save point.

Quote:
 By the way, gcwsieve 1.0.4 is faster than gcwsieve 1.0.6 on my AMD 64 but I still need to make more tests.
This is interesting :-) There are two changes between 1.0.4 and 1.0.6 that might affect performance:

The 1.0.6 SSE2 code processes 4 terms at a time instead of 2 at a time in 1.0.4. The idea behind this code was to overcome the long Pentium 4 pipeline and it does run a lot faster on my P4, but it is possible that this code is slower on the AMD64 -- it would be interesting if that is the case. You could test this by running both versions with the --no-sse2 switch and see if 1.0.6 is still slower.

In 1.0.6 the AMD cache size detection has (hopefully) been fixed. You can run using the old settings from version 1.0.4 by adding the command line switches -l16 -L256. If you are running multiple instances of the program on an n-core machine then it might be faster to set the L2 cache size to 1/n of its actual size. The L1 cache size probably doesn't have much effect.

Let me know what the results of your experiments are.

2007-04-18, 07:43   #7
em99010pepe

Sep 2004

2×5×283 Posts

Geoff,

The 1.0.6 SSE2 code doubled the output of my new work machine, a dual core P4 3.0GHz. It went from 107 kp/s to 222 kp/s... thank you.

I noticed that on my AMD the highest performance was achieved by the 1.0.4 SSE2 code with caches -l16 -L256. Playing with the cache flags on the 1.0.6 code, I never matched the performance of the 1.0.4 SSE2 code.

Later today I will check the ability to hide the client.

Cheers,
Carlos

2007-04-18, 22:16   #8
Citrix

Jun 2003

3×23² Posts

Geoff, so is gcwsieve.exe now faster than multisieve?

2007-04-19, 04:17   #9
geoff

Mar 2003
New Zealand

485₁₆ Posts

Quote:
 Originally Posted by Citrix Geoff, so is gcwsieve.exe now faster than multisieve?
For sieving this project with a Pentium 4 it looks like it is :-) Actually, I am sieving with two threads on my 2.9GHz hyperthreaded P4 and getting 475 kp/s total, which is a lot faster per clock than my 800MHz P3 which gets 82 kp/s on the current sieve file.

Unfortunately from Carlos's results it looks like the new SSE2 code is only suited to the P4. Since the only Windows machines I have access to are P4 and Celeron D, I can't really compare to Multisieve for other machines.

If you think there is some way the SSE2 code from gcwsieve can be used in MultiSieve, let me know. I may be able to put it into the form of an external function that could be linked by MSC.

Carlos: In a future version I think I'll use the old SSE2 code in the gcwsieve-amd build and only use the new code in the gcwsieve-intel build.

2007-04-19, 04:28   #10
Citrix

Jun 2003

3×23² Posts

Quote:
 Originally Posted by geoff If you think there is some way the SSE2 code from gcwsieve can be used in MultiSieve, let me know. I may be able to put it into the form of an external function that could be linked by MSC.

I think this will be useful, I can give it a try. Do you have any fast assembly routines for modular 64 bit addition too? (There are none built into multisieve.) I am working on a new algorithm for this project, where I can replace the multiplications with additions.

Btw, for the 3^16 search, your program says it can calculate 6 million primes/sec. But it takes about 100 sec to do 1 billion. So it is looking at approx 600 million primes per billion. But there are not 600 million primes in a billion, as 1/2 the numbers are even. So where is the error in the calculations?

2007-04-19, 05:32   #11
geoff

Mar 2003
New Zealand

485₁₆ Posts

Quote:
 Originally Posted by Citrix I think this will be useful, I can give it a try. Do you have any fast assembly routines for modular 64 bit addition too? (There are none built into multisieve.) I am working on a new algorithm for this project, where I can replace the multiplications with additions.
I don't know if there is anything fundamentally faster than this. Assuming 0 <= a,b < p < 2^63:

x = a+b;
if (x >= p) x -= p;

Two can be done in parallel with SSE2 and the `if' can use a conditional move instead of a branch. I think you will need to be adding two large arrays together, or adding a constant to each element of a large array, before hand-written assembler will be much faster than what the C compiler will generate.
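
For reference, the two-line routine above could be written out as a branch-free C function roughly as follows. This is only a sketch under the stated assumption 0 <= a,b < p < 2^63; the names addmod63 and addmod_const are made up for illustration and are not part of gcwsieve or MultiSieve.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return (a + b) mod p, assuming 0 <= a, b < p < 2^63 as in the post,
   so that a + b fits in 64 bits without wrapping. */
uint64_t addmod63(uint64_t a, uint64_t b, uint64_t p)
{
    /* Check the assumed preconditions (compiled out with -DNDEBUG). */
    assert(p < (UINT64_C(1) << 63) && a < p && b < p);

    uint64_t x = a + b;
    /* mask is all ones when x >= p and zero otherwise, so p is
       subtracted only when needed, without an explicit branch. */
    uint64_t mask = (uint64_t)0 - (uint64_t)(x >= p);
    return x - (p & mask);
}

/* Hypothetical helper: add the constant c to every element of v modulo p.
   This is the "constant added to a large array" case mentioned above. */
void addmod_const(uint64_t *v, size_t n, uint64_t c, uint64_t p)
{
    for (size_t i = 0; i < n; i++)
        v[i] = addmod63(v[i], c, p);
}
```

Whether the compiler turns the mask into a conditional move or vectorises the loop depends on the compiler and flags, so it is worth timing this against the plain `if' version before reaching for assembler.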

Quote:
 Btw, for the 3^16 search, your program says it can calculate 6 million primes/sec. But it takes about 100 sec to do 1 billion. So it is looking at approx 600 million primes per billion. But there are not 600 million primes in a billion, as 1/2 the numbers are even. So where is the error in the calculations?
The p/sec figure is just the increase in p per second, not the number of primes calculated. I know it is not good notation :-(

The reason testing 3^16*2^(16*n)+1 is so fast is that most of the primes don't even have to be generated: the Sieve of Eratosthenes sieves only numbers of the form 32*x+1 and doesn't even look at the others.
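
The restriction to 32*x+1 works because 3^16*2^(16*n)+1 = (3*2^n)^16 + 1, so any prime p dividing it makes 3*2^n an element of order 32 modulo p, which forces p = 1 (mod 32). A rough sketch of a sieve that stores only that residue class is given below. It is not gcwsieve's code: the function name count_primes_1_mod_32 is invented for illustration, and for brevity it strikes with every odd q up to sqrt(limit) rather than with primes only.

```c
#include <stdint.h>
#include <stdlib.h>

/* Count primes of the form 32*x + 1 below limit, storing one flag per
   candidate 32*x + 1 instead of one per integer.
   Returns -1 if memory cannot be allocated. Illustrative sketch only. */
long count_primes_1_mod_32(uint32_t limit)
{
    /* Number of x with 32*x + 1 < limit. */
    uint64_t n = ((uint64_t)limit + 30) / 32;
    if (n == 0)
        return 0;

    unsigned char *composite = calloc(n, 1);
    if (composite == NULL)
        return -1;
    composite[0] = 1; /* x = 0 is the candidate 1, which is not prime */

    for (uint64_t q = 3; q * q < limit; q += 2) {
        /* Find the first x with 32*x + 1 divisible by q; one exists
           because q is odd and therefore coprime to 32. */
        uint64_t x = 0;
        while ((32 * x + 1) % q != 0)
            x++;
        if (32 * x + 1 == q)
            x += q; /* do not strike q itself */
        /* Stepping x by q keeps 32*x + 1 divisible by q. */
        for (; x < n; x += q)
            composite[x] = 1;
    }

    long count = 0;
    for (uint64_t x = 0; x < n; x++)
        if (!composite[x])
            count++;

    free(composite);
    return count;
}
```

With limit = 1000 the flag array has only 32 entries, and the survivors are 97, 193, 257, 353, 449, 577, 641, 673, 769 and 929.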
