#1
Sep 2004
2×5×283 Posts
Geoff,
How do I feed the client more ranges without stopping it? I like the srwork.txt feature you have in the other clients. How do I turn SSE2 on?
Best regards,
Carlos
Last fiddled with by em99010pepe on 2007-04-08 at 14:09
#2
Mar 2003
New Zealand
13×89 Posts
I am not sure how much advantage machines other than the Pentium 4 will get from the SSE2 code; it may be worthwhile trying both --sse2 and --no-sse2 to see which is faster.
#3
Sep 2004
2×5×283 Posts
Thanks Geoff.
Carlos
#4
Sep 2004
B0E₁₆ Posts
Geoff,
I want to bring more machines (adding at least 4 cores) to the sieve, but the client needs to:
1º run in hidden mode
2º be able to queue up jobs
3º save its progress to an output file
The way the client is started, with that list of flags, it's very difficult to hide when using HideItX. I prefer the srsieve approach... What's your opinion again? Could you implement the srsieve feature in gcwsieve?
By the way, gcwsieve 1.0.4 is faster than gcwsieve 1.0.6 on my AMD 64, but I still need to do more tests.
Carlos
Last fiddled with by em99010pepe on 2007-04-17 at 17:21
#5
Jun 2003
7×233 Posts
#6
Mar 2003
New Zealand
13×89 Posts
If you create a file called `gcwsieve-command-line.txt' containing one line with the command line you want to use, then run gcwsieve without any command-line arguments, it will read that file and run as if you had typed its first line at the console. Does this help?
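A minimal C sketch of the behaviour described above, for anyone writing a similar wrapper: when the program is started without arguments, read the first line of a settings file and split it into argv-style tokens. The function name read_command_line_file is hypothetical and this is not gcwsieve's actual source; it assumes arguments are separated by spaces or tabs and does not handle quoted arguments.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not taken from gcwsieve: read the first line of
   `path` into buf and split it on spaces/tabs into args[0..n-1].
   Quoted arguments are not supported. Returns the number of tokens,
   or -1 if the file cannot be opened or is empty. */
int read_command_line_file(const char *path, char *buf, size_t bufsize,
                           char **args, int max_args)
{
    assert(bufsize > 1 && max_args > 0);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, (int)bufsize, f);   /* only the first line is used */
    fclose(f);
    if (line == NULL)
        return -1;

    buf[strcspn(buf, "\r\n")] = '\0';           /* strip the line ending */

    int n = 0;
    for (char *tok = strtok(buf, " \t"); tok != NULL && n < max_args;
         tok = strtok(NULL, " \t"))
        args[n++] = tok;                        /* tokens point into buf */
    return n;
}
```

A caller would use it only when argc == 1, and then parse the returned tokens exactly as it would parse argv + 1.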
The 1.0.6 SSE2 code processes 4 terms at a time instead of 2 at a time as in 1.0.4. The idea behind this code was to overcome the long Pentium 4 pipeline, and it does run a lot faster on my P4, but it is possible that this code is slower on the AMD64; it would be interesting to know if that is the case. You could test this by running both versions with the --no-sse2 switch and seeing whether 1.0.6 is still slower.
In 1.0.6 the AMD cache size detection has (hopefully) been fixed. You can run with the old settings from version 1.0.4 by adding the command-line switches -l16 -L256. If you are running multiple instances of the program on an n-core machine, it might be faster to set the L2 cache size to 1/n of its actual size. The L1 cache size probably doesn't have much effect.
Let me know what the results of your experiments are.
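To illustrate why handling more terms per loop iteration can help a deeply pipelined CPU such as the Pentium 4: a modular exponentiation is a serial chain of dependent multiplications, so running several independent chains side by side gives the processor work to overlap while earlier results are still in flight. This is a scalar C sketch with a hypothetical function name, not gcwsieve's SSE2 code; it computes b[k]^e mod p[k] for four moduli at once and assumes every p[k] is below 2^32 so that products fit in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration: r[k] = b[k]^e mod p[k] for k = 0..3.
   Each modulus is its own dependency chain; the four chains are
   advanced together in every step of the square-and-multiply loop.
   Requires 1 < p[k] < 2^32 so that a*b fits in a uint64_t. */
void powmod_4way(const uint64_t b[4], uint64_t e, const uint64_t p[4],
                 uint64_t r[4])
{
    uint64_t base[4], acc[4];
    for (int k = 0; k < 4; k++) {
        assert(p[k] > 1 && p[k] < ((uint64_t)1 << 32));
        base[k] = b[k] % p[k];
        acc[k] = 1;
    }
    for (; e != 0; e >>= 1) {                   /* right-to-left binary method */
        if (e & 1)
            for (int k = 0; k < 4; k++)         /* four independent multiplies */
                acc[k] = acc[k] * base[k] % p[k];
        for (int k = 0; k < 4; k++)             /* four independent squarings */
            base[k] = base[k] * base[k] % p[k];
    }
    for (int k = 0; k < 4; k++)
        r[k] = acc[k];
}
```

Whether four chains beat two depends on the processor's latencies and register count, so timing both variants on each machine is the only reliable way to tell.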
#7
Sep 2004
B0E₁₆ Posts
Geoff,
The 1.0.6 SSE2 code doubled the output of my new work machine, a dual-core P4 3.0GHz: it went from 107 kp/s to 222 kp/s... thank you.
I noticed that on my AMD the highest performance was achieved by the 1.0.4 SSE2 code with the cache settings -l16 -L256. Playing with the cache flags on the 1.0.6 code, I never achieved the performance of the 1.0.4 SSE2 code.
Later today I will check the ability to hide the client.
Cheers,
Carlos
#8
Jun 2003
1631₁₀ Posts
Geoff, so is gcwsieve.exe now faster than multisieve?
#9
Mar 2003
New Zealand
13×89 Posts
For sieving this project with a Pentium 4, it looks like it is :-) Actually, I am sieving with two threads on my 2.9GHz hyperthreaded P4 and getting 475 kp/s total, which is a lot faster per clock than my 800MHz P3, which gets 82 kp/s on the current sieve file.
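As a rough check on the per-clock claim, divide each quoted rate by its clock speed, counting the P4's two hyperthreaded threads together since they share one physical core and assuming both machines are working on the same sieve file:

$$\frac{475\ \text{kp/s}}{2.9\ \text{GHz}} \approx 164\ \frac{\text{kp/s}}{\text{GHz}}, \qquad \frac{82\ \text{kp/s}}{0.8\ \text{GHz}} = 102.5\ \frac{\text{kp/s}}{\text{GHz}}, \qquad \frac{164}{102.5} \approx 1.6$$

So on these figures the P4 does roughly 1.6 times as much sieving per clock cycle as the P3.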
Unfortunately, from Carlos's results it looks like the new SSE2 code is only suited to the P4. Since the only Windows machines I have access to are a P4 and a Celeron D, I can't really compare against MultiSieve on other machines. If you think there is some way the SSE2 code from gcwsieve could be used in MultiSieve, let me know; I may be able to put it into the form of an external function that could be linked by MSC.
Carlos: in a future version I think I'll use the old SSE2 code in the gcwsieve-amd build and only use the new code in the gcwsieve-intel build.
#10
Jun 2003
7×233 Posts
I think this will be useful; I can give it a try. Do you have any fast assembly routines for modular 64-bit addition too? (There are none built into MultiSieve.) I am working on a new algorithm for this project where I can replace the multiplications with additions.
Btw, for the 3^16 search, your program says it can do 6 million primes/sec, but it takes about 100 sec to do 1 billion. That works out to approx. 600 million primes per billion. But there are not 600 million primes in a billion, as half the numbers are even. So where is the error in the calculation?
#11
Mar 2003
New Zealand
485₁₆ Posts
x = a+b; if (x >= p) x -= p;
Two of these can be done in parallel with SSE2, and the `if' can use a conditional move instead of a branch. I think you would need to be adding two large arrays together, or adding a constant to each element of a large array, before hand-written assembler would be much faster than what the C compiler generates.
The reason testing 3^16*2^(16*n)+1 is so fast is that most of the primes don't even have to be generated: the Sieve of Eratosthenes only sieves numbers of the form 32*x+1 and doesn't even look at the others.
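The restriction to 32*x+1 follows from a standard argument: 3^16*2^(16*n)+1 = a^16+1 with a = 3*2^n, so if an odd prime p divides it then a^16 ≡ -1 and a^32 ≡ 1 (mod p). The multiplicative order of a modulo p is therefore exactly 32, and since that order divides p-1, p ≡ 1 (mod 32). Below is a C sketch (hypothetical function name, not the program's code) of a sieve that stores only the candidates 32*x+1: for each small odd prime q it solves 32*x+1 ≡ 0 (mod q) using 2^-1 ≡ (q+1)/2 (mod q), then strikes out every q-th x. Only one integer in 32 is ever examined.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: count the primes 32*x + 1 with 1 <= x <= xmax by
   sieving over x only. Returns -1 if memory cannot be allocated. */
long count_primes_32x1(uint64_t xmax)
{
    assert(xmax >= 1 && xmax < ((uint64_t)1 << 40));
    uint64_t n = 32 * xmax + 1;                 /* largest candidate */
    uint64_t lim = 1;
    while ((lim + 1) * (lim + 1) <= n)
        lim++;                                  /* lim = floor(sqrt(n)) */

    char *small = calloc(lim + 1, 1);           /* small[m] = 1: m composite */
    char *comp = calloc(xmax + 1, 1);           /* comp[x] = 1: 32x+1 composite */
    if (small == NULL || comp == NULL) {
        free(small);
        free(comp);
        return -1;
    }

    for (uint64_t q = 3; q <= lim; q += 2) {
        if (small[q])
            continue;
        for (uint64_t m = q * q; m <= lim; m += 2 * q)
            small[m] = 1;                       /* ordinary sieve for small primes */

        /* 32x + 1 == 0 (mod q)  <=>  x == -(32^-1) (mod q), 2^-1 = (q+1)/2 */
        uint64_t half = (q + 1) / 2, inv32 = 1;
        for (int k = 0; k < 5; k++)
            inv32 = inv32 * half % q;
        uint64_t x = q - inv32;                 /* smallest solution, 1..q-1 */
        if (32 * x + 1 == q)
            x += q;                             /* keep q itself if q == 1 mod 32 */
        for (; x <= xmax; x += q)
            comp[x] = 1;
    }

    long count = 0;
    for (uint64_t x = 1; x <= xmax; x++)
        count += !comp[x];
    free(small);
    free(comp);
    return count;
}
```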
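For the modular addition question in post #11, here is a minimal C sketch of the x = a+b; if (x >= p) x -= p; idiom written without a branch. It assumes a and b are already reduced modulo p and that p < 2^63, so a+b cannot overflow 64 bits; the function names are hypothetical. Whether a given compiler already turns the plain `if' version into a conditional move is worth checking in its assembly output rather than assuming.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch-free form of  x = a+b; if (x >= p) x -= p;
   Assumes a, b < p < 2^63 so that a + b fits in 64 bits. */
static inline uint64_t addmod(uint64_t a, uint64_t b, uint64_t p)
{
    assert(p < ((uint64_t)1 << 63) && a < p && b < p);
    uint64_t x = a + b;
    uint64_t mask = (uint64_t)0 - (uint64_t)(x >= p);  /* all ones if x >= p */
    return x - (p & mask);                             /* subtract p only then */
}

/* The case where hand-tuned code might pay off: a constant added to
   every element of a large array, all modulo p. */
void addmod_array(uint64_t *v, size_t n, uint64_t c, uint64_t p)
{
    for (size_t i = 0; i < n; i++)
        v[i] = addmod(v[i], c, p);
}
```

Compiling with -DNDEBUG removes the precondition check from the inner loop; processing two 64-bit lanes per SSE2 instruction is not shown here.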