mersenneforum.org  

Old 2007-04-08, 14:03   #1
em99010pepe
gcwsieve

Geoff,

How do I feed the client with more ranges without stopping it? I like the srwork.txt thing you have on the other clients.
How do I turn sse2 on?

Best Regards,

Carlos

Old 2007-04-10, 04:26   #2
geoff

Quote:
Originally Posted by em99010pepe
How do I feed the client with more ranges without stopping it? I like the srwork.txt thing you have on the other clients.
I haven't added that feature yet. Unlike with srsieve, it should usually be a little faster to start a new sieve job using the latest sieve file than to continue using the old file anyway. On Linux you can use the batch command to queue up jobs.

Quote:
How do I turn sse2 on?
For the linux-x86_64 binary it is always on. For the 32-bit binaries it should be detected automatically: Running with the --verbose option will print "Using SSE2 code path" if it is being used. You can force it on with the --sse2 switch, or force it off with the --no-sse2 switch. If it is not being detected automatically then that is a bug.

I am not sure how much advantage machines other than the Pentium 4 will get from the SSE2 code, so it may be worthwhile trying both --sse2 and --no-sse2 to see which is faster.
Old 2007-04-10, 20:43   #3
em99010pepe

Thanks Geoff.

Carlos
Old 2007-04-17, 17:20   #4
em99010pepe

Geoff,

I want to bring more machines (adding at least 4 cores) to sieve but the client needs to be run in:

1º hidden mode
2º have the ability to queue up jobs
3º save its progress on an output file

The way the client is started, with that list of flags, it's very difficult to hide when using HideItX. I prefer the srsieve feature....

What's your opinion again? Could you implement the srsieve feature in gcwsieve?
By the way, gcwsieve 1.0.4 is faster than gcwsieve 1.0.6 on my AMD 64 but I still need to make more tests.


Carlos

Old 2007-04-18, 02:45   #5
Citrix

Quote:
Originally Posted by em99010pepe
The way the client is started, with that list of flags, it's very difficult to hide when using HideItX. I prefer the srsieve feature....
It is possible. Create a batch file (.bat) containing the command line and start that batch file with HideItX.
Old 2007-04-18, 05:21   #6
geoff

Quote:
Originally Posted by em99010pepe
1º hidden mode
If you create a file called `gcwsieve-command-line.txt' with one line consisting of the command line you want to use, then run gcwsieve without any command line arguments, it will read that file and run as if you had typed the first line at the console. Does this help?
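
To illustrate the idea of a command-line file, here is a minimal sketch in C of how a program could fall back to the first line of a text file when it is started without arguments. It is not taken from the gcwsieve source: the function names `split_args` and `args_from_file` are hypothetical, and the sketch assumes simple whitespace-separated flags with no quoting.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Split a line into whitespace-separated tokens, in place.
   Returns the number of tokens stored in argv (at most max_args).
   Assumption of this sketch: no quoting or escaping is supported. */
static int split_args(char *line, char **argv, int max_args)
{
    assert(line != NULL && argv != NULL);
    int argc = 0;
    char *tok = strtok(line, " \t\r\n");
    while (tok != NULL && argc < max_args) {
        argv[argc++] = tok;
        tok = strtok(NULL, " \t\r\n");
    }
    return argc;
}

/* Read the first line of the file at path into buf and split it.
   Returns -1 if the file cannot be opened, 0 if it is empty,
   otherwise the number of tokens found. */
static int args_from_file(const char *path, char *buf, int buf_len,
                          char **argv, int max_args)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(buf, buf_len, f);
    fclose(f);
    if (ok == NULL)
        return 0;
    return split_args(buf, argv, max_args);
}
```

A caller would try `args_from_file("gcwsieve-command-line.txt", ...)` only when its own `argc` is 1, and then parse the returned tokens exactly as it would parse a typed command line.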

Quote:
2º have the ability to queue up jobs
I will add a work file facility sometime soon.

Quote:
3º save its progress on an output file
You can do this by creating a copy of the input file, call it sieve.txt say, change the number on the first line to <pmin>, then run `gcwsieve -i sieve.txt -o sieve.txt -f factors.txt -s<minutes_between_saves> -P<pmax>...'. If you stop and restart with exactly the same command line each time, it will continue from the last save point.

Quote:
By the way, gcwsieve 1.0.4 is faster than gcwsieve 1.0.6 on my AMD 64 but I still need to make more tests.
This is interesting :-) There are two changes between 1.0.4 and 1.0.6 that might affect performance:

The 1.0.6 SSE2 code processes 4 terms at a time instead of 2 at a time in 1.0.4. The idea behind this code was to overcome the long Pentium 4 pipeline and it does run a lot faster on my P4, but it is possible that this code is slower on the AMD64 -- it would be interesting if that is the case. You could test this by running both versions with the --no-sse2 switch and see if 1.0.6 is still slower.

In 1.0.6 the AMD cache size detection has (hopefully) been fixed. You can run using the old settings from version 1.0.4 by adding the command line switches -l16 -L256. If you are running multiple instances of the program on an n-core machine then it might be faster to set the L2 cache size to 1/n of its actual size. The L1 cache size probably doesn't have much effect.

Let me know what the results of your experiments are.
Old 2007-04-18, 07:43   #7
em99010pepe

Geoff,

The 1.0.6 SSE2 code doubled the output of my new work machine, a dual core P4 3.0GHz. It went from 107 kp/s to 222 kp/s...thank you.
I noticed that on my AMD the highest performance was achieved by the 1.0.4 SSE2 code with caches -l16 -L256. Playing with the cache flags on the 1.0.6 code, I never matched the performance of the 1.0.4 SSE2 code.
Later today I will check the ability to hide the client.

Cheers,

Carlos
Old 2007-04-18, 22:16   #8
Citrix

Geoff, so is gcwsieve.exe now faster than multisieve?
Old 2007-04-19, 04:17   #9
geoff

Quote:
Originally Posted by Citrix
Geoff, so is gcwsieve.exe now faster than multisieve?
For sieving this project with a Pentium 4 it looks like it is :-) Actually, I am sieving with two threads on my 2.9GHz hyperthreaded P4 and getting 475 kp/s total, which is a lot faster per clock than my 800MHz P3 which gets 82 kp/s on the current sieve file.

Unfortunately from Carlos's results it looks like the new SSE2 code is only suited to the P4. Since the only Windows machines I have access to are P4 and Celeron D, I can't really compare to Multisieve for other machines.

If you think there is some way the SSE2 code from gcwsieve can be used in MultiSieve, let me know. I may be able to put it into the form of an external function that could be linked by MSC.

Carlos: In a future version I think I'll use the old SSE2 code in the gcwsieve-amd build and only use the new code in the gcwsieve-intel build.
Old 2007-04-19, 04:28   #10
Citrix

Quote:
Originally Posted by geoff
If you think there is some way the SSE2 code from gcwsieve can be used in MultiSieve, let me know. I may be able to put it into the form of an external function that could be linked by MSC.

I think this will be useful, I can give it a try. Do you have any fast assembly routines for modular 64 bit addition too? (There are none built into multisieve.) I am working on a new algorithm for this project, where I can replace the multiplications with additions.

Btw, for the 3^16 search, your program says it can calculate 6 million primes/sec. But it takes about 100 sec to do 1 billion. So it is looking at approx 600 million primes per billion. But there are not 600 million primes in a billion, as 1/2 the numbers are even. So where is the error in the calculations?
Old 2007-04-19, 05:32   #11
geoff

Quote:
Originally Posted by Citrix
I think this will be useful, I can give it a try. Do you have any fast assembly routines for modular 64 bit addition too? (There are none built into multisieve.) I am working on a new algorithm for this project, where I can replace the multiplications with additions.
I don't know if there is anything fundamentally faster than this. Assuming 0 <= a,b < p < 2^63:

x = a+b;
if (x >= p) x -= p;

Two can be done in parallel with SSE2, and the `if' can use a conditional move instead of a branch. I think you will need to be adding two large arrays together, or adding a constant to each element of a large array, before hand-written assembler will be much faster than what the C compiler will generate.
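
As a self-contained version of the two lines above, here is a minimal C sketch. The function names `addmod63` and `addmod63_array` are hypothetical, and whether the compiler actually emits a conditional move for the select depends on the compiler and its flags. The array routine shows the "add a constant to each element" case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Modular addition for 0 <= a, b < p < 2^63.
   Under that precondition a + b < 2^64, so the sum cannot wrap around. */
static uint64_t addmod63(uint64_t a, uint64_t b, uint64_t p)
{
    assert(p < (UINT64_C(1) << 63) && a < p && b < p);
    uint64_t x = a + b;
    /* Written as a select so the compiler is free to use a conditional move. */
    return (x >= p) ? x - p : x;
}

/* Add the constant c (mod p) to each of the n elements of v, in place. */
static void addmod63_array(uint64_t *v, size_t n, uint64_t c, uint64_t p)
{
    for (size_t i = 0; i < n; i++)
        v[i] = addmod63(v[i], c, p);
}
```

Compiling with -DNDEBUG removes the precondition check from the inner loop.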

Quote:
Btw, for the 3^16 search, your program says it can calculate 6 million primes/sec. But it takes about 100 sec to do 1 billion. So it is looking at approx 600 million primes per billion. But there are not 600 million primes in a billion, as 1/2 the numbers are even. So where is the error in the calculations?
The p/sec figure is just the increase in p per second, not the number of primes calculated. I know it is not good notation :-(

The reason testing 3^16*2^(16*n)+1 is so fast is that most of the primes don't even have to be generated: the Sieve of Eratosthenes only sieves numbers of the form 32*x+1 and doesn't even look at the others.
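
The 32*x+1 restriction follows from writing the term as y^16+1 with y = 3*2^n: if a prime p divides y^16+1 then y^16 = -1 (mod p), so y has order exactly 32 modulo p, and 32 must divide p-1. The brute-force check below is only a sanity test of that fact for small values, not how gcwsieve works; the function names are hypothetical, and it assumes n >= 1 (for n = 0 the term 3^16+1 is even) and p < 2^32 so that products fit in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Compute base^exp mod m for 0 < m < 2^32, so every product fits in 64 bits. */
static uint64_t powmod(uint64_t base, uint64_t exp, uint64_t m)
{
    assert(m > 0 && m < (UINT64_C(1) << 32));
    uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1)
            result = result * base % m;
        base = base * base % m;
        exp >>= 1;
    }
    return result;
}

/* Return 1 if p divides 3^16 * 2^(16*n) + 1, else 0. */
static int divides_term(uint64_t p, uint64_t n)
{
    uint64_t t = powmod(3, 16, p) * powmod(2, 16 * n, p) % p;
    return (t + 1) % p == 0;
}

/* Count pairs (p, n) with 2 <= p <= limit and 1 <= n <= nmax where p divides
   the term but p is not of the form 32*x+1. The argument above says this is 0. */
static unsigned count_exceptions(uint64_t limit, uint64_t nmax)
{
    unsigned bad = 0;
    for (uint64_t p = 2; p <= limit; p++)
        for (uint64_t n = 1; n <= nmax; n++)
            if (divides_term(p, n) && p % 32 != 1)
                bad++;
    return bad;
}
```

Since only about 1 in 16 odd numbers has the form 32*x+1, a sieve restricted to those candidates advances through the range of p much faster than one that considers every odd number.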
 

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
