mersenneforum.org  

Go Back   mersenneforum.org > Prime Search Projects > Twin Prime Search

Reply
 
Thread Tools
Old 2009-08-11, 14:47   #221
pschoefer
 
pschoefer's Avatar
 
Jan 2007
.de

2·32 Posts
Default

Quote:
Originally Posted by Ken_g6 View Post
I was wondering when someone would ask about Windows. I'll look at making a 32-bit Windows .exe tomorrow. But I don't have a 64-bit Windows. Is there a gcc for 32-bit Windows or 32 or 64-bit Linux that will compile a 64-bit Windows binary? Or does someone want to compile it for me?
There's MinGW-w64 for Windows and Linux. I have it running on a Win64 host, so I could try to compile it, too.
pschoefer is offline   Reply With Quote
Old 2009-08-11, 16:19   #222
biwema
 
biwema's Avatar
 
Mar 2004

1011111012 Posts
Default

I see you intend to sieve 20000 n parallel, but only upt to 10000000. (a total range of 200G). According to this thred this would take 20000 times as long as one individual n.

Did you (or the sieving tool) taks this property into consideration:

Group for example 10 .. 20 n together by moving the exponent to the mantissa. For example:

mantissa exponent
1000001 480000
1000001 480001
1000001 480002
...
1000001 480015
1000001 480016

convert to:
1000001 480000
2000002 480000
4000004 480000
8000008 480000
...
32768032768 480000
65536065536 480000


In that case you need to sieve up to 20 times less N, depending how large the mantissa can be.
So the sieving is onlz 1000 times slower instead of 20000 times.
Sieving would neither take longer nor take more memory in case you use hashtables (recommended anyway because of the large ranges)

Of course, for LLR you can convert back the candidtes that you can still benefit of small mantissa.
biwema is offline   Reply With Quote
Old 2009-08-11, 16:35   #223
kar_bon
 
kar_bon's Avatar
 
Mar 2006
Germany

2×1,433 Posts
Default

Quote:
Originally Posted by biwema View Post
Of course, for LLR you can convert back the candidtes that you can still benefit of small mantissa.
no need for that, LLR will reduce automatically from 65536065536*2^480000-1 to 1000001*2^480016-1!
Attached Thumbnails
Click image for larger version

Name:	LLR.jpg
Views:	79
Size:	24.1 KB
ID:	3971  

Last fiddled with by kar_bon on 2009-08-11 at 16:40
kar_bon is offline   Reply With Quote
Old 2009-08-11, 20:43   #224
Ken_g6
 
Ken_g6's Avatar
 
Jan 2005
Caught in a sieve

5×79 Posts
Default

OK, it's time for the good, the bad, and the ugly!

First, the slightly bad news: I haven't figured out how to compile a 64-bit Windows version. Anyone's welcome to try; but I don't know if Geoff would prefer to do it himself?

Second, the really good news, for 32-bit users on processors with SSE2. I made an SSE2-enabled 32-bit app that's about 75% as fast as the 64-bit app, for many-N sieves. That's about three times as fast as the regular 32-bit app! (I haven't checked which is faster for single-N sieves; but there shouldn't be much difference.) My zipfile now includes both 32-bit versions for both Windows and Linux, and a 64-bit Linux version.

Finally, the ugly - which is what coding biwema's suggestion would be. Right now, checking the second N is about 60-80 times as fast as the first N, because all one has to do is divide the result for the first N by 2 (mod P). This takes two short lines of code.

To do what biwema suggests would first require checking whether even K's between kmax and, say, 65536*kmax, have enough 0's on the right that shifting right would put the K between kmin and kmax. This is unlikely, so it would probably require counting the 0's individually. This is pretty much as slow as doing the above divide, unless there's a processor-specific instruction to count right 0's (there may be in SSE4 or something.)

Even assuming you get that working quickly, you still have to divide by 2^16 (mod P). Of course the way to do this is to multiply by 2^-16 (mod P). I'm sure this would never be faster than the new SSE2 code in 32 bits, and it's not all that fast even with 64 bits. - Edit: strike that, that's a multiply, not an exponentiation. It's still slow, but not that slow. Still, I'm not planning to try this soon.

But if anyone's willing to try it, please, be my guest.

Last fiddled with by Ken_g6 on 2009-08-11 at 21:14
Ken_g6 is offline   Reply With Quote
Old 2009-08-11, 21:00   #225
axn
 
axn's Avatar
 
Jun 2003

3·5·17·19 Posts
Default

Quote:
Originally Posted by Ken_g6 View Post
Second, the really good news, for 32-bit users on processors with SSE2. I made an SSE2-enabled 32-bit app that's about 75% as fast as the 64-bit app, for many-N sieves. That's about three times as fast as the regular 32-bit app! (I haven't checked which is faster for single-N sieves; but there shouldn't be much difference.) My zipfile now includes both 32-bit versions for both Windows and Linux, and a 64-bit Linux version.
Yay I am testing this one as soon as I get back from work.

Quote:
Originally Posted by Ken_g6 View Post
Finally, the ugly - which is what coding biwema's suggestion would be.
Biwema assumed that sieving 20000n in parallel is 20000x slower. But we know that that is not true. So the evaluation needs to be adjusted accordingly.
axn is offline   Reply With Quote
Old 2009-08-11, 21:27   #226
pschoefer
 
pschoefer's Avatar
 
Jan 2007
.de

2·32 Posts
Default

Quote:
Originally Posted by Ken_g6 View Post
First, the slightly bad news: I haven't figured out how to compile a 64-bit Windows version. Anyone's welcome to try; but I don't know if Geoff would prefer to do it himself?
I think I got it. It's running very fast (more than twice as fast as the 0.2.3 I used before) and finding the same factors, so I think it's OK, although I got those warnings:
Code:
gcc -Wall -O3 -DNDEBUG -D_REENTRANT -m64 -I. -I.. -s -o tpsieve-x86_64-windows.exe ..\main.c ..\sieve.c ..\clock.c ..\util.c app.c -lm -lpthread
..\main.c: In function 'thread_fun':
..\main.c:529: warning: cast from pointer to integer of different size
..\main.c: In function 'thread_fun_wrapper':
..\main.c:653: warning: cast from pointer to integer of different size
..\main.c: In function 'main':
..\main.c:755: warning: cast to pointer from integer of different size
It's uploaded here: tpsieve-x86_64-windows.exe
pschoefer is offline   Reply With Quote
Old 2009-08-11, 21:35   #227
Skligmund
 
Skligmund's Avatar
 
Dec 2006
Anchorage, Alaska

4E16 Posts
Default

I'm not savvy with the tpseive as of yet (never even used it), but which is quickest? I would assume linux x64 is faster than win32?
Skligmund is offline   Reply With Quote
Old 2009-08-11, 22:07   #228
Mini-Geek
Account Deleted
 
Mini-Geek's Avatar
 
"Tim Sorbera"
Aug 2006
San Antonio, TX USA

17×251 Posts
Default

Quote:
Originally Posted by Ken_g6 View Post
Second, the really good news, for 32-bit users on processors with SSE2. I made an SSE2-enabled 32-bit app that's about 75% as fast as the 64-bit app, for many-N sieves. That's about three times as fast as the regular 32-bit app! (I haven't checked which is faster for single-N sieves; but there shouldn't be much difference.) My zipfile now includes both 32-bit versions for both Windows and Linux, and a 64-bit Linux version.


So the 32-bit version was only 25% the speed of the 64-bit version? (from: "75% as fast as the 64-bit app" and "three times as fast as the regular 32-bit app") I thought 32-bit versions of math programs, in general, were at worst 50% the speed of the 64-bit versions?
Mini-Geek is offline   Reply With Quote
Old 2009-08-11, 23:28   #229
Ken_g6
 
Ken_g6's Avatar
 
Jan 2005
Caught in a sieve

18B16 Posts
Default

pschoefer: I get those warnings every time too, compiling for 64-bit Linux. It doesn't seem to be a problem. You can expect to find the same factors, but occasionally they'll be in a different order.

Skligmund: Linux vs. Windows shouldn't matter. (In theory, that is; no guarantees.) x64 > SSE2 > 32 bit.

Quote:
Originally Posted by Mini-Geek View Post
So the 32-bit version was only 25% the speed of the 64-bit version? (from: "75% as fast as the 64-bit app" and "three times as fast as the regular 32-bit app") I thought 32-bit versions of math programs, in general, were at worst 50% the speed of the 64-bit versions?
That's the way it is. I don't think there's really a hard and fast rule. Maybe that only applies to 32-bit apps with SSE2?
Ken_g6 is offline   Reply With Quote
Old 2009-08-12, 00:41   #230
axn
 
axn's Avatar
 
Jun 2003

3·5·17·19 Posts
Default

Quote:
Originally Posted by axn View Post
Yay I am testing this one as soon as I get back from work.
Something's wrong :surprised I'm getting 7 times the speed from the 0.2.3 client!!! (190k to 1.3M)
axn is offline   Reply With Quote
Old 2009-08-12, 01:02   #231
Mini-Geek
Account Deleted
 
Mini-Geek's Avatar
 
"Tim Sorbera"
Aug 2006
San Antonio, TX USA

17·251 Posts
Default

Quote:
Originally Posted by axn View Post
Something's wrong :surprised I'm getting 7 times the speed from the 0.2.3 client!!! (190k to 1.3M)
I'm getting an extreme speedup too: 2.481 MPsec up from something like 300 KPsec (this is both cores of my Athlon), which is around 8 times faster! However, this isn't quite comparable, since I was sieving 10G-15G, and am now sieving 55G-80G. (and if it matters, n=485k-490k now instead of 480k-485k) Anyone know what sort of speed up I should expect based on that alone and how much is based on the SSE2 speed up? I'm certainly not complaining, just wanted to check that nothing's going wrong.
Mini-Geek is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
S9 and general sieving discussion Lennart Conjectures 'R Us 31 2014-09-14 15:14
Sieving discussion thread philmoore Five or Bust - The Dual Sierpinski Problem 66 2010-02-10 14:34
Combined sieving discussion ltd Prime Sierpinski Project 76 2008-07-25 11:44
Sieving Discussion ltd Prime Sierpinski Project 26 2005-11-01 07:45
Sieving Discussion R.D. Silverman Factoring 7 2005-09-30 12:57

All times are UTC. The time now is 22:55.

Sat Jan 23 22:55:23 UTC 2021 up 51 days, 19:06, 1 user, load averages: 1.49, 1.60, 1.64

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.