mersenneforum.org Call for volunteers: RSA896
 Register FAQ Search Today's Posts Mark Forums Read

 2012-11-17, 14:12 #1 jasonp Tribal Bullet     Oct 2004 2·3·19·31 Posts Call for volunteers: RSA896 As of SVN823 the Msieve code is almost stable enough to consider a release of v1.51 (the only thing left is deciding what to do with the windows build projects, which will need a lot of fixing after the recent GPU overhaul). A win32 binary is available for your perusal. As a larger-scale test of the code, I propose we run polynomial selection on RSA896. Paul Zimmermann reports his group still has a lot of work left with this task, and using a GPU is much more effective on this size problem than using a CPU. So if you have a graphics card and at least version 4.2 of the Nvidia drivers, consider running a range of poly selection stage 1 for RSA896 with me (I've been at it for 2 weeks now). More specifically, write the following in a file worktodo.ini: Code: 412023436986659543855531365332575948179811699844327982845455626433876445565248426198098870423161841879261420247188869492560931776375033421130982397485150944909106910269861031862704114880866970564902903653658867433731720813104105190864254793282601391257624033946373269391 and then run msieve -v -np1 "stage1_norm=5e33 X,Y" where X and Y is a range of numbers with magnitude 10^12 to 10^15. You should pick a fairly large range since the code will skip over many leading coefficients that are not smooth enough, and multiple machines (say up to 5) can search the same range with a high probability of not duplicating any work. With a GTS450 card (which is pretty low-end) I get about 80000 hits per week, which is about 2MB zipped. So you will periodically want to interrupt a run in progress, restart with X set to start above the last coefficient searched, and msieve.dat.m moved to another filename. The CPU utilization will be very low while a run is in progress, but interactivity in my winXP GUI is noticeably worse, which will be annoying. The binary can be run multithreaded but it does not make a difference with such large problems. Please post here if you have problems or questions. I've only had time to test the GPU-specific engine for Fermi cards, custom engines exist for older cards but I have no idea if they work. Some additional pointers: First, X and Y have to be in decimal, not in scientific notation like older versions used to allow. Also, if building from SVN make sure to use the trunk and not the msieve_faststage1 branch; that branch is now old. Last fiddled with by jasonp on 2012-12-12 at 12:32 Reason: updated norm bound
 2012-11-17, 17:35 #2 pinhodecarlos     "Carlos Pinho" Oct 2011 Milton Keynes, UK 484610 Posts So you don't recommend using only CPU.
 2012-11-17, 18:44 #3 jasonp Tribal Bullet     Oct 2004 2×3×19×31 Posts In my tests, the GPU code is perhaps 100x faster than the CPU code. The CPU code is not multithread-aware but could benefit from multithreading. Actually the CPU code needs to be able to use a thread pool and high-performance radix sorting just like the GPU code does now, after which the CPU and GPU code could probably be mostly unified together. But we're not there yet, and could probably buy a 10x speedup in the CPU code by doing so. Given a choice between a CPU run and nothing, then obviously a CPU run would be better. But currently the situation with stage 1 polynomial selection is exactly analogous to Mersenne trial factoring, vis-a-vis the GPU speedup.
2012-11-17, 18:49   #4
pinhodecarlos

"Carlos Pinho"
Oct 2011
Milton Keynes, UK

2·2,423 Posts

Quote:
 Originally Posted by jasonp In my tests, the GPU code is perhaps 100x faster than the CPU code.
How many cores has your CPU?

 2012-11-17, 19:22 #5 jasonp Tribal Bullet     Oct 2004 2·3·19·31 Posts This is 1 core of a Core2 from 2007 vs a low-end Fermi card from 2010. The GPU finds about one hit every 5-10 seconds, and now I see that the CPU code found two hits after 24 minutes. Neither codebase is optimal, neither set of hardware is very fast in absolute terms and the search itself can find many times more hits if the cutoff for hit quality is loosened. The CPU code would certainly speed up linearly if you threw more cores at it. But currently GPUs have a huge performance advantage when running this algorithm.
 2012-11-17, 20:35 #6 debrouxl     Sep 2009 977 Posts BOINC wrapper integration would enable us to explode the CPU+GPU power working on polynomial selection for RSA-896. I should dig again into my mailbox for the discussions about that, though I don't have time to help with it right now...
2012-11-19, 20:08   #7
frmky

Jul 2003
So Cal

80616 Posts

Quote:
 Originally Posted by jasonp With a GTS450 card (which is pretty low-end) I get about 80000 hits per week, which is about 2MB zipped.
I started at 10^13. On the GTX 480, I got 229 hits in 8 minutes, which extrapolates to a bit under 300000 per week. How many are you wanting to collect?

2012-11-19, 20:54   #8
Dubslow

"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3×29×83 Posts

Quote:
 Originally Posted by frmky I started at 10^13. On the GTX 480, I got 229 hits in 8 minutes, which extrapolates to a bit under 300000 per week. How many are you wanting to collect?
Sheesh, that's a lot. Perhaps we should lower the norm to (e.g.) 5e32.

How will -nps and -npr be run? Should we run the former our selves, as had previously been suggested for BOINCification?

frmky, how fast are you traversing leading-coeff ranges?

I plan to participate, once a) I'm back at school and b) I get my recently RMAd 460.

 2012-11-20, 01:25 #9 jasonp Tribal Bullet     Oct 2004 2·3·19·31 Posts I've computed about 65000 hits so far, and Paul reports computing 500k hits with CADO-NFS. I'm currently covering the 1e13 range as well, so we will probably have a little duplication. The code selects very few leading coefficients in this range, after two weeks I'm up to around (1e13+1e9). I'm also being lazy, in that Paul's group can turn around stage 2 pretty quickly so I'm not bothering with it. A BOINC project would definitely need to do its own size optimization, since I'd expect thousands of good size-optimized polynomials out of tens of millions of stage 1 hits. The upper limit is a good question; we would want millions of hits at least.
 2012-11-20, 03:13 #10 frmky     Jul 2003 So Cal 2·13·79 Posts It's a big project, but if we only want millions of hits I'm not sure it's big or long term enough for BOINC. With two GPU's I will be getting over a half million per week. Starting at 10^13, the leading coefficient has advanced by 21 million in 6.5 hours, with a bit over 10000 hits. Extrapolating, if we want to cover the entire range, this will take 35000 GTX480-years and yield a half-trillion hits totally 12 terabytes. To put this in perspective, NFS@Home uses about 1.5 CPU-years each day.
2012-11-20, 03:16   #11
Dubslow

"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

722110 Posts

Quote:
 Originally Posted by jasonp I've computed about 65000 hits so far, and Paul reports computing 500k hits with CADO-NFS. I'm currently covering the 1e13 range as well, so we will probably have a little duplication. The code selects very few leading coefficients in this range, after two weeks I'm up to around (1e13+1e9). I'm also being lazy, in that Paul's group can turn around stage 2 pretty quickly so I'm not bothering with it. A BOINC project would definitely need to do its own size optimization, since I'd expect thousands of good size-optimized polynomials out of tens of millions of stage 1 hits. The upper limit is a good question; we would want millions of hits at least.
Does CADO support CUDA, or is yours the only such code?

How does lowering the norm affect how many hits you need? How likely would you be to miss an unusually-great poly with too large a norm?

It seems to me that running our own -nps would, at the very least, cut down on how much data we need to transfer. In either case, do we just attach results here and Paul will download them or something?

 Similar Threads Thread Thread Starter Forum Replies Last Post axn GPU Computing 28 2012-05-28 12:05 jasonp Operation Kibibit 200 2011-11-05 21:31 Wacky NFSNET Discussion 13 2005-07-14 00:25 Xyzzy Hardware 23 2003-04-18 23:27 Xyzzy PrimeNet 8 2003-02-27 02:26

All times are UTC. The time now is 20:46.

Mon Jan 18 20:46:23 UTC 2021 up 46 days, 16:57, 0 users, load averages: 3.08, 2.91, 2.42