mersenneforum.org > Factoring Projects > Operation Kibibit
2012-11-17, 14:12   #1
jasonp (Tribal Bullet)

Call for volunteers: RSA896

As of SVN823 the Msieve code is almost stable enough to consider a release of v1.51 (the only thing left is deciding what to do with the Windows build projects, which will need a lot of fixing after the recent GPU overhaul). A win32 binary is available for your perusal.

As a larger-scale test of the code, I propose we run polynomial selection on RSA896. Paul Zimmermann reports his group still has a lot of work left with this task, and using a GPU is much more effective on this size problem than using a CPU. So if you have a graphics card and at least version 4.2 of the Nvidia drivers, consider running a range of poly selection stage 1 for RSA896 with me (I've been at it for 2 weeks now).

More specifically, write the following in a file worktodo.ini:
Code:
412023436986659543855531365332575948179811699844327982845455626433876445565248426198098870423161841879261420247188869492560931776375033421130982397485150944909106910269861031862704114880866970564902903653658867433731720813104105190864254793282601391257624033946373269391
and then run

Code:
msieve -v -np1 "stage1_norm=5e33 X,Y"

where X,Y is a range of leading coefficients with magnitudes between 10^12 and 10^15. You should pick a fairly large range, since the code will skip over many leading coefficients that are not smooth enough, and multiple machines (say up to 5) can search the same range with a high probability of not duplicating any work. With a GTS450 card (which is pretty low-end) I get about 80000 hits per week, which is about 2MB zipped. So you will periodically want to interrupt a run in progress, move msieve.dat.m to another filename, and restart with X set just above the last coefficient searched.
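To make the bookkeeping concrete, here is a minimal sketch of one work cycle (the range endpoints and the saved filename are only examples; it assumes the msieve binary and worktodo.ini are in the current directory):

Code:
# worktodo.ini already holds the RSA-896 digits from the code box above

# search leading coefficients from 2e12 to 3e12, written out in decimal
./msieve -v -np1 "stage1_norm=5e33 2000000000000,3000000000000"

# when interrupting: set the hits aside, then resume just above the last
# coefficient searched (say the run stopped near 2001500000000)
mv msieve.dat.m hits_2e12_part1.m
./msieve -v -np1 "stage1_norm=5e33 2001500000000,3000000000000"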

The CPU utilization will be very low while a run is in progress, but interactivity in my WinXP GUI is noticeably worse, which can be annoying. The binary can run multithreaded, but that does not make a difference on problems this large.

Please post here if you have problems or questions. I've only had time to test the GPU-specific engine for Fermi cards; custom engines exist for older cards, but I have no idea if they work.

Some additional pointers:
First, X and Y have to be in decimal, not in scientific notation like older versions used to allow.
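For example (the bounds here are only placeholders):

Code:
# this older-style form is no longer accepted:
#   msieve -v -np1 "stage1_norm=5e33 1e13,1e14"
# write the endpoints out in full instead:
msieve -v -np1 "stage1_norm=5e33 10000000000000,100000000000000"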

Also, if building from SVN make sure to use the trunk and not the msieve_faststage1 branch; that branch is now old.

Last fiddled with by jasonp on 2012-12-12 at 12:32 Reason: updated norm bound
2012-11-17, 17:35   #2
pinhodecarlos ("Carlos Pinho")

So you don't recommend using only a CPU?
2012-11-17, 18:44   #3
jasonp (Tribal Bullet)

In my tests, the GPU code is perhaps 100x faster than the CPU code. The CPU code is not multithread-aware, though it would benefit from multithreading. What it really needs is a thread pool and high-performance radix sorting, just like the GPU code has now; after that the CPU and GPU code could probably be mostly unified, and doing so would probably buy a 10x speedup in the CPU code. But we're not there yet.

Given a choice between a CPU run and nothing, a CPU run is obviously better. But currently the situation with stage 1 polynomial selection is exactly analogous to Mersenne trial factoring, vis-à-vis the GPU speedup.
2012-11-17, 18:49   #4
pinhodecarlos ("Carlos Pinho")

Quote: Originally Posted by jasonp
In my tests, the GPU code is perhaps 100x faster than the CPU code.
How many cores does your CPU have?
2012-11-17, 19:22   #5
jasonp (Tribal Bullet)

This is 1 core of a Core2 from 2007 vs a low-end Fermi card from 2010. The GPU finds about one hit every 5-10 seconds, and I now see that the CPU code has found two hits after 24 minutes; one hit every ~7 seconds versus one every ~12 minutes is roughly the 100x figure above. Neither codebase is optimal, neither set of hardware is very fast in absolute terms, and the search itself can find many times more hits if the cutoff for hit quality is loosened. The CPU code would certainly speed up linearly if you threw more cores at it. But currently GPUs have a huge performance advantage when running this algorithm.
2012-11-17, 20:35   #6
debrouxl

BOINC wrapper integration would let us unleash a huge amount of CPU+GPU power on polynomial selection for RSA-896. I should dig through my mailbox again for the earlier discussions about that, though I don't have time to help with it right now...
2012-11-19, 20:08   #7
frmky

Quote: Originally Posted by jasonp
With a GTS450 card (which is pretty low-end) I get about 80000 hits per week, which is about 2MB zipped.
I started at 10^13. On the GTX 480, I got 229 hits in 8 minutes, which extrapolates to a bit under 300000 per week. How many do you want to collect?
2012-11-19, 20:54   #8
Dubslow (Basketry That Evening!)

Quote: Originally Posted by frmky
I started at 10^13. On the GTX 480, I got 229 hits in 8 minutes, which extrapolates to a bit under 300000 per week. How many do you want to collect?
Sheesh, that's a lot. Perhaps we should lower the norm to (e.g.) 5e32.
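Everyone would presumably just swap the tighter bound into the same command from the first post:

Code:
msieve -v -np1 "stage1_norm=5e32 X,Y"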

How will -nps and -npr be run? Should we run the former ourselves, as had previously been suggested for BOINCification?

frmky, how fast are you traversing leading-coeff ranges?

I plan to participate once (a) I'm back at school and (b) I get my recently-RMA'd 460 back.
2012-11-20, 01:25   #9
jasonp (Tribal Bullet)

I've computed about 65000 hits so far, and Paul reports computing 500k hits with CADO-NFS. I'm currently covering the 1e13 range as well, so we will probably have a little duplication. The code selects very few leading coefficients in this range; after two weeks I'm only up to around 1e13+1e9.

I'm also being lazy, in that Paul's group can turn around stage 2 pretty quickly so I'm not bothering with it. A BOINC project would definitely need to do its own size optimization, since I'd expect thousands of good size-optimized polynomials out of tens of millions of stage 1 hits.
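For anyone who wants to try the size optimization themselves anyway, something like the following should work; treat it as a sketch, since I haven't spelled out the exact file handling here and the saved hit filename is just an example:

Code:
# feed a saved stage 1 hit file back through size optimization (-nps);
# the stage 1 hits are expected in msieve.dat.m
cp hits_2e12_part1.m msieve.dat.m
./msieve -v -nps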

The upper limit is a good question; we would want millions of hits at least.
2012-11-20, 03:13   #10
frmky

It's a big project, but if we only want millions of hits I'm not sure it's big or long-term enough for BOINC. With two GPUs I will be getting over half a million per week.

Starting at 10^13, the leading coefficient has advanced by 21 million in 6.5 hours, with a bit over 10000 hits. Extrapolating, covering the entire 10^12 to 10^15 range would take about 35000 GTX480-years and yield half a trillion hits totaling 12 terabytes. To put this in perspective, NFS@Home uses about 1.5 CPU-years each day.
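The arithmetic, for anyone who wants to check it (the ~25 bytes per zipped hit is inferred from jasonp's 80000 hits ≈ 2MB figure above, and 8766 is hours per year):

Code:
echo "10^15 / (21*10^6 / 6.5 * 8766)" | bc       # ~35000 GPU-years for the range
echo "10^15 * 10000 / (21*10^6)" | bc            # ~4.8e11 hits (half a trillion)
echo "10^15 * 10000 / (21*10^6) * 25" | bc       # ~1.2e13 bytes, i.e. ~12 TB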
2012-11-20, 03:16   #11
Dubslow (Basketry That Evening!)

Quote: Originally Posted by jasonp
I've computed about 65000 hits so far, and Paul reports computing 500k hits with CADO-NFS. I'm currently covering the 1e13 range as well, so we will probably have a little duplication. The code selects very few leading coefficients in this range; after two weeks I'm only up to around 1e13+1e9.

I'm also being lazy, in that Paul's group can turn around stage 2 pretty quickly so I'm not bothering with it. A BOINC project would definitely need to do its own size optimization, since I'd expect thousands of good size-optimized polynomials out of tens of millions of stage 1 hits.

The upper limit is a good question; we would want millions of hits at least.
Does CADO support CUDA, or is yours the only such code?

How does lowering the norm affect how many hits you need? How likely would you be to miss an unusually-great poly with too large a norm?

It seems to me that running our own -nps would, at the very least, cut down on how much data we need to transfer. In either case, do we just attach results here and Paul will download them or something?