#78 |
"Curtis"
Feb 2005
Riverside, CA
2²·1,319 Posts |
I think we would both benefit from playing with settings and finding polys for a couple of smaller numbers. Perhaps some of the heavy hitters in this forum would like to supply you and me with a couple of C155-C180s to poly search? We can discuss settings, try to learn which stage1 bound produces the highest rate of useful -nps hits per hour of GPU time, etc.
I have a C147 in my own queue; I had just run -np for 5 days to find a poly when this thread began. I tried to apply what we learned last week and restarted the search. 24 hours of -np1 with the stage1 norm set to 2e21 (default is 2.38e22) produced roughly 700MB of hits on a 460M, which will take almost a core-week to size-optimize. This makes me wonder how -np manages to do all 3 steps with just one CPU thread without massively stalling the GPU. So my first tentative guideline is to set the stage 1 norm about 10x tighter than default when running -np1 on its own, and even then the CPU has no chance to keep up. Or does the -nps step work with the -t threads option? If I understand previous advice, I should not bother to -npr more than, say, the 500 best -nps hits?
-Curtis |
#79 |
Tribal Bullet
Oct 2004
3545₁₀ Posts |
500 is a good upper limit. If you use -np, stage 2 is still given one extra thread so that the main thread can concentrate on keeping the GPU busy.
|
#81 |
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
2²·5·491 Posts |
It is hard to tell - the deg6 are almost as good and have a good skew.
I haven't trial-sieved... |
#82 |
I moo ablest echo power!
May 2013
2²×449 Posts |
Curtis, I've taken to doing -np1 -nps, which still screams on the GPU. I let it run for an arbitrary amount of time, stop it, do the sort/top X step, and start an -npr step in a 2nd window. Then I restart the -np1 -nps with a new min_coeff. It is also possible to run -np1 -nps and have -npr running in a separate process at the same time, but then much of the -npr time is essentially wasted, since it isn't restricted to just the top results.
I'm still refining the search, but this process lets me see whether I'm getting any better results, and I can search in chunks of time as I see fit. |
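For anyone wanting to script the sort/top X step, here is a minimal Python sketch. It rests on assumptions not stated in the thread: that each hit in the -nps output file is one whitespace-separated line, that the last field is a numeric size score, and that smaller scores are better. The function name `keep_best_hits` is made up for illustration, and the default of 500 simply follows the upper limit suggested earlier in the thread.

```python
import heapq


def keep_best_hits(lines, top_n=500):
    """Return the top_n hit lines with the smallest last-field score.

    Assumption (not confirmed in the thread): one hit per line,
    whitespace-separated, numeric score in the last field, smaller is
    better. Blank lines and lines that do not parse are skipped.
    """
    def scored(source):
        for line in source:
            fields = line.split()
            if not fields:
                continue
            try:
                score = float(fields[-1])
            except ValueError:
                continue  # not a hit line in the assumed format
            yield score, line.rstrip("\n")

    # nsmallest consumes the generator lazily and holds only top_n
    # candidates, so a very large hit file need not fit in memory.
    best = heapq.nsmallest(top_n, scored(lines), key=lambda pair: pair[0])
    return [line for _, line in best]
```

Passing an open file object as `lines` reads it one line at a time; the returned lines can then be written to a new file for the -npr run. If the score column or its ordering differs in your build, adjust the field index and swap in `heapq.nlargest`.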
#83 |
"Curtis"
Feb 2005
Riverside, CA
2²·1,319 Posts |
Lorgix-
I'll tackle at least one of these once I'm done with the C163 from the other thread. More observations: On the C163, 9 hours of GPU -np1 at the default stage1 norm produced 1GB of hits. I chose a stage2norm tight enough to root-optimize every hit, then ran -np1 -nps on the GPU while a separate -nps run worked through the first 1GB file. 40 hours of processing on the 1GB file (about 1/3 of it) has produced fewer hits than 7 hours of simultaneous -np1 -nps. Based on this outcome, I see no reason to do those first two steps separately. If I were to process the entire file, 9 hours of GPU time + 100 hr of CPU -nps time would produce more than twice as many hits as 9 hours of -np1 -nps. So it seems the -nps thread cannot keep up with the stage 1 hits. How hard would it be to spawn two size-optimizing threads (or n threads?) to handle the GPU output live? |
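To put a rough number on how far a single -nps thread falls behind, the figures in this post can be turned into rates. This is a back-of-the-envelope sketch only: it assumes -nps time scales linearly with file size and that extra threads would scale perfectly, neither of which is claimed in the thread. The function name is hypothetical.

```python
import math


def nps_threads_needed(stage1_gb, stage1_hours, nps_gb, nps_hours):
    """Estimate how many -nps threads would match stage 1 output.

    Assumes linear scaling in file size and in thread count, which is
    an idealization and not something measured in the thread.
    """
    produce_rate = stage1_gb / stage1_hours  # GB of hits per GPU hour
    consume_rate = nps_gb / nps_hours        # GB size-optimized per CPU hour
    ratio = produce_rate / consume_rate
    return ratio, math.ceil(ratio)


# Figures from the post: 1 GB of hits in 9 GPU hours, and about 1/3
# of that file size-optimized in 40 CPU hours.
ratio, threads = nps_threads_needed(1.0, 9.0, 1.0 / 3.0, 40.0)
print(f"stage 1 outpaces one -nps thread by about {ratio:.1f}x "
      f"(roughly {threads} threads to keep up)")
```

With these inputs the ratio comes out a little over 13, so at this composite size and the default stage1 norm, two size-optimizing threads would still leave most of the stage 1 output unprocessed.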
#84 |
Tribal Bullet
Oct 2004
5×709 Posts |
It would take a fair amount of re-engineering to duplicate all the ancillary structures for running multiple independent stage 2 jobs in separate threads. For larger jobs you can count on stage 1 hits to happen infrequently enough that -nps can keep up with a GPU, but at the C155-C163 size you get tons of output from stage 1 almost irrespective of what you do, and more threads will not get stage 2 to keep up.
jrk added a preliminary pass to -nps that reportedly makes degree 5 a lot faster, but that code is in an experimental branch, along with lots of other polynomial selection changes, that I don't have time to work on now. |
#85 |
"Curtis"
Feb 2005
Riverside, CA
2²·1,319 Posts |
Thanks for the reply. The whole process is so fast that complaining that the -nps stage can't keep up with unbelievably fast GPU code is hardly fair. Good to know, too, that this experience does not apply to all composite sizes.
|
#86 |
Sep 2010
Scandinavia
3·5·41 Posts |
I tried to use GPU poly select on a small test case (RSA-number).
What's wrong here? |
#87 |
Sep 2009
2×3×163 Posts |
I often see this error. You need to have a copy or symlink of one of the files suitable for the GPU in the current working directory. But I don't currently have access to the computer with a GPU, so I can't check and post the file name...
|
#88 |
Sep 2010
Scandinavia
3·5·41 Posts |
I took a guess and placed a copy of the PTX file (I don't actually know what that is) in the working directory. This got the poly selection to run, looking mostly like this screenshot, but msieve.dat.m remained empty. |
|