mersenneforum.org  

2017-10-02, 07:37   #12
usermode
Sep 2017
3² Posts

wombatman, thank you.
But something strange is happening.
Today I tried starting the GPU version with these parameters:
msieve -g 0 -v -np 1,4000 -t 4
for a 256-bit N:
84995282910877845319177434936754876201592264917464386708389127475187790029013

Everything works now, with both your sortlib and the original one, but I don't see anything in the log saying that the video card was detected. It runs at the same speed with or without the "-g 0" flag.

I wrote a 100-digit (331-bit) test value to worktodo.ini:
2881039827457895971881627053137530734638790825166127496066674320241571446494762386620442953820735453
and again got CUDA_ERROR_FILE_NOT_FOUND. The modified sortlib engine didn't help.

I need to factor a 512-bit value, and my CPU takes about 14 minutes per 0.1%. I thought it would be a little faster with my GPU, but most likely it is currently incompatible with CUDA arch 6.1.

Last fiddled with by usermode on 2017-10-02 at 07:43
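For anyone converting between the bit sizes and digit counts used in this thread, a quick Python sketch (not from the thread itself) gives the range of decimal digit counts for numbers of a given bit length. A 512-bit number has 154 or 155 digits, and a 256-bit number has 77 or 78.

```python
def digit_range_for_bits(bits: int) -> tuple[int, int]:
    """Return (min, max) decimal digit counts for a number with exactly `bits` bits."""
    smallest = 1 << (bits - 1)   # 2**(bits-1), the smallest bits-bit number
    largest = (1 << bits) - 1    # 2**bits - 1, the largest bits-bit number
    return len(str(smallest)), len(str(largest))

print(digit_range_for_bits(512))  # (154, 155)
print(digit_range_for_bits(256))  # (77, 78)
```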
2017-10-02, 16:03   #13
chris2be8
Sep 2009
2·1,021 Posts

84995282910877845319177434936754876201592264917464386708389127475187790029013 is 77 digits, and msieve won't use NFS for numbers with fewer than 85 digits; it just uses the quadratic sieve on them. So it won't have tried to use the GPU.

2881039827457895971881627053137530734638790825166127496066674320241571446494762386620442953820735453 is a reasonable test case where it will try to use the GPU.
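This is easy to check for the two inputs in this thread. The Python sketch below counts digits and bits and applies the 85-digit cutoff described above. The cutoff comes from this post, not from msieve's source, so treat `NFS_MIN_DIGITS` as an assumption.

```python
# 85-digit cutoff as described in the post (an assumption, not read from msieve's source).
NFS_MIN_DIGITS = 85

SMALL = 84995282910877845319177434936754876201592264917464386708389127475187790029013
TEST = 2881039827457895971881627053137530734638790825166127496066674320241571446494762386620442953820735453

def size_of(n: int) -> tuple[int, int]:
    """Return (decimal digits, bits) of a positive integer."""
    return len(str(n)), n.bit_length()

def method_for(n: int) -> str:
    """Sieve choice under the assumed cutoff; only the NFS path would touch the GPU."""
    digits, _ = size_of(n)
    return "NFS" if digits >= NFS_MIN_DIGITS else "QS"

for n in (SMALL, TEST):
    digits, bits = size_of(n)
    print(f"{digits} digits, {bits} bits -> {method_for(n)}")
```

For these two numbers it reports 77 digits / 256 bits (QS) and 100 digits / 331 bits (NFS).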

Or you could give up on the GPU and use msieve on the CPU for polynomial selection. The speed difference is probably less than the time you've spent trying to get the GPU to work.

@jasonp, why doesn't msieve say the name of the file it can't find? That would save a lot of puzzling.

Chris

Last fiddled with by chris2be8 on 2017-10-02 at 16:05
2017-10-03, 14:34   #14
jasonp
Tribal Bullet
Oct 2004
3³×131 Posts

That error is from CUDA, which also doesn't say the file name it was looking for. It could actually refer to either their driver or a CUBIN section of a DLL that is supposed to have the GPU code for the relevant card.

I have seen cases where a modern enough Pascal card requires compiling with the CUDA 8.0 toolkit; otherwise it won't work, even if it doesn't crash.

Last fiddled with by jasonp on 2017-10-03 at 14:36
2017-10-05, 21:16   #15
Jeff Gilchrist
Jun 2003
Ottawa, Canada
1173₁₀ Posts

I just tried running the factmsieve.py script from my example page with the msieve GPU binary that Brian Gladman compiled and is hosted on my website (http://gilchrist.ca/jeff/factoring/) and it works fine on my GTX 1070 card. It just takes a *really* long time for polynomial selection (5 hours vs 1 minute using 4 thread CPU).

@usermode do you have cudart64_80.dll in your system path or in the same folder as msieve? If not you can download it from here https://developer.nvidia.com/cuda-downloads
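A rough Python sketch of that check looks for the DLL in the msieve folder and in each PATH directory. Windows also searches other places, such as the system directories, so an empty result is a hint rather than proof. The function name is made up for illustration.

```python
import os
from pathlib import Path

def find_dll(name: str, exe_dir: str) -> list[Path]:
    """Return each copy of `name` found in exe_dir or in a directory listed on PATH.

    Only the executable's folder and PATH are checked; Windows searches
    further locations too, so this is a partial check.
    """
    dirs = [exe_dir] + os.environ.get("PATH", "").split(os.pathsep)
    hits: list[Path] = []
    seen: set[str] = set()
    for d in dirs:
        if not d:
            continue                      # skip empty PATH entries
        candidate = Path(d) / name
        key = os.path.normcase(str(candidate))
        if key in seen:
            continue                      # same directory listed twice
        seen.add(key)
        if candidate.is_file():
            hits.append(candidate)
    return hits
```

Usage: `find_dll("cudart64_80.dll", r"C:\path\to\msieve")`, where the folder is a placeholder for wherever msieve.exe lives.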

Code:
Thu Oct 05 10:00:57 2017 -> factmsieve.py (v0.86)
Thu Oct 05 10:00:57 2017 -> This is client 1 of 1
Thu Oct 05 10:00:57 2017 -> Running on 4 Cores with 1 hyper-thread per Core
Thu Oct 05 10:00:57 2017 -> Working with NAME = example
Thu Oct 05 10:00:57 2017 -> Running polynomial selection ...
Thu Oct 05 10:00:57 2017 -> msieve -s ..\GPU\example.dat -l ..\GPU\example.log -i ..\GPU\example.ini -nf ..\GPU\example.fb -g 0 -v -np 1,4000 -t 4
Thu Oct  5 10:00:57 2017  
Thu Oct  5 10:00:57 2017  
Thu Oct  5 10:00:57 2017  Msieve v. 1.53 (SVN 998)
Thu Oct  5 10:00:57 2017  random seeds: 830ea260 d235ef08
Thu Oct  5 10:00:57 2017  factoring 2881039827457895971881627053137530734638790825166127496066674320241571446494762386620442953820735453 (100 digits)
Thu Oct  5 10:00:57 2017  searching for 15-digit factors
Thu Oct  5 10:00:58 2017  commencing number field sieve (100-digit input)
Thu Oct  5 10:00:58 2017  commencing number field sieve polynomial selection
Thu Oct  5 10:00:58 2017  polynomial degree: 4
Thu Oct  5 10:00:58 2017  max stage 1 norm: 1.58e+17
Thu Oct  5 10:00:58 2017  max stage 2 norm: 3.44e+15
Thu Oct  5 10:00:58 2017  min E-value: 8.85e-09
Thu Oct  5 10:00:58 2017  poly select deadline: 1317
Thu Oct  5 10:00:58 2017  time limit set to 0.37 CPU-hours
Thu Oct  5 10:00:58 2017  expecting poly E from 1.43e-08 to > 1.64e-08
Thu Oct  5 10:00:58 2017  searching leading coefficients from 1 to 4000
Thu Oct  5 10:00:58 2017  using GPU 0 (GeForce GTX 1070)
Thu Oct  5 10:00:58 2017  selected card has CUDA arch 6.1
Thu Oct  5 15:17:29 2017  polynomial selection complete
Thu Oct  5 15:17:29 2017  R0: -1191805077826652345824255
Thu Oct  5 15:17:29 2017  R1: 1949275902691
Thu Oct  5 15:17:29 2017  A0: -900094273514840852683747752
Thu Oct  5 15:17:29 2017  A1: -7337844764575786222070
Thu Oct  5 15:17:29 2017  A2: -3360162038991689
Thu Oct  5 15:17:29 2017  A3: 258820560
Thu Oct  5 15:17:29 2017  A4: 1428
Thu Oct  5 15:17:29 2017  skew 1641629.80, size 1.033e-13, alpha -5.078, combined = 1.251e-08 rroots = 2
Thu Oct  5 15:17:29 2017  elapsed time 05:16:32
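On the earlier question of how to tell whether the card was detected: in the log above, GPU use shows up as the `using GPU 0 (...)` and `selected card has CUDA arch ...` lines. A small Python sketch can scan a log for those two messages. The message text is copied from this log, and other msieve versions may word them differently.

```python
import re

GPU_LINE = re.compile(r"using GPU (\d+) \((.+)\)")
ARCH_LINE = re.compile(r"selected card has CUDA arch (\d+\.\d+)")

def gpu_info(log_text: str):
    """Return (gpu_index, card_name, cuda_arch) from an msieve log, or None if no GPU line."""
    gpu = GPU_LINE.search(log_text)
    if gpu is None:
        return None                       # no sign the GPU path was used
    arch = ARCH_LINE.search(log_text)
    return int(gpu.group(1)), gpu.group(2), arch.group(1) if arch else None
```

Usage: `gpu_info(open("example.log").read())`. For this log it would report GPU 0, the GeForce GTX 1070, and arch 6.1. A QS run such as the 77-digit case above would return None.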
2017-10-05, 21:23   #16
Jeff Gilchrist
Jun 2003
Ottawa, Canada
3·17·23 Posts

The only problem I had was that when msieve finally finished GPU poly selection, there was an error and the script stopped (could not open the msieve.dat.p file). The files had actually been written, so I was able to just re-run the script and it continued from the right spot.

During the long poly run the msieve.dat.p file appeared to be 0 bytes, but at the end it had about 15 MB of data in it, so I'm not sure whether there is an issue with it not being flushed to disk before the end. Any ideas, @jasonp?

Last fiddled with by Jeff Gilchrist on 2017-10-05 at 21:51
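Until the cause is known, one possible workaround for a driver script is to wait until the output file exists and has stopped growing before opening it. This Python sketch is a hypothetical helper, not part of factmsieve.py, and the timeout and polling values are arbitrary.

```python
import os
import time

def wait_for_stable_file(path: str, timeout: float = 60.0, poll: float = 1.0) -> bool:
    """Return True once `path` exists, is non-empty, and its size is unchanged
    between two consecutive polls; return False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1                     # file not there yet
        if size > 0 and size == last_size:
            return True                   # non-empty and no longer growing
        last_size = size
        time.sleep(poll)
    return False
```

A script could call `wait_for_stable_file("msieve.dat.p")` before opening the file instead of failing immediately.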
2017-10-05, 21:48   #17
usermode
Sep 2017
3² Posts

Quote:
Originally Posted by Jeff Gilchrist View Post
I just tried running the factmsieve.py script from my example page with the msieve GPU binary that Brian Gladman compiled and is hosted on my website (http://gilchrist.ca/jeff/factoring/) and it works fine on my GTX 1070 card. It just takes a *really* long time for polynomial selection (5 hours vs 1 minute using 4 thread CPU).

@usermode do you have cudart64_80.dll in your system path or in the same folder as msieve? If not you can download it from here https://developer.nvidia.com/cuda-downloads

Thanks! Now everything works! Just a few questions; I'm using msieve153dev_svn1002_win64_cuda:
1. I currently have a previous CPU factoring run that is 26% done. If I continue the process with GPU support, will the CPU results be continued or recreated?
2. The CPU is not fully loaded now (4 of 8 threads are only at 40-50% load) and there is only one msieve.exe process. Is that normal?
3. Which build is better for an i7-6700K CPU: msieve.gpu.core2 or msieve.gpu.ivybridge?
4. Oh, ~5+ hours with the GPU for a 100-digit number. With the CPU it takes 0.35 hours for me. I need to factor a 155-digit number; will it be faster with the CPU?

Last fiddled with by usermode on 2017-10-05 at 22:17
2017-10-05, 23:52   #18
wombatman
I moo ablest echo power!
May 2013
5·349 Posts

Quote:
Originally Posted by usermode View Post
Thanks! Now everything works! Just a few questions; I'm using msieve153dev_svn1002_win64_cuda:
1. I currently have a previous CPU factoring run that is 26% done. If I continue the process with GPU support, will the CPU results be continued or recreated?
2. The CPU is not fully loaded now (4 of 8 threads are only at 40-50% load) and there is only one msieve.exe process. Is that normal?
3. Which build is better for an i7-6700K CPU: msieve.gpu.core2 or msieve.gpu.ivybridge?
4. Oh, ~5+ hours with the GPU for a 100-digit number. With the CPU it takes 0.35 hours for me. I need to factor a 155-digit number; will it be faster with the CPU?
Just for fun, try this:
Code:
(msieve file name here) -s ..\GPU\example.dat -l ..\GPU\example.log -i ..\GPU\example.ini -nf ..\GPU\example.fb -g 0 -v -np1 1,4000 -t 4
This will only perform the first step of polynomial selection, the part that is done on the GPU. The other two steps (size and root optimization) run single-threaded on the CPU, which is why you see low CPU usage. See if running the above command line is faster. If so, it will confirm that your GPU is being limited by the CPU. It'll be a good start on learning how to use msieve most efficiently.

Last fiddled with by wombatman on 2017-10-05 at 23:54
2017-10-06, 06:01   #19
usermode
Sep 2017
3² Posts

Quote:
Originally Posted by wombatman View Post
Just for fun, try this:
Code:
(msieve file name here) -s ..\GPU\example.dat -l ..\GPU\example.log -i ..\GPU\example.ini -nf ..\GPU\example.fb -g 0 -v -np1 1,4000 -t 4
This will only perform the first step of polynomial selection, the part that is done on the GPU. The other two steps (size and root optimization) run single-threaded on the CPU, which is why you see low CPU usage. See if running the above command line is faster. If so, it will confirm that your GPU is being limited by the CPU. It'll be a good start on learning how to use msieve most efficiently.
Thanks, with these parameters the CPU and GPU are loaded better (I set -t 8), but at the end I get an error for the 100-digit test number 2881039827457895971881627053137530734638790825166127496066674320241571446494762386620442953820735453:
https://www.upload.ee/image/7529357/err2.png
error generating or reading NFS polynomials

The "GPU" directory contains the following files:
example.log
example.dat.m (157.7 Mb)
example.ini

Which step is needed next?

Last fiddled with by usermode on 2017-10-06 at 06:10
2017-10-06, 13:10   #20
wombatman
I moo ablest echo power!
May 2013
1745₁₀ Posts

Quote:
Originally Posted by usermode View Post
Thanks, with these parameters the CPU and GPU are loaded better (I set -t 8), but at the end I get an error for the 100-digit test number 2881039827457895971881627053137530734638790825166127496066674320241571446494762386620442953820735453:
https://www.upload.ee/image/7529357/err2.png
error generating or reading NFS polynomials

The "GPU" directory contains the following files:
example.log
example.dat.m (157.7 Mb)
example.ini

Which step is needed next?
First, I would suggest running
Code:
(msieve_file_name) --help
That will show you the wide array of command line options you have. Also, go through the readme file for more details on what each does.

Second, that's not actually an error. It just means you've only done one part of the polynomial selection. Your next step is to run the same command line, but replace "-np1 1,4000 -t 4" with "-nps -npr". This will run the size and root optimization of the candidates you generated (in the file example.dat.m). Note again that this step is single-threaded, so it will take a bit longer.

Once you've finished this step, start playing with the command line parameters related to these steps (stage1_norm, stage2_norm, min_evalue, and so on).
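The two-run workflow above can be scripted. This Python sketch only assembles the argument lists from the commands in this post; the executable name and the Windows-style paths are placeholders. Exit codes are not checked because msieve printed "error generating or reading NFS polynomials" after a -np1-only run earlier in this thread.

```python
import subprocess

MSIEVE = "msieve"  # placeholder: path to your msieve build
# Options shared by both runs, copied from the command lines in this thread.
COMMON = ["-s", r"..\GPU\example.dat", "-l", r"..\GPU\example.log",
          "-i", r"..\GPU\example.ini", "-nf", r"..\GPU\example.fb", "-g", "0", "-v"]

def stage1_args(coeff_range: str = "1,4000", threads: int = 4) -> list[str]:
    """GPU stage 1 (-np1) over a leading-coefficient range; writes candidates to example.dat.m."""
    return [MSIEVE, *COMMON, "-np1", coeff_range, "-t", str(threads)]

def stage2_args() -> list[str]:
    """Size and root optimization (-nps -npr) of the stage 1 candidates; single-threaded."""
    return [MSIEVE, *COMMON, "-nps", "-npr"]

def run_poly_selection() -> None:
    # Deliberately no check=True: a non-zero exit after stage 1 alone may not be fatal.
    subprocess.run(stage1_args())
    subprocess.run(stage2_args())
```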
2017-10-06, 17:05   #21
VBCurtis
"Curtis"
Feb 2005
Riverside, CA
3⁴×59 Posts

Quote:
Originally Posted by wombatman View Post
Once you've finished this step, start playing with the command line parameters related to these steps (stage1_norm, stage2_norm, min_evalue, and so on).
To expand on this:
The default msieve settings were created before GPU code was written, so the GPU first stage (-np1) generates more data than is efficient for -nps and -npr to handle. For instance, 3 hrs of -np1 GPU work might generate 8 hours of -nps, and the -nps step might generate 12+ hrs of -npr work.

You can order the -nps output by score ("sort" command in linux) and only run -npr on the top 100 or 200 results to save time, or you can adjust the msieve settings with the above flags. I alter stage1_norm and stage2_norm by first looking at the default settings in msieve.log, and then dividing stage 1 by 8 to 10 and dividing stage 2 by 25 to 35. This reduces output enough to make the -nps and -npr steps comparable in length to the -np1 GPU-enabled step, and saves me from having to sort/edit the -nps output.
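That rule of thumb is simple arithmetic on the defaults msieve prints in its log. This Python sketch reads the `max stage 1 norm` and `max stage 2 norm` lines (format copied from Jeff's log earlier in the thread) and applies the suggested divisors. For the exact syntax for passing stage1_norm and stage2_norm back to msieve, see msieve's readme or `--help`.

```python
import re

NUM = r"([0-9.eE+-]+)"

def default_norms(log_text: str) -> tuple[float, float]:
    """Extract the default stage 1 and stage 2 norms printed in an msieve log."""
    s1 = re.search(r"max stage 1 norm: " + NUM, log_text)
    s2 = re.search(r"max stage 2 norm: " + NUM, log_text)
    if s1 is None or s2 is None:
        raise ValueError("norm lines not found in log")
    return float(s1.group(1)), float(s2.group(1))

def tightened_norms(stage1: float, stage2: float,
                    div1: float = 9.0, div2: float = 30.0) -> tuple[float, float]:
    """Divide stage 1 by 8 to 10 and stage 2 by 25 to 35 (defaults are mid-range)."""
    return stage1 / div1, stage2 / div2
```

With Jeff's 100-digit log (1.58e+17 and 3.44e+15), the defaults give roughly 1.76e16 and 1.15e14.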

Note that the -t thread count in -np1 is the number of threads feeding work to the GPU, *not* CPU threads. The CPU handles some overhead to manage the data generated by the GPU, and for small jobs or fast cards, setting threads to 3 or 4 can utilize the GPU more fully by reducing the time it spends waiting. In no case does this need to be 10+ threads!
2017-10-06, 17:42   #22
wombatman
I moo ablest echo power!
May 2013
5·349 Posts

Quote:
Originally Posted by VBCurtis View Post
To expand on this:
The default msieve settings were created before GPU code was written, so the GPU first stage (-np1) generates more data than is efficient for -nps and -npr to handle. For instance, 3 hrs of -np1 GPU work might generate 8 hours of -nps, and the -nps step might generate 12+ hrs of -npr work.

You can order the -nps output by score ("sort" command in linux) and only run -npr on the top 100 or 200 results to save time, or you can adjust the msieve settings with the above flags. I alter stage1_norm and stage2_norm by first looking at the default settings in msieve.log, and then dividing stage 1 by 8 to 10 and dividing stage 2 by 25 to 35. This reduces output enough to make the -nps and -npr steps comparable in length to the -np1 GPU-enabled step, and saves me from having to sort/edit the -nps output.

Note that the -t thread count in -np1 is the number of threads feeding work to the GPU, *not* CPU threads. The CPU handles some overhead to manage the data generated by the GPU, and for small jobs or fast cards, setting threads to 3 or 4 can utilize the GPU more fully by reducing the time it spends waiting. In no case does this need to be 10+ threads!

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.