
mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Msieve (https://www.mersenneforum.org/forumdisplay.php?f=83)
-   -   Low CPU/GPU usage? (https://www.mersenneforum.org/showthread.php?t=21993)

GeoffreyY 2017-02-01 19:36

Low CPU/GPU usage?
 
Hi, just started learning to use msieve, trying to run on a 170 digit number, but I notice CPU and GPU usage are both quite low. Is this normal?

GeoffreyY 2017-02-01 21:45

Sorry, apparently I forgot the attachment or something:

[url]https://imgur.com/a/g3xFM[/url]

I am in fact running factmsieve.py, after following the instructions from that guide. I should be using the CUDA version if I set it up correctly. I've only just started the process, so I'm still in the poly selection stage.

Here's the log file, if it's of any use:

[url]http://pastebin.com/0jitawas[/url]

wombatman 2017-02-01 21:53

The GPU is used only for part of the -np step (-np1, if you look at msieve -h), so it's not being used 100% of the time. Not sure about the CPU; I think the other parts of -np (-nps and -npr) are multithreaded, though maybe -npr is not.

Also, based on your filename, are you trying to crack an RSA number? Might want to explain what you're doing...

GeoffreyY 2017-02-01 22:10

Since you mentioned them, could you explain the flags / process in a bit more depth? I'm still reading up on NFS, very slowly...

As for the "RSA": we're learning cryptography in our computing class, and the 170-digit number is the example modulus used. I'm taking this chance to try to crack it while reading up on some extra maths.

I believe the standard for RSA nowadays is (or rather [I]should be[/I]) 2048 bits anyway, not this unusual 170-digit size.

wombatman 2017-02-01 22:17

Truthfully, there are others on here (including MSieve's creator) who can explain it better than I can, so I'll leave that to them.

I only asked about the RSA name because there have previously been others on here trying to crack keys. As you might imagine, people are less inclined to help them.

Also, that C170 is going to take a while to factor using GNFS. I'm working on a C150 (with 6 threads), and that will take me a week or so. I believe the general rule of thumb is that time to factor via GNFS roughly doubles for each 5-digit increase, so you're looking at something like 4 months.

GeoffreyY 2017-02-01 22:30

Thanks for the info, I'll keep looking around!

I wasn't sure how long it would take; your 4-month estimate is a bit high for me. My spring term ends at the end of March, and if I don't expect to finish cracking it by then, I'll probably stop the project and go back to finding primes.

I may be able to use my uni's computers to speed up the process a bit, but I'm not sure how effective that would be.

VBCurtis 2017-02-01 22:43

Expanding on Wombatman's info:

Poly select can be run for as long as you like; I suggest about a GPU-week as a reasonable balance (that is, if you spent only 3 days on poly search, you might spend 4 extra days on the other, longer steps compared to a better poly that the extra 4 days *might* find- it's guesswork and managing expectations).

You can run that step on different ranges of leading coefficients on multiple machines (say, 1 day each on 7 machines, with each one assigned a range on the command line). Once poly select is done, you'd invoke factmsieve.py with the proper number of threads for your machine's hardware. For best speed, DO take advantage of hyperthreading; a quad-core with HT should have 8 threads running.

Speed: I finished a GNFS-156 on a 6-core i7 in almost exactly 7 days. So, using the "double every 5 digits" rule of thumb, you can expect 7-8 weeks if you have 6 cores available, or 10-12 weeks if you have just a single quad-core. You can definitely split the task over two or more machines; just make each machine do its own region of q-values.
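That "double every 5 digits" rule can be turned into a quick estimator. This is only a back-of-envelope sketch using the GNFS-156-in-7-days-on-6-cores data point quoted above as the baseline; the linear core scaling is an extra simplifying assumption, so treat the output as a ballpark, not a prediction:

```python
def gnfs_days(digits, cores=6, ref_digits=156, ref_days=7.0, ref_cores=6):
    """Crude GNFS wall-time estimate: time roughly doubles per 5-digit
    increase and scales inversely with core count (both approximations)."""
    scale = 2.0 ** ((digits - ref_digits) / 5.0)
    return ref_days * scale * (ref_cores / cores)

print(round(gnfs_days(170, cores=6)))  # C170 on 6 cores: about 7 weeks
print(round(gnfs_days(170, cores=4)))  # C170 on a quad-core: about 10 weeks
```

The same formula reproduces chris2be8's later observation that each 10-digit step (RSA100, RSA110, ...) takes about 4 times as long as the previous one.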

If you look at the command that factmsieve issues to the command line as it runs, you can figure out how to manually invoke lasieve4i15e on any number of machines. It takes a little practice/trial and error; I suggest you try to factor something around 130 digits and mess with command lines and editing q-values and renaming outputs to spairs.add to see how the script adapts to these things.

If you post the number here, I'll give my GPU a few hours of poly select on it, and can explain what score means for your expected sieving time.

Note the stage after sieving, when you solve the matrix, will be done on one machine only. Expect a few days for that last step (unless you have a really fast machine like a Xeon or 6-core, which might reduce that to 50-65 hr).

Good luck!

GeoffreyY 2017-02-01 23:04

If you're interested, here's the information for our given RSA encryption: [url]http://pastebin.com/QTC4LpvJ[/url]

I'll mess around a bit more with the code on uni's computer tomorrow.

I've figured out that the e-score(?) is the 10^-13 value in the .dat.p file; "rroots" is probably the number of real roots of the polynomial, and the smallest integer parameter is a good indication of progress.

I'm progressing, but struggling quite a bit to understand ggnfs-doc.pdf, as I barely know any group theory. On the bright side, I'll have quite a head start next year :smile:

Again, thank you everybody for the information!

VBCurtis 2017-02-02 04:06

I admit I don't grok the NFS algorithm as well as I'd like, but for your purposes it's quite useful to understand the stages and what progress looks like.

I'm running poly select on two GPUs this evening on your number; I'll let them each run a couple hours and post the best poly I find here (so you can compare what you find, which will surely be better).

In this subforum, you'll find a thread "best msieve poly scores". Post #7 notes a C170 has best score 3.91e-13. If you break that score, I'd end poly select and proceed right to the sieving step (and post the poly you found so I can note the new record!)

A C171 has score of 3.13e-13, so you should expect to beat that score with your search. Score is, roughly, a measure of the rate at which you can expect to find relations, so a larger score is better (and project length scales roughly with inverse of score- a 3e-13 project is *about* 10 times harder than a 3e-12 project).

Your search will produce quite a few okay polys, with occasional good ones. In some sense, we wait for a really good one to pop out of the search- but how long to wait, as well as what "good" is, isn't quite clear. That's why I track best scores- if you break someone else's best for a size, you've likely found a really good poly and can skip the rest of poly select. If you find a few around 3.2e-13, and then one batch has one or two like 10% better, you've probably found your poly. It's worth holding out a few days of poly select for a hit 8-10% better than the typical best results from a run (I check twice a day, and consider each half-day a "run"), because 8% shorter project length for an 8-week effort is ~5 days of saved sieving time.

The factmsieve script has been tested and refined for inputs up to about 150-155 digits; the automatically generated settings aren't the fastest for 150+ digits, simply because so few projects are done at those sizes the author lacked data to create a set of best parameters (so he guessed a little, and figured anyone serious enough to do C170 would already know enough to set parameters himself). The NFS@home BOINC project has recently done a bunch of factorizations in this size category, but they've not always optimized parameters for shortest project length; all the same, their parameters are useful to copy for this project absent better info.

If you decide you wish to do this factorization, I'll explain test-sieving in a future post, so you can test a couple possible parameters choices and hope to discover savings of a day or week in sieving via better-than-script-default choices.

VBCurtis 2017-02-02 05:51

2 hrs on each of two GPUs yielded 5 polys between 3e-13 and 3.1e-13, one just over 3.1e-13, and this one:
[code]# norm 1.664492e-16 alpha -7.157348 e 3.470e-13 rroots 5
skew: 5053398.85
c0: -2786893278589734609571856633574386128995
c1: 4306299542743462566376217981723261
c2: 3783578211791106561032446613
c3: -821977447850264829373
c4: -149300456419050
c5: 5352984
Y0: -387655707969523635036413081890624
Y1: 16516041263376677[/code]
I found it using the following invocation of msieve:
[code]./msieve -np1 -nps "5000000,6000000 stage1_norm=4e24 stage2_norm=4e22" -s rsa170test[/code]
When I got bored, I stopped it and then ran:
[code]./msieve -npr -s rsa170test[/code]
The 5000000,6000000 part is the range of leading coefficients to search over; this is the part you can easily spread over multiple machines. The other settings I chose based on personal experience and the speed of my GPUs.
Hope this helps!
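Splitting that leading-coefficient range across machines just means giving each machine its own non-overlapping chunk. A hypothetical helper along these lines could generate the command lines (only the range changes between invocations; the norm settings are the ones from the post above):

```python
def poly_select_cmds(start, end, machines, s1="4e24", s2="4e22", name="rsa170test"):
    """Split a leading-coefficient range into non-overlapping chunks,
    one msieve -np1 -nps invocation per machine."""
    step = (end - start) // machines
    cmds = []
    for i in range(machines):
        lo = start + i * step
        hi = end if i == machines - 1 else lo + step  # last chunk absorbs the remainder
        cmds.append(f'./msieve -np1 -nps "{lo},{hi} stage1_norm={s1} '
                    f'stage2_norm={s2}" -s {name}')
    return cmds

for cmd in poly_select_cmds(5_000_000, 6_000_000, 4):
    print(cmd)
```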

GeoffreyY 2017-02-02 08:50

Sorry, I think I set up something wrong.

I ran the poly selection overnight (~16 hours total), and my best poly scores slightly less than 3e-13, while you found several over 3e-13 in a few hours.

The run also only covered leading coefficients from 0 to ~400, instead of starting at 5 million like yours.

Perhaps because I invoked it through factmsieve.py, instead of using custom parameters?

Also, the CPU / GPU usage is still consistently low (~50% at best). Is that normal?

Lastly, I tried to run '.\msieve -np1 -nps "6000000,7000000 stage1_norm=4e24 stage2_norm=4e22" -s RSA', but it complains "cannot open input file 'worktodo.ini'".

I tried making a worktodo.ini with the C170 number in it to see what happens. It creates a .ms file, runs, and prints a log to the command prompt, but it has yet to output anything. When I interrupted it, it showed "error generating or reading NFS polynomials". All files remain empty / unchanged.

Sorry for the bombard of questions :bow:

GeoffreyY 2017-02-02 10:48

I stand corrected: after running for an hour, I have the following output in RSA.ms:

[url]http://pastebin.com/Avj6B01B[/url]

Still, this is different from what I saw previously. How can I check whether this is valid?

Edit: good news, invoking msieve from the command line puts my GPU on full load. Thanks guys :smile:

chris2be8 2017-02-02 17:12

GNFS has several stages:
1: Searching for small factors with ECM. If you know it doesn't have any small factors you can skip this (RSA numbers usually don't).
2: Searching for a polynomial. This can be aided by your GPU. And split over several systems.
3: Sieving for relations. This can be split over several systems, with each core on each system sieving a separate range. Be careful not to sieve the same range more than once; doing so produces duplicate relations, which are useless.
4: Building a matrix (by running msieve -nc1). It may fail, saying it needs more relations; if so, sieve some more. But don't take its message about how many more it needs too seriously. This step is single-threaded.
5: Solving the matrix (msieve -nc2). This may take several days for RSA170. It can use multiple cores on one system.
6: The square root stage (msieve -nc3), which finds the factors. This is single-threaded.
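Stages 4-6 above map directly onto msieve flags, so the post-sieving work can be chained in a small driver. A minimal sketch, assuming the job's save file is named RSA and the msieve binary sits in the current directory:

```python
import subprocess

# Stages 4-6 from the list above: build the matrix, solve it, take the
# square root. If -nc1 reports it needs more relations, sieve more and
# rerun the chain from the start.
steps = [
    ["./msieve", "-nc1", "-s", "RSA"],  # build the matrix (single-threaded)
    ["./msieve", "-nc2", "-s", "RSA"],  # solve the matrix (multi-core, can take days)
    ["./msieve", "-nc3", "-s", "RSA"],  # square root stage: outputs the factors
]

def run_postprocessing(steps):
    for cmd in steps:
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # abort the chain on any failure
```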

You might do better to start by factoring RSA100 first (in a different directory from the one you are working in). That should not take long, and seeing it all the way through will show you what to expect from larger runs.

Then RSA110, RSA120, etc. Each will take about 4 times as long as the last.

Also look at the NFS@Home forum thread [url]http://mersenneforum.org/showthread.php?t=20023&page=162[/url] on linear algebra. You can read the msieve logs for similar sized jobs which might be helpful.

HTH

Chris

VBCurtis 2017-02-02 17:19

Sorry I forgot the worktodo.ini part- glad you solved that one!

My settings do not produce much output in the .ms file; I think four GPU-hours found ~90 lines, on the order of one every 4-5 minutes.

The -npr command works on the raw data in the .ms file, outputting actual usable polynomials into RSA.p. The best-scoring poly found will be saved to msieve.fb; in order to begin another run, you'll have to rename that file (I use filenames like RSA{scoreofthispoly} so I can see from the directory which poly had the best score, so the one I found for you is saved as RSA346).

I don't know why the script wasn't using your GPU fully, but I can tell you the script predates msieve-GPU code. I also can tell you that the -npr step takes a lot of time on each hit, and the default settings in msieve generate a LOT of hits into .ms! Perhaps the script tries to do -npr on all of them, pausing the GPU as it processes all of the GPU's output. My custom norm settings reduce the output from the GPU to just the very best candidates in the raw data, so that -npr takes perhaps an hour or two per day of GPU output.

If you have a fast GPU, say 900 series or newer, you may wish to add -t 2 after the -np1 -nps flags; this sends two sets of data to the GPU to process at the same time (two threaded, so to speak). This helps with smaller input numbers, say C150s, but is likely only slightly useful at this size.

As for running from 0 to 400, the default behavior is to start with coefficient 0 and work up, and really small coeffs take a while to churn through. I often start my search at 5000 or 10000; I only selected 5 million for mine to avoid overlapping any work you had already done. Smaller coeffs have a history of producing good scores with slightly less search time than large ones; if you run 10000 to 500000 with my norm settings, I bet you'll find a few polys above 3.2e-13.

Yes, your pastebin output is exactly what should be in .ms; if you ran ./msieve -npr -s RSA msieve would process that raw polynomial into one that the script can run sieving on (a list is saved to RSA.p). Not every raw poly line will produce a good-enough poly to save into RSA.p; I'd wait for 50 or more lines in RSA.ms to bother stopping the GPU search to run -npr. Note that once you run -npr, you should rename RSA.ms into something else, so that you don't re-process those hits the next time you run -npr. Also, change the input coefficient to start where the GPU left off; if your search got from 6000000 to 6150328, I'd start the next run at 6150400. But, start at 10000 sometime.
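The bookkeeping in that last paragraph (rename RSA.ms after each -npr pass so hits aren't re-processed, then restart just past where the GPU stopped) is easy to fumble by hand. A hypothetical helper, sketched under the assumption that rounding the restart point up to a round number is all that matters:

```python
import os

def archive_and_next(ms_file="RSA.ms", last_coeff=6_150_328, granularity=100):
    """After running -npr, stash the processed hits under a new name so the
    next -npr pass doesn't re-read them, and pick the next range start just
    past where the previous search stopped."""
    archived = f"{ms_file}.{last_coeff}.done"
    if os.path.exists(ms_file):
        os.rename(ms_file, archived)
    # round up to the next multiple of `granularity`
    next_start = (last_coeff // granularity + 1) * granularity
    return archived, next_start

print(archive_and_next(last_coeff=6_150_328)[1])  # -> 6150400, as in the post
```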

GeoffreyY 2017-02-02 21:52

Thank you very much for all the info! I don't think I'll have any more problems from here going forwards. :smile:

I searched through 5.5 to 6.2 million this afternoon, but I didn't find any poly above 3.2e-13. I only have a laptop with a 950m, which is the only machine I've been searching on. There are some technical difficulties with the uni's machines.

One more question, is the polynomial searching non-deterministic? I think I got different results passing through the same range twice.

Also, are there any particular aspect / coefficient of the polynomial I should be looking for, besides the Murphy E-score? Or are they simply data points that don't carry much weight?

VBCurtis 2017-02-02 23:12

[QUOTE=GeoffreyY;452106]One more question, is the polynomial searching non-deterministic? I think I got different results passing through the same range twice.

Also, are there any particular aspect / coefficient of the polynomial I should be looking for, besides the Murphy E-score? Or are they simply data points that don't carry much weight?[/QUOTE]

1. msieve searches a subset of the search space for each leading coefficient, randomizing which piece each run to minimize repeated work if multiple people run the same range. It was my impression that my tight stage1-norm reduced the search space so much that msieve didn't do this, but your results suggest otherwise. So, sort-of deterministic? :)

2. Score is an estimate of poly effectiveness; it is imperfect but within 3-5% of accurate; that is, two polys of the same score can differ by up to 5% speed in use. For jobs this size, folks usually test any poly that scores within 3-4% of the best poly (say, within 0.15 in your case) in case the best one happens to run poorly and 2nd or 3rd place runs well. If you find nothing better than 3.30, I wouldn't bother with test sieving; just use the 3.4699 poly I posted.

Best wishes on your effort- post if you have further questions, such as what parameters to select. The choices are made around line 520 in factmsieve.py, and the formulas are easily edited to nudge the script to pick a different siever or large-prime bound. Hopefully it picks 15e rather than 14e (part of the name of the lasieve executable), and chooses 2^31 as the large-prime bound (in the .job file created by the running script, you'll see lpbr and lpba; those should be 31 for a job this size, with 32 acceptable but 30 probably bad).
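As a sanity check on whatever .job file the script generates, a small parser can flag the large-prime bounds. This is a hypothetical helper, assuming only that lpbr/lpba appear as the usual `key: value` lines of a GGNFS job file:

```python
def check_large_prime_bounds(job_text):
    """Scan GGNFS .job file text for lpbr/lpba and flag values outside
    the 31-good / 32-acceptable range suggested above for a C170."""
    bounds = {}
    for line in job_text.splitlines():
        key, _, val = line.partition(":")
        if key.strip() in ("lpbr", "lpba"):
            bounds[key.strip()] = int(val)
    return {k: (v, "ok" if v in (31, 32) else "check this")
            for k, v in bounds.items()}

sample = "skew: 5053398.85\nlpbr: 31\nlpba: 30\n"
print(check_large_prime_bounds(sample))
# -> {'lpbr': (31, 'ok'), 'lpba': (30, 'check this')}
```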

GeoffreyY 2017-02-03 10:35

I still haven't found a better poly, so I've moved on to finding relations using yours. I think this RSA.poly file should be alright: [url]http://pastebin.com/w15j4PDx[/url]

I'm able to use the uni's computers for this step, but I have to create the .job files manually so that there's no overlap, because I'm not sure how else to do it. (example: [url]http://pastebin.com/jKdcRigw[/url]) I guess I can make my own .resume files?

One problem, though: sometimes after invoking the command, the uni computer reports that the .exe is not responding. I think it takes a bit too long to create the .afb.0 file. They all have the same size (14,951,800 bytes); are they all identical? If I simply copy, paste and rename the files, I no longer get the occasional crash at start-up, but I'm not sure if I should do that.

Another thing, do I simply transfer all the spairs.out.t_ files from my multiple computers and truncate them? I'm just zipping them up for now.

Sorry if my questions are getting bothersome. I'm partly recording the process for myself for when I come back in the future. Thanks!

GeoffreyY 2017-02-03 13:34

quick update: currently I have the following files in \gfns\RSA: [url]http://pastebin.com/sWYD4t5Q[/url]

And from there using "..\factmsieve.py RSA" seems to be working.

I think I have a total of 100 relations/sec using the uni's computers. With this simple setup I can easily add more computers if I want to / if I can.

jasonp 2017-02-03 14:47

Your error is an FAQ: if you run poly selection in stages, you will not assemble a complete polynomial unless you also run the stage 2 root sieve. By default the NFS driver in Msieve assumes you want to do the whole factorization, so not having a complete polynomial is treated as an error. That would actually be silly to do here, since even if you did have a complete polynomial, the sieve code in Msieve is probably an order of magnitude slower than the high-performance lattice sieve that the script calls an external binary to perform.

In the past we have had students complete big factorizations (i.e. 160 digits) in a remarkably short time, by taking over idle computer labs. On modern machines a 170-digit factorization starts to get into the range where the linear algebra is a major headache to perform in a reasonable time, i.e. it will take weeks unless you use MPI and have a few machines with a high-speed interconnect (gigabit ethernet isn't enough).

VBCurtis 2017-02-03 16:48

[QUOTE=GeoffreyY;452153]
One problem though, sometimes after invoking the command, the uni computer reports the .exe is not responding. I think it took a bit too long to create the .afb.0 file. They all have the same size (14,951,800 bytes), are they all identical? If I simply copy, paste and rename the files, I no longer have the occasional crash at start-up, but I'm not sure if I should do that.

Sorry if my questions are getting bothersome. I'm partly recording the process for myself for when I come back in the future. Thanks![/QUOTE]

No, the .afb files do change from range to range, and it's not wise to move/rename them.

Threads like this become reference material when the next person comes along, so your idea of recording the process for yourself is wise.

(I don't know the answer to the spairs.add question, so I didn't address it.)

EdH 2017-02-03 21:34

[QUOTE=GeoffreyY;452153]...
Another thing, do I simply transfer all the spairs.out.t_ files from my multiple computers and truncate them? I'm just zipping them up for now.
...[/QUOTE]
You should not truncate the files. You should concatenate all of the unzipped spairs.out.t_ files into spairs.add:
[code]
cat spairs.out.t_* >> spairs.add
[/code]
From there, factmsieve.py will concatenate spairs.add to the RSA.dat file. By using the spairs.add file, you allow factmsieve to modify the RSA.dat file instead of risking a manual modification causing a crash.

GeoffreyY 2017-02-17 11:47

I have successfully factorized it: C170 = P85 * P85. For various reasons I won't post the exact result until the 27th of Feb (the deadline for this bonus challenge). Thank you everyone for helping.

[QUOTE]In the past we have had students complete big factorizations (i.e. 160 digits) in a remarkably short time, by taking over idle computer labs.[/QUOTE]

:whistle:

To be safe, I collected up to ~120% of the minimum relations needed, and had 30 nontrivial dependencies at the end. Dependency number 1 worked anyway. Oh well.

Apparently I had to convert my .poly into .fb to start the linear algebra phase with factmsieve.py.

During the -nc2 phase my Windows crashed (BSOD), and the .mat file became 0 bytes for some reason. The .dat file was also rewritten, but luckily I had the .gz backup. :bow: I think it was a RAM issue, as I had Chrome open with multiple tabs.

I'll try and see if I can reproduce the low CPU/GPU usage issue that started this thread.

Besides that, I'm probably going to grab an aliquot sequence to work on. I was looking into helping factordb too, but it doesn't seem 100% stable at the moment.

VBCurtis 2017-02-17 15:29

Mr Yeung-
Congratulations! University lab or no, it is a non-trivial undertaking to get a C170 factored in two weeks using GNFS, never mind that you did so whilst learning the tools and how to manage parameters to and data from so many machines.

We're glad to have you sticking around to contribute to the forum and factorization efforts!

wombatman 2017-02-17 18:01

Seconding the well-done! Looking forward to having you around! :smile:

