mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Operation Kibibit (https://www.mersenneforum.org/forumdisplay.php?f=97)
-   -   Call for volunteers: RSA896 (https://www.mersenneforum.org/showthread.php?t=17460)

 jasonp 2012-11-17 14:12

Call for volunteers: RSA896

As of SVN823 the Msieve code is almost stable enough to consider a release of v1.51 (the only thing left is deciding what to do with the Windows build projects, which will need a lot of fixing after the recent GPU overhaul). [url="www.boo.net/~jasonp/msieve_svn823.zip"]A win32 binary[/url] is available for your perusal.

As a larger-scale test of the code, I propose we run polynomial selection on RSA896. Paul Zimmermann reports his group still has a lot of work left with this task, and using a GPU is much more effective on this size problem than using a CPU. So if you have a graphics card and at least version 4.2 of the Nvidia drivers, consider running a range of poly selection stage 1 for RSA896 with me (I've been at it for 2 weeks now).

More specifically, write the following in a file worktodo.ini:
[code]
412023436986659543855531365332575948179811699844327982845455626433876445565248426198098870423161841879261420247188869492560931776375033421130982397485150944909106910269861031862704114880866970564902903653658867433731720813104105190864254793282601391257624033946373269391
[/code]
and then run

msieve -v -np1 "stage1_norm=5e33 X,Y"

where X and Y are the endpoints of a range of numbers with magnitudes between 10^12 and 10^15. You should pick a fairly large range, since the code will skip over many leading coefficients that are not smooth enough, and multiple machines (say up to 5) can search the same range with a high probability of not duplicating any work. With a GTS450 card (which is pretty low-end) I get about 80000 hits per week, which is about 2MB zipped. So you will periodically want to interrupt a run in progress, move msieve.dat.m to another filename, and restart with X set just above the last coefficient searched.
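To make the periodic restart less error-prone, the rotation step can be scripted. A minimal sketch (the msieve.dat.m filename follows the default mentioned above; the restart range in the comment is a made-up example):

```shell
# Create a placeholder hits file so this sketch is self-contained;
# in a real run, msieve -np1 produces msieve.dat.m itself.
[ -e msieve.dat.m ] || touch msieve.dat.m

# Move the collected hits out of the way before restarting:
mv msieve.dat.m "hits_$(date +%Y%m%d).dat.m"

# Then restart with X just above the last coefficient searched, e.g.:
# ./msieve -v -np1 "stage1_norm=5e33 10000001000000,1000000000000000"
```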

CPU utilization will be very low while a run is in progress, but GUI interactivity on my WinXP machine is noticeably worse, which can be annoying. The binary can be run multithreaded, but that makes no difference on problems this large.

Please post here if you have problems or questions. I've only had time to test the GPU-specific engine for Fermi cards, custom engines exist for older cards but I have no idea if they work.

First, note that X and Y have to be in decimal, not in scientific notation as older versions allowed.

Also, if building from SVN make sure to use the trunk and not the msieve_faststage1 branch; that branch is now old.

 pinhodecarlos 2012-11-17 17:35

So you don't recommend using only a CPU?

 jasonp 2012-11-17 18:44

In my tests, the GPU code is perhaps 100x faster than the CPU code. The CPU code is not multithread-aware, though it would benefit from being so. Ideally the CPU code would use a thread pool and high-performance radix sorting just like the GPU code does now, after which the CPU and GPU code could probably be mostly unified. We're not there yet, but doing so could probably buy a 10x speedup in the CPU code.

Given a choice between a CPU run and nothing, then obviously a CPU run would be better. But currently the situation with stage 1 polynomial selection is exactly analogous to Mersenne trial factoring, vis-a-vis the GPU speedup.

 pinhodecarlos 2012-11-17 18:49

[QUOTE=jasonp;318747]In my tests, the GPU code is perhaps 100x faster than the CPU code. [/QUOTE]

How many cores does your CPU have?

 jasonp 2012-11-17 19:22

This is 1 core of a Core2 from 2007 vs a low-end Fermi card from 2010. The GPU finds about one hit every 5-10 seconds, and now I see that the CPU code found two hits after 24 minutes. Neither codebase is optimal, neither set of hardware is very fast in absolute terms, and the search itself can find many times more hits if the cutoff for hit quality is loosened. The CPU code would certainly speed up linearly if you threw more cores at it. But currently GPUs have a huge performance advantage when running this algorithm.

 debrouxl 2012-11-17 20:35

BOINC wrapper integration would let us greatly expand the CPU+GPU power working on polynomial selection for RSA-896. I should dig through my mailbox again for the discussions about that, though I don't have time to help with it right now...

 frmky 2012-11-19 20:08

[QUOTE=jasonp;318710]With a GTS450 card (which is pretty low-end) I get about 80000 hits per week, which is about 2MB zipped.[/QUOTE]
I started at 10^13. On the GTX 480, I got 229 hits in 8 minutes, which extrapolates to a bit under 300000 per week. How many are you wanting to collect?

 Dubslow 2012-11-19 20:54

[QUOTE=frmky;318965]I started at 10^13. On the GTX 480, I got 229 hits in 8 minutes, which extrapolates to a bit under 300000 per week. How many are you wanting to collect?[/QUOTE]

Sheesh, that's a lot. :smile: Perhaps we should lower the norm to (e.g.) 5e32.

How will -nps and -npr be run? Should we run the former ourselves, as had previously been suggested for BOINCification?

frmky, how fast are you traversing leading-coeff ranges?

I plan to participate, once a) I'm back at school and b) I get my recently RMAd 460.

 jasonp 2012-11-20 01:25

I've computed about 65000 hits so far, and Paul reports computing 500k hits with CADO-NFS. I'm currently covering the 1e13 range as well, so we will probably have a little duplication. The code selects very few leading coefficients in this range, after two weeks I'm up to around (1e13+1e9).

I'm also being lazy, in that Paul's group can turn around stage 2 pretty quickly so I'm not bothering with it. A BOINC project would definitely need to do its own size optimization, since I'd expect thousands of good size-optimized polynomials out of tens of millions of stage 1 hits.

The upper limit is a good question; we would want millions of hits at least.

 frmky 2012-11-20 03:13

It's a big project, but if we only want millions of hits I'm not sure it's big or long-term enough for BOINC. With two GPUs I will be getting over half a million per week.

Starting at 10^13, the leading coefficient has advanced by 21 million in 6.5 hours, with a bit over 10000 hits. Extrapolating, if we want to cover the entire range, this will take 35000 GTX480-years and yield half a trillion hits totaling 12 terabytes. To put this in perspective, NFS@Home uses about 1.5 CPU-years each day.
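For anyone who wants to check those numbers, here's a quick back-of-the-envelope computation (a sketch assuming the 10^12 to 10^15 search range from the first post and the rates quoted above):

```shell
# Extrapolate coverage of the full leading-coefficient range
# from the 6.5-hour sample quoted above.
awk 'BEGIN {
  range = 1e15 - 1e12              # coefficients in the whole search range
  hours = range / 21e6 * 6.5       # 21 million coefficients per 6.5 GPU-hours
  printf "%.0f GTX480-years\n", hours / (24 * 365.25)   # roughly 35000
  printf "%.2e hits\n", range / 21e6 * 10000            # about half a trillion
}'
```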

 Dubslow 2012-11-20 03:16

[QUOTE=jasonp;318995]I've computed about 65000 hits so far, and Paul reports computing 500k hits with CADO-NFS. I'm currently covering the 1e13 range as well, so we will probably have a little duplication. The code selects very few leading coefficients in this range, after two weeks I'm up to around (1e13+1e9).

I'm also being lazy, in that Paul's group can turn around stage 2 pretty quickly so I'm not bothering with it. A BOINC project would definitely need to do its own size optimization, since I'd expect thousands of good size-optimized polynomials out of tens of millions of stage 1 hits.

The upper limit is a good question; we would want millions of hits at least.[/QUOTE]

Does CADO support CUDA, or is yours the only such code?

How does lowering the norm affect how many hits you need? How likely would you be to miss an unusually great poly with too large a norm?

It seems to me that running our own -nps would, at the very least, cut down on how much data we need to transfer. In either case, do we just attach results here and Paul will download them or something?

 jasonp 2012-11-20 16:02

CADO does not support GPUs; other researchers have substituted their own GPU components for the corresponding pieces in CADO-NFS (e.g. making the Block Wiedemann code use a GPU for matrix multiplies).

It's an open question whether lowering the target norm makes it easier to find good hits; it certainly drastically reduces the number of hits found, and you would think it would help the size optimization deal with the low-order coefficients if the top few polynomial coefficients were much smaller than average. But I don't know that; so far all the hits I've found have led to polynomials a lot worse than their current best one.

Delivering big batches of hits isn't a problem; I can take them by email or you can upload to Rapidshare or some other online storage service if the results are more than a few megabytes compressed.

Greg: I don't know how many hits we should look for. Nobody has ever done a massively parallel polynomial search, as in throwing BOINC at it for months. For RSA768 we spent three months and collected something like 15 million stage 1 hits, and the best resulting polynomial was just about as good as the one actually used for the sieving. For RSA896 the current code divides the search space for each leading coefficient into 46 pieces, so searching one of those only exhausts 2% of the available search space. A trillion hits would probably be too much to deal with, though it would be awesome :)

 frmky 2012-11-20 18:10

What's the Murphy e of their current best?

 jasonp 2012-11-20 18:26

4.8e-19; Paul tells me that the best they expect is 20e-19.

 frmky 2012-11-20 19:35

I ran stage 2 on a single core overnight, and the best was 1.704e-19.

 RichD 2012-11-22 19:05

Thanks to Jason's help I've got a run started at 5*10^13.

It looks like I am getting around ten per minute in this range.

Correction: Better than 25/min. (I can't divide...)

 frmky 2012-11-23 08:15

Great minds think alike. I started one GPU at 10^13, same as Jason, and the second at 5*10^13, same as you. :smile:

So far it's done 10^13 to 10^13+2.5*10^8 with 131k hits and 5*10^13 to 5*10^13+3*10^8 with 152k hits.

 RichD 2012-11-23 19:17

Update

Just over 24 hours in, I am approaching 38k hits but am still just under
5*10^13 + 1*10^8.

 poily 2012-11-28 17:57

I got 417K stage1 hits for
[CODE]
msieve -np1 "stage1_norm=1e33 100000000000000,1000000000000000"[/CODE]
Should I run stage 2 on them, or would it be better to give them to somebody?

 jasonp 2012-11-29 01:24

Maybe post to rapidshare or some similar storage hosting?

 poily 2012-11-29 14:03

Ok, here's my rsa896 stage1 data for [URL="http://depositfiles.com/files/c12l1rwrj"]1e13[/URL] and [URL="http://depositfiles.com/files/wxwee5w7d"]1e14[/URL]. I'll try to run stage2 on part of it.

 jasonp 2012-11-29 14:42

Okay. When running stage 2, you will save a lot of time if you first run with

-nps "stage2_norm=1e100"

so that only the size optimization runs for every hit. On my machine running the size optimization on the 250k hits I have took about 36 hours. Then sort the resulting .ms file in order of increasing score (which is the last field in each line) and just run msieve with -npr on the few lines in the .ms file that have the smallest score, perhaps only the 1000 best hits. The root sieve runs faster with a tight bound on the stage 2 norm, so try

-npr "stage2_norm=1e37"

or possibly smaller if that winds up taking too long. We've found that if you run all of stage 2 on every hit, you spend the majority of the time running the root sieve on hits that will never generate very good polynomials, and the root sieve can take minutes for a single hit (where the size optimization takes half a second).
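Putting those steps together, the workflow looks roughly like this (a sketch: the demo input and filenames are placeholders, the msieve invocations are commented out, and the score is assumed to be field 11, the last one on each .ms line):

```shell
# 1. size-optimize every stage 1 hit, skipping the root sieve:
# ./msieve -v -nps "stage2_norm=1e100"

# demo input standing in for msieve.dat.ms: three fake size-optimized
# lines with the score in field 11
printf 'x x x x x x x x x x 3.0e+35\nx x x x x x x x x x 1.0e+34\nx x x x x x x x x x 2.0e+34\n' > msieve.dat.ms

# 2. keep the (up to) 1000 lines with the smallest score:
sort -g -k 11 msieve.dat.ms | head -n 1000 > best.ms

# 3. root-sieve only those, with a tight stage 2 norm, e.g.:
# mv best.ms msieve.dat.ms && ./msieve -v -npr "stage2_norm=1e37"
```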

 poily 2012-11-30 12:53

Stage 2 finished; the Murphy E of the best 10 polynomials ranges from 2.129e-19 to 2.348e-19. Not very impressive.

 jasonp 2012-11-30 16:42

I ran stage 2 on 250k hits computed by RichD and myself; the best e-value was 2.0e-19, so not that great. Shi Bai ran the CADO stage 2 on the same dataset and has gotten better high scores (I think the best so far was 3.2e-19).

 Dubslow 2012-11-30 19:44

[QUOTE=jasonp;320043]I ran stage 2 on 250k hits computed by RichD and myself, the best e-value was 2.0e-19, so not that great. Shi Bai ran the CADO stage 2 on the same dataset and has gotten better high scores (I think the best so far was 3.2e-19)[/QUOTE]

As I recall from past comparisons (B200?), msieve typically has a ~10% lower score due to your more sophisticated integrator. How much of that better score is due to the differences in integration?

 jasonp 2012-12-01 13:07

I was going to write that most of the time the e-values calculated by the two suites differ by less than 1%, and larger differences were actually because of bugs in Msieve that miscalculated the alpha value. But then to make sure I ran the best CADO polynomial through Msieve, and the difference in e-values was 4.78e-19 vs 3.493e-19!

For the record, the best polynomial I could find in the batch of 260k hits was
[code]
R0: -5876926706329267758590334567904669751467577
R1: 869332622169838859059
A0: 15018543190390770338953421520339801353757924940775872204660800
A1: 909208080136930262159142936022283921496280008021988680
A2: -13303473065609161166913597414184875440269406556
A3: -138145188131827120402436868843139274434
A4: 763752915545626483996372079531
A5: 4312543659621449260154
A6: 10000466830200
skew 133010468.05, size 3.823e-020, alpha -12.788, combined = 2.080e-019 rroots = 4
[/code]

 poily 2012-12-01 14:25

Here's my slightly better (according to msieve) polynomial
[CODE]
SKEW 28338177.70
R0 -3626941552197564826492128852700460060642852
R1 48957407582916194761589
A0 -26101732745933806144485280988037796254029367990700826499285
A1 -2408464141902741608017790242140644715688145388490381
A2 -174314770228113076791419006080421369720970639
A3 28697722660097589508721192118263624637
A4 1036456362256909021188219944324
A5 -21292410351587764080336
A6 181000001476800
skew 28338177.70, size 4.381e-20, alpha -12.189, combined = 2.348e-19 rroots = 2
[/CODE]

 poily 2012-12-04 09:59

I got it by myself: CADO's E.sage gives 1.84701205471987e-19 for your polynomial and 2.08696883313422e-19 for mine.

 jasonp 2012-12-04 12:19

The difference is that the CADO tools compute the size score by integrating in radial coordinates, whereas Msieve starts in radial coordinates and switches to computing the integral in rectangular coordinates. They don't compute the same numbers in general.

Msieve uses the rectangular integral because pol51opt did, and the GGNFS scripts had a table of precomputed 'good' scores to shoot for. That said, the radial score is more robust for some reason, i.e. it finds a better minimum a lot of the time.
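For reference, the quantity both packages approximate is Murphy's E score. In one common form (following Murphy's thesis; the notation here is mine, not taken from either codebase), with $\rho$ the Dickman function, $F$ and $G$ the homogenized polynomial pair with their alpha values, $B_f$ and $B_g$ the smoothness bounds, and sample points taken on a skewed ellipse with skew $s$ and area parameter $A$:

```latex
E(f,g) = \sum_{i=1}^{K}
  \rho\!\left(\frac{\log|F(x_i,y_i)| + \alpha_F}{\log B_f}\right)
  \rho\!\left(\frac{\log|G(x_i,y_i)| + \alpha_G}{\log B_g}\right),
\qquad
(x_i, y_i) = \left(\sqrt{sA}\,\cos\theta_i,\; \sqrt{A/s}\,\sin\theta_i\right)
```

The MurphyE(Bf=...,Bg=...,area=...) lines in CADO's output report these parameters explicitly; the disagreements discussed in this thread come from how the sum over $\theta$ is evaluated, not from the definition itself.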

 frmky 2012-12-05 07:05

Here are approx. 1.3 million stage 1 hits:
[URL="https://www.dropbox.com/s/spssx3ze8iagvpn/rsa896_1.dat.m.gz"]https://www.dropbox.com/s/spssx3ze8iagvpn/rsa896_1.dat.m.gz[/URL]

Edit: I restarted the stage 1 search at different values. I'm now running the size optimization on this file, and I will run the root sieve on those with the smallest score.

 poily 2012-12-05 09:43

So, the best murphy_e we have so far is about 2.5e-19. I wonder how the CADO guys got a murphy_e 2x larger than ours? Either their polynomial is quite extraordinary or they use different optimization algorithms.

 henryzz 2012-12-05 17:49

Is it possible we should run msieve stage 2 on their stage 1 results or their stage 2 on our stage 1 results?

 jasonp 2012-12-06 03:00

I don't have access to the CADO stage 1 hits. We've collected 2.7M hits of our own, and they will all have stage 2 performed by both packages.

Keep 'em coming!

Ilya, I suspect a lot of the difference boils down to the more effective root sieve developed by Shi Bai for the CADO stage 2. From what I've seen it consistently produces better alpha scores without sacrificing as much polynomial size as Msieve needs.

 frmky 2012-12-06 05:35

[QUOTE=jasonp;320654]the more effective root sieve developed by Shi Bai[/QUOTE]
Steal it and put it in msieve? :devil:

 frmky 2012-12-11 22:07

From my 1.3 million stage 1 hits, so far I've gotten a

# norm 4.371817e-20 alpha -11.383344 e 2.347e-19 rroots 4

The run continues...

 jasonp 2012-12-12 01:49

I sent a batch of hits computed with a cutoff of 5e33 instead of the previous 1e33, and apparently the polynomials generated with the looser cutoff are similar in quality. So if you use 5e33 for the stage1_norm you'll generate hits about 5x faster.

 pinhodecarlos 2012-12-12 09:17

[QUOTE=jasonp;318710]and then run

msieve -v -np1 "stage1_norm=[B]5[/B]e33 X,Y"

where X and Y is a range of numbers with magnitude 10^12 to 10^15. [/QUOTE]

So the first post should be updated.

 bai 2012-12-13 23:52

[QUOTE=poily;320153]Here's my slightly better (according to msieve) polynomial
[CODE]
SKEW 28338177.70
R0 -3626941552197564826492128852700460060642852
R1 48957407582916194761589
A0 -26101732745933806144485280988037796254029367990700826499285
A1 -2408464141902741608017790242140644715688145388490381
A2 -174314770228113076791419006080421369720970639
A3 28697722660097589508721192118263624637
A4 1036456362256909021188219944324
A5 -21292410351587764080336
A6 181000001476800
skew 28338177.70, size 4.381e-20, alpha -12.189, combined = 2.348e-19 rroots = 2
[/CODE]

[CODE]
Y1: 48957407582916194761589
Y0: -3626941552197564826492128852700460060642852
c6: 181000001476800
c5: -21292410351587764080336
c4: 1036456362256909021188219944324
c3: 28697722660097589508721192118263624637
c2: -174314770228113076791419006080421369720970639
c1: -2408464141902741608017790242140644715688145388490381
c0: -26101732745933806144485280988037796254029367990700826499285
skew: 20504576.000
# lognorm: 83.45, alpha: -12.19 (proj: -3.18), E: 71.26, nr: 2
# MurphyE(Bf=10000000,Bg=5000000,area=1.00e+16)=3.39e-19
[/CODE]

 jasonp 2012-12-14 01:30

Msieve computes an E-value of 2.339e-19 when using the skew computed by CADO. Would it be worthwhile to find out what causes such big differences?

 poily 2012-12-14 09:31

Ok, [URL="http://depositfiles.com/files/bopjglvg4"]here[/URL] are about 1.8M hits for the new bound 5e33. It seems the tighter bound was better: with 1e33 as the stage 1 bound, I saw norms of about Xe32 after size optimization, whereas the new bound gives only Xe33 and worse. The best e for this pack was about 2.2e-19.

Bai, I used E.sage from the public CADO repository and got almost the same Murphy e values as msieve shows. Do you compute it somehow differently now?

 Dubslow 2012-12-14 16:13

[QUOTE=jasonp;321620]Would it be worthwhile to find out what causes such big differences?[/QUOTE]

As a bystander who has no knowledge of either how to go about such finding out or how much work it might be, I would be very interested in the results. :popcorn: (poily's post directly above this makes it all the more interesting!)

 debrouxl 2012-12-14 16:26

1 Attachment(s)
To generate hits with CPUs more quickly (as I can't currently use my GPU...), I resurrected poily's MPI polsel patch from [url]http://www.mersenneforum.org/showpost.php?p=304639&postcount=9[/url] .
Mainline msieve has changed somewhat since then (e.g. the separation of the size optimization and the root sieve), so I started by upgrading and expanding the patch's additions to the code; I also changed whitespace to follow the coding convention of the surrounding code more closely :smile:

That allowed the computer to produce a number of hits with:
$ mpirun -n 6 msieve -np1 "stage1_norm=2e33 1100000000000,1100010000000" -v

I think it would be great to see the MPI polsel patch integrated to mainline msieve. This way, every user of multi-threaded polsel (even on a single computer) would benefit without having to apply (and upgrade) an out of tree patch :smile:
However, AFAICS, the patch requires improvements before this can happen. I noticed that when a child process has finished working ("polynomial selection complete"), it pins its core at 100% usage. One of the processes found no suitable leading coefficient in its range and gave up on polsel immediately; after ~2h47 of CPU time wasted by that one, and soon after another process had exhausted its range as well, I tried to kill one of the inactive processes... which killed all of its siblings, as I should have expected...
293 stage 1 hits were generated in ~10h40 of CPU time on a Xeon E3-1230 @ 3.2 GHz.

I have zero MPI experience, and therefore no clue how to stop finished jobs from wasting significant CPU power :wink:

 jasonp 2012-12-14 18:01

The patch makes all the MPI processes wait at a barrier when poly selection is finishing, and my experience with OpenMPI is that this is a computationally expensive operation. I suspect that's what is burning up 100% CPU.

Unfortunately there's no real way around that: if you wanted processes to stop waiting on each other, you wouldn't be using barrier synchronization. Put another way, MPI is a poor substitute for a distributed client-server architecture :)

 firejuggler 2012-12-14 20:22

1 Attachment(s)
Ran a 15 min 'search' on a 560 and got 227 hits; best poly:

[code]
Fri Dec 14 21:18:19 2012 Msieve v. 1.51 (SVN 766)
Fri Dec 14 21:18:19 2012 random seeds: ccf343ec 7dcd68de
Fri Dec 14 21:18:19 2012 factoring 412023436986659543855531365332575948179811699844327982845455626433876445565248426198098870423161841879261420247188869492560931776375033421130982397485150944909106910269861031862704114880866970564902903653658867433731720813104105190864254793282601391257624033946373269391 (270 digits)
Fri Dec 14 21:18:21 2012 no P-1/P+1/ECM available, skipping
Fri Dec 14 21:18:21 2012 commencing number field sieve (270-digit input)
Fri Dec 14 21:18:21 2012 commencing number field sieve polynomial selection
Fri Dec 14 21:18:21 2012 polynomial degree: 6
Fri Dec 14 21:18:21 2012 max stage 1 norm: 1.08e+035
Fri Dec 14 21:18:21 2012 max stage 2 norm: 1.05e+035
Fri Dec 14 21:18:21 2012 min E-value: 0.00e+000
Fri Dec 14 21:18:21 2012 poly select deadline: 1079999
Fri Dec 14 21:20:02 2012 polynomial selection complete
Fri Dec 14 21:20:02 2012 R0: -3334009071282816277832905784211277620964254
Fri Dec 14 21:20:02 2012 R1: 100281358245707420758237
Fri Dec 14 21:20:02 2012 A0: 372828901627475347894334652824133151170797461276260673
Fri Dec 14 21:20:02 2012 A1: 77826863799406908305447316219479205604506482315
Fri Dec 14 21:20:02 2012 A2: -8440449792077225990613216882014698768493181
Fri Dec 14 21:20:02 2012 A3: -434343399475320970345769932599595433
Fri Dec 14 21:20:02 2012 A4: 27102576369602076901509411656588
Fri Dec 14 21:20:02 2012 A5: 139702468047107462100798
Fri Dec 14 21:20:02 2012 A6: 300000002082000
Fri Dec 14 21:20:02 2012 skew 1498026.43, size 1.811e-020, alpha -8.436, combined = 1.092e-019 rroots = 4
[/code]
(yeah I know, the score is pretty bad, but that's a 15 min run)
and the msieve.dat.m file to be run in cado

 debrouxl 2012-12-14 21:08

firejuggler: your best poly isn't good; it has a pretty low E value :wink:

jasonp:
[quote]The patch makes all the MPI processes wait at a barrier when poly selection is finishing, and my experience with OpenMPI is that this is a computationally expensive operation. I suspect that's what is burning up 100% CPU.[/quote]
Meh.

[quote]Unfortunately there's no real way around that, if you wanted processes to stop waiting on each other then you wouldn't be using barrier synchronization; put another way, MPI is a poor substitute for a distributed client-server architecture :)[/quote]
Indeed.
Several weeks ago, for factoring a 512-bit RSA key under a "tight" deadline (well, I hadn't fully taken into account how much easier it had become since 2009, nor the fact that in the end I had 40 cores of varying power ^^), I spent several hours starting to assemble a simple client-server work reservation system speaking HTTP GET & PUT, in Perl (CPAN lists a number of modules for dealing with HTTP).
The initial results were good, but the task was easy enough that I ended up just invoking ggnfs-lasieve4I14e from sh scripts instead of finishing the reservation system.

 firejuggler 2012-12-14 21:15

I know my poly isn't good; it's just a quasi-note to myself that I can do it. I will make another run, a bit longer, and see what comes out of it.

 bai 2012-12-15 02:01

[QUOTE=poily;321633]
Bai, I used E.sage from the public CADO repository and got almost the same Murphy e values as msieve shows. Do you compute it somehow different now?[/QUOTE]

poily, thanks for noting it. Yes, there is some difference when computing the Dickman function. It might explain the difference between the E-score in msieve/cado when the skewness is large. I'll keep you updated.

Btw: on your 02 Dec 12 batch, we had a better poly in terms of cado E-score. What's its score in msieve?

[CODE]
Y1: 9931515945117268997291
Y0: -3136213832380525741156980143018262150146233
c6: 433000028380920
c5: -16322847463557673445083
c4: 259713520920507700049024012254
c3: 313991938835577956615333794603976847
c2: -745313305236683785443677828883521154358049
c1: -8720107683218999638110247380980954223758742214
c0: 177258901573455869908310457463881265392025516480999800
skew: 1650176.000
# lognorm: 79.02, alpha: -9.49 (proj: -2.31), E: 69.53, nr: 4
# MurphyE(Bf=10000000,Bg=5000000,area=1.00e+16)=4.33e-19
[/CODE]

Anyway, if we can observe a correlation between the scores in msieve and cado, then it's safe to just record a few top polynomials found by both.

 jasonp 2012-12-15 03:20

I get
[code]
skew 1650176.00, size 6.274e-20, alpha -9.489, combined = 3.183e-19 rroots = 4
[/code]

 poily 2012-12-15 13:52

I agree with debrouxl that the MPI-ed polynomial selection should be included in some form in the mainline msieve. I'm tired of doing the trivial patchwork for every new major msieve revision ;)

[QUOTE]The patch makes all the MPI processes wait at a barrier when poly selection is finishing, and my experience with OpenMPI is that this is a computationally expensive operation. I suspect that's what is burning up 100% CPU. [/QUOTE]
Actually it looks like the problem is in MPI_Gather, which collects the results from all the processes and selects the best one after polynomial selection is complete. I suggest we remove this stage altogether, making each MPI process save its own fb file and letting the user select the best.

[QUOTE]Btw: on your 02 Dec 12 batch, we had a better poly in terms of cado E-score.[/QUOTE]
Hmm, it looks like CADO does better size optimization too, not only a better root sieve. What was the original hit that produced that polynomial? And what was its norm after the size optimization? I wonder what msieve can do with it if we try to tune the relevant parameters a bit.

[QUOTE]Anyway, if we can observe a correlation between the scores in msieve and cado, then it's safe to just record a few top polynomials found by both.[/QUOTE]
Tbh, I'd prefer both tools to show the same values; it's an exact science after all.

 debrouxl 2012-12-15 14:05

[quote][quote]The patch makes all the MPI processes wait at a barrier when poly selection is finishing, and my experience with OpenMPI is that this is a computationally expensive operation. I suspect that's what is burning up 100% CPU.[/quote]
Actually looks like the problem is in MPI_Gather which is there to collect the results from all the processors and select the best after the polynomial selection is complete. I suggest we remove this stage at all making each of the MPI processes save it's own fb-file allowing the user to select the best.[/quote]
Amusingly, that's what I had just done locally, by replacing #ifdef HAVE_MPI by #if 0 in the hunk modifying gnfs/poly/poly.c :smile:

 firejuggler 2012-12-15 17:33

Did a longer run: around 11k hits in 15 hours. Will update when -nps "stage2_norm=1e100" has finished running.
Is there an easy way to sort by field under Windows? Trying to do it with LibreCalc/Excel is going poorly.

 jasonp 2012-12-15 17:44

Get a windows port of the unix sort utility, and then try

sort -g -k 11 <file>

to sort in ascending numerical order based on field 11 (the last)
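If a sort port seems to ignore the exponents, here's a quick sanity check that -g is parsing e-notation (GNU sort's -g compares values as long doubles, so scientific notation works):

```shell
# the smallest value should come out first: 1.0e+34
printf '1.3e+36\n5.2e+35\n1.0e+34\n' | sort -g
```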

 firejuggler 2012-12-15 18:54

1 Attachment(s)
OK... so cygwin is no good.
I have values from 1.007815e+34 to 1.348243e+36, and the sort doesn't give me the +34 and the handful of +35 entries I need.
Grblbl. I managed somehow with LibreCalc; will post my hits. 11700+ hits, starting at 3e13.

 poily 2012-12-17 09:05

[URL="http://depositfiles.com/files/tr0zv0mha"]This batch[/URL] resulted in the following best so far polynomial:
[CODE]Mon Dec 17 04:01:41 2012 R0: -5382714938488690441209883907451379700210334
Mon Dec 17 04:01:41 2012 R1: 16663482999754897882819
Mon Dec 17 04:01:41 2012 A0: -55149587676457221661976298276728946108324308303659979586785
Mon Dec 17 04:01:41 2012 A1: 2084363501398641733507654116431852917627220374513244
Mon Dec 17 04:01:41 2012 A2: 115328708937118208864934751144873150311335660
Mon Dec 17 04:01:41 2012 A3: -3417129234272571170820957161968499682
Mon Dec 17 04:01:41 2012 A4: -7825764054023217154734470955
Mon Dec 17 04:01:41 2012 A5: 1089651465313173279574
Mon Dec 17 04:01:41 2012 A6: 16940008971600
Mon Dec 17 04:01:41 2012 skew 43447739.22, size 4.643e-20, alpha -11.010, combined = 2.483e-19 rroots = 2 [/CODE]

I'm still not sure how I missed the 3.183e-19 one; maybe that's because I take only the 50-100 best size-optimized values (with norms 1e32-*e33). I'll try restarting root optimization with the 1000 best size norms (probably 1e32-*e34).

 bai 2012-12-17 10:30

[QUOTE=poily;321737]
What was the original hit that produced that polynomial? And what size norm was it of after the size optimization? I wonder what msieve can do with it if we try to tune relevant parameters a bit.
[/QUOTE]

The raw polynomial is this one,
[CODE]
433000028380920 9931515945117268997291 3136213832380463342922092847039625251389592
[/CODE]

The size optimized polynomial in cado is,
[CODE]
Y1: 9931515945117268997291
Y0: -3136213832380526651668430475424366552787822
c6: 433000028380920
c5: -16561029521169279633163
c4: 267250423315714655877175299839
c3: 217372220497467849979460290451947353
c2: -818449055340804222519290511517032253432988
c1: -162909026138981074398068520700268950925476629841
c0: 139263538137801830327982923308744587295054517282499033
skew: 1619456.000
# lognorm: 78.88, alpha: -0.83 (proj: -2.31), E: 78.05, nr: 4
# MurphyE(Bf=10000000,Bg=5000000,area=1.00e+16)=3.54e-20
[/CODE]

 debrouxl 2012-12-28 07:20

1 Attachment(s)
I had a Xeon E3-1230 spend:
* more than a CPU-month on `msieve -np1` with norm 2e33, producing 13705 hits (attached) - I still can't use my GPU at the moment;
* a negligible amount of time on `msieve -nps` with norm 1e100;
* another CPU-month or so on `msieve -npr` for [I]all[/I] hits.

The second CPU month aimed at testing the updated version of poily's MPI patch (see post #42, with the modification mentioned in posts #49 and #50) for -nps + -npr - seems to work :smile:

The best polynomial produced by stage 2 is worse than the best hits posted earlier in this topic by a significant margin:
[code]# norm 3.453121e-20 alpha -10.768289 e 1.933e-19 rroots 2
skew: 113809421.14
c0: -777040021595652574801970561643527925908245389976845316223520
c1: -56689368246124675729566462839180299627446395002122964
c2: -1184036534621771421230391418974433516476565936
c3: 13945075463277536309744615078227615403
c4: 162343630391708833196793797373
c5: -762530856893636308781
c6: 1500024500100
Y0: -8062516096125044668396811397254458255803647
Y1: 34136309016942249143093[/code]
I'll now make the computer run more stage 1.

 Dubslow 2012-12-28 07:41

Just so y'all know, I've been running the last few days with my [STRIKE]460[/STRIKE] 560. I started at 5e12 with norm 2e33, and am averaging around 25K hits per day (give or take a few thousand, it's just a rough estimate). I have ~85.8K total at the moment, and ~100 more every few minutes.

jasonp, perhaps workers' search ranges should be noted in the first post? Even if duplicating work on the same coefficient is unlikely, it still seems to me we'll get better results if we cover as many coefficients as possible.

 frmky 2012-12-28 08:18

Here's another 2 million stage 1 hits with norm of 1e33. I've now restarted it with norm of 5e33.

[url]https://www.dropbox.com/s/vczyvz0bgmnrgvm/rsa896_2.dat.m.gz[/url]

 firejuggler 2012-12-31 22:00

Running msieve -v -np1 "stage1_norm=5e33 2345678901234,20000000000000"; will post my hits/optimized results once I get back home.

 Dubslow 2012-12-31 23:39

How many S1 hits have there been, either from this thread alone or in total? I'd estimate we're somewhere between 3 and 5 million, with Greg doing most of it?

I have ~180K hits [URL="http://dubslow.tk/random/batch1.gz"]here[/URL] (4.7 MB) (I would run size opt myself but my CPU's all tied up ATM).
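Each line of a .dat.m file is one stage-1 hit, so a gzipped batch can be counted without unpacking it to disk. A quick sketch (the batch name is illustrative, and a generated toy file stands in for a real batch):

```shell
# Build a toy stand-in for a gzipped batch of 180000 hits.
seq 1 180000 | gzip > batch1.gz
# One line per hit, so the hit count is just the line count.
zcat batch1.gz | wc -l
```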

 RichD 2013-01-01 00:51

I've sent jasonp, via PM, just under 4M.

 Dubslow 2013-01-01 02:24

[QUOTE=RichD;323256]I've sent jasonp, via PM, just under 4M.[/QUOTE]

Sweet! I guess there's more than one big player around here :smile:

So we're easily north of 5M, probably somewhere near 10M?

 debrouxl 2013-01-01 08:51

By tomorrow, I'll have another dozen thousand CPU stage 1 hits (norm 2e33) from MPI msieve.

And yeah, we're probably around 10M hits now :smile:

 firejuggler 2013-01-01 18:43

1 Attachment(s)
168xx hits; the minimum norm seems to be around 4.xxxe+34, but I may have done a bad job of optimising.

 RichD 2013-01-02 05:24

[QUOTE=Dubslow;323262]... I guess there's more than one big player around here ... probably somewhere near 10M?[/QUOTE]

Not really a big player, just consistent. Been running since my first post.

[QUOTE=jasonp;319061]A trillion hits would probably be too much to deal with, though it would be awesome :)[/QUOTE]

 jasonp 2013-01-02 12:10

So far the total contributions amount to 10.4M hits. Keep 'em coming!

Edit: by comparison, the few months that everyone spent on RSA768 produced 30 million hits, the vast majority of which came from CPU Msieve.

 firejuggler 2013-01-02 17:44

Playing with frmky's last upload (only halfway through it).
The poly is
23000623460100 -4200026754030636886542 325872963021220734540680389841 71104560237504929295111562904399389 -2515044822412759992774011178811387566386570 -214938857506910312002364866202388589245612515363 2518563929690957541751459455188308671386866572717152336 53041291506298756714237 -5115212953529178920472571451294711821103119 -2.09 5.249183e+033
[code]

Wed Jan 02 07:30:03 2013 Msieve v. 1.51 (SVN 766)
Wed Jan 02 07:30:03 2013 random seeds: 6d21a47c 93a68ff0
Wed Jan 02 07:30:03 2013 factoring 412023436986659543855531365332575948179811699844327982845455626433876445565248426198098870423161841879261420247188869492560931776375033421130982397485150944909106910269861031862704114880866970564902903653658867433731720813104105190864254793282601391257624033946373269391 (270 digits)
Wed Jan 02 07:30:05 2013 no P-1/P+1/ECM available, skipping
Wed Jan 02 07:30:05 2013 commencing number field sieve (270-digit input)
Wed Jan 02 07:30:05 2013 commencing number field sieve polynomial selection
Wed Jan 02 07:30:05 2013 polynomial degree: 6
Wed Jan 02 07:30:05 2013 max stage 1 norm: 1.08e+035
Wed Jan 02 07:30:05 2013 max stage 2 norm: 2.46e+035
Wed Jan 02 07:30:05 2013 min E-value: 0.00e+000
Wed Jan 02 07:30:05 2013 poly select deadline: 1079999
Wed Jan 02 07:41:41 2013 polynomial selection complete
Wed Jan 02 07:41:41 2013 R0: -5115212953529306844186435346484985056758031
Wed Jan 02 07:41:41 2013 R1: 53041291506298756714237
Wed Jan 02 07:41:41 2013 A0: -379896940535659287453260136796270948676592389901631962720
Wed Jan 02 07:41:41 2013 A1: -356485807960054857061300970481298628845280475487976
Wed Jan 02 07:41:41 2013 A2: -88244722381684984039807462984918078359735096
Wed Jan 02 07:41:41 2013 A3: -3323380506955608894448655022916446492
Wed Jan 02 07:41:41 2013 A4: 378527384940320290220809646801
Wed Jan 02 07:41:41 2013 A5: -4532860863907273712142
Wed Jan 02 07:41:41 2013 A6: 23000623460100
Wed Jan 02 07:41:41 2013 skew 18516337.01, size 4.775e-020, alpha -10.361, combined = 2.509e-019 rroots = 4
[/code]

 debrouxl 2013-01-02 18:43

1 Attachment(s)
11752 new stage 1 hits with norm 2e33, produced by MPI-patched CPU msieve.

 firejuggler 2013-01-02 19:30

poly was
[code]
23000623460100 53041291506298756714237 5115212953527564653744592445975459711681095
[/code]

 debrouxl 2013-01-07 07:21

1 Attachment(s)
40197 new stage 1 hits, most of them generated by GPU msieve on my GT540M, and the rest by MPI-patched CPU msieve.

 debrouxl 2013-01-07 19:10

I've run size optimization on my latest 40197 hits (which produced a number of "line optimization failed" messages), and I'm now running root optimization in ascending order of stage 1 norm (e+33 and e+34 values, less than 1% of all size-optimized hits).
The norms, alphas and e values tend to be decreasing on average, and I haven't seen a 3e-20 norm for a while, so chances are that the best potential hits have already been found...
The best polynomial so far is better than my previous best, but still much worse than the best ones posted here:
[code]# norm 3.928016e-20 alpha -11.376284 e 2.143e-19 rroots 4
skew: 47860990.01
c0: 102243377053834035977456615540543318152506688915339403180155
c1: -6607905899690278300548521267259375978782553228004647
c2: -990115457387976288997208498176855800512746728
c3: 2374578563317318524203706157960245520
c4: 115319739334363910337347697197
c5: -3817121901056644507273
c6: 45678923690400
Y0: -4562480378270649518664783149805881289770388
Y1: 35998309767524359673137[/code]

But the above polynomial is, in fact, not the main reason why I'm writing this post. I'm writing it because, in the output, I noticed several outliers with shorter c0, very low skew, and terrible indicated e values, despite norms comparable to those of the "normal" polynomials.
Here are several examples:
[code]# norm 2.372578e-20 alpha -10.426309 e 1.964e-20 rroots 2
skew: 1197.46
c0: 18481412854226641032415109537792402344324145678149569243
c1: -412818324451811043545670157890441895685514959354757096
c2: 2456685007867823022280006789699102572790645669
c3: -66233377310249932462663890531016567923
c4: 496905266675509984230728250988
c5: -1336373776546492878581
c6: 1500024500100
Y0: -8062516096127221174983998069912229229885966
Y1: 34136309016942249143093

# norm 2.368751e-20 alpha -10.846135 e 2.245e-20 rroots 4
skew: 1377.05
c0: -2642388665373048624150808121402634764591949455367447825
c1: 438501348563241987768009661055349213982843345699856290
c2: -4693749974898874734068961081565468612650025973
c3: -186189783670734795089307249827424690087
c4: 2924675753151214297925870173078
c5: -12806778906904494395723
c6: 23456710532640
Y0: -5098500528781644024246804152176670963216164
Y1: 22196851711482804191917

# norm 3.036890e-20 alpha -11.223265 e 2.666e-20 rroots 4
skew: 1334.97
c0: -28576717055994791421425058746265638291236302755674259745
c1: 222801134536579704276697225767050208595951749040306072
c2: 1398221794954370584472235616298871514610192843
c3: -124644134602986531319500832562710506959
c4: 690941799193854044069220331794
c5: 1776275142072176247887
c6: 1900035882108
Y0: -7751040741275170619614657060899972044441224
Y1: 109564369221121497194087

# norm 2.288852e-20 alpha -10.619501 e 2.490e-20 rroots 2
skew: 1766.88
c0: 1393418822768280472160344800352078143099930198234976875
c1: 85215789369701604171965788635140830817277453757134150
c2: -368799273673517383122693984287542029297070690
c3: -73498701786682246657794324606801605568
c4: 1583577793726330690346273176767
c5: 2617296398705592280666
c6: 1800000012120
Y0: -7821227202316007911700848546218259081952266
Y1: 240333343470261878614787

# norm 2.319390e-20 alpha -10.667771 e 1.962e-20 rroots 4
skew: 1197.42
c0: -60297133975171382084881582505263853794821480445420380429
c1: 735812972956460443707231473680530061138485617982676392
c2: 5483937239726035047381552949647510651082363571
c3: -183781698612356696913135396595290025285
c4: 443631729780461087297695693254
c5: 1570418878625124677333
c6: 2300058204444
Y0: -7508108080138893173363569991159709738912898
Y1: 42471148941794878087325
[/code]
Where do these come from?

 Dubslow 2013-01-09 00:35

[URL="http://dubslow.tk/random/rsa.dat.gz"]~221K[/URL] hits (~5.7MB) with a 2e33 norm. I think I'll call it quits for now and go play with mfaktc 0.20 :smile:

 jasonp 2013-01-09 01:38

The 'norm' value is a numerically-computed integral that does not depend on translation and skew, so it is a more approximate measure of polynomial quality than the Murphy E score.

Usually when the low-order coefficients are extremely small it is an indication that the size optimization cannot do very much with the hit from stage 1; my experience is that you can force the skew to be small and can find a weak local optimum in the neighborhood of the original hit, but better optima are almost always very far away.
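If you want to weed such outliers out of a results file before root-sieving, a rough filter on the skew line works. A sketch, assuming the "skew: NNN" text format shown earlier in the thread; the 10000 cutoff is a guess on my part, not anything established in this thread:

```shell
# Toy input in the format debrouxl posted: one normal skew, one outlier.
printf 'skew: 113809421.14\nskew: 1197.46\n' > polys.txt
# Flag entries whose skew falls below the (guessed) cutoff.
awk '$1 == "skew:" && $2 + 0 < 10000 { print "low skew:", $2 }' polys.txt
```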

Dubslow, thanks for the contributions!

 firejuggler 2013-01-10 21:30

After collecting a few stage 1 hits and meddling with the second stage, I'm left wondering why, in the first part of the second stage, we allow such a high norm. From what I've seen, unless your norm is below 5e34 you won't get a final Murphy score above 1e-19.
It makes no sense to allow a "stage2_norm=1e100". Now, if there is a good explanation, I'm all ears.

 jasonp 2013-01-10 23:09

The stage 2 norm is set deliberately too large so that all the outputs from the size optimization make it into the output file. That's really just so we can get a statistical idea of how rare it is to get a polynomial with good size. Of course, actually running the root optimization on everything would be a disastrously bad idea.

 debrouxl 2013-01-11 07:38

[quote]Of course, if you then ran the root optimization on everything it's a disastrously bad idea.[/quote]
Indeed, firejuggler, unless it's for testing purposes (e.g. the MPI patch), as ~99% of the size-optimized hits have no chance of producing a good full polynomial (whatever the size of the number to be factored, it seems - we get similar ~1% ratios for C15x-C16x and the RSA-896 C270).

 frmky 2013-01-29 22:24

Here's another 6 million hits: [url]https://www.dropbox.com/s/9j8oml7r5hthx6g/rsa896_3.dat.m.gz[/url]

 Stargate38 2013-02-03 18:04

How many hits are needed? What's the ETA after the required number of hits is reached?

 jasonp 2013-02-03 19:14

The answer to your first question is that nobody knows. On theoretical grounds we're supposed to be able to find polynomials that are drastically better than what we've been seeing, so if the 'time constant' of the search is measured by the time it takes to get really lucky once, then we've barely started. The previous search for RSA768 generated 30 million hits, one of which turned into a polynomial whose performance was indistinguishable from the one actually used in 2008-2010.

My archives now contain 21 million hits (thanks everybody!), and Paul's group has posted 18 million more found with CADO NFS (they have a large number of CPUs searching). All of the hits we've found have passed through the CADO size optimization (except Greg's large batch above), and though we've found some polynomials in their top 10 we haven't gotten near their current best.

That being said, only a little of the current dataset (perhaps 5%) has passed through Msieve's size optimization. Could I persuade everyone to take a break and run the size optimization on the 39 million hits we have? This uses CPU only, and the current code can optimize a single polynomial in about 0.5 seconds, so we're looking at about 30 CPU-weeks of work to isolate the ~5000 best-scoring polynomials. Running the root sieve on those will take about 5 CPU days.
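As a back-of-the-envelope check on that estimate (my arithmetic, not from the post): 39 million hits at roughly 0.5 s each on a single core is

```shell
# 39M hits at ~0.5 s each, expressed in CPU-days.
echo $(( 39000000 / 2 / 86400 ))   # prints 225
```

and 225 CPU-days is about 32 CPU-weeks, consistent with the "about 30" figure above.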

To avoid me having to upload a blizzard of files, can I get 8 CPUs worth of volunteers? This job will be much easier if you can use unix text-processing tools, and I'll tell you how to run Msieve and postprocess the output.

(Paul's group estimated that their best polynomial as of a few months ago would require 35000 CPU-years to complete the sieving)

 debrouxl 2013-02-03 19:33

I can use 4 hyperthreads of the usual Xeon E3-1230 through MPI-patched root optimization.

 Dubslow 2013-02-03 20:16

I can throw in 1-2 quad core Sandy Britches (in a few days). GNU-Linux of course :smile: (I had been looking for a next project to keep my cores busy, good timing :smile:)

 WraithX 2013-02-03 21:15

[QUOTE=jasonp;327353]To avoid me having to upload a blizzard of files, can I get 8 CPUs worth of volunteers? This job will be much easier if you can use unix text-processing tools, and I'll tell you how to run Msieve and postprocess the output.[/QUOTE]

I have a 12-core (dual 6-core Xeon 5645) linux computer that I can dedicate to this. I'd be happy to help out.

 jasonp 2013-02-03 22:16

- unzip and rename to msieve.dat.m
- with worktodo.ini set for RSA896, run
[code]
msieve -v -i <rsa896 worktodo> -nps "stage2_norm=1e100"
[/code]
- the resulting output file is msieve.dat.ms; find the best size scores with
[code]
sort -g -k 11 msieve.dat.ms | head -5000 > sizeopt.out
[/code]
(Note for windows that this is the unix sort, not the crappy MS sort)
If you want to manually split the file to run across multiple cores, rename each piece to <something_unique>.m and add '-s <something_unique>' to the msieve command line. The optimized output will be in <something_unique>.ms
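A sketch of that manual split for four cores (assumptions on my part: a toy 1000-line file stands in for a real multi-million-line batch, and the worktodo and batch file names are illustrative):

```shell
# Toy stand-in for a real msieve.dat.m hit file.
seq 1 1000 > msieve.dat.m
# Split into 4 roughly equal line-based pieces.
lines=$(wc -l < msieve.dat.m)
split -l $(( (lines + 3) / 4 )) msieve.dat.m piece_
# Rename each piece to <something_unique>.m, as described above.
i=0
for f in piece_*; do
    mv "$f" "batch$i.m"
    # Then run one instance per core, e.g.:
    #   msieve -v -i worktodo.ini -nps "stage2_norm=1e100" -s batch$i &
    i=$((i + 1))
done
```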

- send me your top hits somehow. If you can't stand the suspense, rename your list of best hits to <something_unique>.ms and run the root sieve yourself with
[code]
msieve -v -i <rsa896 worktodo> -npr -s <something_unique>
[/code]
Note that it isn't necessary to run the root sieve on all 5000 candidates; the odds are overwhelming that the best result your batch produces will come from one of the top 100 size-optimized polynomials. Also, note that the latest Msieve SVN has an adjustment to the alpha computation that will bias the E-value downwards, but the modified score exactly matches what the CADO tools report.
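Put together, pulling out the top 100 for the root sieve is just a head over the sorted size-opt output (a sketch; a toy file stands in for a real sizeopt.out, and the top100 name is illustrative):

```shell
# Toy stand-in for the sorted size-optimization output.
seq 1 5000 > sizeopt.out
# Keep only the 100 best-scoring candidates for the root sieve.
head -100 sizeopt.out > top100.ms
# Then root-sieve just those, e.g.:
#   msieve -v -i worktodo.ini -npr -s top100
```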

I'll update this post with the other files (8 total). Each file has 5M hits, and would take about 30 days on one core.

WraithX: Greg's pile in post #110

 Dubslow 2013-02-03 23:19

Having successfully started the large LA with far fewer threads than I initially thought, I can now put at least 4 full cores on this right now, with maybe a few extra threads here and there. (Well, after the Super Bowl :smile:)

 firejuggler 2013-02-03 23:46

I'll claim the third file if nobody else wants it.

 Dubslow 2013-02-03 23:52

I certainly do want it :razz::smile:

 WraithX 2013-02-04 04:35

I'll take number 5 too.

 Dubslow 2013-02-04 06:13

If I may, I would recommend [URL="http://filesmelt.com/"]filesmelt.com[/URL] for any more of these transfers. It's completely free, and more importantly, is free of any intrusive ads, doesn't have any bandwidth restrictions, doesn't try to sign you up with your credit card, and doesn't force you to wait 60 seconds to begin the download (and you don't have to press like four buttons).

Edit: Easy way to split up the file into multiple chunks: `more +<1> <filename> | head -<2>`, where <1> is the beginning line number, and <2> is how many lines. So to split up a 1000 line file into 4 chunks, you'd do
[code]for num in 1 251 501 751; do
echo $num
more +$num <filename> | head -250 > <filename>.$num
done[/code] (or something similar).

 xilman 2013-02-04 10:24

[QUOTE=Dubslow;327432]If I may, I would recommend [URL="http://filesmelt.com/"]filesmelt.com[/URL] for any more of these transfers. It's completely free, and more importantly, is free of any intrusive ads, doesn't have any bandwidth restrictions, doesn't try to sign you up with your credit card, and doesn't force you to wait 60 seconds to begin the download (and you don't have to press like four buttons).

Edit: Easy way to split up the file into multiple chunks: `more +<1> <filename> | head -<2>`, where <1> is the beginning line number, and <2> is how many lines. So to split up a 1000 line file into 4 chunks, you'd do
[code]for num in 1 251 501 751; do
echo $num
more +$num <filename> | head -250 > <filename>.$num
done[/code] (or something similar).[/QUOTE]
Even easier, and works for arbitrary values of 1000:
[code]split -l 250 <filename>
[/code] if you want an arbitrary number of at most 250-line files or
[code]split -n l/4 <filename>[/code] if you want precisely four output files without splitting any lines.

 poily 2013-02-04 10:55

1 Attachment(s)
I've attached a modified msieve MPI polynomial-selection patch so you guys won't have to manually split the input files.

jasonp, can you post links to all the batches left?

 jasonp 2013-02-04 12:22

Okay, everything is up. (I apparently don't have the upload capacity to be a good pirate :smile:)

 poily 2013-02-04 13:09

I took batch 7.

 debrouxl 2013-02-04 19:36

Taking batch 8 as well; I'll try to run it on 8 cores of an FX-8150.

 Dubslow 2013-02-04 19:37

[QUOTE=xilman;327443]Even easier, and works for aribtrary values of 1000:
[code]split -l 250 <filename>
[/code] if you want an arbitrary number of at most 250-line files or
[code]split -n l/4 <filename>[/code] if you want precisely four output files without splitting any lines.[/QUOTE]

Ah, gracias! The old "post something bad enough to be corrected" trick has worked again! :smile: (It wasn't deliberate though :razz:)

 firejuggler 2013-02-04 20:06

Nobody has taken batch 6 yet, just saying.

and for MinGW32, the line is
[code]
split -b 100m msieve.dat.m
[/code]
to split the file into 100 MB chunks. (Note that a byte-count split can cut a hit line in two at each chunk boundary; a line-count split with -l avoids that.)

 frmky 2013-02-04 23:48

Just a warning, I plan to upload another batch of 6 million or so stage 1 hits around the end of the week.

 WraithX 2013-02-05 00:13

[QUOTE=frmky;327577]Just a warning, I plan to upload another batch of 6 million or so stage 1 hits around the end of the week.[/QUOTE]

That should be fine. I should be done with my two by then, and can pick up Batch 6, plus this new one.

All times are UTC. The time now is 04:41.