Call for volunteers: RSA896
As of SVN823 the Msieve code is almost stable enough to consider a release of v1.51 (the only thing left is deciding what to do with the windows build projects, which will need a lot of fixing after the recent GPU overhaul). [url="www.boo.net/~jasonp/msieve_svn823.zip"]A win32 binary[/url] is available for your perusal.
As a largerscale test of the code, I propose we run polynomial selection on RSA896. Paul Zimmermann reports his group still has a lot of work left with this task, and using a GPU is much more effective on this size problem than using a CPU. So if you have a graphics card and at least version 4.2 of the Nvidia drivers, consider running a range of poly selection stage 1 for RSA896 with me (I've been at it for 2 weeks now). More specifically, write the following in a file worktodo.ini: [code] 412023436986659543855531365332575948179811699844327982845455626433876445565248426198098870423161841879261420247188869492560931776375033421130982397485150944909106910269861031862704114880866970564902903653658867433731720813104105190864254793282601391257624033946373269391 [/code] and then run msieve v np1 "stage1_norm=5e33 X,Y" where X and Y is a range of numbers with magnitude 10^12 to 10^15. You should pick a fairly large range since the code will skip over many leading coefficients that are not smooth enough, and multiple machines (say up to 5) can search the same range with a high probability of not duplicating any work. With a GTS450 card (which is pretty lowend) I get about 80000 hits per week, which is about 2MB zipped. So you will periodically want to interrupt a run in progress, restart with X set to start above the last coefficient searched, and msieve.dat.m moved to another filename. The CPU utilization will be very low while a run is in progress, but interactivity in my winXP GUI is noticeably worse, which will be annoying. The binary can be run multithreaded but it does not make a difference with such large problems. Please post here if you have problems or questions. I've only had time to test the GPUspecific engine for Fermi cards, custom engines exist for older cards but I have no idea if they work. Some additional pointers: First, X and Y have to be in decimal, not in scientific notation like older versions used to allow. Also, if building from SVN make sure to use the trunk and not the msieve_faststage1 branch; that branch is now old. 
So you don't recommend using only CPU.

In my tests, the GPU code is perhaps 100x faster than the CPU code. The CPU code is not multithreadaware but could benefit from multithreading. Actually the CPU code needs to be able to use a thread pool and highperformance radix sorting just like the GPU code does now, after which the CPU and GPU code could probably be mostly unified together. But we're not there yet, and could probably buy a 10x speedup in the CPU code by doing so.
Given a choice between a CPU run and nothing, then obviously a CPU run would be better. But currently the situation with stage 1 polynomial selection is exactly analogous to Mersenne trial factoring, visavis the GPU speedup. 
[QUOTE=jasonp;318747]In my tests, the GPU code is perhaps 100x faster than the CPU code. [/QUOTE]
How many cores has your CPU? 
This is 1 core of a Core2 from 2007 vs a lowend Fermi card from 2010. The GPU finds about one hit every 510 seconds, and now I see that the CPU code found two hits after 24 minutes. Neither codebase is optimal, neither set of hardware is very fast in absolute terms and the search itself can find many times more hits if the cutoff for hit quality is loosened. The CPU code would certainly speed up linearly if you threw more cores at it. But currently GPUs have a huge performance advantage when running this algorithm.

BOINC wrapper integration would enable us to explode the CPU+GPU power working on polynomial selection for RSA896. I should dig again into my mailbox for the discussions about that, though I don't have time to help with it right now...

[QUOTE=jasonp;318710]With a GTS450 card (which is pretty lowend) I get about 80000 hits per week, which is about 2MB zipped.[/QUOTE]
I started at 10^13. On the GTX 480, I got 229 hits in 8 minutes, which extrapolates to a bit under 300000 per week. How many are you wanting to collect? 
[QUOTE=frmky;318965]I started at 10^13. On the GTX 480, I got 229 hits in 8 minutes, which extrapolates to a bit under 300000 per week. How many are you wanting to collect?[/QUOTE]
Sheesh, that's a lot. :smile: Perhaps we should lower the norm to (e.g.) 5e32. How will nps and npr be run? Should we run the former our selves, as had previously been suggested for BOINCification? frmky, how fast are you traversing leadingcoeff ranges? I plan to participate, once a) I'm back at school and b) I get my recently RMAd 460. 
I've computed about 65000 hits so far, and Paul reports computing 500k hits with CADONFS. I'm currently covering the 1e13 range as well, so we will probably have a little duplication. The code selects very few leading coefficients in this range, after two weeks I'm up to around (1e13+1e9).
I'm also being lazy, in that Paul's group can turn around stage 2 pretty quickly so I'm not bothering with it. A BOINC project would definitely need to do its own size optimization, since I'd expect thousands of good sizeoptimized polynomials out of tens of millions of stage 1 hits. The upper limit is a good question; we would want millions of hits at least. 
It's a big project, but if we only want millions of hits I'm not sure it's big or long term enough for BOINC. With two GPU's I will be getting over a half million per week.
Starting at 10^13, the leading coefficient has advanced by 21 million in 6.5 hours, with a bit over 10000 hits. Extrapolating, if we want to cover the entire range, this will take 35000 GTX480years and yield a halftrillion hits totally 12 terabytes. To put this in perspective, NFS@Home uses about 1.5 CPUyears each day. 
[QUOTE=jasonp;318995]I've computed about 65000 hits so far, and Paul reports computing 500k hits with CADONFS. I'm currently covering the 1e13 range as well, so we will probably have a little duplication. The code selects very few leading coefficients in this range, after two weeks I'm up to around (1e13+1e9).
I'm also being lazy, in that Paul's group can turn around stage 2 pretty quickly so I'm not bothering with it. A BOINC project would definitely need to do its own size optimization, since I'd expect thousands of good sizeoptimized polynomials out of tens of millions of stage 1 hits. The upper limit is a good question; we would want millions of hits at least.[/QUOTE] Does CADO support CUDA, or is yours the only such code? How does lowering the norm affect how many hits you need? How likely would you be to miss an unusuallygreat poly with too large a norm? It seems to me that running our own nps would, at the very least, cut down on how much data we need to transfer. In either case, do we just attach results here and Paul will download them or something? 
CADO does not support GPUs; other researchers have substituted their own GPU components for the corresponding pieces in CADONFS (i.e. make the Block Weidemann code use a GPU for matrix multiplies).
It's an open question whether lowering the target norm makes it easier to find good hits; it certainly drastically reduces the number of hits found, and you would think that it would help the size optimization deal with the loworder coefficients if the top few polynomial coefficients were much smaller than average. But I don't know that, so far all the hits I've found have led to polynomials a lot worse than their current best one. Delivering big batches of hits isn't a problem; I can take them by email or you can upload to Rapidshare or some other online storage service if the results are more than a few megabytes compressed. Greg: I don't know how many hits we should look for. Nobody has ever done a massively parallel polynomial search, as in throwing BOINC at it for months. For RSA768 we spent three months and collected something like 15 million stage 1 hits, and the best resulting polynomial was just about as good as the one actually used for the sieving. For RSA896 the current code divides the search space for each leading coefficient into 46 pieces, so searching one of those only exhausts 2% of the available search space. A trillion hits would probably be too much to deal with, though it would be awesome :) 
What's the Murphy e of their current best?

4.8e19; Paul tells me that the best they expect is 20e19.

I ran stage 2 on a single core overnight, and the best was 1.704e19.

Thanks to Jason's help I've got a run started at 5*10^13.
It looks like I am getting around ten per minute in this range. Correction: Better than 25/min. (I can't divide...) 
Great minds think alike. I started one GPU at 10^13, same as Jason, and the second at 5*10^13, same as you. :smile:
So far it's done 10^13 to 10^13+2.5*10^8 with 131k hits and 5*10^13 to 5*10^13+3*10^8 with 152k hits. 
Update
Just over 24 hours I am approaching 38k but still just under
5*10^13 + 1*10^8. 
I got 417K stage1 hits for
[CODE] msieve np1 "stage1_norm=1e33 100000000000000,1000000000000000"[/CODE] should I run stage 2 on them or better give them to somebody? 
Maybe post to rapidshare or some similar storage hosting?

Ok, here's my rsa896 stage1 data for [URL="http://depositfiles.com/files/c12l1rwrj"]1e13[/URL] and [URL="http://depositfiles.com/files/wxwee5w7d"]1e14[/URL]. I'll try to run stage2 on part of it.

Okay. When running stage 2, you will save a lot of time if you first run with
nps "stage2_norm=1e100" so that only the size optimization runs for every hit. On my machine running the size optimization on the 250k hits I have took about 36 hours. Then sort the resulting .ms file in order of increasing score (which is the last field in each line) and just run msieve with npr on the few lines in the .ms file that have the smallest score, perhaps only the 1000 best hits. The root sieve runs faster with a tight bound on the stage 2 norm, so try npr "stage2_norm=1e37" or possibly smaller if that winds up taking too long. We've found that if you run all of stage 2 on every hit you spend the majority of the time running the root sieve on hits that will never generate very good polynomials, and the root sieve can take minutes for a single hit (where the size optimization takes half a second) 
Stage 2 finished, the best 10 polynomials murphy_e ranges in 2.348e19  2.129e19. Not very impressing.

I ran stage 2 on 250k hits computed by RichD and myself, the best evalue was 2.0e19, so not that great. Shi Bai ran the CADO stage 2 on the same dataset and has gotten better high scores (I think the best so far was 3.2e19)

[QUOTE=jasonp;320043]I ran stage 2 on 250k hits computed by RichD and myself, the best evalue was 2.0e19, so not that great. Shi Bai ran the CADO stage 2 on the same dataset and has gotten better high scores (I think the best so far was 3.2e19)[/QUOTE]
As I recall from past comparisons (B200?), msieve typically has a ~10% lower score due to your more sophisticated integrator. How much of that better score is due to the differences in integration? 
I was going to write that most of the time the evalues calculated by the two suites differ by less than 1%, and larger differences were actually because of bugs in Msieve that miscalculated the alpha value. But then to make sure I ran the best CADO polynomial through Msieve, and the difference in evalues was 4.78e19 vs 3.493e19!
For the record, the best polynomial I could find in the batch of 260k hits was [code] R0: 5876926706329267758590334567904669751467577 R1: 869332622169838859059 A0: 15018543190390770338953421520339801353757924940775872204660800 A1: 909208080136930262159142936022283921496280008021988680 A2: 13303473065609161166913597414184875440269406556 A3: 138145188131827120402436868843139274434 A4: 763752915545626483996372079531 A5: 4312543659621449260154 A6: 10000466830200 skew 133010468.05, size 3.823e020, alpha 12.788, combined = 2.080e019 rroots = 4 [/code] 
Here's my slightly better (according to msieve) polynomial
[CODE] SKEW 28338177.70 R0 3626941552197564826492128852700460060642852 R1 48957407582916194761589 A0 26101732745933806144485280988037796254029367990700826499285 A1 2408464141902741608017790242140644715688145388490381 A2 174314770228113076791419006080421369720970639 A3 28697722660097589508721192118263624637 A4 1036456362256909021188219944324 A5 21292410351587764080336 A6 181000001476800 skew 28338177.70, size 4.381e20, alpha 12.189, combined = 2.348e19 rroots = 2 [/CODE] jasonp, what score does CADO show for your or my polynomial? 
I got it by myself: CADO's E.sage gives 1.84701205471987e19 for your and 2.08696883313422e19 for my polynomial.

The difference is that the CADO tools compute the size score by integrating in radial coordinates, whereas Msieve starts in radial coordinates and switches to computing the integral in rectangular coordinates. They don't compute the same numbers in general.
Msieve uses the rectangular integral because pol51opt did, and the GGNFS scripts had a table of precomputed 'good' scores to shoot for. Nonetheless, the radial score is more robust for some reason, i.e. it finds a better minimum a lot of the time. 
Here are approx. 1.3 million stage 1 hits:
[URL="https://www.dropbox.com/s/spssx3ze8iagvpn/rsa896_1.dat.m.gz"]https://www.dropbox.com/s/spssx3ze8iagvpn/rsa896_1.dat.m.gz[/URL] Edit: I restarted the stage 1 search at different values. I'm now running the size optimization on this file now, and I will run the root sieve on those with the smallest score. 
So, the best murphy_e we have so far is about 2.5e16, I wonder how CADO guys got murphy_e 2x larger than ours? Either their polynomial is quite extraordinary or they use different optimization algorihtms.

Is it possible we should run msieve stage 2 on their stage 1 results or their stage 2 on our stage 1 results?

I don't have access to the CADO stage 1 hits. We've collected 2.7M hits of our own, and they will all have stage 2 performed by both packages.
Keep 'em coming! Ilya, I suspect a lot of the difference boils down to the more effective root sieve developed by Shi Bai for the CADO stage 2. From what I've seen it consistently produces better alpha scores without sacrificing as much polynomial size as Msieve needs. 
[QUOTE=jasonp;320654]the more effective root sieve developed by Shi Bai[/QUOTE]
Steal it and put it in msieve? :devil: 
From my 1.3 million stage 1 hits, so far I've gotten a
# norm 4.371817e20 alpha 11.383344 e 2.347e19 rroots 4 The run continues... 
I sent a batch of hits computed with a cutoff of 5e33 instead of the previous 1e33, and apparently the polynomials generated with the looser cutoff are similar in quality. So if you use 5e33 for the stage1_norm you'll generate hits about 5x faster.

[QUOTE=jasonp;318710]and then run
msieve v np1 "stage1_norm=[B]5[/B]e33 X,Y" where X and Y is a range of numbers with magnitude 10^12 to 10^15. [/QUOTE] So it's better to update first post. 
[QUOTE=poily;320153]Here's my slightly better (according to msieve) polynomial
[CODE] SKEW 28338177.70 R0 3626941552197564826492128852700460060642852 R1 48957407582916194761589 A0 26101732745933806144485280988037796254029367990700826499285 A1 2408464141902741608017790242140644715688145388490381 A2 174314770228113076791419006080421369720970639 A3 28697722660097589508721192118263624637 A4 1036456362256909021188219944324 A5 21292410351587764080336 A6 181000001476800 skew 28338177.70, size 4.381e20, alpha 12.189, combined = 2.348e19 rroots = 2 [/CODE] jasonp, what score does CADO show for your or my polynomial?[/QUOTE] Hi poily, in cado, it's [CODE] Y1: 48957407582916194761589 Y0: 3626941552197564826492128852700460060642852 c6: 181000001476800 c5: 21292410351587764080336 c4: 1036456362256909021188219944324 c3: 28697722660097589508721192118263624637 c2: 174314770228113076791419006080421369720970639 c1: 2408464141902741608017790242140644715688145388490381 c0: 26101732745933806144485280988037796254029367990700826499285 skew: 20504576.000 # lognorm: 83.45, alpha: 12.19 (proj: 3.18), E: 71.26, nr: 2 # MurphyE(Bf=10000000,Bg=5000000,area=1.00e+16)=3.39e19 [/CODE] 
Msieve computes an Evalue of 2.339e19 when using the skew computed by CADO. Would it be worthwhile to find out what causes such big differences?

Ok, [URL="http://depositfiles.com/files/bopjglvg4"]here[/URL] are about 1.8M hits for the new bound 5e33. It seems the tighter bound was better: with 1e33 as a bound on stage 1 after size optimizations I saw norms of about Xe32 whereas the new bound gives only Xe33 and worse. The best e for this pack was about 2.2e19.
Bai, I used E.sage from the public CADO repository and got almost the same Murphy e values as msieve shows. Do you compute it somehow different now? 
[QUOTE=jasonp;321620]Would it be worthwhile to find out what causes such big differences?[/QUOTE]
As a bystander who has no knowledge of either how to go about such finding out or how much work it might be, I would be very interested in the results. :popcorn: (poily's post directly above this makes it all the more interesting!) 
1 Attachment(s)
For the purposes of generating several hits with CPUs (as I can't currently use my GPU...) more quickly, I resurrected poily's MPI polsel patch, from [url]http://www.mersenneforum.org/showpost.php?p=304639&postcount=9[/url] .
Mainline msieve changed somewhat since then (e.g. separation of the size optimization and the root sieve), so I started by upgrading and expanding the additions to the code; I also changed whitespace to follow the coding convention of the surrounding code more closely :smile: That allowed the computer to produce a number of hits with: $ mpirun n 6 msieve np1 "stage1_norm=2e33 1100000000000,1100010000000" v I think it would be great to see the MPI polsel patch integrated to mainline msieve. This way, every user of multithreaded polsel (even on a single computer) would benefit without having to apply (and upgrade) an out of tree patch :smile: However, AFAICS, the patch requires improvements before this can happen. Indeed, I noticed that when a child process has finished working ("polynomial selection complete"), it sticks its core to 100% usage. One of the processes found no suitable leading coefficient in its range and gave up on polsel immediately; after ~2h47 of CPU time wasted by that one, soon after another process had exhausted its range as well, I tried to kill one of the inactive processes... which killed all of its siblings, as I should have expected... 293 stage 1 hits generated in ~10h40 CPU time on Xeon E31230 @ 3.2 GHz. I have zero MPI experience, and therefore, no clue how to make finished jobs not waste significant CPU power :wink: 
The patch makes all the MPI processes wait at a barrier when poly selection is finishing, and my experience with OpenMPI is that this is a computationally expensive operation. I suspect that's what is burning up 100% CPU.
Unfortunately there's no real way around that, if you wanted processes to stop waiting on each other then you wouldn't be using barrier synchronization; put another way, MPI is a poor substitute for a distributed clientserver architecture :) 
1 Attachment(s)
ran a 15 min 'search' on a 560, got 227 hit, best poly
[code] Fri Dec 14 21:18:19 2012 Msieve v. 1.51 (SVN 766) Fri Dec 14 21:18:19 2012 random seeds: ccf343ec 7dcd68de Fri Dec 14 21:18:19 2012 factoring 412023436986659543855531365332575948179811699844327982845455626433876445565248426198098870423161841879261420247188869492560931776375033421130982397485150944909106910269861031862704114880866970564902903653658867433731720813104105190864254793282601391257624033946373269391 (270 digits) Fri Dec 14 21:18:21 2012 no P1/P+1/ECM available, skipping Fri Dec 14 21:18:21 2012 commencing number field sieve (270digit input) Fri Dec 14 21:18:21 2012 commencing number field sieve polynomial selection Fri Dec 14 21:18:21 2012 polynomial degree: 6 Fri Dec 14 21:18:21 2012 max stage 1 norm: 1.08e+035 Fri Dec 14 21:18:21 2012 max stage 2 norm: 1.05e+035 Fri Dec 14 21:18:21 2012 min Evalue: 0.00e+000 Fri Dec 14 21:18:21 2012 poly select deadline: 1079999 Fri Dec 14 21:20:02 2012 polynomial selection complete Fri Dec 14 21:20:02 2012 R0: 3334009071282816277832905784211277620964254 Fri Dec 14 21:20:02 2012 R1: 100281358245707420758237 Fri Dec 14 21:20:02 2012 A0: 372828901627475347894334652824133151170797461276260673 Fri Dec 14 21:20:02 2012 A1: 77826863799406908305447316219479205604506482315 Fri Dec 14 21:20:02 2012 A2: 8440449792077225990613216882014698768493181 Fri Dec 14 21:20:02 2012 A3: 434343399475320970345769932599595433 Fri Dec 14 21:20:02 2012 A4: 27102576369602076901509411656588 Fri Dec 14 21:20:02 2012 A5: 139702468047107462100798 Fri Dec 14 21:20:02 2012 A6: 300000002082000 Fri Dec 14 21:20:02 2012 skew 1498026.43, size 1.811e020, alpha 8.436, combined = 1.092e019 rroots = 4 [/code] (yeah I know, score is pretty bad, but thats a 15 min run) and the msieve.dat.m file to be run in cado 
firejuggler: your best poly isn't good, it has pretty low E value :wink:
jasonp: [quote]The patch makes all the MPI processes wait at a barrier when poly selection is finishing, and my experience with OpenMPI is that this is a computationally expensive operation. I suspect that's what is burning up 100% CPU.[/quote] Meh. [quote]Unfortunately there's no real way around that, if you wanted processes to stop waiting on each other then you wouldn't be using barrier synchronization; put another way, MPI is a poor substitute for a distributed clientserver architecture :)[/quote] Indeed. Several weeks ago, for factoring a 512bit RSA key under a "tight" deadline (well, I hadn't fully taken into account how much easier it had become since 2009, and the fact that in the end, I had 40 cores of varying power ^^), I spent several hours starting to assemble a simple clientserver work reservation system, speaking HTTP GET & PUT, in Perl (CPAN lists a number of modules for dealing with HTTP). The initial results were good, but the task was easy enough that I ended up just invoking ggnfslasieve4I14e in sh scripts instead of finishing the reservation system. 
I know my poly isn't good, it's just a quasinote to myself that 'I can do it'. I will make another run, a bit longer and see what come out of it.

[QUOTE=poily;321633]
Bai, I used E.sage from the public CADO repository and got almost the same Murphy e values as msieve shows. Do you compute it somehow different now?[/QUOTE] poily, thanks for noting it. Yes, there is some difference when computing the Dickman function. It might explain the difference between the Escore in msieve/cado when the skewness is large. I'll keep you updated. Btw: on your 02 Dec 12 batch, we had a better poly in terms of cado Escore. What's its score in msieve? [CODE] Y1: 9931515945117268997291 Y0: 3136213832380525741156980143018262150146233 c6: 433000028380920 c5: 16322847463557673445083 c4: 259713520920507700049024012254 c3: 313991938835577956615333794603976847 c2: 745313305236683785443677828883521154358049 c1: 8720107683218999638110247380980954223758742214 c0: 177258901573455869908310457463881265392025516480999800 skew: 1650176.000 # lognorm: 79.02, alpha: 9.49 (proj: 2.31), E: 69.53, nr: 4 # MurphyE(Bf=10000000,Bg=5000000,area=1.00e+16)=4.33e19 [/CODE] Anyway, if we can observe a correlation between the scores in msieve and cado, then it's safe to just record a few top polynomials found by both. 
I get
[code] skew 1650176.00, size 6.274e20, alpha 9.489, combined = 3.183e19 rroots = 4 [/code] 
I agree with debrouxl that the MPIed polynomial selection should be included in some form in the mainline msieve. I'm tired of doing the trivial patchwork for every new major msieve revision ;)
[QUOTE]The patch makes all the MPI processes wait at a barrier when poly selection is finishing, and my experience with OpenMPI is that this is a computationally expensive operation. I suspect that's what is burning up 100% CPU. [/QUOTE] Actually looks like the problem is in MPI_Gather which is there to collect the results from all the processors and select the best after the polynomial selection is complete. I suggest we remove this stage at all making each of the MPI processes save it's own fbfile allowing the user to select the best. [QUOTE]Btw: on your 02 Dec 12 batch, we had a better poly in terms of cado Escore.[/QUOTE] Hmm, looks like CADO does better size optimization not only the root sieve. What was the original hit that produced that polynomial? And what size norm was it of after the size optimization? I wonder what msieve can do with it if we try to tune relevant parameters a bit. [QUOTE]Anyway, if we can observe a correlation between the scores in msieve and cado, then it's safe to just record a few top polynomials found by both.[/QUOTE] Tbh, I'd prefer both of the tools show the same values, it's exact science after all. 
[quote][quote]The patch makes all the MPI processes wait at a barrier when poly selection is finishing, and my experience with OpenMPI is that this is a computationally expensive operation. I suspect that's what is burning up 100% CPU.[/quote]
Actually looks like the problem is in MPI_Gather which is there to collect the results from all the processors and select the best after the polynomial selection is complete. I suggest we remove this stage at all making each of the MPI processes save it's own fbfile allowing the user to select the best.[/quote] Amusingly, that's what I had just done locally, by replacing #ifdef HAVE_MPI by #if 0 in the hunk modifying gnfs/poly/poly.c :smile: 
Did a longer run, around 11k hits in 15 hours, will update when nps "stage2_norm=1e100"
will have ran. Is there an easy way to sort by field under windows? trying to do it with Librecalc/excel is doing poorly. 
Get a windows port of the unix sort utility, and then try
sort g k 11 <file> to sort in ascending numerical order based on field 11 (the last) 
1 Attachment(s)
ok... so cygwin is no good.
I have value from 1.007815e+34 to 1.348243e+36 and the sort don't give me the +34 and a handfull of +35 I need. grblbl managed somehow with Librecalc, will post my hit. 11700+ hit, starting at 3e13 
[URL="http://depositfiles.com/files/tr0zv0mha"]This batch[/URL] resulted in the following best so far polynomial:
[CODE]Mon Dec 17 04:01:41 2012 R0: 5382714938488690441209883907451379700210334 Mon Dec 17 04:01:41 2012 R1: 16663482999754897882819 Mon Dec 17 04:01:41 2012 A0: 55149587676457221661976298276728946108324308303659979586785 Mon Dec 17 04:01:41 2012 A1: 2084363501398641733507654116431852917627220374513244 Mon Dec 17 04:01:41 2012 A2: 115328708937118208864934751144873150311335660 Mon Dec 17 04:01:41 2012 A3: 3417129234272571170820957161968499682 Mon Dec 17 04:01:41 2012 A4: 7825764054023217154734470955 Mon Dec 17 04:01:41 2012 A5: 1089651465313173279574 Mon Dec 17 04:01:41 2012 A6: 16940008971600 Mon Dec 17 04:01:41 2012 skew 43447739.22, size 4.643e20, alpha 11.010, combined = 2.483e19 rroots = 2 [/CODE] I'm still not sure how I've missed the 3.183e19 one, maybe that's because I take only 50100 size optimized values (with norms 1e32*e33). I'll try to restart root optimization with 1000 best size norms (probably, 1e32*e34). 
[QUOTE=poily;321737]
What was the original hit that produced that polynomial? And what size norm was it of after the size optimization? I wonder what msieve can do with it if we try to tune relevant parameters a bit. [/QUOTE] The raw polynomial is this one, [CODE] 433000028380920 9931515945117268997291 3136213832380463342922092847039625251389592 [/CODE] The size optimized polynomial in cado is, [CODE] Y1: 9931515945117268997291 Y0: 3136213832380526651668430475424366552787822 c6: 433000028380920 c5: 16561029521169279633163 c4: 267250423315714655877175299839 c3: 217372220497467849979460290451947353 c2: 818449055340804222519290511517032253432988 c1: 162909026138981074398068520700268950925476629841 c0: 139263538137801830327982923308744587295054517282499033 skew: 1619456.000 # lognorm: 78.88, alpha: 0.83 (proj: 2.31), E: 78.05, nr: 4 # MurphyE(Bf=10000000,Bg=5000000,area=1.00e+16)=3.54e20 [/CODE] 
1 Attachment(s)
I had a Xeon E31230 spend:
* more than a CPUmonth on `msieve np1` with norm 2e33, producing 13705 hits (attached)  I still can't use my GPU at the moment; * a negligible amount of time on `msieve nps` with norm 1e100; * another CPUmonth or so on `msieve npr` for [I]all[/I] hits. The second CPU month aimed at testing the updated version of poily's MPI patch (see post #42, with the modification mentioned in posts #49 and #50) for nps + npr  seems to work :smile: The best polynomial produced by stage 2 is worse than the best hits posted earlier in this topic by a significant margin: [code]# norm 3.453121e20 alpha 10.768289 e 1.933e19 rroots 2 skew: 113809421.14 c0: 777040021595652574801970561643527925908245389976845316223520 c1: 56689368246124675729566462839180299627446395002122964 c2: 1184036534621771421230391418974433516476565936 c3: 13945075463277536309744615078227615403 c4: 162343630391708833196793797373 c5: 762530856893636308781 c6: 1500024500100 Y0: 8062516096125044668396811397254458255803647 Y1: 34136309016942249143093[/code]I'll now make the computer run more stage 1. 
Just so y'all know, I've been running the last few days with my [STRIKE]460[/STRIKE] 560. I started at 5e12 with norm 2e33, and am averaging around 25K hits per day (give or take a few thousand, it's just a rough estimate). I have ~85.8K total at the moment, and ~100 more every few minutes.
jasonp, perhaps worker's locations should be marked in the first thread? Even if it's hard to duplicate work of the same coefficient, it still seems to me like we'll get better results if we run as many coefficients as possible. 
Here's another 2 million stage 1 hits with norm of 1e33. I've now restarted it with norm of 5e33.
[url]https://www.dropbox.com/s/vczyvz0bgmnrgvm/rsa896_2.dat.m.gz[/url] 
running msieve v np1 "stage1_norm=5e33 2345678901234,20000000000000" will post my hits/optimised once I get back home.

How many S1 hits have there been, either from this thread alone or total? I'd estimate we're somewhere between 35 million, with Greg doing most of it?
I have ~180K hits [URL="http://dubslow.tk/random/batch1.gz"]here[/URL] (4.7 MB) (I would run size opt myself but my CPU's all tied up ATM). 
I've sent jasonp, via PM, just under 4M.

[QUOTE=RichD;323256]I've sent jasonp, via PM, just under 4M.[/QUOTE]
Sweet! I guess there's more than one big player around here :smile: So we're easily north of 5M, probably somewhere near 10M? 
By tomorrow, I'll have another dozen thousand CPU stage 1 hits (norm 2e33) from MPI msieve.
And yeah, we're probably around 10M hits now :smile: 
1 Attachment(s)
168xx hits min norm seem to be @ 4.xxx +034 but I may have done a bad job at optimising

[QUOTE=Dubslow;323262]... I guess there's more than one big player around here ... probably somewhere near 10M?[/QUOTE]
Not really a big player, just consistent. Been running since my first post. [QUOTE=jasonp;319061]A trillion hits would probably be too much to deal with, though it would be awesome :)[/QUOTE] 
So far the total contributions amount to 10.4M hits. Keep 'em coming!
Edit: by comparison, the few months that everyone spent on RSA768 produced 30 million hits, the vast majority of which came from CPU Msieve. 
playing with frmky's last upload (only half thru it)
poly is 23000623460100 4200026754030636886542 325872963021220734540680389841 71104560237504929295111562904399389 2515044822412759992774011178811387566386570 214938857506910312002364866202388589245612515363 2518563929690957541751459455188308671386866572717152336 53041291506298756714237 5115212953529178920472571451294711821103119 2.09 5.249183e+033 [code] Wed Jan 02 07:30:03 2013 Msieve v. 1.51 (SVN 766) Wed Jan 02 07:30:03 2013 random seeds: 6d21a47c 93a68ff0 Wed Jan 02 07:30:03 2013 factoring 412023436986659543855531365332575948179811699844327982845455626433876445565248426198098870423161841879261420247188869492560931776375033421130982397485150944909106910269861031862704114880866970564902903653658867433731720813104105190864254793282601391257624033946373269391 (270 digits) Wed Jan 02 07:30:05 2013 no P1/P+1/ECM available, skipping Wed Jan 02 07:30:05 2013 commencing number field sieve (270digit input) Wed Jan 02 07:30:05 2013 commencing number field sieve polynomial selection Wed Jan 02 07:30:05 2013 polynomial degree: 6 Wed Jan 02 07:30:05 2013 max stage 1 norm: 1.08e+035 Wed Jan 02 07:30:05 2013 max stage 2 norm: 2.46e+035 Wed Jan 02 07:30:05 2013 min Evalue: 0.00e+000 Wed Jan 02 07:30:05 2013 poly select deadline: 1079999 Wed Jan 02 07:41:41 2013 polynomial selection complete Wed Jan 02 07:41:41 2013 R0: 5115212953529306844186435346484985056758031 Wed Jan 02 07:41:41 2013 R1: 53041291506298756714237 Wed Jan 02 07:41:41 2013 A0: 379896940535659287453260136796270948676592389901631962720 Wed Jan 02 07:41:41 2013 A1: 356485807960054857061300970481298628845280475487976 Wed Jan 02 07:41:41 2013 A2: 88244722381684984039807462984918078359735096 Wed Jan 02 07:41:41 2013 A3: 3323380506955608894448655022916446492 Wed Jan 02 07:41:41 2013 A4: 378527384940320290220809646801 Wed Jan 02 07:41:41 2013 A5: 4532860863907273712142 Wed Jan 02 07:41:41 2013 A6: 23000623460100 Wed Jan 02 07:41:41 2013 skew 18516337.01, size 4.775e020, alpha 10.361, combined = 2.509e019 rroots = 4 [/code] 
1 Attachment(s)
11752 new stage 1 hits with norm 2e33, produced by MPIpatched CPU msieve.

poly was
[code] 23000623460100 53041291506298756714237 5115212953527564653744592445975459711681095 [/code] 
1 Attachment(s)
40197 new stage 1 hits, most of them generated by GPU msieve on my GT540M, and the rest by MPIpatched CPU msieve.

I've run size optimization on my latest 40197 hits (a number of "line optimization failed" messages), and I'm now running root optimization in ascending order of stage 1 norm (e+33 and e+34 values, less than 1% of all sizeoptimized hits).
The norms, alphas and e values tend to be decreasing on average, and I haven't seen a 3e20 norm for a while, so chances are that the best potential hits have already been found... The best polynomial so far is better than my previous best, but still much worse than the best ones posted here: [code]# norm 3.928016e20 alpha 11.376284 e 2.143e19 rroots 4 skew: 47860990.01 c0: 102243377053834035977456615540543318152506688915339403180155 c1: 6607905899690278300548521267259375978782553228004647 c2: 990115457387976288997208498176855800512746728 c3: 2374578563317318524203706157960245520 c4: 115319739334363910337347697197 c5: 3817121901056644507273 c6: 45678923690400 Y0: 4562480378270649518664783149805881289770388 Y1: 35998309767524359673137[/code] But the above polynomial is, in fact, not the main reason why I'm writing this post. I'm writing it because, in the output, I noticed several outliers with shorter c0, very low skew, and terrible indicated e values, despite norms comparable to those of the "normal" polynomials. Here are several examples: [code]# norm 2.372578e20 alpha 10.426309 e 1.964e20 rroots 2 skew: 1197.46 c0: 18481412854226641032415109537792402344324145678149569243 c1: 412818324451811043545670157890441895685514959354757096 c2: 2456685007867823022280006789699102572790645669 c3: 66233377310249932462663890531016567923 c4: 496905266675509984230728250988 c5: 1336373776546492878581 c6: 1500024500100 Y0: 8062516096127221174983998069912229229885966 Y1: 34136309016942249143093 # norm 2.368751e20 alpha 10.846135 e 2.245e20 rroots 4 skew: 1377.05 c0: 2642388665373048624150808121402634764591949455367447825 c1: 438501348563241987768009661055349213982843345699856290 c2: 4693749974898874734068961081565468612650025973 c3: 186189783670734795089307249827424690087 c4: 2924675753151214297925870173078 c5: 12806778906904494395723 c6: 23456710532640 Y0: 5098500528781644024246804152176670963216164 Y1: 22196851711482804191917 # norm 3.036890e20 alpha 11.223265 e 2.666e20 rroots 4 skew: 1334.97 c0: 28576717055994791421425058746265638291236302755674259745 c1: 222801134536579704276697225767050208595951749040306072 c2: 1398221794954370584472235616298871514610192843 c3: 124644134602986531319500832562710506959 c4: 690941799193854044069220331794 c5: 1776275142072176247887 c6: 1900035882108 Y0: 7751040741275170619614657060899972044441224 Y1: 109564369221121497194087 # norm 2.288852e20 alpha 10.619501 e 2.490e20 rroots 2 skew: 1766.88 c0: 1393418822768280472160344800352078143099930198234976875 c1: 85215789369701604171965788635140830817277453757134150 c2: 368799273673517383122693984287542029297070690 c3: 73498701786682246657794324606801605568 c4: 1583577793726330690346273176767 c5: 2617296398705592280666 c6: 1800000012120 Y0: 7821227202316007911700848546218259081952266 Y1: 240333343470261878614787 # norm 2.319390e20 alpha 10.667771 e 1.962e20 rroots 4 skew: 1197.42 c0: 60297133975171382084881582505263853794821480445420380429 c1: 735812972956460443707231473680530061138485617982676392 c2: 5483937239726035047381552949647510651082363571 c3: 183781698612356696913135396595290025285 c4: 443631729780461087297695693254 c5: 1570418878625124677333 c6: 2300058204444 Y0: 7508108080138893173363569991159709738912898 Y1: 42471148941794878087325 [/code] Where do these come from ? 
[URL="http://dubslow.tk/random/rsa.dat.gz"]~221K[/URL] hits (~5.7MB) with a 2e33 norm. I think I'll call it quits for now and go play with mfaktc 0.20 :smile:

The 'norm' value is a numericallycomputed integral that does not depend on translation and skew, so it is a more approximate measure of polynomial quality than the Murphy E score.
Usually when the loworder coefficients are extremely small it is an indication that the size optimization cannot do very much with the hit from stage 1; my experience is that you can force the skew to be small and can find a weak local optimum in the neighborhood of the original hit, but better optima are almost always very far away. Dubslow, thanks for the contributions! 
After collecting a few stage 1 hit, and meddling with the second stage, i'm left wondering why, in the first part of the second stage, we allow such a high norm.From what I seen, unless your norm is below 5e34 you won't have a final murphy score above 1e19.
It make no sense to allow a "stage2_norm=1e100". Now, if there is a good explanation, I'm all ears. 
The stage 2 norm is set deliberately too large so that all the outputs from the size optimization make it into the output file. That's really just so we can get a statistical idea of how rare it is to get a polynomial with good size. Of course, if you then ran the root optimization on everything it's a disastrously bad idea.

[quote]Of course, if you then ran the root optimization on everything it's a disastrously bad idea.[/quote]
Indeed, firejuggler, unless it's for testing purposes (e.g. the MPI patch), as ~99% of the sizeoptimized hits have no chance of producing a good full polynomial (whatever the size of the number to be factored, it seems  we get similar ~1% ratios for C15xC16x and the RSA896 C270). 
Here's another 6 million hits: [url]https://www.dropbox.com/s/9j8oml7r5hthx6g/rsa896_3.dat.m.gz[/url]

How many hits are needed? What's the ETA after the required number of hits is reached?

The answer to your first question is that nobody knows. On theoretical grounds we're supposed to be able to find polynomials that are drastically better than what we've been seeing, so if the 'time constant' of the search is measured by the time it takes to get really lucky once, then we've barely started. The previous search for RSA768 generated 30 million hits, one of which turned into a polynomial whose performance was indistinguishable from the one actually used in 20082010
My archives now contain 21 million hits (thanks everybody!), and Paul's group has posted 18 million more found with CADO NFS (they have a large number of CPUs searching). All of the hits we've found have passed through the CADO size optimization (except Greg's large batch above), and though we've found some polynomials in their top 10 we haven't gotten near their current best. That being said, only a little of the current dataset (perhaps 5%) has passed through Msieve's size optimization. Could I persuade everyone to take a break and run the size optimization on the 39 million hits we have? This uses CPU only, and the current code can optimize a single polynomial in about 0.5 seconds, so we're looking at about 30 CPUweeks of work to isolate the ~5000 bestscoring polynomials. Running the root sieve on those will take about 5 CPU days. To avoid me having to upload a blizzard of files, can I get 8 CPUs worth of volunteers? This job will be much easier if you can use unix textprocessing tools, and I'll tell you how to run Msieve and postprocess the output. (Paul's group estimated that their best polynomial as of a few months ago would require 35000 CPUyears to complete the sieving) 
I can use 4 hyperthreads of the usual Xeon E31230 through MPIpatched root optimization.

I can throw in 12 quad core Sandy Britches (in a few days). GNULinux of course :smile: (I had been looking for a next project to keep my cores busy, good timing :smile:)

[QUOTE=jasonp;327353]To avoid me having to upload a blizzard of files, can I get 8 CPUs worth of volunteers? This job will be much easier if you can use unix textprocessing tools, and I'll tell you how to run Msieve and postprocess the output.[/QUOTE]
I have a 12core (dual 6core Xeon 5645) linux computer that I can dedicate to this. I'd be happy to help out. 
Okay, I'll start uploading files. Instructions:
 download your file (they're ~130MB)  unzip and rename to msieve.dat.m  with worktodo.ini set for RSA896, run [code] msieve v i <rsa896 worktodo> nps "stage2_norm=1e100" [/code]  the resulting output file is msieve.dat.ms; find the best size scores with [code] sort g k 11 msieve.dat.ms  head 5000 > sizeopt.out [/code] (Note for windows that this is the unix sort, not the crappy MS sort) If you want to manually split the file to run across multiple cores, rename each piece to <something_unique>.m and add 's <something_unique>' to the msieve command line. The optimized output will be in <something_unique>.ms  send me your top hits somehow. If you can't stand the suspense, rename your list of best hits to <something_unique>.ms and run the root sieve yourself with [code] msieve v i <rsa896 worktodo> npr s <something_unique> [/code] Note that it isn't necessary to run the root sieve on all 5000 candidates; the odds are overwhelming that the best result your batch produces will come from one of the top 100 sizeoptimized polynomials. Also, note that the latest Msieve SVN has an adjustment to the alpha computation that will bias the Evalue downwards, but the modified score exactly matches what the CADO tools report. I'll update this post with the other files (8 total). Each file has 5M hits, and would take about 30 days on one core. debrouxl: [url="http://depositfiles.com/files/wddg1gl9u"]download 1[/url] WraithX: [url="http://depositfiles.com/files/9cpaisgm2"]download 2[/url] dubslow: [url="http://depositfiles.com/files/27vuh15me"]download 3[/url] firejuggler: [url="http://depositfiles.com/files/yb6ilspsa"]download 4[/url] WraithX: [url="http://depositfiles.com/files/rid8u16t6"]download 5[/url] dubslow: [url="http://depositfiles.com/files/fvcn3e2kw"]download 6[/url] poily: [url="http://depositfiles.com/files/91a33baua"]download 7[/url] debrouxl: [url="http://depositfiles.com/files/9nf2v5dao"]download 8[/url] WraithX: Greg's pile in post #110 
Having succesfully started the large LA with many less threads than I initially thought, I can now put at least 4 full cores on this right now, with maybe a few extra threads here and there. (Well, after the super bowl :smile:)

i'll claim the third file if nobody else want it

I certainly do want it :razz::smile:

[QUOTE=jasonp;327372]debrouxl: download 1
WraithX: download 2 dubslow: download 3 firejuggler: download 4 download 5[/QUOTE] I'll take number 5 too. 
If I may, I would recommend [URL="http://filesmelt.com/"]filesmelt.com[/URL] for any more of these transfers. It's completely free, and more importantly, is free of any intrusive ads, doesn't have any bandwidth restrictions, doesn't try to sign you up with your credit card, and doesn't force you to wait 60 seconds to begin the download (and you don't have to press like four buttons).
Edit: Easy way to split up the file into multiple chunks: `more +<1> <filename>  head <2>`, where <1> is the beginning line number, and <2> is how many lines. So to split up a 1000 line file into 4 chunks, you'd do [code]for num in 1 251 501 751; do echo $num more +$num <filename>  head 250 > <filename>.$num done[/code] (or something similar). 
[QUOTE=Dubslow;327432]If I may, I would recommend [URL="http://filesmelt.com/"]filesmelt.com[/URL] for any more of these transfers. It's completely free, and more importantly, is free of any intrusive ads, doesn't have any bandwidth restrictions, doesn't try to sign you up with your credit card, and doesn't force you to wait 60 seconds to begin the download (and you don't have to press like four buttons).
Edit: Easy way to split up the file into multiple chunks: `more +<1> <filename>  head <2>`, where <1> is the beginning line number, and <2> is how many lines. So to split up a 1000 line file into 4 chunks, you'd do [code]for num in 1 251 501 751; do echo $num more +$num <filename>  head 250 > <filename>.$num done[/code] (or something similar).[/QUOTE] Even easier, and works for aribtrary values of 1000: [code]split l 250 <filename> [/code] if you want an arbitrary number of at most 250line files or [code]split n 4 <filename>[/code] if you want precisely four output files. 
1 Attachment(s)
I've attached modified msieve MPI polynomial selection patch so you guys wouldn't have to manually split the input files.
jasonp, can you post links to all the batches left? 
Okay, everything is up (I apparently don't have the upload capacity to be a good pirate :)

I took batch 7.

Taking batch 8 as well, I'll try to run it on 8 cores of FX8150.

[QUOTE=xilman;327443]Even easier, and works for aribtrary values of 1000:
[code]split l 250 <filename> [/code] if you want an arbitrary number of at most 250line files or [code]split n 4 <filename>[/code] if you want precisely four output files.[/QUOTE] Ah, gracias! The old "post something bad enough to be corrected" trick has worked again! :smile: (It wasn't deliberate though :razz:) 
Nobody took batch 6 yet.just saying it.
and for MinGW32, the line is [code] split b 100m msieve.dat.m [/code] to split the file in 100Mb chunk 
Just a warning, I plan to upload another batch of 6 million or so stage 1 hits around the end of the week.

[QUOTE=frmky;327577]Just a warning, I plan to upload another batch of 6 million or so stage 1 hits around the end of the week.[/QUOTE]
That should be fine. I should be done with my two by then, and can pick up Batch 6, plus this new one. 
All times are UTC. The time now is 20:00. 
Powered by vBulletin® Version 3.8.11
Copyright ©2000  2022, Jelsoft Enterprises Ltd.