mersenneforum.org > Factoring Projects > GMP-ECM
2019-05-24, 07:37   #1
SethTro ("Seth")

Distributing gpu-ecm curves to multiple cpu-workers

I'm taking a stab at ECM for a c251 (home prime 49, step 119) with t65.
It's been tested to 2*t60, so I'm starting at t65 (B1=850e6, B2=1.4e13, ~80k curves).

I have 2x 1080 Ti and a 32-core machine (2x E5-2650 v2).
On CPU, stage 1 takes ~7200s/curve and stage 2 takes ~1800s/curve.
On GPU, stage 1 is much faster (10x? 20x?).

Ideally I would like to do a ton of stage-1 calculations on the GPU and then distribute those to the CPUs for stage 2.

Do I just pass `-save curves_<worker> <b1> 0` to the GPU worker and then
`-resume curves_<worker> <b1> <b2>` to each CPU worker?
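To make the timings concrete, here's a rough core-budgeting sketch for keeping the CPUs busy with stage 2 (the 10x-20x GPU speedup is just my guess above, not a measurement):

```python
# How many CPU cores does one GPU's stage-1 output keep busy with stage 2?
# (a sketch; timings are the per-curve estimates from this post)
cpu_stage1_s = 7200   # seconds per curve, stage 1 on one CPU core
cpu_stage2_s = 1800   # seconds per curve, stage 2 on one CPU core

for speedup in (10, 20):
    gpu_stage1_s = cpu_stage1_s / speedup          # effective GPU stage-1 time per curve
    cores_per_gpu = cpu_stage2_s / gpu_stage1_s    # stage-2 cores needed to keep pace
    print(f"{speedup}x GPU: ~{cores_per_gpu:.1f} stage-2 cores per GPU")
```

So somewhere between 2.5 and 5 stage-2 cores per GPU, if my speedup guess is in the right range.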
2019-05-24, 16:22   #2
chris2be8

That's about right, but you need to split the save file so that each core running stage 2 gets its own share of lines from it.

Here's part of a script I use to run ECM to 35 digits. On this system I need 3 cores running stage 2 to keep up with the GPU doing stage 1; you will probably need to change that ratio.
Code:
#!/bin/bash

function do_block
{
B1=$1
# Stage 1 on the GPU (B2=1 skips stage 2); the number to factor is read from $INI.
/home/chris/ecm.2741/trunk/ecm -gpu -save $NAME.save $B1 1 < $INI | tee -a $LOG
wait # for the last batch of stage 2 jobs to finish
grep -q 'Factor found' $LOG* # Check if we found a factor
if (($?==0));then exit 0;fi
# Split the new save file into 3 chunks, one per stage-2 core.
rm -f $NAME.saveaa $NAME.saveab $NAME.saveac
split -n l/3 $NAME.save $NAME.save
rm $NAME.save
# Each core runs stage 2 on its chunk, then fills idle time with ordinary CPU
# curves until the next GPU save file appears.
(nice -n 19 /home/chris/ecm.2741/trunk/ecm -resume $NAME.saveaa $B1;/home/chris/ecm-6.4.4/ecm -c 999 -idlecmd 'ps -ef | grep -q [-]save' -n $B1 <$INI ) | tee -a $LOG.1 | grep '[Ff]actor' &
(nice -n 19 /home/chris/ecm.2741/trunk/ecm -resume $NAME.saveab $B1;/home/chris/ecm-6.4.4/ecm -c 999 -idlecmd 'ps -ef | grep -q [-]save' -n $B1 <$INI ) | tee -a $LOG.2 | grep '[Ff]actor' &
(nice -n 19 /home/chris/ecm.2741/trunk/ecm -resume $NAME.saveac $B1;/home/chris/ecm-6.4.4/ecm -c 999 -idlecmd 'ps -ef | grep -q [-]save' -n $B1 <$INI ) | tee -a $LOG.3 | grep '[Ff]actor' &
}
Note: /home/chris/ecm.2741/trunk/ecm is compiled with GPU support, while /home/chris/ecm-6.4.4/ecm was compiled with --enable-shellcmd to make it accept -idlecmd.

Chris
2019-05-25, 00:57   #3
WraithX

You should also know about the previous work that was done on this number. The last posted record is post #111 in the following thread:
https://mersenneforum.org/showthread.php?t=3238&page=11

In it you can see that I had finished:
Code:
HP49 Step 119 c251:
Expected number of curves to find a factor of n digits:
      digits           35        40       45      50       55        60        65        70        75
------------------------------------------------------------------------------------------------------
    B1 =  11e6        138       788     5208   39497   336066   3167410  3.2e+007  3.7e+008
    B1 =  43e6         61       278     1459    8704    57844    419970   3346252  2.9e+007
    B1 = 260e6         23        82      335    1521     7650     42057    250476   1603736
    B1 =   1e9         14        44      154     599     2553     11843     59619    319570
    B1 =   3e9         11        31       96     335     1279      5292     23661    112329    565999
    B1 =   3e9         11        31       99     344     1315      5446     24234    115138    580561 (-maxmem 4096)

Number of curves completed so far, and their equivalent t-level:
 12000 @  11e6 =   86.956x   15.228x   2.304x   0.303x   0.035x    0.003x
 18000 @  43e6 =  295.081x   64.748x  12.337x   2.068x   0.311x    0.042x   0.005x
 90000 @ 260e6 = 3913.043x 1097.560x 268.656x  59.171x  11.764x    2.139x   0.359x   0.056x
120000 @   1e9 = 8571.428x 2727.272x 779.220x 200.333x  47.003x   10.132x   2.012x   0.375x
 16000 @3e9(4g)= 1454.545x  516.129x 161.616x  46.511x  12.167x    2.937x   0.660x   0.138x
 19200 @   3e9 = 1745.454x  619.354x 200.000x  57.313x  15.011x    3.628x   0.811x   0.170x
----------------------------------------------------------------------------------------------
                16066.507x 5040.291x 1424.133x 365.699x 86.291x   18.881x   3.847x   0.739x
I.e., I finished >3.8*t65 and >0.7*t70.

I continued working on the number for a while, splitting the work as you are planning to do. I ended up doing a total of 162,000 curves at B1=3e9; with t70 requiring the recommended 112,329 curves at B1=3e9, that means I did a total of 1.442*t70. You may want to start at B1=3e9, or work even higher. I'm just about to finish 1000 curves at B1=10e9, which isn't much compared to the previous effort.
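A quick sanity check on that arithmetic (the recommended curve count is taken from the table above):

```python
import math

curves_run = 162_000   # total curves completed at B1=3e9
t70_curves = 112_329   # recommended curves at B1=3e9 for a full t70 (table above)

x = curves_run / t70_curves       # fraction of a t70 completed
print(f"{x:.3f} * t70")           # -> 1.442 * t70

# With x "t70s" done, the chance a 70-digit factor was missed is 1/e^x.
print(f"chance a 70-digit factor was missed: {math.exp(-x):.1%}")  # -> 23.6%
```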

When doing the work I would manually run the curves on the GPUs, and then I would use my ecm.py script to spread all the finished stage-1 curves across multiple stage-2 jobs. For example, I had one GPU (I forget which) that could do 6400 stage-1 curves in 30 days, and another GPU that could do 9600 stage-1 curves in about 30 days, so I could finish 16,000 stage-1 curves per month. I would then use ecm.py to split those 16,000 curves into 7 jobs (about 2286 curves per job), which could finish the work in 32 days. So, you may want to do the work in batches like this. The commands I used looked like:

Code:
./ecm -gpu -gpudevice 0 -gpucurves 9600 -save g0-001_9600_3e9.txt 3e9 1 < hp49_119_c251.txt
./ecm -gpu -gpudevice 1 -gpucurves 6400 -save g1-001_6400_3e9.txt 3e9 1 < hp49_119_c251.txt

And then I would combine those files into a single c-001_16000_3e9.txt file and use ecm.py like so:

Code:
python ecm.py -threads 7 -resume c-001_16000_3e9.txt -out r-001_16000_3e9.txt

You need to make sure you have enough memory to run 7 (or more) stage-2 jobs with B1=3e9; otherwise your stage-2 jobs will swap and slow down drastically. I believe each job took up to 14500MB, which means the total used for those jobs on that machine was about 100GB.
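For anyone sizing this on their own machine, the arithmetic is simple (a sketch; the 128GB and the 14500MB-per-job figure are from my setup above, and your B1 and -maxmem will change them):

```python
ram_mb = 128 * 1024    # total RAM on the stage-2 machine (128GB in my case)
per_job_mb = 14_500    # observed peak stage-2 memory per job at B1=3e9

max_jobs = ram_mb // per_job_mb  # jobs that fit without swapping
print(max_jobs)  # -> 9, though in practice I ran 7 to leave headroom for the OS
```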

You can find my ecm.py script in the following thread (the most recent version will be towards the end of the thread):
https://mersenneforum.org/showthread.php?t=15508
2019-05-25, 04:20   #4
VBCurtis ("Curtis", Riverside, CA)

This is a truly awesome amount of work! I appreciate the detail you provided of curves and t-levels. I just started the Cunningham 2_2330L factorization, which will saturate all my large-memory machines for the next season or two; I'd like to contribute some B1=1e10 curves to honor your effort, but they'll have to wait until Fall.
2019-05-25, 21:36   #5
SethTro

Quote:
Originally Posted by WraithX View Post
You should also know about the previous work that was done on this number. The last posted record is post #111 in the following thread:
https://mersenneforum.org/showthread.php?t=3238&page=11

In it you can see that I had finished:
...

Thanks for the status update! I had seen your 2014 status from http://worldofnumbers.com/topic1.htm but hadn't searched here.


Quote:
Originally Posted by WraithX View Post
You need to make sure you have enough memory to run 7 (or more) jobs for stage-2 with B1=3e9, otherwise your stage-2 jobs will swap and slow down drastically.
Again, thanks for the detailed report. Why the "7 (or more)"?

Do you know of a write-up of the performance impact of using -maxmem? I did several searches but didn't find anything definitive (other than that it's non-linear: https://www.mersenneforum.org/showpo...0&postcount=52)



Quote:
Originally Posted by WraithX View Post
You can find my ecm.py script in the following thread (the most recent version will be towards the end of the thread):
https://mersenneforum.org/showthread.php?t=15508
I'm already using your wonderful script. I did have to make a couple of modifications to make it work with Python 3 (I'll try to post a patch later).

Since you seem to be actively working on this, which curves would be most useful for me to work on? I'm willing to spend a 1080 GPU-month plus a couple of core-years with high memory.
2019-06-01, 04:14   #6
WraithX

Quote:
Originally Posted by SethTro View Post
Again thanks for the detailed report, why the "7 (or more)"?
I used 7 because I was able to run that many jobs on a machine with 128GB of RAM. You'll need to calculate how much RAM a job will take (with or without -maxmem) and then figure out how many you can run in parallel.

Quote:
Originally Posted by SethTro View Post
Do you know of a write-up of the performance impact of using -maxmem? I did several searches but didn't find anything definitive (other than non-linear: https://www.mersenneforum.org/showpo...0&postcount=52)
I don't really know of a write-up on the performance impact of using -maxmem. The only thing I know is that it reduces the B2 value and increases the k value (an internal gmp-ecm variable), which slightly decreases the odds of finding a factor and somewhat increases the stage-2 runtime. You can use the -v option to see the impact of various -maxmem settings, i.e. look for the "expected number of curves to find a factor of n digits" table. This tells you how many curves to run at the given B1 value to have a \(1 - 1/e^x\) (where x = curves_run/recommended_curves) chance of finding a factor of the specified size (if one exists).

For example, using B1=11e6 without maxmem will look like:
Code:
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=1:2580904028
dF=32768, k=3, d=324870, d2=11, i0=23
Expected number of curves to find a factor of n digits:
35      40      45      50      55      60      65
138     788     5208    39497   336066  3167410 3.2e+007
And with maxmem 20 it will look like:
Code:
Using B1=11000000, B2=28545931060, polynomial Dickson(12), sigma=1:3439101351
dF=8192, k=40, d=79170, d2=11, i0=128
Expected number of curves to find a factor of n digits:
35      40      45      50      55      60      65
142     813     5420    40914   348634  3290145 3.4e+007
And with maxmem 40 it will look like:
Code:
Using B1=11000000, B2=28544268490, polynomial Dickson(12), sigma=1:2669738918
dF=16384, k=10, d=158340, d2=11, i0=59
Expected number of curves to find a factor of n digits:
35      40      45      50      55      60      65
142     813     5420    40914   348634  3290145 3.4e+007
You can see that -maxmem decreases the B2 value and increases the k value, so you get slightly lower odds of finding a factor and somewhat longer runtimes.
In this case, if you ran 10000 curves at B1=11e6 with no maxmem, maxmem 20, or maxmem 40, your odds of finding various sized factors would be:
Code:
Chance to find factor, of size d, = 1 - 1/e^x, where x = 10000/recommended_curves
 d    35      40        45       50      55      60
no  ~100%  99.9996%  85.341%  22.367%  2.931%  0.315%
20  ~100%  99.9995%  84.197%  21.683%  2.827%  0.303%
40  ~100%  99.9995%  84.197%  21.683%  2.827%  0.303%

Chance to miss factor, of size d, = 1/e^x, where x = 10000/recommended_curves
 d    35      40        45       50       55       60
no   ~0%  0.000308%  14.658%  77.632%  97.068%  99.684%
20   ~0%  0.000455%  15.802%  78.316%  97.172%  99.696%
40   ~0%  0.000455%  15.802%  78.316%  97.172%  99.696%
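Those percentages follow directly from the formula; here's a quick sketch to reproduce a couple of entries (the curve counts are taken from the expected-curves tables above):

```python
import math

def find_chance(curves_run, recommended):
    """Chance of finding a factor after curves_run curves, where
    `recommended` is the expected-curves figure gmp-ecm prints."""
    return 1 - math.exp(-curves_run / recommended)

# 10000 curves at B1=11e6, 45-digit column:
# 5208 expected curves with no maxmem, 5420 with maxmem.
print(f"{find_chance(10000, 5208):.1%}")  # -> 85.3%
print(f"{find_chance(10000, 5420):.1%}")  # -> 84.2%
```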
Quote:
Originally Posted by SethTro View Post
I'm already using your wonderful script. I did have to make a couple of modifications to make it work with python3 (I'll try and post a patch later).
Thanks. I've been patching it over the years, but haven't posted an update in a while. I'll post the update, with some of your changes, here in a bit.

Quote:
Originally Posted by SethTro View Post
Since you seem to be actively working on this, what curves would be most useful for me to work on? I'm willing to spend a 1080 GPU-month + a couple core years with high memory.
"Active" is subjective. I haven't really worked on this since Nov 2017. I think it was the 1080 that could complete 9600 stage-1 curves with B1=3e9 in ~30 days. My computers are busy, so you'll have to verify this on your own machine. I'd recommend running with B1=3e9 or higher.
2019-06-01, 15:47   #7
SethTro

Thanks for the information.

I did a little investigation of maxmem myself since my last post.

https://docs.google.com/spreadsheets...it?usp=sharing

https://github.com/sethtroisi/misc-s...ter/ecm-maxmem

The result is that stage 2 slows down roughly linearly as memory is restricted, while B2 is only slightly reduced (about 10% on average).
If ecm would use 16000MB by default and I set -maxmem 4000, I expect stage 2 to take about 4x as long!
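So as a rule of thumb from my data (a rough model only; actual behavior depends on the number and on B1):

```python
def stage2_slowdown(default_mb, maxmem_mb):
    """Rough linear model from my measurements: restricting stage-2
    memory by a factor f costs about a factor f in stage-2 runtime,
    while B2 only drops ~10% on average."""
    return default_mb / maxmem_mb

print(stage2_slowdown(16000, 4000))  # -> 4.0
```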