GMP-ECM on large memory systems ~ 60 GB
Hi all, I have recently been given access to machine time on an 8-core Intel system with up to 64 GB of memory, and would like a few hints on how to have GMP-ECM use all eight cores and all 64 GB of RAM. Will I need to run 8 processes? Also, what are some numbers that could benefit from a few curves with extremely large B2s?
> Will I need to run 8 processes?
Yes. GMP-ECM does not have multiprocessor support (i.e. in the form of threading). There may be some day, but it's a long way off.
> Also, what are some numbers that could benefit from a few curves with extremely large B2s?
Fermat numbers. If you get 8 GB per process, you could do stage 2 on F_m, 14≤m≤20 or so, very efficiently. Or use even more memory in a single process and work on larger Fermat numbers, or do a P-1 stage 2 with an extremely high B2. It's been a long time since the last factor of a Fermat number F_m with m<30 was found. It would be really nice to see one again. Alex
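Since GMP-ECM is single-threaded, using all eight cores means starting eight independent ecm processes. A minimal Python sketch, assuming the `ecm` binary is on the PATH and the number to factor sits in a text file; the helper names and file names are hypothetical, and only the standard `-c` (number of curves) and `-savea` (append residues to a file) options are used:

```python
import subprocess
from typing import List, Optional

def build_commands(n_procs: int, b1: str, curves: int, prefix: str,
                   b2: Optional[str] = None) -> List[List[str]]:
    """One independent ecm command line per core (GMP-ECM itself is single-threaded)."""
    cmds = []
    for i in range(n_procs):
        # -c: number of curves; -savea: append residues to this process's own save file,
        # so no two processes ever write to the same file.
        cmd = ["ecm", "-c", str(curves), "-savea", f"{prefix}.{i}.save", b1]
        if b2 is not None:
            cmd.append(b2)  # otherwise GMP-ECM picks its default B2
        cmds.append(cmd)
    return cmds

def launch(cmds: List[List[str]], number_file: str, log_prefix: str) -> List[int]:
    """Start all processes at once, each reading the number from number_file; wait for all."""
    procs = []
    for i, cmd in enumerate(cmds):
        with open(number_file) as stdin, open(f"{log_prefix}.{i}.log", "w") as log:
            # The child gets its own copies of these descriptors, so closing ours is safe.
            procs.append(subprocess.Popen(cmd, stdin=stdin, stdout=log,
                                          stderr=subprocess.STDOUT))
    return [p.wait() for p in procs]
```

For example, `launch(build_commands(8, "110e6", 10, "F14"), "F14.txt", "F14")` would run ten curves per core, with `F14.txt` (hypothetical) containing `2^16384+1`.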
Would it be a good idea to have, say, 6 cores doing huge stage 1 runs and only two cores using 25 GB each to do stage 2? Also, will GMP-ECM default to the best k value, or do I need to set k=0?
Quote:
Seconded!!! It would also be nice if we could disprove the Selfridge conjecture that d(F_n) is nondecreasing. [d(x) is the number of prime divisors of x.] (John admits that it is likely false.) The conjecture is true through F_11. But surely some F_n for n > 11 must have fewer than 5 prime divisors...
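For the smallest Fermat numbers, d(F_n) can be checked directly. A toy Python sketch that counts the prime divisors of F_0 to F_6 using Pollard rho and a Miller-Rabin test; these are generic stand-in techniques chosen for brevity, not how GMP-ECM works, and they are hopeless for the large F_n discussed here:

```python
import math
import random

def is_prime(n: int) -> bool:
    """Miller-Rabin with the first 12 prime bases; deterministic for n < 3.3e24."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in bases:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a is a witness: n is composite
    return True

def rho(n: int) -> int:
    """Pollard rho with Floyd cycle detection; returns a nontrivial factor of composite n."""
    while True:
        c = random.randrange(1, n - 1)
        x = y = random.randrange(2, n)
        d = 1
        while d == 1:
            x = (x * x + c) % n
            y = (y * y + c) % n
            y = (y * y + c) % n
            d = math.gcd(x - y, n)
        if d != n:
            return d  # otherwise retry with a new constant c

def prime_divisors(n: int) -> set:
    if n == 1:
        return set()
    if is_prime(n):
        return {n}
    p = rho(n)
    return prime_divisors(p) | prime_divisors(n // p)

counts = [len(prime_divisors(2 ** (2 ** n) + 1)) for n in range(7)]
print(counts)  # [1, 1, 1, 1, 1, 2, 2]
assert all(a <= b for a, b in zip(counts, counts[1:]))  # nondecreasing so far
```

F_0 to F_4 are prime and F_5 and F_6 each have two prime factors, so the start of the sequence is consistent with the conjecture.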

Quote:
You'll need stage 1 residues to feed to stage 2. If the CPUs are IA-32, you can use Prime95/mprime on, say, all but two CPUs and let the remaining two do stage 2. If they are IA-64 (Itanium), you should generate stage 1 residues on another machine or have other people produce them. The automatic choice of k is not optimal for Fermat numbers. We should fine-tune dF and k for the B2 and memory use you want. Alex

The machine has two quad-core Xeon CPUs (post-Woodcrest); I believe there is 2 MB of L2 cache per core. The amount of memory will depend on how many 4 GB DIMMs we can get permission to use. I hope to run a first batch of curves tonight. I estimate that we should have 56-64 GB of RAM available, depending on how many 2 GB DIMMs we have to use.
Right now, while not too many people are using the machine, I hope to have exclusive use most nights and weekends. I really don't know what B2 size to use. I would like to use all 64 GB in stage 2. Tom
When factoring F_m, each residue occupies 2^(m-3) bytes. Each polynomial has dF residues, and we need to be able to store about 6 or 7 (iirc) polynomials in memory if you use the -treefile option, and around 7+log_2(dF) without it. B2 is approximately 10*dF^2*k.
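The estimates above can be turned into a small calculator. A minimal Python sketch with hypothetical function names; the default polynomial count of log2(dF)+6 is an assumption, chosen because it reproduces the memory table that follows (the post itself says "around 7+log_2(dF)"), and `n_polys=7` roughly models the -treefile case:

```python
from typing import Optional

def residue_bytes(m: int) -> int:
    """A residue modulo F_m = 2^(2^m) + 1 has about 2^m bits, i.e. 2^(m-3) bytes."""
    return 2 ** (m - 3)

def stage2_memory_mib(m: int, dF: int, n_polys: Optional[int] = None) -> float:
    """Rough stage 2 memory for F_m in MiB: n_polys polynomials of dF residues each."""
    if n_polys is None:
        # Assumption: log2(dF) + 6 polynomials without -treefile (dF a power of two).
        n_polys = dF.bit_length() - 1 + 6
    return n_polys * dF * residue_bytes(m) / 2 ** 20

def approx_b2(dF: int, k: int = 1) -> int:
    """Rule of thumb from the post: B2 is roughly 10 * dF^2 * k."""
    return 10 * dF ** 2 * k

print(stage2_memory_mib(20, 16384))             # 40960.0, i.e. ~40 GB for F_20
print(approx_b2(16384, k=2))                    # 5368709120, i.e. B2 ~ 5G
print(stage2_memory_mib(20, 16384, n_polys=7))  # 14336.0 with ~7 polynomials
```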
Here's a table that estimates approximately how much memory different Fermat numbers with different dF will use. I hope I got the formula right. As you can see, it won't be easy to use all 64 GB with the smaller Fermat numbers: the B2 would become unreasonably high. Working on F_20 with dF=16384 would use ~40 GB and give B2=2.5G with k=1 or B2=5G with k=2, which sounds about right with B1=1M (the current level for F_20). Code:
```
Estimated stage 2 memory in MB (M) for F_m; B2/k is the approximate B2 for k=1.
dF        1024    2048    4096    8192   16384   32768   65536  131072  262144  524288 1048576
B2/k       10M     40M    160M    640M    2.5G     10G     40G    160G    640G    2.5T     10T
m=12        8M     17M     36M     76M    160M    336M    704M   1472M   3072M   6400M  13312M
m=13       16M     34M     72M    152M    320M    672M   1408M   2944M   6144M  12800M  26624M
m=14       32M     68M    144M    304M    640M   1344M   2816M   5888M  12288M  25600M  53248M
m=15       64M    136M    288M    608M   1280M   2688M   5632M  11776M  24576M  51200M 106496M
m=16      128M    272M    576M   1216M   2560M   5376M  11264M  23552M  49152M 102400M 212992M
m=17      256M    544M   1152M   2432M   5120M  10752M  22528M  47104M  98304M 204800M 425984M
m=18      512M   1088M   2304M   4864M  10240M  21504M  45056M  94208M 196608M 409600M 851968M
m=19     1024M   2176M   4608M   9728M  20480M  43008M  90112M 188416M 393216M 819200M
m=20     2048M   4352M   9216M  19456M  40960M  86016M 180224M 376832M 786432M
m=21     4096M   8704M  18432M  38912M  81920M 172032M 360448M 753664M
m=22     8192M  17408M  36864M  77824M 163840M 344064M 720896M
m=23    16384M  34816M  73728M 155648M 327680M 688128M
m=24    32768M  69632M 147456M 311296M 655360M
```
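To pick dF for a given memory budget, one can search for the largest power-of-two dF whose estimate fits. A minimal Python sketch with hypothetical helper names, assuming the same (log2(dF)+6) polynomial count as the table and simply dividing 64 GB evenly among the stage 2 processes (OS and stage 1 memory are ignored):

```python
from typing import Optional

def stage2_memory_bytes(m: int, dF: int) -> int:
    """(log2(dF) + 6) polynomials of dF residues, each residue 2^(m-3) bytes."""
    return (dF.bit_length() - 1 + 6) * dF * 2 ** (m - 3)

def largest_dF(m: int, budget_bytes: int, dF_min: int = 1024,
               dF_max: int = 2 ** 20) -> Optional[int]:
    """Largest power-of-two dF whose estimated stage 2 memory fits in budget_bytes."""
    best = None
    dF = dF_min
    while dF <= dF_max and stage2_memory_bytes(m, dF) <= budget_bytes:
        best = dF
        dF *= 2
    return best

GiB = 2 ** 30
total = 64 * GiB
for m, n_stage2 in [(14, 4), (20, 2)]:
    per_process = total // n_stage2  # assumption: an even split, no overhead
    dF = largest_dF(m, per_process)
    print(f"F_{m}: {n_stage2} stage 2 processes -> dF = {dF}, B2/k ~ {10 * dF ** 2:.2e}")
```

With four stage 2 processes on F_14 this picks dF=262144, which matches the dF suggested for F_14 later in the thread.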
bearnol:

No. Bloody. Way. Keep your mad ideas to your own threads, please. Alex 
We now have Linux running and recognizing the full 64 GB of RAM. Today I plan to start eight simultaneous processes, each doing curves on F14 with B1=1.6e9 and B2=160e9. I head back to college tomorrow, so I will leave it running until Friday morning.

B1=1.6e9 is too high; B1=110M would be more reasonable. It would also be better to let some CPUs produce only stage 1 residues and run stage 2 on the other CPUs, so that those can use more memory. You could then use dF=262144 and k=2. I.e., put the stage 1 residues into a file "F14.ecm.110M.save" (or whatever name you like) and run
ecm -k 2 -power 1 -v -resume F14.ecm.110M.save 110e6 5e12
on at most 4 processes (each with a separate file of stage 1 residues!). Alex
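Giving each stage 2 process its own residue file can be scripted. A minimal Python sketch with hypothetical helper names and `.partN` file names, assuming the save file holds one stage 1 residue per non-empty line (check this against your own save file); the ecm options are exactly those in the command above:

```python
from pathlib import Path
from typing import List

def split_residues(text: str, n_parts: int) -> List[str]:
    """Deal the non-empty lines (assumed: one residue per line) round-robin into n_parts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return ["".join(ln + "\n" for ln in lines[i::n_parts]) for i in range(n_parts)]

def resume_command(save_file: str, b1: str = "110e6", b2: str = "5e12") -> List[str]:
    """The stage 2 command line from the post, pointed at one part file."""
    return ["ecm", "-k", "2", "-power", "1", "-v", "-resume", save_file, b1, b2]

def prepare(save_file: str, n_parts: int = 4) -> List[List[str]]:
    """Write save_file.part0 .. part{n_parts-1}; return one resume command per part."""
    parts = split_residues(Path(save_file).read_text(), n_parts)
    cmds = []
    for i, chunk in enumerate(parts):
        part = f"{save_file}.part{i}"
        Path(part).write_text(chunk)
        cmds.append(resume_command(part))
    return cmds
```

Each command returned by `prepare("F14.ecm.110M.save")` can then be started in its own shell or background process.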
Also consider using mprime to create those stage 1 residues. It should be faster than GMP-ECM.
And let me know about curve counts so that I can update http://mersenne.org/ecmf.htm 
