mersenneforum.org GMP-ECM Messages Killed/Aborted/cannot allocate memory

2016-11-07, 17:31   #1
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

3,617 Posts

GMP-ECM Messages Killed/Aborted/cannot allocate memory

I'm getting some messages on some of my "aged" machines and would like to know some more details, if possible. I'm running the following command on many threads/machines:

Code:
ecm -maxmem NNNN -save residuesNNN.txt 2900000000 2900000000 >ecmTestRun
Killed

ecmTestRun:

Code:
GMP-ECM 7.0.3 [configured with GMP 6.1.1, --enable-asm-redc] [ECM]
Tuned for x86_64/core2/params.h
Running on math42
Input number is 29460893303338144751360360097976743017149046981259832053501450854362438285630845458294329588136961317820466603439061784124252469966169391148524869909406500896547611862071404959591325864761463 (191 digits)
Using MODMULN [mulredc:1, sqrredc:1]

I haven't tried running ecm under gdb, since I'm not yet familiar enough with it, and I'm assuming the Killed/Aborted/size messages come from normal ECM actions rather than crashes. I haven't really found anything explaining the messages in the documentation, although I might not have done a thorough enough search.

Does GMP-ECM try to check for enough memory for stage 2 before it completes stage 1? Is there a time when GMP-ECM needs a large block in stage 1? The documentation seems to say, "No."

Any thoughts or suggestions? Or are these machines just on the edge of their capabilities?

Thanks for any assistance that can be provided...

Last fiddled with by EdH on 2016-11-07 at 17:34 Reason: just because...
2016-11-08, 07:34   #2
GP2

Sep 2003

3²·7·41 Posts

How many simultaneous threads are you running on the same machine? If it's a large enough number, maybe you're exceeding the total physical memory of the machine. I have seen GMP-ECM processes getting killed when that happens (usually in stage 2, of course). Not sure what does the killing; maybe the operating system?
2016-11-08, 14:51   #3
EdH

"Ed Hall"
Dec 2009

3,617 Posts

Quote:
 Originally Posted by GP2 How many simultaneous threads are you running on the same machine? If it's a large enough number, maybe you're exceeding the total physical memory of the machine.
I'm using either two or four threads (based on what Linux shows with "cat /proc/cpuinfo") and I'm getting this on many of them. I have at least one that I knocked down to a single thread and I still see it. I adjusted all the maxmems to well under what "top" shows free. Most of the machines have 4GB RAM.

I forgot to ask, in the allocation message, is the size given what is wanted or what is available?

Quote:
 Originally Posted by GP2 I have seen GMP-ECM processes getting killed when that happens (usually in stage 2, of course). Not sure what does the killing, maybe the operating system?
These messages appear just a few minutes into a run, when a full stage 1 will take several hours. That makes it "seem" like it has nothing to do with stage 2; but since stage 1 isn't supposed to use much memory, and I should have a fair chunk free, it's puzzling to me.

Thanks...

2016-11-09, 00:02   #4
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

2·5·467 Posts

My stage 1 on these curves (C191, B1=29e8) uses 400-450MB according to top. I don't know why your 4GB machines are puking on the curves. Puzzling. My machine does have tons of excess memory, so perhaps a small part of stage 1 uses more memory than we realize?
2016-11-09, 01:48   #5
Gordon

Nov 2008

765₈ Posts

Quote:
 Originally Posted by EdH
I'm running the following command on many threads/machines:

Code:
ecm -maxmem NNNN -save residuesNNN.txt 2900000000 2900000000 >ecmTestRun
Killed

ecmTestRun:

Code:
GMP-ECM 7.0.3 [configured with GMP 6.1.1, --enable-asm-redc] [ECM]
Tuned for x86_64/core2/params.h
Running on math42
Input number is 29460893303338144751360360097976743017149046981259832053501450854362438285630845458294329588136961317820466603439061784124252469966169391148524869909406500896547611862071404959591325864761463 (191 digits)
Using MODMULN [mulredc:1, sqrredc:1]

I haven't tried running ecm under gdb, since I'm still not familiar with it enough and am assuming the Killed/Aborted/size messages are from normal ECM actions, rather than crashes. I haven't really found anything explaining the messages in the documentation, although I might not have done a thorough enough search. Does GMP-ECM try to check for enough memory for stage 2 before it completes stage 1? Is there a time when GMP-ECM needs a large block in stage 1? The documentation seems to say, "No." Any thoughts, or suggestions? Or, are these machines just on the edge of their capabilities? Thanks for any assistance that can be provided...
I had to check your memory request numbers twice; they are only 500 meg. I've run into memory allocation errors trying to allocate 4GB before (on a 32GB machine).

Your "Test" is on a B1 of 2.9 billion, right? And since 191 digits (633 bits) is a really tiny number, have you tried GPU-ECM?

2016-11-09, 05:23   #6
WraithX

Mar 2006

2³×59 Posts

Quote:
 Originally Posted by EdH
I'm running the following command on many threads/machines:

Code:
ecm -maxmem NNNN -save residuesNNN.txt 2900000000 2900000000
Originally, for stage 1, GMP-ECM would multiply a point on an elliptic curve by all the primes, and prime powers, one at a time to get the next point on the curve. I.e., it would do p1^e1*Q1 to get Q2, then p2^e2*Q2 to get Q3, etc., running through all prime powers such that p_i^e_i < B1. This is pretty quick and requires very little memory overhead.
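As a toy illustration of the bookkeeping involved (my own Python sketch, not GMP-ECM's actual code), here is how that list of maximal prime powers below B1 can be generated; stage 1 then just multiplies the current point by each entry in turn:

```python
def prime_powers(B1):
    """Return the largest power p^e <= B1 for each prime p <= B1.

    Sketch only: GMP-ECM's real stage 1 multiplies an elliptic-curve
    point by each of these values in sequence.
    """
    sieve = [True] * (B1 + 1)
    powers = []
    for p in range(2, B1 + 1):
        if sieve[p]:
            # cross off composites (simple Eratosthenes sieve)
            for m in range(p * p, B1 + 1, p):
                sieve[m] = False
            # raise p to the largest exponent keeping p^e <= B1
            pe = p
            while pe * p <= B1:
                pe *= p
            powers.append(pe)
    return powers

print(prime_powers(20))  # -> [16, 9, 5, 7, 11, 13, 17, 19]
```

Each entry stays tiny (at most B1), which is why this method needs almost no working memory.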

Now, GMP-ECM has added a feature called "batch mode". This multiplies all prime powers, with p_i^e_i < B1, together before starting stage 1, and then uses that product to multiply with the starting point on the elliptic curve. I.e., it will compute (p1^e1 * p2^e2 * ... * pn^en) and save that into a variable s. Then it will finish stage 1 by multiplying s*Q1. This turns out to be much faster than the original method. However, one drawback is that you now have to generate all the primes at once and multiply them all together before starting your s*Q1 step.
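To put a rough number on that drawback: by the prime number theorem, log2(s) is about B1/ln(2) ≈ 1.44·B1 bits, so at B1 = 2.9e9 the product s alone is on the order of 4.2 billion bits (roughly 500 MB), before counting any temporaries needed while building it. A quick Python sanity check of that estimate (my own back-of-envelope sketch, not GMP-ECM code):

```python
import math

def batch_product_bits(B1):
    """Bit length of s = product of all maximal prime powers p^e <= B1.

    Computed as sum of log2(p^e), i.e. the Chebyshev psi function over log 2,
    so we never have to hold the giant product itself.
    """
    sieve = [True] * (B1 + 1)
    bits = 0.0
    for p in range(2, B1 + 1):
        if sieve[p]:
            for m in range(p * p, B1 + 1, p):
                sieve[m] = False
            pe = p
            while pe * p <= B1:
                pe *= p
            bits += math.log2(pe)
    return bits

# The estimate log2(s) ~ B1/ln(2) is already close at small B1:
for B1 in (10_000, 100_000):
    print(B1, round(batch_product_bits(B1)), round(B1 / math.log(2)))
# Extrapolating, B1 = 2.9e9 gives ~4.2e9 bits (~520 MB) for s alone.
```

That is in the same ballpark as the "Cannot allocate memory (size=268697616)" failures above, which is a ~256 MB chunk requested partway through building s.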

For your machines, your programs are crashing during the calculation of s. The error you are seeing is a GMP-specific error, not a GMP-ECM error. It is saying that at some point during the calculation of s, GMP needed to allocate more memory to store the next partial product of s, but it wasn't able to get that memory because too much had already been allocated during earlier parts of the calculation.

The easy way to get around this is to use the original multiplication method which can be accomplished with the command line argument "-param 0". All other "params" are related to batch mode and would thus require a lot of memory with large B1's. While "-param 0" will be a little bit slower, you will be able to use large B1's. Give this a try and let us know how it goes.
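In other words, keeping everything else from your command line the same (the NNNN/NNN placeholders are yours), the only change is the added flag:

```shell
# Same invocation as before, but forcing the classic low-memory stage 1:
ecm -param 0 -maxmem NNNN -save residuesNNN.txt 2900000000 2900000000 >ecmTestRun
```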

2016-11-09, 15:38   #7
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

3,617 Posts

Thanks WraithX! Your explanation is very helpful. It explains why, on some of the machines, one thread appears to have a high success rate while another has a high failure rate: the first one got its s completed and left no room for the other. I had suspected this, but thought that maxmem would prevent it. I have switched three high-failure-rate machines over and should know within an hour or so if this is successful. Here are some earlier runs from one of my machines:

Code:
Current pass started at 16:51:33
ecm -maxmem 1000 -save residues43b.txt 2900000000 2900000000
ECM took 31453 seconds
ECM took 8h 44m 13s
Current pass started at 01:35:47
ecm -maxmem 1000 -save residues43b.txt 2900000000 2900000000
GNU MP: Cannot allocate memory (size=268697616)
ECM took 1632 seconds
ECM took 0h 27m 12s
Current pass started at 02:02:59
ecm -maxmem 1000 -save residues43b.txt 2900000000 2900000000
GNU MP: Cannot allocate memory (size=268697616)
ECM took 1293 seconds
ECM took 0h 21m 33s
Current pass started at 02:24:32
ecm -maxmem 1000 -save residues43b.txt 2900000000 2900000000
GNU MP: Cannot allocate memory (size=268697616)
ECM took 1277 seconds
ECM took 0h 21m 17s
Current pass started at 02:45:49

One success followed by three failures! Even if somewhat slower, they should turn out more residues. The above shows over an hour of wasted time against a successful 8.75 hours. Am I safe to assume I can remove maxmem for stage 1 runs?

@Gordon: All three of my nVidia cards are ancient, unfortunately; they have 1.2 and 1.3 architecture. I have CUDA 6 on one of my machines, but GMP-ECM considers it too old. I haven't totally given up, but that's on a side table for now.

Thanks again, everyone...
2016-11-09, 16:38   #8
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

3,617 Posts

Short-term Update

The three machines that have been swapped over to -param 0 are all past an hour, with two threads each, and no failures. As a bonus, they are all much more responsive to headless operations...

Thanks!!
2016-11-10, 04:27   #9
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

E21₁₆ Posts

Longer-term Update

Definitely an increase in processing time, which may actually be greater than the loss from "collisions." The machine that took 8h 44m 13s in a previous message just came back with a new time of 10h 11m 26s. However, its response is much more prompt when communicating with it, which is a bonus.
2016-11-10, 06:15   #10
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

2×5×467 Posts

More importantly, we now know it *is* related to the B1 bound, so in the future you can experiment with -param 0 when memory demands get too big. Sounds like this case is right on the cusp, where perhaps one thread with this flag and the rest without might stay within the machine's available memory?

I have two cores running stage 2 for your curves, about 10 days to go on each. The 100 curves I ran myself are also in stage 2 now; reduce your remaining curve count accordingly.
2016-11-10, 15:16   #11
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

3,617 Posts

@VBCurtis: It seems more like I could let one thread run free and use -param 0 on the rest. This is actually quite easy to accomplish across my machines with the scripts I'm using. I shall probably give it a try. I will probably have to wait until later today, but I'll gather the curves I have and look at what we were figuring earlier. It seems like we were expecting to do 1000 above what we already had as of Sunday. I will try to turn my capable machines over to stage 2 on a few curves and pass the rest to you. That way I'm still working on some at the same time. Sound like a plan?

@WraithX: If I'm following correctly, s will be the same value for all iterations with the same B1. If I'm not lost yet, in my case this would mean all my threads would be using the same s. With slight modification, could I not calculate s outside of the threads and reference it from each thread? I should be able to lock a memory space with my original calling script and then pass the location to all threads (unless all indexing is now relative and unique between processes). This should be persistent until the initial script closes. I would need to free the location prior to that close, and that would have to be after all child scripts have completed. Is that something that sounds like I'm on a possible path, or have I stumbled too far over a ledge?

Thanks everyone...
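For reference, the sort of workflow I'm picturing, sketched with the -bsaves/-bloads options for saving/loading the batch product s (check "ecm --help" on your build; if your version lacks these options, this won't apply, and the filename here is just my own placeholder):

```shell
# One-time: build the batch product s for B1 = 2.9e9 and save it
# (B2 = 1 so this run does stage 1 only):
echo "$N" | ecm -bsaves batch_29e8 -save residues1.txt 2900000000 1

# Each later worker loads the saved s instead of recomputing it:
echo "$N" | ecm -bloads batch_29e8 -save residuesNNN.txt 2900000000 2900000000
```

Note that each process would still hold its own copy of s in RAM while using it, so this would save the recomputation time but not the peak per-process memory.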

