mersenneforum.org  

Old 2016-11-07, 17:31   #1
EdH
 
"Ed Hall"
Dec 2009
Adirondack Mtns

2×7×263 Posts
GMP-ECM Messages Killed/Aborted/cannot allocate memory

I'm getting some messages on some of my "aged" machines and would like to know some more details, if possible.

I'm running the following command on many threads/machines:
Code:
ecm -maxmem NNNN -save residuesNNN.txt 2900000000 2900000000 <ecmIn
This is to catch the stage 1 residues to provide to someone with a better computer for handling stage 2 operations.

I know the -maxmem option is really only for stage 2, but some of the troubles appear to be a memory issue, so I'm trying it in the command.

My trouble is that I keep getting "Killed," "Aborted" and:
Code:
GNU MP: Cannot allocate memory (size=537395216)
The above is from a machine with -maxmem set to 1000. These errors are mixed in with successful runs, but on some of my machines the ratio of errors to successes is pretty high.
I often get this specific size, as well:
Code:
GNU MP: Cannot allocate memory (size=134348816)
Test runs with verbose don't seem to get me much more information:
Code:
ecm -v -v -v -maxmem 1000 -save residueTest.txt 2900000000 2900000000 <ecmIn >>ecmTestRun
Killed
ecmTestRun:
Code:
GMP-ECM 7.0.3 [configured with GMP 6.1.1, --enable-asm-redc] [ECM]
Tuned for x86_64/core2/params.h
Running on math42
Input number is 2946089330333814475136036009797674301714904698125983205350145085
43624382856308454582943295881369613178204666034390617841242524699661693911485248
69909406500896547611862071404959591325864761463 (191 digits)
Using MODMULN [mulredc:1, sqrredc:1]
I haven't tried running ecm under gdb, since I'm not yet familiar enough with it, and I'm assuming the Killed/Aborted/size messages come from normal ECM actions rather than crashes.

I haven't really found anything explaining the messages in the documentation, although I might not have done a thorough enough search.

Does GMP-ECM try to check for enough memory for stage 2 before it completes stage 1? Is there a time when GMP-ECM needs a large block in stage 1? The documentation seems to say, "No."

Any thoughts, or suggestions? Or, are these machines just on the edge of their capabilities?

Thanks for any assistance that can be provided...

Last fiddled with by EdH on 2016-11-07 at 17:34 Reason: just because...
Old 2016-11-08, 07:34   #2
GP2
 
Sep 2003

13×199 Posts

How many simultaneous threads are you running on the same machine? If it's a large enough number, maybe you're exceeding the total physical memory of the machine.

I have seen GMP-ECM processes getting killed when that happens (usually in stage 2, of course). Not sure what does the killing, maybe the operating system?
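One way to narrow this down is to look at the exit status the shell reports after a run. The sketch below is a generic helper, not anything that ships with GMP-ECM: `classify_exit` is a made-up name, and it relies only on the usual shell convention that a status above 128 means the process was ended by signal (status - 128). As far as I know, GMP's default allocator prints the "Cannot allocate memory" line and then calls abort(), which would account for "Aborted", while a bare "Killed" usually points at the Linux out-of-memory killer.

```shell
# classify_exit STATUS: map a shell exit status to the likely reason a
# process ended. Hypothetical helper; statuses above 128 conventionally
# mean "terminated by signal (STATUS - 128)".
classify_exit() {
    status=$1
    case "$status" in
        0)   echo "success" ;;
        134) echo "SIGABRT (Aborted): the program called abort()" ;;
        137) echo "SIGKILL (Killed): often the kernel OOM killer" ;;
        *)   if [ "$status" -gt 128 ]; then
                 echo "signal $((status - 128))"
             else
                 echo "exit code $status"
             fi ;;
    esac
}

# Typical use, right after a run:
#   ecm -save residues.txt 2900000000 2900000000 <ecmIn; classify_exit $?
# To check whether the kernel did the killing (may need root):
#   dmesg | grep -i 'out of memory'
classify_exit 137
```

If the kernel log shows an out-of-memory entry naming ecm at the time of the failure, the machine as a whole ran out of memory, rather than ecm hitting a limit of its own.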
Old 2016-11-08, 14:51   #3
EdH
 
"Ed Hall"
Dec 2009
Adirondack Mtns

E62₁₆ Posts

Quote:
Originally Posted by GP2
How many simultaneous threads are you running on the same machine? If it's a large enough number, maybe you're exceeding the total physical memory of the machine.
I'm using either two or four threads (based on what Linux shows with "cat /proc/cpuinfo") and getting this on many of them. I have at least one machine that I knocked down to a single thread and I still see it. I adjusted all the maxmems to well below what "top" shows free. Most of the machines have 4GB RAM.

I forgot to ask, in the allocation message, is the size given what is wanted or what is available?

Quote:
Originally Posted by GP2
I have seen GMP-ECM processes getting killed when that happens (usually in stage 2, of course). Not sure what does the killing, maybe the operating system?
These messages are at just a few minutes into a run, when a full stage 1 will take several hours. This makes it "seem" that it hasn't anything to do with stage 2, but since stage 1 isn't supposed to use much memory and I should have a fair chunk, it's puzzling to me.

Thanks...
Old 2016-11-09, 00:02   #4
VBCurtis
 
"Curtis"
Feb 2005
Riverside, CA

2²×7×13² Posts

My stage 1 runs on these curves (C191, B1 = 29e8) are using 400-450 MB according to top. I don't know why your 4GB machines are puking on the curves. Puzzling.
My machine does have tons of excess memory, so perhaps a small part of stage 1 uses more memory than we realize?
Old 2016-11-09, 01:48   #5
Gordon
 
Nov 2008

765₈ Posts

Quote:
Originally Posted by EdH
My trouble is that I keep getting "Killed," "Aborted" and:
Code:
GNU MP: Cannot allocate memory (size=537395216)
The above is from a machine with -maxmem set to 1000. These errors are mixed in with successful runs, but on some of my machines the ratio of errors to successes is pretty high.
I often get this specific size, as well:
Code:
GNU MP: Cannot allocate memory (size=134348816)
Test runs with verbose don't seem to get me much more information:
Code:
ecm -v -v -v -maxmem 1000 -save residueTest.txt 2900000000 2900000000 <ecmIn >>ecmTestRun
Killed
I had to check your memory request numbers twice: they are only about 500 MB. I've run into memory allocation errors before when trying to allocate 4 GB (on a 32 GB machine).

Your "Test" is on a B1 of 2.9 billion, right? And 191 digits (633 bits) is a really tiny number. Have you tried GPU-ECM?
Old 2016-11-09, 05:23   #6
WraithX
 
Mar 2006

473₁₀ Posts

Quote:
Originally Posted by EdH
I'm running the following command on many threads/machines:
Code:
ecm -maxmem NNNN -save residuesNNN.txt 2900000000 2900000000 <ecmIn
My trouble is that I keep getting "Killed," "Aborted" and:
Code:
GNU MP: Cannot allocate memory (size=537395216)
Does GMP-ECM try to check for enough memory for stage 2 before it completes stage 1? Is there a time when GMP-ECM needs a large block in stage 1? The documentation seems to say, "No."

Any thoughts, or suggestions? Or, are these machines just on the edge of their capabilities?

Thanks for any assistance that can be provided...
Originally, for stage 1, GMP-ECM would multiply a point on an elliptic curve by the primes and prime powers one at a time to get the next point on the curve. That is, it would compute p1^e1*Q1 to get Q2, then p2^e2*Q2 to get Q3, and so on, running through all prime powers such that p_i^e_i < B1. This is pretty quick and requires very little memory overhead.

GMP-ECM has since added a feature called "batch mode". This multiplies all prime powers with p_i^e_i < B1 together before starting stage 1, and then multiplies the starting point on the elliptic curve by that product. That is, it computes (p1^e1 * p2^e2 * ... * pn^en) and saves the result in a variable s, then finishes stage 1 by computing s*Q1. This turns out to be much faster than the original method. However, one drawback is that you now have to generate all the primes at once and multiply them all together before starting the s*Q1 step.
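To make the difference between the two approaches concrete, here is a toy sketch. It is not GMP-ECM's code: it swaps elliptic-curve point multiplication for plain modular exponentiation so that it fits in shell arithmetic, and the modulus N, the base and B1 = 20 are arbitrary illustrative values. The function names are made up for this example.

```shell
# Toy analogy only: modular exponentiation stands in for elliptic-curve
# scalar multiplication. N, BASE and B1 = 20 are arbitrary small values.
N=1000003
BASE=2
# Largest power of each prime that does not exceed B1 = 20.
PRIME_POWERS="16 9 5 7 11 13 17 19"

# modpow BASE EXP MOD: square-and-multiply, prints BASE^EXP mod MOD.
modpow() {
    b=$(( $1 % $3 )); e=$2; m=$3; r=1
    while [ "$e" -gt 0 ]; do
        if [ $(( e % 2 )) -eq 1 ]; then r=$(( r * b % m )); fi
        b=$(( b * b % m ))
        e=$(( e / 2 ))
    done
    echo "$r"
}

# One prime power at a time: only small numbers are ever held in memory.
sequential() {
    q=$BASE
    for pp in $PRIME_POWERS; do
        q=$(modpow "$q" "$pp" "$N")
    done
    echo "$q"
}

# Batch style: build the full product s first, then do one big step.
batch() {
    s=1
    for pp in $PRIME_POWERS; do
        s=$(( s * pp ))
    done
    modpow "$BASE" "$s" "$N"
}

echo "sequential: $(sequential)"
echo "batch:      $(batch)"
```

Both functions print the same residue. The difference is that the batch version has to hold the whole product s, which is a nine-digit number for B1 = 20 but grows roughly in proportion to B1.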

On your machines, the programs are crashing during the calculation of s. The error you are seeing is a GMP-specific error, not a GMP-ECM error. It says that at some point during the calculation of s, GMP needed to allocate more memory to store the next partial product, but could not get it because too much had already been allocated during earlier parts of the calculation.
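A back-of-the-envelope estimate is consistent with this. By the prime number theorem, the product of all prime powers up to B1 has roughly B1/ln(2) bits. The sketch below does that arithmetic with awk. The formula is an asymptotic approximation, not something taken from the GMP-ECM source, `estimate_s_bytes` is a made-up name, and the figure ignores any temporaries needed while the product is being built.

```shell
# estimate_s_bytes B1: rough size in bytes of s = product of all prime
# powers <= B1. Assumption: ln(s) ~ B1, so s has about B1/ln(2) bits.
estimate_s_bytes() {
    awk -v b1="$1" 'BEGIN { printf "%.0f\n", b1 / log(2) / 8 }'
}

bytes=$(estimate_s_bytes 2900000000)
echo "estimated size of s: $bytes bytes (~$(( bytes / 1048576 )) MiB)"
```

For B1 = 2900000000 this comes out at roughly 500 MiB for s alone, the same order as the failed request of 537395216 bytes (about 512 MiB). The smaller failing sizes reported in this thread are close to 128 and 256 MiB, which may correspond to partial products being combined, though that is a guess.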

The easy way to get around this is to use the original multiplication method, which you can select with the command line argument "-param 0". All other "params" are related to batch mode and would thus require a lot of memory with large B1 values. While "-param 0" will be a little bit slower, you will be able to use large B1 values. Give this a try and let us know how it goes.
Old 2016-11-09, 15:38   #7
EdH
 
"Ed Hall"
Dec 2009
Adirondack Mtns

2·7·263 Posts

Thanks WraithX! Your explanation is very helpful. It explains why, on some of the machines, it appears one thread has a high success rate while another has a high failure rate. The first one got its s completed and left no room for the other. I had suspected this, but thought that the maxmem would prevent that.

I have switched three high failure rate machines over and should know within an hour or so if this is successful. Here're some earlier runs from one of my machines:
Code:
Current pass started at 16:51:33
ecm -maxmem 1000 -save residues43b.txt 2900000000 2900000000
ECM took 31453 seconds
ECM took 8h 44m 13s
Current pass started at 01:35:47
ecm -maxmem 1000 -save residues43b.txt 2900000000 2900000000
GNU MP: Cannot allocate memory (size=268697616)
ECM took 1632 seconds
ECM took 0h 27m 12s
Current pass started at 02:02:59
ecm -maxmem 1000 -save residues43b.txt 2900000000 2900000000
GNU MP: Cannot allocate memory (size=268697616)
ECM took 1293 seconds
ECM took 0h 21m 33s
Current pass started at 02:24:32
ecm -maxmem 1000 -save residues43b.txt 2900000000 2900000000
GNU MP: Cannot allocate memory (size=268697616)
ECM took 1277 seconds
ECM took 0h 21m 17s
Current pass started at 02:45:49
One success followed by three failures! Even if somewhat slower, they should turn out more residues. The above shows over an hour of wasted time against a successful 8.75 hours.
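The bookkeeping above can be automated. This is a small sketch that assumes a log in exactly the format shown (a "Current pass started" line, the command, an optional GNU MP error, then "ECM took N seconds"). `tally_runs` and the log file name in the usage comment are hypothetical, and a run that is merely "Killed" without the GNU MP message would be counted as successful unless the wrapper logs that too.

```shell
# tally_runs LOGFILE: split the "ECM took N seconds" totals into successful
# and failed passes. A pass counts as failed if a GMP allocation error was
# logged between its "Current pass started" line and its timing line.
tally_runs() {
    awk '
        /^Current pass started/    { failed = 0 }
        /Cannot allocate memory/   { failed = 1 }
        /^ECM took [0-9]+ seconds/ {
            if (failed) { bad += $3; nbad++ } else { good += $3; ngood++ }
        }
        END {
            printf "ok: %d runs, %d s; failed: %d runs, %d s\n", ngood, good, nbad, bad
        }
    ' "$1"
}

# Usage (file name is an example):
#   tally_runs ecm_passes.log
```

On the log excerpt above it reports one successful run of 31453 s and three failed runs totalling 4202 s, or about 70 minutes.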

Am I safe to assume I can remove the maxmem for stage 1 runs?

@Gordon:

All three of my nVidia cards are ancient, unfortunately. They have 1.2 and 1.3 architecture. I have CUDA 6 on one of my machines, but GMP-ECM considers it too old. I haven't totally given up, but that's on a side table for now.

Thanks again, everyone...
Old 2016-11-09, 16:38   #8
EdH
 
"Ed Hall"
Dec 2009
Adirondack Mtns

2×7×263 Posts
Short-term Update

The three machines that have been swapped over to -param 0 are all past an hour, with two threads each, and no failures. As a bonus, they are all much more responsive to headless operations...

Thanks!!
Old 2016-11-10, 04:27   #9
EdH
 
"Ed Hall"
Dec 2009
Adirondack Mtns

E62₁₆ Posts
Longer-term Update

Definitely an increase in processing time that may actually be greater than the loss from "collisions."

The machine that took 8h 44m 13s in a previous message just came back with a new time of 10h 11m 26s.

However, its response is much more prompt when communicating with it, which is a bonus.
Old 2016-11-10, 06:15   #10
VBCurtis
 
"Curtis"
Feb 2005
Riverside, CA

2²×7×13² Posts

More importantly, we now know it *is* related to the B1 bound, so in the future you can experiment with -param 0 when memory demands get too big. Sounds like this case is right on the cusp, where perhaps one thread with this flag and the rest without might stay within the machine's available memory?

I have two cores running stage 2 for your curves, about 10 days to go on each. The 100 curves I ran myself are also in stage 2 now; reduce your remaining curve count accordingly.
Old 2016-11-10, 15:16   #11
EdH
 
"Ed Hall"
Dec 2009
Adirondack Mtns

2·7·263 Posts

@VBCurtis: It seems more like I could let one thread run free and use the -param 0 on the rest. This is actually quite easy to accomplish across my machines with the scripts I'm using. I shall probably give it a try.
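A minimal sketch of that mixed setup might look like the following. It only prints the command lines rather than launching anything. `print_ecm_cmds` is a made-up helper, the file names follow the pattern used earlier in this thread, and the only ecm options used are the -param and -save ones already discussed here.

```shell
# print_ecm_cmds THREADS B1: print one ecm command line per worker.
# Worker 1 keeps the default parametrization (batch mode, builds s);
# the others add "-param 0" so they never build the large product.
print_ecm_cmds() {
    threads=$1; b1=$2; i=1
    while [ "$i" -le "$threads" ]; do
        if [ "$i" -eq 1 ]; then param=""; else param="-param 0 "; fi
        # B2 = B1, as in the original command, to collect stage 1 residues.
        echo "ecm ${param}-save residues${i}.txt $b1 $b1 <ecmIn"
        i=$(( i + 1 ))
    done
}

print_ecm_cmds 4 2900000000
```

Each printed line can then be handed to whatever per-thread script is already in use. Whether one batch-mode worker plus three "-param 0" workers fits in 4GB would still need checking on the machine itself.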

I will probably have to wait for later today, but I'll gather the curves I have and look at what we were figuring earlier. It seems like we were expecting to do 1000 above what we already had as of Sunday. I will try to turn my capable machines over to stage 2 on a few curves and pass the rest to you. That way I'm still working on some at the same time. Sound like a plan?

@WraithX: If I'm following correctly, s will be the same value for all iterations with the same B1. If I'm not lost yet, in my case, this would mean all my threads would be using the same s. With slight modification, could I not calculate s outside of the threads and reference it to each thread? I should be able to lock a memory space with my original calling script and then pass the location to all threads (unless all indexing is now relative and unique between processes). This should be persistent until the initial script closes. I would need to free the location prior to that close and that would have to be after all child scripts have completed. Is that something that sounds like I'm on a possible path, or have I stumbled too far over a ledge?

Thanks everyone...

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.