2010-10-01, 02:47   #353
Mini-Geek ("Tim Sorbera", Aug 2006, San Antonio, TX USA)

Quote:
Originally Posted by ixfd64
This is probably a silly question,
Not really.
Quote:
Originally Posted by ixfd64
does the "P-1/ECM stage 2 memory" refer to the allocation for each core or each processor? For example, I've heard that at least 256 MB of memory should be allowed for P-1 assignments. If I had a dual-core processor, should I enter 256 or 512 MB?
The amount you want should be available to each worker for its P-1 stage 2. (256 MB sounds like a good amount for a computer you also use; in readme.txt that's about the desirable amount for exponents up to roughly 50M. For a dedicated machine, or if you have plenty of free memory, set it higher.)
The memory limit you set in Options > CPU is for the whole processor (more accurately: the whole Prime95 instance); the workers share it as needed. If they'll be doing P-1 stage 2 at the same time, you should enter 512 MB (it will see that 512 MB is available, that two workers need to run P-1, and split it into 256 MB per worker). If they'll be doing it at separate times, 256 MB is fine (it will see that 256 MB is available and only one worker needs it at a time, so each gets the full 256 MB in its turn).
Or you can try to set the memory limit for each worker individually. This would probably be best, since you wouldn't have to worry about when they run stage 2: each would just take 256 MB when needed, whether that means 0, 256, or 512 MB in use at that moment. But this might not be working yet (see http://www.mersenneforum.org/showthr...097#post232097, the link in it, and the post after it, for more info).
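For what it's worth, here's roughly how those settings look in local.txt (a sketch from my reading of the docs, using the example values above; the per-worker Memory lines under the [Worker #n] sections are exactly the feature that might not be working yet):
Code:
Memory=512

[Worker #1]
Memory=256

[Worker #2]
Memory=256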

Last fiddled with by Mini-Geek on 2010-10-01 at 02:57
2010-10-01, 02:54   #354
Kevin (Aug 2002, Ann Arbor, MI)

Actually, I'd suggest setting it a little higher than that. Since the memory limit you input is for the entire Prime95 instance, even if only one core is doing P-1 stage 2, if the other core is doing LL work then Prime95 will account for the 20-30 MB that test requires and assign P-1 stage 2 only ~230 MB.
2010-10-01, 03:22   #355
Mini-Geek ("Tim Sorbera", Aug 2006, San Antonio, TX USA)

Quote:
Originally Posted by Kevin
Actually, I'd suggest setting it a little higher than that. Since the memory limit you input is for the entire Prime95 instance, even if only one core is doing P-1 stage 2, if the other core is doing LL work then Prime95 will account for the 20-30 MB that test requires and assign P-1 stage 2 only ~230 MB.
I don't think it does that. I just tested it with the memory set to 100 MB: whether another worker is running an LL or not, the P-1 worker assumes it has 100 MB. But then I tested again with two P-1s, and each assumed it had 100 MB to work with, so... I don't know; I guess it splits the memory when the stage 2s actually start. It must be less efficient to let it choose bounds for 100 MB and then force it into 50 MB, but it doesn't look like the worker-specific settings are working yet.
I'd say the best bet would be to get the two workers to do stage 2 at separate times and set the whole instance to 256 MB.
This might be of note:
Code:
You can set MaxHighMemWorkers=n in local.txt.  This tells the program how
many workers are allowed to use lots of memory.  This occurs during stage 2
of P-1 or ECM on medium-to-large numbers.  Default is available memory / 200MB.
(emphasis mine)
So, if I'm interpreting this right, it will already automatically ensure that only one worker does high-memory work at a time (as long as 256/200 is rounded down). Hopefully it handles this gracefully (instead of just having one worker skip stage 2), but in any case, it may play into this.
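If you'd rather not rely on that default, something like this in local.txt should pin it down explicitly (just a sketch; the MaxHighMemWorkers line spells out what the floor(256/200)=1 default would give you anyway):
Code:
Memory=256
MaxHighMemWorkers=1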

Sorry for putting so much detail and confusion into such a simple question. But it's surprisingly confusing when you get down to the nitty-gritty (not to mention bugs, like the worker-specific memory limits). Luckily, there isn't much riding on getting your settings perfectly right and efficient: just a tiny difference in the chance of finding a factor for the numbers you P-1 (small even if you're only doing P-1, and nearly insignificant if you're just doing it as part of LL tests).

Last fiddled with by Mini-Geek on 2010-10-01 at 03:26
2010-10-01, 03:44   #356
Kevin (Aug 2002, Ann Arbor, MI)

Well, right now I have 2 workers out of 4 running P-1 stage 2 on an i5 with a global memory setting of 2056 MB, and each one is only using 988 MB of memory. Generally, I think the program will say "x MB" of memory is available at the start of stage 2, but then only actually use something like (x-20) MB (or maybe 0.95x MB, who knows).

Also, I wanted to suggest something a little higher because I thought the actual minimum for P-1 assignments was 300 MB, not 256 MB (and now that I've looked it up, that is indeed the case).
2010-10-01, 12:35   #357
Mini-Geek ("Tim Sorbera", Aug 2006, San Antonio, TX USA)

Quote:
Originally Posted by Kevin
Well, right now I have 2 workers out of 4 running P-1 stage 2 on an i5 with a global memory setting of 2056 MB, and each one is only using 988 MB of memory. Generally, I think the program will say "x MB" of memory is available at the start of stage 2, but then only actually use something like (x-20) MB (or maybe 0.95x MB, who knows).
I've noticed that the "x MB used" figure is always a little less than what you say it can use, even when that P-1 is the only thing running. I think this is because memory usage goes up in discrete jumps with the number of relative primes being processed: Prime95 chooses a number of relative primes that brings memory usage close to the allowed amount (I think as close as possible without going over, but I don't have a definite enough data point to rule out "just the closest"), whether that allowed amount is [set number], [set number]/2 to split between two workers, [set number]-20 to make room for LL, or whatever.
e.g. I recently did a P-1 on M53250707. It was the only worker. When I set the memory to:
500 MB: it does 16 relative primes at a time, and reports 491 MB used
800 MB: it does 29 relative primes at a time, and reports 788 MB used
1000 MB: it does 38 relative primes at a time, and reports 993 MB used
(this was all after B1 and B2 had been chosen; the FFT length for P-1 was 2880K, B1=625000, B2=16250000)
I suspected that the memory needed would relate linearly to the number of relative primes processed at a time, and from these data points I seem to be right. I calculated [memory usage] ~= 22.85*[relative primes at a time] + 125.4 (expect these constants to vary greatly with p, B1, and B2). It appears that Prime95 always chose the number of relative primes that puts the predicted memory usage as close to the allowed amount as possible without going over.
This can explain why your P-1 workers are only using 988 MB of memory, even without taking memory out for LL: if each considers itself to have 1028 MB available, and the next higher relative-prime count would take more than (1028-988=) 40 MB extra, then it would limit itself to that.
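Just to sanity-check the arithmetic, here's a quick fit of those three data points (plain Python I wrote for this post, nothing from Prime95's source; the function name is mine). It lands near the 22.85 and 125.4 figures and reproduces the 16/29/38 choices:
Code:
# Fit (relative primes, reported MB) pairs from the M53250707 runs above.
data = [(16, 491), (29, 788), (38, 993)]

n = len(data)
sx = sum(x for x, _ in data)
sy = sum(y for _, y in data)
sxy = sum(x * y for x, y in data)
sxx = sum(x * x for x, _ in data)

# Least-squares line: mem ~= slope * nrp + intercept.
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # ~22.8 MB per relative prime
intercept = (sy - slope * sx) / n                  # ~126 MB of fixed overhead

def max_rel_primes(allowed_mb):
    # Largest count whose predicted usage stays at or under the allowed memory.
    return int((allowed_mb - intercept) // slope)

print("mem ~= %.2f * nrp + %.1f" % (slope, intercept))
for mb in (500, 800, 1000):
    nrp = max_rel_primes(mb)
    print("%4d MB -> %2d rel. primes (~%.0f MB predicted)" % (mb, nrp, slope * nrp + intercept))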

Last fiddled with by Mini-Geek on 2010-10-01 at 12:45
2010-10-01, 13:28   #358
Mini-Geek ("Tim Sorbera", Aug 2006, San Antonio, TX USA)

Two more results from more testing I've done:
It picks the closest without going over, not the closest overall. E.g. if 15 rel. primes would take 291 MB and the worker is allowed 290 MB, it will drop to 14 rel. primes (277 MB) rather than go over.
You're right, it does reserve some memory for LL workers. The amount changes with the FFT size, of course, but it's roughly (FFT size)*12.5 bytes (e.g. 1792K*12.5 bytes = 22400 KB ~= 22 MB) for each LL; there's a quick calculation below. This amount is subtracted from the "Available memory is x MB" message on the P-1 worker.
Between these two effects, I think I get how your memory is distributed, and how all that works.
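Here's that reservation estimate worked through (my own back-of-the-envelope code; the ~12.5 bytes per FFT element is just the constant I derived above, so treat it as approximate):
Code:
# Approximate memory reserved for one LL worker, at ~12.5 bytes per FFT element.
def ll_reserve_mb(fft_len_k):
    return fft_len_k * 1024 * 12.5 / (1024 * 1024)

for fft_k in (1792, 2048, 2880):
    print("%dK FFT -> ~%.0f MB reserved per LL worker" % (fft_k, ll_reserve_mb(fft_k)))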

Last fiddled with by Mini-Geek on 2010-10-01 at 13:43
2010-10-01, 14:25   #359
Rhyled (May 2010)

Quote:
Originally Posted by Mini-Geek
Two more results from more testing I've done:
It picks the closest without going over, not the closest overall. E.g. if 15 rel. primes would take 291 MB and the worker is allowed 290 MB, it will drop to 14 rel. primes (277 MB) rather than go over.
I concur with the "not going over" scenario; that's what I found while doing some similar benchmarking a couple of months ago.

To add another tidbit: additional memory, at least beyond a certain point, does not speed up the P-1 task noticeably. The total number of iterations didn't change, and neither did the time per iteration (beyond the 1% noise level). What did change with larger memory allocations is that the estimated chance of finding a factor went up very slightly: 2000 MB gave me a 6.88% chance, 3000 MB gave 6.91%.

Just don't set the memory so high that it starts disk swapping. Swapping will slow your machine to a relative crawl.
2010-10-01, 15:17   #360
garo (Aug 2002, Termonfeckin, IE)

Note that the 300 MB minimum for P-1 is not the minimum needed for P-1 to be performed effectively; it is the minimum memory you need before the server will assign you P-1 tests. You could manually ask for P-1 tests and do perfectly OK with 256 MB. Here is what George says in readme.txt:

Quote:
4) Factor in the information below about minimum, reasonable, and
desirable memory amounts for some sample exponents. If you choose a
value below the minimum, that is OK. The program will simply skip
stage 2 of P-1 factoring.

Exponent   Minimum   Reasonable   Desirable
--------   -------   ----------   ---------
20000000      40MB         80MB       120MB
33000000      65MB        125MB       185MB
50000000      85MB        170MB       250MB
2010-10-01, 21:07   #361
davieddy ("Lucan", Dec 2006, England)

Quote:
Originally Posted by garo
Note that the 300 MB minimum for P-1 is not the minimum needed for P-1 to be performed effectively; it is the minimum memory you need before the server will assign you P-1 tests. You could manually ask for P-1 tests and do perfectly OK with 256 MB. Here is what George says in readme.txt:
I'm a bit out of touch with bytes of RAM since my ZX81 days
(1K with 16K pack extra), but aren't we talking peanuts here?

David

Yes I do know what cache means.

Last fiddled with by davieddy on 2010-10-01 at 21:33
2010-10-01, 22:47   #362
Kevin (Aug 2002, Ann Arbor, MI)

Quote:
Originally Posted by davieddy
I'm a bit out of touch with bytes of RAM since my ZX81 days
(1K with 16K pack extra), but aren't we talking peanuts here?

David

Yes I do know what cache means.
Peanuts in terms of impact on a system, or peanuts in terms of the likelihood of finding a factor? The extra 50 MB of memory will only marginally increase the chances of finding a factor, but there's really no reason not to allow it if the impact on your system is negligible. I presume that if the person was debating between 256 MB and 512 MB, then 512 MB was an option, and they could afford to move up to 300 MB (which is closer to where the marginal benefit of additional memory becomes negligible).
2010-11-03, 11:46   #363
lorgix (Sep 2010, Scandinavia)

Quote:
Originally Posted by Mini-Geek
I don't think it does that. I just tested it with the memory set to 100 MB: whether another worker is running an LL or not, the P-1 worker assumes it has 100 MB. [...] it doesn't look like the worker-specific settings are working yet.
Quote:
Originally Posted by Mini-Geek
I've noticed that the "x MB used" figure is always a little less than what you say it can use, even when that P-1 is the only thing running. [...] It appears that Prime95 always chose the number of relative primes that puts the predicted memory usage as close to the allowed amount as possible without going over.
I wondered about all this myself a while back.

All types of work require memory. LL only takes a few MB.
The actual amount used for P-1 is displayed only when stage 2 starts.

I deduced linear functions pretty much the same way you did.

I presently have 845 MB allowed 24/7, and I'm running two workers. I also use MaxHighMemWorkers=1, which works great.

All workers restart with the new memory setting when high-mem work finishes: a worker that has moved on in its work queue (because another worker was running stage 2) will stop whatever non-stage-2 work it's doing and go back to the high-mem work in its own queue when the other worker finishes. That way you don't "waste" memory by leaving it unallocated. In local.txt terms, my setup is essentially the sketch below.
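(A sketch of the relevant lines; readme.txt also documents a day/night form like "Memory=512 during 7:30-23:30 else 845" if you don't want the full amount allowed around the clock - I just allow it 24/7.)
Code:
Memory=845
MaxHighMemWorkers=1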

Last fiddled with by lorgix on 2010-11-03 at 11:47 Reason: typo