#1
"Cas Wegkamp"
Sep 2013
The Netherlands
22 Posts
Currently I am running 4 jobs on my machine, an i7 3770 quad core with HT. Obviously, progress is slow with numbers in the 65999XXX range. The iterations are more than four times slower than when I ran a single job; to be precise, 6,375 times (that's a decimal comma, so about 6.4 times; a strangely nice number popped out :S).
So I am assuming it would be better to assign all cores to a single job, but how do I stop 3 out of 4 workers from automatically getting more work? Also, where is the help file for this program? It seems it wasn't included in the 64-bit version.

Steps I have taken so far; I've changed:
- minutes between network retries: 300
- days of work to queue up: 0
- days between sending new end dates: 7

Especially the second one seems like what I need: queue up 0 days' worth of work, since any job will take longer than a day. But is there a way to tell the program "When this job is done, don't do anything until I tell you to"?
#2
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
19·23² Posts
Quote:
However, the harder problem is why it appears that you can do more with one 4-core job than with four 1-core jobs. Normally this is not the case, and if it is the case in your setup, then your settings are off. Someone else will tell you how to set the so-called affinity scrambling (you need someone with a system similar to yours).
#3
"Jeff"
Feb 2012
St. Louis, Missouri, USA
13·89 Posts
Someone smart will undoubtedly respond with more helpful advice, but I would like some clarification. Running four LLs should be slightly faster (let's guess about 12% faster) than running one LL on all four cores, four times over.

There's a memory hang-up which makes it so. If you are seeing something different, then we need to figure out why your machine is not optimized for running the four cores. Heat would seem to be the most likely candidate for such a slowdown. What kind of iteration times are you seeing? I'll compare them with my i5-2570k when I get to work. I would guess you're seeing around 35 ms? I think I generally see 39-40ish.

(edit) See, one of the smart guys already beat me to the punch!

Last fiddled with by chappy on 2013-10-09 at 01:10 Reason: Sergey's typing skillz exceed my own.
#4
"Kieren"
Jul 2011
In My Own Galaxy!
27AE₁₆ Posts
There are others who can answer some of your questions better than I can. I will start from the end and work backward on the stuff I know. I'm not saying you necessarily want to do this, but you can keep more work from being obtained by putting the following in prime.txt:
NoMoreWork=1

There is no file named "help". However, there are several informative .txt files: readme.txt, stress.txt, undoc.txt, and whatsnew.txt. Two other files contain configuration settings: local.txt and prime.txt. Results.txt is exactly that: your results. Prime.log is a complementary record of program operations.

EDIT: worktodo.txt is pretty self-explanatory, though the lines in it must be formatted in particular ways, which are described in the informative .txt files.

Last fiddled with by kladner on 2013-10-09 at 01:18
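[Editor's sketch: the relevant prime.txt fragment is just the one line below; everything else in the file stays as it was. Per my reading of undoc.txt, removing the line (or setting it back to 0) lets the program fetch new work again, but treat that as an assumption.]

```
NoMoreWork=1
```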
#5
"Mr. Meeseeks"
Jan 2012
California, USA
3·5²·29 Posts
What is the speed of your memory?
EDIT: Also, to add to chappy's point, thermals may well be a problem.

Last fiddled with by kracker on 2013-10-09 at 01:35
#6
Aug 2002
North San Diego County
1100011111₂ Posts
[Assuming Windows and hyperthreading enabled]

Don't run more than 4 threads, and in local.txt try

Code:
AffinityScramble2=02461357

If HT is disabled, AffinityScramble2 is not needed.

Last fiddled with by sdbardwick on 2013-10-09 at 01:28
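[Editor's note: a small Python sketch of one way to read that string. The semantics are my interpretation of undoc.txt's AffinityScramble2 entry, and the "adjacent logical CPUs share a physical core" layout is an assumption that holds on typical hyperthreaded Intel chips; verify against your own system.]

```python
# Assumption: character i of the AffinityScramble2 string names the (hex)
# logical CPU that Prime95's i-th computing thread gets pinned to.
scramble = "02461357"
cpu_of_thread = {i: int(c, 16) for i, c in enumerate(scramble)}

# Assumption: with hyperthreading, logical CPUs 2k and 2k+1 share physical
# core k, so integer division by 2 recovers the physical core.
core_of_thread = {i: cpu // 2 for i, cpu in cpu_of_thread.items()}

print(core_of_thread)
```

The point of the scramble: with 4 worker threads, threads 0-3 get logical CPUs 0, 2, 4, 6, i.e. one per physical core, instead of two threads doubling up on hyperthreaded siblings of the same core.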
#7
Romulan Interpreter
"name field"
Jun 2011
Thailand
10100000100001₂ Posts
Quote:
WorkerThreads=1
ThreadsPerTest=4

Save as text. Edit "worktodo.txt" and put all assignments in the same section, i.e. delete the "[worker 2]", "[worker 3]" etc. lines and move everything into [worker 1]. Don't delete the "[worker 1]" line. The number of worker sections needs to match WorkerThreads. Each worker will use "threads per test" cores (physical cores, or logical cores if you have more threads than physical cores). The product (mathematical multiplication) of the two numbers (workers times threads) must equal the number of cores you want to allocate to P95. Don't allocate logical cores, only physical ones. P95 is well optimized, so it will generally not take any advantage of logical cores; it can already use a physical core to its maximum. For example, if you have 4 physical cores (8 logical, with hyperthreading), use 1 and 4 as in the example above. Save. Restart P95. After restarting, you will have as many windows as workers (only one in your case), so it will be easy to see the results too (fewer windows, more space for each).
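[Editor's sketch of how the two files might end up looking. The exponents below are made-up placeholders, and real worktodo lines often carry extra fields such as an assignment ID, so treat the exact syntax as illustrative only.]

local.txt (relevant lines only):

```
WorkerThreads=1
ThreadsPerTest=4
```

worktodo.txt (all assignments merged under one worker section):

```
[Worker #1]
Test=65999861,75,1
Test=66000103,75,1
```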
Quote:
Assignment problems should be solved by setting the right parameters for threads, cores, etc., as explained above. You may want to use a "queue" of at least 5 days, to have some work to do in case the computer can't connect to the internet for a few days. Days between sending results? Set it to 1, so that P95 exchanges info with the server every day (if it is online; if not, this setting won't matter). Read the text docs.

Quote:
Again, to emphasize what other people said: the fact that you see a higher speed when you run one worker is contrary to what we (all the other people here) experience with our computers. This could mean (my best bet) a memory bandwidth limitation on your computer (4 workers need to exchange more data with the memory than a single worker; you need to read 4 residues at every iteration), or (my second bet) a heat problem (4 workers heat the cores harder, while a single worker needs some time to move data from one core to another, giving "dead time" in which the cores cool). Four workers are always faster than 1 worker, assuming your computer has no memory bandwidth limitation and can get rid of the produced heat fast enough.

TL;DR version: 4 workers on 4 cores: read 4 residues from memory (very big numbers!), each core multiplying its own sh!t, writing back the residues. Needs high memory bandwidth, producing lots of heat. The most productive. 1 worker on 4 cores: reads 1 residue from memory, but "spreads" this residue over all cores, giving each core a quarter of the multiplication. At the end, it collects the results from all cores and writes one residue back to memory. Here one iteration is almost 4 times faster to compute, because 4 cores participate in computing it, but some time is lost to "share" the work between the cores and "collect" the results. Therefore, if the iteration time in the "4 workers" scenario is about 80 ms, you will not get 20 ms in the "1 worker" scenario, but 21 or 22, depending on how efficient your CPU is. "Sharing" time is "cooling" time for the CPU, as the cores don't do much calculation then. In the first scenario you will do 4 exponents in (say) 80 days (it will take 80 ms times the number of iterations, depending on your exponents), but in the second scenario you will do one exponent in 21 or 22 days, therefore you will need 84 to 88 days to do 4 exponents. (These numbers are fictitious, just to show how things work. The real numbers depend on the exponents, and the real timing on your computer.)

More calculation, more heat. Less calculation, less heat. Sounds logical.

Last fiddled with by LaurV on 2013-10-09 at 06:14
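[Editor's note: LaurV's back-of-the-envelope comparison can be checked with a few lines of arithmetic. The iteration count and per-iteration times below are the illustrative figures from the post above, not measurements from any real machine.]

```python
iters = 65_000_000      # iterations per exponent (roughly the exponent size)
ms_4workers = 80.0      # per-iteration time with 4 independent workers
ms_1worker = 21.0       # per-iteration time with 1 worker spread over 4 cores

MS_PER_DAY = 86_400_000

# 4 workers: all four exponents run in parallel and finish together.
days_4workers = iters * ms_4workers / MS_PER_DAY

# 1 worker on 4 cores: the four exponents are done one after another.
days_1worker = 4 * iters * ms_1worker / MS_PER_DAY

# 4 workers finish the batch slightly sooner, despite each exponent
# individually taking ~4x longer.
print(f"{days_4workers:.1f} days vs {days_1worker:.1f} days")
```

With these illustrative numbers the 4-worker setup finishes the batch of four exponents a few days sooner, which is the throughput argument in the post: per-exponent latency favors one worker, total throughput favors four.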
#8
"Cas Wegkamp"
Sep 2013
The Netherlands
48 Posts
Well, after reading about the memory thing, and about my settings probably being messed up, I did some investigation.

I've actually turned down the amount of memory P95 is allowed to use, and it turns out it is now calculating a lot faster! Where an iteration took ~0.05 s at 3500 MB of memory, it is now down to ~0.035 s at 100 MB. This is about on par with having four cores calculating 1 number (~0.008 s), even though that was run with 3500 MB of memory available to it, which apparently is detrimental to speed. This is a genuine WTF moment for me, as with anything else I can think of, the motto is "the more the merrier". Setting available memory even lower apparently does not affect iteration times.

Not having changed *anything* else, this begs the question: what is the optimal memory setting for P95? The above memory settings were the daytime and nighttime values under "Options" > "CPU...".

Last fiddled with by Warlord on 2013-10-10 at 21:27
#9
May 2013
East. Always East.
3277₈ Posts
As far as I know, the memory is only used in P-1 factoring, but I could be wrong.
Are you running anything else while running Prime95? It runs at the lowest priority, so anything and everything will take precedence over it. What is your memory speed? Try running the Windows Experience Index and see if you perhaps have some serious limitation in your memory. You shouldn't be bottlenecked at one worker unless your memory is very, very slow.

Could you post iteration times for 4 workers / 4 cores and 1 worker / 1 core? I find it hard to believe 1 worker is six thousand times faster.
#10
"Richard B. Woods"
Aug 2002
Wisconsin USA
2²·3·641 Posts
Quote:
When stage 2 starts during a P-1 or ECM run, P95 displays a message with the amount of the "available memory" that it is actually using. (For instance, when I set "available memory" to 1250, I typically see a message that ECM uses ~780-800M during stage 2.) If you don't see any message saying how much memory P95 is using during stage 2 (and you'll never see such a message during L-L testing), then you know that the "available memory" setting is having no effect on whatever type of work P95 is doing. Quote:
Again, P95's L-L testing does not use the "available memory" setting for anything.

Last fiddled with by cheesehead on 2013-10-11 at 03:23
#11
"Cas Wegkamp"
Sep 2013
The Netherlands
22 Posts
Weird, did what you said and the times did indeed stay between 0.032 and 0.037. Wonder why they went that high.
Thread | Thread Starter | Forum | Replies | Last Post |
Pausing CUDA jobs | fivemack | Software | 0 | 2013-11-27 23:50 |
R.I.P. Steve Jobs | ewmayer | Lounge | 40 | 2011-10-23 19:44 |
Jobs | R.D. Silverman | Lounge | 25 | 2009-10-15 05:41 |
How are you running your nfs jobs | schickel | Factoring | 7 | 2009-02-26 01:06 |
Filtering on large NFS jobs, particularly 2^908+1 | bdodson | Factoring | 20 | 2008-11-26 20:45 |