#1 |
"David Kirkby"
Jan 2021
Althorne, Essex, UK
448₁₀ Posts |
I'm seeing a large difference (a factor of 3.6 or so) between two estimated completion times of a PRP test. This is one typical line of standard output, which I directed to a file:

[Worker #7 Apr 11 17:29] Iteration: 36530000 / 110825779 [32.96%], ms/iter: 2.684, ETA: 55:22:57

So that's saying 55 hours from now. However, running mprime -m and then checking the status (option 3), I see

[Worker thread #7]
M110825779, PRP, Mon Apr 19 23:05 2021
M110825543, PRP, Sun May 2 05:28 2021
M110825587, PRP, Fri May 14 12:45 2021
M110825509, PRP, Wed May 26 20:02 2021
M110812549, PRP, Tue Jun 8 03:17 2021

As a rough estimate, April 19th 23:05 is 198 hours away. So why does stdout indicate 55 hours to completion, but mprime's status show 198 hours?

I've tried a manual connection to the server (option 10 in mprime), and picked the option to upload completion times to the server. The server indicates the exponent will complete on the 19th April, which agrees with mprime's status but is very different from what standard output is showing.

Some of the gaps between the other estimates are unequal because some of those exponents have been partially tested on another machine, so ignore them. The data at https://www.mersenne.org/report_expo...exp_hi=&full=1 should not be used to estimate the time, as the exponent has not consistently been running with the same number of cores.

Dave

Last fiddled with by drkirkby on 2021-04-11 at 17:36
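For anyone wanting to check the arithmetic in the post above, here is a minimal Python sketch. It assumes (this is not confirmed by the mprime source) that the stdout ETA is simply the remaining iterations multiplied by the current ms/iter. The regular expression and function names are illustrative only; the log line and the status date are copied from the post.

```python
import re
from datetime import datetime

# Progress line copied from the post above.
LOG_LINE = ("[Worker #7 Apr 11 17:29] Iteration: 36530000 / 110825779 "
            "[32.96%], ms/iter: 2.684, ETA: 55:22:57")

# Illustrative pattern for this one line format, not an official grammar.
PATTERN = re.compile(r"Iteration: (\d+) / (\d+) \[[\d.]+%\], ms/iter: ([\d.]+)")


def eta_hours_from_log(line: str) -> float:
    """ASSUMPTION: ETA = remaining iterations * current ms/iter."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a progress line")
    done, total = int(match.group(1)), int(match.group(2))
    ms_per_iter = float(match.group(3))
    return (total - done) * ms_per_iter / 1000 / 3600


# Hours from the log timestamp to the date shown by the status option.
log_time = datetime(2021, 4, 11, 17, 29)
status_time = datetime(2021, 4, 19, 23, 5)

stdout_hours = eta_hours_from_log(LOG_LINE)
status_hours = (status_time - log_time).total_seconds() / 3600
ratio = status_hours / stdout_hours

print(f"stdout ETA : {stdout_hours:.2f} h")
print(f"status ETA : {status_hours:.2f} h")
print(f"ratio      : {ratio:.2f}")
```

Under that assumption the computed figure lands within a minute of the logged 55:22:57, which suggests the stdout ETA follows the current iteration time, while the status date is derived some other way.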
#2 |
Undefined
"The unspeakable one"
Jun 2006
My evil lair
1100101101001₂ Posts |
You have just encountered the standard time estimation problem.
There are many things that can make estimates bad. Perhaps the most likely is that you are running other, higher-priority tasks, which reduce the time slice available to the tests and make the elapsed time for each iteration vary. Anyhow, I don't know what you do with your system, but anything that uses cycles or changes the clocks on your system will mess up the estimates.
#3 |
Jun 2003
2⁸×3×7 Posts |
ETA is accurate, but since it is based on the current iteration time, it can go up and down.
Status is based on the expected performance of the CPU. It doesn't need the test to be running (obviously). This can be wildly inaccurate if Prime95 doesn't have good data on the CPU, but it will become accurate over time as Prime95 learns the true capability of the CPU (by adjusting a fudge factor called RollingAverage).

RollingAverage starts at 1000 (representing nominal CPU performance), but is adjusted twice a day based on the CPU's observed performance. FWIW, my CPU currently has a RollingAverage of 3487, which means the observed performance is about 3.5x the expected performance. It still isn't showing super accurate status times, but it is pretty close.

Last fiddled with by axn on 2021-04-11 at 17:36
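To make the RollingAverage idea concrete, here is a small Python sketch. It is not Prime95's actual code: the scaling formula is an assumption inferred from the description above (1000 is nominal, 3487 is about 3.5x expected performance), and the function name and the 100-hour nominal figure are hypothetical.

```python
NOMINAL_ROLLING_AVERAGE = 1000  # value described above as nominal performance


def adjusted_estimate_hours(nominal_hours: float, rolling_average: int) -> float:
    """Scale a nominal time estimate by the observed/expected speed ratio.

    ASSUMPTION: a RollingAverage of N means the CPU runs N/1000 times as
    fast as expected, so the time estimate shrinks by the same ratio.
    """
    if rolling_average <= 0:
        raise ValueError("rolling_average must be positive")
    return nominal_hours * NOMINAL_ROLLING_AVERAGE / rolling_average


# Hypothetical nominal estimate of 100 hours for some exponent.
fresh_install = adjusted_estimate_hours(100.0, 1000)
tuned_machine = adjusted_estimate_hours(100.0, 3487)

print(f"RollingAverage 1000: {fresh_install:.1f} h")
print(f"RollingAverage 3487: {tuned_machine:.1f} h")
```

If the real adjustment works anything like this, a machine whose RollingAverage is still near 1000 but which actually runs several times faster than expected would show status dates several times too late, which would match the symptom in the first post.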
#4 |
Jan 2021
California
422₁₀ Posts |
I've found that the first DC runs on a new computer get terrible time estimates on primenet. Usually super optimistic by a factor of 3 or more, but sometimes super pessimistic. By about the 3rd DC run, the time estimates start to get much better. Even so, there are a lot of things that can throw them way off.
Last fiddled with by slandrum on 2021-04-11 at 17:36 |
#5 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2²·5·7·47 Posts |
There are multiple ways of estimating completion time. One is to observe actual throughput of mprime/prime95 over days or weeks, including effect of other workloads, system shutdowns, etc, and extrapolate that mean throughput to task completion.
Another is to time the current iterations, perhaps while the system is not doing other things, so that mprime/prime95 is achieving close to its peak possible throughput, and extrapolate that to exponent completion. I think status output uses the first, and the worker window of prime95 the second. They naturally provide different estimates.

Last fiddled with by kriesel on 2021-04-11 at 17:42
#6 |
"David Kirkby"
Jan 2021
Althorne, Essex, UK
2⁶·7 Posts |
I have been changing the number of cores for the worker. Currently the worker has 18 cores, with the other 8 cores of a 26-core CPU devoted to other workers. Earlier in the day it had only 3 cores.

What software computes the estimated time to completion? I assumed it was mprime, which sent its estimates to the Primenet server. But from what I am reading above, that assumption is not correct.

Dave

Last fiddled with by drkirkby on 2021-04-11 at 17:56
#7 |
"David Kirkby"
Jan 2021
Althorne, Essex, UK
1C0₁₆ Posts |
I set the benchmarks with a fixed FFT size of 6048 K, as that seems to be what's used with the 110 million assignments I have. The optimal setup is 2 workers, 52 cores, so both CPUs are running flat out with 26 cores in use on each. I tried 22, 23, 24, 25 and 26 cores per worker, and 26 cores gives the best results.

I think part of my problem was trying to do a 332646233 exponent at the same time as the 110 million exponent. Obviously the former needs a much bigger FFT. It might be a bit tricky to find the best setup to use with one large exponent and a smaller one. It may be better not to do that at all. But I don't fancy testing another big exponent.

I've now got the time per iteration down to about 1.63 ms, and 1.60 ms on a slightly smaller exponent.

[Worker #1 Apr 11 22:12] Iteration: 41590000 / 110825779 [37.52%], ms/iter: 1.634, ETA: 31:25:27
[Worker #2 Apr 11 22:12] Iteration: 43590000 / 110274583 [39.52%], ms/iter: 1.592, ETA: 29:28:57

I'm getting more and more tempted to just forget about 332646233, even though I've done more than 44% of it. I think it causes problems with throughput for smaller exponents, even when only one core is given to that exponent.

Last fiddled with by drkirkby on 2021-04-11 at 21:22
#8 |
"David Kirkby"
Jan 2021
Althorne, Essex, UK
2⁶·7 Posts |
The estimate seems to be getting more accurate. It is currently showing the following as the estimate for the date/time to complete.
M110825779, PRP, Thu Apr 15 21:47 2021

which is more than 4 days earlier than its previous estimate of

M110825779, PRP, Mon Apr 19 23:05 2021

I expect it to actually finish around 7 AM tomorrow morning (13th April) as it should finish in a little over 17 hours.

[Worker #1 Apr 12 12:52] Iteration: 73780000 / 110825779 [66.57%], ms/iter: 1.666, ETA: 17:08:48

Dave

Last fiddled with by drkirkby on 2021-04-12 at 11:57
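The two Worker #1 progress lines quoted in this thread for M110825779 (Apr 11 22:12 and Apr 12 12:52) are enough to compare the two estimation approaches discussed earlier: extrapolating from observed throughput over a longer period versus extrapolating from the current iteration time. The Python sketch below is only an illustration. The regular expression, the function names and the assumed year are mine, and the log timestamps have one-minute resolution, so the mean figure is approximate.

```python
import re
from datetime import datetime

# Worker #1 lines for M110825779, copied from the posts in this thread.
LINES = [
    "[Worker #1 Apr 11 22:12] Iteration: 41590000 / 110825779 [37.52%], "
    "ms/iter: 1.634, ETA: 31:25:27",
    "[Worker #1 Apr 12 12:52] Iteration: 73780000 / 110825779 [66.57%], "
    "ms/iter: 1.666, ETA: 17:08:48",
]

# Illustrative pattern for this line format, not an official grammar.
PATTERN = re.compile(
    r"\[Worker #\d+ (\w{3} \d+ \d+:\d+)\] Iteration: (\d+) / (\d+) "
    r"\[[\d.]+%\], ms/iter: ([\d.]+)"
)
YEAR = 2021  # the log lines carry no year; taken from the thread dates


def parse(line: str):
    """Return (timestamp, iterations done, total iterations, ms/iter)."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a progress line")
    stamp = datetime.strptime(f"{YEAR} {match.group(1)}", "%Y %b %d %H:%M")
    return stamp, int(match.group(2)), int(match.group(3)), float(match.group(4))


(t0, i0, total, _), (t1, i1, _, current_ms) = parse(LINES[0]), parse(LINES[1])

# Method 1: mean wall-clock time per iteration between the two samples.
mean_ms = (t1 - t0).total_seconds() * 1000 / (i1 - i0)

# Method 2: the ms/iter reported on the latest line.
remaining = total - i1
eta_mean_hours = remaining * mean_ms / 3.6e6
eta_current_hours = remaining * current_ms / 3.6e6

print(f"mean ms/iter over the interval: {mean_ms:.3f}")
print(f"ETA from mean throughput      : {eta_mean_hours:.2f} h")
print(f"ETA from current ms/iter      : {eta_current_hours:.2f} h")
```

Here the two approaches agree to within about a quarter of an hour, which is what you would hope for when the core count and workload stay fixed between the two samples.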