mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   Prime95 v30.3 (https://www.mersenneforum.org/showthread.php?t=25823)

ATH 2020-08-11 02:33

Ok 30.3b2 worked to upload the proof.

Happy5214 2020-08-11 12:42

I had an error when trying to certify a PRP-CF proof (v30.3b2, Ubuntu 20.04):

[code][Worker #1 Aug 11 07:40] Starting certification of M8608507 using FFT length 448K, Pass1=448, Pass2=1K, clm=4
[Comm thread Aug 11 07:40] CURL library error:
[Comm thread Aug 11 07:40] CURL library error:
[Worker #1 Aug 11 07:40] Error getting CERT starting value.
[Worker #1 Aug 11 07:40] Aborting processing of this work unit -- will try again later.
[/code]

Aramis Wyler 2020-08-12 03:31

Running mprime for the first time on a new Ryzen 5 3600.


I used mostly defaults - 2 workers and 3 cores each - with work type 150 (First time Prime checks).


I'm posting because I'm getting an enormous number of potential round off errors on each worker. The build is new, the cpu could be defective, but I haven't seen any errors or heat issues other than these roundoff errors.


[C][Worker #2 Aug 11 23:22] Setting affinity to run helper thread 2 on CPU core #6
[Worker #2 Aug 11 23:22] M110534549 stage 1 is 1.17% complete.
[Worker #2 Aug 11 23:23] Possible roundoff error (0.5), backtracking to last save file.
[Worker #2 Aug 11 23:23] Setting affinity to run helper thread 1 on CPU core #5
[Worker #2 Aug 11 23:23] Using FMA3 FFT length 6M, Pass1=1536, Pass2=4K, clm=1, 3 threads
[Worker #2 Aug 11 23:23] Setting affinity to run helper thread 2 on CPU core #6
[Worker #2 Aug 11 23:23] M110534549 stage 1 is 1.22% complete.
[Worker #1 Aug 11 23:23] M110534311 stage 1 is 1.23% complete. Time: 112.608 sec.
[Worker #1 Aug 11 23:24] Possible roundoff error (0.5), backtracking to last save file.
[Worker #1 Aug 11 23:24] Setting affinity to run helper thread 1 on CPU core #2
[Worker #1 Aug 11 23:24] Using FMA3 FFT length 6M, Pass1=1536, Pass2=4K, clm=1, 3 threads
[Worker #1 Aug 11 23:24] Setting affinity to run helper thread 2 on CPU core #3
[Worker #1 Aug 11 23:24] M110534311 stage 1 is 0.14% complete.
[Worker #2 Aug 11 23:25] Possible roundoff error (0.5), backtracking to last save file.
[Worker #2 Aug 11 23:25] Setting affinity to run helper thread 1 on CPU core #5
[Worker #2 Aug 11 23:25] Setting affinity to run helper thread 2 on CPU core #6
[Worker #2 Aug 11 23:25] Using FMA3 FFT length 6M, Pass1=1536, Pass2=4K, clm=1, 3 threads
[Worker #2 Aug 11 23:25] M110534549 stage 1 is 1.40% complete.
[Worker #2 Aug 11 23:25] Possible roundoff error (0.5), backtracking to last save file.
[Worker #2 Aug 11 23:25] Setting affinity to run helper thread 2 on CPU core #6
[Worker #2 Aug 11 23:25] Setting affinity to run helper thread 1 on CPU core #5
[Worker #2 Aug 11 23:25] Using FMA3 FFT length 6M, Pass1=1536, Pass2=4K, clm=1, 3 threads
[Worker #2 Aug 11 23:25] M110534549 stage 1 is 1.40% complete.
[Worker #1 Aug 11 23:25] M110534311 stage 1 is 0.74% complete. Time: 112.937 sec.
[Worker #1 Aug 11 23:26] Possible roundoff error (0.5), backtracking to last save file.
[Worker #1 Aug 11 23:26] Using FMA3 FFT length 6M, Pass1=1536, Pass2=4K, clm=1, 3 threads
[Worker #1 Aug 11 23:26] Setting affinity to run helper thread 2 on CPU core #3
[Worker #1 Aug 11 23:26] Setting affinity to run helper thread 1 on CPU core #2
[Worker #1 Aug 11 23:26] M110534311 stage 1 is 0.14% complete. [/C]


EDIT: This is with v30.3 build 2 on 64 bit debian.

intelfx 2020-08-12 07:02

[QUOTE=Happy5214;553273]I had an error when trying to certify a PRP-CF proof (v30.3b2, Ubuntu 20.04):

[code][Worker #1 Aug 11 07:40] Starting certification of M8608507 using FFT length 448K, Pass1=448, Pass2=1K, clm=4
[Comm thread Aug 11 07:40] CURL library error:
[Comm thread Aug 11 07:40] CURL library error:
[Worker #1 Aug 11 07:40] Error getting CERT starting value.
[Worker #1 Aug 11 07:40] Aborting processing of this work unit -- will try again later.
[/code][/QUOTE]
Same here.


Two issues:
[LIST=1][*]When I first started mprime today, mprime reported that it got CERT work, but then proceeded to work on previous assignments:
[CODE]
Aug 12 07:28:32 stratofortress.nexus.i.intelfx.name mprime[15264]: [Comm thread Aug 12 07:28] Sending expected completion date for M110701609: Aug 23 2020
Aug 12 07:28:33 stratofortress.nexus.i.intelfx.name mprime[15264]: [Work thread Aug 12 07:28] Running Jacobi error check. [Aug 12 07:28] PrimeNet success code with additional info:
Aug 12 07:28:33 stratofortress.nexus.i.intelfx.name mprime[15264]: [Comm thread Aug 12 07:28] Server assigned CERT work.
Aug 12 07:28:33 stratofortress.nexus.i.intelfx.name mprime[15264]: [Comm thread Aug 12 07:28] Got assignment 005E9C2063038514CF0D0DD5E4DCFCAE: CERT M10447057
Aug 12 07:28:33 stratofortress.nexus.i.intelfx.name mprime[15264]: [Comm thread Aug 12 07:28] Done communicating with server.
Aug 12 07:28:58 stratofortress.nexus.i.intelfx.name mprime[15264]: [Work thread] Passed. Time: 25.646 sec.
Aug 12 07:28:58 stratofortress.nexus.i.intelfx.name mprime[15264]: [Work thread Aug 12 07:28] Resuming primality test of M109983959 using FMA3 FFT length 6M, Pass1=1536, Pass2=4K, clm=1, 16 threads
[/CODE][*]When I manually edited my [c]worktodo.txt[/c] to place the cert work in front of the queue, mprime entered a failure loop:
[code]
Aug 12 07:30:06 stratofortress.nexus.i.intelfx.name mprime[16107]: [Work thread Aug 12 07:30] Starting certification of M10447057 using FMA3 FFT length 560K, Pass1=448, Pass2=1280, clm=1, 16 threads
Aug 12 07:30:06 stratofortress.nexus.i.intelfx.name mprime[16107]: [Comm thread Aug 12 07:30] CURL library error:
Aug 12 07:30:06 stratofortress.nexus.i.intelfx.name mprime[16107]: [Comm thread Aug 12 07:30] CURL library error:
Aug 12 07:30:06 stratofortress.nexus.i.intelfx.name mprime[16107]: [Work thread Aug 12 07:30] Error getting CERT starting value. Will try again later.
Aug 12 07:30:06 stratofortress.nexus.i.intelfx.name mprime[16107]: [Work thread Aug 12 07:30] Aborting processing of this work unit.
[/code](mprime went on to repeat these messages indefinitely)[/LIST]Are those bugs, server problems or misconfigurations on my part?


[B]Edit:[/B] when I subsequently edited [c]worktodo.txt[/c] to put the CERT assignment at the back of the queue and restarted mprime, it still attempted to pick up the CERT assignment, even though it was at the end of the queue (which suggests priority behavior for CERT work). Hence I conclude that (1) is a bug.
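
The "priority behavior" inferred in the edit above can be pictured with a small model. This is only a sketch under assumptions: it is not mprime's actual scheduling code, the queue is a plain Python list rather than real [c]worktodo.txt[/c] syntax, the function name [c]order_queue[/c] is made up for illustration, and the "OTHER" labels are placeholders for whatever non-CERT work types were queued.

```python
# Hypothetical model of a work queue; not mprime's real code or file format.
def order_queue(queue):
    """Return the queue with CERT entries moved to the front.

    sorted() is stable, so non-CERT entries keep their original relative order.
    """
    # False sorts before True, so entries whose type is "CERT" come first.
    return sorted(queue, key=lambda item: item[0] != "CERT")


# CERT assignment deliberately placed at the back, as in the edit above.
queue = [("OTHER", 109983959), ("OTHER", 110701609), ("CERT", 10447057)]
ordered = order_queue(queue)
print(ordered[0])
```

If the client orders work roughly like this, moving the CERT entry to the back of the file would make no difference on restart, which matches what was observed.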

Prime95 2020-08-12 10:35

Due to a server issue, Linux clients can neither get the CERT starting value, nor upload proofs. Aaron or I will have a fix today.

Prime95 2020-08-12 10:38

[QUOTE=Aramis Wyler;553383]Running mprime for the first time on a new Ryzen 5 3600.

I used mostly defaults - 2 workers and 3 cores each - with work type 150 (First time Prime checks).

I'm posting because I'm getting an enormous number of potential round off errors on each worker. The build is new, the cpu could be defective, but I haven't seen any errors or heat issues other than these roundoff errors.[/QUOTE]

Hardware issues. Try the standard remedies: lower the memory frequency or CPU speed, or increase voltages. Find a combination that can pass the torture test.
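
As background on why a reported value of 0.5 is so serious: this is general knowledge about FFT-based multiplication, not something stated in the thread. Each result of the floating-point multiplication should land very close to an integer, and the roundoff error is, roughly, the largest distance of any result from its nearest integer. That distance can never exceed 0.5, so 0.5 means at least one value could not be rounded reliably at all. A minimal sketch with made-up numbers (the function name [c]roundoff_error[/c] is illustrative, not taken from Prime95):

```python
def roundoff_error(values):
    """Largest distance of any value from its nearest integer (between 0.0 and 0.5)."""
    return max(abs(v - round(v)) for v in values)


# Made-up data: a healthy result sits very close to integers...
clean = [12.0001, 7.9998, 3.0]
# ...while a corrupted one can land exactly halfway between two integers.
noisy = [12.0001, 7.5, 3.0]

print(roundoff_error(clean))
print(roundoff_error(noisy))
```

The first list gives a tiny error and the second gives the worst case, 0.5. Errors that large appearing repeatedly are consistent with the hardware diagnosis above.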

S485122 2020-08-12 19:03

Updated the software to the latest version.

Received a Cert work unit ... to do some certifying of a cofactor for the factored Mersenne number 10482449. The configured work preference is double checking primality testing.

Working with the configuration I will be spared from cofactor certifying (and any other CERT jobs.) When viewing the result of the cert work done : "n/a" one has to go to the status of the number [url]https://www.mersenne.org/report_exponent/?exp_lo=10482449&full=1[/url] to see that the cofactor work has been certified OK (Verified). But it is not clear what the status of the exponent is : fully factored ? Anyway that type of work (cofactors) is absolutely not something I signed up for. The imposing of that kind of work is at odds with the "let the user decide" philosophy of the project.

There obviously remains a wee bit of tuning to do on PrimeNet.

Jacob

Prime95 2020-08-12 19:39

[QUOTE=S485122;553468]Updated the software to the latest version.

Received a Cert work unit ... to do some certifying of a cofactor for the factored Mersenne number 10482449. The configured work preference is double checking primality testing.

Working with the configuration I will be spared from cofactor certifying (and any other CERT jobs.) When viewing the result of the cert work done : "n/a" one has to go to the status of the number [url]https://www.mersenne.org/report_exponent/?exp_lo=10482449&full=1[/url] to see that the cofactor work has been certified OK (Verified). But it is not clear what the status of the exponent is : fully factored ? Anyway that type of work (cofactors) is absolutely not something I signed up for. The imposing of that kind of work is at odds with the "let the user decide" philosophy of the project.

There obviously remains a wee bit of tuning to do on PrimeNet.[/QUOTE]

CERT for PRP-CF is trivially quick work. I'm not sure why you found it so distasteful.

CERT for PRP is really a kind of PRP-DC. It is not a separate work preference choice as the server does not have much of that work type to hand out.

I'm glad you figured out how to disable CERT work. Kriesel also disabled CERT work because of the impact on his LL testing -- a Jacobi check to save his LL test and another Jacobi check on resume. I'm thinking the ability to turn off CERT work needs to be more prominent -- perhaps a checkbox at the bottom of the Worker Windows dialog box.

I agree, the server web pages need a lot of work due to proofs.

kriesel 2020-08-12 20:52

To clarify, I set download rate to 0 on prime95 on most of my systems but not all, to a point where I think I'll be doing my fair share. Doing an order of magnitude more CERTs than I do primality tests was a small drag on throughput/efficiency and made my testing throughput unpredictable. There were others that were interested in doing more CERTs than they were being assigned. So throttling my CERT throughput down considerably from the initial disparity created a win-win. And I am appreciative of those who are doing CERTs on my PRP or PRPDC in 120M-200M. These runs are to possibly detect any issues with fft length cutoffs etc, well ahead of the wavefront. ([URL]https://www.mersenneforum.org/showpost.php?p=501181&postcount=6[/URL]; similar lower priority effort with LL/LLDC at [URL]https://www.mersenneforum.org/showpost.php?p=501178&postcount=4[/URL])

GIMPS is going through a complicated transition currently, and more rapidly it seems than originally projected. Software bugs are being identified and dealt with, in server and client code. Good bug reports, and patience, are recommended.

It will take a long time to get the bulk of the clients updated. Early adopters of prime95/mprime v30.x are bearing the brunt of CERT for both mprime/prime95 and gpuowl production. (Either curtisc or Ben Delo updating a fraction of their fleet would help a lot. But like for everyone in this all-volunteer project, their kit, their call. And if they had started already, we wouldn't know without doing some checking.)

ATH 2020-08-12 22:22

[QUOTE=Prime95;553471]CERT for PRP-CF is trivially quick work. I'm not sure why you found it so distasteful.[/QUOTE]

CERT for PRP-CF for a 10.48M exponent took 18 sec on 8 cores, so 2.5 min tops if running it on a single core, and maybe 3-5 minutes at most if you have a very old cpu and running it on 1 core.
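
The single-core figure above follows from simple arithmetic, sketched here under the assumption of perfect linear scaling across cores (real multi-threaded runs usually scale a little worse than that, so the true single-core time would be somewhat lower). The function name is illustrative only.

```python
def single_core_estimate(seconds, cores):
    """Estimate single-core time, assuming the work scales linearly with core count."""
    return seconds * cores


# 18 seconds on 8 cores, as reported for the 10.48M exponent.
est = single_core_estimate(18, 8)
print(est, "seconds =", est / 60, "minutes")
```

That gives 144 core-seconds, or 2.4 minutes, in line with the "2.5 min tops" estimate.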

Xyzzy 2020-08-12 23:03

Is there a worktype option to select proof work?

