mersenneforum.org (https://www.mersenneforum.org/index.php)
-   PrimeNet (https://www.mersenneforum.org/forumdisplay.php?f=11)
-   -   OFFICIAL "SERVER PROBLEMS" THREAD (https://www.mersenneforum.org/showthread.php?t=5758)

kriesel 2021-09-17 22:04

[QUOTE=Uncwilly;588060]Sure, but that is the same work type. Only a Factor Found should cancel a PRP or LL assignment.[/QUOTE]The common ground is that processing a P-1 no-factor result causes a valid assignment held by the submitter to get marked expired when it should not. In one subcase the P-1 assignment should be considered completed, not expired; in the other, the primality test should persist as a valid and pending assignment.
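
The behaviour argued for here can be written down as a small decision rule. This is only a sketch of the desired outcome, not PrimeNet's server code: the function name, the status strings, and the work-type labels are all hypothetical.

```python
def status_after_result(assignment_type, result_type, factor_found):
    """Return the status an assignment should have after a result report.

    Hypothetical rule, not the actual server logic:
    - a found factor ends any assignment on the exponent,
    - a no-factor P-1 result completes a P-1 assignment,
    - a no-factor P-1 result leaves a PRP or LL assignment active.
    The status "expired" is never produced by a result report.
    """
    if factor_found:
        # A factor shows the exponent is composite, so primality testing is moot.
        return "completed" if assignment_type == result_type else "cancelled"
    if result_type == assignment_type:
        # The reported work is exactly what was assigned.
        return "completed"
    # Any other no-factor result (e.g. P-1 NF against a PRP assignment)
    # must leave the assignment valid and pending.
    return "active"
```

Under this rule a P-1 no-factor report against a PRP assignment returns "active", which is the second subcase described above.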

kriesel 2021-09-20 20:52

A detailed example is at [URL]https://mersenneforum.org/showpost.php?p=588256&postcount=89[/URL] from a February occurrence I just discovered today, of a P-1 no-factor-found report causing the PRP assignment to disappear.
If not addressed, this may cause trouble more frequently once prime95, mprime, or Mlucas begin to support the same low-cost stage 1 P-1 using PRP-generated powers of 3.

kriesel 2021-09-20 21:02

Moebius reports a [URL="https://mersenneforum.org/showpost.php?p=588258&postcount=91"]case[/URL] where two PRPs, two proof generations, and two certs were done on the same day for the same exponent by different users.

kriesel 2021-09-20 21:22

[QUOTE=kriesel;588259]A detailed example is at [URL]https://mersenneforum.org/showpost.php?p=588256&postcount=89[/URL] from a February occurrence[/QUOTE]That one relates to manual assignment and reporting with gpuowl V7.2-21. I think it likely the issue is more widespread.
Gpuowl v6.11-380 and others split an assignment
PRP=<AID>,blah,blah,2
into
PFactor=<AID>...
and
PRP=<AID>...
Same AID, different work, different results, that will get reported at different times/dates by the same user or the primenet.py script.
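
A minimal sketch of that split, for illustration only. It is not Gpuowl's code, and the function name is made up. It assumes the trailing field is the "tests saved" count, that a nonzero value means P-1 is still needed, and that the split keeps that value on the PFactor line while zeroing it on the PRP line. Every other field is copied through as-is, since the real field layout is not spelled out here.

```python
def split_prp_line(line):
    """Split one PRP worktodo line into a PFactor line plus a PRP line.

    Assumption (hypothetical): the last comma-separated field is the
    tests-saved count; "0" means no P-1 is wanted and the line is kept whole.
    """
    work_type, sep, body = line.partition("=")
    if work_type != "PRP" or not sep:
        raise ValueError("expected a line of the form PRP=...")
    fields = body.split(",")
    if len(fields) < 2:
        raise ValueError("expected an AID followed by at least one field")
    if fields[-1] == "0":
        return [line]
    # Both output lines keep the same leading AID field.
    pfactor = "PFactor=" + ",".join(fields)
    prp = "PRP=" + ",".join(fields[:-1] + ["0"])
    return [pfactor, prp]


for entry in split_prp_line("PRP=<AID>,blah,blah,2"):
    print(entry)
```

Both output lines begin with the same `<AID>`, so whatever receives the two results has only the work type to tell them apart.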

chalsall 2021-09-20 21:38

[QUOTE=kriesel;588262]Same AID, different work, different results, that will get reported at different times/dates by the same user or the primenet.py script.[/QUOTE]

Non-conformant to the API specs.

James Heinrich 2021-09-20 21:50

[QUOTE=chalsall;588263]Non-conformant to the API specs.[/QUOTE]I'm not sure that it is -- Prime95 does the same thing. Picking a random example [m]106223153[/m], the work was assigned as PRP, a NF-PM1 was reported but the PRP assignment is still active.

chalsall 2021-09-20 21:59

[QUOTE=James Heinrich;588264]I'm not sure that it is -- Prime95 does the same thing. Picking a random example [m]106223153[/m], the work was assigned as PRP, a NF-PM1 was reported but the PRP assignment is still active.[/QUOTE]

OK... I was thinking about splitting the AID into different work types to run in parallel, and then not having IPC between the workers to ensure the first to report doesn't set the DONE flag.

Prime95 / mprime will always do the P-1'ing work first in the case you've described. And, clearly, it understands the API.

kriesel 2021-09-20 23:50

[QUOTE=kriesel;588262]into
PFactor=<AID>...[B],2[/B]
and
PRP=<AID>...[B],0[/B]
Same AID, different work, different results, that will get reported at different times/dates by the same user or the primenet.py script.[/QUOTE]And those go sequentially at the end of the same worktodo.txt for a single Gpuowl instance, so get done sequentially. Given that manually assigned 106M wavefront PRPs take ~27 hours now on my power-reduced Radeon VIIs, and default periodic reporting is daily, the P-1 result will be reported ~1 day before the PRP it precedes, or occasionally ~2 days (when the P-1 just makes it before a daily reporting time, and the next day the PRP just misses).
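
The arithmetic behind "~1 day, occasionally ~2" can be checked with a few lines. This is a simplified model, not a measurement: it assumes reports go out exactly every 24 hours, that the PRP starts the moment the P-1 finishes, and that the P-1 is equally likely to finish in any hour of the reporting cycle. The function names are made up for the example.

```python
import math


def report_index(finish_hour, interval=24):
    """Index of the first periodic report at or after finish_hour."""
    return math.ceil(finish_hour / interval)


def report_gap(p1_finish_hour, prp_hours=27, interval=24):
    """Reporting periods between the P-1 report and the PRP report."""
    prp_finish_hour = p1_finish_hour + prp_hours
    return report_index(prp_finish_hour, interval) - report_index(p1_finish_hour, interval)


# Try every whole hour of one reporting cycle for the P-1 finish time.
gaps = [report_gap(hour) for hour in range(1, 25)]
print("1-day gaps:", gaps.count(1), "2-day gaps:", gaps.count(2))
```

In this model the 2-day gap appears only when the P-1 finishes within the last 3 hours before a report, because 27 hours is 3 hours longer than one reporting interval.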

chalsall 2021-09-20 23:59

[QUOTE=kriesel;588268]And those go sequentially at the end of the same worktodo.txt for a single Gpuowl instance, so get done sequentially.[/QUOTE]

OK. We're just trying to figure out what isn't working in the various workflows. As has been reported here.

Are the "humans getting into the loop when they shouldn't" the problem? Manually submitting results, for example.

Few appreciate just how tricky software is. Putting humans into the equation just adds a few extra dimensions of uncertainty (read: "fun"). :wink:

kriesel 2021-09-21 00:06

5 Attachment(s)
Attempted a couple of manually assigned wavefront test PRPs, both of which needed P-1 first.
Ran gpuowl V6.11-380 manually, with lots of notes and screen captures along the way.

On the [URL="https://www.mersenne.org/report_exponent/?exp_lo=106303147&exp_hi=&full=1#"]first one[/URL], I did some progress updating using curl, which sort of converts an assignment from manual.
Used curl to report 99% stage 2 progress.
Then manually reported the completed P-1 NF ~5 minutes later.
A check of the exponent status showed a P-1 result report in the history and a 99% complete stage 2, which is a contradiction.
Then used curl to report its brief PRP progress to correct the status.

[URL="https://www.mersenne.org/report_exponent/?exp_lo=106304603&exp_hi=&full=1"]Second one[/URL]: no curl progress reporting ever; completed and reported the P-1 NF for the PRP assignment.
The PRP assignment remained.
Assignment status shows it as PRP, no stage, no %. It would seem reasonable to assume it is at stage PRP, 0%, after getting the P-1 NF. It would be equally reasonable to take the stance that the server should assume nothing.

So, I was unable to reproduce the PRP assignment disappearance, but found something new: a way to create a contradictory status, I guess. The server seems unprepared for a mix of manual and PrimeNet activity on the same assignment. Not surprising really. I would probably not have gone looking for that kind of trouble either, while coding or debugging server scripts.

And maybe that failure to reproduce the issue is because [URL="https://mersenneforum.org/showpost.php?p=588267&postcount=93"]George already attempted a fix[/URL]. (Dueling threads, for more fun!)

chalsall 2021-09-21 01:39

[QUOTE=kriesel;588271](Dueling threads, for more fun!)[/QUOTE]

Please forgive me for this. But some call it Agile Development...

