mersenneforum.org  

Old 2020-07-26, 15:06   #12
paulunderwood
Sep 2002
Database er0rr
110110101000₂ Posts

Quote:
Originally Posted by paulunderwood
Uploading ability seems to be intermittent. 401 error. I will try later.
Aha! I have to do the manual results submission in order to enable upload, so it seems.

Last fiddled with by paulunderwood on 2020-07-26 at 15:07
Old 2020-07-26, 16:11   #13
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
1BF6₁₆ Posts

Quote:
Originally Posted by paulunderwood
Aha! I have to do the manual results submission in order to enable upload, so it seems.
Yes. The results contain an MD5 hash of the proof file. A bad actor cannot upload a garbage proof for your run without that MD5 hash value.
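For anyone who wants to check the hash locally before uploading, here is a minimal sketch of computing the MD5 of a proof file. The function name `md5_of_file` and the chunk size are my own choices for illustration; this is not taken from prime95 or from tools/upload.py.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`.

    Hypothetical helper: reads the file in chunks so that a proof file
    of 100+ MB does not have to fit in memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `md5_of_file("105913139-8.proof")` on a local copy would give a 32-character hex string that can be compared with the hash reported in the result line.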
Old 2020-07-28, 13:32   #14
preda
"Mihai Preda"
Apr 2015
530₁₆ Posts
primenet.py and upload.py scripts for proof upload

The proof upload in GpuOwl has been improved in a recent commit:
https://github.com/preda/gpuowl/comm...3d8b65cb7657a1

Previously the upload data used an inefficient HTTP Content-Type (basically base64) which was almost doubling the size of the upload. This has been fixed.

So if you're doing a lot of proof uploads, it may be worth updating tools/upload.py.
Old 2020-10-09, 18:26   #15
LaurV
Romulan Interpreter
Jun 2011
Thailand
2²·7·11·29 Posts

I have read about this somewhere around here, about moving the files here and there and renaming them to something, but it seems I am getting more stupid with age...
So... How do I upload a proof file generated by v6.11.x of the Owl, using Prime95?
(The result was already reported manually, then I moved the proof file to the P95 folder, but it doesn't seem to be doing anything with it.)
Tried menu options, advanced, manual connection, blah blah, to no avail.


Last fiddled with by LaurV on 2020-10-09 at 18:35
Old 2020-10-09, 18:37   #16
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2·13·181 Posts

Quote:
Originally Posted by LaurV
I have read about this somewhere around here, about moving the files here and there and renaming them to something, but it seems I am getting more stupid with age...
So... How do I upload a proof file generated by v6.11.x of the Owl, using Prime95?
(The result was already reported manually, I moved the file to the P95 folder, but it doesn't seem to be doing anything with it.)
Tried menu options, advanced, manual connection, blah blah, to no avail.

Just move or copy the proof file into the prime95 working directory, and wait. It periodically checks the folder and uploads what's there. If there is an MD5 mismatch it will retry and re-fail until the user manually intervenes.
For a list of ways of uploading, see https://www.mersenneforum.org/showpo...0&postcount=26
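The "move or copy" step can be scripted. Below is a minimal sketch, assuming the proofs sit in one directory and the prime95 working directory is another; the function name `stage_proofs`, the `.part` suffix and both directory arguments are hypothetical. It copies under a temporary name and renames at the end, so a program polling the folder never sees a half-written `.proof` file. Whether prime95 actually needs that precaution is an assumption on my part.

```python
import os
import shutil
from pathlib import Path


def stage_proofs(src_dir, dst_dir):
    """Copy every *.proof file from src_dir into dst_dir.

    Hypothetical helper. Files already present in dst_dir are skipped.
    Each file is first written as <name>.part and then renamed, so the
    final name only appears once the copy is complete.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    copied = []
    for proof in sorted(src.glob("*.proof")):
        target = dst / proof.name
        if target.exists():
            continue
        partial = target.with_name(target.name + ".part")
        shutil.copyfile(proof, partial)
        os.replace(partial, target)  # rename within the same directory
        copied.append(target.name)
    return copied
```

After that it is just a matter of waiting for the next periodic check, as described above.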
Old 2020-10-11, 06:01   #17
LaurV
Romulan Interpreter
Jun 2011
Thailand
2²×7×11×29 Posts

Right. Moving the file worked, but I wasn't patient enough (transferring a hundred MB over my connection would be a matter of seconds, as I didn't limit the uplink, but I forgot to consider other aspects of the problem). CurtisC is already certifying it.

Thanks.


Edit: Verified. We are good.


Edit 2: Something was indeed wrong, I am not totally stupid. Looking at p95's results file, it shows that it couldn't open the file for a while, although I am sure it wasn't being accessed from anywhere else.

Code:
[Sat Oct 10 08:48:43 2020]
Cannot open proof file 105913139-8.proof
[Sat Oct 10 09:53:43 2020]
Cannot open proof file 105913139-8.proof
[Sat Oct 10 10:58:43 2020]
Cannot open proof file 105913139-8.proof
[Sat Oct 10 11:16:52 2020]
Cannot open proof file 105913139-8.proof
[Sat Oct 10 12:21:52 2020]
Cannot open proof file 105913139-8.proof
[Sat Oct 10 13:26:52 2020]
Cannot open proof file 105913139-8.proof
[Sat Oct 10 14:31:52 2020]
Cannot open proof file 105913139-8.proof
[Sat Oct 10 15:36:52 2020]
Cannot open proof file 105913139-8.proof
[Sat Oct 10 16:41:52 2020]
Cannot open proof file 105913139-8.proof
[Sat Oct 10 17:46:52 2020]
Cannot open proof file 105913139-8.proof
[Sat Oct 10 18:51:52 2020]
Cannot open proof file 105913139-8.proof
[Sat Oct 10 20:00:02 2020]
Cannot open proof file 105913139-8.proof
[Sat Oct 10 21:05:02 2020]
Cannot open proof file 105913139-8.proof
[Sat Oct 10 22:10:02 2020]
Cannot open proof file 105913139-8.proof
[Sat Oct 10 23:15:02 2020]
Cannot open proof file 105913139-8.proof
[Sun Oct 11 00:20:02 2020]
Cannot open proof file 105913139-8.proof
[Sun Oct 11 01:25:07 2020]
Cannot open proof file 105913139-8.proof
[Sun Oct 11 02:30:07 2020]
Cannot open proof file 105913139-8.proof
[Sun Oct 11 03:35:07 2020]
Cannot open proof file 105913139-8.proof
[Sun Oct 11 04:40:07 2020]
Cannot open proof file 105913139-8.proof
[Sun Oct 11 05:45:07 2020]
Cannot open proof file 105913139-8.proof
[Sun Oct 11 06:50:07 2020]
Cannot open proof file 105913139-8.proof
[Sun Oct 11 07:55:07 2020]
Cannot open proof file 105913139-8.proof
[Sun Oct 11 09:00:07 2020]
Cannot open proof file 105913139-8.proof
[Sun Oct 11 10:05:07 2020]
Cannot open proof file 105913139-8.proof
[Sun Oct 11 11:10:07 2020]
Cannot open proof file 105913139-8.proof
[Sun Oct 11 12:15:07 2020]
Cannot open proof file 105913139-8.proof
[Sun Oct 11 13:20:38 2020]
Cannot open proof file 105913139-8.proof
I just deleted the file by hand, restarted p95, and everything seems ok now. I recopied the files (in the meantime, one proof became two proofs) just to be on the safe side, and this is what I get (which is quite normal now):
Code:
[Comm thread Oct 11 13:50] MD5 of 105865523-8.proof is b8db136c154990b6d9517e86b3e391e0
[Comm thread Oct 11 13:50] Proof file exponent is 105865523
[Comm thread Oct 11 13:50] Filesize of 105865523-8.proof is 119098777
[Comm thread Oct 11 13:50] Proof 105865523-8.proof already uploaded ({"error_status":409,"error_message":"Conflict","error_description":"Proof already uploaded"})
[Comm thread Oct 11 13:50] MD5 of 105913139-8.proof is 609975da563e123b9a7d0e7c18d6e6a5
[Comm thread Oct 11 13:50] Proof file exponent is 105913139
[Comm thread Oct 11 13:50] Filesize of 105913139-8.proof is 119152345
[Comm thread Oct 11 13:50] Proof 105913139-8.proof already uploaded ({"error_status":409,"error_message":"Conflict","error_description":"Proof already uploaded"})
We are still good.
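For scripted uploads, the JSON part of those log lines can be interpreted mechanically. This is a minimal sketch built only on the field names visible in the log above (`error_status`, `error_message`, `error_description`). The function name `classify_upload_response` and the returned labels are hypothetical, and treating a reply without `error_status` as success is an assumption, as is the idea that the 401 mentioned earlier in the thread arrives in the same JSON shape.

```python
import json


def classify_upload_response(body):
    """Map a server reply (JSON text) to a short label.

    Hypothetical helper. Only the 409 case is taken from the log above;
    the other branches are assumptions made for illustration.
    """
    try:
        reply = json.loads(body)
    except json.JSONDecodeError:
        return "unparseable"
    if not isinstance(reply, dict):
        return "unparseable"
    status = reply.get("error_status")
    if status is None:
        return "ok"                # assumption: no error field means success
    if status == 409:
        return "already_uploaded"  # harmless: the proof is on the server
    if status == 401:
        return "unauthorized"      # assumption: result not yet reported
    return "error"
```

With the reply from the log, the function returns `"already_uploaded"`, which a script could treat as done rather than as a failure to retry.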

Thanks again.

Last fiddled with by LaurV on 2020-10-11 at 07:06
Old 2020-10-11, 08:28   #18
preda
"Mihai Preda"
Apr 2015
530₁₆ Posts

Quote:
Originally Posted by LaurV
Right.
Laur, you'll be pleased (I hope) to find that in the latest GpuOwl 7.x I switched to using savefiles with explicit iteration numbers in filenames. They still don't have the res64 as part of the filename, but well, it's a step in the right direction.

Last fiddled with by preda on 2020-10-11 at 08:29
Old 2020-10-11, 12:24   #19
LaurV
Romulan Interpreter
Jun 2011
Thailand
2²×7×11×29 Posts

I am TOTALLY pleased about that, for sure, don't get me wrong, and I am happy that things are progressing in the right direction, to speed up testing and help GIMPS find the next prime much faster. You have great merit in this, for which I take my hat off and bow, and moreover, I am happy and proud that a fellow Romanian guy did that! However, I am still grumpy about it, because I was only talking about those file names and shifts in the context of LL+DC on the same machine at the same time.

Now, (1) v7 doesn't allow LL anymore, (2) DC no longer makes sense since VDF was introduced, and moreover, (3) DC itself never made sense without shifts. Shifts had their well-defined, additional purpose: to protect against software errors (those embedded in the program, or in the algorithm itself), as opposed to all the other "tricks" (DC included), which had the purpose of protecting against hardware errors or trickery (malicious users). If your FFT implementation has a bug which manifests itself only in extremely rare cases (similar to the Intel bug, say, which in the end proved to be a micro-coding bug, not a hardware issue), then repeating the same test a number of times on the same very-very-reliable hardware will always give the same (correct or incorrect) results. Unless you do the DC with different hardware and a different tool (like using P95+CPU after xxxProgram+GPU, or vice versa), you can't ever be sure that the result was right. Using shifts makes the FFT deal with DIFFERENT data all the time, so it would be impossible for two runs to make a mistake and yet produce the same residue at the end of the test, regardless of whether the mistake was in the program, algorithm, hardware, cosmic ray, whatever; it is just not possible.

My run scenario was two overclocked, overpowered, liquid-cooled, pushed-beyond-the-limits, high-end GPUs, running side by side doing LL for the same exponent, with different shifts. As long as I could compare the residues along the way and they matched, the run was ok, and I could report the two results (as LL and DC) for the same exponent at the end. OTOH, if a mismatch occurred, I could detect it at a very early stage (the files would have different names, as the residue was part of the name) and resume from the last known good point, without wasting any time. No need to wait till the test is finished, and waste computing power, time, and nerves.
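To make the shift idea concrete, here is a toy sketch of a Lucas-Lehmer test that carries the residue multiplied by a power of two, so two runs with different shifts work on different numbers internally but must agree after un-shifting. It uses plain Python integers and is only meant for small p. The function name `lucas_lehmer_shifted` is hypothetical, and this is not how cudaLucas, gpuOwl or Prime95 implement it (they shift bits inside the FFT representation).

```python
def lucas_lehmer_shifted(p, shift=0):
    """Return the final LL residue s_(p-2) mod 2^p - 1, computed with a shift.

    Internally the value held is x = s * 2^k mod M. Squaring doubles the
    shift (mod p, because 2^p = 1 mod M), and the "- 2" of the LL
    recurrence has to be shifted by the same amount.
    """
    m = (1 << p) - 1
    k = shift % p
    x = (4 << k) % m                # s_0 = 4, stored shifted by k bits
    for _ in range(p - 2):
        x = (x * x) % m             # holds s^2 * 2^(2k)
        k = (2 * k) % p             # the shift doubles modulo p
        x = (x - (2 << k)) % m      # subtract 2 * 2^k, giving (s^2 - 2) * 2^k
    # Undo the shift: multiplying by 2^(p-k) is multiplying by 2^(-k) mod M.
    return (x << ((p - k) % p)) % m
```

For p = 7 every shift returns 0 (M7 = 127 is prime), and for p = 11 every shift returns the same nonzero residue, although the intermediate values of `x` differ from run to run. Comparing the un-shifted residues at checkpoints is the kind of side-by-side comparison described above.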

The new fashion of prime hunting, with PRP+Pm1 combined, no LL, no DC, is not to my liking. LL has lost its meaning. Even P-1 is not fun anymore, as it now comes "integrated" whether you want it or not...

My whole world is shattered... hihi... But well, I am just an old grumpy guy, and I understand that the progress kicks you in the ass from time to time and you have to move with its pace, even if you would like to stay behind...

Last fiddled with by LaurV on 2020-10-11 at 12:55
Old 2020-10-11, 15:16   #20
R. Gerbicz
"Robert Gerbicz"
Oct 2005
Hungary
2³×3×59 Posts

Quote:
Originally Posted by LaurV
If your FFT implementation has a bug which manifests itself only in extremely rare cases (similar to the Intel bug, say, which in the end proved to be a micro-coding bug, not a hardware issue), then repeating the same test a number of times on the same very-very-reliable hardware will always give the same (correct or incorrect) results.
Such a processor bug is exactly what you would detect with my error check. It would not silently give a "correct or incorrect" answer, because even in a non-random FFT implementation the run would fail the error check with incredibly high probability. Even when there is no shift, you can use a larger (or slightly smaller) FFT size for a given p to get an error-checked PRP residue in these cases. Finding the actual line, say x+y or x*y (for double x, y), where the computation of the result has a bug would be a tough, but not impossible, task.

Also notice that you need a double error within the error check to remain undetected. A single error in one block is always detected (unless it is squared out: having -x^2 instead of x^2 is also fine, since (-x^2)^2=x^4). And even after a double error, you have only a 1/2^p chance of seeing a matching residue. For such real tests see my original post: https://mersenneforum.org/showpost.p...1&postcount=88 for an example where we have a double error in a block, and an errored final residue turned up with roughly 1/2^p probability, as theory says. I prefer such smaller examples.

Not to mention that these software errors are detected in the same way you catch errors with gpuowl. A software error would also generate the same totally trash residues.
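For readers who have not seen the check spelled out, here is a toy sketch of the idea for a PRP-style squaring chain u_(i+1) = u_i^2 mod N with N = 2^p - 1. Every `block` iterations a running product d is updated as d_new = d * u, and it must equal u_0 * d^(2^block) mod N. The function name, the parameter names, the rounding of the iteration count up to a multiple of `block`, and the `corrupt_at` switch that simulates a single error are all my own simplifications. Real implementations run the expensive comparison far less often and roll back to a saved state instead of just reporting failure.

```python
def prp_with_gerbicz_check(p, block=8, base=3, corrupt_at=None):
    """Square `base` repeatedly mod 2^p - 1, checking each block.

    Returns (residue, ok). `corrupt_at` (hypothetical test hook) adds 1
    to the residue after that iteration to simulate a single error.
    """
    n = (1 << p) - 1
    iters = -(-p // block) * block        # p rounded up to a block multiple
    u0 = base % n
    u = u0                                # u_0
    d = u0                                # d_0 = u_0
    for i in range(1, iters + 1):
        u = (u * u) % n
        if i == corrupt_at:
            u = (u + 1) % n               # simulated hardware/software error
        if i % block == 0:
            d_next = (d * u) % n
            # d^(2^block) advances every factor of d by one block.
            expected = (u0 * pow(d, 1 << block, n)) % n
            if d_next != expected:
                return u, False           # error detected in this block
            d = d_next
    return u, True
```

With p = 31 a clean run returns `ok == True`, while a run with `corrupt_at=13` reports `ok == False` at the end of that block. That is the single-error case described above.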
Old 2020-10-17, 06:51   #21
LaurV
Romulan Interpreter
Jun 2011
Thailand
2²×7×11×29 Posts

Quote:
Originally Posted by R. Gerbicz
A software error would also generate the same totally trash residues.
Yep. That's why we need randomization (shifts). To be clear (again!), I was only talking about LL tests (not PRP), and my "arguments" precede the Gerbicz check (in time, I mean: the GC is a very new thing, while my arguments for shifts date from when the first cudaLucas was born). Tell me the version of gpuOwl which implements proper Gerbicz checks for LL, or any other check for that matter, in such a way that the errors are detected (with some reasonably high probability) and the LL test properly resumes from the last known good point, so I won't need to run two cards in parallel, and I promise you that we have no "argument" and I will only use THAT version of gpuOwl forever.
My productivity in that case will double, so everybody will be happy.
But tell me fast, before I completely switch to PRP with VDF beyond the point of no return.

Last fiddled with by LaurV on 2020-10-17 at 06:55

All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.