mersenneforum.org  

2021-07-25, 20:32   #1
Jan S
 
Oct 2018
Slovakia

83 Posts
Unsuccessful LL-test.

63880507

I started an LL double-check with GPUowl 6.11.380 on a Radeon RX 5500:

2021-07-19 16:17:27 gfx1012:xnack--0 63880507 FFT: 3.25M 256:13:512 (18.74 bpw)
2021-07-19 16:17:27 gfx1012:xnack--0 Expected maximum carry32: 58000000
2021-07-19 16:17:27 gfx1012:xnack--0 OpenCL args "-DEXP=63880507u -DWIDTH=256u -DSMALL_HEIGHT=512u -DMIDDLE=13u -DPM1=0 -DAMDGPU=1 -DMM_CHAIN=3u -DMM2_CHAIN=3u -DMAX_ACCURACY=1 -DULTRA_TRIG=1 -DWEIGHT_STEP_MINUS_1=0xc.5fd37d00ad748p-6 -DIWEIGHT_STEP_MINUS_1=-0xa.5e918e3d5878p-6 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2021-07-19 16:17:32 gfx1012:xnack--0 OpenCL compilation in 4.91 s
2021-07-19 16:17:32 gfx1012:xnack--0 63880507 LL 0 loaded: 0000000000000004
2021-07-19 16:21:38 gfx1012:xnack--0 63880507 LL 100000 0.16%; 2458 us/it; ETA 1d 19:33; c961ee7defe04913
2021-07-19 16:25:44 gfx1012:xnack--0 63880507 LL 200000 0.31%; 2459 us/it; ETA 1d 19:30; fdec5e33b5da8410
2021-07-19 16:29:50 gfx1012:xnack--0 63880507 LL 300000 0.47%; 2459 us/it; ETA 1d 19:25; bc72e6a7e59fec3d
2021-07-19 16:33:56 gfx1012:xnack--0 63880507 LL 400000 0.63%; 2459 us/it; ETA 1d 19:21; 7ac988da5906c1d4
2021-07-19 16:38:01 gfx1012:xnack--0 63880507 LL 500000 0.78%; 2458 us/it; ETA 1d 19:17; af9e63f3bf0d747d
2021-07-19 16:42:07 gfx1012:xnack--0 63880507 LL 600000 0.94%; 2459 us/it; ETA 1d 19:13; e9aa0a0461d9178a
2021-07-19 16:42:07 gfx1012:xnack--0 63880507 OK 500000 (jacobi == -1)
.
.
.
.
2021-07-20 22:05:20 gfx1012:xnack--0 63880507 LL 43500000 68.10%; 2459 us/it; ETA 0d 13:55; a95069fc1d853490
2021-07-20 22:09:26 gfx1012:xnack--0 63880507 LL 43600000 68.25%; 2459 us/it; ETA 0d 13:51; e0d873f9991c78ed
2021-07-20 22:09:26 gfx1012:xnack--0 63880507 OK 43500000 (jacobi == -1)
2021-07-20 22:13:32 gfx1012:xnack--0 63880507 LL 43700000 68.41%; 2459 us/it; ETA 0d 13:47; aadba169a0fb01ce
2021-07-20 22:17:38 gfx1012:xnack--0 63880507 LL 43800000 68.57%; 2459 us/it; ETA 0d 13:43; c71d8e6c310f74cc
2021-07-20 22:21:44 gfx1012:xnack--0 63880507 LL 43900000 68.72%; 2459 us/it; ETA 0d 13:39; b2fc2f2daa3e3304
2021-07-20 22:25:50 gfx1012:xnack--0 63880507 LL 44000000 68.88%; 2459 us/it; ETA 0d 13:35; 77b647b7c7989c85
2021-07-20 22:29:56 gfx1012:xnack--0 63880507 LL 44100000 69.04%; 2459 us/it; ETA 0d 13:31; 9d64f2ee61d98420
2021-07-20 22:29:56 gfx1012:xnack--0 63880507 EE 44000000 (jacobi == 1)
2021-07-20 22:29:56 gfx1012:xnack--0 63880507 LL 43500000 loaded: a95069fc1d853490
2021-07-20 22:34:02 gfx1012:xnack--0 63880507 LL 43600000 68.25%; 2458 us/it; ETA 0d 13:51; e0d873f9991c78ed
2021-07-20 22:38:07 gfx1012:xnack--0 63880507 LL 43700000 68.41%; 2458 us/it; ETA 0d 13:47; aadba169a0fb01ce
2021-07-20 22:42:13 gfx1012:xnack--0 63880507 LL 43800000 68.57%; 2459 us/it; ETA 0d 13:43; c71d8e6c310f74cc
2021-07-20 22:46:19 gfx1012:xnack--0 63880507 LL 43900000 68.72%; 2459 us/it; ETA 0d 13:39; b2fc2f2daa3e3304
2021-07-20 22:50:25 gfx1012:xnack--0 63880507 LL 44000000 68.88%; 2458 us/it; ETA 0d 13:35; 77b647b7c7989c85
2021-07-20 22:54:31 gfx1012:xnack--0 63880507 LL 44100000 69.04%; 2459 us/it; ETA 0d 13:31; 9d64f2ee61d98420
2021-07-20 22:54:31 gfx1012:xnack--0 63880507 EE 44000000 (jacobi == 1)
2021-07-20 22:54:31 gfx1012:xnack--0 63880507 LL 43500000 loaded: a95069fc1d853490

...and again, and again. It went on like this for ~16 hours until I noticed.

I ran it again from the beginning, with the same result. An LL-DC with Prime95 was successful.
Could anyone try this exponent with GPUowl? (LL test)

PS: I then successfully double-checked 64670303.
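
For readers following the log: the test being run is the Lucas-Lehmer iteration, which starts from 4 (hence the "LL 0 loaded: 0000000000000004" line) and then repeatedly squares and subtracts 2 modulo 2^p - 1. Below is a minimal Python sketch on small exponents. The function names and the low-64-bit hex formatting of the residue are illustrative assumptions, not GPUowl's actual code.

```python
def res64(value):
    """Format the low 64 bits of a residue as 16 hex digits (assumed log format)."""
    return f"{value & 0xFFFFFFFFFFFFFFFF:016x}"


def lucas_lehmer(p):
    """Run the LL test for 2**p - 1 (p an odd prime).

    Returns (is_prime, res64 of the final residue).
    """
    m = (1 << p) - 1
    s = 4                      # value at iteration 0
    for _ in range(p - 2):
        s = (s * s - 2) % m    # one LL iteration
    return s == 0, res64(s)


if __name__ == "__main__":
    print(res64(4))            # the starting value, formatted like the log line
    for p in (11, 13, 17, 19):
        print(p, lucas_lehmer(p))
```

A final residue of zero means 2^p - 1 is prime; for a double-check, the two runs only need to agree on the final residue.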
2021-07-25, 21:03   #2
Viliam Furik
 
"Viliam Furík"
Jul 2018
Martin, Slovakia

23·5·17 Posts

I think (unless a re-run shows otherwise) that some earlier residue could have been incorrect but undetected, which then led to a detected bad residue later. The Jacobi check has only a 50% chance of catching an error, IIRC.

It could also be some weird program bug in FFT or Jacobi itself, but that's less likely.
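
To make the 50% figure concrete, here is a small Python sketch. It assumes the check is the one suggested by the log lines ("jacobi == -1" is OK, "jacobi == 1" is an error), namely that the Jacobi symbol of (s - 2) over 2^p - 1 must be -1 for every LL residue s after iteration 0. The function names are made up for illustration, and the demo uses tiny exponents where 2^p - 1 is prime.

```python
def jacobi(a, n):
    """Jacobi symbol (a | n) for odd positive n."""
    assert n > 0 and n % 2 == 1
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:          # pull out factors of 2
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def ll_jacobi_signs(p):
    """Jacobi symbol of (s_i - 2) over 2**p - 1 for LL iterations i = 1 .. p-2."""
    m = (1 << p) - 1
    s = 4
    signs = []
    for _ in range(p - 2):
        s = (s * s - 2) % m
        signs.append(jacobi(s - 2, m))
    return signs


def residues_passing_check(p):
    """Count residues r mod 2**p - 1 that would pass the check (symbol == -1)."""
    m = (1 << p) - 1
    return sum(1 for r in range(m) if jacobi(r - 2, m) == -1)


if __name__ == "__main__":
    print(ll_jacobi_signs(13))                      # correct residues
    print(residues_passing_check(7), "of", 2**7 - 1)
```

Every correct residue gives -1, but so do almost exactly half of all possible residues, so a random corruption slips through about half the time. Once a wrong residue has passed, the symbol of later residues stays at -1 as well, which fits the idea of an error going unnoticed for a while.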
2021-07-26, 04:53   #3
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

2^4×13×47 Posts

This happens not only with LL but with PRP too, albeit less often (the Jacobi check is more prone to undetected errors than the GC). In the past I tried to convince Mihai to keep a history of all checkpoints in gpuOwl (the same way cudaLucas does) and not only the last ones, so you could resume from an older one in case the newest one fails the way yours did. But the argument was not strong enough, so he wasn't convinced.

My solution was (and still is) a simple batch file that runs in parallel (launched from a separate cmd window). It checks every few minutes whether there is a new checkpoint and, if so, renames it so that gpuOwl won't delete it later. The simplest version, like the code below, just numbers them 1, 2, 3, 4, etc., so there is no correspondence between the iteration number and the file name. You can sort it out manually if sh!t happens. A more complex version would read the beginning of the file to get the iteration number and create files in the same manner as cudaLucas does, with the iteration number in the file name.

Code:
@echo off
set /a exponent = %1 2>nul

:: if no parameter provided, exit
if [%exponent%] == [] goto error

:: if the parameter is not an exponent (i.e. numeric) exit
:: (trick to avoid using val() or isnumeric() which may not exist
::  in all windoze installs)
if [%exponent%] neq [%1] goto error

:: have a counter to keep the strike (not sync'd with iteration number)
set /a cnt = %2 2>nul

:: as batch files' if condition won't support an OR in win7 and before
if [%cnt%] == [] (
   set /a cnt = 0
) else (
   if [%cnt%] neq [%2] set /a cnt = 0
)

set d=%exponent%\%exponent%-old.ll.owl

:redo0

if exist %d% goto exists

:: wait about 10 minutes and re-check

::echo No file. Waiting...
timeout /t 600 /nobreak
goto redo0

:exists

:: if file exists, then rename it
:: make a 5-digit file counter (not sync with LL iteration number!)
::echo File found. Renaming...
if %cnt% lss 10 (
   set bb=0000
) else (
   if %cnt% lss 100 (
      set bb=000
   ) else (
      if %cnt% lss 1000 (
         set bb=00
      ) else (
         if %cnt% lss 10000 (
            set bb=0
         ) else (
            set bb=
         )
      )
   )
)
::echo %bb%%cnt%
del /q /f %exponent%\%exponent%.%bb%%cnt%.ckp 2>nul
ren %exponent%\%exponent%-old.ll.owl %exponent%.%bb%%cnt%.ckp
set /a cnt+=1
::echo %cnt%
goto redo0

:error

echo.
echo  - Usage: 
echo.
echo    ^> collect_ckpoints ^<exponent^> ^[^<counter^>^]
echo.
echo    with numeric exponent and numeric ^(optional^) counter.
echo.
echo  - If no counter is supplied, zero is assumed, and in that case
echo    some of your old checkpoint files may be overwritten.
echo.
echo  - Some validation is done, but this is not fool-proof. 
echo    Try being honest, it is your best interest. :P
echo.

 :eof
Save this in a "collect_ckpoints.bat" file and use it or modify it as you wish. Of course, the history takes disk space and has to be deleted by hand from time to time (say once per week, or when it is no longer needed, e.g. when the test has finished). This makes sense for assignments taking days, weeks, or months, which is why an exponent is provided, but you can easily modify it to work for any exponent: just look for what's new in the folder and rename it. When you have a crash similar to the one reported above, try an older checkpoint (rename it first, then relaunch gpuOwl), so you won't waste weeks of earlier work.
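
For anyone not on Windows, the renaming step can be sketched in Python as well. This is only a rough equivalent under the same assumption the batch file makes (the checkpoint to preserve is named `<exponent>\<exponent>-old.ll.owl`). The function name and the `width` parameter are made up for illustration, and the polling loop is left out.

```python
import tempfile
from pathlib import Path


def archive_checkpoint(workdir, exponent, counter, width=5):
    """Rename <exponent>-old.ll.owl to <exponent>.<zero-padded counter>.ckp.

    Returns the new path, or None when there is no checkpoint to archive.
    An existing archive file with the same counter is overwritten.
    """
    src = Path(workdir) / str(exponent) / f"{exponent}-old.ll.owl"
    if not src.exists():
        return None
    dst = src.with_name(f"{exponent}.{counter:0{width}d}.ckp")
    src.replace(dst)           # atomic rename, replaces dst if it exists
    return dst


def _demo():
    """Exercise the function on a throwaway directory with a fake checkpoint."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp) / "63880507"
        folder.mkdir()
        (folder / "63880507-old.ll.owl").write_bytes(b"fake checkpoint")
        print(archive_checkpoint(tmp, 63880507, 0))
        print(archive_checkpoint(tmp, 63880507, 1))   # None: nothing new yet


if __name__ == "__main__":
    _demo()
```

The zero-padded format string replaces the nested `if` ladder of the batch version; calling this every ten minutes with an increasing counter gives the same history of numbered files.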

Last fiddled with by LaurV on 2021-07-26 at 05:05
2021-07-26, 18:53   #4
R. Gerbicz
 
 
"Robert Gerbicz"
Oct 2005
Hungary

1,493 Posts

Quote:
Originally Posted by LaurV View Post
This happens not only with LL but with PRP too, albeit less often (the Jacobi check is more prone to undetected errors than the GC). In the past I tried to convince Mihai to keep a history of all checkpoints in gpuOwl (the same way cudaLucas does) and not only the last ones, so you could resume from an older one in case the newest one fails the way yours did. But the argument was not strong enough, so he wasn't convinced.
Quite pointless for a good implementation. But you have to be a little careful, because you can also get an FFT error in the error check itself [and also when you update the ladder prod(t=0,k,b**(2**(t*L)))]. If there is absolutely no randomization in your code, then you get the same FFT error again (so error>0.5), and you fall into a cycle.
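
As a rough illustration of the ladder mentioned above, here is a Python sketch of the check for a PRP-style run of repeated squarings. It is a toy version under stated assumptions: plain big-integer arithmetic instead of FFTs, a single check at the very end, and a simulated bit flip. The names `gerbicz_ok`, `blocks` and `corrupt_at` are made up for the example.

```python
def gerbicz_ok(n, L, blocks, b=3, corrupt_at=None):
    """Square b mod n for blocks*L iterations and run one ladder check at the end.

    u_i = b**(2**i) mod n.  The ladder d_k = prod(t=0..k) u_(t*L) must satisfy
    d_k == b * d_(k-1)**(2**L) (mod n).  Returns True when the check passes.
    """
    x = b % n          # u_0
    d = x              # d_0
    d_prev = d
    for i in range(1, blocks * L + 1):
        x = x * x % n
        if i == corrupt_at:
            x ^= 1                 # simulate a hardware error in the residue
        if i % L == 0:
            d_prev = d
            d = d * x % n          # update the ladder every L iterations
    return d == b * pow(d_prev, 1 << L, n) % n


if __name__ == "__main__":
    n = (1 << 31) - 1
    print(gerbicz_ok(n, L=8, blocks=5))                 # clean run
    print(gerbicz_ok(n, L=8, blocks=5, corrupt_at=40))  # corrupted last step
```

Note that the check itself needs L more squarings and the ladder update needs a multiplication, which is why an error can also strike inside the checking code, as said above.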
2021-07-28, 01:50   #5
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

2^4×13×47 Posts

Yep, that is a different point, which I also argued for. You may remember that my two argument points were history and random shifts. Glad you agree with me this time (for the first time in your life! Now I can boast to my friends that I convinced a Hungarian guy to accept my argument.)
The next step for me is to catch Mihai in a dark corner of a bar, fill him with beer and get him drunk, then trick him into implementing a new multiplication algorithm, and I bet we will be able to do LL tests with no errors in O(1).
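
For anyone wondering what the random shifts mentioned above look like, here is a small Python sketch. The idea is to store the residue multiplied by a power of two, so the numbers going through the multiplication differ from run to run while the unshifted result stays the same. This is a toy illustration with big integers; the function names are made up and it is not taken from any particular program.

```python
def ll_plain(p):
    """Ordinary LL residue after p-2 iterations, modulo 2**p - 1."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s


def ll_shifted(p, shift):
    """Same computation, but the stored value is always residue * 2**shift."""
    m = (1 << p) - 1
    s = (4 << shift) % m
    for _ in range(p - 2):
        s = (s * s) % m
        shift = (2 * shift) % p        # squaring doubles the shift; 2**p == 1 mod m
        s = (s - (2 << shift)) % m     # subtract 2, shifted the same way
    return (s << (p - shift)) % m      # undo the shift before reporting


if __name__ == "__main__":
    for p in (11, 13, 17):
        print(p, ll_plain(p), [ll_shifted(p, k) for k in (0, 1, 5)])
```

With a different starting shift, a retry from the same checkpoint no longer feeds exactly the same data to the multiplication, so a deterministic error is less likely to repeat in the same place.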
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.