#848 |
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
10101010001111₂ Posts |
Can you at least submit all of your "bad" results? That way if someone runs it and it matches one of those, it will be complete. I might queue it up on Prime95 on a machine that will take about 85 days to run it.
|
#849 |
Romulan Interpreter
"name field"
Jun 2011
Thailand
10100000101001₂ Posts |
Neither of you got it. Read my post again. It is not about this particular exponent, nor about using cudaLucas in the future. We know it is slower. And now I proved it is buggy too. Not the first time I did that either (see 2012).
I don't know if other FFTs are affected. There may be. Therefore, there may be exponents for which both the LL and the DC were done with cudaLucas (the server accepted such results, with different shifts) and the residues matched, yet they are wrong. It is not about "fixing" cudaLucas either, as long as we have gpuOwl and PRP with certs. But such exponents, if they exist, we need to find them and redo the tests. If there are too many to re-test "in bulk", then we need to debug cudaLucas to see which FFTs are affected, which versions are affected, etc., to eventually reduce the list. I would be quite happy to be wrong, with no test affected. But putting my nose into cudaLucas internals (FFT) is not something I can do. What I can do is isolate the point where the residues start differing and make a checkpoint file close to it. Then I can pass that to somebody who knows the trade (George, Mihai, Ernst, etc). The bug can be reproduced with the colab script from Teal/Daniel on A100 and V100.
Going to bed, 1:45 AM here. Need to work today, too, in a few hours...
Last fiddled with by LaurV on 2022-11-13 at 18:45 |
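For anyone wanting to narrow down where two runs part ways, here is a minimal sketch of the bracketing step described above. It assumes each run's interim residues have already been parsed into an iteration-to-res64 map; the names `ResidueLog`, `Bracket` and `bracket_divergence` are made up for illustration and are not part of cudaLucas or gpuowl.

```cpp
#include <cstdint>
#include <map>

// Hypothetical container: iteration number -> 64-bit interim residue.
using ResidueLog = std::map<std::uint64_t, std::uint64_t>;

struct Bracket {
    bool found;                    // true if the two logs disagree somewhere
    std::uint64_t last_match;      // last iteration logged by both with equal residues (0 if none)
    std::uint64_t first_mismatch;  // first iteration logged by both with different residues
};

// Walk the iterations present in both logs, in increasing order, and report
// the interval in which the residues start to differ. A checkpoint saved at
// or before last_match is the one worth handing to whoever debugs the FFT.
Bracket bracket_divergence(const ResidueLog& a, const ResidueLog& b) {
    std::uint64_t last_match = 0;
    for (const auto& entry : a) {
        const auto it = b.find(entry.first);
        if (it == b.end()) continue;  // iteration not logged by both runs
        if (it->second != entry.second) {
            return Bracket{true, last_match, entry.first};
        }
        last_match = entry.first;
    }
    return Bracket{false, last_match, 0};
}
```

Rerunning from the checkpoint at `last_match` with a finer logging interval and bracketing again shrinks the interval further.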
#850 |
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
5·2,179 Posts |
Spin up a thread about this in either GPU computing or in Software.
I understand that there is an issue. But getting a sanity check via Prime95 should show what the right result is. That can point to the answer WRT the FFT issue. |
#851 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
16316₈ Posts |
PMed James & George asking for the CUDALucas-both-times list if feasible. (Would require reporting program to be stored in the database per LL result reported, or a sufficient set of clues to deduce it.)
And all the more reason to DC LL via PRP/GEC/proof generation and upload and cert.
@LaurV, if the deviating residues at 20000K fft are reproducible in CUDALucas, please isolate them to the granularity gpuowl accepts on logging intervals (10,000) or finer.
Another possibility is a bug in the NVIDIA CUDA DLLs. Pentium fdiv microcode bugs went undetected for a long time, and were operand dependent.
Last fiddled with by kriesel on 2022-11-13 at 19:49 |
#852 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×3×1,229 Posts |
LaurV, if you are interested in trying to reproduce the problem on other exponents, you could try 20000K fft in CUDALucas on M332196607, for which I have a Jacobi-checked, matching final residue and a full log of interim residues at 50K-iteration spacing. I could probably round up a few others from gpuowl logs.
|
#853 | |
P90 years forever!
Aug 2002
Yeehaw, FL
5·23·71 Posts |
Quote:
P.S. The database knows which LL results were produced by CudaLucas. I'd be extremely surprised if shift count doesn't protect GIMPS from a bad result getting flagged as DCed.
Last fiddled with by Prime95 on 2022-11-13 at 21:02 |
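To illustrate the shift-count idea for readers who have not met it: a toy sketch, assuming exponents 3 <= p <= 31 so everything fits in 64-bit integers. It is not how Prime95 or CUDALucas implement it; `rotl_mod` and `ll_residue` are names invented here. The LL sequence is carried as s·2^k mod M_p, so two runs with different starting shifts push different bit patterns through the squaring, yet un-shifting at the end gives the same residue.

```cpp
#include <cstdint>

// Multiply x (0 <= x < 2^p - 1) by 2^k modulo M_p = 2^p - 1.
// Since 2^p == 1 (mod M_p), this is a rotation of the p-bit word.
static std::uint64_t rotl_mod(std::uint64_t x, unsigned k, unsigned p) {
    const std::uint64_t m = (1ULL << p) - 1;
    k %= p;
    if (k == 0) return x % m;
    const std::uint64_t r = ((x << k) | (x >> (p - k))) & m;
    return r == m ? 0 : r;  // all ones is another spelling of zero
}

// Lucas-Lehmer residue of M_p computed with an initial shift count.
// Toy version: requires 3 <= p <= 31 so that x * x fits in 64 bits.
std::uint64_t ll_residue(unsigned p, unsigned shift) {
    const std::uint64_t m = (1ULL << p) - 1;
    unsigned k = shift % p;                    // current shift: x == s * 2^k
    std::uint64_t x = rotl_mod(4 % m, k, p);   // shifted s_0 = 4
    for (unsigned i = 0; i < p - 2; ++i) {
        x = (x * x) % m;                       // squaring doubles the shift
        k = (2 * k) % p;
        const std::uint64_t two = rotl_mod(2 % m, k, p);  // the "-2" shifted to match
        x = (x + m - two) % m;
    }
    return rotl_mod(x, p - k, p);              // undo the final shift
}
```

With this sketch `ll_residue(p, a)` and `ll_residue(p, b)` agree for any shifts `a` and `b`, which is what lets a server compare a first test and a double check that were run with different shifts.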
|
#855 |
Einyen
Dec 2003
Denmark
19×181 Posts |
#856 |
Aug 2002
North San Diego County
2²·3·67 Posts |
#857 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1110011001110₂ Posts |
Quote:
One of the issues with CUDALucas is the absence of either readback or error/success value checking after some CUDA library calls. As one example: cudaMemcpy performs copies between host and GPU memory.
https://developer.download.nvidia.co...2e9930741.html
Returns: cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidMemcpyDirection
Gpuowl copies host>gpu, then gpu>host, and does a compare on the host to verify correctness of the GPU copy. CUDALucas does not do readback and, IIUC, does not check the call's return value for success or error either. In CUDALucas.cu, routine void write_gpu_data(int q, int n),
Code:
    // Square kernel data
    for (j = (n >> 2) - 1; j > 0; j--) s_ct[j] = 0.5 * cospi (j * d);
    cudaMemcpy (g_ct, s_ct, sizeof (double) * (n / 4), cudaMemcpyHostToDevice);
It leaves most calls unchecked like this, for speed, yet is slower than gpuowl on the same hardware and inputs. Similarly, in the LL iteration loop,
Code:
    cufftExecZ2Z (g_plan, (cufftDoubleComplex *) g_x, (cufftDoubleComplex *) g_x, CUFFT_INVERSE);
Code:
    if (error_flag & 3)
    {
      err = cutilSafeCall1 (cudaMemcpy (&terr, g_err, sizeof (float), cudaMemcpyDeviceToHost));
      if(terr > *maxerr) *maxerr = terr;
      //if( g_pf && g_sl) usleep(g_sv);//, nanosleep sleep(1);
    }
    else if (g_pf && (iter % g_po) == 0)
    {
      err = cutilSafeThreadSync();
      //if(g_sl) usleep(g_sv);//, nanosleep sleep(1);
    }
    if(err != cudaSuccess) terr = -1.0f;
    return (terr);
    }
Last fiddled with by kriesel on 2022-11-13 at 22:00 |
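A rough host-only sketch of the check-and-readback pattern described above, so it can be compiled without a GPU. Everything named here (`Status`, `CopyFn`, `CopyResult`, `checked_copy`) is hypothetical; in a real CUDA program the two callables would be thin wrappers around cudaMemcpy with cudaMemcpyHostToDevice and cudaMemcpyDeviceToHost, returning 0 for cudaSuccess. It is not code from gpuowl or CUDALucas.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Stand-in for cudaError_t: 0 plays the role of cudaSuccess (an assumption of this sketch).
using Status = int;
using CopyFn = std::function<Status(void* dst, const void* src, std::size_t bytes)>;

enum class CopyResult { Ok, CallFailed, Mismatch };

// Copy host -> device, check the return value, copy device -> host into a
// scratch buffer, check again, then compare the scratch buffer with the source.
CopyResult checked_copy(void* device, const void* host, std::size_t bytes,
                        const CopyFn& to_device, const CopyFn& to_host) {
    if (bytes == 0) return CopyResult::Ok;  // nothing to copy or compare
    if (to_device(device, host, bytes) != 0) return CopyResult::CallFailed;

    std::vector<unsigned char> readback(bytes);
    if (to_host(readback.data(), device, bytes) != 0) return CopyResult::CallFailed;

    if (std::memcmp(readback.data(), host, bytes) != 0) return CopyResult::Mismatch;
    return CopyResult::Ok;
}
```

The readback costs a second transfer, so it is most plausible for one-time setup data such as the table written in write_gpu_data, while checking the return value alone is cheap enough to do on every call.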
|
#858 |
Aug 2002
North San Diego County
2²×3×67 Posts |