mersenneforum.org > Data processed dc and tc posts

2022-11-13, 18:19   #848
Uncwilly
6809 > 6502

Aug 2003
101×103 Posts

Can you at least submit all of your "bad" results? That way, if someone runs it and it matches one of those, it will be complete. I might queue it up on Prime95 on a machine that will take about 85 days to run it.
2022-11-13, 18:42   #849
LaurV
Romulan Interpreter

"name field"
Jun 2011
Thailand

10100000101001₂ Posts

You both didn't get it. Read my post again. It is not about this particular exponent, nor about using CUDALucas in the future. We know it is slower, and now I have proved it is buggy too. Not the first time I have done that, either (see 2012). I don't know if other FFTs are affected; there may be. Therefore, there may be exponents which were both LL-tested and double-checked with CUDALucas (the server accepted such results, with different shifts) and the residues matched, yet they are wrong. It is not about "fixing" CUDALucas either, as long as we have gpuOwl and PRP with certs. But such exponents, if they exist, we need to find them and redo the tests. If there are too many to re-test in bulk, then we need to debug CUDALucas to see which FFTs are affected, which versions are affected, etc., to eventually reduce the list. I would be quite happy to be wrong, with no tests affected. But putting my nose into CUDALucas internals (FFT) is not something I can do. What I can do is isolate the point where the residues start differing and make a checkpoint file close to it. Then I can pass that to somebody who knows the trade (George, Mihai, Ernst, etc.). The bug can be reproduced with the Colab script from Teal/Daniel on A100 and V100. Going to bed, 1:45 AM here. Need to work today, too, in a few hours...

Last fiddled with by LaurV on 2022-11-13 at 18:45
2022-11-13, 18:51   #850
Uncwilly
6809 > 6502

Aug 2003
5·2,179 Posts

Spin up a thread about this in either GPU Computing or Software. I understand that there is an issue, but getting a sanity check via Prime95 should show what the right result is. That can point to the answer with regard to the FFT issue.
2022-11-13, 18:56   #851
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

16316₈ Posts

PMed James & George asking for the CUDALucas-both-times list, if feasible. (It would require the reporting program to be stored in the database per LL result reported, or a sufficient set of clues to deduce it.) All the more reason to DC LL via PRP with GEC, proof generation and upload, and cert. @LaurV, if the 20000K FFT deviating residues are reproducible in CUDALucas, please isolate the divergence to the granularity gpuowl accepts for logging intervals (10,000 iterations) or finer. Another possibility is a bug in the NVIDIA CUDA DLLs. The Pentium FDIV bug went undetected for a long time and was operand-dependent.

Last fiddled with by kriesel on 2022-11-13 at 19:49
2022-11-13, 20:06   #852
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×3×1,229 Posts

LaurV, if you are interested in trying to reproduce the problem on other exponents, you could try the 20000K FFT in CUDALucas on M332196607, for which I have a Jacobi-checked and matching final residue, and a full log of interim residues at 50K-iteration spacing. I could probably round up a few others from gpuowl logs.
2022-11-13, 20:59   #853
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

5·23·71 Posts

Quote:
 Originally Posted by LaurV But putting my nose into cudaLucas internals (FFT) is not what I can do.
Quote:
 Originally Posted by kriesel Another possibility is a bug in the NVIDIA CUDA dlls.
IIRC, there are no CudaLucas FFT internals; it is simply a call to the CUDA FFT library (cuFFT). That doesn't mean the bug isn't in CudaLucas: there is the weighting and carry-propagation code to consider.

P.S. The database knows which LL results were produced by CudaLucas. I'd be extremely surprised if shift count doesn't protect GIMPS from a bad result getting flagged as DCed.

Last fiddled with by Prime95 on 2022-11-13 at 21:02

2022-11-13, 21:31   #854
dcheuk

Jan 2019
Florida

3⁵ Posts

Quote:
 Originally Posted by Uncwilly
 This is for the list of needed triple (and higher order) checks.
 Code:
 Exponents with 2 Unverified results:
 Cat 1
 DoubleCheck=62858629,74,1
 DoubleCheck=62871643,75,1
queued, mprime won't let me reserve these

2022-11-13, 21:37   #855
ATH
Einyen

Dec 2003
Denmark

19×181 Posts

Quote:
 Originally Posted by Prime95 P.S. The database knows which LL results were produced by CudaLucas. I'd be extremely surprised if shift count doesn't protect GIMPS from a bad result getting flagged as DCed.
How many exponents are there where all tests, 2 or more, were done with CUDALucas?

2022-11-13, 21:55   #856
sdbardwick

Aug 2002
North San Diego County

2²·3·67 Posts

Quote:
 Originally Posted by dcheuk queued, mprime won't let me reserve these
Is the machine you tried to reserve them from qualified to do Cat 1 exponents?

I took these:
Cat 2
DoubleCheck=63563179,75,1
DoubleCheck=67467457,75,1

2022-11-13, 21:57   #857
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1110011001110₂ Posts

Quote:
 Originally Posted by ATH How many exponents where all tests, 2 or more, were done with CUDALucas?
TBD. As indicated earlier, I requested by PM that James or George query the database for the list. It's Sunday afternoon in North America; please be patient.

One of the issues with CUDALucas is the absence of either readback or error/success value checking after some CUDA library calls. As one example, the CUDA documentation for cudaMemcpy lists these possible return values: cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidMemcpyDirection.
Gpuowl copies host→GPU, then GPU→host, and does a compare on the host to verify the correctness of the GPU copy.
CUDALucas does not do readback and, IIUC, does not check for success or error return values from the call either.
In CUDALucas.cu routine void write_gpu_data(int q, int n),
Code:
// Square kernel data
for (j = (n >> 2) - 1; j > 0; j--)
    s_ct[j] = 0.5 * cospi (j * d);
cudaMemcpy (g_ct, s_ct, sizeof (double) * (n / 4), cudaMemcpyHostToDevice);
then continues on without checking for success or errors, as if such things could never happen.
It does this for most calls, presumably for speed, yet CUDALucas is still slower than gpuowl on the same hardware and inputs.

Similarly, in the LL Iteration loop,
Code:
  cufftExecZ2Z (g_plan, (cufftDoubleComplex *) g_x, (cufftDoubleComplex *) g_x, CUFFT_INVERSE);
Gpu to host transfer is sometimes checked:
Code:
if (error_flag & 3)
{
    err = cutilSafeCall1 (cudaMemcpy (&terr, g_err, sizeof (float), cudaMemcpyDeviceToHost));
    if (terr > *maxerr) *maxerr = terr;
    //if (g_pf && g_sl) usleep (g_sv); //, nanosleep sleep(1);
}
else if (g_pf && (iter % g_po) == 0)
{
    //if (g_sl) usleep (g_sv); //, nanosleep sleep(1);
}
if (err != cudaSuccess) terr = -1.0f;
return (terr);
}

Last fiddled with by kriesel on 2022-11-13 at 22:00

2022-11-13, 22:51   #858
sdbardwick

Aug 2002
North San Diego County

2²×3×67 Posts

Quote:
 Originally Posted by ric
 Requesting TC (as before, LL-only, please) for:
 Code:
 DoubleCheck=68891453,75,1
 TIA
We matched.
