mersenneforum.org > Great Internet Mersenne Prime Search > Data > Marin's Mersenne-aries
2022-11-13, 18:42   #12
LaurV
Romulan Interpreter
"name field"
Jun 2011
Thailand

2×47×109 Posts

You both missed the point. Read my post again. It is not about this particular exponent, nor about using cudaLucas in the future. We know it is slower. And now I have proved it is buggy, too. Not the first time I did that, either (see 2012).
I don't know whether other FFT sizes are affected.
Some may be.
Therefore, there may be exponents whose LL test and DC were both done with cudaLucas (the server accepted such results, with different shifts), where the residues matched and yet are wrong. It is not about "fixing" cudaLucas either, as long as we have gpuOwl and PRP with certs. But if such exponents exist, we need to find them and redo the tests. If there are too many to retest "in bulk", then we need to debug cudaLucas to see which FFT sizes and which versions are affected, etc., to narrow down the list.
I would be quite happy to be wrong and to find that no test is affected.
But putting my nose into cudaLucas internals (FFT) is not what I can do. What I can do is isolate the point where the residues start to differ and make a checkpoint file close to it. Then I can pass that to somebody who knows the trade (George, Mihai, Ernst, etc.).
The bug can be reproduced with the Colab script from Teal/Daniel on both A100 and V100.
Going to bed, it's 1:45 AM here. I need to work today, too, in a few hours...

Last fiddled with by LaurV on 2022-11-13 at 18:45
2022-11-13, 18:51   #13
Uncwilly
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts

2×3×1,801 Posts

Spin up a thread about this in either GPU computing or in Software.

I understand that there is an issue. But getting a sanity check via Prime95 should show what the right result is. That can point to the answer WRT the FFT issue.
2022-11-13, 18:56   #14
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

7114₁₀ Posts

PMed James & George asking for the CUDALucas-both-times list, if feasible. (That would require the reporting program to be stored in the database for each LL result reported, or a sufficient set of clues to deduce it.)

And all the more reason to double-check LL results via PRP with GEC, proof generation and upload, and cert.


@LaurV, if the 20000K fft deviating residues are reproducible in CUDALucas, please isolate it to the granularity gpuowl accepts on logging intervals (10,000) or finer.

Another possibility is a bug in the NVIDIA CUDA dlls. The Pentium FDIV bug went undetected for a long time, and was operand dependent.

Last fiddled with by kriesel on 2022-11-13 at 19:49
2022-11-13, 20:06   #15
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1101111001010₂ Posts

LaurV, if you are interested in trying to reproduce the problem on other exponents, you could try the 20000K FFT in CUDALucas on M332196607, for which I have a Jacobi-checked, matching final residue and a full log of interim residues at 50K-iteration spacing. I could probably round up a few others from gpuowl logs.
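For readers unfamiliar with the Jacobi check on LL, here is a rough toy sketch of the idea (not gpuowl's or Prime95's code; all names are illustrative). With s_0 = 4 and s_k = s_{k-1}^2 - 2 mod M_p, each s_k - 2 for k >= 1 equals 12 times a perfect square. So for odd p its Jacobi symbol modulo M_p should be (3 | M_p) = -1, unless it shares a factor with a composite M_p. A random error flips the sign roughly half the time, and the symbol seen one iteration after the error is then carried forward to later iterations. p is kept <= 31 so plain 64-bit arithmetic suffices, and `corrupt_at` simulates a one-bit error at that iteration (0 = none).

```c
#include <assert.h>
#include <stdint.h>

/* Jacobi symbol (a | n) for odd n > 0, standard binary algorithm. */
static int jacobi(uint64_t a, uint64_t n)
{
    assert(n > 0 && (n & 1));
    int t = 1;
    a %= n;
    while (a != 0) {
        while ((a & 1) == 0) {                 /* pull out factors of 2 */
            a >>= 1;
            uint64_t r = n & 7;
            if (r == 3 || r == 5) t = -t;
        }
        uint64_t tmp = a; a = n; n = tmp;      /* quadratic reciprocity */
        if ((a & 3) == 3 && (n & 3) == 3) t = -t;
        a %= n;
    }
    return (n == 1) ? t : 0;
}

/* Runs LL iterations k = 1..p-2 and counts how many fail the check,
   i.e. how often jacobi(s_k - 2, M_p) != -1. */
static int ll_jacobi_failures(unsigned p, int corrupt_at)
{
    assert(p >= 3 && p <= 31 && (p & 1));
    uint64_t m = (UINT64_C(1) << p) - 1;
    uint64_t s = 4;
    int failures = 0;
    for (unsigned k = 1; k <= p - 2; k++) {
        s = (s * s + m - 2) % m;               /* s_k = s_{k-1}^2 - 2 mod M_p */
        if ((int)k == corrupt_at) s ^= 1;      /* simulated single-bit error */
        if (jacobi((s + m - 2) % m, m) != -1) failures++;
    }
    return failures;
}
```

With p = 7 and no corruption, all five checks give -1; flipping one bit at iteration 1 makes all five checks fail.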
2022-11-13, 20:59   #16
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

3·5·7²·11 Posts

Quote:
Originally Posted by LaurV
But putting my nose into cudaLucas internals (FFT) is not what I can do.
Quote:
Originally Posted by kriesel
Another possibility is a bug in the NVIDIA CUDA dlls.
IIRC, there are no CudaLucas FFT internals, simply a call to the CUDA FFT library (cuFFT). That doesn't mean the bug isn't in CudaLucas; there is the weighting and carry propagation code to consider.
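To make "weighting and carry propagation" a bit more concrete, this is the irrational-base DWT (Crandall-Fagin) as usually described for arithmetic modulo 2^p - 1. Word j of an n-word number holds ceil(p(j+1)/n) - ceil(pj/n) bits and is multiplied by the weight 2^(ceil(pj/n) - pj/n) before the forward FFT. After the inverse FFT, unweighting and rounding, carries are pushed through these variable-size words. The carry out of the top word wraps to word 0 because 2^p = 1 mod 2^p - 1. The toy C sketch below (function names are illustrative, small p only, no FFT; not CUDALucas's code) shows the word layout and the wraparound carry:

```c
#include <assert.h>
#include <stdint.h>

/* Bits held by word j of an n-word variable-base representation of a
   residue mod 2^p - 1: ceil(p(j+1)/n) - ceil(pj/n). */
static unsigned word_bits(uint64_t p, uint64_t n, uint64_t j)
{
    uint64_t hi = (p * (j + 1) + n - 1) / n;   /* ceil(p(j+1)/n) */
    uint64_t lo = (p * j + n - 1) / n;         /* ceil(pj/n) */
    return (unsigned)(hi - lo);
}

/* Word j is weighted by 2^(num/n), num = n*ceil(pj/n) - pj, in [0, n).
   Evaluating the weight itself needs exp2() from <math.h>; omitted here. */
static uint64_t weight_exp_num(uint64_t p, uint64_t n, uint64_t j)
{
    uint64_t c = (p * j + n - 1) / n;
    return c * n - p * j;
}

/* Normalize signed words (as left after rounding the inverse FFT output)
   so each lies in [0, 2^bits). The carry out of the top word re-enters at
   word 0 since 2^p == 1 (mod 2^p - 1); repeat until no carry remains. */
static void carry_propagate(int64_t *w, uint64_t p, uint64_t n)
{
    int64_t carry = 0;
    do {
        for (uint64_t j = 0; j < n; j++) {
            int64_t base = INT64_C(1) << word_bits(p, n, j);
            int64_t v = w[j] + carry;
            /* floor division, so the remaining word is never negative */
            carry = (v >= 0) ? v / base : -((base - 1 - v) / base);
            w[j] = v - carry * base;
        }
    } while (carry != 0);
}

/* Value of the normalized words modulo 2^p - 1 (p <= 31 in this sketch). */
static uint64_t words_value(const int64_t *w, uint64_t p, uint64_t n)
{
    assert(p >= 2 && p <= 31);
    uint64_t m = (UINT64_C(1) << p) - 1, v = 0;
    for (uint64_t j = n; j-- > 0;)             /* Horner, top word first */
        v = ((v << word_bits(p, n, j)) + (uint64_t)w[j]) % m;
    return v;
}
```

For p = 31 and n = 4 the words hold 8, 8, 8 and 7 bits. For scale, taking the FFT length as the word count and K = 1024, M332329111 at 20000K has about 16.23 bits per word, versus about 16.56 at 19600K.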

P.S. The database knows which LL results were produced by CudaLucas. I'd be extremely surprised if shift count doesn't protect GIMPS from a bad result getting flagged as DCed.
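Why should a shift protect against this kind of bug? In the usual scheme, a run with shift e works on x = 2^e * s_k mod M_p instead of s_k. Squaring doubles the shift (mod p, since 2^p = 1 mod M_p), the constant 2 has to be shifted to match, and the result is rotated back at the end. Two runs with different shifts therefore push different bit patterns through the FFT, yet must produce the same unshifted residue. A toy C sketch (names are illustrative; small p and 64-bit arithmetic instead of an FFT):

```c
#include <assert.h>
#include <stdint.h>

/* x * 2^e mod M_p = 2^p - 1: a cyclic left rotation of the p-bit value. */
static uint64_t mul_pow2(uint64_t x, unsigned e, unsigned p)
{
    uint64_t m = (UINT64_C(1) << p) - 1;
    x %= m;
    e %= p;
    if (e == 0) return x;
    return ((x << e) | (x >> (p - e))) & m;
}

/* LL residue s_{p-2} mod M_p, computed with an initial shift `shift`.
   The working value is x_k = s_k * 2^(e_k), with e_k = 2 * e_{k-1} mod p. */
static uint64_t ll_residue_shifted(unsigned p, unsigned shift)
{
    assert(p >= 3 && p <= 31);                 /* keeps x * x below 2^62 */
    uint64_t m = (UINT64_C(1) << p) - 1;
    unsigned e = shift % p;
    uint64_t x = mul_pow2(4, e, p);            /* x_0 = 4 * 2^e */
    for (unsigned k = 1; k <= p - 2; k++) {
        e = (2 * e) % p;                       /* squaring doubles the shift */
        uint64_t two = mul_pow2(2, e, p);      /* the "- 2" is shifted too */
        x = (x * x % m + m - two) % m;
    }
    return mul_pow2(x, p - e, p);              /* rotate back: times 2^(p-e) */
}
```

For p = 11 (M11 = 2047 = 23 × 89, composite) every shift gives the same nonzero residue; for p = 7, 13 or 31 it is 0 for every shift.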

Last fiddled with by Prime95 on 2022-11-13 at 21:02
2022-11-13, 21:37   #17
ATH
Einyen
Dec 2003
Denmark

2³×7×61 Posts

Quote:
Originally Posted by Prime95
P.S. The database knows which LL results were produced by CudaLucas. I'd be extremely surprised if shift count doesn't protect GIMPS from a bad result getting flagged as DCed.
How many exponents where all tests, 2 or more, were done with CUDALucas?
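The question amounts to a simple filter: group the LL results by exponent and keep the exponents that have at least two results, all reported by CUDALucas. Below is a minimal C sketch over a hypothetical flat list of (exponent, program) records sorted by exponent. The real PrimeNet schema is not shown in this thread, so the record layout and the program string are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flat record: one row per LL result reported to the server. */
struct ll_result {
    uint64_t exponent;
    const char *program;   /* assumed spelling, e.g. "CUDALucas" */
};

/* Writes to out[] each exponent with at least two LL results, all of them
   from CUDALucas. Rows must be sorted by exponent; returns the count. */
static size_t cudalucas_only(const struct ll_result *rows, size_t n,
                             uint64_t *out)
{
    assert(rows != NULL || n == 0);
    size_t found = 0;
    for (size_t i = 0; i < n;) {
        size_t j = i;
        int all_cudalucas = 1;
        while (j < n && rows[j].exponent == rows[i].exponent) {
            if (strcmp(rows[j].program, "CUDALucas") != 0)
                all_cudalucas = 0;
            j++;
        }
        if (all_cudalucas && j - i >= 2)
            out[found++] = rows[i].exponent;
        i = j;                                 /* next exponent group */
    }
    return found;
}
```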
2022-11-13, 21:57   #18
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·3,557 Posts

Quote:
Originally Posted by ATH
How many exponents where all tests, 2 or more, were done with CUDALucas?
TBD. As indicated earlier, I asked James or George by PM to query the database for the list. It's Sunday afternoon in North America. Please be patient.


One of the issues with CUDALucas is the absence of either readback or error/success return-value checking after some CUDA library calls. One example:
cudaMemcpy performs copies between host and GPU memory. https://developer.download.nvidia.co...2e9930741.html
Returns: cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidMemcpyDirection
Gpuowl copies host>GPU, then GPU>host, and compares on the host to verify that the GPU copy is correct.
CUDALucas does no readback and, IIUC, does not check the call's success or error return value either.
In the CUDALucas.cu routine void write_gpu_data(int q, int n):
Code:
  // Square kernel data
  for (j = (n >> 2) - 1; j > 0; j--) s_ct[j] = 0.5 * cospi (j * d);
  cudaMemcpy (g_ct, s_ct, sizeof (double) * (n / 4), cudaMemcpyHostToDevice);
then continues without checking for success or errors, as if such failures could never happen.
It does this for most calls, for speed, yet it is slower than gpuowl on the same hardware and inputs.

Similarly, in the LL iteration loop, the cufftResult returned by this call is discarded:
Code:
  cufftExecZ2Z (g_plan, (cufftDoubleComplex *) g_x, (cufftDoubleComplex *) g_x, CUFFT_INVERSE);
The GPU-to-host transfer is sometimes checked:
Code:
  if (error_flag & 3)
  {
    err = cutilSafeCall1 (cudaMemcpy (&terr, g_err, sizeof (float), cudaMemcpyDeviceToHost));
    if(terr > *maxerr) *maxerr = terr;
    //if( g_pf && g_sl) usleep(g_sv);//, nanosleep sleep(1);
  }
  else if (g_pf && (iter % g_po) == 0)
  {
    err = cutilSafeThreadSync();
    //if(g_sl) usleep(g_sv);//, nanosleep sleep(1);
  }
  if(err != cudaSuccess) terr = -1.0f;
  return (terr);
}
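The readback approach described above for gpuowl can be sketched like this: upload, check the returned status, download into scratch memory, check again, and compare byte for byte. The C sketch below uses hypothetical transfer hooks so it runs without a GPU. In a CUDA build they would wrap cudaMemcpy with cudaMemcpyHostToDevice and cudaMemcpyDeviceToHost and return its cudaError_t (cudaSuccess is 0). It illustrates the pattern only and is not gpuowl's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical transfer hook: returns 0 on success, like cudaSuccess. */
typedef int (*xfer_fn)(void *dst, const void *src, size_t bytes);

/* Upload, read back into scratch, and compare. Returns 0 only if both
   transfers reported success and the readback matches the source. */
static int upload_verified(xfer_fn to_dev, xfer_fn to_host, void *dev,
                           const void *host, void *scratch, size_t bytes)
{
    assert(bytes == 0 || (dev && host && scratch));
    if (to_dev(dev, host, bytes) != 0) return -1;      /* status checked */
    if (to_host(scratch, dev, bytes) != 0) return -2;
    if (memcmp(scratch, host, bytes) != 0) return -3;  /* readback compare */
    return 0;
}

/* Host-only stand-ins so the sketch runs without a GPU. */
static int fake_copy(void *dst, const void *src, size_t bytes)
{
    memcpy(dst, src, bytes);
    return 0;
}

static int flaky_copy(void *dst, const void *src, size_t bytes)
{
    memcpy(dst, src, bytes);
    if (bytes > 0) ((unsigned char *)dst)[0] ^= 0x01;  /* silent corruption */
    return 0;                                          /* yet reports success */
}
```

Here flaky_copy reports success while flipping one bit, so only the readback comparison catches it (return value -3).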

Last fiddled with by kriesel on 2022-11-13 at 22:00
2022-11-14, 04:10   #19
LaurV
Romulan Interpreter
"name field"
Jun 2011
Thailand

2×47×109 Posts

Quote:
Originally Posted by kriesel
@LaurV, if the 20000K fft deviating residues are reproducible in CUDALucas, please isolate it to the granularity gpuowl accepts on logging intervals (10,000) or finer.
Will do; please give me a day or two.
2022-11-14, 04:18   #20
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

17625₈ Posts

Quote:
Originally Posted by ATH
How many exponents where all tests, 2 or more, were done with CUDALucas?
If I did the query correctly:

Code:
34643591
34696567
35184673
35381377
35478853
36142801
36211067
36313813
36497473
36532159
36717713
36841111
37018711
37047167
38093491
38208713
38276081
38363993
38931791
38976211
38976221
39052267
39258293
39839603
40123351
40404289
40413371
40473841
40501819
40641659
41508253
41518229
41856721
42791519
43883923
44932729
45243557
45285043
48073099
48075583
48122471
48429497
48555343
48677777
49404263
49457687
53998811
54009271
54010013
55831921
56294479
56309111
57766307
57954781
58370549
58370563
72366587
73604719
73612841
73614041
73642033
73684073
73684703
73685609
73798027
73802059
73812071
73901071
77075387
77143147
88680457
132000191
137362691
666666667
2022-11-14, 07:47   #21
ATH
Einyen
Dec 2003
Denmark

2³×7×61 Posts

Hmm, I'm involved in 23 of the 74 exponents. I'm triple-checking my lowest one, 36532159, now with Prime95 30.8 b17.
2022-11-14, 10:38   #22
LaurV
Romulan Interpreter
"name field"
Jun 2011
Thailand

2×47×109 Posts

Hmm... there are far fewer than I expected. I thought there would be more, especially in the 332M range, where I did some myself, but those I both LL-tested and double-checked were probably already killed by Madpoo with Prime95.

I could "owl-LL" all of those, except the bigger ones. As George said, I would be very surprised if the random shift didn't catch this bug (and very disappointed, too, because my advocacy for random shifts would go down the drain).

Meanwhile, on a 2080 Ti running Windows 10:

Code:
FFT = 20000k  (wrong)

|  Nov 14  16:48:00  | M332329111 121482200  0x72ac700df6edc14e  | 20000K  0.05566 267.1888    2.67s  |   6:04:40:47  36.55%  |

|  Nov 14  17:10:34  | M332329111 121482201  0x1d15bc664e50aa21  | 20000K  0.25000   1.#INF    0.03s  |   6:04:40:47  36.55%  |
|  Nov 14  17:10:34  | M332329111 121482202  0x495c10fac3cb687b  | 20000K  0.12500  37.6750    0.03s  |   6:04:40:48  36.55%  |
|  Nov 14  17:10:34  | M332329111 121482203  0x87be878e3f8a71ba  | 20000K  0.06250  36.5630    0.03s  |   6:04:40:48  36.55%  |
|  Nov 14  17:10:34  | M332329111 121482204  0xfa90c31f9f4db434  | 20000K  0.05371  36.3340    0.03s  |   6:04:40:49  36.55%  |
|  Nov 14  17:10:34  | M332329111 121482205  0xbe66bb2afd9a4d8a  | 20000K  0.05371  36.7410    0.03s  |   6:04:40:49  36.55%  |
|  Nov 14  17:10:34  | M332329111 121482206  0xd9db0fb42ccfebae  | 20000K  0.05103  31.4970    0.03s  |   6:04:40:50  36.55%  |

FFT = 19600k (correct, i.e. the same as gpuOwl and as the other FFTs I tried at this size)

|  Nov 14  16:55:52  | M332329111 121482200  0x72ac700df6edc14e  | 19600K  0.09570 271.3789    2.71s  |  37:04:05:54  36.55%  |

|  Nov 14  17:13:36  | M332329111 121482201  0x1d15bc664e50aa21  | 19600K  0.25000   1.#INF    0.03s  |  37:04:06:07  36.55%  |
|  Nov 14  17:13:36  | M332329111 121482202  0x495c10fac3cb687b  | 19600K  0.12500  39.0810    0.03s  |  37:04:07:09  36.55%  |
|  Nov 14  17:13:36  | M332329111 121482203  0x87be878e3f8a71ba  | 19600K  0.06250  36.9730    0.03s  |  37:04:08:06  36.55%  |
|  Nov 14  17:13:36  | M332329111 121482204  0xfa90c31f9f4db434  | 19600K  0.07324  36.5400    0.03s  |  37:04:09:01  36.55%  |
|  Nov 14  17:13:36  | M332329111 121482205  0xbe66bb2afd9a4d8a  | 19600K  0.07324  36.8750    0.03s  |  37:04:09:57  36.55%  |
|  Nov 14  17:13:36  | M332329111 121482206  0xb3b61f68599fd75e  | 19600K  0.06958  36.6120    0.03s  |  37:04:10:53  36.55%  |
All tests were run until the next checkpoint matched, to make sure it was not a random error. I mean the next checkpoints on both branches (which differed from each other, as in my previous post). Then, where the residues started to differ, the range was split into 10 and rerun over its full length, so that the checkpoint at the end matched. When the split reached a step of 1, I ran every branch twice to make sure it is not a hardware error.

Once we switched to smaller ranges, all tests were run with error checking on every iteration. No errors were caught.
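The procedure above (split the first differing range into 10, rerun from the last agreeing checkpoint, repeat until the step is 1) is a 10-way search for the first diverging iteration. Below is a toy C sketch. good_step and buggy_step are hypothetical stand-ins for one iteration of a correct and a faulty build (here the faulty one misbehaves when its input is 16). Given a checkpoint where both agree and a length after which they are known to differ, the function returns the first iteration whose result differs.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t (*step_fn)(uint64_t x);

/* Hypothetical stand-ins for one iteration of two builds of a program. */
static uint64_t good_step(uint64_t x)  { return x + 1; }
static uint64_t buggy_step(uint64_t x) { return x == 16 ? x + 2 : x + 1; }

static uint64_t run(step_fn step, uint64_t x, uint64_t iters)
{
    while (iters-- > 0) x = step(x);
    return x;
}

/* x: state at iteration `start`, where builds a and b agree.
   len: number of iterations after which they are known to differ.
   Returns the first iteration number whose state differs. */
static uint64_t first_divergence(step_fn a, step_fn b, uint64_t x,
                                 uint64_t start, uint64_t len)
{
    assert(len >= 1);
    while (len > 1) {
        uint64_t part = (len + 9) / 10;        /* split the range into ~10 */
        uint64_t xa = x, xb = x, agree = 0, diff = len;
        for (uint64_t off = part; off < len; off += part) {
            xa = run(a, xa, part);
            xb = run(b, xb, part);
            if (xa != xb) { diff = off; break; }
            x = xa;                            /* new shared checkpoint */
            agree = off;
        }
        start += agree;                        /* restart from last match */
        len = diff - agree;
    }
    return start + 1;
}
```

Starting from state 0 at iteration 0, with a known mismatch after 29 iterations, it reports iteration 17.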

I will share the residue file(s) at iteration 121482200 with George (cudaLucas can show every residue on screen, but its smallest checkpoint granularity is 10, even if you set it to 1). It is no big secret; they are just 40MB+.

Last fiddled with by LaurV on 2022-11-14 at 10:47
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
