2022-11-16, 03:41   #12
LaurV

Quote:
Originally Posted by kruoli
Or is LL preferred here?
I would prefer LL, to see whether any more CUDALucas bugs can be found.
But in the end the goal is to see if a prime was missed, so if it is better/faster for you to do PRP, then do PRP.
Somebody will run the Cert, and we can put this issue to rest.

@ATH, the shift is not necessary. A fast gpuOwl LL with zero shift is as good as any; it finishes in half the time, and it is safe in the sense that the CUDALucas bug is most probably in the 20M FFT in the library, and since Mihai wrote his own FFT, it should be impossible to get the same error in the same place.
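For reference, here is a minimal Python sketch of the Lucas-Lehmer iteration that both programs compute. It uses plain big-integer arithmetic; the real programs replace the squaring with a large FFT-based multiplication, which is exactly where a library bug would hide. The function name and example exponent are only illustrative.

Code:
# Minimal reference Lucas-Lehmer test (illustrative only; gpuOwl and
# CUDALucas perform this same iteration, but do the squaring with a
# large FFT-based multiplication).
def lucas_lehmer(p):
    """Return (is_prime, res64) for the Mersenne number 2^p - 1, p an odd prime."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m                    # the squaring mod 2^p - 1
    return s == 0, s & 0xFFFFFFFFFFFFFFFF      # res64 = low 64 bits of the final residue

print(lucas_lehmer(127))                       # (True, 0): M127 is prime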

On the other hand, I have finished all my former assignments except the last one, which is at 25%, and everything has matched so far. I think we are safe, and most probably the bug is indeed only in the 20M FFT.

Oh, and sorry if I stepped on your toes with one of the assignments (I only just saw kriesel's comment in the first post), but my lame excuse is that at the time I took it, nothing was written in the table.

Last fiddled with by LaurV on 2022-11-16 at 10:04

2022-11-16, 12:14   #13
ATH

Quote:
Originally Posted by LaurV
@ATH, the shift is not necessary. A fast gpuOwl LL with zero shift is as good as any; it finishes in half the time, and it is safe in the sense that the CUDALucas bug is most probably in the 20M FFT in the library, and since Mihai wrote his own FFT, it should be impossible to get the same error in the same place.
What I meant was that, due to the shift count, any bugs in CUDALucas results would have been caught earlier, so I don't think triple-checking all of these is high priority or really needed.
If there is a bug in CUDALucas, it is probably only in the 20000K FFT (and maybe higher).
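To illustrate the shift argument: production LL runs keep the residue rotated by a per-run shift, and that rotation doubles at every squaring, so two runs with different starting shifts push different bit patterns through the multiply and are very unlikely to produce the same wrong residue. A minimal sketch of that scheme, assuming the standard rotate-by-a-power-of-two bookkeeping (names are illustrative):

Code:
# Sketch of a "shifted" LL iteration: the residue is stored rotated left by
# `shift` bits modulo 2^p - 1 (a true rotation, since 2^p == 1 mod 2^p - 1).
def lucas_lehmer_shifted(p, shift):
    m = (1 << p) - 1
    s = (4 << shift) % m                   # starting value 4, rotated by `shift`
    for _ in range(p - 2):
        shift = (2 * shift) % p            # squaring doubles the rotation
        s = (s * s - (2 << shift)) % m     # subtract 2, rotated by the new shift
    return (s * pow(2, p - shift, m)) % m  # rotate back; 0 iff 2^p - 1 is prime

# Same verdict regardless of the shift:
print(lucas_lehmer_shifted(127, 0), lucas_lehmer_shifted(127, 17))   # 0 0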

Last fiddled with by ATH on 2022-11-16 at 12:15

2022-11-16, 15:00   #14
sdbardwick

Another match on 36142801

2022-11-17, 02:39   #15
LaurV

Quote:
Originally Posted by ATH
...
Fully agree. George can study what's going on at his own convenience. If there is indeed a bug in the library, we have just kicked NV's ass, haha.

From another point of view, there is no urgency to check these exponents, nor to fix CUDALucas; as long as it is slower than gpuOwl, people should use the latter (even without counting the speedup due to Certs). I just believed there were many more, but it seems CUDALucas was not as popular as I thought, or its users still have comparably more CPU power. For me, the GPU power was always more than the CPU could reasonably handle, so I assumed a lot of people ran LL/DC with CUDALucas.

Meanwhile, as we are at it, it is better to check them, to make sure. I have already matched all <40M; I did not edit the post because I saw kriesel online at the same time and didn't want to create a mess in case he was editing too. I think it is best to let him do the edits.

Reserving the 137M exponent; it is queued but not started yet, and it will take a while.

I cannot do the 666M one, as both of the other tests are mine. If anybody ventures to try it, there is a residue file that can be downloaded from my blog, to check that everything is in shape. My opinion is that we should just forget about this one for now, at least for a few years, until we have better hardware and can test 666M in 5 minutes...

2022-11-17, 03:54   #16
LaurV

@kriesel, I see you have added FFT sizes. To be specific, I did my current <40M exponents like this:

37047167 and everything below it were done with a 1920K FFT, in gpuOwl;
all the rest were done with 2M = 2048K, in gpuOwl.

However, CUDALucas is different, and its FFT boundaries depend on the card too. I will collect some boundaries from the old FFT files I have (Colab included; the FFT optimizations were done by Teal's team).
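As a rough sanity check on the sizes above: the average number of bits packed into each FFT word is just the exponent divided by the FFT length. The ~18-19 bits-per-double-word ceiling used in the comments below is a general rule of thumb assumed here, not something stated in this thread.

Code:
# Rough check on the FFT sizes quoted above.
def bits_per_word(exponent, fft_k):
    return exponent / (fft_k * 1024)        # fft_k is the FFT length in "K" words

print(f"{bits_per_word(37047167, 1920):.2f}")   # ~18.84, near the top of a 1920K FFT
print(f"{bits_per_word(37047167, 2048):.2f}")   # ~17.67, comfortable at 2048K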

BTW, going back through the history to check that, I realized that I was wrong in the past when I referred to 20000K as being a 20M FFT. That was a mistake. Sorry for that, and please assume that everywhere I said "20M" in this thread and in the other one related to the CUDALucas bug, I meant the 20000K FFT.

Last fiddled with by LaurV on 2022-11-17 at 03:58

2022-11-24, 06:45   #17
LaurV

As per the PM discussion, I have queued up the following to do with gpuOwl:

73612841 -
73614041 -
73642033 -
73684073 -
73684703 -
73685609 -
73798027 -
73802059 -
73812071 -
73901071 -

It will take some time, but they will be done, eventually.

Also, I understand (from a separate PM conversation) that there is no interest in maintaining CUDALucas; as of now there is no "curator" for it, and George is quite busy (in fact, he is on holiday for two weeks, but please don't tell these guys before I discover the next Mersenne prime! Now is my best opportunity!), so he won't get into debugging CUDALucas soon, or at all. I fully understand the lack of interest, considering the things we have already discussed here on the forum: we now have the PRP check with GC and Cert, which doubles the testing speed, and moreover, even for pure LL, CUDALucas is slower than the Owl on the same hardware. So people want to put CUDALucas into the museum section. OK... fine with me.

For now.

Later, I may be willing to stick my own nose into it, but the process will be long: my encounters with GPU programming are scarce (only simple things, playing around; I have never made a serious CUDA "product"), and although I understand the math, I have never gotten my hands dirty implementing it (GPU or no GPU). But what always intrigued me, and continues to intrigue me, is the fact that native CUDA in CUDALucas is slower than emulated (micro-programmed) OpenCL in gpuOwl. I can understand OpenCL running like hell on AMD cards, which are made for it, and where it wouldn't matter anyway, because you cannot run CUDA on AMD cards. But OpenCL being faster than CUDA on Nvidia cards? That cannot be! Of course, there is no mystery here: Mihai wrote his own fine-tuned code, while Msft (the original father of CUDALucas; does anybody still remember him?) used the general FFT libraries from Nvidia, which are, being "general", surely slower. So I always wondered how fast a CUDALucas with a native, fine-tuned CUDA FFT would be. But time, knowledge, and mood were always scarce. Now, with this FFT bug, maybe <insert deity> is kicking my ass into taking that step forward...

Last fiddled with by LaurV on 2022-11-24 at 06:54

2022-11-24, 09:44   #18
kriesel

What is truly remarkable is that gpuowl, developed on Linux for OpenCL on AMD GPUs, is (usually) more reliable at LL on NVIDIA GPUs in Windows or Linux than CUDALucas, while also being substantially faster on identical hardware. Msft pioneered GPU primality testing for GIMPS with CUDALucas. Preda followed some years later, with very rapid development of gpuowl for both Linux and Windows from the first week, with the eventual benefit of optimization assistance from George, and perhaps also benefited from the lessons of the CUDALucas, CUDAPm1, and CLLucas experience. On the CPU side, prime95 and mlucas likewise do not use library FFT routines; they use well-crafted, performance-tuned, heavily tested custom FFT code developed by George and Ernst. Preda was willing to try, and to throw away, approaches and code relentlessly in pursuit of better reliability and performance. We users are indebted to them all.

Last fiddled with by kriesel on 2022-11-24 at 11:27

2022-11-24, 09:51   #19
LaurV

Amen!

2022-11-28, 08:46   #20
kruoli

My four exponents are all done and all matched the original results. I used v6.11-380-g79ea0cc. All of them auto-selected a 3M FFT. A detailed log is available on request.