#12
Romulan Interpreter
"name field"
Jun 2011
Thailand
2⁴×643 Posts
I would prefer LL, to see if any more cudaLucas bugs can be found.

But in the end the goal is to see whether a prime was missed, so if PRP is better/faster for you, then do PRP. Somebody will run the Cert, and we can put this issue to rest.

@ATH, the shift is not necessary. A fast gpuOwl LL with shift 0 is as good as any; it finishes in half the time, and it is safe in the sense that the cudaLucas bug is most probably in the 20M FFT in the library, and since Mihai wrote his own FFT, it should be impossible to hit the same error in the same place. On the other hand, I have finished all my former assignments except the last one, which is at 25%, and everything has matched so far. I think we are safe, and most probably the bug is indeed only in the 20M FFT. Oh, and sorry if I stepped on your toes with one of the assignments (I only just saw kriesel's comment in the first post), but my lame excuse is that at the time I took it, nothing was written in the table.

Last fiddled with by LaurV on 2022-11-16 at 10:04
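For readers following along, the two test types being compared can be sketched in a few lines of Python. This is only a toy illustration: real clients such as gpuOwl and cudaLucas replace the modular squaring with FFT-based multiplication (IBDWT), and a "shift" merely rotates the residue by a random number of bits so that errors land in different places on each run; it does not change the verdict.

```python
def lucas_lehmer(p: int) -> bool:
    """Lucas-Lehmer test for M_p = 2**p - 1 (p an odd prime).

    Toy version: production code performs the squaring with an
    FFT-based multiplication; an optional random "shift" only
    rotates the residue and does not affect the result.
    """
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

def prp3(p: int) -> bool:
    """Fermat PRP test, base 3: if M_p is prime then
    3**(M_p - 1) == 1 (mod M_p). A PRP result can be certified
    (Cert) cheaply, which is why PRP is often preferred for
    first-time tests."""
    m = (1 << p) - 1
    return pow(3, m - 1, m) == 1

# Both agree on small exponents: p=13 gives a Mersenne prime, p=11 does not.
print(lucas_lehmer(13), prp3(13))  # True True
print(lucas_lehmer(11), prp3(11))  # False False
```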
#13
Einyen
Dec 2003
Denmark
2·3·5²·23 Posts
If there is a bug in CUDALucas, it is probably only in the 20000K FFT (and maybe higher).

Last fiddled with by ATH on 2022-11-16 at 12:15
#14
Aug 2002
North San Diego County
817₁₀ Posts
Another match on 36142801
#15
Romulan Interpreter
"name field"
Jun 2011
Thailand
2⁴·643 Posts
Fully agree. George will have to look into what's going on, at his own convenience. If there is indeed a bug in the library, we just kicked NV's ass, haha.

From another point of view, there is no urgency in checking these exponents, nor in fixing cudaLucas: as long as it is slower than gpuOwl, people should use the latter (even without counting the speedup due to Certs). I believed there were many more of these tests, but it seems cudaLucas was not as popular as I thought, or its users still have comparably more CPU power. For me, GPU power was always more than the CPUs could reasonably handle, so I assumed a lot of people were doing LL/DC with cudaLucas. Meanwhile, as we are at it, it is better to check them, to make sure. I have already matched everything <40M. I did not edit the post because I saw kriesel online at the same time and I didn't want to create a mess in case he was editing too; I think it is best to let him do the edits.

Reserving the 137M exponent (queued but not started yet; it will take a while). I cannot do the 666M one, as both of its other tests are mine. If anybody ventures it, there is a residue file that can be downloaded from my blog, to check that everything is in shape. My opinion is that we should just forget about that one for now, at least for a few years, until we have better hardware.
#16 |
Romulan Interpreter
"name field"
Jun 2011
Thailand
2⁴·643 Posts
@kriesel, I see you have added FFT sizes. To be specific, I did my current <40M tests like this:

37047167 and below: 1920K FFT, gpuOwl
all the rest: 2M = 2048K FFT, gpuOwl

However, cudaLucas is different, and its FFT boundaries depend on the card too. I will collect some boundaries from the old FFT files I have (Colab included, with the FFT optimizations done by Teal's team).

BTW, going back through the history to check that, I realized I was wrong in the past when I talked about the 20000K FFT as being a "20M" FFT. That was a mistake. Sorry for that, and please assume that everywhere I said "20M" in this thread and in the other one related to the cudaLucas bug, I meant the 20000K FFT.

Last fiddled with by LaurV on 2022-11-17 at 03:58
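The link between exponent and FFT size comes down to how many bits of the number each FFT word must carry: the irrational-base DWT used by these programs packs roughly p/N bits into each of the N words, and once that per-word load exceeds what the floating-point arithmetic can handle safely, a larger FFT is needed. A rough, purely illustrative sketch (the actual safe cutoffs depend on the program and the hardware):

```python
def bits_per_word(p: int, fft_kwords: int) -> float:
    """Average bits per FFT word when testing 2**p - 1 with an
    FFT of fft_kwords * 1024 words (IBDWT packs ~p/N bits/word)."""
    return p / (fft_kwords * 1024)

# Around the 1920K/2048K boundary quoted above, the per-word load
# sits near 18-19 bits; the exact cutoff is implementation-specific.
print(round(bits_per_word(37047167, 1920), 2))  # 18.84
print(round(bits_per_word(37047167, 2048), 2))  # 17.67
```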
#17 |
Romulan Interpreter
"name field"
Jun 2011
Thailand
2⁴×643 Posts
As per a PM discussion, I am queuing the following up to do with gpuOwl:

73612841
73614041
73642033
73684073
73684703
73685609
73798027
73802059
73812071
73901071

It will take some time, but they will get done, eventually.

Also, I understood (from a separate PM talk) that there is no interest in maintaining cudaLucas: as of now it has no "curator", and George is quite busy (in fact, he is on holiday for two weeks, but please don't tell these guys before I discover the next Mersenne prime! Now is my best opportunity!). For now, that is. Later, I may be willing to stick my own nose into it, but the process will be long: my encounters with GPU programming are scarce (only simple things, playing around; I never made a serious CUDA "product"), and although I understand the math, I have never gotten my hands dirty implementing it (GPU or otherwise).

But what always intrigued me, and continues to intrigue me, is that native CUDA in cudaLucas is slower than the emulated (micro-programmed) OpenCL in gpuOwl. I can understand OpenCL running like hell on AMD cards, which are made for it, and where the comparison wouldn't matter anyway, because you cannot run CUDA on AMD cards. But OpenCL being faster than CUDA on Nvidia cards? That cannot be! Of course, there is no mystery here: Mihai wrote his own fine-tuned FFT, while Msft (the original daddy of cudaLucas; does anybody still remember him?) used the general FFT library from Nvidia, which is, as the name says, "general", i.e. slower. So I always wondered how fast a cudaLucas with a native-CUDA, fine-tuned FFT would be. But time, knowledge, and mood were always scarce. Now, with this FFT bug, maybe <insert deity> is kicking my ass to finally take that step forward...

Last fiddled with by LaurV on 2022-11-24 at 06:54
#18 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
59·131 Posts
What is truly remarkable is that gpuowl, developed in Linux for OpenCL on AMD GPUs, is (usually) more reliable at LL on NVIDIA GPUs, in Windows or Linux, than CUDALucas, while also being substantially faster on identical hardware. Msft pioneered GPU primality testing for GIMPS with CUDALucas. Preda followed some years later, with very rapid development of gpuowl for both Linux and Windows from the first week, with the eventual benefit of optimization assistance from George, and perhaps benefited from lessons learned from the CUDALucas, CUDAPm1, and CLLucas experience. On the CPU side, too, prime95 and Mlucas do not use library FFT routines; they use well-crafted, performance-tuned, heavily tested custom FFT code developed by George and Ernst. Preda was willing to try, and to throw away, approaches and code relentlessly in the pursuit of better reliability and performance. We users are indebted to them all.
Last fiddled with by kriesel on 2022-11-24 at 11:27 |
#19 |
Romulan Interpreter
"name field"
Jun 2011
Thailand
2⁴·643 Posts
Amen!
#20 |
"Oliver"
Sep 2017
Porta Westfalica, DE
1560₁₀ Posts
My four exponents are all done and all matched the original results. I used v6.11-380-g79ea0cc. All of them auto-selected a 3M FFT. A detailed log is available on request.