#1816
Romulan Interpreter
Jun 2011
Thailand
2³·19·61 Posts
First, CudaLucas was never intended to run on AMD cards. For native CUDA/Nvidia cards it is still faster than anything else, at least for everything I run in my rigs, old cards (like the 580 and classic/black Titans) and new cards (like the 1080Ti and 2080Ti) included.

Second, there is "almost nothing" to improve in CudaLucas (well, there are some minor things, hence the quotes, but the big picture won't change much). This toy is just a "square, subtract 2, repeat" tool, which uses Nvidia's CUDA FFT libraries (cuFFT) to do the squaring. These libraries have indeed fallen behind, as you said. They have not been updated by Nvidia for ages, and if we can convince them to make (or make by ourselves

Last fiddled with by LaurV on 2020-02-02 at 05:22
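For readers unfamiliar with it, the "square, subtract 2, repeat" loop described above is the Lucas-Lehmer test. A minimal Python sketch follows; the function name `lucas_lehmer` is made up for illustration, and Python's built-in big integers stand in for the cuFFT-based squaring that CudaLucas uses, so this shows the arithmetic only, not how CudaLucas is implemented.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if the Mersenne number 2**p - 1 is prime (p must be an odd prime)."""
    m = (1 << p) - 1   # the Mersenne number M_p
    s = 4              # standard starting value of the sequence
    for _ in range(p - 2):
        # Square, subtract 2, repeat. Plain big-integer arithmetic here;
        # a GPU program would do the squaring with an FFT multiplication.
        s = (s * s - 2) % m
    return s == 0      # M_p is prime exactly when the final residue is zero


if __name__ == "__main__":
    print([p for p in (3, 5, 7, 11, 13, 17, 19) if lucas_lehmer(p)])
```

Almost all of the run time is in the squaring step, which is why the speed of the FFT library dominates everything else.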
#1817
"Eric"
Jan 2018
USA
2²·53 Posts
This statement is a bit misleading, since with the new gpuowl updates it has become significantly more efficient in its memory bandwidth usage. I am seeing significant speedups on GPUs with a high DP ratio, like the K80, P100, V100 and Titan V. There is indeed not much difference for the GTX and RTX cards, because most of them are DP-bound rather than memory-bound.
Last fiddled with by xx005fs on 2020-02-02 at 05:26
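The DP-bound versus memory-bound distinction above can be illustrated with a simple roofline-style estimate: an iteration takes roughly as long as the slower of its arithmetic and its memory traffic, so cutting memory traffic only helps when memory is the slower side. The sketch below assumes that model; the function name and every number in it are hypothetical placeholders, not measurements of gpuowl or of any real card.

```python
def limiting_factor(flops, bytes_moved, peak_flops, bandwidth):
    """Return (bottleneck, seconds) for one iteration under a simple roofline model."""
    compute_time = flops / peak_flops       # time if only arithmetic mattered
    memory_time = bytes_moved / bandwidth   # time if only memory traffic mattered
    if compute_time >= memory_time:
        return "compute", compute_time
    return "memory", memory_time


# Hypothetical workload and hypothetical cards (illustrative values only).
work = dict(flops=2.0e9, bytes_moved=4.0e8)
leaner = dict(flops=2.0e9, bytes_moved=2.0e8)   # same math, half the memory traffic
high_dp_card = dict(peak_flops=7.0e12, bandwidth=9.0e11)
low_dp_card = dict(peak_flops=4.0e11, bandwidth=6.0e11)

if __name__ == "__main__":
    for name, card in (("high DP ratio", high_dp_card), ("low DP ratio", low_dp_card)):
        before = limiting_factor(**work, **card)
        after = limiting_factor(**leaner, **card)
        print(name, before, after)
```

With these made-up numbers the high-DP card starts out memory-bound and gets faster when traffic is halved, while the low-DP card is compute-bound both times and does not change.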
#1818
"Sam Laur"
Dec 2018
Turku, Finland
14A₁₆ Posts
Nope, on my RTX 2080 at least, the current version of gpuowl is about 20-30% faster than cudalucas, varying a bit from one FFT size to another. The big improvement came at the beginning of December 2019, and smaller optimizations have accumulated since then, so if you tested gpuowl before that, please test again.
#1819
Romulan Interpreter
Jun 2011
Thailand
2³×19×61 Posts
You may be totally right... We didn't move to such new fancy things yet.
Edit @nomead: crosspost, I was replying to xx, but what you say is really tempting, BRB soon.
Last fiddled with by LaurV on 2020-02-02 at 08:00
#1820
"Eric"
Jan 2018
USA
212₁₀ Posts
#1821
"Mihai Preda"
Apr 2015
2·11·61 Posts
It seems that your OpenCL compiler does not like __attribute__((opencl_unroll_hint(1))). To work around that, simply pass "-use UNROLL_ALL" (and none of the other UNROLL_ options), or, if running on an Nvidia card, don't pass any UNROLL option at all.
#1822
"Mihai Preda"
Apr 2015
10100111110₂ Posts
As the error says, you can't use "WORKINGOUT4" with that FFT size.
Did you try running the program without any -use options? Does that work?
#1823
"Jorge Coveiro"
Nov 2006
Moura, Portugal
2×13 Posts
I was just testing the "optimized settings" for Nvidia cards, but it seems that I can't use WORKINGOUT4. I am going to test again and publish the results for the GTX 1660.
Last fiddled with by JCoveiro on 2020-02-02 at 20:45
#1824
"William Garnett III"
Oct 2002
Bensalem, PA
2×43 Posts
However, even with the iteration times being a couple of milliseconds slower on gpuOwL than on CUDALucas (plus a couple of milliseconds of slowdown to Prime95 if it is running too), gpuOwL eliminates the need for a double-check, and that makes it the overall time saver over CUDALucas for me. I only did one PRP double-check with gpuOwL, and I occasionally do LL double-checks with CUDALucas. Since the 1/32 double-precision ratio is terrible, I mostly stick with Trial Factoring using mfaktc.
Last fiddled with by wfgarnett3 on 2020-02-06 at 09:34
#1825
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3·1,637 Posts
But it doesn't. There is a PRP DC work type for good reasons:
1) errors may occur outside the code that the GEC covers, both in the software and in the manual reporting process, and some have already been confirmed to occur;
2) PRP DC guards against someone forging PRP first-test submissions;
3) PRP GEC itself has a very low error rate, but not zero. Gerbicz himself has given error rate estimates.
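To make the scope of the GEC concrete, here is a minimal Python sketch of a base-3 PRP test of 2^p − 1 with a Gerbicz-style check on the squaring chain. It is not gpuowl's code: the function name, the tiny `block` size, and the `fault_at` hook that simulates a hardware error are all hypothetical, and raising an exception stands in for whatever recovery a real program performs. The check relies on the identity d_new = 3 · d_old^(2^block) (mod N), where d is the running product of the residues taken every `block` squarings.

```python
def prp_gerbicz(p, block=2, fault_at=None):
    """Base-3 PRP test of 2**p - 1 with a Gerbicz-style check every `block` squarings.

    `fault_at` is a hypothetical hook: it corrupts the residue at that
    iteration so the check can be seen to fire.
    """
    n = (1 << p) - 1
    x = 3   # after i squarings, x == 3**(2**i) mod n
    d = 3   # running product of the residues at iterations 0, block, 2*block, ...
    for i in range(1, p + 1):
        x = x * x % n
        if i == fault_at:
            x = (x + 1) % n                 # simulated computation error
        if i % block == 0:
            d_prev, d = d, d * x % n
            # Invariant of an error-free chain: d == 3 * d_prev**(2**block) (mod n)
            if d != 3 * pow(d_prev, 1 << block, n) % n:
                raise ArithmeticError(f"Gerbicz check failed at iteration {i}")
    # 3**(n + 1) == 9 (mod n) whenever n is prime
    return x == 9


if __name__ == "__main__":
    print(prp_gerbicz(7), prp_gerbicz(11))
```

Note that the check only guards the squaring loop itself. Anything that goes wrong before or after it, such as the final comparison or the reporting of the result, is outside its reach, which is the gap point 1) describes.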
#1826
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
4911₁₀ Posts
CUDALucas still has its place:
- it is faster on a few GPU models than gpuowl;
- it will run on older NVIDIA GPUs that are entirely incapable of running gpuowl because they don't support the OpenCL level gpuowl requires;
- relatively current gpuowl versions don't do LL, so they can't do LLDC (although gpuowl v0.5 and v0.6 can, with a 4M FFT).

It would be great if CUDALucas had the Jacobi check.
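As a rough illustration of the Jacobi check mentioned above: as I understand it, for an LL test started at 4, the Jacobi symbol of (s − 2) over 2^p − 1 should be −1 for every residue s after the first squaring, so a different value signals a corrupted residue. The Python sketch below assumes that property. The function names and the `fault_at` error-injection hook are hypothetical, and nothing here is taken from CUDALucas or any other program.

```python
def jacobi(a, n):
    """Jacobi symbol (a|n) for odd positive n, by the standard binary algorithm."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):     # (2|n) is -1 exactly when n is 3 or 5 mod 8
                result = -result
        a, n = n, a                 # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def lucas_lehmer_with_jacobi(p, fault_at=None):
    """LL test of 2**p - 1 that checks the Jacobi symbol after every iteration.

    `fault_at` is a hypothetical hook that corrupts the residue at one iteration.
    """
    m = (1 << p) - 1
    s = 4
    for i in range(1, p - 1):       # p - 2 iterations in total
        s = (s * s - 2) % m
        if i == fault_at:
            s = (s + 1) % m         # simulated computation error
        if jacobi(s - 2, m) != -1:
            raise ArithmeticError(f"Jacobi check failed at iteration {i}")
    return s == 0


if __name__ == "__main__":
    print(lucas_lehmer_with_jacobi(7), lucas_lehmer_with_jacobi(11))
```

A corrupted residue has roughly even odds of still giving −1, so a check like this catches only about half of errors, and a real program would run it occasionally rather than at every iteration because the symbol is costly to compute on huge numbers.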