mersenneforum.org CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW)

2016-10-01, 11:18   #2531
ATH
Einyen

Dec 2003
Denmark

3·17·59 Posts

The one I'm using that works is CUDA 6.5. But they claim now that the bugs in CUDA 7 and 7.5 should be fixed in CUDA 8. That's why I wanted to try it, but I guess it is not that important.

2016-10-02, 04:55   #2532
flashjh

"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts

Quote:
 Originally Posted by ATH The one I'm using that works is CUDA6.5. But they claim now that the bugs in CUDA7 and 7.5 should be fixed in CUDA8. That's why I wanted to try it, but I guess it is not that important.
Hello everyone!

I uploaded a new Windows file to sourceforge with the CUDA 8.0 version of CUDALucas. It's here.

Please note that I couldn't locate the cufft32_80.dll file, so I didn't compile a 32 bit version for CUDA 8.0. As far as I can tell nVidia didn't include it with this version of CUDA. If someone has it, get it to me and I'll compile and upload the other files.

I ran several tests on a GTX 750ti, but I don't have any of the newer cards to see if this works or not. Can someone test this and let me know if it's producing errors or zero results similar to the other problems? Thanks!

Last fiddled with by flashjh on 2016-10-02 at 04:59 Reason: Can't spell

2016-10-02, 12:09   #2533
Karl M Johnson

Mar 2010

3×137 Posts

Hello, Jerry!

Thanks for the new binary, I can confirm that on a GTX 1080 it's a bit faster than the old v8.0 RC binary & the exponents match.
I did a quick test on M1000003, and the runtime was 76s for the old binary and 66s for the new one.

Quote:
 Originally Posted by flashjh Hello everyone! I uploaded a new Windows file to sourceforge with the CUDA 8.0 version of CUDALucas. It's here. Remember to download the CUDA 8.0 Libs from here. Please note that I couldn't locate the cufft32_80.dll file, so I didn't compile a 32 bit version for CUDA 8.0. As far as I can tell nVidia didn't include it with this version of CUDA. If someone has it, get it to me and I'll compile and upload the other files. I ran several tests on a GTX 750ti, but I don't have any of the newer cards to see if this works or not. Can someone test this and let me know if it's producing errors or zero results similar to the other problems? Thanks!
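For readers new to the thread: the "quick test on M1000003" above is a Lucas-Lehmer (LL) run, which is the computation CUDALucas times. Below is a minimal sketch of the LL test itself. It uses Python's built-in big integers instead of the GPU FFT multiplication CUDALucas actually uses, so it is only practical for small exponents; the function name `lucas_lehmer` is hypothetical.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if M_p = 2**p - 1 is prime, for an odd prime exponent p.

    Start from s = 4 and repeat s = s*s - 2 (mod M_p) p - 2 times;
    M_p is prime exactly when the final residue is 0.
    """
    m = (1 << p) - 1          # the Mersenne number M_p
    s = 4
    for _ in range(p - 2):    # p - 2 squaring iterations, as in CUDALucas
        s = (s * s - 2) % m
    return s == 0


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13):
        print(p, lucas_lehmer(p))
```

Each loop iteration corresponds to one "iteration" in CUDALucas' ms/it figures; the GPU version just performs the squaring with large FFTs.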

2016-10-09, 09:58   #2534
mognuts

Sep 2008
Bromley, England

2·3·7 Posts

CUDA binary benchmarks

Just ran a benchmark using my new card (GTX 780ti) and the various CUDA binaries. Driver = 373.06. Test number = M2976221. Card not driving monitor.

CUDA 4.2 32bit = 13min 11sec
CUDA 4.2 64bit = 13min 01sec
CUDA 5.0 32bit = 12min 18sec
CUDA 5.0 64bit = 12min 24sec
CUDA 5.5 32bit = 12min 24sec
CUDA 5.5 64bit = 12min 36sec
CUDA 6.0 32bit = 12min 13sec
CUDA 6.0 64bit = 12min 19sec
CUDA 6.5 32bit = 12min 19sec
CUDA 6.5 64bit = 11min 38sec
CUDA 8.0 64bit = 13min 56sec

I'm not that surprised at CUDA 8.0, but CUDA 6.5 64bit was a bit out of character compared to how my GTX580 behaves.

Last fiddled with by mognuts on 2016-10-09 at 10:32 Reason: Rounded off values for clarity

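To put mognuts' numbers in relative terms, here is a small sketch that converts each run time to seconds and reports how much slower each binary was than the fastest one. The helper names are hypothetical and the timings are copied from the post; since the poster rounded the values, differences of a second or two are not meaningful.

```python
# M2976221 run times on a GTX 780ti, from the post above, as (minutes, seconds).
TIMINGS = {
    "CUDA 4.2 32bit": (13, 11),
    "CUDA 4.2 64bit": (13, 1),
    "CUDA 5.0 32bit": (12, 18),
    "CUDA 5.0 64bit": (12, 24),
    "CUDA 5.5 32bit": (12, 24),
    "CUDA 5.5 64bit": (12, 36),
    "CUDA 6.0 32bit": (12, 13),
    "CUDA 6.0 64bit": (12, 19),
    "CUDA 6.5 32bit": (12, 19),
    "CUDA 6.5 64bit": (11, 38),
    "CUDA 8.0 64bit": (13, 56),
}


def to_seconds(minutes_seconds):
    """Convert a (minutes, seconds) pair to total seconds."""
    minutes, seconds = minutes_seconds
    return minutes * 60 + seconds


def percent_slower_than_fastest(timings):
    """Map each binary to its run time's excess over the fastest run, in percent."""
    secs = {name: to_seconds(t) for name, t in timings.items()}
    best = min(secs.values())
    return {name: 100.0 * (s - best) / best for name, s in secs.items()}


if __name__ == "__main__":
    ranked = sorted(percent_slower_than_fastest(TIMINGS).items(), key=lambda kv: kv[1])
    for name, pct in ranked:
        print(f"{name:16s} +{pct:5.1f}%")
```

On these figures the CUDA 8.0 binary comes out roughly 20% slower than the CUDA 6.5 64bit one on this card.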
2016-10-09, 10:47   #2535
ATH
Einyen

Dec 2003
Denmark

3·17·59 Posts

I did a -cufftbench 2592 8192 20 on the different versions (only the 64 bit versions), so it does 20 × 50 iterations on each FFT length and takes the average. In most cases 6.5 is fastest, but a few of them have 8.0 as the fastest (on a Titan Black, but this is probably GPU dependent). CUDA 4.2 was quite a bit slower on all of them, so I left it out.

Code:
```
FFT(K) max_exp 8.0 6.5 6.0 5.5 5.0
2592 48471289 1.6135 1.6683 1.6897 1.6145 1.6166
2744 51250889 2.0056 1.8606 1.8682 1.9980 1.8714
3136 58404433 2.0937 2.0480 2.0710 2.0201 2.0337
3200 59570449 2.4195 2.3907 2.4056 2.4150 2.4175
3240 60298969 2.4266 2.4388 2.4404 2.4435
3375 62756279 2.5147 2.5301
3888 72075517 2.5348 2.5584 2.4558
4000 74106457 2.4631 2.5498 2.5821 2.4590 2.4639
4096 75846319 2.5115 2.5800 2.5976 2.5200 2.5375
4320 79902611 3.2614 3.2573 3.2760 3.2685
4374 80879779 3.3753 3.2784 3.2924 3.2946 3.3003
4500 83158811 3.3535 3.3845
4536 83809729 3.3810 3.4006
5184 95507747 3.4279 3.3836 3.4208 3.2994 3.3273
5292 97454309 3.9568
5488 100984691 3.8843 3.8345 3.8336 3.9979 3.7484
5600 103000823 4.1630 4.3082 4.3430 4.1882 4.1943
5832 107174381 4.5283 4.3644 4.3842 4.3847 4.3903
6000 110194363 4.5328
6048 111056879 4.5338 4.5138 4.5275
6075 111541967 4.5586
6125 112440191 4.5530 4.5645 4.5780 4.5867
6144 112781477 4.6858
6250 114685037 4.8079 4.6418 4.6620 4.6714 4.6787
6272 115080019 4.6678 4.6820 4.6835 4.6908
6400 117377567 4.9088 4.7531 4.7721 4.7719 4.7809
6480 118813021 4.9438 4.8401 4.8670 4.8669
6561 120266023 5.1164 4.8818 4.9012 4.8968
6750 123654943 5.1792 5.0356 5.0432
6912 126558077 5.1957
7776 142017539 5.2343 5.0001 5.0441 4.8219
8000 146019329 5.2537 5.1762 5.2350 5.0527 5.0997
8192 149447533 5.3838 5.2219 5.2593 5.1617 5.2106
```

Last fiddled with by ATH on 2016-10-09 at 10:49

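A quick way to read a table like ATH's is to pick the fastest CUDA version per FFT length. The sketch below does that for a few rows copied from the table. It assumes the timing columns are ms per iteration in the order 8.0, 6.5, 6.0, 5.5, 5.0, and it only uses rows with all five values, because the rows with missing values cannot be assigned to columns unambiguously. `VERSIONS`, `ROWS`, and `fastest_version` are hypothetical names.

```python
# Column order of the timing values, as in the table header.
VERSIONS = ("8.0", "6.5", "6.0", "5.5", "5.0")

# A few complete rows from ATH's Titan Black table: FFT length (K) -> timings.
ROWS = {
    2592: (1.6135, 1.6683, 1.6897, 1.6145, 1.6166),
    2744: (2.0056, 1.8606, 1.8682, 1.9980, 1.8714),
    4096: (2.5115, 2.5800, 2.5976, 2.5200, 2.5375),
    8192: (5.3838, 5.2219, 5.2593, 5.1617, 5.2106),
}


def fastest_version(times):
    """Return the CUDA version label with the smallest timing in one row."""
    best_index = min(range(len(times)), key=lambda i: times[i])
    return VERSIONS[best_index]


if __name__ == "__main__":
    for fft_k, times in ROWS.items():
        print(f"{fft_k}K: fastest is CUDA {fastest_version(times)}")
```

Differences between versions are often only a percent or two, so on a card whose clocks move with temperature, a single benchmark run may not separate them reliably.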
2016-10-09, 11:47   #2536
kladner
"Kieren"

Jul 2011
In My Own Galaxy!

23646₈ Posts

Many thanks for the info above. I had forgotten which version of CuLu I was running, and I'm still not sure. However, I switched in the CUDA 6.5 64 bit version, and the GTX460 went from 7.4304 to 7.3026 ms/it.

2016-10-09, 12:33   #2537
airsquirrels
"David"

Jul 2015
Ohio

11×47 Posts

I upgraded a couple of systems from 6.5 to 8.0 last night. My Titan/Black based systems saw a 4.6% gain after letting everything burn up to normal temps. I'm not sure how meaningful any one-off benchmark will be, given the amount of active thermal/power management on most GPUs. The Titan X saw no improvement going from 7.5 to 8.0.
2016-11-06, 04:06   #2538
storm5510
Random Account

Aug 2009
U.S.A.

2·3·13·23 Posts

Quote:
 Originally Posted by flashjh I uploaded a new Windows file to sourceforge with the CUDA 8.0 version of CUDALucas. It's here. Remember to download the CUDA 8.0 Libs from here.
Thank you! I had been looking for the libraries for a day or so. I'm only lacking an INI file. It replies "using defaults for non-specified options." then goes on. Where can I find the INI file?

Thanks,
Dwayne.

2016-11-06, 05:30   #2539
storm5510
Random Account

Aug 2009
U.S.A.

2×3×13×23 Posts

Disregard the request above. I found it with a bit more searching. I changed the screen output options so I could see what was going on. I reserved one doublecheck from PrimeNet. CUDALucas reports it can complete it in a little under six days.

Now for my quandary: Prime95 indicates it can do the test in the same amount of time. Six days for an LL test is nothing to sneeze at. CUDALucas did not seem to be utilizing my GPU as much as I thought it might. It's a GTX-750Ti. I could tell by observing the core temperature: mfaktc runs it in the upper 50s on the C scale, while CUDALucas only made it into the low 40s.

Just in case anyone wonders about my setup, it all runs with CUDA 8.
2016-11-06, 06:54   #2540
kladner
"Kieren"
Jul 2011
In My Own Galaxy!

23646₈ Posts

Quote:
 Originally Posted by storm5510 Disregard the request above. I found it with a bit more searching. I changed the screen output options so I could see what was going on. I reserved one doublecheck from PrimeNet. CUDALucas reports it can complete it in a little under six days. Now for my quandary: Prime95 indicates it can do the test in the same amount of time. Six days for an LL test is nothing to sneeze at. CUDALucas did not seem to be utilizing my GPU as much as I thought it might. It's a GTX-750Ti. I could tell by observing the core temperature. mfaktc runs it in the upper 50s on the C scale. CUDALucas only made it into the low 40s. Just in case anyone wonders about my setup, it all runs with CUDA 8.
That card should do better. I run CuLu on a (very overclocked) GTX 460 with the 6.5 libraries. You would probably do better with 6.5, as well. CUDA 8.0 mainly seems to benefit the GTX 10-series architecture. The 750ti has 640 CUDA cores, at a base clock of 1020 MHz. The 460 has 336 CUDA cores, and mine is running at 848 MHz. I did have to slow the memory from 1900 to 1700 MHz to get reliable results with CuLu.

Temperature is not a good guide comparing mfaktc and CUDALucas. In my experience, mfaktc runs a card hotter. This may have to do with greater (throttled) floating point usage under CuLu. (I could be wrong on this point.)

The 460 mentioned above, with all conditions equal (voltage, clock, no competition from P95, same ambient), stabilizes at 64 C with CuLu, at 7.2521 ms/it. It does a 40.8M LLDC in 3-4 days. This is about twice as fast as an FX-8350 worker, when running 2 workers with 4 threads each. Just about any i5 or i7 chip from Sandy Bridge on would beat the snot out of the AMD CPU.
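The "3-4 days" figure follows from the ms/it number: an LL test of M_p runs p - 2 iterations, so the total time is roughly (p - 2) × ms/it. A minimal sketch of that arithmetic, assuming "40.8M" means exactly 40,800,000 (the post gives no exact exponent) and ignoring checkpointing, error checks, and thermal throttling; `ll_eta_days` is a hypothetical helper, not a CUDALucas function.

```python
SECONDS_PER_DAY = 86_400


def ll_eta_days(exponent, ms_per_iter):
    """Rough wall-clock estimate, in days, for an LL test of M(exponent).

    An LL test of M_p performs p - 2 squaring iterations, each taking
    ms_per_iter milliseconds at the measured speed.
    """
    iterations = exponent - 2
    return iterations * ms_per_iter / 1000.0 / SECONDS_PER_DAY


if __name__ == "__main__":
    # 7.2521 ms/it is the GTX 460 figure from the post above.
    print(f"{ll_eta_days(40_800_000, 7.2521):.2f} days")
```

This comes out at about 3.42 days, consistent with the 3-4 days quoted.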

The 460 holds at 67 C running mfaktc, where it delivers ~206 GHz-d/d. In both cases, usage was 100%, according to MSI Afterburner. This is a secondary card. It is not driving the display.

If you are running Windows, I really recommend Afterburner. Even if you don't use it to OC, it has nice, configurable monitoring functions. Finding out what the actual usage is would be a good start to analyzing your performance.

Here is a question: have you run CUFFTbench and threadbench on this card?

Last fiddled with by kladner on 2016-11-06 at 07:31

2016-11-06, 13:48   #2541
henryzz
Just call me Henry
"David"

Sep 2007
Cambridge (GMT/BST)

13242₈ Posts

Doesn't the 460 have a much better single precision/double precision ratio than the 750ti?
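Henry's point can be made concrete with a back-of-the-envelope peak double-precision estimate, since CUDALucas does its FFTs in double precision. This is a rough sketch under stated assumptions, not a measured result: it counts one fused multiply-add (2 FLOPs) per CUDA core per shader clock, assumes the GTX 750 Ti (Maxwell) runs FP64 at 1/32 of its FP32 rate at the 1020 MHz base clock, and assumes the GTX 460 (Fermi) runs FP64 at 1/12 rate with its shader clock at twice kladner's 848 MHz core clock. It ignores memory bandwidth and boost clocks; `peak_dp_gflops` is a hypothetical helper.

```python
def peak_dp_gflops(cuda_cores, shader_clock_ghz, sp_to_dp_ratio):
    """Theoretical peak double-precision GFLOPS.

    Single precision: one FMA (2 FLOPs) per CUDA core per shader clock.
    Double precision: that figure divided by the card's SP:DP throughput ratio.
    """
    sp_gflops = cuda_cores * shader_clock_ghz * 2
    return sp_gflops / sp_to_dp_ratio


# Assumed: GTX 750 Ti, 640 cores at 1.020 GHz base, FP64 at 1/32 rate.
gtx_750ti = peak_dp_gflops(640, 1.020, 32)
# Assumed: GTX 460, 336 cores, shader clock = 2 x 0.848 GHz core clock, FP64 at 1/12 rate.
gtx_460 = peak_dp_gflops(336, 2 * 0.848, 12)

if __name__ == "__main__":
    print(f"GTX 750 Ti: {gtx_750ti:.1f} DP GFLOPS (peak, theoretical)")
    print(f"GTX 460:    {gtx_460:.1f} DP GFLOPS (peak, theoretical)")
```

Under these assumptions the overclocked 460 has a bit over twice the 750 Ti's peak DP throughput, which would fit kladner's observation that the older card does better at CuLu.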


