Prime95 vs. CUDALucas
This is in regard to the following assignment:
Edit: I am using an Nvidia GTX 750 Ti and CUDA 8.

And what processor are you comparing it against?
Modern graphics cards do not have exceptionally good double-precision performance; a GTX 1080 is 256 gigaflops peak, which is the same peak as a quad-core 4 GHz Haswell. The GTX 750 Ti is about 40 gigaflops peak, so slower than a single core of a 4 GHz Haswell.
Last fiddled with by fivemack on 2016-11-28 at 08:45
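The Haswell figure can be checked with back-of-the-envelope arithmetic. This is a rough sketch, assuming 16 double-precision FLOPs per core per cycle (two 256-bit FMA units, 4 doubles each, 2 operations per FMA); the function name is purely illustrative:

```python
def peak_dp_gflops(cores, clock_ghz, flops_per_cycle=16):
    """Theoretical peak double-precision GFLOPS for a CPU.

    flops_per_cycle=16 is an assumption for Haswell with AVX2 FMA:
    2 FMA units x 4 doubles per 256-bit register x 2 ops (multiply + add).
    """
    return cores * clock_ghz * flops_per_cycle


quad = peak_dp_gflops(4, 4.0)    # quad-core at 4 GHz
single = peak_dp_gflops(1, 4.0)  # one core at 4 GHz
print(quad, single)
```

Under that assumption the quad-core peak matches the 256 gigaflops quoted for the GTX 1080, and a single core still comes out ahead of the roughly 40 gigaflops of the GTX 750 Ti. These are theoretical peaks, not measured throughput.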
The last Nvidia cards with "good" double-precision performance were the GTX 580/590, then the original Titan from 2013 and the Titan Black / Titan Z from 2014 in the 700 series.
By "good" I mean 1/3 of their single-precision performance. All consumer cards since have DP performance of 1/24 or 1/32 of their SP performance: http://www.mersenne.ca/cudalucas.php
Maybe you should use the SP performance for factoring with mfaktc instead. Your 750 Ti has 1306 GFLOPS SP and 40.8 GFLOPS DP: https://en.wikipedia.org/wiki/GeForce_700_series
Last fiddled with by ATH on 2016-11-28 at 10:56
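The 1/32 ratio falls straight out of the two 750 Ti figures quoted above. A quick check, using only those two numbers (the variable names are just for illustration):

```python
sp_gflops = 1306.0  # GTX 750 Ti single precision, as quoted above
dp_gflops = 40.8    # GTX 750 Ti double precision, as quoted above

ratio = sp_gflops / dp_gflops
print(f"DP runs at roughly 1/{ratio:.0f} of SP throughput")
```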
i5-3570 @ 3.4 GHz.
I think that ATH was suggesting that you use mfaktc instead of Culu, rather than switching modes of one or the other. 

Yes, CUDALucas requires double precision, and it is therefore slow because it runs at only 1/32 of your card's single-precision performance.
It would probably be more beneficial for GIMPS, and for the number of GHz-days accumulating on your account (if you care about that), if you do factoring on the card with mfaktc (single precision) instead of LL tests with CUDALucas (double precision).
Last fiddled with by ATH on 2016-11-28 at 17:30
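For anyone unsure what an LL test actually computes, here is a minimal sketch of the Lucas-Lehmer iteration using Python's built-in big integers. This is not how CUDALucas or Prime95 implement it: both do the squaring with floating-point FFT multiplication, which is where the double-precision requirement comes from. The sketch only shows the recurrence that is run p - 2 times:

```python
def lucas_lehmer(p):
    """Return True if the Mersenne number 2**p - 1 is prime (p an odd prime)."""
    m = (1 << p) - 1          # the Mersenne number being tested
    s = 4                     # standard starting value
    for _ in range(p - 2):
        s = (s * s - 2) % m   # one LL iteration: square, subtract 2, reduce mod m
    return s == 0


print([p for p in (3, 5, 7, 11, 13) if lucas_lehmer(p)])
```

Each iteration is one big squaring modulo 2**p - 1, so the time per iteration is set almost entirely by how fast the hardware can do that multiplication.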
This is primarily what I have been doing. I wanted to see how CUDALucas would perform on this hardware. Obviously, not as well as on others. Case closed.

Quote:
It's not impossible, just less efficient. 

Per iteration, slower. That's what I mean by "less efficient". Otherwise it would have been implemented by now.

