Old 2018-06-25, 10:46   #429
preda
 
 
"Mihai Preda"
Apr 2015

1,453 Posts
cudaOwl

I added an initial CUDA backend to gpuOwl. I expect it to be rough, buggy and unoptimized for now, but it's a start.

The approach I ended up with was to keep most of the same codebase, but split out two backends, OpenCL and CUDA.

[I'm thinking, should I rename the previous gpuOwl to openOwl for symmetry with cudaOwl?]

So the savefile format, and much of the logic, is shared between cudaOwl and gpuOwl.

There are some notable differences though:
- gpuOwl supports "offset extension", which means varying the offset (aka "shift") when a PRP error is encountered. Unfortunately it's not a big deal: this trick achieves only about 0.5% exponent extension for a given FFT size. It was motivated by the severe lack of FFT size choice in openOwl. (cudaOwl doesn't have "offset".)

- cudaOwl has a rich choice of FFT sizes (unlike openOwl). FFT selection is controlled with the "-fft" argument, which accepts either a hard size such as 4096K or 4M, or a delta step from the "default" size for the exponent, such as +1 or -1.
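
To make the two forms of the "-fft" value concrete, here is a rough Python sketch of how such an argument could be resolved. This is not cudaOwl's actual code: the function name parse_fft_arg, the list EXAMPLE_SIZES, the binary meaning of the K/M suffixes (K = 1024, M = 1024*1024) and the idea that a delta moves one entry at a time through a sorted list of supported sizes are all assumptions made for illustration. Only the 4480K, 4096K and 4M values come from this post.

```python
K = 1024  # assumption: the "K" suffix is a binary multiplier

# Hypothetical list of supported FFT sizes; the real list in cudaOwl may differ.
EXAMPLE_SIZES = [4096 * K, 4480 * K, 5120 * K]


def parse_fft_arg(arg, default_size, available_sizes):
    """Resolve a "-fft"-style value to an FFT size (hypothetical semantics)."""
    if not arg:
        raise ValueError("empty FFT argument")
    sizes = sorted(available_sizes)
    if arg[0] in "+-":
        # Delta step: move that many entries away from the default size.
        index = sizes.index(default_size) + int(arg)
        if not 0 <= index < len(sizes):
            raise ValueError(f"delta {arg} is out of range")
        return sizes[index]
    # Hard size: an integer followed by a K or M suffix.
    multipliers = {"K": K, "M": K * K}
    suffix = arg[-1].upper()
    if suffix not in multipliers:
        raise ValueError(f"unrecognized FFT size: {arg}")
    return int(arg[:-1]) * multipliers[suffix]


if __name__ == "__main__":
    # Treat 4480K as the default size for the exponent being tested.
    for value in ("4096K", "4M", "+1", "-1"):
        print(value, "->", parse_fft_arg(value, 4480 * K, EXAMPLE_SIZES) // K, "K")
```

Under these assumptions "4096K" and "4M" name the same size, and "+1" / "-1" pick the neighbours of the default in the supported-size list.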

A few nice things:
- it's possible to switch the savefile between CUDA and OpenCL mid-flight.
- it's possible to change the FFT size mid-flight.

Not so nice:
- the performance on GTX 1080 is disappointing: 5.9 ms/it at the PRP wavefront with the 4480K FFT. (Thus I don't think it's such a good idea to do PRP or LL on Nvidia yet; TF is probably a better fit for the 32-bit-oriented hardware.)
