cudaOwl
I added an initial CUDA backend to gpuOwl. I expect it to be rough, buggy and not yet optimized, but it's a start.
The approach I ended up with was to keep most of the codebase shared, but split out two backends: OpenCL and CUDA.
[I'm thinking, should I rename the previous gpuOwl to openOwl for symmetry with cudaOwl?]
So the savefile format, and much of the logic, is shared between cudaOwl and gpuOwl.
There are some notable differences though:
- gpuOwl supports "offset extension", which means varying the offset (aka "shift") when a PRP error is encountered. Unfortunately the gain is small: this trick achieves only about 0.5% exponent extension for a given FFT size. It was motivated by the severe lack of FFT size choice in openOwl. (cudaOwl doesn't have "offset".)
- cudaOwl has a rich choice of FFT sizes (unlike openOwl). FFT selection is controlled with the "-fft" argument, which accepts either explicit sizes such as 4096K or 4M, or steps relative to the "default" size for the exponent, such as +1 or -1.
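For readers unfamiliar with the "offset" (shift) idea mentioned above, here is a minimal sketch of the generic technique as applied to a PRP test of M_p = 2^p - 1. It is not taken from gpuOwl's source; the function names are hypothetical. The stored residue is the true residue multiplied by 2^offset; because 2^p ≡ 1 (mod M_p), each squaring simply doubles the offset mod p, and the true residue is recovered at the end by multiplying by 2^(p - offset). The likely benefit is that a different offset changes the data fed through the FFT, so a retry after an error does not reproduce the identical rounding error.

```python
# Hypothetical sketch of PRP-with-offset on M_p = 2^p - 1 (not gpuOwl's code).
# Big-integer arithmetic stands in for the FFT multiplication a GPU would use.

def prp_plain(p: int, iterations: int, base: int = 3) -> int:
    """Square `base` repeatedly mod M_p, with no offset."""
    m = (1 << p) - 1
    x = base % m
    for _ in range(iterations):
        x = x * x % m
    return x

def prp_with_offset(p: int, iterations: int, offset: int, base: int = 3) -> int:
    """Same computation, but the stored residue is x * 2^offset mod M_p."""
    m = (1 << p) - 1
    r = base * pow(2, offset, m) % m      # shifted starting value
    for _ in range(iterations):
        r = r * r % m                     # squares x and the 2^offset factor
        offset = (2 * offset) % p         # 2^p == 1 mod M_p, so reduce mod p
    # Undo the shift: 2^(p - offset) is the inverse of 2^offset mod M_p.
    return r * pow(2, (p - offset) % p, m) % m
```

Every offset yields the same final residue; for a prime M_p, p squarings of 3 give 3^(2^p) = 3^(M_p + 1) ≡ 9 (mod M_p), e.g. `prp_plain(31, 31)` returns 9.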
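To illustrate how a "-fft" value of the kinds described above could be interpreted, here is a small sketch. The list of available sizes and the function name are hypothetical (cudaOwl's actual size table and parsing code are not shown in this post); sizes are in K, where 1K = 1024 words, and the default size for the exponent is passed in rather than computed.

```python
# Hypothetical interpretation of a "-fft" argument; the size table is made up.
FFT_SIZES_K = [4096, 4224, 4352, 4480, 4608, 4864, 5120]  # sorted, in K

def parse_fft_arg(arg: str, default_k: int, sizes_k=FFT_SIZES_K) -> int:
    """Return the selected FFT size in K."""
    if arg[0] in "+-":
        # Relative step from the default size, e.g. "+1" or "-1".
        i = sizes_k.index(default_k) + int(arg)
        if not 0 <= i < len(sizes_k):
            raise ValueError(f"step {arg} is out of range")
        return sizes_k[i]
    # Explicit size with a K or M suffix, e.g. "4096K" or "4M".
    unit = arg[-1].upper()
    if unit not in "KM":
        raise ValueError(f"expected a K or M suffix: {arg}")
    size_k = int(arg[:-1]) * (1024 if unit == "M" else 1)
    if size_k not in sizes_k:
        raise ValueError(f"unsupported FFT size: {arg}")
    return size_k
```

With 4480K as the default, "4M" and "4096K" both select 4096K, while "+1" selects the next larger size in the table.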
A few nice things:
- it's possible to switch a savefile between CUDA and OpenCL mid-flight.
- it's possible to change the FFT size mid-flight.
Not so nice:
- the performance on the GTX 1080 is disappointing: 5.9 ms/it at the PRP wavefront with a 4480K FFT. (So I don't think it's a good idea to do PRP or LL on Nvidia yet; TF is probably a better fit for this 32-bit-oriented hardware.)
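To put the per-iteration figure above in perspective, a PRP test of M_p takes roughly p squarings. Assuming, purely for illustration, an exponent of about 85M at the wavefront (this value is not from the post), the total time is approximately:

```latex
T \approx p \cdot t_{\text{it}} = 85 \times 10^{6} \times 5.9\,\text{ms} \approx 5.0 \times 10^{5}\,\text{s} \approx 5.8\ \text{days}
```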
Last fiddled with by preda on 2018-06-25 at 10:46