Originally Posted by diep
Yes, I want to search a couple of low weights at the same time there, because in the GPGPU program I wrote, half a dozen of them can be sieved at the same time on the GPU. The small primes p smaller than 64 bits (63 bits in length and shorter, though for 64 bits I intend to make a special kernel) get generated and then fed to the GPU, where I wrote some code to sieve on Nvidia GPUs.

The slowest part is the part I didn't write: generating the small primes on the CPU. I did write a siever for the CPU, but it's not ready for production usage; it's single-core and doesn't use SSE2 (let alone AVX), whereas what's available on the net uses SSE2 (SSSE3 on my old Xeons) and similarly great optimizations.

After that, LLR.

Maybe I should revive my CPU siever there and optimize it so that it feeds small primes faster than a perfect siever would.
Is that code public? I wonder if Mark Rodenkirch (rogue, the maintainer of srsieve) would want to take a peek at that. He didn't seem to think that a GPU Riesel siever (or even an AVX one) would be viable, due to the inconsistency in the baby-step giant-step discrete log runtimes.
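For anyone following along, here is a rough sketch of what that runtime inconsistency looks like. This is not srsieve's code or diep's kernel, just a minimal Python (3.8+) illustration. The function name `bsgs_riesel` and its parameters are made up for this post. It assumes p is an odd prime that does not divide k, and it finds the smallest n below `n_max` with p dividing k*2^n - 1, also returning how many giant steps were taken.

```python
from math import isqrt

def bsgs_riesel(k, p, n_max):
    """Hypothetical helper: smallest n in [0, n_max) with p | k*2^n - 1.

    Returns (n, giant_steps_used); n is None if no such n exists below n_max.
    Assumes p is an odd prime with gcd(k, p) == 1 and n_max >= 1.
    """
    # p | k*2^n - 1  <=>  2^n == k^(-1) (mod p)
    target = pow(k, -1, p)
    m = isqrt(n_max - 1) + 1          # smallest m with m*m >= n_max

    # Baby steps: remember the first j in [0, m) for each value 2^j mod p.
    baby = {}
    val = 1
    for j in range(m):
        baby.setdefault(val, j)
        val = val * 2 % p

    # Giant steps: walk target * 2^(-i*m) until it lands in the baby table.
    factor = pow(2, -m, p)
    gamma = target
    for i in range(m):
        j = baby.get(gamma)
        if j is not None:
            n = i * m + j
            # The first hit is the smallest solution; past n_max means none in range.
            return (n if n < n_max else None), i + 1
        gamma = gamma * factor % p
    return None, m
```

The giant-step loop exits as soon as it hits the table, so the number of iterations depends on p: some primes finish after one step, others run all m steps and find nothing. On a GPU, threads in the same warp handling different primes would have to wait for the slowest one, which is presumably the kind of inconsistency being referred to.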

