Edit to above:
There was no need to do the Lucas test at all. If a candidate is on the list of known 2-PSPs, it is known to be composite, so you could check whether it is a 2-PSP before doing the PRP test. I would probably do this by adding the list of 2-PSPs to the sieve.
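To make the idea concrete, here is a rough sketch in Python. It is only an illustration, not the code discussed above. The names (`LIMIT`, `PSP2`, `build_psp2_table`, `is_prime_below_limit`) are made up for this example, and the table is built by brute force for a tiny range rather than loaded from a published list of 2-PSPs, which is what a real sieve would presumably use.

```python
def is_prime_trial(n):
    """Slow but certain primality check, used only to build the small table."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


LIMIT = 10_000  # hypothetical bound for this sketch


def build_psp2_table(limit):
    """Odd composites n < limit with 2^(n-1) == 1 (mod n), i.e. base-2 pseudoprimes."""
    return frozenset(
        n for n in range(3, limit, 2)
        if pow(2, n - 1, n) == 1 and not is_prime_trial(n)
    )


PSP2 = build_psp2_table(LIMIT)


def is_prime_below_limit(n):
    """Base-2 PRP test plus a 2-PSP lookup; no Lucas test is needed for n < LIMIT."""
    if n >= LIMIT:
        raise ValueError("table only covers n < LIMIT")
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    if n in PSP2:
        # Known 2-PSP, so known composite: reject before the PRP test.
        return False
    # Anything that passes here and is not in the table must be prime.
    return pow(2, n - 1, n) == 1
```

Within the range the table covers, a number that passes the base-2 test and is not in the table has to be prime, which is why the second test becomes unnecessary.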

Also, it might be inefficient to do both an SPRP test and a Lucas test on the GPU. The parallel threads in a warp need to execute the same instruction, so some threads would probably sit idle while others were doing a Lucas test.
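A back-of-the-envelope model of that idle time, again in Python and again only a sketch. The cost numbers are invented (a Lucas test is assumed to cost twice an SPRP test), `warp_utilization` is a made-up helper, and real hardware scheduling is more complicated than this.

```python
WARP_SIZE = 32  # lanes that run in lockstep on current NVIDIA GPUs


def warp_utilization(needs_lucas, sprp_cost=1.0, lucas_cost=2.0):
    """Fraction of lane-time spent on useful work under a simple lockstep model.

    needs_lucas: one bool per lane, True if that lane goes on to a Lucas test.
    The costs are arbitrary units chosen for illustration only.
    """
    lanes = len(needs_lucas)
    # Every lane runs the SPRP test. If any lane takes the Lucas branch,
    # the whole group waits for it to finish.
    elapsed = sprp_cost + (lucas_cost if any(needs_lucas) else 0.0)
    useful = lanes * sprp_cost + sum(needs_lucas) * lucas_cost
    return useful / (lanes * elapsed)


# One lane out of 32 needing a Lucas test drags the whole group down.
one_straggler = [True] + [False] * (WARP_SIZE - 1)
print(round(warp_utilization(one_straggler), 3))
```

Under these made-up costs, utilization is 100% when no lane or every lane takes the Lucas branch, and drops sharply when only a few do.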
