It probably wouldn't be too hard to convert my Fermat code to a MR test. The time complexity should be the same. I think the O(log
2n*loglog n) you wrote for Fermat uses an FFT multiply which is only faster for very large numbers.
http://www.janfeitsma.nl/math/psp2/statistics
From the Fietsma list there is about 1 2-PRP every 3E11 from 1E19 to 2
64. This is about 4x more than the 2-SPRP. I think there will be about 500 gaps >= 1400 from 2
64 to 2
65 and each will require about 70 prime checks on average. The chance that a gap above 1400 from 2
64 to 2
65 is missed due to a PRP is about 1 in 3E11/500/70 = 1 in 8E6.
I'm currently saving gaps >= 1300 using an approach similar to yours. All gaps found in the GPU above 1300 are sent back to the CPU for a more thorough check using the gmp library.