Originally Posted by petrw1
I realize P1 as a separate task is discontinued ... however ...

I am still running the version that allows it:
Does it seem reasonable that for the various Colab GPUs available I am seeing relative Stage1 iteration times of (based on my specific B1 but still relative):

P4: 3,600
T4: 2,630
K80: 1,800
P100: 470 (yes 4 to 8 times faster)
us/iteration for ~100M exponents?

Time required for any FFT-based multiplication mod m is strongly related to log2(m); it scales roughly as p^1.1 for a Mersenne number m = 2^p - 1. Some data for Colab GPUs shows that the P4 & T4 have a DP/SP throughput ratio of 1/32, making them better suited for TF and not well suited for LL, PRP, or P-1.
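To put numbers on the comparison, here is a rough Python sketch that works out how each GPU's quoted time compares with the P100 and rescales a timing to a different exponent using the p^1.1 rule of thumb. The function names `ratio_to` and `scale_time` are made up for illustration, and the 1.1 exponent is only the approximate scaling mentioned above, not a measured fit.

```python
# Stage 1 iteration times in microseconds, copied from the quoted post
# (roughly 100M exponents, one specific B1, so only the ratios are meaningful).
us_per_iter = {"P4": 3600, "T4": 2630, "K80": 1800, "P100": 470}


def ratio_to(reference, times):
    """Return each GPU's iteration time divided by the reference GPU's time."""
    ref = times[reference]
    return {gpu: t / ref for gpu, t in times.items()}


def scale_time(t_us, p_from, p_to, exponent=1.1):
    """Rescale an iteration time from exponent p_from to p_to.

    Assumes time per iteration grows roughly as p**exponent (an approximation).
    """
    return t_us * (p_to / p_from) ** exponent


ratios = ratio_to("P100", us_per_iter)
for gpu, r in ratios.items():
    print(f"{gpu}: {r:.1f}x the P100 time per iteration")

# Hypothetical example: the P100 timing rescaled from a 100M to a 200M exponent.
print(f"P100 at 200M: about {scale_time(470, 100e6, 200e6):.0f} us/iteration")
```

The K80 and P4 ratios come out near 4 and 8, which matches the "4 to 8 times faster" remark in the quoted post.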
