I realize P-1 as a separate task is discontinued ... however ...
I am still running the version that allows it:
Does it seem reasonable that, for the various Colab GPUs available, I am seeing relative Stage 1 iteration times of (based on my specific B1, but still relative):
P4: 3,600
T4: 2,630
K80: 1,800
P100: 470 (yes 4 to 8 times faster)

µs/iteration for ~100M exponents?

Time required for any FFT-based multiplication mod m is strongly related to log2(m); roughly p^1.1 for a Mersenne number m = 2^p - 1. Some data for Colab GPUs is at
https://www.mersenneforum.org/showpo...5&postcount=15, showing that the P4 and T4 have DP throughput at only 1/32 of SP, which makes them better suited to TF and poorly suited to LL, PRP, and P-1.
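
To put numbers on the comparison, here is a minimal Python sketch using only the four timings quoted in the question. The function names `slowdown_vs` and `scale_time` are made up for illustration, and the 1.1 exponent is just the rough rule of thumb above, not a measured fit, so treat any rescaled value as a ballpark figure.

```python
# Stage 1 per-iteration times quoted in the question, in microseconds
# (for ~100M exponents, on one poster's specific B1).
timings_us = {"P4": 3600, "T4": 2630, "K80": 1800, "P100": 470}


def slowdown_vs(reference, timings):
    """Return each GPU's time as a multiple of the reference GPU's time."""
    ref = timings[reference]
    return {gpu: t / ref for gpu, t in timings.items()}


def scale_time(t_us, p_from, p_to, exponent=1.1):
    """Rescale a per-iteration time from exponent p_from to p_to.

    Assumes time grows roughly as p**exponent (rule of thumb, not a fit).
    """
    return t_us * (p_to / p_from) ** exponent


ratios = slowdown_vs("P100", timings_us)
for gpu, r in ratios.items():
    print(f"{gpu}: {r:.1f}x the P100 time per iteration")

# Hypothetical example: the P100 figure rescaled from a 100M to a 200M exponent.
print(f"P100 at ~200M: about {scale_time(470, 100e6, 200e6):.0f} us/iteration")
```

The ratios come out between roughly 4x and 8x, consistent with the "4 to 8 times faster" remark for the P100.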