mersenneforum.org

Relative performance of GPUs for P1 (https://www.mersenneforum.org/showthread.php?t=26087)

petrw1 2020-10-16 04:53

Relative performance of GPUs for P1
 
I realize P1 as a separate task is discontinued ... however ...

I am still running the version that allows it:
Does it seem reasonable that, for the various Colab GPUs available, I am seeing relative Stage1 iteration times of (based on my specific B1, but still relative):

P4: 3,600
T4: 2,630
K80: 1,800
P100: 470 (yes 4 to 8 times faster)
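The "4 to 8 times faster" remark can be checked directly from the four figures above. The following is only a small sketch of that arithmetic; the helper name `speedup_vs` is made up for illustration, and the units of the timings are left unspecified because the post does not state them.

```python
# Stage 1 iteration times exactly as posted above (units not stated in the post).
times = {"P4": 3600, "T4": 2630, "K80": 1800, "P100": 470}


def speedup_vs(times, baseline):
    """Return how many times faster `baseline` is than each other GPU.

    Hypothetical helper for illustration: a lower iteration time means a
    faster GPU, so the factor is other_time / baseline_time.
    """
    base = times[baseline]
    return {gpu: t / base for gpu, t in times.items() if gpu != baseline}


for gpu, factor in speedup_vs(times, "P100").items():
    print(f"P100 is {factor:.1f}x faster than the {gpu}")
```

On these numbers the P100 comes out roughly 3.8x faster than the K80 and roughly 7.7x faster than the P4, which matches the "4 to 8 times" estimate.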

tServo 2020-10-16 12:15

[QUOTE=petrw1;560015]I realize P1 as a separate task is discontinued ... however ...

I am still running the version that allows it:
Does it seem reasonable that, for the various Colab GPUs available, I am seeing relative Stage1 iteration times of (based on my specific B1, but still relative):

P4: 3,600
T4: 2,630
K80: 1,800
P100: 470 (yes 4 to 8 times faster)[/QUOTE]

Yes, these times make perfect sense.

Neither the P4 nor the T4 has many FP64 cores available, and these cores are essential for Stage1 performance. Their specs are fairly close, but since the T4 is newer, with faster memory and a few other improvements, it should be faster than the P4.
Even though the K80 is quite old, it still has decent FP64 performance AND it has 2 GPUs.
The P100 has lots of FP64 cores, so it will yield the best performance.

AFAIK the P4 and T4 are touted as being designed explicitly for training AIs, since that work does not require high-precision computations.

chalsall 2020-10-16 13:46

[QUOTE=tServo;560044]Even though the K80 is quite old, it still has decent FP64 performance AND it has 2 GPUs.[/QUOTE]

The K80 itself has two (2#) GPUs on the card, but only one (1#) is given to each VM.

kriesel 2020-10-16 14:12

[QUOTE=petrw1;560015]I realize P1 as a separate task is discontinued ... however ...

I am still running the version that allows it:
Does it seem reasonable that, for the various Colab GPUs available, I am seeing relative Stage1 iteration times of (based on my specific B1, but still relative):

P4: 3,600
T4: 2,630
K80: 1,800
P100: 470 (yes 4 to 8 times faster)[/QUOTE]
us/iteration for ~100M exponents? The time required for any FFT-based multiplication mod m is strongly related to log2(m); it scales roughly as p[SUP]1.1[/SUP] for a Mersenne number m=2[SUP]p[/SUP]-1. Some data for Colab GPUs is at [url]https://www.mersenneforum.org/showpost.php?p=533245&postcount=15[/url], showing that the P4 and T4 have a DP/SP throughput ratio of 1/32, which makes them better suited to TF and poorly suited to LL, PRP, and P-1.
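The p[SUP]1.1[/SUP] rule of thumb above can be turned into a quick way of comparing iteration times at different exponents on the same GPU. This is a minimal sketch under that stated approximation only; the function name and the 50M / 100M example exponents are illustrative assumptions, and real timings also jump at FFT-length boundaries, which this ignores.

```python
def relative_iteration_time(p_new, p_ref, power=1.1):
    """Estimate how much longer one iteration takes at exponent p_new
    than at exponent p_ref, assuming time scales as p**power.

    The default power of 1.1 is the rough figure quoted in the post,
    not a measured constant for any particular GPU or program.
    """
    return (p_new / p_ref) ** power


# Illustrative exponents (assumed, not taken from the thread):
ratio = relative_iteration_time(100_000_000, 50_000_000)
print(f"Doubling the exponent costs about {ratio:.2f}x per iteration")
```

Under this approximation, doubling the exponent slightly more than doubles the per-iteration time, so timings taken at different exponents should not be compared directly.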

petrw1 2020-10-16 18:19

[QUOTE=kriesel;560057]us/iteration for ~100M exponents?[/QUOTE]

44.6M
B1=1,250,000
B2=25,000,000
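For context on what these bounds mean for run time: P-1 stage 1 raises a base to the product of all prime powers not exceeding B1, so the number of squarings is roughly the bit length of that product, which is close to B1/ln(2). The sketch below estimates that count for B1=1,250,000 and then converts the posted iteration times to a total. Treating the posted values as microseconds per iteration is an assumption (it is kriesel's guess and is not confirmed in the thread), the function name is made up, and the few extra iterations a real program adds are ignored.

```python
import math


def stage1_exponent_bits(b1):
    """Approximate bit length of the product of all prime powers <= b1."""
    # Sieve of Eratosthenes over 0..b1.
    sieve = bytearray([1]) * (b1 + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(b1) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, b1 + 1, i)))
    bits = 0.0
    for p in range(2, b1 + 1):
        if sieve[p]:
            # Largest power of p that does not exceed b1.
            pk = p
            while pk * p <= b1:
                pk *= p
            bits += math.log2(pk)
    return bits


B1 = 1_250_000
bits = stage1_exponent_bits(B1)
print(f"about {bits / 1e6:.2f}M squarings (B1/ln 2 = {B1 / math.log(2) / 1e6:.2f}M)")

# ASSUMPTION: the posted times are microseconds per iteration.
posted_times = {"P4": 3600, "T4": 2630, "K80": 1800, "P100": 470}
for gpu, us_per_iter in posted_times.items():
    minutes = bits * us_per_iter * 1e-6 / 60
    print(f"{gpu}: roughly {minutes:.0f} minutes for stage 1")
```

If the microsecond assumption is wrong, only the final loop changes; the squaring count depends on B1 alone.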


All times are UTC.