#2520 | |
"Viliam Furík"
Jul 2018
Martin, Slovakia
179₁₆ Posts |
Quote:
FP32 throughput for the RTX 2080 Ti is 11.75 TFLOPS, which translates to 5875 GHz-D/D, and that really is the most I can observe at stock settings. |
|
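The two figures in the post above imply a fixed conversion ratio. A minimal sketch of that arithmetic follows, assuming (and this is only an assumption) that GHz-D/D scales linearly with FP32 TFLOPS on other cards as well. The names `GHZD_PER_TFLOPS` and `estimate_ghzd_per_day` are hypothetical and do not come from any tool.

```python
# Ratio implied by the post's two figures. This is an inference from the
# quoted numbers, not an official constant of any program.
QUOTED_TFLOPS = 11.75        # FP32 TFLOPS quoted for the RTX 2080 Ti
QUOTED_GHZD_PER_DAY = 5875   # GHz-D/D quoted for the same card

GHZD_PER_TFLOPS = QUOTED_GHZD_PER_DAY / QUOTED_TFLOPS


def estimate_ghzd_per_day(fp32_tflops: float) -> float:
    """Scale a card's FP32 TFLOPS by the ratio implied above (assumed linear)."""
    return fp32_tflops * GHZD_PER_TFLOPS


print(GHZD_PER_TFLOPS)                       # 500.0
print(estimate_ghzd_per_day(QUOTED_TFLOPS))  # 5875.0
```

In other words, the post's numbers work out to 500 GHz-D/D per FP32 TFLOPS; whether real cards follow that ratio is something only measurements can tell.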
#2521 |
Jul 2009
Germany
223₁₆ Posts |
This one has similar speed to a GeForce GTX 980 Ti... (Once I have all the comparison values together, I should create a Top 100 ranking list.)
Code:
2020-10-21 23:33:27 Tesla T4-0 OpenCL compilation in 1.81 s
2020-10-21 23:33:29 Tesla T4-0 77936867 OK 0 loaded: blockSize 400, 0000000000000003
2020-10-21 23:33:29 Tesla T4-0 validating proof residues for power 8
2020-10-21 23:33:29 Tesla T4-0 Proof using power 8
2020-10-21 23:33:34 Tesla T4-0 77936867 OK 800 0.00%; 4247 us/it; ETA 3d 19:57; 1579c241dc63eca6 (check 1.82s)
2020-10-21 23:47:52 Tesla T4-0 77936867 OK 200000 0.26%; 4299 us/it; ETA 3d 20:50; f0b04b45b0855bd2 (check 1.85s)
2020-10-22 00:02:15 Tesla T4-0 77936867 OK 400000 0.51%; 4304 us/it; ETA 3d 20:43; c03f94396a5aa29e (check 1.85s)
2020-10-22 00:16:37 Tesla T4-0 77936867 OK 600000 0.77%; 4300 us/it; ETA 3d 20:22; b9decd65ca71b629 (check 1.84s)
Last fiddled with by moebius on 2020-10-22 at 02:05 |
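The ETA column in the log can be checked by hand: a PRP test of exponent p takes about p iterations, so the remaining time is roughly (p - iterations done) × us/it. The sketch below parses one progress line of the format shown above and recomputes the ETA. The regular expression and the helper names `eta_seconds` and `format_eta` are hypothetical, written only for this log layout; gpuowl's own rounding is not reproduced, so other lines may differ from the logged ETA by about a minute.

```python
import re

# Matches progress lines of the form
# "... 77936867 OK 600000 0.77%; 4300 us/it; ETA ..."
PROGRESS = re.compile(r"(\d+) OK\s+(\d+)\s+[\d.]+%; (\d+) us/it")


def eta_seconds(log_line: str) -> float:
    """Remaining seconds = (exponent - iterations done) * microseconds per iteration."""
    match = PROGRESS.search(log_line)
    if match is None:
        raise ValueError("not a progress line")
    exponent, done, us_per_it = (int(group) for group in match.groups())
    return (exponent - done) * us_per_it / 1e6


def format_eta(seconds: float) -> str:
    """Render seconds as 'Nd HH:MM', rounded to the nearest minute."""
    minutes = round(seconds / 60)
    days, rest = divmod(minutes, 24 * 60)
    hours, mins = divmod(rest, 60)
    return f"{days}d {hours:02d}:{mins:02d}"


line = ("2020-10-22 00:16:37 Tesla T4-0 77936867 OK 600000 0.77%; "
        "4300 us/it; ETA 3d 20:22; b9decd65ca71b629 (check 1.84s)")
print(format_eta(eta_seconds(line)))  # 3d 20:22
```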
#2522 | |
"/X\(‘-‘)/X\"
Jan 2013
556₁₈ Posts |
Quote:
The RTX 30xx series is the same, except that the INT32 cores can also do FP32, so it can give up to double the FP32 performance of the RTX 20xx series, but only equivalent INT32 performance for the same number of cores at the same frequency. |
|
#2523 | |
"Viliam Furík"
Jul 2018
Martin, Slovakia
13×29 Posts |
Quote:
But shouldn't the code then be reworked to use FP32? It seems like it should work, since FP32 has a much higher maximum value, and so it could potentially extend the range for the maximal exponent. (If so, please remove the minimal limit, too.)
The above is my view of how it could work; I may be absolutely wrong.
If it were successfully reworked, and the DPbySP experiment also turned out to be successful, GIMPS would buy out all the RTX 3080s and RTX 3090s (well, maybe not those, they are very expensive) within a few days. |
|
#2524 |
"Composite as Heck"
Oct 2017
761 Posts |
There is potential. It's been discussed a little on the forum, but from the sounds of it, it's not straightforward. There's no rush to buy or to experiment with an implementation: unlike the R7, which may only have had a production run measured in tens of thousands, there will eventually be millions of the 30 series.
You may be mildly overestimating the buying power of GIMPSters ;) |
#2525 | |
"Eric"
Jan 2018
USA
2²·53 Posts |
Quote:
Stock: 1600MHz Core, 850MHz HBM2 memory, 250W
Code:
gpuowl-win -prp 77936867 -maxAlloc 8192 -nospin
2020-10-22 19:30:10 gpuowl v7.0-66-gebe49cc
2020-10-22 19:30:10 Note: not found 'config.txt'
2020-10-22 19:30:10 config: -prp 77936867 -maxAlloc 8192 -nospin
2020-10-22 19:30:10 device 0, unique id ''
2020-10-22 19:30:10 TITAN V-0 77936867 FFT: 4M 1K:8:256 (18.58 bpw)
2020-10-22 19:30:10 TITAN V-0 77936867 OpenCL args "-DEXP=77936867u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=8u -DCARRY64=1 -DCARRYM64=1 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0xa.c42d0d7cec038p-5 -DIWEIGHT_STEP_MINUS_1=-0x8.0e50c8817ddf8p-5 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-10-22 19:30:10 TITAN V-0 77936867
2020-10-22 19:30:10 TITAN V-0 77936867 OpenCL compilation in 0.01 s
2020-10-22 19:30:10 TITAN V-0 77936867 maxAlloc: 8.0 GB
2020-10-22 19:30:10 TITAN V-0 77936867 P1(0) 0 bits
2020-10-22 19:30:10 TITAN V-0 77936867 PRP starting from beginning
2020-10-22 19:30:10 TITAN V-0 77936867 OK 0 loaded: blockSize 400, 0000000000000003
2020-10-22 19:30:10 TITAN V-0 77936867 validating proof residues for power 8
2020-10-22 19:30:10 TITAN V-0 77936867 Proof using power 8
2020-10-22 19:30:11 TITAN V-0 77936867 OK 800 0.00% 1579c241dc63eca6 596 us/it + check 0.27s + save 0.11s; ETA 12:54
2020-10-22 19:30:16 TITAN V-0 77936867 10000 0.01% fc4f135f7cf4ad29 588 us/it
2020-10-22 19:30:22 TITAN V-0 77936867 20000 0.03% 3cd1bd9d5e09cbc5 589 us/it
2020-10-22 19:30:28 TITAN V-0 77936867 30000 0.04% c4e0ff35e3290d98 590 us/it
2020-10-22 19:30:34 TITAN V-0 77936867 40000 0.05% dffe1b1b0d748128 590 us/it
2020-10-22 19:30:40 TITAN V-0 77936867 50000 0.06% 52e286945371ed29 590 us/it
2020-10-22 19:30:46 TITAN V-0 77936867 60000 0.08% 0945da4dc08bdd95 590 us/it
2020-10-22 19:30:52 TITAN V-0 77936867 70000 0.09% 7131fa4eb77f4bb2 590 us/it
2020-10-22 19:30:58 TITAN V-0 77936867 80000 0.10% 8d76071d27ee4221 591 us/it
2020-10-22 19:31:04 TITAN V-0 77936867 90000 0.12% 0bacff453b2f470e 590 us/it
2020-10-22 19:31:10 TITAN V-0 77936867 100000 0.13% 6d7296b9e2830f50 591 us/it
2020-10-22 19:31:12 TITAN V-0 77936867 Stopping, please wait..
2020-10-22 19:31:13 TITAN V-0 77936867 OK 104400 0.13% 587552d3b9350467 592 us/it + check 0.27s + save 0.11s; ETA 12:48
2020-10-22 19:31:13 TITAN V-0 Exiting because "stop requested"
2020-10-22 19:31:13 TITAN V-0 Bye
Code:
gpuowl-win -prp 77936867 -maxAlloc 8192 -nospin
2020-10-22 19:34:11 gpuowl v7.0-66-gebe49cc
2020-10-22 19:34:11 Note: not found 'config.txt'
2020-10-22 19:34:11 config: -prp 77936867 -maxAlloc 8192 -nospin
2020-10-22 19:34:11 device 0, unique id ''
2020-10-22 19:34:11 TITAN V-0 77936867 FFT: 4M 1K:8:256 (18.58 bpw)
2020-10-22 19:34:11 TITAN V-0 77936867 OpenCL args "-DEXP=77936867u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=8u -DCARRY64=1 -DCARRYM64=1 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0xa.c42d0d7cec038p-5 -DIWEIGHT_STEP_MINUS_1=-0x8.0e50c8817ddf8p-5 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-10-22 19:34:11 TITAN V-0 77936867
2020-10-22 19:34:11 TITAN V-0 77936867 OpenCL compilation in 0.01 s
2020-10-22 19:34:11 TITAN V-0 77936867 maxAlloc: 8.0 GB
2020-10-22 19:34:11 TITAN V-0 77936867 P1(0) 0 bits
2020-10-22 19:34:11 TITAN V-0 77936867 PRP starting from beginning
2020-10-22 19:34:12 TITAN V-0 77936867 OK 0 loaded: blockSize 400, 0000000000000003
2020-10-22 19:34:12 TITAN V-0 77936867 validating proof residues for power 8
2020-10-22 19:34:12 TITAN V-0 77936867 Proof using power 8
2020-10-22 19:34:12 TITAN V-0 77936867 OK 800 0.00% 1579c241dc63eca6 500 us/it + check 0.23s + save 0.11s; ETA 10:49
2020-10-22 19:34:17 TITAN V-0 77936867 10000 0.01% fc4f135f7cf4ad29 494 us/it
2020-10-22 19:34:22 TITAN V-0 77936867 20000 0.03% 3cd1bd9d5e09cbc5 495 us/it
2020-10-22 19:34:27 TITAN V-0 77936867 30000 0.04% c4e0ff35e3290d98 496 us/it
2020-10-22 19:34:32 TITAN V-0 77936867 40000 0.05% dffe1b1b0d748128 497 us/it
2020-10-22 19:34:37 TITAN V-0 77936867 50000 0.06% 52e286945371ed29 497 us/it
2020-10-22 19:34:42 TITAN V-0 77936867 60000 0.08% 0945da4dc08bdd95 498 us/it
2020-10-22 19:34:47 TITAN V-0 77936867 70000 0.09% 7131fa4eb77f4bb2 499 us/it
2020-10-22 19:34:52 TITAN V-0 77936867 80000 0.10% 8d76071d27ee4221 499 us/it
2020-10-22 19:34:57 TITAN V-0 77936867 90000 0.12% 0bacff453b2f470e 500 us/it
2020-10-22 19:35:02 TITAN V-0 77936867 100000 0.13% 6d7296b9e2830f50 500 us/it
2020-10-22 19:35:07 TITAN V-0 77936867 110000 0.14% 8cbfd4435622bda7 500 us/it
2020-10-22 19:35:08 TITAN V-0 77936867 Stopping, please wait..
2020-10-22 19:35:09 TITAN V-0 77936867 OK 113600 0.15% fb675f1fc2063c9b 501 us/it + check 0.23s + save 0.11s; ETA 10:50
2020-10-22 19:35:09 TITAN V-0 Exiting because "stop requested"
2020-10-22 19:35:09 TITAN V-0 Bye
It seems that the new version doesn't let me use CARRY32, which the older 6.11 version did, and 6.11 appears to run faster. Here's the result for 6.11 on the same exponent:
Code:
gpuowl -device 0 -carry short -use CARRY32,ORIG_SLOWTRIG,IN_WG=128,IN_SIZEX=16,IN_SPACING=4,OUT_WG=128,OUT_SIZEX=16,OUT_SPACING=4 -nospin -block 100 -maxAlloc 10000 -B1 750000 -rB2 20 -prp 77936867
2020-10-22 19:36:40 gpuowl v6.11-364-g36f4e2a
2020-10-22 19:36:40 Note: not found 'config.txt'
2020-10-22 19:36:40 config: -device 0 -carry short -use CARRY32,ORIG_SLOWTRIG,IN_WG=128,IN_SIZEX=16,IN_SPACING=4,OUT_WG=128,OUT_SIZEX=16,OUT_SPACING=4 -nospin -block 100 -maxAlloc 10000 -B1 750000 -rB2 20 -prp 77936867
2020-10-22 19:36:40 device 0, unique id ''
2020-10-22 19:36:40 TITAN V-0 77936867 FFT: 4M 1K:8:256 (18.58 bpw)
2020-10-22 19:36:40 TITAN V-0 Expected maximum carry32: 583B0000
2020-10-22 19:36:40 TITAN V-0 OpenCL args "-DEXP=77936867u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=8u -DPM1=0 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0xa.c42d0d7cec038p-5 -DIWEIGHT_STEP_MINUS_1=-0x8.0e50c8817ddf8p-5 -DCARRY32=1 -DIN_SIZEX=16 -DIN_SPACING=4 -DIN_WG=128 -DORIG_SLOWTRIG=1 -DOUT_SIZEX=16 -DOUT_SPACING=4 -DOUT_WG=128 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-10-22 19:36:40 TITAN V-0
2020-10-22 19:36:40 TITAN V-0 OpenCL compilation in 0.01 s
2020-10-22 19:36:40 TITAN V-0 77936867 OK 0 loaded: blockSize 100, 0000000000000003
2020-10-22 19:36:40 TITAN V-0 validating proof residues for power 8
2020-10-22 19:36:40 TITAN V-0 Proof using power 8
2020-10-22 19:36:41 TITAN V-0 77936867 OK 200 0.00%; 502 us/it; ETA 0d 10:53; 2619e0f0cb78fe50 (check 0.09s)
2020-10-22 19:38:16 TITAN V-0 77936867 OK 200000 0.26%; 478 us/it; ETA 0d 10:19; f0b04b45b0855bd2 (check 0.20s)
2020-10-22 19:39:52 TITAN V-0 77936867 OK 400000 0.51%; 480 us/it; ETA 0d 10:21; c03f94396a5aa29e (check 0.09s)
2020-10-22 19:40:50 TITAN V-0 Stopping, please wait..
2020-10-22 19:40:50 TITAN V-0 77936867 OK 519700 0.67%; 480 us/it; ETA 0d 10:20; 19d648e17333ad91 (check 0.09s)
2020-10-22 19:40:50 TITAN V-0 Exiting because "stop requested"
2020-10-22 19:40:50 TITAN V-0 Bye
Last fiddled with by xx005fs on 2020-10-23 at 02:47 |
|
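To see which compile-time options actually differ between the v7.0 and v6.11 runs, the `-D` defines in the two "OpenCL args" strings can be compared mechanically. The sketch below does that; `parse_defines` is a hypothetical helper, and the two argument strings are copied from the logs above with the identical `WEIGHT_STEP` defines and the `-cl-*` flags left out for brevity.

```python
def parse_defines(opencl_args: str) -> dict[str, str]:
    """Collect -DNAME=VALUE tokens from an OpenCL argument string."""
    defines = {}
    for token in opencl_args.split():
        if token.startswith("-D") and "=" in token:
            name, value = token[2:].split("=", 1)
            defines[name] = value
    return defines


# Abbreviated copies of the argument strings in the logs above.
V7_ARGS = ("-DEXP=77936867u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=8u "
           "-DCARRY64=1 -DCARRYM64=1 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DMAX_ACCURACY=1")
V6_ARGS = ("-DEXP=77936867u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=8u "
           "-DPM1=0 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DMAX_ACCURACY=1 -DCARRY32=1 "
           "-DIN_SIZEX=16 -DIN_SPACING=4 -DIN_WG=128 -DORIG_SLOWTRIG=1 "
           "-DOUT_SIZEX=16 -DOUT_SPACING=4 -DOUT_WG=128")

v7 = parse_defines(V7_ARGS)
v6 = parse_defines(V6_ARGS)

only_v7 = sorted(v7.keys() - v6.keys())
only_v6 = sorted(v6.keys() - v7.keys())

print("only in the v7.0 run: ", only_v7)   # ['CARRY64', 'CARRYM64']
print("only in the v6.11 run:", only_v6)
```

The diff confirms what the post observes: the v7.0 runs were built with `CARRY64`/`CARRYM64`, while the v6.11 run used `CARRY32` plus the `-use` tuning options. It says nothing about how much of the speed difference is due to the carry width alone.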
#2526 |
Jul 2009
Germany
547 Posts |
Thank you very much. As I expected, the Titan V is so far the second best at 478 us/it, compared with 442 us/it for a Tesla V100-SXM2-16GB. I'm already working on an application-oriented top list for gpuowl, which I will publish here in the forum.
|
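A ranking of the kind described here can be sketched in a few lines. The timings below are the ones quoted in this thread for exponent 77936867 (the Tesla T4 value is rounded from the roughly 4300 us/it in its log); the dictionary layout and labels are only an illustration, not the format of the planned top list.

```python
# us/it for exponent 77936867 as reported in this thread (lower is faster).
timings_us_per_it = {
    "Tesla T4": 4300,              # approximate, from the T4 log
    "TITAN V": 478,                # best value from the gpuowl 6.11 run
    "Tesla V100-SXM2-16GB": 442,   # value quoted by moebius
}

# Sort ascending by time per iteration.
ranking = sorted(timings_us_per_it.items(), key=lambda item: item[1])
fastest_time = ranking[0][1]

for rank, (gpu, us_per_it) in enumerate(ranking, start=1):
    ratio = us_per_it / fastest_time
    print(f"{rank}. {gpu}: {us_per_it} us/it ({ratio:.2f}x the fastest time)")
```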
#2527 | |
P90 years forever!
Aug 2002
Yeehaw, FL
1110010110110₂ Posts |
Quote:
undervolted, underclocked to sclk=3, mem overclocked to 1200:
Code:
2020-10-23 04:23:47 gfx906+sram-ecc-0 77936867 OK 800 0.00%; 556 us/it; ETA 0d 12:02; 1579c241dc63eca6 (check 0.39s)
2020-10-23 04:24:04 gfx906+sram-ecc-0 77936867 OK 30000 0.04%; 561 us/it; ETA 0d 12:08; c4e0ff35e3290d98 (check 0.39s)
2020-10-23 04:24:21 gfx906+sram-ecc-0 77936867 OK 60000 0.08%; 560 us/it; ETA 0d 12:07; 0945da4dc08bdd95 (check 0.39s)
Code:
2020-10-23 04:30:52 gfx906+sram-ecc-0 77936867 OK 270000 0.35%; 985 us/it; ETA 0d 21:15; dc349756c5f05abf (check 0.57s)
2020-10-23 04:31:01 gfx906+sram-ecc-0 77936867 OK 270000 0.35%; 986 us/it; ETA 0d 21:16; dc349756c5f05abf (check 0.57s)
2020-10-23 04:32:22 gfx906+sram-ecc-0 77936867 OK 360000 0.46%; 985 us/it; ETA 0d 21:14; 992df79b843f90de (check 0.57s)
2020-10-23 04:32:32 gfx906+sram-ecc-0 77936867 OK 360000 0.46%; 985 us/it; ETA 0d 21:14; 992df79b843f90de (check 0.57s)
undervolted, underclocked (slightly) to sclk=4, mem overclocked to 1200:
Code:
2020-10-23 04:26:43 gfx906+sram-ecc-0 77936867 OK 90000 0.12%; 526 us/it; ETA 0d 11:22; 0bacff453b2f470e (check 0.38s)
2020-10-23 04:26:47 gfx906+sram-ecc-0 77936867 OK 97200 0.12%; 525 us/it; ETA 0d 11:22; ddaaad369befab47 (check 0.36s)
Code:
2020-10-23 04:27:51 gfx906+sram-ecc-0 77936867 OK 150000 0.19%; 920 us/it; ETA 0d 19:53; 127631386c6a9b17 (check 0.55s)
2020-10-23 04:28:01 gfx906+sram-ecc-0 77936867 OK 150000 0.19%; 920 us/it; ETA 0d 19:53; 127631386c6a9b17 (check 0.54s)
2020-10-23 04:28:19 gfx906+sram-ecc-0 77936867 OK 180000 0.23%; 920 us/it; ETA 0d 19:53; 6bee5d054f770861 (check 0.54s)
2020-10-23 04:28:29 gfx906+sram-ecc-0 77936867 OK 180000 0.23%; 920 us/it; ETA 0d 19:53; 6bee5d054f770861 (check 0.56s) |
|
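The second log of each pair shows every checkpoint twice, about ten seconds apart, at nearly double the us/it. One reading, and it is only an assumption since the labels of those blocks are not preserved here, is that two gpuowl instances were running side by side on the same card. Under that assumption, the combined throughput can be compared with the single-instance figure as follows (`effective_us_per_it` is a hypothetical helper):

```python
def effective_us_per_it(us_per_it_each: float, instances: int) -> float:
    """Wall-clock microseconds per iteration when counting all instances together."""
    return us_per_it_each / instances


# Values taken from the logs above; "dual_each" assumes two concurrent instances.
runs = {
    "sclk=3": {"single": 556, "dual_each": 985},
    "sclk=4": {"single": 525, "dual_each": 920},
}

for label, run in runs.items():
    combined = effective_us_per_it(run["dual_each"], 2)
    gain = run["single"] / combined - 1
    print(f"{label}: {combined:.1f} us/it combined, "
          f"{gain:.1%} more throughput than one instance")
```

If the two-instance reading is right, running two tests at once costs each test a lot of latency but gives somewhat higher total throughput, which is why a per-instance and a per-card figure should not be mixed in one comparison.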
#2528 | |
Romulan Interpreter
Jun 2011
Thailand
5×17×109 Posts |
Quote:
Say, for example, you want to rewrite mfaktc (which uses int32) to use FP32, to speed it up on cards which have "pure FP32" hardware. On most cards the same units do either integer or FP32 processing, so you won't gain anything, but some gaming cards have dedicated FP32 cores inside, which suck at integer arithmetic, and there you may get a speedup.

But... a 32-bit register can only store 2^32 different values, regardless of how you "see" this register (i.e. regardless of the encoding you associate with it). In the "unsigned int32" encoding, you can put there any number from 0 to 2^32-1 exactly, i.e. losslessly, without losing information. It means that when you write 89, you read back 89. In the "fp32" encoding, you can store far fewer of the numbers in this range losslessly. Actually, only the integers up to 2^24, about 0.4% of the range, can all be stored exactly. For most of the "larger" numbers (or those smaller than 1, fractional, by the way), you write "x", but when you read back, you read "x+epsilon" or "x-epsilon". The encoding is not "exact". It is the same idea as counting: up to a hundred you do it one by one, but when you get higher, you say "a few hundred", or "a few thousand", or "the budget of this project is about five and a half million". You are no longer interested in the exact value, and look only at the most significant digits, as many as you can remember (store in the "space" in your brain).

That's not useful for integer arithmetic: you will need two FP32 registers to store the same information as you store in one int32 register, and that is worth it only if you can achieve double the speed (well, roughly; things are more complex than that).

The whole issue is that 32-bit floats represent numbers as "sign*1.fraction*2^exponent", where the sign, fraction, and exponent are all stored inside the 32-bit register. They take 32 bits in total, and their positions and sizes are fixed. As the sign is 1 bit, you have only 8 bits for the exponent and 23 bits for the fraction. Therefore, you can represent a very large number, like 618970019642690137449562112 (which is 2^89), by setting the exponent to 89 and the fraction to zero, but you will not be able to store most of the numbers in between, for example 16777217 (2^24+1), which is just a 25-bit number. If you google "the smallest positive integer that can't be stored in fp32" (or just go to Wikipedia and read the theory), you will find out a lot of interesting things.

On a smaller scale, imagine you have a 3-bit register. You can store in it a pattern between 000 binary (decimal zero) and 111 binary (decimal 7).

You can see this as an "unsigned integer on 3 bits", and then the information inside represents a number between 0 and 7, in binary order: 000=0, 001=1, 010=2, 011=3, 100=4, 101=5, 110=6, 111=7. No other possibility.

You can also consider it a "signed integer on 3 bits". In that case you need a bit to store the sign; let's say the first bit is the sign. Then the largest integer you can store is 3 (using the two remaining bits), and your values will be, in order: 100=-4, 101=-3, 110=-2, 111=-1, 000=0, 001=1, 010=2, 011=3. There is no other possibility (and yes, there is a reason to put them in that order: the additions and multiplications work properly, without changing the addition and multiplication rules).

You could also see the 3 bits as an "unsigned float on 3 bits", and in that case the information inside will represent (I use letters for some decimal numbers to avoid confusion with binary 0 and 1): 000=100=zero*, 110=0.25, 111=0.5, 001=one, 010=3, 011=7. The advantage is that you can store "higher numbers", as well as numbers which are not integers, but you lose accuracy, as you can't store all the numbers in between. To store the integer 5 exactly, you will need two of these "3-bit registers".

So here you can store a "larger" number (as well as a smaller, fractional one) compared with the unsigned integer, but every time you write a 4, you will read back a 3, and every time you write a 6, you will read back a 7. But yes, you can store a "larger" number, for sure.

--------
*Edit: note that here you have 2 ways to store the value "zero". This is deliberate, because floats, in theory, NEVER represent exact values; therefore you may consider zero to be an infinitesimally small value, and it makes sense to have a positive and a negative one (like an "epsilon", in math, or even in programming).
Last fiddled with by LaurV on 2020-10-23 at 07:55 |
|
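The round-trip behaviour described in this post is easy to try out. The sketch below uses Python's standard `struct` module to store an integer as an IEEE 754 single-precision value and read it back; the helper names `roundtrip_fp32`, `is_exact_in_fp32`, and `split_u32` are hypothetical. The last helper illustrates the "two FP32 registers per int32" idea by splitting a 32-bit value into two 16-bit halves, which is just one possible split chosen for illustration.

```python
import struct


def roundtrip_fp32(n: int) -> int:
    """Store n as an IEEE 754 binary32 value and read it back as an integer."""
    packed = struct.pack("<f", float(n))
    return int(struct.unpack("<f", packed)[0])


def is_exact_in_fp32(n: int) -> bool:
    """True if n survives the trip through a 32-bit float unchanged."""
    return roundtrip_fp32(n) == n


def split_u32(n: int) -> tuple[int, int]:
    """Split a 32-bit value into two 16-bit halves, each passed through fp32."""
    high, low = n >> 16, n & 0xFFFF
    return roundtrip_fp32(high), roundtrip_fp32(low)


print(roundtrip_fp32(89))          # 89: small integers are exact
print(roundtrip_fp32(2**24 + 1))   # 16777216: the low bit is lost
print(is_exact_in_fp32(2**89))     # True: a power of two needs no fraction bits

# Two floats together carry a full 32-bit integer without loss.
high, low = split_u32(4294967295)
print((high << 16) | low)          # 4294967295
```

Splitting costs twice the registers and extra instructions for carries, which is the trade-off the post describes: it only pays off if the FP32 units are sufficiently faster than the integer path.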
#2529 |
Jul 2009
Germany
547 Posts |
Thanks for the trouble. I'll take the best value for one instance, because it should be a fair comparison. The only important thing is that gpuowl runs stably, without errors, with the selected settings.
Last fiddled with by moebius on 2020-10-23 at 05:22 |
#2530 | |
Aug 2020
37 Posts |
Quote:
108980089 and the result was refused, though it was assigned to me through Primenet. As I have seen the name of the PRP tester mentioned before, did this proof certification succeed? |
|