#1
"Mihai Preda"
Apr 2015
2·23·29 Posts
In light of Nvidia's new GPU launch, it appears we need to find a way of doing big convolutions using single-precision floating point (FP32). This has been an elusive task in the past.
The new GPU has 2x the FP32 throughput of INT32, and 64x the FP32 throughput of FP64.
#2
"Mihai Preda"
Apr 2015
2×23×29 Posts
AKA "The Holy Grail" :)
#3
Random Account
Aug 2009
U.S.A.
3²·199 Posts
Sorry, I cannot make a connection to 24-bit. FP32 seems to represent 32-bit. FP64 is 64-bit. 2x FP32 suggests 64-bit as well. Would you care to elaborate a little?
#4
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
11353₈ Posts
Last fiddled with by kriesel on 2020-09-19 at 00:26
#5
"/X\(‘-‘)/X\"
Jan 2013
2⁴·3·61 Posts
Normal FP32 has 23 bits for the fraction, 8 for the exponent, and 1 for the sign (+/-). The exponent effectively supplies one more bit of precision: the leading significand bit is implied by whether the exponent bits are all zero or not. That gives a 24-bit significand, meaning FP32 can do INT24 math exactly.
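To make the INT24 point concrete, here is a small Python sketch (not from the thread). It assumes that packing through `struct` with the `"<f"` format is an acceptable stand-in for FP32 rounding on real hardware. The helper name `to_f32` is made up for illustration.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value, then widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

LIMIT = 2 ** 24  # 23 stored fraction bits + 1 implicit leading bit

# Integers just below and up to 2**24 survive the FP32 round trip unchanged.
exact_below = all(to_f32(float(n)) == n for n in range(LIMIT - 1000, LIMIT + 1))

# 2**24 + 1 would need a 25th significand bit, so FP32 cannot hold it exactly.
first_miss = to_f32(float(LIMIT + 1))

print(exact_below)
print(first_miss == LIMIT + 1)
```

The same limit is why an FFT word in FP32 has far fewer usable bits than one in FP64: sums and products of the inputs have to stay within that 24-bit window (minus headroom for rounding error) to be recovered exactly.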
#6
"Mihai Preda"
Apr 2015
1334₁₀ Posts
Some previous discussion:
https://www.mersenneforum.org/showthread.php?t=23926
#7
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
5794₁₀ Posts
It sounds like the additional memory usage (and hence memory bandwidth) may be an issue. Would 64x be enough for arithmetic using double-floats to be useful?
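For readers unfamiliar with double-floats: the idea is to carry each value as an unevaluated sum of two FP32 numbers (hi, lo), which gives roughly 48 significand bits in 8 bytes. The Python sketch below is only an illustration, not anyone's actual GPU code. It emulates FP32 rounding via `struct`, uses Knuth's TwoSum as the building block, and the names `f32`, `two_sum` and `df_add` are made up for this example. `df_add` is the simple ("sloppy") variant, not a fully accurate one.

```python
import struct

def f32(x: float) -> float:
    """Emulate one FP32 rounding step (assumption: struct matches hardware rounding)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def two_sum(a: float, b: float):
    """Knuth's TwoSum in FP32: returns (s, err) with s + err == a + b exactly."""
    s = f32(a + b)
    bb = f32(s - a)
    err = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, err

def df_add(x, y):
    """Add two double-float values given as (hi, lo) pairs of FP32 numbers."""
    s, e = two_sum(x[0], y[0])
    e = f32(e + f32(x[1] + y[1]))   # fold in the low parts
    hi = f32(s + e)                 # renormalise so that |lo| is small relative to hi
    lo = f32(e - f32(hi - s))
    return hi, lo

# 2**24 + 1 does not fit in one FP32, but it does fit in a (hi, lo) pair.
hi, lo = df_add((16777216.0, 0.0), (1.0, 0.0))
print(hi, lo)
```

Note the cost: one double-float addition here takes around ten FP32 operations, which is the trade-off behind the "would 64x be enough" question.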
#8
P90 years forever!
Aug 2002
Yeehaw, FL
2×41×89 Posts
Years ago I toyed with using two or three 32-bit ints to create a 64- or 96-bit float (no exponent bits -- all mantissa).
I did enough work to prove to myself it was feasible and, at the time, would be about as fast as a double-precision FFT. As nVidia has lowered and lowered the DP-to-SP ratio, it would be a substantial winner now. An awful lot of code to write, though.
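As a rough idea of what the building block for such an all-mantissa format might look like, here is a Python sketch of multiplying two 64-bit values that are each stored as two 32-bit limbs. This is not the code described above. The names (`split64`, `join`, `mul_2limb`) are hypothetical, and Python integers stand in for the 32-bit registers and 32x32 -> 64-bit multiplies a GPU kernel would presumably use.

```python
MASK32 = 0xFFFFFFFF

def split64(x: int):
    """Split a 64-bit value into (hi, lo) 32-bit limbs."""
    return (x >> 32) & MASK32, x & MASK32

def join(limbs) -> int:
    """Reassemble 32-bit limbs (most significant first) into one integer."""
    value = 0
    for limb in limbs:
        value = (value << 32) | limb
    return value

def mul_2limb(a, b):
    """Full 128-bit product of two (hi, lo) values, as four 32-bit limbs."""
    a_hi, a_lo = a
    b_hi, b_lo = b
    # Four 32x32 -> 64-bit partial products (schoolbook multiplication).
    ll = a_lo * b_lo
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    hh = a_hi * b_hi
    r0 = ll & MASK32
    mid = (ll >> 32) + (lh & MASK32) + (hl & MASK32)   # column 1 plus carry-in
    r1 = mid & MASK32
    top = (mid >> 32) + (lh >> 32) + (hl >> 32) + hh   # columns 2 and 3
    r2 = top & MASK32
    r3 = (top >> 32) & MASK32
    return r3, r2, r1, r0

x, y = 0xFFFFFFFFFFFFFFFF, 0x123456789ABCDEF0
product = join(mul_2limb(split64(x), split64(y)))
print(product == x * y)
```

Even this one operation needs four multiplies plus carry handling, and an FFT would also need addition, subtraction and twiddle-factor multiplication on top of it, which fits the "awful lot of code" remark.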
#9
"Composite as Heck"
Oct 2017
3×5×7² Posts
This post ( https://mersenneforum.org/showpost.p...4&postcount=85 ) suggests that FP32 throughput doubled because the INT32 units were upgraded to also do FP32. If INT32 and FP32 operations can be freely mixed, or if the workload can be split into INT32-only and FP32-only operations, then there are more bits up for grabs. A split solution should also work on the 20 series, as that can do FP32 and INT32 concurrently, but that is highly memory limited, so there may not be a benefit.
#10
"Mihai Preda"
Apr 2015
2466₈ Posts
Quote:
#11
Random Account
Aug 2009
U.S.A.
3²×199 Posts
I got to thinking about color palettes. A 24-bit palette is capable of 16,777,216 unique values. This has been in use a long time. Before that, 16-bit color was capable of only 65,536 colors. Whether this is in any way relevant to the discussion here, I don't know.