Old 2016-02-21, 22:21   #1
firejuggler
 
 
Apr 2010
Over the rainbow

2·3²·137 Posts
Nvidia Pascal, a third of DP

http://www.techtimes.com/articles/13...sing-power.htm

Based on a number of slides from an independent researcher, the Nvidia Pascal GP100 features stacked DRAM (1 TB/s of bandwidth), giving it as much as 12 TFLOPS of single-precision (FP32) compute performance. The flagship GPU is purportedly able to provide 4 TFLOPS of double-precision (FP64) compute performance as well.

Last fiddled with by wblipp on 2016-02-21 at 23:40
Old 2016-02-22, 02:13   #2
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

2²·23·97 Posts

Yarrrr !!!
Old 2016-02-22, 04:04   #3
0PolarBearsHere
 
 
Oct 2015

2×7×19 Posts

Quote:
So far, we know the following aspects about Nvidia's upcoming flagship Pascal GP100 graphics processing unit:
- Pascal graphics architecture
- Compared to Maxwell, Pascal delivers roughly twice the performance per watt
- Successor to the GM200 GPU found in the GTX Titan X and GTX 980 Ti
- Features about 17 billion transistors, roughly twice as many as the GM200
- Uses 16nm FinFET base from TSMC
- 4096-bit memory bus interface, comparable to the Fiji GPU powering the AMD Fury models
- Has half-precision FP16 compute at double the rate of full-precision FP32
- Will sport four 4-Hi HBM2 stacks for 16 GB of VRAM, with 8-Hi stacks amounting to 32 GB in professional SKUs
- Exclusive compatibility with next-gen IBM POWER server processors via NVLink
- DirectX 12 feature level 12_1 or higher
- Launch is scheduled for the second half of 2016
So it'll have more VRAM than many computers have normal RAM.
Old 2016-02-22, 07:58   #4
ATH
Einyen
 
 
Dec 2003
Denmark

3²×331 Posts

12 TFLOPS FP32 and 4 TFLOPS FP64 *drool* It will be hard to decide whether to run factoring or LL on it.

Quote:
- Has half-precision FP16 compute at double the rate of full-precision FP32
Can FP16 be used for anything useful? That would be crazy at ~24 TFLOPS.
Old 2016-02-22, 08:37   #5
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

21334₈ Posts

Quote:
Originally Posted by ATH View Post
Can FP16 be used for anything useful? That would be crazy at ~24 TFLOPS.
Games... 3D rotations, translations, tessellation... whatever...
Trial factoring... (maybe... you could use 8 of those to keep the precision of 80 to 88 bits, depending on the content, but you would need about 72 multiplications to multiply those 8x8 "digits" with some Karatsuba-like stuff (edit: can it even be done in 72 multiplications? Don't forget that you multiply 11 bits by 11 bits and get an 11-bit result, not 22 bits), so you would only get about a third of a teraflop, like a GTX 560 or so. OTOH, I assume they will make a killing at integer arithmetic too, so FP16 will not be the best choice for TF either...)
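
For illustration only (a rough plain-C sketch, nothing like a real GPU kernel), the limb scheme above would look something like this: an ~88-bit number split into 8 limbs of 11 bits, multiplied schoolbook-style with 64 limb products (ignoring any Karatsuba-style savings). The comments mark the catch pointed out above: each exact limb product is up to 22 bits, which cannot fit in an FP16 significand (1 implicit + 10 stored bits), so real FP16 code would need much smaller limbs.

Code:
#include <stdint.h>
#include <stdio.h>

#define LIMBS     8
#define LIMB_BITS 11
#define LIMB_MASK ((1u << LIMB_BITS) - 1u)

/* Split an integer into 11-bit limbs (only 64 input bits here, for the demo). */
static void split(uint64_t x, uint32_t limb[LIMBS])
{
    for (int i = 0; i < LIMBS; i++) {
        limb[i] = (uint32_t)(x & LIMB_MASK);
        x >>= LIMB_BITS;
    }
}

/* Schoolbook multiply: LIMBS*LIMBS = 64 partial products.  Each product
 * a[i]*b[j] is up to 22 bits -- fine in integer registers, but NOT exactly
 * representable in an 11-bit FP16 significand. */
static void mul(const uint32_t a[LIMBS], const uint32_t b[LIMBS],
                uint64_t prod[2 * LIMBS])
{
    for (int i = 0; i < 2 * LIMBS; i++) prod[i] = 0;
    for (int i = 0; i < LIMBS; i++)
        for (int j = 0; j < LIMBS; j++)
            prod[i + j] += (uint64_t)a[i] * b[j];
    for (int i = 0; i < 2 * LIMBS - 1; i++) {   /* propagate carries */
        prod[i + 1] += prod[i] >> LIMB_BITS;
        prod[i] &= LIMB_MASK;
    }
}

int main(void)
{
    uint32_t a[LIMBS], b[LIMBS];
    uint64_t p[2 * LIMBS];
    split(1234567891011ULL, a);
    split(9876543ULL, b);
    mul(a, b, p);
    uint64_t check = 0;                         /* reassemble the low ~66 bits */
    for (int i = 5; i >= 0; i--)
        check = (check << LIMB_BITS) | p[i];
    printf("%llu (expect %llu)\n", (unsigned long long)check,
           (unsigned long long)(1234567891011ULL * 9876543ULL));
    return 0;
}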

Last fiddled with by LaurV on 2016-02-22 at 08:40
Old 2016-02-22, 09:27   #6
axn
 
 
Jun 2003

5²·191 Posts

I am gonna go out on a limb and predict that there will be no consumer/prosumer version that offers 4 TFLOPS DP (i.e. not even a 1000$ Titan variety will offer 4 TFLOPS)
Old 2016-02-22, 12:23   #7
mackerel
 
 
Feb 2016
UK

189₁₆ Posts

Looking at previous cards with high DP rates and their release dates:
R9 280X, ~1 DP TFLOPS, Oct. 2013
Titan, ~1.5 DP TFLOPS, Feb. 2013
Titan Black, 1.7 DP TFLOPS, Feb. 2014

Could they manage to get it up to 4 in two years? I think there's more than a possibility they can, if they want to, in a higher-end card, especially now that they're finally moving to a smaller manufacturing process again.
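
Quick back-of-the-envelope check on that, using only the figures quoted above (hypothetical inputs, not anything from a roadmap): going from the Titan Black's ~1.7 DP TFLOPS in early 2014 to 4 DP TFLOPS by 2016 would need roughly a 2.35x jump, i.e. on the order of 50% per year.

Code:
#include <math.h>
#include <stdio.h>

int main(void)
{
    double start  = 1.7;   /* Titan Black, DP TFLOPS (Feb. 2014) */
    double target = 4.0;   /* rumored Pascal DP TFLOPS           */
    double years  = 2.0;
    double factor = target / start;                  /* ~2.35x overall */
    double rate   = pow(factor, 1.0 / years) - 1.0;  /* ~53% per year  */
    printf("needed: %.2fx overall, ~%.0f%% per year\n", factor, rate * 100.0);
    return 0;
}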
Old 2016-02-22, 12:43   #8
ATH
Einyen
 
 
Dec 2003
Denmark

3²×331 Posts

Quote:
Originally Posted by axn View Post
I am gonna go out on a limb and predict that there will be no consumer/prosumer version that offers 4 TFLOPS DP (i.e. not even a 1000$ Titan variety will offer 4 TFLOPS)
Yeah, unfortunately, if it seems too good to be true, it usually is. But we can hope...
Old 2016-02-22, 12:47   #9
axn
 
 
Jun 2003

5²×191 Posts

Quote:
Originally Posted by mackerel View Post
Could they manage to get it up to 4 in 2 years?
Yes, they could.
Quote:
Originally Posted by mackerel View Post
if they want to
I am saying they don't want to, since it will cannibalize their compute line of offerings.
Old 2016-02-22, 15:19   #10
tServo
 
 
"Marv"
May 2009
near the Tannhäuser Gate

2×271 Posts

Quote:
Originally Posted by axn View Post
I am gonna go out on a limb and predict that there will be no consumer/prosumer version that offers 4 TFLOPS DP (i.e. not even a 1000$ Titan variety will offer 4 TFLOPS)
I agree completely with AXN on this, as I explained in my post of a week ago:
http://www.mersenneforum.org/showpos...&postcount=604

Do you want 3 TFLOPS FP64 from an Nvidia board? It's been available for a year on their K80 Tesla! The catch is that it costs 5,000 dollars and requires the intense cooling found in servers located in frigid computer rooms. You can bet your boots that Pascal chips with gobs of FP64 are destined for Teslas and not for the great unwashed masses (us). Even Nvidia's expected April announcement will probably be a "tease" in that regard.

BTW, FP16 is there for deep-learning neural nets, which are the hottest thing in AI right now. Researchers have done some truly amazing things with these, such as driving cars and beating a Go master. Nvidia has very nice libraries for them. They require zillions of small FP values for all the weights used during training, and they can tolerate the loss of precision FP16 brings, trading it for fitting twice as many values in memory as FP32 allows.
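
As a rough plain-C illustration of that trade-off (nothing to do with Nvidia's actual libraries; the float<->half conversion below is deliberately simplified to truncation with no denormal handling): storing 100 million weights as 16-bit halves takes half the memory of FP32, at the cost of keeping only a 10-bit stored significand.

Code:
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Simplified float -> IEEE half conversion: truncation, normals only. */
static uint16_t f32_to_f16(float f)
{
    uint32_t x; memcpy(&x, &f, sizeof x);
    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    int32_t  e    = (int32_t)((x >> 23) & 0xFFu) - 127 + 15;
    uint16_t m    = (uint16_t)((x >> 13) & 0x3FFu);
    if (e <= 0)  return sign;                        /* underflow -> signed zero */
    if (e >= 31) return (uint16_t)(sign | 0x7C00u);  /* overflow  -> infinity    */
    return (uint16_t)(sign | (e << 10) | m);
}

static float f16_to_f32(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t e    = (h >> 10) & 0x1Fu;
    uint32_t m    = (uint32_t)(h & 0x3FFu);
    uint32_t x;
    if (e == 0)       x = sign;                            /* +-0 (denormals flushed) */
    else if (e == 31) x = sign | 0x7F800000u | (m << 13);  /* infinity / NaN          */
    else              x = sign | ((e + 112) << 23) | (m << 13);
    float f; memcpy(&f, &x, sizeof f);
    return f;
}

int main(void)
{
    size_t n = 100u * 1000u * 1000u;   /* 100 million weights */
    printf("FP32: %zu MB, FP16: %zu MB\n",
           (n * sizeof(float)) >> 20, (n * sizeof(uint16_t)) >> 20);

    uint16_t w = f32_to_f16(0.123456789f);   /* precision lost below bit 10 */
    printf("0.123456789f stored as FP16 reads back as %.9f\n", f16_to_f32(w));
    return 0;
}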
Old 2016-02-22, 15:20   #11
mackerel
 
 
Feb 2016
UK

189₁₆ Posts

Go on then, I'll take the optimistic route that they will put this in a consumer device, perhaps a future Titan something.

The fastest single-chip compute device they make is the K40, which appears to use the same chip as the Titan Black. They can still differentiate between the products in other ways. It's not like they're going to stand still on compute either, and they can't risk looking bad if AMD chooses not to cripple its own offering.