mersenneforum.org  


2020-01-22, 01:00   #23
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²·1,109 Posts

Quote:
Originally Posted by diep
(yeah those small 193 nm lines or something which they optimistically call '7 nm technology').
193 nm is the wavelength of the ArF excimer laser light source used for DUV lithography; it is related to the produced feature sizes only through diffraction relations. 7 nm uses EUV light sources at ~13.5 nm wavelength. https://en.wikipedia.org/wiki/Extrem...ut,_and_uptime

2020-01-22, 03:14   #24
nomead
"Sam Laur"
Dec 2018
Turku, Finland

2³·41 Posts

Quote:
Originally Posted by diep
(edit; the overclocking is for games - which is very short term. for gpgpu i would advice to not overclock at all of course - you burn up the wires of the gpu (yeah those small 193 nm lines or something which they optimistically call '7 nm technology').
I don't know where you get all this "information".

The TSMC 7 nm process has a minimum metal pitch of 40 nm. Pitch is the distance from one wire to the next, so you could have, for example, a 20 nm wire followed by a 20 nm empty space. But that is not the smallest feature on those chips. The FinFET gate has a fin pitch of 30 nm, and a fin width (at the top of the fin) of just 6 nm. The fin is somewhat thicker at its base because the structure is so tall relative to its thickness.

Now, the specific process used for AMD 7 nm products thus far (N7) only uses deep ultraviolet (DUV) lithography at 193 nm wavelength. Several neat tricks have been piled on top of the traditional lithography process: high numerical aperture optics, immersion lithography, and most recently multiple patterning. With just the first two, the resolution limit is about 36 nm. But with multiple patterning you can make much smaller structures on the chip, with the tradeoff of adding many processing steps for the densest layers. The resulting pattern fidelity also suffers a bit.
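
The "about 36 nm" figure can be reproduced from the Rayleigh criterion, half-pitch = k1 · λ / NA. What follows is only a rough sketch: NA = 1.35 (typical for water immersion optics) and k1 = 0.25 (the theoretical single-exposure limit) are assumed values that the post does not state, and the helper name is made up for illustration.

```python
import math

def half_pitch_nm(wavelength_nm, numerical_aperture, k1=0.25):
    """Rayleigh criterion: smallest printable half-pitch for one exposure."""
    return k1 * wavelength_nm / numerical_aperture

# Assumed values, not from the post: NA = 1.35 (water immersion), k1 = 0.25.
single = half_pitch_nm(193.0, 1.35)
min_pitch = 2 * single  # one line plus one space

# Idealised model: n patterning passes interleave n sets of lines,
# dividing the achievable pitch by n.
target_pitch = 40.0  # minimum metal pitch quoted in the post
passes = math.ceil(min_pitch / target_pitch)

print(f"single-exposure half-pitch: {single:.1f} nm")
print(f"single-exposure pitch:      {min_pitch:.1f} nm")
print(f"passes needed for a {target_pitch:.0f} nm pitch: {passes}")
```

Under these assumptions a single 193 nm immersion exposure bottoms out at a pitch of roughly 71 nm, so a 40 nm metal pitch needs at least double patterning. Real processes have further constraints that this simple division ignores.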

The next step is extreme ultraviolet (EUV) lithography at 13.5 nm for the densest layers. But there were several technical obstacles to overcome before it could be put into production use. The masks are now reflective, and the light sources are highly inefficient and barely have enough power to make mass production viable. The photoresist (the material exposed to light) produces secondary electrons because of the high-energy photons, and these electrons bounce around in the material and reduce resolution. Shot noise due to insufficient exposure also reduces resolution in effect. And so on. Also, the introduction of EUV has been delayed for so long that multiple patterning may become necessary again, sooner than expected.

So the next step is TSMC 7 nm with EUV, the N7+ process. The feature sizes haven't changed, but the better patterning fidelity still gives some density and performance advantages. It is not yet used for any AMD devices on the market; the first ones are likely to be the next generation of processors based on the Zen 3 architecture (Epyc "Milan").

2020-01-22, 20:17   #25
diep
Sep 2006
The Netherlands

677 Posts

Thanks for clearing that up, kriesel.

Phil: yes, it seems they used the very best GPUs they had produced, the ones we know can pump out 6.7 TFLOPS double precision, since they had been tested to do that.

Those are probably GPUs from the middle of the round silicon wafers. The center dies, which are a small minority, have historically clocked far higher than what gets produced at the edges. Not seldom, memory chips get produced at the edges.

Now I don't know how new this 7 nm process from ASML is. Usually in the first few years there are massive improvements in production quality every few months. Yet because all the MI50 GPUs had to be perfect, they must have been produced in the center of wafers, with maybe memory chips at the edges intended to be clocked a lot lower. This means it would be possible to clock them higher than 1.4 GHz - but only with watercooling. Modern process technology, as I understood it, is supposed to be run near room temperature - read 19 °C. So you want to watercool it to near that temperature. A difference of some dozens of degrees Celsius, or in short better watercooling, means they eat up to 10% less power.

Now for us here the sad thing is of course that multiplications, which we do massively, are what eats the most power, and nowadays that 300 watt isn't called TDP anymore but TBP - in short, it's 300 watt when running moderate loads. Even under perfect conditions, running efficiently implemented DWTs or FFTs on them will easily overshoot that by up to 200 watt.

So probably you want to watercool it anyway.

Aircooling is a problem. The airflow (CFM) that those tiny fans manage to push through the narrow fins of the heatsinks is pretty small compared to what you want to push through.

Yet those initial 5000 GPUs are the interesting ones to get.

AMD threw in, say, 25 million dollars that they didn't earn in order to promote the Radeon VII. Note that Intel historically threw in way more during its glory days. Arguably Intel is still shining, so what AMD throws in is pretty much peanuts.

What I didn't realize, but figured out in the past few days, is that some newer games also seem to profit from faster FP64.

So for the first batch of GPUs, some of which will be used for benchmarks at test sites, it would be easy to enable more FP64 resources.

As some of my software has been in test sets, and others have used it as a game benchmark at websites over the past 20 years (not so much in the past couple of years, as I'm busy releasing a 3D printer now and moved into autonomous robotics and autonomous attack drones over the past 8 years), I did get logfiles back from them, usually under the constraint that I would keep those private until they had posted their article (which could sometimes take 6 months or so in special cases). Without wanting to accuse any manufacturer of cheating, let's put it this way: they all have special testing teams which prepare the hardware that gets shipped to testers. These are expensive teams that use the most expensive, fast, low-latency RAM. Not seldom, CPUs or GPUs that sell for a couple of hundred bucks get equipped with $10k RAM, which you practically can't buy in a store at the time. A good example is the introduction of the P4s with hyperthreading at the start of this century.

The very first one capable of that was tested by Johan de Gelas.

The test machine was simply 15% - 20% faster than that of anyone who bought the same CPU 6 months to a year later (newer batches, in fact).

Hyperthreading on all those P4s in the stores gained 10% (same version), whereas Johan de Gelas' box at the mentioned clock rate gained 20-25%.

Effectively that box was 15-20% faster than what anyone built himself at home (edit: at the same reported clock) - and those weren't beginners, but dudes with expensive RAM too.

It wasn't the single-core speed that was that much faster, though. It was the hyperthreading that was so much faster. Inexplicably faster.
(edit: it wasn't until many, many years later that a watercooled i7-990X overclocked to 4.5 GHz, a very dubious amount of overclocking given its cache latencies, got a similar or better hyperthreading speedup with 6 cores @ 12 threads - which we could explain by the overclocked chip being that much faster than the RAM could deliver data to it. So the logical explanation for Johan de Gelas' timings would be that the chip in fact ran at a higher clock with 2 hyperthreading cores than it reported to Johan.)

A good explanation would be special editions or unlocking 'dubious' features of that chip in the very first batches.

Other manufacturers aren't better there.

So my advice to those interested in this gpu: get one from that first 5000.

I'm betting it works better than the Fiat Pandas that are in the stores soon :)

Last fiddled with by diep on 2020-01-22 at 20:27

2020-01-22, 20:42   #26
diep
Sep 2006
The Netherlands

677 Posts

Please note, though this is a personal opinion on GPUs, that if a chip clocks higher than 1 GHz the manufacturer is doing something wrong. Because if it can clock at 1.5 GHz or whatever above 1 GHz, they could also have equipped the GPU with more cores (SIMDs). It's better to have 120 SIMDs at 1 GHz than to have 60 at 1.4 GHz - but this is just my 2 cents :)

Of course, historically many games profit more from a higher GPU clock than from even more cores - which is why they're doing what they do.
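
For what it's worth, the 120-versus-60 comparison can be put in numbers. This is only a back-of-the-envelope sketch: it assumes peak throughput scales linearly with both SIMD count and clock, and ignores die area, memory bandwidth and power. The function name is made up for illustration.

```python
def relative_throughput(simd_count, clock_ghz):
    """Idealised peak throughput: proportional to units times clock."""
    return simd_count * clock_ghz

wide = relative_throughput(120, 1.0)   # many units, low clock
narrow = relative_throughput(60, 1.4)  # few units, high clock

print(f"120 SIMDs @ 1.0 GHz: {wide:.0f}")
print(f" 60 SIMDs @ 1.4 GHz: {narrow:.0f}")
print(f"advantage of the wide design: {wide / narrow:.2f}x")
```

Under that assumption the wide, slow design comes out about 1.43x ahead on peak throughput, which only matters for workloads that can actually keep all 120 SIMDs busy.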

2020-01-23, 08:11   #27
nomead
"Sam Laur"
Dec 2018
Turku, Finland

2³·41 Posts

Quote:
Originally Posted by diep
So my advice to those interested in this gpu: get one from that first 5000.

I'm betting it works better than the Fiat Pandas that are in the stores soon :)
Are you still stuck in 2019, or what are these cards that are in the stores "soon"? Radeon VII is EOL; no more are being manufactured. Whatever is still on sale is old stock.

AnandTech article from February 2019 clarifying the FP64 performance, with direct quotes from AMD:
https://www.anandtech.com/show/13923...n-vii-review/3
So, AMD can limit FP64 performance through vBIOS and drivers, and back then finally decided upon 1:4 FP64 = 3.46 TFLOPS. So unless you can hack the vBIOS, you're stuck at 1:4. Haven't heard about anyone even trying...

Or, by "Fiat Panda", do you refer to the Navi cards, which have 1:16 FP64 and have been on the market since July 2019, starting with the 5700 XT?
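
To show where the 3.46 TFLOPS figure comes from, here is the usual peak-rate arithmetic as a small sketch. The shader count (3840) and peak clock (1800 MHz) are assumed Radeon VII figures that are not stated in this thread, and the 1:2 line is purely hypothetical, showing what a different ratio would give on the same numbers.

```python
def peak_tflops(shaders, clock_mhz, fp64_ratio=1.0):
    """Peak rate: shaders x 2 FLOPs per clock (one FMA) x clock, scaled by the FP64 ratio."""
    return shaders * 2 * clock_mhz * 1e6 * fp64_ratio / 1e12

# Assumed Radeon VII figures, not stated in the thread.
SHADERS = 3840
PEAK_CLOCK_MHZ = 1800

fp32 = peak_tflops(SHADERS, PEAK_CLOCK_MHZ)
fp64_quarter = peak_tflops(SHADERS, PEAK_CLOCK_MHZ, 1 / 4)  # the 1:4 rate quoted above
fp64_half = peak_tflops(SHADERS, PEAK_CLOCK_MHZ, 1 / 2)     # hypothetical 1:2 rate

print(f"FP32:       {fp32:.2f} TFLOPS")
print(f"FP64 (1:4): {fp64_quarter:.2f} TFLOPS")
print(f"FP64 (1:2): {fp64_half:.2f} TFLOPS")
```

With these assumed inputs the 1:4 rate lands at about 3.46 TFLOPS, matching the number above, and a 1:2 rate would be roughly twice that.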

2020-01-23, 08:47   #28
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1000101010100₂ Posts

I don't see what https://en.wikipedia.org/wiki/Fiat_Panda has to do with GPUs. Yield is not 100% to design spec. It's long been SOP for chip manufacturers to test bare dies and sort them according to performance. Intel sorts and sells chips with fewer than the full complement of cores functional. The 486SX was 486DX dies with working integer but broken FP. AMD made MI50 chips, and very likely sorted them according to performance, selling the well-performing ones for big bucks. But they likely don't grind up or throw away the underperforming chips. They would stockpile them until there are enough to make a little profit in the consumer market, in a product called Radeon VII that outperforms the consumer-grade competition. Pumping that volume of lower-performance hardware into the server market instead would probably have lowered prices and profits. Making the consumer king GPU is good for the AMD brand too.

Last fiddled with by kriesel on 2020-01-23 at 09:04

2020-01-23, 08:54   #29
preda
"Mihai Preda"
Apr 2015

7×173 Posts

Quote:
Originally Posted by nomead
AnandTech article from February 2019 clarifying the FP64 performance, with direct quotes from AMD:
https://www.anandtech.com/show/13923...n-vii-review/3
So, AMD can limit FP64 performance through vBIOS and drivers, and back then finally decided upon 1:4 FP64 = 3.46 TFLOPS. So unless you can hack the vBIOS, you're stuck at 1:4. Haven't heard about anyone even trying...
If that were possible... to double FP64 through a BIOS change, that'd be amazing! I would hope that either somebody finds out how to edit the BIOS, or maybe AMD reaches the conclusion, in 2020, that Radeon VII is no longer a threat to the other products (being EOL) and publishes a new BIOS that unlocks the hardware. Otherwise... it feels silly for us to try so hard for every little 1% of performance improvement while the hardware stays locked at half capacity. Let's ask AMD for an Easter gift -- double my GPU through a software update :)

2020-01-23, 08:55   #30
preda
"Mihai Preda"
Apr 2015

7·173 Posts

Quote:
Originally Posted by kriesel
I don't see what https://en.wikipedia.org/wiki/Fiat_Panda has to do with GPUs.
Must be a metaphor for GPUs...

"What did you need another Fiat Panda for? Where are you gonna run it?"

Last fiddled with by preda on 2020-01-23 at 08:58

2020-01-23, 09:03   #31
retina
Undefined
"The unspeakable one"
Jun 2006
My evil lair

2·19·151 Posts

Quote:
Originally Posted by preda
If that were possible... to double FP64 through a BIOS change, that'd be amazing! I would hope that either somebody finds out how to edit the BIOS, or maybe AMD reaches the conclusion, in 2020, that Radeon VII is no longer a threat to the other products (being EOL) and publishes a new BIOS that unlocks the hardware. Otherwise... it feels silly for us to try so hard for every little 1% of performance improvement while the hardware stays locked at half capacity. Let's ask AMD for an Easter gift -- double my GPU through a software update :)
Haha, good luck with your dreaming.

If you can double the throughput of your existing systems with a simple download then you would have no incentive to buy more of their stuff. At least that is how they will see it.

2020-01-23, 09:13   #32
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²·1,109 Posts

Quote:
Originally Posted by preda
If that were possible... to double FP64 through a BIOS change, that'd be amazing! I would hope that either somebody finds out how to edit the BIOS, or maybe AMD reaches the conclusion, in 2020, that Radeon VII is no longer a threat to the other products (being EOL) and publishes a new BIOS that unlocks the hardware. Otherwise... it feels silly for us to try so hard for every little 1% of performance improvement while the hardware stays locked at half capacity. Let's ask AMD for an Easter gift -- double my GPU through a software update :)
Seems unlikely. If it could run at MI50 DP speed, they'd try to get the MI50 price for it. But you know we'd take the double, and the x% too, and the next, if we could. The effort needed to find the next Mersenne prime is a STEEP function.

Last fiddled with by kriesel on 2020-01-23 at 09:14

2020-01-23, 09:36   #33
diep
Sep 2006
The Netherlands

677 Posts

nomead: its being EOL would be interesting info, as some hardware reviewers at websites I spoke with haven't picked that up yet. Note that the 'losing money on each card' info is also old info, from the end of 2018.

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.