mersenneforum.org  

2022-04-18, 19:08   #45
ewmayer

@Magellan3s: What about int32 performance vs float32? I'm guessing much of the TF code uses that.

No need for big specs-tables dumps, just the rundown of int32 vs float32 for various GPUs of interest.

We know that float32 has very few bits of significance left over for FFT-mul data. Not having to throw away 2, 3 or 4 of those to roundoff error, by way of an int32-based NTT, could very well be a win even if int32 runs, say, half as fast as float32.
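To make the "no bits lost to roundoff" point concrete, here is a minimal sketch of exact multiplication via an NTT. The modulus 998244353 (= 119·2^23 + 1, primitive root 3) and the 8-bit digit size are assumptions picked only because they fit comfortably in 32 bits; they are not taken from any GIMPS program, and the function names are made up for illustration.

```python
# Minimal sketch of exact multiplication via a number-theoretic transform (NTT).
# Assumption: MOD = 998244353 = 119 * 2**23 + 1 with primitive root 3, chosen
# only because it fits in 32 bits; it is not taken from any GIMPS program.
MOD = 998244353
ROOT = 3


def ntt(a, invert=False):
    """Iterative radix-2 NTT over Z/MOD; len(a) must be a power of two."""
    a = list(a)
    n = len(a)
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly passes, all in exact integer arithmetic mod MOD.
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % MOD
                a[k] = (u + v) % MOD
                a[k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        a = [x * n_inv % MOD for x in a]
    return a


def convolve(x, y):
    """Exact linear convolution, valid while every output term stays below MOD."""
    out_len = len(x) + len(y) - 1
    n = 1
    while n < out_len:
        n <<= 1
    fx = ntt(list(x) + [0] * (n - len(x)))
    fy = ntt(list(y) + [0] * (n - len(y)))
    out = ntt([p * q % MOD for p, q in zip(fx, fy)], invert=True)
    return out[:out_len]


def multiply(u, v, bits=8):
    """Multiply non-negative integers using base-2**bits digits and the NTT."""
    mask = (1 << bits) - 1

    def digits(m):
        d = []
        while m:
            d.append(m & mask)
            m >>= bits
        return d or [0]

    conv = convolve(digits(u), digits(v))
    # Carry propagation is folded into the shifted sum.
    return sum(c << (bits * i) for i, c in enumerate(conv))
```

For example, `multiply(2**89 - 1, 2**89 - 1)` returns exactly `(2**89 - 1)**2`. The only constraint is that each convolution term stays below the modulus, which is a hard bound rather than a roundoff-error estimate.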
2022-04-18, 19:41   #46
Magellan3s
 

Quote:
Originally Posted by ewmayer
@Magellan3s: What about int32 performance vs float32? I'm guessing much of the TF code uses that.

No need for big specs-tables dumps, just the rundown of int32 vs float32 for various GPUs of interest.

We know that float32 has very few bits of significance left over for FFT-mul data. Not having to throw away 2, 3 or 4 of those to roundoff error, by way of an int32-based NTT, could very well be a win even if int32 runs, say, half as fast as float32.
"The RTX 3000 series GPUs hold SMs that hold fp32 compute units. Ampere architecture supports parallel execution of FP32 and INT32 operations with independent thread scheduling. That's also described as concurrent execution of FP32 and INT32 operation."


"GA10x includes FP32 processing on both datapaths, doubling the peak processing rate for FP32 operations. One datapath in each partition consists of 16 FP32 CUDA Cores capable of executing 16 FP32 operations per clock. Another datapath consists of both 16 FP32 CUDA Cores and 16 INT32 Cores, and is capable of executing either 16 FP32 operations OR 16 INT32 operations per clock. As a result of this new design, each GA10x SM partition is capable of executing either 32 FP32 operations per clock, or 16 FP32 and 16 INT32 operations per clock. All four SM partitions combined can execute 128 FP32 operations per clock, which is double the FP32 rate of the Turing SM, or 64 FP32 and 64 INT32 operations per clock."
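As a rough way to read the quoted numbers against the int32 vs float32 question, the sketch below models the peak per-clock issue rate of one GA10x SM from the figures in the quote only (4 partitions, one FP32-only datapath and one FP32-or-INT32 datapath of 16 lanes each). The function name and the `int32_share` parameter are hypothetical, and this is peak issue rate only; it says nothing about real kernel throughput.

```python
# Peak per-clock issue model for one GA10x SM, built only from the quoted figures.
PARTITIONS_PER_SM = 4      # SM partitions, per the quote
LANES_PER_DATAPATH = 16    # operations per clock per datapath, per the quote


def sm_ops_per_clock(int32_share):
    """Return (fp32_ops, int32_ops) per clock for one SM.

    int32_share is a hypothetical knob: the fraction of clocks (0.0 to 1.0) in
    which the shared FP32/INT32 datapath issues INT32 instead of FP32.
    """
    if not 0.0 <= int32_share <= 1.0:
        raise ValueError("int32_share must be between 0.0 and 1.0")
    dedicated_fp32 = PARTITIONS_PER_SM * LANES_PER_DATAPATH  # FP32-only datapath
    shared = PARTITIONS_PER_SM * LANES_PER_DATAPATH          # FP32-or-INT32 datapath
    int32_ops = shared * int32_share
    fp32_ops = dedicated_fp32 + shared * (1.0 - int32_share)
    return fp32_ops, int32_ops
```

With `int32_share=0.0` this gives 128 FP32 operations per clock, and with `int32_share=1.0` it gives 64 FP32 plus 64 INT32, matching the quote. Under this model a pure-INT32 workload tops out at 64 operations per clock per SM, which is half the peak FP32 rate.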


Peak FP32 compute performance is roughly 30 TFLOPS for the 3080, 34 TFLOPS for the 3080 Ti and 36 TFLOPS for the 3090.

"The RTX 3000 cards are built on an architecture NVIDIA calls "Ampere," and its SM, in some ways, takes both the Pascal and the Turing approach. Ampere keeps the 64 FP32 cores as before, but the 64 other cores are now designated as "FP32 and INT32." So, half the Ampere cores are dedicated to floating-point, but the other half can perform either floating-point or integer math, just like in Pascal."
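The peak FP32 figures quoted above for the 3080, 3080 Ti and 3090 can be reproduced with the usual back-of-envelope formula: 2 FLOPs (one fused multiply-add) per CUDA core per clock. The core counts and boost clocks below are assumptions taken from NVIDIA's published reference specs, not from this thread, and partner cards often clock differently.

```python
# Back-of-envelope peak FP32 throughput.
# Assumed reference specs (CUDA cores, boost clock in GHz); not from the thread.
SPECS = {
    "RTX 3080": (8704, 1.710),
    "RTX 3080 Ti": (10240, 1.665),
    "RTX 3090": (10496, 1.695),
}


def peak_fp32_tflops(cuda_cores, boost_ghz):
    """Peak TFLOPS, counting one fused multiply-add as 2 FLOPs per core per clock."""
    return 2 * cuda_cores * boost_ghz / 1000.0


for name, (cores, clock) in SPECS.items():
    print(f"{name}: {peak_fp32_tflops(cores, clock):.1f} TFLOPS")
```

These are theoretical peaks that assume every FP32 lane issues an FMA on every clock, including the lanes shared with INT32.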

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
