mersenneforum.org  

Old 2018-06-26, 15:20   #1
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

163178 Posts
Default gpu-specific reference material

This thread is intended to hold only reference material specific to gpus, in particular discrete gpus, which typically take PCIe card form.
(Suggestions are welcome. Discussion posts in this thread are discouraged; please use the reference material discussion thread http://www.mersenneforum.org/showthread.php?t=23383 instead. Off-topic posts may be moved or removed, to keep the reference threads clean, tidy, and useful.)

Table of contents
  1. This post
  2. gpu temperature limits http://www.mersenneforum.org/showpos...11&postcount=2
  3. gpu tf and ll benchmarks, ratios, and SP:DP ratios http://www.mersenneforum.org/showpos...12&postcount=3
  4. NVIDIA gpu model, compute capability level, CUDA level, OS versions, and driver version http://www.mersenneforum.org/showpos...15&postcount=4
  5. KaBo's post about configuring system for power efficiency https://www.mersenneforum.org/showpo...66&postcount=1
  6. How many gpus can go in one system? https://www.mersenneforum.org/showpo...79&postcount=5
  7. etc tbd

Last fiddled with by kriesel on 2020-07-10 at 02:30
Old 2018-06-26, 15:21   #2
kriesel
 
Default gpu temperature limits

These were gathered mostly from NVIDIA spec sheets.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf gpu temp limits.pdf (10.4 KB, 359 views)

Last fiddled with by kriesel on 2019-11-18 at 13:58 Reason: added links and gpu models
Old 2018-06-26, 15:23   #3
kriesel
 
Default gpu tf and ll benchmarks, ratios, and SP:DP ratios

These are from James Heinrich's benchmark pages and GPU-Z output (some contributed by henryzz or kladner).
Techpowerup has a terrific gpu database at https://www.techpowerup.com/gpu-specs/


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf tf ll ghzd and ratios vs gpu model.pdf (11.2 KB, 349 views)

Last fiddled with by kriesel on 2019-11-18 at 13:59 Reason: added gpus and formatting
Old 2018-06-26, 15:40   #4
kriesel
 
Default NVIDIA gpu model, compute capability level, CUDA level, OS versions, and driver version

The attached small table is organized by GPU model. There's a big table organized by compute capability level at https://en.wikipedia.org/wiki/CUDA#GPUs_supported which shows which CUDA level provides which compute capability level, and which compute capability level a given NVIDIA GPU model requires.
The practical effect is that there's a minimum driver version at which a given new compute capability level is supported, and eventually a maximum driver version beyond which support for an old compute capability level is dropped. Attempts to install, compile, run, debug, etc. outside those driver version limits will fail.

This has restrictive consequences for systems which contain both old and new GPUs. Some GPUs (Pascal arch.; GTX 10xx) require CUDA 8 capable drivers, others (Volta arch.) CUDA 9.x, and others (Turing arch.; RTX 20xx etc.) CUDA 10. CUDA 6.5 was the last version to support compute capability 1.x GPUs (Tesla arch.), and CUDA 8 the last to support compute capability 2.x GPUs (Fermi arch.; GTX 4xx, Quadro 2000, Quadro 4000, Quadro 5000, Tesla C2075, etc.).
NVIDIA allows only one NVIDIA driver installed per system. Therefore, GTX 4xx (CUDA <9) and RTX 20xx (CUDA 10 minimum) cannot be run in the same system at the same time.
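The driver-window logic above can be sketched in a few lines. The architecture names and version numbers below are illustrative assumptions drawn from this post, not an authoritative compatibility table:

```python
# Sketch: can one driver (i.e., one maximum CUDA level) serve every GPU in
# a mixed system? Entries are (min_cuda, max_cuda): the first and last CUDA
# toolkit level whose drivers support the architecture; None = not yet dropped.
# Values here are rough assumptions for illustration.
SUPPORT = {
    "Fermi (GTX 4xx)":   (3.0, 8.0),   # dropped after CUDA 8
    "Pascal (GTX 10xx)": (8.0, None),
    "Turing (RTX 20xx)": (10.0, None),
}

def common_cuda_level(archs):
    """Return the (low, high) CUDA range usable by all archs, or None."""
    low = max(SUPPORT[a][0] for a in archs)
    highs = [SUPPORT[a][1] for a in archs if SUPPORT[a][1] is not None]
    high = min(highs) if highs else None
    if high is not None and low > high:
        return None                     # no single driver covers them all
    return (low, high)

# Fermi needs CUDA <= 8, Turing needs CUDA >= 10: impossible in one system.
print(common_cuda_level(["Fermi (GTX 4xx)", "Turing (RTX 20xx)"]))    # None
# Pascal and Turing share any driver at CUDA 10 or later.
print(common_cuda_level(["Pascal (GTX 10xx)", "Turing (RTX 20xx)"]))  # (10.0, None)
```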

Operating system is also a consideration. Old operating systems are not supported by new driver releases, which come out to support new GPU models and fix problems. Older GPU models also get dropped from newer driver releases.

The second attached table shows the relationship between driver release numbers, maximum supported CUDA levels, and Windows OS versions. It also includes approximate release dates for the listed driver versions. Note there are many more releases than listed.

Note that the highest compatible CUDA level dll, driver, executable, etc. is not necessarily the highest performing for a given GPU.
Also, it's possible that some drivers indicated as Win10 max might run with Win11.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf nvidia compute compatibility level vs gpu.pdf (10.3 KB, 348 views)
File Type: pdf driver and cuda level compatibility.pdf (13.3 KB, 340 views)

Last fiddled with by kriesel on 2022-11-03 at 17:07
Old 2019-11-03, 18:02   #5
kriesel
 
Default How many gpus can go in one system?

How many GPUs can be run on one system?
I've run up to 4 NVIDIA or 4 AMD GPUs on Windows 7 or 10 Pro, on workstation systems intended to hold 1 or 2, and up to 5 AMD or 6 mixed GPUs on 6-slot motherboards. Available power from the chassis power supply, and physical space, are usually the limits in these. (Over time, some of the workstations degraded to being stable with only 3 GPUs, and then 2; I suspect power supply aging.)
But with carefully selected gear, one could go much higher; cryptocoin miners routinely do. See for example https://www.cryptocurrencyfreak.com/...thereum-zcash/ If I recall correctly, SELROC has some experience with high-gpu-count systems.

Any of the following may be factors.
  1. AC input amperage available; remember that circuit breakers respond to amps, and that both power factor and power supply efficiency may be significantly less than unity.
  2. UPSes become very costly above about 1.5 kVA or 1 kW, so a system pushing the limits on GPU count and total throughput will probably not be on a UPS.
  3. Power supply output capacity. (Some people use multiple power supplies as a workaround.)
  4. Available heat rejection capacity of the chassis, or the room. All that power turns into heat that must be removed somehow while keeping the electronics cool enough to function efficiently and reliably over an acceptably long lifetime.
  5. Number and type of power supply auxiliary connectors
  6. Number of available PCIe slots on the motherboard, which varies up to ~19 (although it may be possible to increase the count with 1-to-3 or 1-to-4 PCIe extenders)
  7. Space between slots; double or triple wide cards won't allow use of every PCIe connector on the motherboard, unless you use extender cables in most or all of them.
  8. Other geometric considerations such as possible interference between GPU installed in a PCIe slot and various motherboard components; capacitors, connectors, cabling etc. A short riser may be enough to address these.
  9. Interference between a GPU and the case cover or PCIe device hold-down component, either separate or attached to the cover. Sometimes the interference can be resolved by removing or repositioning the hold-down.
  10. Number of GPUs supported by the motherboard BIOS
  11. Number of GPUs supported by the OS (Windows is variously described as limited to 32 GPUs in Windows 7 and 12 in Windows 10. https://answers.microsoft.com/en-us/...8-ad37dfa8dec6 https://www.aurelp.com/tag/how-many-...ws-10-support/)
  12. Number of GPUs supported by the NVIDIA driver for the OS (reportedly at least 10 for Windows 10 https://devtalk.nvidia.com/default/t...n-windows-10/2)
  13. Number of GPUs supported by the AMD driver for the OS (reportedly at least 12 for Windows 10; 16 on ROCm & Linux https://community.amd.com/thread/197524)
  14. Whether the BIOS deactivates any (usually Intel) IGP present when at least one discrete GPU is installed. Some automatically disable onboard video when a discrete GPU is present.
  15. Some motherboards may require user configuration to deactivate the onboard video to support use of all PCIe slots for discrete GPUs.
  16. Any issues encountered in attempting to get NVIDIA, AMD, and Intel GPU drivers to coexist. I've observed issues with AMD cards caused by the presence of an NVIDIA card, or by installing an additional NVIDIA card after AMD and NVIDIA were coexisting, and Intel IGP OpenCL capability going away after installing drivers or an SDK for NVIDIA or AMD. NVIDIA GPU and Intel IGP use can work together. AMD/NVIDIA coexistence issues may be resolvable by a particular sequence of GPU physical install and driver install operations.
  17. How many GPUs may be productively used simultaneously, or how they are used, can be constrained by system and OS considerations. For example, some GPUs have 16GB GPU ram, and P-1 stage 2 run on them may produce a virtual size of 15-16GB per application instance. Other applications and the OS also have virtual sizes, and all virtual sizes summed at any moment must fit within the available virtual memory of the machine (system ram plus page or swap file). Since the maximum page file size in Windows is triple the system ram, a Windows system with 16GB ram can have at most a 48GB page file, for 64GB of virtual memory space. When the stage 2 or GCD phases of enough application instances coincide, virtual memory can be exhausted; memory allocation in some application instance then fails, causing the application or OS to crash or hang, potentially leaving the GPU or OS in a state that requires an OS restart to clear. The symptoms of hitting the virtual memory limit can look like system or GPU instability, and it can take hours of first-test-wavefront P-1 activity for enough stage 2 instances to coincide and trigger the issue. A 5-GPU plus prime95 system with 16GB max ram and a 48GB page file had frequent issues with more than 3 GPU P-1 instances coinciding, or more than 2 plus prime95. Resource exhaustion gets logged in the system event log, as do the resulting application crashes. Linux swap file size is not constrained to a small multiple of physical system ram, but performance will be constrained at some point. On Windows, the issue can be avoided by indirectly limiting the virtual size of gpuowl instances, by somewhat reducing the -maxAlloc parameter value for each gpuowl instance, or by increasing installed system ram if practical. Trimming -maxAlloc by 1 or 2 GB per GPU instance resolved my situation without much impact on GPU throughput; increasing system ram was not possible because the board was already at its 16 GB DDR3 maximum.
  18. Limits to the power handling capabilities of the on-motherboard voltage regulator components or circuit traces, which in some cases are less than what the hardware nominally supports. I lost two motherboards (and, in the first round, two fast GPUs) before determining that mfakto on the IGP, plus prime95 on the CPU, plus the PCIe extender loads of 5 GPUs, would ignite sparks and flame at a particular location on a particular ASRock motherboard model, killing the motherboard and 2 of the 5 GPUs, attached to alternate PCIe slots.
  19. Limits to the user's expertise and patience in dealing with system fragility & troubleshooting when multiple GPU manufacturers, multiple GPU models and multiple drivers are involved.
  20. Hardware budget; ongoing utility cost budget
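As a rough worked example of item 1's arithmetic (a sketch only; the wattage, power factor, and efficiency figures below are assumptions, not measurements):

```python
# Sketch: wall-outlet amperage drawn by a multi-GPU rig, as the circuit
# breaker sees it. All figures are illustrative assumptions.

def wall_amps(dc_watts, mains_volts=120.0, psu_efficiency=0.9, power_factor=0.95):
    """AC amps for a given DC load, accounting for PSU losses and power factor."""
    ac_watts = dc_watts / psu_efficiency          # PSU is not 100% efficient
    return ac_watts / (mains_volts * power_factor)

# 5 GPUs at ~250 W each plus ~150 W for CPU, board, and drives:
load = 5 * 250 + 150                              # 1400 W DC
amps = wall_amps(load)
print(f"{amps:.1f} A")  # ~13.6 A, over the 12 A continuous (80%) limit of a 15 A circuit
```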
Are there more?
See also http://snucl.snu.ac.kr/ regarding OpenCL across a cluster of systems, etc.
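Item 17's virtual-memory arithmetic can be sketched with the figures from that item (the per-instance and OS overhead sizes are rough assumptions):

```python
# Sketch: how many ~16 GB P-1 stage 2 instances fit in Windows virtual
# memory on the 16 GB ram system described in item 17. Sizes in GB.

ram          = 16
pagefile_max = 3 * ram            # Windows page file maxes out near 3x ram
virtual_cap  = ram + pagefile_max # 64 GB total virtual memory space

per_p1_instance = 16              # P-1 stage 2 virtual size on a 16 GB GPU
os_and_other    = 8               # OS, prime95, etc. (rough assumption)

max_instances = (virtual_cap - os_and_other) // per_p1_instance
print(max_instances)  # 3, matching the observed limit of ~3 concurrent stage 2 runs
```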


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2023-01-02 at 18:44 Reason: updates


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
