#23
Mar 2022
Earth
200₈ Posts
Quote:
GreenWithEnvy shows Mem Clock Max as 10551 and GPU Clock Max as 2100! https://i.ibb.co/zm8j9HG/Screenshot-...6-09-22-58.png
#24
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7·13·89 Posts
"Memory Transfer Rate, what Nvidia Settings reports and changes, is different from the effective Memory Clock, what is actually being displayed by GWE" https://gitlab.com/leinardi/gwe/-/bl...ease/README.md (emphasis on effective etc mine). Effective memory clock in this context means the clock rate that would occur, if memory data bits were being sent one at a time per physical signal trace, unencoded, without PAM4 encoding etc., which they clearly per GDDR6X specifications, are not.
Last fiddled with by kriesel on 2022-04-26 at 15:00 |
#25
Mar 2022
Earth
128₁₀ Posts
Quote:
https://i.ibb.co/zm8j9HG/Screenshot-...6-09-22-58.png

Last fiddled with by Magellan3s on 2022-04-26 at 22:58
#26
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7×13×89 Posts
"memory is running at 1188 MHz (19 Gbps effective)." https://www.techpowerup.com/gpu-spec...ti-20-gb.c3831
The 1188 MHz there is a clock frequency; Hz is a unit of frequency. Gbps is gigabits per second, a unit of data rate. Two totally different things: different names, different units, different definitions. Note also the word "effective", which can be used as marketing speak for "not really, but we're going to claim this very high performance spec anyway." (I can type at xxx,xxx bits per hour. But that's 8 bits per keypress. The keypress frequency is 1/8 the bit rate.)

An apple is fruit. An orange slice is fruit. An orange slice is not an apple, no matter how much someone would wish it to be the same, or believe it to be, or how many others they may get to agree or imply that an orange slice is an apple.

Tell me you're learning the difference, or trying?

Last fiddled with by kriesel on 2022-04-27 at 01:05
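To make the unit distinction concrete, here is the arithmetic connecting the two figures quoted above, as a minimal Python sketch. The 8-transfers-per-clock and 2-bits-per-transfer factors are one plausible decomposition of the GDDR6X scheme discussed later in this thread, not a vendor-documented formula:

Code:
# Sketch: a clock frequency (MHz) and a per-pin data rate (Gbps) are
# different quantities, related only through a transfer-scheme multiplier.
clock_mhz = 1188          # the clock frequency third-party tools report
transfers_per_clock = 8   # assumption: data clock runs at a multiple of this
bits_per_transfer = 2     # assumption: PAM4 carries 2 bits per symbol

data_rate_gbps = clock_mhz * transfers_per_clock * bits_per_transfer / 1000
print(data_rate_gbps)     # 19.008 -- the "19 Gbps effective" of the spec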
#27
"Curtis"
Feb 2005
Riverside, CA
1011101110100₂ Posts
Ken, you're the one who should be learning in this thread.
The least ambiguous number to cite for speed is the one used and displayed by the drivers. You can complain all you want about how it's not a truthful speed, but those complaints should be aimed at the manufacturer, not at another forumite who is simply adopting the language used by the maker of his card.

Why turn to a borderline insult like "tell me you're learning the difference, or trying?" when the OP doesn't care about your personal views?
#28
Sep 2009
3×827 Posts
#29
Mar 2022
Earth
128₁₀ Posts
#30
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7×13×89 Posts
It gets even worse.
NVIDIA's own utility for querying the card does not return the "effective" number given in their spec sheets, OR the clock frequency given in common third-party utilities. As in https://www.seimaxim.com/kb/gpu/nvidia-smi-cheat-sheet (run as admin in Windows, root on Linux, suggested):

Code:
nvidia-smi -q -d SUPPORTED_CLOCKS

Code:
==============NVSMI LOG==============

Timestamp                 : Tue Apr 26 19:18:03 2022
Driver Version            : 456.71
CUDA Version              : 11.1

Attached GPUs             : 2
GPU 00000000:01:00.0
    Clocks
        Graphics          : 1809 MHz
        SM                : 1809 MHz
        Memory            : 4513 MHz
        Video             : 1620 MHz
    Applications Clocks
        Graphics          : N/A
        Memory            : N/A
    Default Applications Clocks
        Graphics          : N/A
        Memory            : N/A
    Max Clocks
        Graphics          : 1961 MHz
        SM                : 1961 MHz
        Memory            : 5005 MHz
        Video             : 1708 MHz
    Max Customer Boost Clocks
        Graphics          : N/A
    SM Clock Samples
        Duration          : 11.27 sec
        Number of Samples : 100
        Max               : 1822 MHz
        Min               : 1771 MHz
        Avg               : 1808 MHz
    Memory Clock Samples
        Duration          : 11.27 sec
        Number of Samples : 100
        Max               : 4513 MHz
        Min               : 4513 MHz
        Avg               : 4513 MHz
...
GPU 00000000:03:00.0
    Clocks
        Graphics          : 1680 MHz
        SM                : 1680 MHz
        Memory            : 6800 MHz
        Video             : 1560 MHz
    Applications Clocks
        Graphics          : N/A
        Memory            : N/A
    Default Applications Clocks
        Graphics          : N/A
        Memory            : N/A
    Max Clocks
        Graphics          : 2100 MHz
        SM                : 2100 MHz
        Memory            : 7000 MHz
        Video             : 1950 MHz
    Max Customer Boost Clocks
        Graphics          : N/A
    SM Clock Samples
        Duration          : Not Found
        Number of Samples : Not Found
        Max               : Not Found
        Min               : Not Found
        Avg               : Not Found
    Memory Clock Samples
        Duration          : Not Found
        Number of Samples : Not Found
        Max               : Not Found
        Min               : Not Found
        Avg               : Not Found

"Memory Specs:
10 Gbps Memory Speed
8 GB GDDR5X Standard Memory Config
256-bit Memory Interface Width
320 Memory Bandwidth (GB/sec)"

I suppose the vague term "memory speed" could mean peak effective bit rate per signal line, although elsewhere historically it is used for DIMM data rate: https://www.crucial.com/support/memo...-compatability

And in https://www.nvidia.com/en-me/geforce...ards/rtx-2080/ NVIDIA gives:

"Memory Specs:
14 Gbps Memory Speed
8 GB GDDR6 Standard Memory Config
256-bit Memory Interface Width
448 GB/s Memory Bandwidth (GB/sec)"

I think an issue with the GWE utility could be graphical layout: putting a clear description of what was being displayed would use more characters than the app appears to have allowed for. Or its author may not have spent the time to burrow into the terminology carefully. It is time consuming.

It would not have surprised me if there was a design something like: software-settable memory clock generator A -> distribution at frequency A to the various memory chips -> on-chip doubler (to b) -> doubler (to c) -> RAM clocking -> PAM4 output at 2-bit voltage levels at rate c (computed bit rate d = 2c = 4b = 8A). Or maybe A -> distribution -> on-chip 4x PLL to c -> PAM4 output with effective bit rate d. Or it's QDR: clock A and 90-degree phase-shifted A', transfers on both rising and falling edges (edge count c = 4A, PAM4 bit rate d = 2c = 8A).

GPU-Z, MSI Afterburner, etc. report A; nvidia-smi reports c; various sources (spec sheets, GWE) report d.

I had looked for and not found a reasonably recent reference for GPU circuit designs. The references I had found are higher level, architectural, and old enough to precede recent GPU memory types. This one is too vague too.

Drilling down further, I do find PLLs in the memory package documentation. The Micron 16 Gbit GDDR6 MT61K512M32 spec sheet: Table 2 at the bottom of page 7 uses units of GHz on the clock signals CK_t, CK_c, WCK_t, WCK_c (1.5 or 3 GHz respectively), and 3, 6 or 12 Gbit/s/"pin" for the data signals on the fine-pitch ball grid array package. Fig. 4 on page 8 shows frequency and data rate ratios relative to the clock fundamental, and clock/data alignment in time. Fig. 5 shows example clock and interface circuitry.
That is perhaps relevant to my RTX 2080 and similar-generation GPUs. The internal clock indicated as used in the memory chip ("Internal WCK") is 3 GHz: f, 2f & 4f clock signals; 2f, 4f, 8f data etc. rates.

The Micron 8 Gbit GDDR6X MT61K256M32 spec sheet, Table 1 page 5, Figure 3 page 6, and Figure 4 page 7, provides analogous data relevant to RTX 30xx GPUs (PAM4 mode): 2.5 or 5 GHz clock signals, ~20 Gbit/s PAM4 (voltage-level 2-bit encoded) effective data rate = 10 GBaud/pin. The internal clock indicated as used in the memory chip ("Internal WCK") is 2.5 GHz. Also https://media-www.micron.com/-/media...fe1f9a5ff0231e

So in summary, it looks like at every stage other than the memory chip manufacturer, there's loose handling of clock and data signal rate terminology. (I haven't checked, but I'd expect other GDDR6 chip manufacturers to be similarly clear about their products. A quick online search did not yield any other GDDR6X manufacturers.) NVIDIA and every stage downstream from there could do better at using precise terminology to limit confusion, in my opinion.
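Those relationships can be sanity-checked numerically. A minimal Python sketch, using the A/c/d labels from above; the 4x and 2x factors are assumptions consistent with the nvidia-smi output and the spec-sheet figures quoted here, not confirmed circuit facts:

Code:
# Sketch: reconcile the three memory "speed" numbers for a 14 Gbps GDDR6
# card (e.g. an RTX 2080), using the labels from the post above:
#   A = clock reported by GPU-Z / MSI Afterburner (MHz)
#   c = clock reported by nvidia-smi (MHz), assumed c = 4*A
#   d = "effective" per-pin bit rate on spec sheets (Mbps), assumed d = 2*c
A = 1750                     # assumed GPU-Z-style figure
c = 4 * A                    # 7000 MHz -- matches "Max Clocks Memory" above
d = 2 * c                    # 14000 Mbps = "14 Gbps Memory Speed"

# Bandwidth follows from d and the bus width:
bus_width_bits = 256
bandwidth_gb_s = d / 1000 * bus_width_bits / 8
print(c, d, bandwidth_gb_s)  # 7000 14000 448.0 -- the "448 GB/s" spec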
#31
Romulan Interpreter
"name field"
Jun 2011
Thailand
10358₁₀ Posts
Something may still be fishy with that A100 system, crunching 10xx us/iter for a 1xxM exponent.
Does the system do other things at the same time? I am getting 12xx us/iter for a 332M exponent currently crunching, also on Linux. For the same exponent (332M) the V100 gets 18xx us/iter. Just saying... Not a Linux guy.

Last fiddled with by LaurV on 2022-05-05 at 15:52
#32
"Teal Dulcet"
Jun 2018
Computer Scientist
203₈ Posts
Quote:
This can be seen by looking at the timings on the V100 GPU for the last 14 commits for this same wavefront exponent (values are in us/iter): Code:
v7.2-93-ga5402c5,1236
v7.2-92-g5fb55ca,1235
v7.2-91-g9c22195,1235
v7.2-90-g75d4a1d,1235
v7.2-89-g885b8af,1236
v7.2-88-g599b2b2,1235
v7.2-87-gc1d9e26,1235
v7.2-86-gddf3314,1235
v7.2-85-g6122a0e,1236
v7.2-84-gce4fe12,1236
v7.2-83-g1aff945,1235
v7.2-82-g7bea16f,653
v7.2-81-g5f17913,656
v7.2-80-g9a975f9,656
v7.2-79-g3b4b060,645
20220430 10:13:23 Tesla V100-SXM2-16GB-0 Exception gpu_error: INVALID_KERNEL_ARGS fftMiddleIn at clwrap.cpp:324 run

Code:
,57000991,63000083,67000177,73004279,76000207,84000017,95000011,103246861,113000033,125939521,131000021,144202441,150000029,169000061,187101781,205000013,223000051,247001701,260001727,283000171,295000007,331000037,367000099,403000007,438000131,487001743,509000099,559001657,580001651,650004253,720000049,791000053,861000113,960009689,999999929,1100000017,1138000001,1250000029,1410000023,1690000133,1891000019,1960000019,2147483563
v7.2-93-ga5402c5,646,,742,,841,939,1037,1137,1236,1335,1434,1536,1679,1896,,,,,,,,,,,,,,,7662,8667,,,,,,,,,,,,,
v7.2-92-g5fb55ca,645,,741,,841,939,1037,1136,1236,1334,1434,1536,1680,1897,,,,,,,,,,,,,,,7662,8666,,,,,,,,,,,,,
v7.2-91-g9c22195,645,,741,,841,939,1037,1136,1235,1335,1434,1536,1680,1897,,,,,,,,,,,,,,,7662,8666,,,,,,,,,,,,,
v7.2-90-g75d4a1d,646,,741,,841,939,1037,1136,1236,1335,1434,1536,1679,1897,,,,,,,,,,,,,,,7662,8666,,,,,,,,,,,,,
v7.2-89-g885b8af,647,,741,,841,939,1037,1136,1235,1335,1434,1536,1680,1897,,,,,,,,,,,,,,,7663,8666,,,,,,,,,,,,,
v7.2-88-g599b2b2,647,,741,,841,939,1037,1136,1235,1335,1434,1536,1679,1896,,,,,,,,,,,,,,,7662,8666,,,,,,,,,,,,,
v7.2-87-gc1d9e26,646,,741,,841,939,1037,1136,1235,1335,1434,1536,1679,1896,,,,,,,,,,,,,,,7662,8665,,,,,,,,,,,,,
v7.2-86-gddf3314,647,,742,,841,939,1037,1136,1236,1335,1434,1536,1679,1896,,,,,,,,,,,,,,,7662,8666,,,,,,,,,,,,,
v7.2-85-g6122a0e,647,,741,,841,939,1037,1136,1235,1335,1434,1536,1679,1896,,,,,,,,,,,,,,,7662,8666,,,,,,,,,,,,,
v7.2-84-gce4fe12,645,,741,,841,939,1037,1137,1235,1335,1434,1536,1679,1896,,,,,,,,,,,,,,,7662,8666,,,,,,,,,,,,,
v7.2-83-g1aff945,647,,741,,841,939,1037,1136,1235,1335,1434,1536,1679,1896,,,,,,,,,,,,,,,7663,8666,,,,,,,,,,,,,
v7.2-82-g7bea16f,347,408,401,470,451,501,546,610,653,706,759,807,852,962,1054,1184,1262,1376,1480,1548,1689,1918,2112,2367,2550,2778,2978,3134,3591,4688,5159,5763,6140,6750,7324,7591,7244,9559,10522,12523,13808,14983,15507
v7.2-81-g5f17913,350,400,405,460,454,505,549,613,656,711,763,810,856,966,1059,1190,1266,1380,1490,1550,1703,1918,2106,2372,2547,2778,2984,3129,3573,4621,5093,5710,6072,6655,7253,7539,7204,9448,10375,12386,13634,14802,15341
v7.2-80-g9a975f9,350,386,405,438,454,505,549,613,656,711,764,809,841,949,1043,1172,1243,1355,1468,1527,1704,1917,2107,2373,2545,2778,2986,3128,3500,3934,4361,4836,5237,5641,6147,6452,7149,8127,9022,10794,11749,12718,13339
v7.2-79-g3b4b060,342,386,395,439,443,495,540,602,645,700,752,798,841,950,1044,1173,1243,1355,1468,1526,1697,1921,2106,2367,2546,2778,2992,3136,3500,3933,4361,4834,5237,5641,6145,6452,7154,8169,9047,10831,11784,12747,13377
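For what it's worth, the size of the regression can be pulled straight out of the table above. A small Python sketch; `timings.csv` is a hypothetical local copy of the CSV text quoted above:

Code:
# Sketch: compute the per-FFT-size slowdown between v7.2-82 (fast) and
# v7.2-83 (slow) from the benchmark CSV quoted above.
import csv

with open("timings.csv") as f:           # hypothetical saved copy of the table
    rows = {r[0]: r[1:] for r in csv.reader(f)}

fast = rows["v7.2-82-g7bea16f"]
slow = rows["v7.2-83-g1aff945"]
for exponent, a, b in zip(rows[""], slow, fast):
    if a and b:                          # skip FFT sizes with no timing
        print(exponent, round(int(a) / int(b), 2))  # e.g. 647/347 ~ 1.86x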
#33
Mar 2022
Earth
2⁷ Posts
This test is on an ASUS RTX 3090 STRIX, +100 MHz core, +900 MHz memory.

GPU Owl V6

Code:
jesus@Magallan:~/gpuowl-6$ ./gpuowl -prp 113613007 -iters 30000
2022-05-07 13:12:03 gpuowl
2022-05-07 13:12:03 config: -user Magallanes -cpu Magellan -block 1000 -maxAlloc 23500M
2022-05-07 13:12:03 config: -prp 113613007 -iters 30000
2022-05-07 13:12:03 device 0, unique id ''
2022-05-07 13:12:03 Magellan 113613007 FFT: 6M 1K:12:256 (18.06 bpw)
2022-05-07 13:12:03 Magellan Expected maximum carry32: 4CFA0000
2022-05-07 13:12:03 Magellan OpenCL args "-DEXP=113613007u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DPM1=0 -DMM2_CHAIN=1u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0x1.d7719ff404155p-1 -DIWEIGHT_STEP_MINUS_1=-0x1.eae2bbc5c8218p-2 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2022-05-07 13:12:03 Magellan
2022-05-07 13:12:03 Magellan OpenCL compilation in 0.00 s
2022-05-07 13:12:06 Magellan 113613007 OK 0 loaded: blockSize 1000, 0000000000000003
2022-05-07 13:12:06 Magellan validating proof residues for power 8
2022-05-07 13:12:06 Magellan Proof using power 8
2022-05-07 13:12:13 Magellan 113613007 OK 2000 0.00%; 2145 us/it; ETA 2d 19:42; 0f1a44508c206809 (check 2.18s)
2022-05-07 13:13:12 Magellan Stopping, please wait..
2022-05-07 13:13:14 Magellan 113613007 OK 30000 0.03%; 2124 us/it; ETA 2d 19:01; 32d4895e2a4b9a36 (check 2.18s)
2022-05-07 13:13:14 Magellan Exiting because "stop requested"
2022-05-07 13:13:14 Magellan Bye
jesus@Magallan:~/gpuowl-6$

GPU Owl Newest Version

Code:
jesus@Magallan:~/gpuowl-master$ ./gpuowl -prp 113613007 -iters 30000
20220507 13:08:25 GpuOwl VERSION
20220507 13:08:25 GpuOwl VERSION
20220507 13:08:25 config: -user Magallanes -cpu Magellan -block 1000 -maxAlloc 23500M
20220507 13:08:25 config: -prp 113613007 -iters 30000
20220507 13:08:25 device 0, unique id ''
20220507 13:08:25 Magellan 113613007 FFT: 6M 1K:12:256 (18.06 bpw)
20220507 13:08:25 Magellan 113613007 OpenCL args "-DEXP=113613007u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DMM2_CHAIN=1u -DMAX_ACCURACY=1 -DWEIGHT_STEP=0.92078876355848627 -DIWEIGHT_STEP=-0.47938054461158819 -DIWEIGHTS={0,-0.45791076534214703,-0.41227852333612641,-0.36280502904659512,-0.30916693173607174,-0.25101366149694165,-0.18796513798337899,-0.11960928626782931,} -DFWEIGHTS={0,0.84471473710626932,0.70148623064852611,0.56937836233036643,0.44752769654326469,0.33513783709142603,0.23147422207537149,0.13585932291445821,} -cl-std=CL2.0 -cl-finite-math-only "
20220507 13:08:26 Magellan 113613007
20220507 13:08:26 Magellan 113613007 OpenCL compilation in 0.67 s
20220507 13:08:26 Magellan 113613007 maxAlloc: 22.9 GB
20220507 13:08:26 Magellan 113613007 P1(0) 0 bits
20220507 13:08:26 Magellan 113613007 PRP starting from beginning
20220507 13:08:28 Magellan 113613007 OK 0 on-load: blockSize 1000, 0000000000000003
20220507 13:08:28 Magellan 113613007 validating proof residues for power 8
20220507 13:08:28 Magellan 113613007 Proof using power 8
20220507 13:08:35 Magellan 113613007 OK 2000 0.00% 0f1a44508c206809 2130 us/it + check 2.16s + save 0.11s; ETA 2d 19:12
20220507 13:08:52 Magellan 113613007 10000 28f5eefd6236e274 2174
20220507 13:09:14 Magellan 113613007 20000 d556e5c56bf104e0 2164
20220507 13:09:36 Magellan 113613007 Stopping, please wait..
20220507 13:09:38 Magellan 113613007 OK 30000 0.03% 32d4895e2a4b9a36 2175 us/it + check 2.20s + save 0.11s; ETA 2d 20:38
20220507 13:09:38 Magellan Exiting because "stop requested"
20220507 13:09:38 Magellan Bye
jesus@Magallan:~/gpuowl-master$

Quote:
For anyone reading this post or interested: 3080 Ti vs 3090 performance for PRP = about the same.
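As a sanity check on those logs, an ETA line like "2145 us/it; ETA 2d 19:42" follows directly from the per-iteration time and the exponent (a PRP test of 2^p - 1 runs about p iterations). A quick sketch of the arithmetic:

Code:
# Sketch: an ETA is just iterations-remaining times time-per-iteration.
exponent = 113613007   # a PRP test of 2^p - 1 takes about p iterations
us_per_it = 2145       # from the gpuowl v6 log above

seconds = exponent * us_per_it / 1e6
days, rem = divmod(seconds, 86400)
print(f"{int(days)}d {int(rem // 3600)}:{int(rem % 3600 // 60):02d}")
# -> 2d 19:41, close to the log's "ETA 2d 19:42" (which counts only
#    the iterations still remaining)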