mersenneforum.org  

2021-10-22, 11:54   #3510
diep
Sep 2006
The Netherlands
30E₁₆ Posts

The best way to look at such things from back then is the total unknown factor. Even very good programmers, among the best in the world, had zero experience programming for a GPU. CUDA hadn't been revealed yet; that came a few years later, and even when the first GPUs with CUDA arrived, we were all very sceptical.

I remember a chat I had there with someone, and the chat was totally different from what you'd guess.

We saw a guy who was an absolute joke from our viewpoint, yet he had already managed to get quite good performance out of a GPU. So in absolute terms (we are talking 2007 now) he managed to beat a CPU convincingly. And everyone here realized that some years on the manycores would easily scale further, even if only because of a higher power envelope. A CPU had maybe 60-90 watts max in those days, whereas a GPU could easily use 375 watts. 375 watts of course always wins in performance against 60 watts. Very trivial. Even today.

Yet the discussion I had was a commercial one: "Is anyone ever gonna pay for a GPU program?"

And we both agreed on the answer: "No customer I know of will ever pay for that, and if one would, you can't live off just 1 customer."

So that's why my GPU coding back then was very limited, other than trying to get a feeling for it. That it has huge potential, we all know.

Right now I would argue CPUs have won back enough terrain. Maybe for matrix calculations you'd still prefer a GPU, yet anywhere else I'd argue CPUs have made up enough ground thanks to AMD's genius concept of putting an 8-socket box onto a single chip with the CCD concept.

The high-end GPUs are simply too expensive now compared to CPUs, and your average algorithm won't work on a GPU, or you won't find a programmer who wants to do the job cheaply.

For example, I studied how to do an FFT (or rather an NTT, as it is called when done with integers) on a gamer's GPU. In itself that's entirely possible to program. Yet programming such an NTT to test prime numbers comes at a certain loss. Furthermore, there is only a specific percentage of the peak that you can get out of a GPU.

Nvidia, great as they are to program for with CUDA, do not have a good track record there. If something can deliver 5 Tflops fp64, so to speak, and you can get 1.25 Tflops fp64 out of it, you are a mighty hell of a programmer.

And I'm ignoring the manufacturer's definition here, just looking at how many instructions you manage to execute without double-counting any sort of instruction. So a multiply-add is a single instruction in this count.

Today's fastest Nvidia can deliver 10 Tflops fp64 (manufacturer's definition), which is 5 Tflops in terms of number of instructions executed. And that card costs, oh what is it, 10k dollars or so, if you can get it for that price at home, that is. Import taxes, you name it.

Now compare this with a CPU that delivers 5 Tflops double precision. That's 2.5 Tflops in terms of instructions, and then look at how efficiently Woltman executes on it. Then compare the prices of the 2 pieces of hardware. CPU wins.
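
The arithmetic in the last few paragraphs can be written out as a small sketch. It assumes, as the post implies, that the manufacturer's figure counts one multiply-add as two floating-point operations; the function names are made up for illustration.

```python
def instruction_rate(vendor_tflops: float) -> float:
    """Convert a vendor peak figure to an instruction rate.

    Assumption: the vendor counts a fused multiply-add as 2 flops,
    while this count treats it as a single instruction.
    """
    return vendor_tflops / 2.0


def achieved_fraction(achieved_tflops: float, peak_tflops: float) -> float:
    """Fraction of a peak figure that a program actually reaches."""
    return achieved_tflops / peak_tflops


# Figures quoted in the post.
print(instruction_rate(10.0))        # GPU: 10 Tflops vendor figure
print(instruction_rate(5.0))         # CPU: 5 Tflops vendor figure
print(achieved_fraction(1.25, 5.0))  # the "mighty hell of a programmer" case
```

This only restates the numbers above; it says nothing about price or about how much of the peak a given program reaches in practice.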

2021-11-22, 17:08   #3511
storm5510
Random Account
Aug 2009
3725₈ Posts
Error Message

When I try to run the latest version of mfaktc, I get the message below:

Code:
ERROR: cudaGetLastError() returned 209: no kernel image is available for execution on the device
Basics of system:

Intel i7-7700
16 GB RAM
Windows 10 Professional v21H1
Zotac RTX 2080 Amp Extreme

The driver set does not appear to be the original Nvidia one, despite a system setting that stops hardware drivers from being downloaded and installed as part of an automatic update. Recently, the system was updated from v2004 to v21H1.

Any ideas?

2021-11-22, 17:22   #3512
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario
2²·29·31 Posts

Quote:
Originally Posted by storm5510
When I try to run the latest version of mfaktc, I get the message below

There are many versions of mfaktc, not all of them for the same hardware. Which specific version did you download?

2021-11-22, 17:26   #3513
rebirther
Sep 2011
Germany
BEB₁₆ Posts

Quote:
Originally Posted by storm5510
When I try to run the latest version of mfaktc, I get the message below:

Code:
ERROR: cudaGetLastError() returned 209: no kernel image is available for execution on the device
Any ideas?

Try this for more info:

mfaktc.exe -st

2021-11-22, 17:54   #3514
kruoli
"Oliver"
Sep 2017
Porta Westfalica, DE
2·5·83 Posts

Have you tried this or this version?

Just to check if they run. I am not sure which version (2047 or not) is optimal for your use case.

2021-11-22, 18:02   #3515
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
13711₈ Posts

Quote:
Originally Posted by storm5510
When I try to run the latest version of mfaktc, I get the message below:

Code:
ERROR: cudaGetLastError() returned 209: no kernel image is available for execution on the device
Basics of system:

Intel i7-7700
16 GB RAM
Windows 10 Professional v21H1
Zotac RTX 2080 Amp Extreme

The driver set does not appear to be original Nvidia despite having a system setting not to download and install hardware drivers as part of an automatic update. Recently, a system update from v2004 to v21H1 was done.

Any ideas?

Win10 Pro should not be an issue. Some older OS versions (Vista, XP) can be, due to limits on driver availability for newer GPUs. Or occasionally an old CPU is not compatible with a new driver, or with any driver for a newer GPU; Core2Duo & RX550 on Windows Vista or 7 is such a case in my experience, while the RX550 can be run on Win7 or Win10 on newer CPUs (i7-4790 etc.).
It might be a version of mfaktc not built for the minimum CUDA level required to support the rather new GPU. As I recall, I needed a CUDA10 compatible version for RTX2080 or GTX1650, and a CUDA8 compatible version for GTX10xx.
"Generally you will need to ensure that the GPU, driver version, any library files required, and application software are mutually compatible. Otherwise there will be errors. The particulars vary by application, OS, & GPU model. (Avoid mixing very old and new cards in the same system. They can have mutually incompatible requirements.)

If using NVIDIA and CUDA based applications, see https://en.wikipedia.org/wiki/CUDA#GPUs_supported, and note that the latest CUDA version or driver is not always the best performance for a given GPU, and eventually as versions progress, may not even be compatible with a given gpu."
https://www.mersenneforum.org/showpo...89&postcount=1
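
The minimum CUDA levels recalled above can be put into a small lookup. This is a sketch, not part of mfaktc: the table lists the CUDA toolkit release that first supported each compute capability, and the names `MIN_CUDA_FOR_CC` and `toolkit_supports` are invented for illustration.

```python
# CUDA toolkit release that first supported each compute capability (CC).
MIN_CUDA_FOR_CC = {
    (6, 1): (8, 0),   # Pascal, e.g. GTX 10xx
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing, e.g. RTX 2080, GTX 1650
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere, e.g. RTX 30xx
}


def toolkit_supports(cuda_version, compute_capability):
    """Return True if a toolkit of this version is new enough for the GPU.

    Both arguments are (major, minor) tuples. A KeyError is raised for a
    compute capability that is not in the table above.
    """
    return cuda_version >= MIN_CUDA_FOR_CC[compute_capability]


print(toolkit_supports((10, 0), (7, 5)))  # CUDA 10 build, RTX 2080
print(toolkit_supports((8, 0), (7, 5)))   # CUDA 8 build, RTX 2080
```

Being new enough is necessary but not sufficient: the person compiling still chooses which compute capabilities go into the binary, and very new toolkits eventually drop old GPUs.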

2021-11-22, 18:04   #3516
storm5510
Random Account
Aug 2009
3725₈ Posts

Quote:
Originally Posted by James Heinrich
There are many versions of mfaktc, not all of them for the same hardware. Which specific version did you download?

On your site, right at the top of the list.

https://download.mersenne.ca/mfaktc/...a11.2-2047.zip

For the past few years, only the 2047 builds would run on any of my Windows 10 setups. It was never a problem until now. Doing a driver update made no difference. The update did not cause any other issues with the system.

Last fiddled with by storm5510 on 2021-11-22 at 18:06 Reason: Additional.

2021-11-22, 18:20   #3517
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
6,089 Posts

Run mfaktc with the -info option. Output should look something like the following, which works. Check the bolded part below. (Underlined stuff can vary.) Mismatch between runtime version (dll) and binary (.exe) version is fatal. Too low a driver version is fatal, but a little higher than required for the exe is not.

Code:
mfaktc v0.21 (64bit built)

Compiletime options
  THREADS_PER_BLOCK         256
  SIEVE_SIZE_LIMIT          32kiB
  SIEVE_SIZE                193154bits
  SIEVE_SPLIT               250
  MORE_CLASSES              enabled

Runtime options
  SievePrimes               25000
  SievePrimesAdjust         1
  SievePrimesMin            5000
  SievePrimesMax            100000
  NumStreams                3
  CPUStreams                3
  GridSize                  3
  GPUSievePrimes            106000
  GPUSieveSize              2047Mi bits
  GPUSieveProcessSize       32Ki bits
  Checkpoints               enabled
  CheckpointDelay           600s
  WorkFileAddDelay          3600s
  Stages                    enabled
  StopAfterFactor           bitlevel
  PrintMode                 full
  V5UserID                  Kriesel
  ComputerID                asr3-rtx2080
  ProgressHeader            "Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait"
  ProgressFormat            "%d %T | %C %p%% | %t  %e |   %g  %s  %W%%"
  AllowSleep                no
  TimeStampInResults        yes

CUDA version info
  binary compiled for CUDA  10.0
  CUDA runtime version      10.0
  CUDA driver version       11.10

CUDA device info
  name                      GeForce RTX 2080
  compute capability        7.5
  max threads per block     1024
  max shared memory per MP  65536 byte
  number of multiprocessors 46
  clock rate (CUDA cores)   1710MHz
  memory clock rate:        7000MHz
  memory bus width:         256 bit

Automatic parameters
  threads per grid          753664
  random selftest offset    31788
  GPUSievePrimes (adjusted) 106038
  GPUsieve minimum exponent 1385094
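
The version rule stated before the listing (runtime must match the binary, driver must not be lower than the binary needs) can be checked mechanically. The following is a minimal sketch that parses the "CUDA version info" lines in the format shown above; the function names are made up, and the parsing assumes the three labels appear exactly as in this output.

```python
import re

# Labels as they appear in the "CUDA version info" section of the -info output.
PATTERNS = {
    "binary": r"binary compiled for CUDA\s+(\d+)\.(\d+)",
    "runtime": r"CUDA runtime version\s+(\d+)\.(\d+)",
    "driver": r"CUDA driver version\s+(\d+)\.(\d+)",
}


def parse_cuda_versions(info_text):
    """Extract the three versions as (major, minor) integer tuples."""
    versions = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, info_text)
        if match is None:
            raise ValueError(f"missing {key} version in -info output")
        versions[key] = (int(match.group(1)), int(match.group(2)))
    return versions


def check_versions(versions):
    """Return a list of problems according to the rule stated in the post."""
    problems = []
    if versions["runtime"] != versions["binary"]:
        problems.append("runtime (dll) and binary (.exe) versions differ")
    if versions["driver"] < versions["binary"]:
        problems.append("driver version is lower than the binary requires")
    return problems


SAMPLE = """CUDA version info
  binary compiled for CUDA  10.0
  CUDA runtime version      10.0
  CUDA driver version       11.10
"""

print(check_versions(parse_cuda_versions(SAMPLE)))
```

An empty list only means the three versions are mutually consistent. It says nothing about which compute capabilities the binary was compiled for.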

2021-11-23, 18:44   #3518
storm5510
Random Account
Aug 2009
5×401 Posts

Below are the results of mfaktc -info:

Code:
mfaktc v0.21 (64bit built)

Compiletime options
  THREADS_PER_BLOCK             256
  SIEVE_SIZE_LIMIT          	32kiB
  SIEVE_SIZE                	193154bits
  SIEVE_SPLIT               	250
  MORE_CLASSES              	enabled

Runtime options
  SievePrimes               	25000
  SievePrimesAdjust         	1
  SievePrimesMin            	2000
  SievePrimesMax            	100000
  NumStreams                	3
  CPUStreams                	3
  GridSize                  	3
  GPU Sieving               	enabled
  GPUSievePrimes            	80000
  GPUSieveSize              	64Mi bits
  GPUSieveProcessSize       	16Ki bits
  Checkpoints               	enabled
  CheckpointDelay           	30s
  WorkFileAddDelay          	disabled
  Stages                    	disabled
  StopAfterFactor           	disabled
  PrintMode                 	full
  V5UserID                  	storm5510
  ComputerID                	3570_Ivy_Bridge
  AllowSleep                	no
  TimeStampInResults            no

CUDA version info
  binary compiled for CUDA  	11.20
  CUDA runtime version      	11.20
  CUDA driver version       	11.50

CUDA device info
  name                      	NVIDIA GeForce RTX 2080
  compute capability        	7.5
  max threads per block     	1024
  max shared memory per MP  	65536 byte
  number of multiprocessors 	46
  clock rate (CUDA cores)   	1830MHz
  memory clock rate:        	7000MHz
  memory bus width:         	256 bit

Automatic parameters
  threads per grid          	753664
  GPUSievePrimes (adjusted) 	80182
  GPUsieve minimum exponent     1022822

running a simple selftest...
ERROR: cudaGetLastError() returned 209: no kernel image is available for execution on the device
There are differences. I have not seen any new documentation indicating what these should be. I am using an older configuration file from several versions back because of the output screen formatting I created. It was a major pain to get it the way I wanted it.

Edit: The CUDA driver differences may be because of the recent update I did. I matched the settings in my config with those in the archive config. No difference.

Last fiddled with by storm5510 on 2021-11-23 at 18:54 Reason: Additional

2021-11-23, 19:10   #3519
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
6,089 Posts

Quote:
Originally Posted by storm5510
Below are the results of mfaktc -info:
...

CUDA version info
binary compiled for CUDA 11.20
CUDA runtime version 11.20
CUDA driver version 11.50

CUDA device info
name NVIDIA GeForce RTX 2080
compute capability 7.5

...

running a simple selftest...
ERROR: cudaGetLastError() returned 209: no kernel image is available for execution on the device
There are differences. I have not seen any new documentation indicating what these should be. I am using an older configuration file from several versions back because of the output screen formatting I created. It was a major pain to get it the way I wanted it.

Edit: The CUDA driver differences may be because of the recent update I did. I matched the settings in my config with those in the archive config. No difference.

Try an explicitly CUDA10.x build of mfaktc. It's possible the CUDA11.2 mfaktc build (CC 8.6) did not include the optional downlevel compute capability levels such as CC 7.5. See https://www.mersenneforum.org/showthread.php?t=26798 for how that's done by whoever does the compile.
https://en.wikipedia.org/wiki/CUDA
If that doesn't solve it, maybe show us GPU-Z card pane and Device Manager properties for the GPU. Good luck.
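
Why a build without CC 7.5 would give "no kernel image is available" can be sketched as follows. The rule used here is the general CUDA one: compiled machine code (a cubin) for CC X.y runs only on devices with the same major version X and minor version at least y, while embedded PTX for CC X.y can be compiled by the driver at load time for any device of CC X.y or higher. The target lists passed in below are hypothetical, since the actual contents of the CUDA11.2 build are not known here.

```python
def kernel_image_available(device_cc, cubin_targets, ptx_targets):
    """Return True if a binary has a usable kernel image for the device.

    device_cc:     (major, minor) compute capability of the GPU
    cubin_targets: compute capabilities with compiled machine code in the binary
    ptx_targets:   compute capabilities with embedded PTX in the binary
    """
    major, minor = device_cc
    # Machine code is compatible only within the same major version,
    # and only towards equal or higher minor versions.
    for target_major, target_minor in cubin_targets:
        if target_major == major and target_minor <= minor:
            return True
    # PTX can be JIT-compiled for the same or any newer compute capability,
    # never for an older one.
    for target in ptx_targets:
        if target <= device_cc:
            return True
    return False


rtx2080 = (7, 5)
# Hypothetical build with only CC 8.6 code and PTX: nothing usable on CC 7.5.
print(kernel_image_available(rtx2080, [(8, 6)], [(8, 6)]))
# Hypothetical build that also includes CC 7.5 code.
print(kernel_image_available(rtx2080, [(7, 5), (8, 6)], []))
```

Under these assumptions the first call gives False and the second True, which is consistent with an older CUDA10.x build being worth trying on this card.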

Last fiddled with by kriesel on 2021-11-23 at 19:13

2021-11-23, 23:45   #3520
storm5510
Random Account
Aug 2009
7D5₁₆ Posts

Quote:
Originally Posted by kriesel
Try an explicitly CUDA10.x build of mfaktc. It's possible the CUDA11.2 mfaktc build (CC 8.6) did not include the optional downlevel compute capability levels such as CC 7.5...

I did, and it runs fine. I believe this is the one I ran before but lost due to a drive problem. Not sure, though.

Thank you all for your assistance.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
