mersenneforum.org  

Old 2020-10-27, 17:52   #3422
rebirther
 
Sep 2011
Germany

2^3×113 Posts

Quote:
Originally Posted by kriesel
Examples? The developers have, or the end users have? How do you know it is not working?
The correct mappings work locally, without BOINC involved. Run times for gpuowl tasks are typically long.
In the OpenCL mapping, the platform number is zero-based, but the device number on a platform is not, apparently.
-d 01 is the first device on the first platform in lsgpu and mfakto.

(Maybe some of this thread should be moved to a BOINC thread.)

The end users. You can find some information here.
Old 2020-10-27, 22:01   #3423
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·3^2·263 Posts

BOINC use of mfaktc on multiple gpus should be straightforward.
To use multiple GPUs in mfakto, the BOINC client needs to interrogate the system, enumerate the OpenCL devices (as the lsgpu utility and gpuowl do), and build a translation table between its 0-based device numbers and the combination of OpenCL platform number and device number on that platform.

A complicating factor is that there's some device identification overloading occurring.
The correct form for first platform first device is 01, but 0, 00, 01, 1 all work to use the same device in mfakto.

Platform numbering is zero based but OpenCL device numbering is not, from what I've seen.
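
The translation table described above could look roughly like the following Python sketch. It is only an illustration, not BOINC or mfakto code: the `PLATFORMS` list is transcribed by hand from the lsgpu listing later in this post, `build_translation_table` is a hypothetical helper, and the assumption that BOINC counts GPUs in platform order is untested. It also follows lsgpu's numbering, which did not match mfakto's on the Windows 10 test system.

```python
from typing import Dict, List, Tuple

# Hand-transcribed from the lsgpu listing in this post (an assumption, not
# something this sketch queries from OpenCL): one inner list per platform,
# each entry is (device name, device type).
PLATFORMS: List[List[Tuple[str, str]]] = [
    [("GeForce GTX 1050 Ti", "GPU")],
    [("Intel(R) UHD Graphics 630", "GPU"),
     ("Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz", "CPU")],
]


def build_translation_table(platforms: List[List[Tuple[str, str]]]) -> Dict[int, str]:
    """Map a 0-based GPU index to a two-digit '-d' spec: platform digit, then device digit."""
    table: Dict[int, str] = {}
    gpu_index = 0
    for platform_no, devices in enumerate(platforms):                    # platforms are zero-based
        for device_no, (_name, dev_type) in enumerate(devices, start=1):  # devices are one-based
            if dev_type != "GPU":
                continue  # skip CPU devices; only GPUs get an index
            table[gpu_index] = f"{platform_no}{device_no}"
            gpu_index += 1
    return table


if __name__ == "__main__":
    for index, spec in build_translation_table(PLATFORMS).items():
        print(f"GPU index {index} -> mfakto -d {spec}")
```

A two-digit spec only works while both numbers stay below 10, which is another reason to treat this as a sketch.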

I've also seen inconsistent platform numbering between lsgpu and mfakto on a multiplatform Windows 10 test system, but consistent numbering on a single-platform Windows 7 test system.

Nvidia-smi numbering does not match OpenCL numbering or order on the NVIDIA platform on a Win7 test system, but the CUDA device number order matches the OpenCL device number order there. In general, numbering from one context to another is messy.
Quote:
Message 6225 - Posted: 15 Apr 2020, 18:53:48 UTC - in response to Message 6223.
Theoretically both applications can support more than one (different) GPU.
But: BOINC enumerates the GPUs as 0, 1, 2, ....
In OpenCL you have platforms, e.g. Intel=0, AMD=1, NVidia=2, and for each platform 1..n GPU devices.

The mapping from 0, 1, 2 to 00, 10, 11 is different for each computer with more than one graphics device.

So currently only one mapping is possible: --device 0 to d 001.
Code:
lsgpu, derived/modified from https://gist.github.com/CptFoobar/bcb513d87e574e69c2db
2 Platforms found.

Platform 0
1 Device: GeForce GTX 1050 Ti
  1.1 Vendor: NVIDIA Corporation
  1.2 Type: CL_DEVICE_TYPE_GPU
  1.3 Hardware version: OpenCL 1.2 CUDA
  1.4 Software version: 451.67
  1.5 OpenCL version: OpenCL C 1.2
  1.6 Little Endian: Yes
  1.7 Max Clock frequency: 1620 MHz
  1.8 Image support available: Yes
  1.9 Parallel compute units: 6
  1.10 OpenCL Device Availability: Yes
  1.11 OpenCL Compiler Availability: Yes
  1.12 OpenCL Linker Availability: Yes


Platform 1
1 Device: Intel(R) UHD Graphics 630
  1.1 Vendor: Intel(R) Corporation
  1.2 Type: CL_DEVICE_TYPE_GPU
  1.3 Hardware version: OpenCL 2.1 NEO
  1.4 Software version: 23.20.16.4973
  1.5 OpenCL version: OpenCL C 2.1
  1.6 Little Endian: Yes
  1.7 Max Clock frequency: 1100 MHz
  1.8 Image support available: Yes
  1.9 Parallel compute units: 24
  1.10 OpenCL Device Availability: Yes
  1.11 OpenCL Compiler Availability: Yes
  1.12 OpenCL Linker Availability: Yes

2 Device: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
  2.1 Vendor: Intel(R) Corporation
  2.2 Type: CL_DEVICE_TYPE_CPU
  2.3 Hardware version: OpenCL 2.1 (Build 611)
  2.4 Software version: 7.6.0.611
  2.5 OpenCL version: OpenCL C 2.0
  2.6 Little Endian: Yes
  2.7 Max Clock frequency: 2200 MHz
  2.8 Image support available: Yes
  2.9 Parallel compute units: 12
  2.10 OpenCL Device Availability: Yes
  2.11 OpenCL Compiler Availability: Yes
  2.12 OpenCL Linker Availability: Yes

End.
Code:
for %%a in ( 0 1 2 3 4 ) do call mft %%a
for %%a in ( 0 1 2 ) do for %%b in ( 0 1 2 3 4 ) do call mft %%a%%b
Code:
:mft.bat
set dev=%1
echo device spec -d %dev% for following run >>mfakto-x.txt
mfakto -d %dev% >>mfakto-x.txt

:making no assumptions about app behavior, try for 2 platforms 3 total devices, 0-2 for platform, 0-4 for device on a given platform
: -d 0 gave uhd630
: -d 1 gave uhd630
: -d 2 Select device - ERROR: init_CL(3, 2) failed
: -d 3 Select device - ERROR: init_CL(3, 3) failed
: -d 4 Select device - ERROR: init_CL(3, 4) failed
: -d 00 gave uhd630
: -d 01 gave uhd630
: -d 02 Select device - ERROR: init_CL(3, 2) failed
: -d 03 Select device - ERROR: init_CL(3, 3) failed
: -d 04 Select device - ERROR: init_CL(3, 4) failed
: -d 10 gave uhd630
: -d 11 gave gtx1050ti and subsequent Error -5 (Out of resources): clEnqueueReadBuffer RES failed.
: -d 12 Select device - ERROR: init_CL(3, 2) failed
: -d 13 Select device - ERROR: init_CL(3, 3) failed
: -d 14 Select device - ERROR: init_CL(3, 4) failed
: -d 20 gave uhd630
: -d 21 gave uhd630
: -d 22 Select device - ERROR: init_CL(3, 2) failed
: -d 23 Select device - ERROR: init_CL(3, 3) failed
: -d 24 Select device - ERROR: init_CL(3, 4) failed
:
:Note: the mfakto.ini contents were left appropriate for the uhd630 throughout, so the gtx1050ti may have had reason to fail
:platform number is not matching lsgpu output
:cpu opencl not encountered
:note this is on Windows 10 Pro X64, i7-8750H with UHD 630 and GTX 1050Ti

: -d 11 was previously uhd630
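
To make the overloading in the sweep above easier to see, here is a small Python digest of those results. It only regroups the outcomes already listed in the batch comments; nothing is measured by the script, and the names `SPECS`, `outcome` and `group_by_outcome` are made up for this sketch.

```python
from collections import defaultdict
from typing import Dict, List

# The same specs the two batch loops try: 0..4, then platform digit 0..2 with device digit 0..4.
SPECS: List[str] = [str(b) for b in range(5)] + [f"{a}{b}" for a in range(3) for b in range(5)]

# Outcomes copied from the comments in mft.bat above.
GAVE_UHD630 = {"0", "1", "00", "01", "10", "20", "21"}
GAVE_GTX1050TI = {"11"}


def outcome(spec: str) -> str:
    """Return the recorded result for one -d spec."""
    if spec in GAVE_UHD630:
        return "uhd630"
    if spec in GAVE_GTX1050TI:
        return "gtx1050ti"
    return "init_CL failed"


def group_by_outcome(specs: List[str]) -> Dict[str, List[str]]:
    """Group the specs by the device (or error) they produced, keeping sweep order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for spec in specs:
        groups[outcome(spec)].append(spec)
    return dict(groups)


if __name__ == "__main__":
    for result, specs in group_by_outcome(SPECS).items():
        print(f"{result}: {' '.join(specs)}")
```

Grouped this way, seven different specs select the UHD 630, only one selects the GTX 1050 Ti, and every spec ending in 2, 3 or 4 fails in init_CL.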

Last fiddled with by kriesel on 2020-10-27 at 22:02
Old 2020-10-31, 16:59   #3424
rebirther
 
Sep 2011
Germany

2^3·113 Posts

Another error, this time with an RTX 3080 (NVIDIA GeForce RTX 3080 (4095MB) driver: 456.38 OpenCL: 1.2):

http://srbase.my-firewall.org/sr5/re...ultid=22792002

ERROR: cudaGetLastError() returned 48: no kernel image is available for execution on the device

self compiled by the user:

CUDA version info
binary compiled for CUDA 11.10
CUDA runtime version 11.10
CUDA driver version 11.10

with BOINC:

CUDA version info
binary compiled for CUDA 10.0
CUDA runtime version 10.0
CUDA driver version 11.10

Neither build works. Any help is much appreciated.
Old 2020-10-31, 17:30   #3425
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

3,187 Posts

Quote:
Originally Posted by rebirther View Post
Another error with a GTX3080
Any help is much appreciated.
You have access to a 3080?
Any and all benchmarks for mfaktc, cudalucas, and gpuowl are highly sought after.

Last fiddled with by James Heinrich on 2020-10-31 at 17:54
Old 2020-10-31, 17:55   #3426
Icecold
 
Oct 2020

2^2 Posts

Quote:
Originally Posted by James Heinrich View Post
You have access to a 3080?
Any and all benchmarks for mfaktc, cudalucas, and gpuowl are highly sought after.
I have a 3080 and am the user with the SRBase issue Rebirther posted about. I can run any benchmarks that need to be run if you can point me in the right direction on how to run them. I'm running Windows on the machine with the 3080.
Old 2020-10-31, 23:37   #3427
TheJudger
 
"Oliver"
Mar 2005
Germany

2·3·5·37 Posts

Finally got my hands on an RTX 3090:
Code:
mfaktc v0.22-pre8 (64bit built)
[...]
CUDA version info
  binary compiled for CUDA  11.10
  CUDA runtime version      11.10
  CUDA driver version       11.10

CUDA device info
  name                      GeForce RTX 3090
  compute capability        8.6
  max threads per block     1024
  max shared memory per MP  102400 byte
  number of multiprocessors 82
  clock rate (CUDA cores)   1755MHz
  memory clock rate:        9751MHz
  memory bus width:         384 bit
[...]
Starting trial factoring M66362159 from 2^74 to 2^75 (57.65 GHz-days)
 k_min =  142321062303420
 k_max =  284642124610180
Using GPU kernel "barrett76_mul32_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Nov 01 00:21 |    0   0.1% |  0.794  12m41s |   6535.09    82485    n.a.%
Nov 01 00:21 |    4   0.2% |  0.788  12m35s |   6584.85    82485    n.a.%
Nov 01 00:21 |    9   0.3% |  0.776  12m23s |   6686.67    82485    n.a.%
[...]
Nov 01 00:35 | 4617 100.0% |  0.849   0m00s |   6111.73    82485    n.a.%
no factor for M66362159 from 2^74 to 2^75 [mfaktc 0.22-pre8 barrett76_mul32_gs CUDA 11.10 arch 8.0] B29A657C
tf(): total time spent: 13m 28.241s
The GPU clock was 1650-1680 MHz once the GPU heated up; at startup it was a bit higher.
Power consumption during the run was about 340-345 W as reported by nvidia-smi.
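
As a rough cross-check of the figures above, the following Python sketch (not part of mfaktc) derives the average throughput from the 57.65 GHz-days of work and the reported total time, and divides it by the midpoint of the reported power range. The per-class comparison assumes that 960 of the 4620 classes are actually tested; that count does not appear in the log and is an assumption here.

```python
SECONDS_PER_DAY = 86_400


def ghz_days_per_day(work_ghz_days: float, seconds: float) -> float:
    """Throughput if `work_ghz_days` of work takes `seconds` of wall time."""
    return work_ghz_days * SECONDS_PER_DAY / seconds


# Values taken from the mfaktc output and the remarks above.
WORK_GHZ_DAYS = 57.65               # M66362159 from 2^74 to 2^75
TOTAL_SECONDS = 13 * 60 + 28.241    # "total time spent: 13m 28.241s"
FIRST_CLASS_SECONDS = 0.794         # time column of the first status line
POWER_WATTS = (340 + 345) / 2       # midpoint of the nvidia-smi range

# Assumption, not shown in the log: 960 classes are tested per assignment.
ASSUMED_TESTED_CLASSES = 960

average_rate = ghz_days_per_day(WORK_GHZ_DAYS, TOTAL_SECONDS)
first_class_rate = ghz_days_per_day(
    WORK_GHZ_DAYS, ASSUMED_TESTED_CLASSES * FIRST_CLASS_SECONDS
)
rate_per_watt = average_rate / POWER_WATTS

if __name__ == "__main__":
    print(f"average over the whole run: {average_rate:.1f} GHz-d/day")
    print(f"extrapolated from class 0:  {first_class_rate:.1f} GHz-d/day")
    print(f"efficiency:                 {rate_per_watt:.1f} GHz-d/day per watt")
```

The whole-run average comes out lower than the early status lines, which fits the clock dropping once the card warmed up.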

Oliver
Old 2020-11-01, 00:40   #3428
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

3,187 Posts

Quote:
Originally Posted by Icecold View Post
I have a 3080 and am the user with the SRBase issue Rebirther posted about. I can run any benchmarks that need to be run if you can point me in the right direction on how to run them. I'm running Windows on the machine with the 3080.
If you could, please run a benchmark with mfaktc filling in the form here:
https://www.mersenne.ca/mfaktc.php#benchmark

For both cudalucas and gpuowl, if you could start a primality test on exponent 57885161 and let it run for a few iterations until the iteration time is stable (typically anywhere from 30k-100k iterations) and then email me the output (james@mersenne.ca)
BTW: There will likely be a much more useful benchmark mode for gpuowl in a future version, but that's likely several months away yet, so for now that simple benchmark will suffice for comparison with other results.
Old 2020-11-01, 03:08   #3429
axn
 
Jun 2003

7×683 Posts

Quote:
Originally Posted by rebirther View Post
ERROR: cudaGetLastError() returned 48: no kernel image is available for execution on the device

self compiled by the user:

CUDA version info
binary compiled for CUDA 11.10
CUDA runtime version 11.10
CUDA driver version 11.10
The RTX 3000 series is CC 8.6. Can the user modify the makefile to include support for that? It is probably best for the user to self-compile, since I think only CUDA 11 supports this.

It is weird that this is not working, though. Normally, the driver would take any available PTX and generate code for the CC dynamically.
Old 2020-11-01, 05:57   #3430
Icecold
 
Oct 2020

4_16 Posts

Quote:
Originally Posted by axn View Post
The RTX 3000 series is CC 8.6. Can the user modify the makefile to include support for that? It is probably best for the user to self-compile, since I think only CUDA 11 supports this.

It is weird that this is not working, though. Normally, the driver would take any available PTX and generate code for the CC dynamically.
User here. Do you mean 86 as the compute code? I had to comment out the other CC versions and add 86 to get it to compile. This is what I had in my makefile to get it to compile (ignore the "CC 5.x GPUs will use this code" comment on the last line; I just left that in there when I copy/pasted and commented out the previous line):

# generate code for various compute capabilities
#NVCCFLAGS += --generate-code arch=compute_11,code=sm_11 # CC 1.1, 1.2 and 1.3 GPUs will use this code (1.0 is not possible for mfaktc)
#NVCCFLAGS += --generate-code arch=compute_20,code=sm_20 # CC 2.x GPUs will use this code, one code fits all!
#NVCCFLAGS += --generate-code arch=compute_30,code=sm_30 # all CC 3.x GPUs _COULD_ use this code
#NVCCFLAGS += --generate-code arch=compute_35,code=sm_35 # but CC 3.5 (3.2?) _CAN_ use funnel shift which is useful for mfaktc
#NVCCFLAGS += --generate-code arch=compute_50,code=sm_50 # CC 5.x GPUs will use this code
NVCCFLAGS += --generate-code arch=compute_86,code=sm_86 # CC 5.x GPUs will use this code
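
For reference, the pattern in those makefile lines can be generated mechanically. The Python sketch below is not part of mfaktc: `generate_code_flags` is a hypothetical helper that turns a compute capability such as "8.6" into the matching `NVCCFLAGS` line. The optional `with_ptx` switch adds a second line with `code=compute_XX`, which asks nvcc to embed PTX as well; whether that would help with the "no kernel image" error in this thread is an open question, not something confirmed here.

```python
from typing import List


def generate_code_flags(compute_capability: str, with_ptx: bool = False) -> List[str]:
    """Build makefile lines in the style used above for one compute capability, e.g. "8.6"."""
    major, minor = compute_capability.split(".")
    arch = f"{major}{minor}"  # "8.6" -> "86"
    lines = [
        f"NVCCFLAGS += --generate-code arch=compute_{arch},code=sm_{arch}"
    ]
    if with_ptx:
        # Also embed PTX for this architecture so the driver can JIT-compile it.
        lines.append(
            f"NVCCFLAGS += --generate-code arch=compute_{arch},code=compute_{arch}"
        )
    return lines


if __name__ == "__main__":
    for line in generate_code_flags("8.6", with_ptx=True):
        print(line)
```

Called with "8.6" and `with_ptx=False`, it reproduces the uncommented line in the makefile excerpt, minus the trailing comment.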

Last fiddled with by Icecold on 2020-11-01 at 05:58
Old 2020-11-01, 06:03   #3431
Icecold
 
Oct 2020

2^2 Posts

Quote:
Originally Posted by James Heinrich View Post
If you could, please run a benchmark with mfaktc filling in the form here:
https://www.mersenne.ca/mfaktc.php#benchmark

For both cudalucas and gpuowl, if you could start a primality test on exponent 57885161 and let it run for a few iterations until the iteration time is stable (typically anywhere from 30k-100k iterations) and then email me the output (james@mersenne.ca)
BTW: There will likely be a much more useful benchmark mode for gpuowl in a future version, but that's likely several months away yet, so for now that simple benchmark will suffice for comparison with other results.
Will do. I should be able to send over the cudalucas and gpuowl results in the next couple of days, and mfaktc as soon as I can get that working. If anybody thinks this issue is related to something on my PC rather than the new card not being compatible, I can try Linux tomorrow; just let me know. This is a fresh install of Windows 10, though. I appreciate the help so far, and hopefully I can provide benchmarks soon.
Old 2020-11-01, 10:29   #3432
axn
 
Jun 2003

7·683 Posts

Quote:
Originally Posted by Icecold View Post
User here - Do you mean 86 as the compute code? I had to comment out the other CC versions and add 86 to get it to compile. This is what I had in my makefile to get it to compile(ignore the "CC 5.x GPUs will use this code" I just left that in there when I copy/pasted and commented out the previous line)
Yes, that's what I meant. Did that solve the "no kernel image" issue? I'm assuming you're able to use it as an anonymous platform under BOINC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.