mersenneforum.org mfaktc: a CUDA program for Mersenne prefactoring

2020-10-27, 17:52   #3422
rebirther

Sep 2011
Germany

2·1,489 Posts

Quote:
 Originally Posted by kriesel Examples? The developers have, or the end users have? How do you know it is not working? The correct mappings work locally, without BOINC involved. Run times for gpuowl tasks are typically long. In the OpenCL mapping the platform number is zero-based, but the device number on a platform is not, apparently. -d 01 is the first device on the first platform in lsgpu and mfakto. (Maybe some of this thread should be moved to a BOINC thread.)
The end users; you can find some info here

2020-10-27, 22:01   #3423
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·3·5·193 Posts

BOINC use of mfaktc on multiple gpus should be straightforward.
To use multiple GPUs in mfakto, the BOINC client needs to interrogate the system, enumerate the OpenCL devices (as the lsgpu utility and gpuowl do), and build a translation table between its 0-based device numbers and the OpenCL combination of platform number and device number on that platform.

The correct form for first platform first device is 01, but 0, 00, 01, 1 all work to use the same device in mfakto.

Platform numbering is zero based but OpenCL device numbering is not, from what I've seen.

I've also seen inconsistent platform numbering between lsgpu and mfakto on a multiplatform Windows 10 test system, but consistent numbering on a single-platform Windows 7 test system.

Nvidia-smi numbering does not match OpenCL numbering or order on the NVIDIA platform on a Win7 test system, but the CUDA device number order matches the OpenCL device number order there. In general, numbering from one context to another is messy.
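
As a rough illustration of the translation table described above, here is a minimal Python sketch. It assumes the convention observed in this post (platform digit 0-based, device digit 1-based, two-digit -d spec) and takes the platform/device list as given input; build_device_table is a hypothetical helper, not part of BOINC, lsgpu, or mfakto. Since platform order was seen to differ between lsgpu and mfakto, a table built this way would still need checking on each machine.

```python
from typing import Dict, List


def build_device_table(platforms: List[List[str]]) -> Dict[int, str]:
    """Map a flat 0-based index (BOINC-style) to a two-digit '-d' spec.

    Assumption taken from the post: the platform digit is 0-based and the
    device digit is 1-based. Only single-digit platform and device numbers
    are handled.
    """
    table: Dict[int, str] = {}
    flat_index = 0
    for platform_no, devices in enumerate(platforms):
        for device_no, _name in enumerate(devices, start=1):
            table[flat_index] = f"{platform_no}{device_no}"
            flat_index += 1
    return table


# Platform/device layout copied from the lsgpu listing in this post.
platforms = [
    ["GeForce GTX 1050 Ti"],
    ["Intel(R) UHD Graphics 630", "Intel(R) Core(TM) i7-8750H CPU"],
]
table = build_device_table(platforms)
for flat, spec in table.items():
    print(f"flat device {flat} -> -d {spec}")
```

With the lsgpu order, flat device 0 would map to -d 01. The sketch only shows the bookkeeping; it does not settle which numbering a given program actually uses.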
Quote:
 Message 6225 - Posted: 15 Apr 2020, 18:53:48 UTC - in response to Message 6223. Theoretically both applications can support more than one (different) GPU. But BOINC enumerates the GPUs as 0, 1, 2, .... In OpenCL you have platforms, e.g. Intel=0, AMD=1, NVidia=2, and for each platform 1..n GPU devices. The mapping from 0, 1, 2 to 00, 10, 11 is different for each computer with more than one graphics device. So there is currently only one mapping: --device 0 to d 001. (possible)
Code:
lsgpu, derived/modified from https://gist.github.com/CptFoobar/bcb513d87e574e69c2db
2 Platforms found.

Platform 0
1 Device: GeForce GTX 1050 Ti
1.1 Vendor: NVIDIA Corporation
1.2 Type: CL_DEVICE_TYPE_GPU
1.3 Hardware version: OpenCL 1.2 CUDA
1.4 Software version: 451.67
1.5 OpenCL version: OpenCL C 1.2
1.6 Little Endian: Yes
1.7 Max Clock frequency: 1620 MHz
1.8 Image support available: Yes
1.9 Parallel compute units: 6
1.10 OpenCL Device Availability: Yes
1.11 OpenCL Compiler Availability: Yes

Platform 1
1 Device: Intel(R) UHD Graphics 630
1.1 Vendor: Intel(R) Corporation
1.2 Type: CL_DEVICE_TYPE_GPU
1.3 Hardware version: OpenCL 2.1 NEO
1.4 Software version: 23.20.16.4973
1.5 OpenCL version: OpenCL C 2.1
1.6 Little Endian: Yes
1.7 Max Clock frequency: 1100 MHz
1.8 Image support available: Yes
1.9 Parallel compute units: 24
1.10 OpenCL Device Availability: Yes
1.11 OpenCL Compiler Availability: Yes

2 Device: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
2.1 Vendor: Intel(R) Corporation
2.2 Type: CL_DEVICE_TYPE_CPU
2.3 Hardware version: OpenCL 2.1 (Build 611)
2.4 Software version: 7.6.0.611
2.5 OpenCL version: OpenCL C 2.0
2.6 Little Endian: Yes
2.7 Max Clock frequency: 2200 MHz
2.8 Image support available: Yes
2.9 Parallel compute units: 12
2.10 OpenCL Device Availability: Yes
2.11 OpenCL Compiler Availability: Yes

End.
Code:
for %%a in ( 0 1 2 3 4 ) do call mft %%a
for %%a in ( 0 1 2 ) do for %%b in ( 0 1 2 3 4 ) do call mft %%a%%b
Code:
:mft.bat
set dev=%1
echo device spec -d %dev% for following run >>mfakto-x.txt
mfakto -d %dev% >>mfakto-x.txt

:making no assumptions about app behavior, try for 2 platforms 3 total devices, 0-2 for platform, 0-4 for device on a given platform
: -d 0 gave uhd630
: -d 1 gave uhd630
: -d 2 Select device - ERROR: init_CL(3, 2) failed
: -d 3 Select device - ERROR: init_CL(3, 3) failed
: -d 4 Select device - ERROR: init_CL(3, 4) failed
: -d 00 gave uhd630
: -d 01 gave uhd630
: -d 02 Select device - ERROR: init_CL(3, 2) failed
: -d 03 Select device - ERROR: init_CL(3, 3) failed
: -d 04 Select device - ERROR: init_CL(3, 4) failed
: -d 10 gave uhd630
: -d 11 gave gtx1050ti and subsequent Error -5 (Out of resources): clEnqueueReadBuffer RES failed.
: -d 12 Select device - ERROR: init_CL(3, 2) failed
: -d 13 Select device - ERROR: init_CL(3, 3) failed
: -d 14 Select device - ERROR: init_CL(3, 4) failed
: -d 20 gave uhd630
: -d 21 gave uhd630
: -d 22 Select device - ERROR: init_CL(3, 2) failed
: -d 23 Select device - ERROR: init_CL(3, 3) failed
: -d 24 Select device - ERROR: init_CL(3, 4) failed
:
:Note: the mfakto.ini contents were left appropriate for the uhd630 throughout, so the gtx1050ti may have had reason to fail
:platform number is not matching lsgpu output
:cpu opencl not encountered
:note this is on Windows 10 Pro X64, i7-8750H with UHD 630 and GTX 1050Ti

: -d 11 was previously uhd630
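
The same sweep could be driven from a single script instead of two batch files. The sketch below is only an illustration under assumptions: candidate_specs and run_spec are hypothetical helper names, the mfakto executable is assumed to be on the PATH, and it is assumed to exit on its own as it did in the batch run. Only the -d option comes from the test above.

```python
import subprocess
from itertools import product
from typing import List


def candidate_specs(max_platform: int = 2, max_device: int = 4) -> List[str]:
    """Return the -d values tried above: 0..4, then 00..24."""
    single = [str(d) for d in range(max_device + 1)]
    double = [
        f"{p}{d}"
        for p, d in product(range(max_platform + 1), range(max_device + 1))
    ]
    return single + double


def run_spec(spec: str, exe: str = "mfakto") -> str:
    """Run one attempt and return its combined stdout and stderr text."""
    result = subprocess.run([exe, "-d", spec], capture_output=True, text=True)
    return result.stdout + result.stderr


specs = candidate_specs()
print(len(specs), "device specs to try:", " ".join(specs))
```

Calling run_spec(spec) for each entry and appending the text to a log file would reproduce what mft.bat does with mfakto-x.txt.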

Last fiddled with by kriesel on 2020-10-27 at 22:02

2020-10-31, 16:59   #3424
rebirther

Sep 2011
Germany

BA2₁₆ Posts

Another error with a GTX3080 (NVIDIA GeForce RTX 3080 (4095MB) driver: 456.38 OpenCL: 1.2):
http://srbase.my-firewall.org/sr5/re...ultid=22792002

ERROR: cudaGetLastError() returned 48: no kernel image is available for execution on the device

self compiled by the user:
CUDA version info
binary compiled for CUDA 11.10
CUDA runtime version 11.10
CUDA driver version 11.10

with BOINC:
CUDA version info
binary compiled for CUDA 10.0
CUDA runtime version 10.0
CUDA driver version 11.10

Both are not working, Any help is much appreciated.

2020-10-31, 17:30   #3425
James Heinrich

"James Heinrich"
May 2004
ex-Northern Ontario

31·113 Posts

Quote:
 Originally Posted by rebirther Another error with a GTX3080 Any help is much appreciated.
Any and all benchmarks for mfaktc, cudalucas, and gpuowl are highly sought after.

Last fiddled with by James Heinrich on 2020-10-31 at 17:54

2020-10-31, 17:55   #3426
Icecold

Oct 2020

100₂ Posts

Quote:
 Originally Posted by James Heinrich You have access to a 3080? Any and all benchmarks for mfaktc, cudalucas, and gpuowl are highly sought after.
I have a 3080 and am the user with the SRBase issue Rebirther posted about. I can run any benchmarks that need to be run if you can point me in the right direction on how to run them. I'm running Windows on the machine with the 3080.

2020-10-31, 23:37   #3427
TheJudger

"Oliver"
Mar 2005
Germany

11×101 Posts

Finally got in touch with a RTX 3090:
Code:
mfaktc v0.22-pre8 (64bit built)
[...]
CUDA version info
  binary compiled for CUDA  11.10
  CUDA runtime version      11.10
  CUDA driver version       11.10

CUDA device info
  name                      GeForce RTX 3090
  compute capability        8.6
  max threads per block     1024
  max shared memory per MP  102400 byte
  number of multiprocessors 82
  clock rate (CUDA cores)   1755MHz
  memory clock rate:        9751MHz
  memory bus width:         384 bit
[...]
Starting trial factoring M66362159 from 2^74 to 2^75 (57.65 GHz-days)
 k_min =  142321062303420
 k_max =  284642124610180
Using GPU kernel "barrett76_mul32_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Nov 01 00:21 |    0   0.1% |  0.794  12m41s |   6535.09    82485    n.a.%
Nov 01 00:21 |    4   0.2% |  0.788  12m35s |   6584.85    82485    n.a.%
Nov 01 00:21 |    9   0.3% |  0.776  12m23s |   6686.67    82485    n.a.%
[...]
Nov 01 00:35 | 4617 100.0% |  0.849   0m00s |   6111.73    82485    n.a.%
no factor for M66362159 from 2^74 to 2^75 [mfaktc 0.22-pre8 barrett76_mul32_gs CUDA 11.10 arch 8.0]
B29A657C
tf(): total time spent: 13m 28.241s

GPU clock was 1650-1680 MHz once the GPU heated up, at startup it was a bit higher. Power consumption during the run was about 340-345 Watt as reported by nvidia-smi.

Oliver

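As a sanity check on the figures in this run, the whole-run throughput can be worked out from the two totals mfaktc prints: 57.65 GHz-days of work and 13m 28.241s of wall time. This is a small Python sketch of that arithmetic only; ghz_days_per_day is a hypothetical helper name, and the result is an average over the full run, so it includes any startup overhead and need not equal the per-class values.

```python
SECONDS_PER_DAY = 86_400


def ghz_days_per_day(work_ghz_days: float, elapsed_seconds: float) -> float:
    """Average throughput: work done, scaled from the elapsed time to one day."""
    return work_ghz_days * SECONDS_PER_DAY / elapsed_seconds


# Totals taken from the mfaktc output above.
elapsed = 13 * 60 + 28.241  # 13m 28.241s in seconds
rate = ghz_days_per_day(57.65, elapsed)
print(f"average over the run: {rate:.1f} GHz-d/day")
```

That comes out at roughly 6160 GHz-d/day, which sits between the 6111.73 reported for the last class and the 6686.67 reported near the start, when the GPU clock was still higher.
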
2020-11-01, 00:40   #3428
James Heinrich

"James Heinrich"
May 2004
ex-Northern Ontario

110110101111₂ Posts

Quote:
 Originally Posted by Icecold I have a 3080 and am the user with the SRBase issue Rebirther posted about. I can run any benchmarks that need to be run if you can point me in the right direction on how to run them. I'm running Windows on the machine with the 3080.
If you could, please run a benchmark with mfaktc filling in the form here:
https://www.mersenne.ca/mfaktc.php#benchmark

For both cudalucas and gpuowl, if you could start a primality test on exponent 57885161 and let it run for a few iterations until the iteration time is stable (typically anywhere from 30k-100k iterations) and then email me the output (james@mersenne.ca)
BTW: There will likely be a much more useful benchmark mode for gpuowl in a future version, but that's likely several months away yet, so for now that simple benchmark will suffice for comparison with other results.

2020-11-01, 03:08   #3429
axn

Jun 2003

19×271 Posts

Quote:
 Originally Posted by rebirther ERROR: cudaGetLastError() returned 48: no kernel image is available for execution on the device self compiled by the user: CUDA version info binary compiled for CUDA 11.10 CUDA runtime version 11.10 CUDA driver version 11.10
RTX 3000 series is CC 8.6. Can the user modify the makefile to include support for that? Probably best for the user to self compile, since I think only CUDA 11 supports this.

It is weird that this is not working, though. Normally, the driver would take any available PTX and generate code for the CC dynamically.
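
One possible explanation, not confirmed in this thread: an nvcc flag of the form arch=compute_XX,code=sm_XX embeds only machine code for that exact target, while PTX that the driver can compile at run time is embedded with code=compute_XX. If the binary carries no PTX, there is nothing for the driver to translate for a newer card. The Python sketch below just assembles such flag strings; gencode_flags is a hypothetical helper, and the compute capability list is an example. Building for compute_86 would also need a toolkit that knows that target (CUDA 11.1 or later, as far as I know).

```python
from typing import Iterable, List


def gencode_flags(capabilities: Iterable[int], ptx_for_newest: bool = True) -> List[str]:
    """Build nvcc --generate-code flags for the given compute capabilities.

    Each capability gets machine code (code=sm_XX). Optionally the newest
    one also gets PTX (code=compute_XX) so the driver can JIT-compile for
    GPUs newer than anything in the list.
    """
    caps = sorted(set(capabilities))
    flags = [f"--generate-code arch=compute_{cc},code=sm_{cc}" for cc in caps]
    if ptx_for_newest and caps:
        newest = caps[-1]
        flags.append(f"--generate-code arch=compute_{newest},code=compute_{newest}")
    return flags


flags = gencode_flags([50, 86])
for flag in flags:
    print("NVCCFLAGS +=", flag)
```

The printed lines follow the NVCCFLAGS += style used in mfaktc's makefile and would be pasted in by hand; whether the extra PTX entry changes the outcome here is something only a test on the card can show.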

2020-11-01, 05:57   #3430
Icecold

Oct 2020

2² Posts

Quote:
 Originally Posted by axn RTX 3000 series is CC 8.6. Can the user modify the makefile to include support for that? Probably best for the user to self compile, since I think only CUDA 11 supports this. It is weird that this is not working, though. Normally, the driver would take any available PTX and generate code for the CC dynamically.
User here - Do you mean 86 as the compute code? I had to comment out the other CC versions and add 86 to get it to compile. This is what I had in my makefile to get it to compile (ignore the "CC 5.x GPUs will use this code" comment; I just left that in there when I copy/pasted and commented out the previous line):

# generate code for various compute capabilities
#NVCCFLAGS += --generate-code arch=compute_11,code=sm_11 # CC 1.1, 1.2 and 1.3 GPUs will use this code (1.0 is not possible for mfaktc)
#NVCCFLAGS += --generate-code arch=compute_20,code=sm_20 # CC 2.x GPUs will use this code, one code fits all!
#NVCCFLAGS += --generate-code arch=compute_30,code=sm_30 # all CC 3.x GPUs _COULD_ use this code
#NVCCFLAGS += --generate-code arch=compute_35,code=sm_35 # but CC 3.5 (3.2?) _CAN_ use funnel shift which is useful for mfaktc
#NVCCFLAGS += --generate-code arch=compute_50,code=sm_50 # CC 5.x GPUs will use this code
NVCCFLAGS += --generate-code arch=compute_86,code=sm_86 # CC 5.x GPUs will use this code

Last fiddled with by Icecold on 2020-11-01 at 05:58

2020-11-01, 06:03   #3431
Icecold

Oct 2020

2² Posts

Quote:
 Originally Posted by James Heinrich If you could, please run a benchmark with mfaktc filling in the form here: https://www.mersenne.ca/mfaktc.php#benchmark For both cudalucas and gpuowl, if you could start a primality test on exponent 57885161 and let it run for a few iterations until the iteration time is stable (typically anywhere from 30k-100k iterations) and then email me the output (james@mersenne.ca) BTW: There will likely be a much more useful benchmark mode for gpuowl in a future version, but that's likely several months away yet, so for now that simple benchmark will suffice for comparison with other results.
Will do. I should be able to send over cudalucas and gpuowl in the next couple of days, and mfaktc as soon as I can get that working. If anybody thinks this issue is related to something on my PC rather than the new card not being compatible, I can try Linux tomorrow; just let me know. This is a fresh install of Windows 10, though. I appreciate the help so far, and hopefully can provide benchmarks soon.

2020-11-01, 10:29   #3432
axn

Jun 2003

19·271 Posts

Quote:
 Originally Posted by Icecold User here - Do you mean 86 as the compute code? I had to comment out the other CC versions and add 86 to get it to compile. This is what I had in my makefile to get it to compile (ignore the "CC 5.x GPUs will use this code" comment; I just left that in there when I copy/pasted and commented out the previous line)
Yes, that's what I meant. Did that solve the "no kernel image" issue? I'm assuming you're able to use it as an anonymous platform under BOINC.
