mersenneforum.org mfakto: an OpenCL program for Mersenne prefactoring

2018-01-21, 19:38   #1431
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

4071₁₀ Posts

NVIDIA GTX1070 OpenCL data as reported by GPU-Z

That certainly seems to explain the VectorSize=1 requirement given by Bdot for mfakto on NVIDIA.
Code:
General
Platform Name                    NVIDIA CUDA
Platform Vendor                  NVIDIA Corporation
Platform Profile                 FULL_PROFILE
Platform Version                 OpenCL 1.2 CUDA 8.0.0
Vendor                           NVIDIA Corporation
Device Name                      GeForce GTX 1070
Version                          OpenCL 1.2 CUDA
Driver Version                   378.66
C Version                        OpenCL C 1.2
Profile                          FULL_PROFILE
Global Memory Size               8192 MB
Clock Frequency                  1708 MHz
Compute Units                    15
Device Available                 Yes
Compiler Available               Yes
Linker Available                 Yes
Preferred Synchronization        Device
CMD Queue Properties             Out of Order, Profiling
SVM Capabilities                 Coarse
DP Capability                    Denorm, INF NAN, Round Nearest, Round Zero, Round INF, FMA
SP Capability                    Denorm, INF NAN, Round Nearest, Round Zero, Round INF, FMA
Half FP Capability               None
Address Bits                     64
Preferred On-Device Queue        256 KB
Global Memory Cache              240 KB (RW Cache)
Global Memory Cacheline          0 KB
Local Memory                     Local (48 KB)
Memory Alignment                 4096 bits
Built-in Kernels
Little Endian                    Yes
Error Correction                 No
Execution Capability             Kernel
Unified Memory                   No
Image Support                    Yes

Limits
Max Device Events                2048
Max Device Queues                4
Max On-Device Queue              256 KB
Max Memory Allocation            2048 MB
Max Constant Buffer              64 KB
Max Constant Args                9
Max Read Image Args              256
Max Write Image Args             16
Max Samplers                     32
Max Work Item Dims               3
Max Write Image Args             16

Native Vectors
Native Vector Width (CHAR)       1
Native Vector Width (SHORT)      1
Native Vector Width (INT)        1
Native Vector Width (LONG)       1
Native Vector Width (FLOAT)      1
Native Vector Width (DOUBLE)     1
Native Vector Width (HALF)       N/A
Preferred Vector Width (CHAR)    1
Preferred Vector Width (SHORT)   1
Preferred Vector Width (INT)     1
Preferred Vector Width (LONG)    1
Preferred Vector Width (FLOAT)   1
Preferred Vector Width (DOUBLE)  1
Preferred Vector Width (HALF)    N/A

Extensions
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_fp64
cl_khr_byte_addressable_store
cl_khr_icd
cl_khr_gl_sharing
cl_nv_compiler_options
cl_nv_device_attribute_query
cl_nv_pragma_unroll
cl_nv_d3d9_sharing
cl_nv_d3d10_sharing
cl_khr_d3d10_sharing
cl_nv_d3d11_sharing
cl_nv_copy_opts

2018-01-21, 20:54   #1432
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

111111100111₂ Posts

Quote:
 Originally Posted by xx005fs Don't really think DP performance matters that much for GpuOwL, because for my gpu overclocking the HBM memory makes it a lot faster and more efficient. Not 100% sure why tho.
Interesting. I interpret the GpuOwL author's post regarding new features in v1.5 to say that DP performance is very important for GpuOwL; sounds to me like it's the best one of the four transforms implemented. That's both sufficiently fast and provides sufficient bits of precision to be worth using at 4M length and above: http://www.mersenneforum.org/showpos...&postcount=224

2018-01-22, 01:40   #1433
xx005fs

"Eric"
Jan 2018
USA

7×29 Posts

Quote:
 Originally Posted by kriesel Interesting. I interpret the GpuOwL author's post regarding new features in v1.5 to say that DP performance is very important for GpuOwL; sounds to me like it's the best one of the four transforms implemented. That's both sufficiently fast and provides sufficient bits of precision to be worth using at 4M length and above: http://www.mersenneforum.org/showpos...&postcount=224
Double precision is definitely important, but memory speed (at least for a Vega card) is just as important. Increasing the core clock from 1400 to 1700MHz reduced the iteration time from 3.6ms/it to 3.2ms/it with the HBM at 1190MHz; however, running the HBM at 800MHz with the core still at 1700MHz increased it to 4ms/it. So they are equally important, I guess.
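
As a quick arithmetic check on the timings quoted above, here is a minimal Python sketch that turns the three ms/it figures into percentage changes. The dictionary layout and the helper name `pct_change` are illustrative assumptions, not anything taken from GpuOwL; only the clock and timing numbers come from the post.

```python
# Timings quoted in the post: (core MHz, HBM MHz) -> ms per iteration.
# The data layout is an assumption made for this illustration only.
timings = {
    (1400, 1190): 3.6,
    (1700, 1190): 3.2,
    (1700, 800): 4.0,
}


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Effect of raising the core clock with the HBM held at 1190 MHz.
core_clock_change = pct_change(1400, 1700)
core_time_change = pct_change(timings[(1400, 1190)], timings[(1700, 1190)])

# Effect of lowering the HBM clock with the core held at 1700 MHz.
mem_clock_change = pct_change(1190, 800)
mem_time_change = pct_change(timings[(1700, 1190)], timings[(1700, 800)])

print(f"core clock {core_clock_change:+.1f}% -> iteration time {core_time_change:+.1f}%")
print(f"HBM clock  {mem_clock_change:+.1f}% -> iteration time {mem_time_change:+.1f}%")
```

On these three data points, a core clock increase of about 21% shortens the iteration time by about 11%, while an HBM clock decrease of about 33% lengthens it by 25%. That is consistent with the observation that memory speed matters at least as much as core speed on this card, though three measurements are too few to say more.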

2018-02-28, 19:50   #1434
SELROC

2·7·19·37 Posts

mfakto compilation on debian

Hello

I am trying to compile mfakto on Debian stretch. It produces a large number of errors. I can post the compiler trace if necessary, but I would like to know if you have any first-time suggestions.

SELROC

2018-03-05, 21:45   #1435
ixfd64
Bemusing Prompter

"Danny"
Dec 2002
California

2·17·67 Posts

Has anyone else tried using WINE to run mfakto on macOS?

I'm getting the following error:

Quote:
 Compiling kernels.
 Error 002a:fixme:msvcp:_Locinfo__Locinfo_ctor_cat_cstr (0x22f2a8 1 C) semi-stub
 002a:fixme:msvcp:_Locinfo__Locinfo_ctor_cat_cstr (0x22eeb8 1 C) semi-stub
 -43 (Invalid build options): clBuildProgram
 ERROR: load_kernels(0) failed

2018-03-05, 23:22   #1436
henryzz
Just call me Henry

"David"
Sep 2007
Cambridge (GMT)

13×19×23 Posts

Does opencl ever work in WINE?

2018-04-27, 03:24   #1437
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3×23×59 Posts

mfakto downshift upon finding a factor

Anyone have an idea why mfakto dropped in indicated throughput immediately upon finding a factor? It's 3 for 3, on a new RX550, mfakto 0.15pre6, 64bit on Win7, that passed the full selftest. 183 GHz-d/day before, 89 after. The drop is persistent, continuing after several hours, and more than a 2:1 ratio. The ETA seems not affected, so maybe it's only a cosmetic effect. (Exponent, factor and bits were changed in the first, most recent example below, which is not yet submitted since the bit level hasn't completed yet.)
Code:
Apr 26 12:34 | 1785 38.8% | 512.54 3d11h | 183.47 38299 0.00%
Apr 26 12:42 | 1792 38.9% | 513.40 3d11h | 183.16 38299 0.00%
M1234567 has a factor: 123456789012134567 (72.843305 bits, 504.966873 GHz-d)
Apr 26 12:51 | 1801 39.0% | 511.66 3d11h | 88.82 38299 0.00%
Apr 26 12:59 | 1809 39.1% | 513.38 3d11h | 88.52 38299 0.00%
Also, it seems to clear up with either completion of a worktodo line or a restart, or perhaps a bitlevel completion. Ctrl-c and restart cleared it up, with the stop/start and short selftest costing about 15 minutes of throughput.
Code:
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Apr 26 22:28 | 2116 45.9% | 513.02 3d01h | 183.29 38299 0.00%
I do not recall seeing such an effect in mfaktc, which I've run much more than mfakto. A search through a 60MB sample of mfaktc screen output logged to file shows nothing like that.

Here's another mfakto example, with a more than 15:1 ratio indicated:
Code:
Apr 13 21:01 | 264 5.8% | 456.00 4d18h | 212.08 10045 0.00%
Apr 13 21:08 | 267 5.9% | 453.66 4d17h | 213.18 10045 0.00%
Apr 13 21:16 | 271 6.0% | 453.81 4d17h | 213.11 10045 0.00%
M111269 has a factor: 617778664352573195639 (69.065652 bits, 70.546139 GHz-d)
Apr 13 21:24 | 276 6.1% | 457.79 4d18h | 13.87 10045 0.00%
Apr 13 21:31 | 280 6.3% | 459.59 4d18h | 13.81 10045 0.00%
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Apr 13 21:39 | 291 6.4% | 459.27 4d18h | 13.82 10045 0.00%
Another example, which persisted to completion of this exponent's bit level, clearing up when going on to the next worktodo entry.
Code:
Apr 13 11:11 | 3555 77.2% | 42.900 2h36m | 110.71 63018 0.00%
Apr 13 11:12 | 3564 77.3% | 44.406 2h41m | 106.96 63018 0.00%
Apr 13 11:13 | 3567 77.4% | 43.091 2h35m | 110.22 63018 0.00%
M290001377 has a factor: 96303240212210144213599 (76.350002 bits, 18.470592 GHz-d)
Apr 13 11:14 | 3568 77.5% | 44.628 2h40m | 37.25 63018 0.00%
Apr 13 11:14 | 3579 77.6% | 44.089 2h37m | 37.70 63018 0.00%
Apr 13 11:15 | 3583 77.7% | 43.331 2h34m | 38.36 63018 0.00%

Last fiddled with by kriesel on 2018-04-27 at 03:34

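
The ratios mentioned in the post can be reproduced from the log lines with a few lines of Python. This is only a sketch for checking the arithmetic: the labels, the `examples` dictionary and the helper names are made up for illustration and have nothing to do with how mfakto computes its GHz-d/day column. The last helper recomputes the "bits" figure of a reported factor as log2 of the factor, which is an assumption about what that figure means, although it agrees with the M111269 line.

```python
import math

# GHz-d/day from the last log line before and the first log line after
# each "has a factor" message in the post. Labels are illustrative only.
examples = {
    "Apr 26 (RX550)": (183.16, 88.82),
    "Apr 13 21:24": (213.11, 13.87),
    "Apr 13 11:14": (110.22, 37.25),
}


def drop_ratio(before, after):
    """How many times higher the indicated throughput was before the factor."""
    return before / after


def factor_bits(q):
    """Size of a factor in bits, taken here to be log2(q) (an assumption)."""
    return math.log2(q)


for label, (before, after) in examples.items():
    print(f"{label}: {drop_ratio(before, after):.2f}:1")

# Factor reported for M111269 in the second example.
print(f"{factor_bits(617778664352573195639):.6f} bits")
```

In all three cases the time per class and the ETA columns stay essentially unchanged across the factor report, which fits the suspicion in the post that only the displayed GHz-d/day figure is affected.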
2018-05-31, 04:27   #1438
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3×23×59 Posts

Reference material

I was offered "a blog area to consolidate all of your pdfs and guides and stuff" and accepted. Feel free to have a look and suggest content. (G-rated only;)

General interest gpu related reference material http://www.mersenneforum.org/showthread.php?t=23371
Mfakto OpenCL based factoring on gpus http://www.mersenneforum.org/showthread.php?t=23394

Future updates to material previously posted in this thread will probably occur on the blog threads and not here. Having in-place update without a time limit makes it more manageable there.

2018-06-23, 23:04   #1439
James Heinrich

"James Heinrich"
May 2004
ex-Northern Ontario

101101000110₂ Posts

Just as a matter of curiosity: Both mfakto and mfaktc have a limit of exponent < 2^32. That's an obvious limit point, but how arbitrary or absolute is that limit? In the (probably distant) future, how easy or hard is it to extend the capabilities of mfakto to higher exponents?

2018-06-24, 21:52   #1440
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

FE7₁₆ Posts

Quote:
 Originally Posted by James Heinrich Just as a matter of curiosity: Both mfakto and mfaktc have a limit of exponent < 2^32. That's an obvious limit point, but how arbitrary or absolute is that limit? In the (probably distant) future, how easy or hard is it to extend the capabilities of mfakto to higher exponents?
I looked only briefly, and in the main routines it seemed not too bad. But a small sample of CUDA interface code shows various u32 instructions. So probably it has to be gone through from one end to the other by someone who knows what they're doing, routine by routine, kernel by kernel. I nominate not-me.
Looking at prime95's p-1 code for other reasons, I noticed code for handling bounds values bigger than 2^32, which you'll find in ecm.c (containing both ecm and p-1 code).
There's certainly plenty of gpu trial factoring to do within the mersenne.org 10^9 exponent cap, let alone up to 2^32-5, which is more than 4.2 times higher.

Last fiddled with by kriesel on 2018-06-24 at 22:11
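
To make the 32-bit limit discussed above concrete, here is a small Python sketch. It is not taken from mfakto or mfaktc; the constant and function names are hypothetical. It checks whether an exponent fits in an unsigned 32-bit integer, compares the 2^32-5 figure from the post with the 10^9 cap, and uses the standard facts that any factor q of 2^p - 1 (p an odd prime) has the form q = 2kp + 1 and satisfies 2^p mod q = 1.

```python
U32_MAX = 2**32 - 1          # largest value an unsigned 32-bit integer can hold
MERSENNE_ORG_CAP = 10**9     # exponent cap mentioned in the post
TOP_EXPONENT = 2**32 - 5     # figure quoted in the post


def fits_u32(n):
    """True if n can be stored in an unsigned 32-bit integer."""
    return 0 <= n <= U32_MAX


def candidate_factor(p, k):
    """k-th candidate factor of 2^p - 1, using the form q = 2*k*p + 1."""
    return 2 * k * p + 1


def divides_mersenne(p, q):
    """True if q divides 2^p - 1, checked by modular exponentiation."""
    return pow(2, p, q) == 1


print(fits_u32(TOP_EXPONENT))           # exponent still fits in 32 bits
print(fits_u32(2**32 + 15))             # anything at or above 2^32 does not
print(TOP_EXPONENT / MERSENNE_ORG_CAP)  # how far the 32-bit range extends past 10^9

# Small worked case: 2^11 - 1 = 2047 = 23 * 89, with 23 = 2*1*11 + 1 and 89 = 2*4*11 + 1.
for k in (1, 4):
    q = candidate_factor(11, k)
    print(q, divides_mersenne(11, q))
```

Python integers are arbitrary precision, so nothing overflows here. In C or GPU kernel code every variable and instruction that holds the exponent would need widening beyond 32 bits, which is the routine-by-routine review described in the post.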

2018-06-24, 22:05   #1441
James Heinrich

"James Heinrich"
May 2004
ex-Northern Ontario

2×3×13×37 Posts

Quote:
 Originally Posted by kriesel I looked only briefly, and in the main routines it seemed not too bad. But a small sample of CUDA interface code shows various u32 instructions. So probably it has to be gone through from one end to the other by someone who knows what they're doing, routine by routine, kernel by kernel. I nominate not-me.
Thanks for looking into it. That's kind of what I suspected. I also assume that rewritten code for larger exponents has the potential to be at least slightly slower.

Fortunately there are still a couple hundred million exponents below 2^32 that need some more TF'ing first, so it's not really a high-priority problem.
