mersenneforum.org  

Old 2018-01-21, 19:38   #1431
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

4071₁₀ Posts
NVIDIA GTX1070 OpenCL data as reported by GPU-Z

That certainly seems to explain the VectorSize=1 requirement given by Bdot for mfakto on NVIDIA.
Code:
General
Platform Name    NVIDIA CUDA
Platform Vendor    NVIDIA Corporation
Platform Profile    FULL_PROFILE
Platform Version    OpenCL 1.2 CUDA 8.0.0
Vendor    NVIDIA Corporation
Device Name    GeForce GTX 1070
Version    OpenCL 1.2 CUDA
Driver Version    378.66
C Version    OpenCL C 1.2 
Profile    FULL_PROFILE
Global Memory Size    8192 MB
Clock Frequency    1708 MHz
Compute Units    15
Device Available    Yes
Compiler Available    Yes
Linker Available    Yes
Preferred Synchronization    Device
CMD Queue Properties    Out of Order, Profiling
SVM Capabilities    Coarse
DP Capability    Denorm, INF NAN, Round Nearest, Round Zero, Round INF, FMA
SP Capability    Denorm, INF NAN, Round Nearest, Round Zero, Round INF, FMA
Half FP Capability    None
Address Bits    64
Preferred On-Device Queue    256 KB
Global Memory Cache    240 KB (RW Cache)
Global Memory Cacheline    0 KB
Local Memory    Local (48 KB)
Memory Alignment    4096 bits
Built-in Kernels    
Little Endian    Yes
Error Correction    No
Execution Capability    Kernel
Unified Memory    No
Image Support    Yes

Limits
Max Device Events    2048
Max Device Queues    4
Max On-Device Queue    256 KB
Max Memory Allocation    2048 MB
Max Constant Buffer    64 KB
Max Constant Args    9
Max Read Image Args    256
Max Write Image Args    16
Max Samplers    32
Max Work Item Dims    3
Max Write Image Args    16

Native Vectors
Native Vector Width (CHAR)    1
Native Vector Width (SHORT)    1
Native Vector Width (INT)    1
Native Vector Width (LONG)    1
Native Vector Width (FLOAT)    1
Native Vector Width (DOUBLE)    1
Native Vector Width (HALF)    N/A
Preferred Vector Width (CHAR)    1
Preferred Vector Width (SHORT)    1
Preferred Vector Width (INT)    1
Preferred Vector Width (LONG)    1
Preferred Vector Width (FLOAT)    1
Preferred Vector Width (DOUBLE)    1
Preferred Vector Width (HALF)    N/A

Extensions
cl_khr_global_int32_base_atomics 
cl_khr_global_int32_extended_atomics 
cl_khr_local_int32_base_atomics 
cl_khr_local_int32_extended_atomics 
cl_khr_fp64 
cl_khr_byte_addressable_store 
cl_khr_icd 
cl_khr_gl_sharing 
cl_nv_compiler_options 
cl_nv_device_attribute_query 
cl_nv_pragma_unroll 
cl_nv_d3d9_sharing 
cl_nv_d3d10_sharing 
cl_khr_d3d10_sharing 
cl_nv_d3d11_sharing 
cl_nv_copy_opts
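For anyone who wants to check the vector widths in a saved GPU-Z report without reading it by eye, here is a minimal Python sketch. It only parses the text layout shown above. The helper names (`vector_widths`, `suggested_vector_size`) are made up for illustration, and treating the preferred INT width as the value to use for mfakto's VectorSize is my assumption, not something taken from the mfakto documentation.

```python
import re

# Excerpt of the GPU-Z report above (label, whitespace, value).
REPORT = """\
Native Vector Width (INT)    1
Native Vector Width (HALF)    N/A
Preferred Vector Width (INT)    1
Preferred Vector Width (LONG)    1
Preferred Vector Width (HALF)    N/A
"""

LINE = re.compile(r"^(Native|Preferred) Vector Width \((\w+)\)\s+(\S+)$")


def vector_widths(report):
    """Return {(kind, type): width} for numeric widths; N/A entries are skipped."""
    widths = {}
    for line in report.splitlines():
        m = LINE.match(line.strip())
        if m and m.group(3).isdigit():
            widths[(m.group(1), m.group(2))] = int(m.group(3))
    return widths


def suggested_vector_size(report):
    """Hypothetical helper: the preferred INT width, defaulting to 1 if absent."""
    return vector_widths(report).get(("Preferred", "INT"), 1)


print(suggested_vector_size(REPORT))
```

On the excerpt above this reports a width of 1, which is consistent with the VectorSize=1 requirement mentioned at the top of the post.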
Old 2018-01-21, 20:54   #1432
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

111111100111₂ Posts

Quote:
Originally Posted by xx005fs View Post
I don't really think DP performance matters that much for GpuOwL, because on my GPU, overclocking the HBM memory makes it a lot faster and more efficient. I'm not 100% sure why, though.
Interesting. I interpret the GpuOwL author's post regarding new features in v1.5 to say that DP performance is very important for GpuOwL; it sounds to me like DP is the best of the four transforms implemented, being both sufficiently fast and precise enough (in bits) to be worth using at 4M length and above: http://www.mersenneforum.org/showpos...&postcount=224
Old 2018-01-22, 01:40   #1433
xx005fs
 
"Eric"
Jan 2018
USA

7×29 Posts

Quote:
Originally Posted by kriesel View Post
Interesting. I interpret the GpuOwL author's post regarding new features in v1.5 to say that DP performance is very important for GpuOwL; it sounds to me like DP is the best of the four transforms implemented, being both sufficiently fast and precise enough (in bits) to be worth using at 4M length and above: http://www.mersenneforum.org/showpos...&postcount=224
Double precision is definitely important, but memory speed (at least on a Vega card) is just as important. With the HBM at 1190MHz, increasing the core clock from 1400 to 1700MHz reduced the iteration time from 3.6ms/it to 3.2ms/it; however, dropping the HBM to 800MHz with the core at 1700MHz increased it to 4ms/it. So they are equally important, I guess.
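To put those numbers side by side, here is a small Python sketch that turns the three timings from this post into percentage changes. The `pct_change` helper is just for illustration, and with only three data points on one card this is a rough comparison, not a benchmark.

```python
# (core MHz, HBM MHz) -> ms per iteration, as reported in this post
timings = {
    (1400, 1190): 3.6,
    (1700, 1190): 3.2,
    (1700, 800): 4.0,
}


def pct_change(old, new):
    """Percentage change going from old to new."""
    return 100.0 * (new - old) / old


# Core clock raised, HBM held at 1190 MHz
core_clock = pct_change(1400, 1700)
core_time = pct_change(timings[(1400, 1190)], timings[(1700, 1190)])

# HBM clock lowered, core held at 1700 MHz
mem_clock = pct_change(1190, 800)
mem_time = pct_change(timings[(1700, 1190)], timings[(1700, 800)])

print(f"core clock {core_clock:+.1f}% -> iteration time {core_time:+.1f}%")
print(f"HBM clock  {mem_clock:+.1f}% -> iteration time {mem_time:+.1f}%")
```

Roughly, a 21% core clock increase bought about an 11% shorter iteration, while a 33% HBM clock reduction cost about 25% more time per iteration, so on these figures the memory clock matters at least as much as the core clock.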
Old 2018-02-28, 19:50   #1434
SELROC
 

2·7·19·37 Posts
mfakto compilation on Debian

Hello, I am trying to compile mfakto on Debian stretch. It gives a large number of errors. I can post the compiler trace if necessary, but I would like to know if you have any first suggestions.

SELROC
Old 2018-03-05, 21:45   #1435
ixfd64
Bemusing Prompter
 
 
"Danny"
Dec 2002
California

2·17·67 Posts

Has anyone else tried using WINE to run mfakto on macOS?

I'm getting the following error:

Quote:
Compiling kernels.
Error 002a:fixme:msvcp:_Locinfo__Locinfo_ctor_cat_cstr (0x22f2a8 1 C) semi-stub
002a:fixme:msvcp:_Locinfo__Locinfo_ctor_cat_cstr (0x22eeb8 1 C) semi-stub
-43 (Invalid build options): clBuildProgram
ERROR: load_kernels(0) failed
Old 2018-03-05, 23:22   #1436
henryzz
Just call me Henry
 
 
"David"
Sep 2007
Cambridge (GMT)

13×19×23 Posts

Does OpenCL ever work in WINE?
Old 2018-04-27, 03:24   #1437
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3×23×59 Posts
mfakto downshift upon finding a factor

Does anyone have an idea why mfakto's indicated throughput dropped immediately upon finding a factor? It has happened 3 times out of 3 on a new RX550 running mfakto 0.15pre6 (64-bit, Win7) that passed the full selftest: 183 GHz-d/day before, 89 after. The drop is persistent, continuing for several hours, and is more than a 2:1 ratio. The ETA seems unaffected, so maybe it's only a cosmetic effect. (The exponent, factor, and bits were changed in the first, most recent example below, since the result has not been submitted yet because the bit level hasn't completed.)

Code:
Apr 26 12:34 | 1785  38.8% | 512.54   3d11h |    183.47    38299    0.00%
Apr 26 12:42 | 1792  38.9% | 513.40   3d11h |    183.16    38299    0.00%
M1234567 has a factor: 123456789012134567 (72.843305 bits, 504.966873 GHz-d)
Apr 26 12:51 | 1801  39.0% | 511.66   3d11h |     88.82    38299    0.00%
Apr 26 12:59 | 1809  39.1% | 513.38   3d11h |     88.52    38299    0.00%
Also, it seems to clear up upon completion of a worktodo line, a restart, or perhaps completion of a bit level.
Ctrl-C and a restart cleared it up, with the stop/start and short selftest costing about 15 minutes of throughput.
Code:
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Apr 26 22:28 | 2116  45.9% | 513.02   3d01h |    183.29    38299    0.00%
I do not recall seeing such an effect in mfaktc, which I've run much more than mfakto.
A search through a 60MB sample of mfaktc screen output logged to file shows nothing like that.

Here's another mfakto example, with an indicated ratio of more than 15:1:
Code:
Apr 13 21:01 |  264   5.8% | 456.00   4d18h |    212.08    10045    0.00%
Apr 13 21:08 |  267   5.9% | 453.66   4d17h |    213.18    10045    0.00%
Apr 13 21:16 |  271   6.0% | 453.81   4d17h |    213.11    10045    0.00%
M111269 has a factor: 617778664352573195639 (69.065652 bits, 70.546139 GHz-d)
Apr 13 21:24 |  276   6.1% | 457.79   4d18h |     13.87    10045    0.00%
Apr 13 21:31 |  280   6.3% | 459.59   4d18h |     13.81    10045    0.00%
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Apr 13 21:39 |  291   6.4% | 459.27   4d18h |     13.82    10045    0.00%
Another example, which persisted to completion of this exponent's bit level, clearing up when going on to the next worktodo entry.
Code:
Apr 13 11:11 | 3555  77.2% | 42.900   2h36m |    110.71    63018    0.00%
Apr 13 11:12 | 3564  77.3% | 44.406   2h41m |    106.96    63018    0.00%
Apr 13 11:13 | 3567  77.4% | 43.091   2h35m |    110.22    63018    0.00%
M290001377 has a factor: 96303240212210144213599 (76.350002 bits, 18.470592 GHz-d)
Apr 13 11:14 | 3568  77.5% | 44.628   2h40m |     37.25    63018    0.00%
Apr 13 11:14 | 3579  77.6% | 44.089   2h37m |     37.70    63018    0.00%
Apr 13 11:15 | 3583  77.7% | 43.331   2h34m |     38.36    63018    0.00%
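For anyone wanting to scan their own logs for the same effect, here is a minimal Python sketch that compares the last status row before a "has a factor" line with the first row after it. It assumes the column layout shown in the examples above (time per class and GHz-d/day in the third and fourth `|`-separated groups); the function names are made up for illustration and nothing here is taken from mfakto itself.

```python
LOG = """\
Apr 13 21:08 |  267   5.9% | 453.66   4d17h |    213.18    10045    0.00%
Apr 13 21:16 |  271   6.0% | 453.81   4d17h |    213.11    10045    0.00%
M111269 has a factor: 617778664352573195639 (69.065652 bits, 70.546139 GHz-d)
Apr 13 21:24 |  276   6.1% | 457.79   4d18h |     13.87    10045    0.00%
Apr 13 21:31 |  280   6.3% | 459.59   4d18h |     13.81    10045    0.00%
"""


def parse_row(line):
    """Return (seconds per class, GHz-d/day) for a status row, else None."""
    parts = [p.split() for p in line.split("|")]
    if len(parts) != 4 or len(parts[2]) != 2 or len(parts[3]) != 3:
        return None
    try:
        return float(parts[2][0]), float(parts[3][0])
    except ValueError:
        return None  # e.g. the repeated column header line


def drop_after_factor(log):
    """Return (rate ratio before/after, class time ratio after/before) or None."""
    before = after = None
    seen_factor = False
    for line in log.splitlines():
        if "has a factor:" in line:
            seen_factor = True
            continue
        row = parse_row(line)
        if row is None:
            continue
        if not seen_factor:
            before = row      # keep the last row before the factor
        elif after is None:
            after = row       # keep the first row after the factor
    if before is None or after is None:
        return None
    return before[1] / after[1], after[0] / before[0]


rate_ratio, time_ratio = drop_after_factor(LOG)
print(f"indicated rate dropped {rate_ratio:.1f}x, time per class changed {time_ratio:.3f}x")
```

On this excerpt the indicated GHz-d/day falls by a factor of about 15 while the time per class barely moves, which fits the guess that the effect is cosmetic.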

Last fiddled with by kriesel on 2018-04-27 at 03:34
Old 2018-05-31, 04:27   #1438
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3×23×59 Posts
Reference material

I was offered "a blog area to consolidate all of your pdfs and guides and stuff" and accepted.
Feel free to have a look and suggest content. (G-rated only;)
General interest GPU-related reference material: http://www.mersenneforum.org/showthread.php?t=23371
mfakto, OpenCL-based factoring on GPUs: http://www.mersenneforum.org/showthread.php?t=23394

Future updates to material previously posted in this thread will probably occur on the blog threads and not here. Having in-place update without a time limit makes it more manageable there.
Old 2018-06-23, 23:04   #1439
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

101101000110₂ Posts

Just as a matter of curiosity:
Both mfakto and mfaktc have a limit of exponent < 2^32. That's an obvious limit point, but how arbitrary or absolute is that limit? In the (probably distant) future, how easy or hard is it to extend the capabilities of mfakto to higher exponents?
Old 2018-06-24, 21:52   #1440
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

FE716 Posts
Default

Quote:
Originally Posted by James Heinrich View Post
Just as a matter of curiosity:
Both mfakto and mfaktc have a limit of exponent < 2^32. That's an obvious limit point, but how arbitrary or absolute is that limit? In the (probably distant) future, how easy or hard is it to extend the capabilities of mfakto to higher exponents?
I looked only briefly, and in the main routines it seemed not too bad. But a small sample of CUDA interface code shows various u32 instructions. So probably it has to be gone through from one end to the other by someone who knows what they're doing, routine by routine, kernel by kernel. I nominate not-me.
Looking at prime95's p-1 code for other reasons, I noticed code for handling bounds values bigger than 2^32, which you'll find in ecm.c (containing both ecm and p-1 code).
There's certainly plenty of GPU trial factoring to do within the mersenne.org 10^9 exponent cap, let alone up to 2^32-5, which is more than 4.2 times higher.
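To illustrate why a 32-bit exponent is a hard edge rather than a soft one, here is a small Python sketch that emulates unsigned 32-bit storage. It assumes, as the u32 usage noted above suggests, that the exponent is held in 32-bit unsigned variables; it does not reproduce any mfakto or mfaktc code. `as_u32`, `candidate`, and the value of `p_big` are made up for the illustration. The only number theory used is the standard fact that any factor q of M_p = 2^p - 1 (p an odd prime) has the form q = 2kp + 1.

```python
U32_MAX = 2**32 - 1


def as_u32(x):
    """Emulate storing x in an unsigned 32-bit integer (wraps modulo 2^32)."""
    return x & U32_MAX


def candidate(p, k):
    """Trial factor candidate for M_p: q = 2*k*p + 1."""
    return 2 * k * p + 1


p_ok = 2**32 - 5     # the exponent mentioned above; fits in 32 bits
p_big = 2**32 + 15   # hypothetical exponent just past the limit (primality not checked)

print(as_u32(p_ok) == p_ok)    # stored unchanged
print(as_u32(p_big))           # silently wraps to a small number

# A candidate built from the wrapped exponent has nothing to do with M_p_big.
print(candidate(p_big, 1) == candidate(as_u32(p_big), 1))

# How far 2^32 lies above the mersenne.org 10^9 cap
print(2**32 / 10**9)
```

Every place the exponent is stored or multiplied would need a wider type for this to come out right, which matches the "routine by routine, kernel by kernel" estimate above.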

Last fiddled with by kriesel on 2018-06-24 at 22:11
Old 2018-06-24, 22:05   #1441
James Heinrich
 
James Heinrich's Avatar
 
"James Heinrich"
May 2004
ex-Northern Ontario

2×3×13×37 Posts

Quote:
Originally Posted by kriesel View Post
I looked only briefly, and in the main routines it seemed not too bad. But a small sample of CUDA interface code shows various u32 instructions. So probably it has to be gone through from one end to the other by someone who knows what they're doing, routine by routine, kernel by kernel. I nominate not-me.
Thanks for looking into it. That's kind of what I suspected. I also assume that rewritten code for larger exponents has the potential to be at least slightly slower.

Fortunately, there are still a couple hundred million exponents below 2^32 that need some more TF'ing first, so it's not really a high-priority problem.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.