mersenneforum.org  

Old 2019-09-11, 18:00   #3191
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

9410₁₀ Posts

Quote:
Originally Posted by TheJudger View Post
unless you're really sure about the increased sieve size limit I suggest to stay with 1024... "Doesn't crash" and "passes the builtin selftest" doesn't prove that 2047 is OK. 2048 crashes hard because of an (integer) overflow...
Could factors be missed? I'm deploying hansl's resultant executable on many CoLab and Kaggle instances, and am finding the expected number of factors.

But before I put this into "production", should I revert to an unmodified build?
Old 2019-09-11, 18:01   #3192
nomead
 
"Sam Laur"
Dec 2018
Turku, Finland

2×3×5×11 Posts

Quote:
Originally Posted by TheJudger View Post
Hi,
"Doesn't crash" and "passes the builtin selftest" doesn't prove that 2047 is OK. 2048 crashes hard because of an (integer) overflow...

Oliver
Fair enough, the difference between 1024 and 2047 isn't that big anyway, less than 1%. That said, I've been running it at 2047 since January, mostly on the >1G exponents on mersenne.ca. I haven't collected stats for all that time, but the last 2.5 months are: 189717 factors found for 12729313 exponents, 14903 ppm / 1.49%. Of course that doesn't *prove* that it doesn't miss any factors anywhere... but now that I'm retesting some ranges that have been independently factored (2-55 bits, with something other than mfaktc), I can check whether it has missed anything there thus far.
Old 2019-09-11, 18:16   #3193
TheJudger
 
"Oliver"
Mar 2005
Germany

2×3×5×37 Posts

Actually I didn't spend much time thinking about this. I'm not sure whether TF to 2^55 hits the wraparound or not.
I don't have any evidence that 2047 doesn't work, I'm just not a fan of "changed a number and it seems to work" changes.

Oliver
Old 2019-09-11, 19:03   #3194
nomead
 
"Sam Laur"
Dec 2018
Turku, Finland

14A₁₆ Posts

Quote:
Originally Posted by chalsall View Post
Could factors be missed? I'm deploying hansl's resultant executable on many CoLab and Kaggle instances, and am finding the expected number of factors.

But before I put this into "production", should I revert to an unmodified build?
The build makes no changes to the code itself, it's just a modification to the bounds check in mfaktc.ini file processing. Ordinarily the limit is 128, and you can get the same effect by setting GPUSieveSize=128 (or even less) in mfaktc.ini, as you like.
Old 2019-09-11, 19:36   #3195
TheJudger
 
"Oliver"
Mar 2005
Germany

10001010110₂ Posts

Quote:
Originally Posted by nomead View Post
The build makes no changes to the code itself, it's just a modification to the bounds check in mfaktc.ini file processing.
That perfectly explains why it crashes at 2048...

Oliver
Old 2019-09-11, 20:15   #3196
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

2²·31·59 Posts

The GPU sieve code was written ages ago. I've long forgotten its assumptions and limitations. My biggest fear is that the code requires the sieve size to be a power of two. Someone really needs to scrutinize the code before using 2047.
Old 2019-09-11, 21:50   #3197
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

13·373 Posts

Quote:
Originally Posted by Prime95 View Post
The GPU sieve code was written ages ago. I've long forgotten its assumptions and limitations. My biggest fear is that the code requires the sieve size to be a power of two. Someone really needs to scrutinize the code before using 2047.
Here's the entirety of the mfaktc.ini section on gpusievesize as the program is distributed. Seems like if a power of two was a requirement it would have been disclosed in a comment there, same as for the requirement of another parameter to be a multiple of 8. What happens if one uses 5 or 6 or 7 or 15 or 31 or 63 or 127 in an unaltered executable, other than performance variations? Seems like 7 vs. 8 would have the highest odds of showing mischief in a test.
Code:
# GPUSieveSize defines how big of a GPU sieve we use (in M bits).
#
# Minimum: GPUSieveSize=4
# Maximum: GPUSieveSize=128
#
# Default: GPUSieveSize=64

GPUSieveSize=64
Skimming the gpusieve.cu code, nothing jumps out at me as requiring a power of 2 there, although that means essentially nothing; I don't know CUDA programming. There are some things that seem to me to indicate the sieve size should be a multiple of a considerable power of two, but it's in units of M bits (2^20 bits).

Last fiddled with by kriesel on 2019-09-11 at 22:18
Old 2019-09-11, 23:38   #3198
hansl
 
Apr 2019

5·41 Posts

Quote:
Originally Posted by nomead View Post
The build makes no changes to the code itself, it's just a modification to the bounds check in mfaktc.ini file processing. Ordinarily the limit is 128, and you can get the same effect by setting GPUSieveSize=128 (or even less) in mfaktc.ini, as you like.
Right, the only "code" change was increasing the maximum allowable limit for that in src/params.h
This enables the max GPUSieveSize to go beyond 128, but it's still subject to what is set in mfaktc.ini.

So, if it's a concern, you should be fine just to lower the setting in mfaktc.ini to 1024 or whatever (assuming you are even using the bundled mfaktc.ini and not your own).
Building it again shouldn't be necessary.
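
To illustrate the point that the compiled-in maximum only bounds what mfaktc.ini may request, here is a minimal sketch of such a bounds check. The function name and its parameters are hypothetical, and clamping to the nearest bound is an assumption; the real parser may instead reject the value or fall back to the default.

```c
/* Hypothetical sketch of an ini bounds check; not mfaktc's actual code.
 * requested   : GPUSieveSize as read from mfaktc.ini (in M bits)
 * min_allowed : compiled-in minimum (4 in the stock ini comments)
 * max_allowed : compiled-in maximum (128 in stock builds, higher in the
 *               modified build discussed here)
 * Assumption: out-of-range values are clamped to the nearest bound. */
int effective_gpu_sieve_size(int requested, int min_allowed, int max_allowed)
{
    if (requested < min_allowed) {
        return min_allowed;
    }
    if (requested > max_allowed) {
        return max_allowed;
    }
    return requested;
}
```

Under this sketch, a build with a raised maximum still runs at 1024 (or 64) if that is what the ini file asks for, which is why a rebuild should not be needed to go back to a lower value.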
Old 2019-09-11, 23:54   #3199
nomead
 
"Sam Laur"
Dec 2018
Turku, Finland

2×3×5×11 Posts

I've mentioned in this thread what I did back in January, and no big objections were raised back then. Well, this only gives me a better motivation to finally build a bigger test set of exponents and bit depths of already found factors, as extracted from the mersenne.ca database.

But here's another data point. I've started taking the >1G range to 64 bits, as in, finding all the factors up to that point, not just the first ones that can be found. This was recently exhaustively done by hansl to 55 bits, but there are still factors waiting to be found between 55 and 64. So, for the bits and pieces I've managed to do between about 2800 and 3000 million, in the short time I've been running this job thus far, comparing against already known factors:
1240663 exponents
786890 factors (in database) - none were missed in this search.
42547 new factors
There were already some factors between 55 and 64 bits in length in the database, of course. Most notably, for quite a long way above 2900 million, someone had already factored everything up to 64 bits.

The question then becomes, is any amount of testing ever enough?
Old 2019-09-12, 00:35   #3200
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

13×373 Posts

Quote:
Originally Posted by nomead View Post
I've mentioned in this thread what I did back in January, and no big objections were raised back then. Well, this only gives me a better motivation to finally build a bigger test set of exponents and bit depths of already found factors, as extracted from the mersenne.ca database....
The question then becomes, is any amount of testing ever enough?
Production use is somewhat informative but is not the equal of well-designed testing. A test would look something like this: run a set of exponents for a bit level with a non-power-of-two gpusievesize. Then run the no-factor-found survivors again at the same bit level with a power-of-two gpusievesize, and see how many more factors are found. If any, there's probably an issue. If none, maybe the sample size was too small and a small issue has gone undetected.
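
The comparison step of that two-pass test could be sketched roughly as follows. This is only an illustration: the function name and the per-exponent flag arrays are hypothetical, and it assumes the results of both passes have already been parsed into one found/not-found flag per exponent, in the same order.

```c
#include <stddef.h>

/* Hypothetical helper for the two-pass test described above.
 * found_first[i]  : nonzero if pass 1 (non-power-of-two sieve size)
 *                   found a factor for exponent i
 * found_second[i] : nonzero if pass 2 (power-of-two sieve size, same
 *                   bit level) found a factor for exponent i
 * Returns the number of exponents where pass 1 found nothing but
 * pass 2 did, i.e. candidate missed factors. */
size_t count_missed(const int *found_first, const int *found_second, size_t n)
{
    size_t missed = 0;
    for (size_t i = 0; i < n; i++) {
        if (!found_first[i] && found_second[i]) {
            missed++;
        }
    }
    return missed;
}
```

A nonzero result points to a probable issue; a zero result only says that nothing was missed in this particular sample.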
Old 2019-09-12, 03:21   #3201
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

2²·31·59 Posts

Quote:
Originally Posted by nomead View Post
The question then becomes, is any amount of testing ever enough?
The first step should be to look at the code and convince oneself that 2047 ought to work. No one has done that. So.....

I took a look at the code. I see no reason why a setting of 2047 would not work. In fact, changing gpu_sieve_size to an unsigned int might allow for values up to 4095. Changing to unsigned long long could allow much higher values. There would also be some typecasts required to avoid compiler warnings.
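
The 2047 and 4095 figures can be checked with a little arithmetic, assuming (as the ini comments suggest) that the setting is in M bits and that the bit count is held in a 32-bit integer. The helper names below are made up for illustration and are not taken from mfaktc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits in one "M bits" unit of GPUSieveSize (2^20). */
#define BITS_PER_MBIT (1u << 20)

/* Sieve size in bits, computed in 64-bit so this check cannot wrap. */
uint64_t sieve_bits(uint32_t size_mbits)
{
    return (uint64_t)size_mbits * BITS_PER_MBIT;
}

/* True if the bit count fits in a signed 32-bit int. */
bool fits_int32(uint32_t size_mbits)
{
    return sieve_bits(size_mbits) <= (uint64_t)INT32_MAX;
}

/* True if the bit count fits in an unsigned 32-bit int. */
bool fits_uint32(uint32_t size_mbits)
{
    return sieve_bits(size_mbits) <= (uint64_t)UINT32_MAX;
}
```

With these definitions, 2047 M bits (2146435072 bits) fits in a signed int, while 2048 M bits is exactly 2^31 and does not. Likewise 4095 fits in an unsigned int and 4096 does not, which matches the limits mentioned above.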

The real limit is imposed by CUDA on this code line:

Code:
	SegSieve<<<(sieve_size + block_size - 1) / block_size, threadsPerBlock>>>((uint8 *)mystuff->d_bitarray, (uint8 *)mystuff->d_sieve_info, primes_per_thread);
What is CUDA's limit on the first parameter ((sieve_size + block_size - 1) / block_size)?
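
As a rough way to reason about that question: the CUDA documentation lists the maximum x-dimension of a grid as 65535 blocks on compute capability 2.x devices and 2^31 - 1 on compute capability 3.0 and later. The sketch below applies the same ceiling division as the launch line to a given sieve size. The helper names are hypothetical, and the actual block_size used by gpusieve.cu is not stated in this thread, so any value passed in is an assumption.

```c
#include <stdint.h>

/* Documented CUDA limits on the x-dimension of a grid. */
#define MAX_GRID_X_CC2 65535u        /* compute capability 2.x */
#define MAX_GRID_X_CC3 2147483647u   /* compute capability 3.0 and later */

/* Ceiling division, same form as the first launch parameter. */
uint64_t blocks_needed(uint64_t sieve_size, uint64_t block_size)
{
    return (sieve_size + block_size - 1) / block_size;
}

/* 1 if the block count is within the given grid x-dimension limit, else 0. */
int launch_fits(uint64_t sieve_size, uint64_t block_size, uint64_t max_grid_x)
{
    return blocks_needed(sieve_size, block_size) <= max_grid_x;
}
```

For example, with a purely illustrative block_size of 8192 bits, a 2047 M bit sieve needs 262016 blocks. That is above the 65535 limit of older devices but far below the 2^31 - 1 limit of newer ones.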

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.