mersenneforum.org mfaktc: a CUDA program for Mersenne prefactoring

2019-09-11, 18:00   #3191
chalsall
If I May

"Chris Halsall"
Sep 2002

9410₁₀ Posts

Quote:
 Originally Posted by TheJudger unless you're really sure about the increased sieve size limit I suggest to stay with 1024... "Doesn't crash" and "passes the builtin selftest" doesn't prove that 2047 is OK. 2048 crashes hard because of an (integer) overflow...
Could factors be missed? I'm deploying hansl's resultant executable on many CoLab and Kaggle instances, and am finding the expected number of factors.

But before I put this into "production", should I revert to an unmodified build?

2019-09-11, 18:01   #3192

"Sam Laur"
Dec 2018
Turku, Finland

2×3×5×11 Posts

Quote:
 Originally Posted by TheJudger Hi, "Doesn't crash" and "passes the builtin selftest" doesn't prove that 2047 is OK. 2048 crashes hard because of an (integer) overflow... Oliver
Fair enough, the difference between 1024 and 2047 isn't that big anymore anyway, less than 1%. Although, I've been running it at 2047 since January, mostly on the >1G exponents on mersenne.ca, and while I haven't collected stats for all that time, the last 2.5 months are: 189717 factors found for 12729313 exponents, 14903 ppm / 1.49%. Of course that doesn't *prove* that it doesn't miss any factors anywhere... but now I'm retesting some ranges that have been independently factored (2 to 55 bits, with something other than mfaktc), so I can check whether it has missed anything there thus far.

2019-09-11, 18:16   #3193
TheJudger

"Oliver"
Mar 2005
Germany

2×3×5×37 Posts

Actually I didn't spend much time thinking about this. I'm not sure whether TF to 2^55 hits the wrap around or not. I don't have any evidence that 2047 doesn't work, I'm just not a fan of "changed a number and it seems to work" changes.

Oliver
2019-09-11, 19:03   #3194

"Sam Laur"
Dec 2018
Turku, Finland

14A₁₆ Posts

Quote:
 Originally Posted by chalsall Could factors be missed? I'm deploying hansl's resultant executable on many CoLab and Kaggle instances, and am finding the expected number of factors. But before I put this into "production", should I revert to an unmodified build?
The build makes no changes to the code itself; it's just a modification to the bounds check in mfaktc.ini file processing. Ordinarily the limit is 128, and you can get the same effect by setting GPUSieveSize=128 (or even less) in mfaktc.ini, as you like.

2019-09-11, 19:36   #3195
TheJudger

"Oliver"
Mar 2005
Germany

10001010110₂ Posts

Quote:
 Originally Posted by nomead The build makes no changes to the code itself, it's just a modification to the bounds check in mfaktc.ini file processing.
That perfectly explains why it crashes at 2048...

Oliver

2019-09-11, 20:15   #3196
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2²·31·59 Posts

The GPU sieve code was written ages ago. I've long forgotten its assumptions and limitations. My biggest fear is that the code requires the sieve size to be a power of two. Someone really needs to scrutinize the code before using 2047.
2019-09-11, 21:50   #3197
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

13·373 Posts

Quote:
 Originally Posted by Prime95 The GPU sieve code was written ages ago. I've long forgotten its assumptions and limitations. My biggest fear is that the code requires the sieve size to be a power of two. Someone really needs to scrutinize the code before using 2047.
Here's the entirety of the mfaktc.ini section on gpusievesize as the program is distributed. Seems like if a power of two was a requirement it would have been disclosed in a comment there, same as for the requirement of another parameter to be a multiple of 8. What happens if one uses 5 or 6 or 7 or 15 or 31 or 63 or 127 in an unaltered executable, other than performance variations? Seems like 7 vs. 8 would have the highest odds of showing mischief in a test.
Code:
# GPUSieveSize defines how big of a GPU sieve we use (in M bits).
#
# Minimum: GPUSieveSize=4
# Maximum: GPUSieveSize=128
#
# Default: GPUSieveSize=64

GPUSieveSize=64
Skimming the gpusieve.cu code, nothing jumps out at me as requiring a power of 2 there, although that means essentially nothing; I don't know CUDA programming. There are some things that seem to me to indicate the sieve size should be a multiple of a considerable power of two, but it's in units of M bits (2²⁰ bits).

Last fiddled with by kriesel on 2019-09-11 at 22:18

2019-09-11, 23:38   #3198
hansl

Apr 2019

5·41 Posts

Quote:
 Originally Posted by nomead The build makes no changes to the code itself, it's just a modification to the bounds check in mfaktc.ini file processing. Ordinarily the limit is 128, and you can get the same effect by setting GPUSieveSize=128 (or even less) in mfaktc.ini, as you like.
Right, the only "code" change was increasing the maximum allowable limit for that in src/params.h.
This enables the max GPUSieveSize to go beyond 128, but it's still subject to what is set in mfaktc.ini.

So, if it's a concern, you should be fine just lowering the setting in mfaktc.ini to 1024 or whatever (assuming you're even using the bundled mfaktc.ini and not your own).
Rebuilding shouldn't be necessary.

2019-09-11, 23:54   #3199
nomead

"Sam Laur"
Dec 2018
Turku, Finland

2×3×5×11 Posts

I've mentioned in this thread what I did back in January, and no big objections were raised back then. Well, this only gives me a better motivation to finally build a bigger test set of exponents and bit depths of already found factors, as extracted from the mersenne.ca database.

But here's another data point. I've started taking the >1G range to 64 bits, as in, finding all the factors up to that point, not just the first ones that can be found. This was recently exhaustively done by hansl to 55 bits, but there are still factors waiting to be found between 55 and 64. So, for the bits and pieces I've managed to do between about 2800 and 3000 million, in the short time I've been running this job thus far, comparing against already known factors:

1240663 exponents
786890 factors (in database) - none were missed in this search
42547 new factors

There were already some factors between 55 and 64 bits in length in the database, of course. Most notably, for quite a long way above 2900 million, someone had already factored everything up to 64 bits. The question then becomes, is any amount of testing ever enough?
2019-09-12, 00:35   #3200
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

13×373 Posts

Quote:
 Originally Posted by nomead I've mentioned in this thread what I did back in January, and no big objections were raised back then. Well, this only gives me a better motivation to finally build a bigger test set of exponents and bit depths of already found factors, as extracted from the mersenne.ca database.... The question then becomes, is any amount of testing ever enough?
Production use is somewhat informative, but it's not the equal of well-designed testing. A test would look something like this: run a set of exponents for a bit level with a non-power-of-two GPUSieveSize. Then run the no-factor-found survivors again at the same bit level with a power-of-two GPUSieveSize, and see how many more factors are found. If any, there's probably an issue. If none, maybe the sample size was too small and there's a small issue that's gone undetected.

2019-09-12, 03:21   #3201
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2²·31·59 Posts

Quote:
 Originally Posted by nomead The question then becomes, is any amount of testing ever enough?
The first step should be to look at the code and convince oneself that 2047 ought to work. No one has done that. So.....

I took a look at the code. I see no reason why a setting of 2047 would not work. In fact, changing gpu_sieve_size to an unsigned int might allow for values up to 4095. Changing to unsigned long long could allow much higher values. There would also be some typecasts required to avoid compiler warnings.

The real limit is imposed by CUDA on this code line:

Code:
	SegSieve<<<(sieve_size + block_size - 1) / block_size, threadsPerBlock>>>((uint8 *)mystuff->d_bitarray, (uint8 *)mystuff->d_sieve_info, primes_per_thread);
What is CUDA's limit on the first parameter ((sieve_size + block_size - 1) / block_size)?

