 mersenneforum.org Trial division with CUDA (mmff) -- used, but runs like new!
2019-10-25, 13:37 #353 lalera   Jul 2003 13×47 Posts
hi, many thanks to fan ming (and nomead) for the new win x64 cuda 10.1 executable of mmff v0.28

2019-11-23, 11:45 #354 ATH Einyen   Dec 2003 Denmark 2^2·13·61 Posts
Maximum limits for mmff-0.28 for Fermat factoring. Tested on the Windows CUDA 10.1 version built by Fan Ming: https://www.mersenneforum.org/showpo...1&postcount=39

The ultimate limit is k < 2^64, but for some exponents the limit is lower than that.
Code:
28 <= n <= 223
n=28-119:   k*2^n+1 < 2^(n+64) (92-183)    k<2^64
n=120-127:  k*2^n+1 < 2^183                k<2^63 to k<2^56
n=128-151:  k*2^n+1 < 2^(n+64) (192-215)   k<2^64
n=152-159:  k*2^n+1 < 2^215                k<2^63 to k<2^56
n=160-183:  k*2^n+1 < 2^(n+64) (224-247)   k<2^64
n=184-191:  k*2^n+1 < 2^247                k<2^63 to k<2^56
n=192-223:  k*2^n+1 < 2^252                k<2^60 to k<2^29

2019-11-30, 16:41 #355 Gary   "Gary Gostin" Aug 2015 Texas, USA 101₈ Posts
I found it curious that Andreas got errors when trying to verify the factors of F205 and F215, since Serge said he verified these factors when he released version 0.28 (https://www.mersenneforum.org/showpo...&postcount=317). So I did some testing. I confirmed that Andreas's FermatFactor=207,224,225 range dies with ERROR: Exponentiation failure. The next smaller bit range is not supported, while higher bit ranges run to completion. I then tried testing individual values of K in the 225-bit factor range, and found that 207,232905,232905 correctly finds the factor of F205. However, about 30% of individual K values die with ERROR: Exponentiation failure.
Code:
// Trying bit ranges
//FermatFactor=207,223,224    // WARNING: bit range isn't supported!
//FermatFactor=207,224,225    // ERROR: Exponentiation failure: k range: 131072 to 262143 (225-bit factors)
//FermatFactor=207,225,226    // Runs: k range: 262144 to 524287 (226-bit factors)
//FermatFactor=207,226,227    // Runs: k range: 524288 to 1048575 (227-bit factors)
//FermatFactor=207,227,228    // Runs: k range: 1048576 to 2097151 (228-bit factors)

// Trying individual values of K in the 225 bit factor range
//FermatFactor=207,232885,232885    // Runs
//FermatFactor=207,232887,232887    // ERROR: Exponentiation failure
//FermatFactor=207,232889,232889    // Runs
//FermatFactor=207,232891,232891    // Runs
//FermatFactor=207,232893,232893    // Runs
//FermatFactor=207,232895,232895    // Runs
//FermatFactor=207,232897,232897    // Runs
//FermatFactor=207,232899,232899    // Runs
//FermatFactor=207,232901,232901    // ERROR: Exponentiation failure
//FermatFactor=207,232903,232903    // Runs
//FermatFactor=207,232905,232905    // Runs, finds F205 factor
//FermatFactor=207,232907,232907    // ERROR: Exponentiation failure
//FermatFactor=207,232909,232909    // Runs
//FermatFactor=207,232911,232911    // Runs
//FermatFactor=207,232913,232913    // Runs
//FermatFactor=207,232915,232915    // Runs
//FermatFactor=207,232917,232917    // Runs
//FermatFactor=207,232919,232919    // ERROR: Exponentiation failure
//FermatFactor=207,232921,232921    // Runs
//FermatFactor=207,232923,232923    // Runs
//FermatFactor=207,232925,232925    // ERROR: Exponentiation failure

For the 226-bit factor range, while the full 207,262144,524287 range runs without error, about 30% of individual K values continue to die with ERROR: Exponentiation failure. I also found that any range of K that contains a failing K also fails, up to the point where the range contains more than about 160000 K values, at which point mmff runs to completion without error.
Code:
// Trying individual values of K in the 226 bit factor range
//FermatFactor=207,419987,419987    // ERROR: Exponentiation failure
//FermatFactor=207,419989,419989    // Runs
//FermatFactor=207,419991,419991    // ERROR: Exponentiation failure
//FermatFactor=207,419993,419993    // Runs
//FermatFactor=207,419995,419995    // Runs
//FermatFactor=207,419997,419997    // ERROR: Exponentiation failure
//FermatFactor=207,419999,419999    // Runs
//FermatFactor=207,420001,420001    // Runs
//FermatFactor=207,420003,420003    // Runs
//FermatFactor=207,420005,420005    // Runs
//FermatFactor=207,420007,420007    // Runs
//FermatFactor=207,420009,420009    // ERROR: Exponentiation failure

// Trying ranges of K in the 226 bit factor range
//FermatFactor=207,419999,420007    // Runs
//FermatFactor=207,419997,420007    // ERROR: Exponentiation failure
//FermatFactor=207,410000,420000    // ERROR: Exponentiation failure
//FermatFactor=207,300000,420000    // ERROR: Exponentiation failure
//FermatFactor=207,300000,450000    // ERROR: Exponentiation failure
//FermatFactor=207,300000,460000    // Runs
//FermatFactor=207,262144,524287    // Runs

I see the same thing happening in recent "production" search ranges. In Andreas's recent range, individual K or small K ranges die with either ERROR: Exponentiation failure or ERROR: Class problems Factor divisible by ..., and the error will vary randomly on repeating the same test multiple times.
Code:
// Trying Andreas's full range *****
//FermatFactor=205,130000000000000,140737488355327    // Runs

// Trying individual values of K in the 252 bit factor range
//FermatFactor=205,130000000000001,130000000000001    // Exp failure OR Factor divisible (random)
//FermatFactor=205,130000000000003,130000000000003    // Runs
//FermatFactor=205,130000000000005,130000000000005    // Exp failure OR Factor divisible (random)
//FermatFactor=205,130000000000007,130000000000007    // Runs
//FermatFactor=205,130000000000009,130000000000009    // Runs
//FermatFactor=205,130000000000011,130000000000011    // Exp failure OR Factor divisible (random)
//FermatFactor=205,130000000000013,130000000000013    // Runs
//FermatFactor=205,130000000000015,130000000000015    // Runs
//FermatFactor=205,130000000000017,130000000000017    // Runs
//FermatFactor=205,130000000000019,130000000000019    // Exp failure OR Factor divisible (random)

// Trying ranges of K in the 252 bit factor range
//FermatFactor=205,130000000000000,130000000000100    // Exp failure OR Factor divisible (random)
//FermatFactor=205,130000000000000,130000000001000    // Exp failure OR Factor divisible (random)
//FermatFactor=205,130000000000000,130000000010000    // Exp failure OR Factor divisible (random)
//FermatFactor=205,130000000000000,130000000180000    // Exp failure OR Factor divisible (random)
//FermatFactor=205,130000000000000,130000000190000    // Runs

Also in Peter's recent range with 171 bit factors.
Code:
// Trying ranges of K in the 171 bit factor range ***
//FermatFactor=120,1527888802614000,1527888802615000    // ERROR: Exponentiation failure
//FermatFactor=120,1527888802600000,1527888802700000    // ERROR: Exponentiation failure
//FermatFactor=120,1527888802500000,1527888802700000    // Runs, finds F118 factor

Of course this might be a problem with my system. I am running Ubuntu 18.04 LTS with CUDA 10.1 on an RTX 2080. Could someone else verify some of the results above (just uncomment individual lines)?
If it persists, hopefully this is an mmff problem that only affects small ranges of K. But looking at the source, it appears that only a tiny fraction of K values are checked for accuracy by calling validate_exponentiation(), for obvious performance reasons. So is it possible, if highly unlikely, that undetected errors are occurring for larger ranges of K? George or Serge, would one of you have time to investigate this?

For hardware validation, by using single K values and adding some recent factors, here is an expanded version of Andreas's worktodo file that should verify 41 known Fermat factors.
Code:
// Check the known Fermat factors within the ranges of mmff
// Ranges supported: 28 <= exp <= 223; 64 bit <= factor size <= 252 bit; K min/max vary with exp
// K min/max < 1000 are interpreted as factor bit size min/max, >= 1000 as K min/max

FermatFactor=36,2e10,3e10    // F28: 25709319373 * 2^36 + 1
FermatFactor=33,546e10,547e10    // F31: 5463561471303 * 2^33 + 1
FermatFactor=39,69,70    // F37: 1275438465 * 2^39 + 1
FermatFactor=41,286492e10,286493e10    // F39: 2864929972774011 * 2^41 + 1
FermatFactor=45,11131e10,11132e10    // F42: 111318179143061 * 2^45 + 1
FermatFactor=45,21e10,22e10    // F43: 212675402445 * 2^45 + 1
FermatFactor=50,213e10,214e10    // F48: 2139543641769 * 2^50 + 1
FermatFactor=54,4119,4119    // F52: 4119 * 2^54 + 1
FermatFactor=54,78,79    // F52: 21626655 * 2^54 + 1
FermatFactor=54,8190e10,8191e10    // F52: 81909357657279 * 2^54 + 1
//FermatFactor=61,67,68    // F58: 95 * 2^61 + 1 ***No way to specify
FermatFactor=68,121089e10,121090e10    // F65: 1210895760431083 * 2^68 + 1
FermatFactor=74,100,101    // F72: 76432329 * 2^74 + 1
FermatFactor=77,98,99    // F75: 3447431 * 2^77 + 1
FermatFactor=79,5e9,6e9    // F77: 5940341195 * 2^79 + 1
FermatFactor=87,1595e9,1596e9    // F83: 1595863660157 * 2^87 + 1
FermatFactor=88,20018e9,20019e9    // F86: 20018578522347 * 2^88 + 1
FermatFactor=90,119e9,120e9    // F88: 119942751127 * 2^90 + 1
FermatFactor=92,198e9,199e9    // F90: 198922467387 * 2^92 + 1
FermatFactor=93,1421,1421    // F91: 1421 * 2^93 + 1
FermatFactor=97,482e9,483e9    // F94: 482524552001 * 2^97 + 1
FermatFactor=101,3334e9,3335e9    // F96: 3334131633063 * 2^101 + 1
FermatFactor=111,141,142    // F107: 1289179925 * 2^111 + 1
FermatFactor=120,3e9,4e9    // F116: 3433149787 * 2^120 + 1
FermatFactor=120,1527888802500000,1527888802700000    // F118: 1527888802614951 * 2^120 + 1
FermatFactor=124,146,147    // F122: 5234775 * 2^124 + 1
//FermatFactor=127,129,130    // F125: 5 * 2^127 + 1 ***No way to specify
FermatFactor=135,1075441212600000,1075441212800000    // F132: 1075441212722595 * 2^135 + 1
FermatFactor=135,88e9,89e9    // F133: 88075576149 * 2^135 + 1
FermatFactor=145,167,168    // F142: 8152599 * 2^145 + 1
FermatFactor=148,173,174    // F146: 37092477 * 2^148 + 1
FermatFactor=149,3125,3125    // F147: 3125 * 2^149 + 1
FermatFactor=149,175,176    // F147: 124567335 * 2^149 + 1
FermatFactor=157,1575,1575    // F150: 1575 * 2^157 + 1
FermatFactor=154,5439,5439    // F150: 5439 * 2^154 + 1
FermatFactor=167,197,198    // F164: 1835601567 * 2^167 + 1
FermatFactor=171,2674e9,2675e9    // F166: 2674670937447 * 2^171 + 1
FermatFactor=174,20e9,21e9    // F172: 20569603303 * 2^174 + 1
FermatFactor=180,3e8,4e8    // F178: 313047661 * 2^180 + 1
FermatFactor=187,213,214    // F184: 117012935 * 2^187 + 1
FermatFactor=197,48594e9,48596e9    // F195: 48595346636925 * 2^197 + 1
FermatFactor=207,232905,232905    // F205: 232905 * 2^207 + 1
FermatFactor=217,32111,32111    // F215: 32111 * 2^217 + 1

2019-12-02, 06:20 #356 ATH Einyen   Dec 2003 Denmark 6144₈ Posts
My guess is that when the k-range is very small then sieving might remove all candidates and there is no candidate left to do the exponentiation. In your single k tests the ones that work are probably the ones without any small factors. I might check later but I do not have time right now.

2019-12-09, 03:47   #357
Gary

"Gary Gostin"
Aug 2015
Texas, USA

5×13 Posts Quote:
 Originally Posted by ATH My guess is that when the k-range is very small then sieving might remove all candidates and there is no candidate left to do the exponentiation. In your single k tests the ones that work are probably the ones without any small factors. I might check later but I do not have time right now.
Looks like you are right! When running small ranges, each time before an error occurs the number of factors surviving the sieve is zero (total_bit_count = 0 in the tf_*.h kernel). This causes the kernel to skip the calculations entirely, but it still copies the factor and final remainder for one value of K to the results array (RES) for validation. Since the factor and final remainder are function local variables that are never written, they contain garbage values. This explains why running the same test repeatedly produces various Factor divisible and Exponentiation failure errors.

So the mystery is solved, and none of this raises any doubts about mmff correctness for large ranges of K (which I hoped and expected all along).

I modified the kernels to set a flag in the results validation array (datalen = 0) when zero factors survive the sieve. Then in tf_validate.h the validation checks are skipped if datalen is zero. Hopefully this will eliminate the following errors for correctly working hardware:

ERROR: Class problems. Factor divisible by 2, 3, 5, 7, or 11
ERROR: GPU sieve problems. Factor divisible by <int>
ERROR: Exponentiation failure
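
The failure and the fix described above can be sketched as a toy model. This is Python, not the CUDA kernels, and every name in it (`toy_kernel`, `validate`, the dict standing in for the results array, the tiny prime list) is hypothetical. It also assumes, for simplicity, that only the Fermat number F(n-2) is tested for exponent n. It only mirrors the logic: when nothing survives the sieve, the kernel reports datalen = 0 and the host skips validation instead of checking values that were never written.

```python
# Toy model only: none of these names come from the mmff source.
TOY_SIEVE_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]  # hypothetical, far smaller than a real sieve


def sieve_survivors(n, k_min, k_max, primes=TOY_SIEVE_PRIMES):
    """Return every k in [k_min, k_max] for which k*2^n + 1 has no divisor in `primes`."""
    return [
        k
        for k in range(k_min, k_max + 1)
        if all((k * pow(2, n, p) + 1) % p != 0 for p in primes)
    ]


def toy_kernel(n, k_min, k_max):
    """Stand-in for a tf kernel: returns the record that the host validates."""
    survivors = sieve_survivors(n, k_min, k_max)
    if not survivors:
        # The fix: report "nothing to validate" instead of copying never-written locals.
        return {"datalen": 0}
    k = survivors[0]
    factor = k * 2**n + 1
    # Assumption for this sketch: test only F(n-2) = 2^(2^(n-2)) + 1.
    remainder = pow(2, 2 ** (n - 2), factor)
    return {"datalen": 1, "k": k, "factor": factor, "remainder": remainder}


def validate(n, res):
    """Host-side check, skipped when the kernel flagged zero surviving candidates."""
    if res["datalen"] == 0:
        return "skipped"
    expected = pow(2, 2 ** (n - 2), res["factor"])
    return "ok" if expected == res["remainder"] else "ERROR: Exponentiation failure"


if __name__ == "__main__":
    for k in (232929, 232905):
        res = toy_kernel(207, k, k)
        print(k, res["datalen"], validate(207, res))
```

With n = 207, the single value k = 232929 is removed by the toy sieve (232929 * 2^207 + 1 is divisible by 13), so validation is skipped. k = 232905 survives, and its remainder equals factor - 1, which is what "the factor divides F205" means.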

With these changes, all 43 known factors within the range of mmff can be verified using the following worktodo.txt file:

Code:
// Check the known Fermat factors within the ranges of mmff
// Ranges supported: 28 <= exp <= 223; 64 bit <= factor size <= 252 bit; K min/max vary with exp
// K min/max < 1000 are interpreted as factor bit size min/max, >= 1000 as K min/max

FermatFactor=36,2e10,3e10        // F28: 25709319373 * 2^36 + 1
FermatFactor=33,546e10,547e10        // F31: 5463561471303 * 2^33 + 1
FermatFactor=39,69,70            // F37: 1275438465 * 2^39 + 1
FermatFactor=41,286492e10,286493e10    // F39: 2864929972774011 * 2^41 + 1
FermatFactor=45,11131e10,11132e10    // F42: 111318179143061 * 2^45 + 1
FermatFactor=45,21e10,22e10        // F43: 212675402445 * 2^45 + 1
FermatFactor=50,213e10,214e10        // F48: 2139543641769 * 2^50 + 1
FermatFactor=54,66,67            // F52: 4119 * 2^54 + 1
FermatFactor=54,78,79            // F52: 21626655 * 2^54 + 1
FermatFactor=54,8190e10,8191e10        // F52: 81909357657279 * 2^54 + 1
FermatFactor=61,67,68            // F58: 95 * 2^61 + 1
FermatFactor=68,121089e10,121090e10    // F65: 1210895760431083 * 2^68 + 1
FermatFactor=74,100,101            // F72: 76432329 * 2^74 + 1
FermatFactor=77,98,99            // F75: 3447431 * 2^77 + 1
FermatFactor=79,5e9,6e9            // F77: 5940341195 * 2^79 + 1
FermatFactor=87,1595e9,1596e9        // F83: 1595863660157 * 2^87 + 1
FermatFactor=88,20018e9,20019e9        // F86: 20018578522347 * 2^88 + 1
FermatFactor=90,119e9,120e9        // F88: 119942751127 * 2^90 + 1
FermatFactor=92,198e9,199e9        // F90: 198922467387 * 2^92 + 1
FermatFactor=93,103,104            // F91: 1421 * 2^93 + 1
FermatFactor=97,482e9,483e9        // F94: 482524552001 * 2^97 + 1
FermatFactor=101,3334e9,3335e9        // F96: 3334131633063 * 2^101 + 1
FermatFactor=111,141,142        // F107: 1289179925 * 2^111 + 1
FermatFactor=120,3e9,4e9        // F116: 3433149787 * 2^120 + 1
FermatFactor=120,1527888e9,1527889e9    // F118: 1527888802614951 * 2^120 + 1
FermatFactor=124,146,147        // F122: 5234775 * 2^124 + 1
FermatFactor=127,129,130        // F125: 5 * 2^127 + 1
FermatFactor=135,1075441e9,1075442e9    // F132: 1075441212722595 * 2^135 + 1
FermatFactor=135,88e9,89e9        // F133: 88075576149 * 2^135 + 1
FermatFactor=145,167,168        // F142: 8152599 * 2^145 + 1
FermatFactor=148,173,174        // F146: 37092477 * 2^148 + 1
FermatFactor=149,160,161        // F147: 3125 * 2^149 + 1
FermatFactor=149,175,176        // F147: 124567335 * 2^149 + 1
FermatFactor=157,167,168        // F150: 1575 * 2^157 + 1
FermatFactor=154,166,167        // F150: 5439 * 2^154 + 1
FermatFactor=167,197,198        // F164: 1835601567 * 2^167 + 1
FermatFactor=171,2674e9,2675e9        // F166: 2674670937447 * 2^171 + 1
FermatFactor=174,20e9,21e9        // F172: 20569603303 * 2^174 + 1
FermatFactor=180,3e8,4e8        // F178: 313047661 * 2^180 + 1
FermatFactor=187,213,214        // F184: 117012935 * 2^187 + 1
FermatFactor=197,48594e9,48596e9    // F195: 48595346636925 * 2^197 + 1
FermatFactor=207,224,225        // F205: 232905 * 2^207 + 1
FermatFactor=217,231,232        // F215: 32111 * 2^217 + 1
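
How the two numbers after the exponent map to a K range can be checked with a few lines of Python. This is a sketch based only on the comment at the top of the worktodo file and the k ranges that mmff prints; `factor_bits` and `k_range` are hypothetical helper names, and mixed input (one value below 1000, one above) is not handled.

```python
def factor_bits(k, n):
    """Bit length of the candidate factor k*2^n + 1."""
    return (k * 2**n + 1).bit_length()


def k_range(n, lo, hi):
    """Translate the two numbers after the exponent into an inclusive K range.

    Assumption: both values are plain integers. Values below 1000 are read as
    factor bit sizes, anything else is taken as K directly.
    """
    if lo < 1000 and hi < 1000:
        if lo < n:
            raise ValueError("bit size must not be smaller than the exponent")
        # Factors with more than `lo` and at most `hi` bits: 2^lo <= k*2^n < 2^hi.
        return 2 ** (lo - n), 2 ** (hi - n) - 1
    return lo, hi


if __name__ == "__main__":
    print(k_range(207, 224, 225))    # matches "k range: 131072 to 262143"
    print(factor_bits(232905, 207))  # 225
```

This also shows why a K below 1000, such as 95 for F58 or 5 for F125, has to be requested through its bit range: the value itself would be read as a bit size.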
Here is source with these changes and a CUDA 10.1 Linux binary that will hopefully run on Kepler or later (--gpu-architecture=compute_30). I included Serge's patch to print factors found in K*2^N+1 form. If you want factors in the old format, use output.c from the 0.28 release. I also fixed a few other misc things, and changed the version to 0.28.1 to identify this binary. I am not sure who the current owner of mmff is, but if I changed anything in a "bad" way please feel free to fix it and re-post.
Attached Files mmff-0.28.1.zip (569.9 KB, 133 views)

2019-12-09, 04:40 #358 Dylan14   "Dylan" Mar 2017 2·293 Posts
@Gary: The original v0.28 version was posted by Serge (https://mersenneforum.org/showpost.p...&postcount=317), so I would presume he is the current maintainer. Do note, it has been 5 years since that has been posted.

2019-12-09, 06:09 #359 Prime95 P90 years forever!   Aug 2002 Yeehaw, FL 1DD0₁₆ Posts
@Gary: I think it is a case of "you touch it, you own it". Congratulations.

2020-01-23, 07:04   #360
Fan Ming

Oct 2019

5×19 Posts Thanks for clues provided by Andreas!
The class problems and exp failure problems are indeed solved for mmff now. I am posting the source code here because I also made some other minor changes and there are still some problems with the Windows binary.
The attached file contains a CUDA 10.1 binary compiled for 64-bit Linux and the source code. The code is based on version 0.28, and the compiled binary can be used on Google Colab.
Note that I made the tf changes that fix the class problems before I saw the source files posted by Gary (I still haven't checked them), so let me know if I made any flaky/bad changes.

Minor changes:
(1) Fixed the class problems & exp failure caused by tf validation by setting RES[RESULTS_ARRAY_VALIDATION_OFFSET] = 0 and not copying the other values if no candidate survives. If RES[RESULTS_ARRAY_VALIDATION_OFFSET] == 0, the validate function is simply not called. I think the "ERROR: Exponentiation failure" message is somewhat unclear, so I changed it to: "ERROR: Verifying on CPU failed. Remainder didn't match. Possible problems exist." Please let me know if my understanding is incorrect.
(2) Replaced all deprecated cudaThreadSynchronize() calls with cudaDeviceSynchronize() calls in case the former are not supported in the future.
(3) In gpusieve.cu, the launch bounds for many functions are:
Code:
__global__ static void __launch_bounds__(256,6) blablabla....
However, the maximum number of threads per streaming multiprocessor on Turing cards (CC 7.5) is 1024, instead of the 2048 of the previous cards. Since the second parameter is a lower bound on the number of resident blocks per multiprocessor, 256 × 6 = 1536 threads is more than a Turing multiprocessor can hold, so NVCC ignores the second parameter when compiling for the CC 7.5 architecture. I don't know whether this lower-bound setting is necessary, but I still changed all these launch bounds settings to:
Code:
#if __CUDA_ARCH__ < 750
__global__ static void __launch_bounds__(256,6) blablabla...
#else
__global__ static void __launch_bounds__(256,3) blablabla...
#endif
Let me know if this change is incorrect.
(4) Minor fixes for format-reading problems.
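
The arithmetic behind change (3) can be sketched as follows. This is plain Python, not CUDA, and the helper names are hypothetical. The per-multiprocessor thread limits are the ones quoted in the post (1024 for CC 7.5, 2048 for the earlier cards). Registers and shared memory can lower the real number of resident blocks further and are ignored here.

```python
def threads_requested(threads_per_block, min_blocks_per_sm):
    """Threads that must be resident at once to honour __launch_bounds__(T, B)."""
    return threads_per_block * min_blocks_per_sm


def bound_fits(threads_per_block, min_blocks_per_sm, max_threads_per_sm):
    """True if the requested minimum block count fits the multiprocessor's thread limit."""
    return threads_requested(threads_per_block, min_blocks_per_sm) <= max_threads_per_sm


def largest_min_blocks(threads_per_block, max_threads_per_sm):
    """Largest second parameter that still fits, counting threads only."""
    return max_threads_per_sm // threads_per_block


if __name__ == "__main__":
    print(bound_fits(256, 6, 2048))       # earlier cards: 1536 <= 2048
    print(bound_fits(256, 6, 1024))       # Turing: 1536 > 1024
    print(bound_fits(256, 3, 1024))       # the changed setting: 768 <= 1024
    print(largest_min_blocks(256, 1024))
```

Counting threads alone, a value of 4 would also fit on Turing; the value 3 chosen in the post is simply a more conservative setting.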

The compiled binary for linux passed all 41 test cases provided by ATH:
Code:
FermatFactor=36,2e10,3e10		# F28: 25709319373 * 2^36 + 1
FermatFactor=33,546e10,547e10		# F31: 5463561471303 * 2^33 + 1
FermatFactor=39,69,70			# F37: 1275438465 * 2^39 + 1
FermatFactor=41,286492e10,286493e10	# F39: 2864929972774011 * 2^41 + 1
FermatFactor=45,11131e10,11132e10	# F42: 111318179143061 * 2^45 + 1
FermatFactor=45,21e10,22e10		# F43: 212675402445 * 2^45 + 1
FermatFactor=50,213e10,214e10		# F48: 2139543641769 * 2^50 + 1
FermatFactor=54,66,67			# F52: 4119 * 2^54 + 1
FermatFactor=54,78,79			# F52: 21626655 * 2^54 + 1
FermatFactor=54,8190e10,8191e10		# F52: 81909357657279 * 2^54 + 1
FermatFactor=61,67,68			# F58: 95 * 2^61 + 1
FermatFactor=68,121089e10,121090e10	# F65: 1210895760431083 * 2^68 + 1
FermatFactor=74,100,101			# F72: 76432329 * 2^74 + 1
FermatFactor=77,98,99			# F75: 3447431 * 2^77 + 1
FermatFactor=79,5e9,6e9			# F77: 5940341195 * 2^79 + 1
FermatFactor=87,1595e9,1596e9		# F83: 1595863660157 * 2^87 + 1
FermatFactor=88,20018e9,20019e9		# F86: 20018578522347 * 2^88 + 1
FermatFactor=90,119e9,120e9		# F88: 119942751127 * 2^90 + 1
FermatFactor=92,198e9,199e9		# F90: 198922467387 * 2^92 + 1
FermatFactor=93,103,104			# F91: 1421 * 2^93 + 1
FermatFactor=97,482e9,483e9		# F94: 482524552001 * 2^97 + 1
FermatFactor=101,3334e9,3335e9		# F96: 3334131633063 * 2^101 + 1
FermatFactor=111,141,142		# F107: 1289179925 * 2^111 + 1
FermatFactor=120,3e9,4e9		# F116: 3433149787 * 2^120 + 1
FermatFactor=124,146,147		# F122: 5234775 * 2^124 + 1
FermatFactor=127,129,130		# F125: 5 * 2^127 + 1
FermatFactor=135,88e9,89e9		# F133: 88075576149 * 2^135 + 1
FermatFactor=145,167,168		# F142: 8152599 * 2^145 + 1
FermatFactor=148,173,174		# F146: 37092477 * 2^148 + 1
FermatFactor=149,160,161		# F147: 3125 * 2^149 + 1
FermatFactor=149,175,176		# F147: 124567335 * 2^149 + 1
FermatFactor=154,166,167		# F150: 5439 * 2^154 + 1
FermatFactor=157,167,168		# F150: 1575 * 2^157 + 1
FermatFactor=167,197,198		# F164: 1835601567 * 2^167 + 1
FermatFactor=171,2674e9,2675e9		# F166: 2674670937447 * 2^171 + 1
FermatFactor=174,20e9,21e9		# F172: 20569603303 * 2^174 + 1
FermatFactor=180,3e8,4e8		# F178: 313047661 * 2^180 + 1
FermatFactor=187,213,214		# F184: 117012935 * 2^187 + 1
FermatFactor=197,48594e9,48596e9	# F195:	48595346636925 * 2^197 + 1
FermatFactor=207,224,227		# F205:	232905 * 2^207 + 1
FermatFactor=217,231,232		# F215: 32111 * 2^217 + 1
Result:
Code:
F28 has a factor: 1766730974551267606529 [TF:70:71*:mmff 0.28 mfaktc_barrett89_F32_63gs]
found 1 factor for k*2^36+1 in k range: 20G to 30G (71-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs]
F31 has a factor: 46931635677864055013377 [TF:75:76*:mmff 0.28 mfaktc_barrett89_F32_63gs]
found 1 factor for k*2^33+1 in k range: 5460G to 5470G (76-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs]
F37 has a factor: 701179711390136401921 [TF:69:70*:mmff 0.28 mfaktc_barrett89_F32_63gs]
found 1 factor for k*2^39+1 in k range: 1073741824 to 2147483647 (70-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs]
F39 has a factor: 6300047635658008393597059073 [TF:92:93*:mmff 0.28 mfaktc_barrett96_F32_63gs]
found 1 factor for k*2^41+1 in k range: 2864920G to 2864930G (93-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett96_F32_63gs]
F42 has a factor: 3916660235220715932328394753 [TF:91:92*:mmff 0.28 mfaktc_barrett96_F32_63gs]
found 1 factor for k*2^45+1 in k range: 111310G to 111320G (92-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett96_F32_63gs]
F43 has a factor: 7482850493766970889994241 [TF:82:83*:mmff 0.28 mfaktc_barrett89_F32_63gs]
found 1 factor for k*2^45+1 in k range: 210G to 220G (83-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs]
F48 has a factor: 2408911986953445595315961857 [TF:90:91*:mmff 0.28 mfaktc_barrett96_F32_63gs]
found 1 factor for k*2^50+1 in k range: 2130G to 2140G (91-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett96_F32_63gs]
F52 has a factor: 74201307460556292097 [TF:66:67*:mmff 0.28 mfaktc_barrett89_F32_63gs]
found 1 factor for k*2^54+1 in k range: 4096 to 8191 (67-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs]
F52 has a factor: 389591181597081096683521 [TF:78:79*:mmff 0.28 mfaktc_barrett89_F32_63gs]
found 1 factor for k*2^54+1 in k range: 16777216 to 33554431 (79-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs]
F52 has a factor: 1475547810493913550438096961537 [TF:100:101*:mmff 0.28 mfaktc_barrett108_F32_63gs]
found 1 factor for k*2^54+1 in k range: 81900G to 81910G (101-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett108_F32_63gs]
F58 has a factor: 219055085875300925441 [TF:67:68*:mmff 0.28 mfaktc_barrett89_F32_63gs]
found 1 factor for k*2^61+1 in k range: 64 to 127 (68-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs]
F65 has a factor: 357393347081793620781479724788482049 [TF:118:119*:mmff 0.28 mfaktc_barrett120_F64_95gs]
found 1 factor for k*2^68+1 in k range: 1210890G to 1210900G (119-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett120_F64_95gs]
F72 has a factor: 1443765874709062348345951911937 [TF:100:101*:mmff 0.28 mfaktc_barrett108_F64_95gs]
found 1 factor for k*2^74+1 in k range: 67108864 to 134217727 (101-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett108_F64_95gs]
F75 has a factor: 520961043404985083798310879233 [TF:98:99*:mmff 0.28 mfaktc_barrett108_F64_95gs]
found 1 factor for k*2^77+1 in k range: 2097152 to 4194303 (99-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett108_F64_95gs]
F77 has a factor: 3590715923977960355577974656860161 [TF:111:112*:mmff 0.28 mfaktc_barrett120_F64_95gs]
found 1 factor for k*2^79+1 in k range: 5G to 6G (112-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett120_F64_95gs]
F83 has a factor: 246947940268608417020015902258307792897 [TF:127:128*:mmff 0.28 mfaktc_barrett128_F64_95gs]
found 1 factor for k*2^87+1 in k range: 1595G to 1596G (128-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett128_F64_95gs]
F86 has a factor: 6195449970597928748332522715641578258433 [TF:132:133*:mmff 0.28 mfaktc_barrett140_F64_95gs]
found 1 factor for k*2^88+1 in k range: 20018G to 20019G (133-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett140_F64_95gs]
F88 has a factor: 148481934042154969241780501829489000449 [TF:126:127*:mmff 0.28 mfaktc_barrett128_F64_95gs]
found 1 factor for k*2^90+1 in k range: 119G to 120G (127-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett128_F64_95gs]
F90 has a factor: 985016348367230226078056532654006730753 [TF:129:130*:mmff 0.28 mfaktc_barrett140_F64_95gs]
found 1 factor for k*2^92+1 in k range: 198G to 199G (130-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett140_F64_95gs]
F91 has a factor: 14072902366596202965053244178433 [TF:103:104*:mmff 0.28 mfaktc_barrett108_F64_95gs]
found 1 factor for k*2^93+1 in k range: 1024 to 2047 (104-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett108_F64_95gs]
F94 has a factor: 76459067246115642538831634131564386844673 [TF:135:136*:mmff 0.28 mfaktc_barrett140_F96_127gs]
found 1 factor for k*2^97+1 in k range: 482G to 483G (136-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett140_F96_127gs]
F96 has a factor: 8453027931784477309850388309101819121893377 [TF:142:143*:mmff 0.28 mfaktc_barrett152_F96_127gs]
found 1 factor for k*2^101+1 in k range: 3334G to 3335G (143-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett152_F96_127gs]
F107 has a factor: 3346902437331832346018436558958369334886401 [TF:141:142*:mmff 0.28 mfaktc_barrett152_F96_127gs]
found 1 factor for k*2^111+1 in k range: 1073741824 to 2147483647 (142-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett152_F96_127gs]
F116 has a factor: 4563438810603420826872624280490561141381005313 [TF:151:152*:mmff 0.28 mfaktc_barrett152_F96_127gs]
found 1 factor for k*2^120+1 in k range: 3G to 4G (152-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett152_F96_127gs]
F122 has a factor: 111331351706159727817280425663664652445286401 [TF:146:147*:mmff 0.28 mfaktc_barrett152_F96_127gs]
found 1 factor for k*2^124+1 in k range: 4194304 to 8388607 (147-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett152_F96_127gs]
F125 has a factor: 850705917302346158658436518579420528641 [TF:129:130*:mmff 0.28 mfaktc_barrett140_F96_127gs]
found 1 factor for k*2^127+1 in k range: 4 to 7 (130-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett140_F96_127gs]
F133 has a factor: 3836232386548105510567872577199319351015739156856833 [TF:171:172*:mmff 0.28 mfaktc_barrett172_F128_159gs]
found 1 factor for k*2^135+1 in k range: 88G to 89G (172-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett172_F128_159gs]
F142 has a factor: 363618066009591119386121910507749518730588867002369 [TF:167:168*:mmff 0.28 mfaktc_barrett172_F128_159gs]
found 1 factor for k*2^145+1 in k range: 4194304 to 8388607 (168-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett172_F128_159gs]
F146 has a factor: 13235038053749721162769301995307025251972223086886913 [TF:173:174*:mmff 0.28 mfaktc_barrett183_F128_159gs]
found 1 factor for k*2^148+1 in k range: 33554432 to 67108863 (174-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett183_F128_159gs]
F147 has a factor: 2230074519853062314153571827264836150598041600001 [TF:160:161*:mmff 0.28 mfaktc_barrett172_F128_159gs]
found 1 factor for k*2^149+1 in k range: 2048 to 4095 (161-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett172_F128_159gs]
F147 has a factor: 88894220732640180500173831441107513117330143465963521 [TF:175:176*:mmff 0.28 mfaktc_barrett183_F128_159gs]
found 1 factor for k*2^149+1 in k range: 67108864 to 134217727 (176-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett183_F128_159gs]
F150 has a factor: 124204803210043452689216278205372864748572142206977 [TF:166:167*:mmff 0.28 mfaktc_barrett172_F128_159gs]
found 1 factor for k*2^154+1 in k range: 4096 to 8191 (167-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett172_F128_159gs]
F150 has a factor: 287733134849521512021350451441018219494761719398401 [TF:167:168*:mmff 0.28 mfaktc_barrett172_F128_159gs]
found 1 factor for k*2^157+1 in k range: 1024 to 2047 (168-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett172_F128_159gs]
F164 has a factor: 343390041044181900054983258125842173093877961821829176754177 [TF:197:198*:mmff 0.28 mfaktc_barrett204_F160_191gs]
found 1 factor for k*2^167+1 in k range: 1073741824 to 2147483647 (198-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett204_F160_191gs]
F166 has a factor: 8005705634611551271269985633916919970948098093294822472135213057 [TF:212:213*:mmff 0.28 mfaktc_barrett215_F160_191gs]
found 1 factor for k*2^171+1 in k range: 2674G to 2675G (213-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett215_F160_191gs]
F172 has a factor: 492544145925433733451855533863925475950550777193174123310743553 [TF:208:209*:mmff 0.28 mfaktc_barrett215_F160_191gs]
found 1 factor for k*2^174+1 in k range: 20G to 21G (209-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett215_F160_191gs]
F178 has a factor: 479744144560996421795040836675707785358665797968769873751310337 [TF:208:209*:mmff 0.28 mfaktc_barrett215_F160_191gs]
found 1 factor for k*2^180+1 in k range: 300M to 400M (209-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett215_F160_191gs]
F184 has a factor: 22953190542224652377639611826608942557783370967811443134226759681 [TF:213:214*:mmff 0.28 mfaktc_barrett215_F160_191gs]
found 1 factor for k*2^187+1 in k range: 67108864 to 134217727 (214-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett215_F160_191gs]
F195 has a factor: 9761213910603494986281795830720869047027739722070601061612088452553113601 [TF:242:243*:mmff 0.28 mfaktc_barrett247_F192_223gs]
found 1 factor for k*2^197+1 in k range: 48594G to 48596G (243-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett247_F192_223gs]
F205 has a factor: 47905779865361936656012887182939964920375512098173614759150973091841 [TF:224:225*:mmff 0.28 mfaktc_barrett236_F192_223gs]
found 1 factor for k*2^207+1 in k range: 131072 to 262143 (225-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett236_F192_223gs]
F215 has a factor: 6763365995538079644113691573900682504384080816814065022974359599316993 [TF:231:232*:mmff 0.28 mfaktc_barrett236_F192_223gs]
found 1 factor for k*2^217+1 in k range: 16384 to 32767 (232-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett236_F192_223gs]
Some double Mersenne test cases:
Code:
MMFactor=31,64,65
MMFactor=61,549e9,550e9
MMFactor=31,56e9,57e9
MMFactor=31,54e9,55e9
MMFactor=31,414.5e11,415e11
MMFactor=31,414e11,415e11
MMFactor=31,416e11,417e11
The results are as expected without problems:
Code:
no factor for MM31 in k range: 4294967298 to 8589934595 (65-bit factors) [mmff 0.28 mfaktc_barrett89_M31gs]
no factor for MM61 in k range: 549000000000 to 549755813887 (101-bit factors) [mmff 0.28 mfaktc_barrett108_M61gs]
no factor for MM61 in k range: 549755813888 to 550000000000 (102-bit factors) [mmff 0.28 mfaktc_barrett108_M61gs]
MM31 has a factor: 242557615644693265201 [TF:67:68*:mmff 0.28 mfaktc_barrett89_M31gs]
found 1 factor for MM31 in k range: 56G to 57G (68-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_M31gs]
no factor for MM31 in k range: 54G to 55G (68-bit factors) [mmff 0.28 mfaktc_barrett89_M31gs]
no factor for MM31 in k range: 41450G to 41500G (78-bit factors) [mmff 0.28 mfaktc_barrett89_M31gs]
MM31 has a factor: 178021379228511215367151 [TF:77:78*:mmff 0.28 mfaktc_barrett89_M31gs]
found 1 factor for MM31 in k range: 41400G to 41500G (78-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_M31gs]
no factor for MM31 in k range: 41600G to 41700G (78-bit factors) [mmff 0.28 mfaktc_barrett89_M31gs]
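The two MM31 factors reported above can be checked independently with a few lines of Python. This is only a sketch: it assumes the usual form q = 2*k*M_p + 1 with M_p = 2^p - 1, and the function names `k_of` and `divides_mm` are hypothetical helpers, not anything from mmff.

```python
def k_of(q, p):
    """Return k if q = 2*k*M_p + 1 with M_p = 2^p - 1, otherwise None."""
    k, rem = divmod(q - 1, 2 * (2**p - 1))
    return k if rem == 0 else None


def divides_mm(q, p):
    """Return True if q divides MM_p = 2^(M_p) - 1, where M_p = 2^p - 1."""
    return pow(2, 2**p - 1, q) == 1


# Factors copied from the result lines above.
for q in (242557615644693265201, 178021379228511215367151):
    print(q, "k =", k_of(q, 31), "divides MM31:", divides_mm(q, 31))
```

The recovered k also shows why the 41450G to 41500G run reports no factor while the 41400G to 41500G run finds one: the k of the second factor lies below 41450G.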
However, the build compiled for Windows with Visual Studio 2019 still failed to run (this time not because of the class problems):
Code:
mmff v0.28 (64bit built)

Compiletime options
MORE_CLASSES              enabled

Runtime options
GPU Sieving               enabled
WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
GPUSievePrimes            depends on worktodo entry
GPUSieveSize              16M bits
WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8)
GPUSieveProcessSize       8K bits
WorkFile                  worktodo.txt
Checkpoints               enabled
CheckpointDelay           30s
StopAfterFactor           class
PrintMode                 full
V5UserID                  (none)
ComputerID                (none)
WARNING, no ProgressFormat specified in mmff.ini, using default
TimeStampInResults        no

CUDA version info
binary compiled for CUDA  10.10
CUDA runtime version      10.10
CUDA driver version       10.20

CUDA device info
name                      GeForce GTX 1660
compute capability        7.5
number of mutliprocessors 22 (unknown number of shader cores)
clock rate                1800MHz

got assignment: MM127, k range 116500000000000000 to 117000000000000000 (185-bit factors)
Starting trial factoring of MM127 in k range: 116500T to 117000T (185-bit factors)
k_min = 116500000000000000
k_max = 117000000000000000
Using GPU kernel "mfaktc_barrett185_M127gs"
class | candidates |    time |    ETA | raw  rate | SievePrimes | CPU wait
5/4620 |    108.23G | 11.263s |  3h00m | 9608.91M/s |      810549
ERROR: cudaGetLastError() returned 98: invalid device function
The "invalid device function" error usually means that a kernel was not compiled for the GPU's compute capability (CC), or that the kernel does not exist in the binary at all.
I tried to query the attributes of the target kernel, but that call also returned an error. So I think the cause is not a wrong CC architecture; the kernels simply do not seem to exist.
I wrote a test kernel and it hit the same problem. The program did not recognize it and behaved as if it did not exist, so it was never executed.
I don't know what goes wrong with the MSVC 2019 compiler so that the program cannot recognize the existence of any kernel, since the file size looks normal. The older Visual Studio 2012 version should work, but I can't use it now because I have already uninstalled it.
Still, something must be wrong, since all newer versions of the MSVC compiler (2017 or later; I don't know about 2013 or 2015) cause this problem.
I really have no idea about that...
To compile the Windows binary with Visual Studio, I followed the procedure from a post by TheJudger somewhere in the forum.
I haven't tested the normal CUDA build process in Visual Studio, since it requires adjusting the header include relationships in many of mmff's source files, which is a little inconvenient.
Attached Files mmff-linux-CUDA10.1-colab.zip (3.04 MB, 211 views)

Last fiddled with by Fan Ming on 2020-01-23 at 07:33   2020-01-26, 08:35   #361
Fan Ming

Oct 2019

1378 Posts Quote:
 Originally Posted by Fan Ming The invalid device function error is usually problems...
It seems this problem can occur on Linux too, on Tesla P100 instances on Google Colab.

Last fiddled with by Fan Ming on 2020-01-26 at 08:36   2020-01-26, 11:40   #362
Fan Ming

Oct 2019

5×19 Posts Quote:
 Originally Posted by Fan Ming It seems this problem can occur at linux too. Tesla P100 instances on Google colab. However, I'm not sure about this, and it's much harder to have P100 assigned on Google colab now. Can anyone confirm this?
I got a P100 instance successfully.
This is not the "invalid device function" error (which can lead to an Exponentiation failure when a kernel fails to execute and the garbage values left in memory happen to satisfy certain conditions), but a real Exponentiation failure error.
For some unknown reason the "-v 3" option did not work on Colab (mmff raised ERROR: can't parse -v option), so I changed the default verbosity level to 3.

I ran it several times, and here is the error output:
Code:
mmff v0.28 (64bit built)

Compiletime options
MORE_CLASSES              enabled

Runtime options
GPU Sieving               enabled
WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
GPUSievePrimes            depends on worktodo entry
GPUSieveSize              128M bits
WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8)
GPUSieveProcessSize       8K bits
WorkFile                  worktodo.txt
Checkpoints               enabled
CheckpointDelay           30s
StopAfterFactor           class
PrintMode                 full
V5UserID                  (none)
ComputerID                (none)
GPUProgressHeader         "    class | candidates |    time |    ETA | raw  rate | SievePrimes | CPU wait"
GPUProgressFormat            "%C/4620 |    %n | %ts | %e | %rM/s |     %s |  %W%%"
TimeStampInResults        no

CUDA version info
binary compiled for CUDA  10.10
CUDA runtime version      10.10
CUDA driver version       10.10

CUDA device info
name                      Tesla P100-PCIE-16GB
compute capability        6.0
number of mutliprocessors 56 (unknown number of shader cores)
clock rate                1328MHz

got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors)
Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors)
k_min = 70368744177664
k_max = 140737488355327
Using GPU kernel "mfaktc_barrett183_M127gs"
Verifying (2^(2^127)) % 23945244016114007668591781862075984047752025015141633 = 4926629721325240139649429581548920523512559095913937
ERROR: Verifying on CPU failed.	Remainder didn't match. Possible problems exist.
Code:
mmff v0.28 (64bit built)

Compiletime options
MORE_CLASSES              enabled

Runtime options
GPU Sieving               enabled
WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
GPUSievePrimes            depends on worktodo entry
GPUSieveSize              128M bits
WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8)
GPUSieveProcessSize       8K bits
WorkFile                  worktodo.txt
Checkpoints               enabled
CheckpointDelay           30s
StopAfterFactor           class
PrintMode                 full
V5UserID                  (none)
ComputerID                (none)
GPUProgressHeader         "    class | candidates |    time |    ETA | raw  rate | SievePrimes | CPU wait"
GPUProgressFormat            "%C/4620 |    %n | %ts | %e | %rM/s |     %s |  %W%%"
TimeStampInResults        no

CUDA version info
binary compiled for CUDA  10.10
CUDA runtime version      10.10
CUDA driver version       10.10

CUDA device info
name                      Tesla P100-PCIE-16GB
compute capability        6.0
number of mutliprocessors 56 (unknown number of shader cores)
clock rate                1328MHz

got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors)
Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors)
k_min = 70368744177664
k_max = 140737488355327
Using GPU kernel "mfaktc_barrett183_M127gs"
Verifying (2^(2^127)) % 23945243918643526487758168387626961494996338526257873 = 11812001209279499151039916953333557370062855661257534
ERROR: Verifying on CPU failed.	Remainder didn't match. Possible problems exist.
Code:
mmff v0.28 (64bit built)

Compiletime options
MORE_CLASSES              enabled

Runtime options
GPU Sieving               enabled
WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
GPUSievePrimes            depends on worktodo entry
GPUSieveSize              128M bits
WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8)
GPUSieveProcessSize       8K bits
WorkFile                  worktodo.txt
Checkpoints               enabled
CheckpointDelay           30s
StopAfterFactor           class
PrintMode                 full
V5UserID                  (none)
ComputerID                (none)
GPUProgressHeader         "    class | candidates |    time |    ETA | raw  rate | SievePrimes | CPU wait"
GPUProgressFormat            "%C/4620 |    %n | %ts | %e | %rM/s |     %s |  %W%%"
TimeStampInResults        no

CUDA version info
binary compiled for CUDA  10.10
CUDA runtime version      10.10
CUDA driver version       10.10

CUDA device info
name                      Tesla P100-PCIE-16GB
compute capability        6.0
number of mutliprocessors 56 (unknown number of shader cores)
clock rate                1328MHz

got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors)
Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors)
k_min = 70368744177664
k_max = 140737488355327
Using GPU kernel "mfaktc_barrett183_M127gs"
Verifying (2^(2^127)) % 23945244016114007668591781862075984047752025015141633 = 4926629721325240139649429581548920523512559095913937
ERROR: Verifying on CPU failed.	Remainder didn't match. Possible problems exist.
Code:
got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors)
Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors)
k_min = 70368744177664
k_max = 140737488355327
Using GPU kernel "mfaktc_barrett183_M127gs"
Verifying (2^(2^127)) % 23945244006681380457543367654871239929743410193636753 = 20826885465921148439067402367610686467153380117365399
ERROR: Verifying on CPU failed.	Remainder didn't match. Possible problems exist.
Code:
got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors)
Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors)
k_min = 70368744177664
k_max = 140737488355327
Using GPU kernel "mfaktc_barrett183_M127gs"
Verifying (2^(2^127)) % 23945243923359840093282375491229333554000645937010313 = 18376582414064778318809558114847430298939300967906033
ERROR: Verifying on CPU failed.	Remainder didn't match. Possible problems exist.
Code:
got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors)
Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors)
k_min = 70368744177664
k_max = 140737488355327
Using GPU kernel "mfaktc_barrett183_M127gs"
Verifying (2^(2^127)) % 23945244006681380457543367654871239929743410193636753 = 20826885465921148439067402367610686467153380117365399
ERROR: Verifying on CPU failed.	Remainder didn't match. Possible problems exist.
Note that the message "ERROR: Verifying on CPU failed. Remainder didn't match. Possible problems exist." is actually "ERROR: Exponentiation failure". I changed the wording of this error; see post #360, which I posted several days ago.

It seems the factor values were all legal values, for example,
23945244016114007668591781862075984047752025015141633,
23945243918643526487758168387626961494996338526257873,
23945243923359840093282375491229333554000645937010313,
23945244006681380457543367654871239929743410193636753
They are all legal 2kp+1 values.
However, all of the remainder values are indeed wrong.
And for the same factor value, the wrong remainder is always the same.
This problem also exists in the previous mmff 0.28 version (the one from before the class problems were solved, so it is not caused by my changes; I haven't checked older versions yet), so there must be a bug somewhere. I don't know why...
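To cross-check these lines outside of mmff, a "Verifying" line can be parsed and the residue recomputed on the CPU. The following is a minimal sketch, assuming the log line means literally what it prints, i.e. 2^(2^n) mod q; how mmff derives the printed value internally is not something I have confirmed. `recheck` and `is_candidate` are hypothetical helper names.

```python
import re

VERIFY_RE = re.compile(r"Verifying \(2\^\(2\^(\d+)\)\) % (\d+) = (\d+)")


def recheck(line):
    """Recompute 2^(2^n) mod q for one 'Verifying' line.

    Returns (q, reported, recomputed), or None if the line has another format.
    """
    m = VERIFY_RE.search(line)
    if m is None:
        return None
    n, q, reported = (int(g) for g in m.groups())
    return q, reported, pow(2, 2**n, q)


def is_candidate(q, p, k_min, k_max):
    """True if q = 2*k*(2^p - 1) + 1 for some k in [k_min, k_max]."""
    k, rem = divmod(q - 1, 2 * (2**p - 1))
    return rem == 0 and k_min <= k <= k_max


# One of the failing lines from the P100 runs above.
line = ("Verifying (2^(2^127)) % 23945244016114007668591781862075984047752025015141633"
        " = 4926629721325240139649429581548920523512559095913937")
q, reported, recomputed = recheck(line)
print("legal 2kp+1 form:", is_candidate(q, 127, 70368744177664, 140737488355327))
print("reported  :", reported)
print("recomputed:", recomputed)
print("match     :", reported == recomputed)
```

Python's three-argument `pow` handles the 2^127 exponent without trouble, so this runs instantly and gives a reference value that does not depend on the GPU.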

Last fiddled with by Fan Ming on 2020-01-26 at 12:35   2020-01-26, 12:17   #363
Fan Ming

Oct 2019

5×19 Posts Using Gary's source, the errors still occurred (I ran it several times):
Code:
mmff v0.28.1 (64bit built)

Compiletime options
MORE_CLASSES              enabled

Runtime options
GPU Sieving               enabled
WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
GPUSievePrimes            depends on worktodo entry
GPUSieveSize              128M bits
WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8)
GPUSieveProcessSize       8K bits
WorkFile                  worktodo.txt
Checkpoints               enabled
CheckpointDelay           30s
StopAfterFactor           class
PrintMode                 full
V5UserID                  (none)
ComputerID                (none)
GPUProgressHeader         "    class | candidates |    time |    ETA | raw  rate | SievePrimes | CPU wait"
WARNING, no ProgressFormat specified in mmff.ini, using default
ProgressFormat            "%C/4620 |    %n | %ts | %e | %rM/s |     %s"
TimeStampInResults        no

CUDA version info
binary compiled for CUDA  10.10
CUDA runtime version      10.10
CUDA driver version       10.10

CUDA device info
name                      Tesla P100-PCIE-16GB
compute capability        6.0
number of mutliprocessors 56 (unknown number of shader cores)
clock rate                1328MHz

got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors)
Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors)
k_min = 70368744177664
k_max = 140737488355327
Using GPU kernel "mfaktc_barrett183_M127gs"
Verifying (2^(2^127)) % 23945243937508780909854996802036449731013568169267633 = 7606706320838621808794870660151320699229326362771323
ERROR: Exponentiation failure
Code:
mmff v0.28.1 (64bit built)

Compiletime options
MORE_CLASSES              enabled

Runtime options
GPU Sieving               enabled
WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
GPUSievePrimes            depends on worktodo entry
GPUSieveSize              128M bits
WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8)
GPUSieveProcessSize       8K bits
WorkFile                  worktodo.txt
Checkpoints               enabled
CheckpointDelay           30s
StopAfterFactor           class
PrintMode                 full
V5UserID                  (none)
ComputerID                (none)
GPUProgressHeader         "    class | candidates |    time |    ETA | raw  rate | SievePrimes | CPU wait"
WARNING, no ProgressFormat specified in mmff.ini, using default
ProgressFormat            "%C/4620 |    %n | %ts | %e | %rM/s |     %s"
TimeStampInResults        no

CUDA version info
binary compiled for CUDA  10.10
CUDA runtime version      10.10
CUDA driver version       10.10

CUDA device info
name                      Tesla P100-PCIE-16GB
compute capability        6.0
number of mutliprocessors 56 (unknown number of shader cores)
clock rate                1328MHz

got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors)
Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors)
k_min = 70368744177664
k_max = 140737488355327
Using GPU kernel "mfaktc_barrett183_M127gs"
Verifying (2^(2^127)) % 23945243937508780909854996802036449731013568169267633 = 7606706320838621808794870660151320699229326362771323
ERROR: Exponentiation failure
Code:
got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors)
Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors)
k_min = 70368744177664
k_max = 140737488355327
Using GPU kernel "mfaktc_barrett183_M127gs"
Verifying (2^(2^127)) % 23945243956374035331951825216445937967030797812277393 = 21049357416014847908393584649762608127534186076535180
ERROR: Exponentiation failure
Code:
got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors)
Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors)
k_min = 70368744177664
k_max = 140737488355327
Using GPU kernel "mfaktc_barrett183_M127gs"
Verifying (2^(2^127)) % 23945243956374035331951825216445937967030797812277393 = 21049357416014847908393584649762608127534186076535180
ERROR: Exponentiation failure
Code:
got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors)
Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors)
k_min = 70368744177664
k_max = 140737488355327
Using GPU kernel "mfaktc_barrett183_M127gs"
Verifying (2^(2^127)) % 23945243956374035331951825216445937967030797812277393 = 21049357416014847908393584649762608127534186076535180
ERROR: Exponentiation failure
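Since the same modulus shows up in several runs, it is easy to check mechanically whether the printed remainder is repeatable. A small Python sketch for that, assuming the log format shown above (the function name `remainders_by_modulus` is my own):

```python
import re
from collections import defaultdict

VERIFY_RE = re.compile(r"Verifying \(2\^\(2\^\d+\)\) % (\d+) = (\d+)")


def remainders_by_modulus(log_text):
    """Map each modulus in the log to the set of remainders printed for it."""
    seen = defaultdict(set)
    for q, r in VERIFY_RE.findall(log_text):
        seen[int(q)].add(int(r))
    return dict(seen)


# Three "Verifying" lines copied from the runs above.
sample = """\
Verifying (2^(2^127)) % 23945243937508780909854996802036449731013568169267633 = 7606706320838621808794870660151320699229326362771323
Verifying (2^(2^127)) % 23945243937508780909854996802036449731013568169267633 = 7606706320838621808794870660151320699229326362771323
Verifying (2^(2^127)) % 23945243956374035331951825216445937967030797812277393 = 21049357416014847908393584649762608127534186076535180
"""

for q, rems in remainders_by_modulus(sample).items():
    status = "repeatable" if len(rems) == 1 else "varies between runs"
    print(q, status)
```

A remainder that is wrong but repeatable points more toward a deterministic arithmetic problem than toward random memory garbage, although that is only my reading of it.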
However, other numbers seem to work correctly (the output is too large to post here; see part 1 of the attached logs.txt).

Some other test cases (too large to post in full, so only a sample is shown here; see part 2 of the attached logs.txt):
Code:
/content/drive/My Drive/mmff-0.28.1
mmff v0.28.1 (64bit built)

Compiletime options
MORE_CLASSES              enabled

Runtime options
GPU Sieving               enabled
WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
GPUSievePrimes            depends on worktodo entry
GPUSieveSize              128M bits
WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8)
GPUSieveProcessSize       8K bits
WorkFile                  worktodo.txt
Checkpoints               enabled
CheckpointDelay           30s
StopAfterFactor           class
PrintMode                 full
V5UserID                  (none)
ComputerID                (none)
GPUProgressHeader         "    class | candidates |    time |    ETA | raw  rate | SievePrimes | CPU wait"
WARNING, no ProgressFormat specified in mmff.ini, using default
ProgressFormat            "%C/4620 |    %n | %ts | %e | %rM/s |     %s"
TimeStampInResults        no

CUDA version info
binary compiled for CUDA  10.10
CUDA runtime version      10.10
CUDA driver version       10.10

CUDA device info
name                      Tesla P100-PCIE-16GB
compute capability        6.0
number of mutliprocessors 56 (unknown number of shader cores)
clock rate                1328MHz

got assignment: MM31, k range 4294967298 to 8589934595 (65-bit factors)
Starting trial factoring of MM31 in k range: 4294967298 to 8589934595 (65-bit factors)
k_min = 4294967298
k_max = 8589934595
Using GPU kernel "mfaktc_barrett89_M31gs"
Verifying (2^(2^31)) % 18455732847550407041 = 18041335883521486051
class | candidates |    time |    ETA | raw  rate | SievePrimes | CPU wait
2/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18455474908994598577 = 13210018195264925476
6/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18450117401151801329 = 7139557165896038944
14/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18450375361182446263 = 9057953753314217069
15/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18455732950629622097 = 988841009176436615
26/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18449601545515020871 = 15586616874725587374
27/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18454185233395425433 = 11040915153769198707
30/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18451883495998061423 = 14953888264990787734
35/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18455236950626641801 = 6124863183292633277
42/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18453351910956141671 = 15189020512858988414
47/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18450018342026132513 = 1741686722528884267
50/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18451208911254996607 = 1117800095954824569
51/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18457876109244557039 = 7096891942730331470
59/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18455594206006156721 = 4096068541967968090
62/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18454899726974586097 = 9373744610566902525
66/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18453887768255610287 = 83107776338795315
71/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18453034547232853423 = 7489924297197587194
75/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18453490977702154097 = 1069386786885538204
86/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18454879987304902873 = 13223985354672910196
90/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18448252547827582999 = 16324629731528481613
99/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18453292640407484471 = 16327008864618939019
class | candidates |    time |    ETA | raw  rate | SievePrimes | CPU wait
107/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18453868093010436473 = 14672522872021511977
110/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18456844509640145767 = 158751210797949868
WARNING: Factor divisible by 293.  Only occasionally should GPU sieve let small factors slip through
111/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18456030969820218169 = 13243929036377631083
114/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18459761428087931279 = 928694628223646081
119/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18451864026911317721 = 12732358327742052958
122/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18456308819844401617 = 414574268522609115
126/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18455495301499310489 = 9211324510596461997
134/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18457836750164274823 = 16899488420486930904
135/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18451705387999346537 = 17081498552114410952
146/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18455812841316257791 = 11683251650899530624
147/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18451427606694639793 = 13103389033054754948
150/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18453590487799388783 = 11499864167607720722
155/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18450733149137905639 = 12859935701888440148
159/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18454027058339922001 = 11659533504391022171
162/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18452677772889675431 = 10744843596348203562
167/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18454900173651184673 = 1859362368207149326
170/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18453610399267763767 = 8701203538784565069
171/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18454324751113003729 = 6213104066269045180
174/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18451685699869270841 = 6495362410738542899
182/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18464305726823282567 = 5979809582257939535
class | candidates |    time |    ETA | raw  rate | SievePrimes | CPU wait
191/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18456805180624634609 = 12467334056667722874
194/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18453848615333758183 = 14209312759826643484
195/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18453451807600432817 = 10388588196196914677
206/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18452122360604117233 = 10349137720694434220
210/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18455614705885050983 = 167229099570259508
215/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18453471719068807801 = 14507868372907181248
222/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18453610639785932231 = 17813826982514021073
227/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18456011629582493287 = 1627973603262381094
231/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Verifying (2^(2^31)) % 18449781019313335249 = 12125103717347465058
234/4620 |      0.93M |  0.001s |   n.a. | 933.89M/s |       90677
Everything seems to work properly. However, once I changed to MM127, the errors occurred again:
Code:
/content/drive/My Drive/mmff-0.28.1
mmff v0.28.1 (64bit built)

Compiletime options
MORE_CLASSES              enabled

Runtime options
GPU Sieving               enabled
WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
GPUSievePrimes            depends on worktodo entry
GPUSieveSize              128M bits
WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8)
GPUSieveProcessSize       8K bits
WorkFile                  worktodo.txt
Checkpoints               enabled
CheckpointDelay           30s
StopAfterFactor           class
PrintMode                 full
V5UserID                  (none)
ComputerID                (none)
GPUProgressHeader         "    class | candidates |    time |    ETA | raw  rate | SievePrimes | CPU wait"
WARNING, no ProgressFormat specified in mmff.ini, using default
ProgressFormat            "%C/4620 |    %n | %ts | %e | %rM/s |     %s"
TimeStampInResults        no

CUDA version info
binary compiled for CUDA  10.10
CUDA runtime version      10.10
CUDA driver version       10.10

CUDA device info
name                      Tesla P100-PCIE-16GB
compute capability        6.0
number of mutliprocessors 56 (unknown number of shader cores)
clock rate                1328MHz

got assignment: MM127, k range 562949953421312 to 1125899906842623 (178-bit factors)
Starting trial factoring of MM127 in k range: 562949953421312 to 1125899906842623 (178-bit factors)
k_min = 562949953421312
k_max = 1125899906842623
Using GPU kernel "mfaktc_barrett183_M127gs"
Verifying (2^(2^127)) % 191561943147467962859727723905659853364042304328803289 = 25168583490388808698318691898045119457541087143113062
ERROR: Exponentiation failure
The other mmff 0.28 versions behave the same way (including the original version with the class problems unsolved and the version I posted). There may be a bug.
Attached Files logs.zip (387.6 KB, 115 views)

Last fiddled with by Fan Ming on 2020-01-26 at 12:58