2020-01-23, 07:04   #360
Fan Ming
 
Oct 2019

5F₁₆ Posts

Thanks to Andreas for the clues!
The class problems and the exponentiation failure problems are indeed solved for mmff now. I am posting the source code here because I also made some other minor changes, and there are still some problems with the Windows binary.
The attached file contains a CUDA 10.1 binary compiled for 64-bit Linux, plus the source code. The code is based on version 0.28, and the compiled binary can be used on Google Colab.
Note that I made the changes to tf that fix the class problems before I saw the source files posted by Gary (I haven't checked them yet), so let me know if I made any flaky/bad changes.

Minor changes:
(1) Fixed the class problems and the exponentiation failure caused by tf validate: RES[RESULTS_ARRAY_VALIDATION_OFFSET] is now set to 0 and no other values are copied if no candidate survives. If RES[RESULTS_ARRAY_VALIDATION_OFFSET] == 0, the validate function is simply not called. I also think the "ERROR: Exponentiation failure" message is somewhat unclear, so I changed it to: "ERROR: Verifying on CPU failed. Remainder didn\'t match. Possible problems exist." Please let me know if my understanding is incorrect.
(2) Replaced all deprecated cudaThreadSynchronize() calls with cudaDeviceSynchronize() calls, in case the old function is no longer supported in the future.
(3) In gpusieve.cu, the launch bounds for many functions are:
Code:
__global__ static void __launch_bounds__(256,6) blablabla....
However, the maximum number of threads per streaming multiprocessor on Turing cards (CC 7.5) is 1024, instead of the 2048 of earlier cards (CC 3.0 to 7.0). The second parameter is a lower bound on the number of resident blocks per multiprocessor, and 256 x 6 = 1536 threads exceeds the Turing limit, so NVCC ignores the second parameter when compiling for the CC 7.5 architecture. I don't know whether this lower bound is necessary, but I still changed all of these launch bounds to:
Code:
#if __CUDA_ARCH__ < 750
__global__ static void __launch_bounds__(256,6) blablabla...
#else
__global__ static void __launch_bounds__(256,3) blablabla...
#endif
Let me know if this change is incorrect.
(4) Minor fixes for format-reading problems.
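To illustrate the arithmetic behind change (3), here is a minimal sketch in plain C. It only models the per-multiprocessor thread limit (register and shared memory usage can lower the real number further), and the helper names are made up for this post; they are not part of mmff or of the CUDA API.

```c
/* Hypothetical helpers, not from mmff: they only model the
   "maximum resident threads per multiprocessor" limit. */

/* Largest number of blocks of threads_per_block threads that can be
   resident on one multiprocessor, looking at the thread limit alone. */
static int max_blocks_by_threads(int max_threads_per_sm, int threads_per_block)
{
    return max_threads_per_sm / threads_per_block;
}

/* Clamp a requested minimum-blocks value (the second __launch_bounds__
   parameter) to what the thread limit allows. */
static int clamp_min_blocks(int requested, int max_threads_per_sm,
                            int threads_per_block)
{
    int limit = max_blocks_by_threads(max_threads_per_sm, threads_per_block);
    return requested < limit ? requested : limit;
}
```

With 256 threads per block, max_blocks_by_threads(2048, 256) is 8, so a request of 6 fits. On CC 7.5, max_blocks_by_threads(1024, 256) is 4, so a request of 6 cannot be honoured, while 3 (or 4) can.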

The compiled binary for Linux passed all 41 test cases provided by ATH:
Code:
FermatFactor=36,2e10,3e10		# F28: 25709319373 * 2^36 + 1
FermatFactor=33,546e10,547e10		# F31: 5463561471303 * 2^33 + 1
FermatFactor=39,69,70			# F37: 1275438465 * 2^39 + 1
FermatFactor=41,286492e10,286493e10	# F39: 2864929972774011 * 2^41 + 1
FermatFactor=45,11131e10,11132e10	# F42: 111318179143061 * 2^45 + 1
FermatFactor=45,21e10,22e10		# F43: 212675402445 * 2^45 + 1
FermatFactor=50,213e10,214e10		# F48: 2139543641769 * 2^50 + 1
FermatFactor=54,66,67			# F52: 4119 * 2^54 + 1
FermatFactor=54,78,79			# F52: 21626655 * 2^54 + 1
FermatFactor=54,8190e10,8191e10		# F52: 81909357657279 * 2^54 + 1
FermatFactor=61,67,68			# F58: 95 * 2^61 + 1
FermatFactor=68,121089e10,121090e10	# F65: 1210895760431083 * 2^68 + 1
FermatFactor=74,100,101			# F72: 76432329 * 2^74 + 1
FermatFactor=77,98,99			# F75: 3447431 * 2^77 + 1
FermatFactor=79,5e9,6e9			# F77: 5940341195 * 2^79 + 1
FermatFactor=87,1595e9,1596e9		# F83: 1595863660157 * 2^87 + 1
FermatFactor=88,20018e9,20019e9		# F86: 20018578522347 * 2^88 + 1
FermatFactor=90,119e9,120e9		# F88: 119942751127 * 2^90 + 1
FermatFactor=92,198e9,199e9		# F90: 198922467387 * 2^92 + 1
FermatFactor=93,103,104			# F91: 1421 * 2^93 + 1
FermatFactor=97,482e9,483e9		# F94: 482524552001 * 2^97 + 1
FermatFactor=101,3334e9,3335e9		# F96: 3334131633063 * 2^101 + 1
FermatFactor=111,141,142		# F107: 1289179925 * 2^111 + 1
FermatFactor=120,3e9,4e9		# F116: 3433149787 * 2^120 + 1
FermatFactor=124,146,147		# F122: 5234775 * 2^124 + 1
FermatFactor=127,129,130		# F125: 5 * 2^127 + 1
FermatFactor=135,88e9,89e9		# F133: 88075576149 * 2^135 + 1
FermatFactor=145,167,168		# F142: 8152599 * 2^145 + 1
FermatFactor=148,173,174		# F146: 37092477 * 2^148 + 1
FermatFactor=149,160,161		# F147: 3125 * 2^149 + 1
FermatFactor=149,175,176		# F147: 124567335 * 2^149 + 1
FermatFactor=154,166,167		# F150: 5439 * 2^154 + 1
FermatFactor=157,167,168		# F150: 1575 * 2^157 + 1
FermatFactor=167,197,198		# F164: 1835601567 * 2^167 + 1
FermatFactor=171,2674e9,2675e9		# F166: 2674670937447 * 2^171 + 1
FermatFactor=174,20e9,21e9		# F172: 20569603303 * 2^174 + 1
FermatFactor=180,3e8,4e8		# F178: 313047661 * 2^180 + 1
FermatFactor=187,213,214		# F184: 117012935 * 2^187 + 1
FermatFactor=197,48594e9,48596e9	# F195:	48595346636925 * 2^197 + 1
FermatFactor=207,224,227		# F205:	232905 * 2^207 + 1
FermatFactor=217,231,232		# F215: 32111 * 2^217 + 1
Result:
Code:
F28 has a factor: 1766730974551267606529 [TF:70:71*:mmff 0.28 mfaktc_barrett89_F32_63gs]
found 1 factor for k*2^36+1 in k range: 20G to 30G (71-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs]
F31 has a factor: 46931635677864055013377 [TF:75:76*:mmff 0.28 mfaktc_barrett89_F32_63gs]
found 1 factor for k*2^33+1 in k range: 5460G to 5470G (76-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs]
F37 has a factor: 701179711390136401921 [TF:69:70*:mmff 0.28 mfaktc_barrett89_F32_63gs]
found 1 factor for k*2^39+1 in k range: 1073741824 to 2147483647 (70-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs]
F39 has a factor: 6300047635658008393597059073 [TF:92:93*:mmff 0.28 mfaktc_barrett96_F32_63gs]
found 1 factor for k*2^41+1 in k range: 2864920G to 2864930G (93-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett96_F32_63gs]
F42 has a factor: 3916660235220715932328394753 [TF:91:92*:mmff 0.28 mfaktc_barrett96_F32_63gs]
found 1 factor for k*2^45+1 in k range: 111310G to 111320G (92-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett96_F32_63gs]
F43 has a factor: 7482850493766970889994241 [TF:82:83*:mmff 0.28 mfaktc_barrett89_F32_63gs]
found 1 factor for k*2^45+1 in k range: 210G to 220G (83-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs]
F48 has a factor: 2408911986953445595315961857 [TF:90:91*:mmff 0.28 mfaktc_barrett96_F32_63gs]
found 1 factor for k*2^50+1 in k range: 2130G to 2140G (91-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett96_F32_63gs]
F52 has a factor: 74201307460556292097 [TF:66:67*:mmff 0.28 mfaktc_barrett89_F32_63gs]
found 1 factor for k*2^54+1 in k range: 4096 to 8191 (67-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs]
F52 has a factor: 389591181597081096683521 [TF:78:79*:mmff 0.28 mfaktc_barrett89_F32_63gs]
found 1 factor for k*2^54+1 in k range: 16777216 to 33554431 (79-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs]
F52 has a factor: 1475547810493913550438096961537 [TF:100:101*:mmff 0.28 mfaktc_barrett108_F32_63gs]
found 1 factor for k*2^54+1 in k range: 81900G to 81910G (101-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett108_F32_63gs]
F58 has a factor: 219055085875300925441 [TF:67:68*:mmff 0.28 mfaktc_barrett89_F32_63gs]
found 1 factor for k*2^61+1 in k range: 64 to 127 (68-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs]
F65 has a factor: 357393347081793620781479724788482049 [TF:118:119*:mmff 0.28 mfaktc_barrett120_F64_95gs]
found 1 factor for k*2^68+1 in k range: 1210890G to 1210900G (119-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett120_F64_95gs]
F72 has a factor: 1443765874709062348345951911937 [TF:100:101*:mmff 0.28 mfaktc_barrett108_F64_95gs]
found 1 factor for k*2^74+1 in k range: 67108864 to 134217727 (101-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett108_F64_95gs]
F75 has a factor: 520961043404985083798310879233 [TF:98:99*:mmff 0.28 mfaktc_barrett108_F64_95gs]
found 1 factor for k*2^77+1 in k range: 2097152 to 4194303 (99-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett108_F64_95gs]
F77 has a factor: 3590715923977960355577974656860161 [TF:111:112*:mmff 0.28 mfaktc_barrett120_F64_95gs]
found 1 factor for k*2^79+1 in k range: 5G to 6G (112-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett120_F64_95gs]
F83 has a factor: 246947940268608417020015902258307792897 [TF:127:128*:mmff 0.28 mfaktc_barrett128_F64_95gs]
found 1 factor for k*2^87+1 in k range: 1595G to 1596G (128-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett128_F64_95gs]
F86 has a factor: 6195449970597928748332522715641578258433 [TF:132:133*:mmff 0.28 mfaktc_barrett140_F64_95gs]
found 1 factor for k*2^88+1 in k range: 20018G to 20019G (133-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett140_F64_95gs]
F88 has a factor: 148481934042154969241780501829489000449 [TF:126:127*:mmff 0.28 mfaktc_barrett128_F64_95gs]
found 1 factor for k*2^90+1 in k range: 119G to 120G (127-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett128_F64_95gs]
F90 has a factor: 985016348367230226078056532654006730753 [TF:129:130*:mmff 0.28 mfaktc_barrett140_F64_95gs]
found 1 factor for k*2^92+1 in k range: 198G to 199G (130-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett140_F64_95gs]
F91 has a factor: 14072902366596202965053244178433 [TF:103:104*:mmff 0.28 mfaktc_barrett108_F64_95gs]
found 1 factor for k*2^93+1 in k range: 1024 to 2047 (104-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett108_F64_95gs]
F94 has a factor: 76459067246115642538831634131564386844673 [TF:135:136*:mmff 0.28 mfaktc_barrett140_F96_127gs]
found 1 factor for k*2^97+1 in k range: 482G to 483G (136-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett140_F96_127gs]
F96 has a factor: 8453027931784477309850388309101819121893377 [TF:142:143*:mmff 0.28 mfaktc_barrett152_F96_127gs]
found 1 factor for k*2^101+1 in k range: 3334G to 3335G (143-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett152_F96_127gs]
F107 has a factor: 3346902437331832346018436558958369334886401 [TF:141:142*:mmff 0.28 mfaktc_barrett152_F96_127gs]
found 1 factor for k*2^111+1 in k range: 1073741824 to 2147483647 (142-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett152_F96_127gs]
F116 has a factor: 4563438810603420826872624280490561141381005313 [TF:151:152*:mmff 0.28 mfaktc_barrett152_F96_127gs]
found 1 factor for k*2^120+1 in k range: 3G to 4G (152-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett152_F96_127gs]
F122 has a factor: 111331351706159727817280425663664652445286401 [TF:146:147*:mmff 0.28 mfaktc_barrett152_F96_127gs]
found 1 factor for k*2^124+1 in k range: 4194304 to 8388607 (147-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett152_F96_127gs]
F125 has a factor: 850705917302346158658436518579420528641 [TF:129:130*:mmff 0.28 mfaktc_barrett140_F96_127gs]
found 1 factor for k*2^127+1 in k range: 4 to 7 (130-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett140_F96_127gs]
F133 has a factor: 3836232386548105510567872577199319351015739156856833 [TF:171:172*:mmff 0.28 mfaktc_barrett172_F128_159gs]
found 1 factor for k*2^135+1 in k range: 88G to 89G (172-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett172_F128_159gs]
F142 has a factor: 363618066009591119386121910507749518730588867002369 [TF:167:168*:mmff 0.28 mfaktc_barrett172_F128_159gs]
found 1 factor for k*2^145+1 in k range: 4194304 to 8388607 (168-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett172_F128_159gs]
F146 has a factor: 13235038053749721162769301995307025251972223086886913 [TF:173:174*:mmff 0.28 mfaktc_barrett183_F128_159gs]
found 1 factor for k*2^148+1 in k range: 33554432 to 67108863 (174-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett183_F128_159gs]
F147 has a factor: 2230074519853062314153571827264836150598041600001 [TF:160:161*:mmff 0.28 mfaktc_barrett172_F128_159gs]
found 1 factor for k*2^149+1 in k range: 2048 to 4095 (161-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett172_F128_159gs]
F147 has a factor: 88894220732640180500173831441107513117330143465963521 [TF:175:176*:mmff 0.28 mfaktc_barrett183_F128_159gs]
found 1 factor for k*2^149+1 in k range: 67108864 to 134217727 (176-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett183_F128_159gs]
F150 has a factor: 124204803210043452689216278205372864748572142206977 [TF:166:167*:mmff 0.28 mfaktc_barrett172_F128_159gs]
found 1 factor for k*2^154+1 in k range: 4096 to 8191 (167-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett172_F128_159gs]
F150 has a factor: 287733134849521512021350451441018219494761719398401 [TF:167:168*:mmff 0.28 mfaktc_barrett172_F128_159gs]
found 1 factor for k*2^157+1 in k range: 1024 to 2047 (168-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett172_F128_159gs]
F164 has a factor: 343390041044181900054983258125842173093877961821829176754177 [TF:197:198*:mmff 0.28 mfaktc_barrett204_F160_191gs]
found 1 factor for k*2^167+1 in k range: 1073741824 to 2147483647 (198-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett204_F160_191gs]
F166 has a factor: 8005705634611551271269985633916919970948098093294822472135213057 [TF:212:213*:mmff 0.28 mfaktc_barrett215_F160_191gs]
found 1 factor for k*2^171+1 in k range: 2674G to 2675G (213-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett215_F160_191gs]
F172 has a factor: 492544145925433733451855533863925475950550777193174123310743553 [TF:208:209*:mmff 0.28 mfaktc_barrett215_F160_191gs]
found 1 factor for k*2^174+1 in k range: 20G to 21G (209-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett215_F160_191gs]
F178 has a factor: 479744144560996421795040836675707785358665797968769873751310337 [TF:208:209*:mmff 0.28 mfaktc_barrett215_F160_191gs]
found 1 factor for k*2^180+1 in k range: 300M to 400M (209-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett215_F160_191gs]
F184 has a factor: 22953190542224652377639611826608942557783370967811443134226759681 [TF:213:214*:mmff 0.28 mfaktc_barrett215_F160_191gs]
found 1 factor for k*2^187+1 in k range: 67108864 to 134217727 (214-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett215_F160_191gs]
F195 has a factor: 9761213910603494986281795830720869047027739722070601061612088452553113601 [TF:242:243*:mmff 0.28 mfaktc_barrett247_F192_223gs]
found 1 factor for k*2^197+1 in k range: 48594G to 48596G (243-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett247_F192_223gs]
F205 has a factor: 47905779865361936656012887182939964920375512098173614759150973091841 [TF:224:225*:mmff 0.28 mfaktc_barrett236_F192_223gs]
found 1 factor for k*2^207+1 in k range: 131072 to 262143 (225-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett236_F192_223gs]
F215 has a factor: 6763365995538079644113691573900682504384080816814065022974359599316993 [TF:231:232*:mmff 0.28 mfaktc_barrett236_F192_223gs]
found 1 factor for k*2^217+1 in k range: 16384 to 32767 (232-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett236_F192_223gs]
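The smaller entries above can be double-checked on the CPU independently of mmff. The following is a minimal sketch, not mmff code: it tests whether p = k*2^n + 1 divides F_m = 2^(2^m) + 1 by squaring 2 modulo p exactly m times. It assumes a compiler with the unsigned __int128 extension (GCC or Clang, not MSVC) and only handles p < 2^127; the function names are my own.

```c
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension (assumption) */

/* (a + b) mod p, for a, b < p < 2^127, so a + b cannot overflow */
static u128 addmod(u128 a, u128 b, u128 p)
{
    u128 s = a + b;
    return s >= p ? s - p : s;
}

/* (a * b) mod p by double-and-add, for a, b < p < 2^127 */
static u128 mulmod(u128 a, u128 b, u128 p)
{
    u128 r = 0;
    while (b) {
        if (b & 1)
            r = addmod(r, a, p);
        a = addmod(a, a, p);
        b >>= 1;
    }
    return r;
}

/* Returns 1 if p = k*2^n + 1 divides F_m = 2^(2^m) + 1, else 0.
   The caller must make sure that k*2^n + 1 < 2^127. */
static int divides_fermat(unsigned m, uint64_t k, unsigned n)
{
    u128 p = ((u128)k << n) + 1;
    u128 x = 2 % p;
    for (unsigned i = 0; i < m; i++)
        x = mulmod(x, x, p);      /* x = 2^(2^(i+1)) mod p */
    return x == p - 1;            /* 2^(2^m) == -1 (mod p) */
}
```

For example, divides_fermat(28, 25709319373ULL, 36) checks the F28 entry and divides_fermat(58, 95, 61) checks the F58 entry. Entries whose factors need 128 bits or more (for example F83) are outside the range of this sketch and would need a real bignum library.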
Some double Mersenne test cases:
Code:
MMFactor=31,64,65
MMFactor=61,549e9,550e9
MMFactor=31,56e9,57e9
MMFactor=31,54e9,55e9
MMFactor=31,414.5e11,415e11
MMFactor=31,414e11,415e11
MMFactor=31,416e11,417e11
The results are as expected without problems:
Code:
no factor for MM31 in k range: 4294967298 to 8589934595 (65-bit factors) [mmff 0.28 mfaktc_barrett89_M31gs]
no factor for MM61 in k range: 549000000000 to 549755813887 (101-bit factors) [mmff 0.28 mfaktc_barrett108_M61gs]
no factor for MM61 in k range: 549755813888 to 550000000000 (102-bit factors) [mmff 0.28 mfaktc_barrett108_M61gs]
MM31 has a factor: 242557615644693265201 [TF:67:68*:mmff 0.28 mfaktc_barrett89_M31gs]
found 1 factor for MM31 in k range: 56G to 57G (68-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_M31gs]
no factor for MM31 in k range: 54G to 55G (68-bit factors) [mmff 0.28 mfaktc_barrett89_M31gs]
no factor for MM31 in k range: 41450G to 41500G (78-bit factors) [mmff 0.28 mfaktc_barrett89_M31gs]
MM31 has a factor: 178021379228511215367151 [TF:77:78*:mmff 0.28 mfaktc_barrett89_M31gs]
found 1 factor for MM31 in k range: 41400G to 41500G (78-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_M31gs]
no factor for MM31 in k range: 41600G to 41700G (78-bit factors) [mmff 0.28 mfaktc_barrett89_M31gs]
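The MM31 hits can be double-checked on the CPU in a similar way. This is again only a sketch under my own assumptions (unsigned __int128 from GCC or Clang, q < 2^127, made-up function names), not code from mmff. A candidate q = 2*k*M_p + 1 divides MM_p = 2^(M_p) - 1 exactly when 2^(M_p) == 1 (mod q). For the 68-bit factor above, (q - 1) / (2 * (2^31 - 1)) gives k = 56474845800, which lies in the tested 56G to 57G range.

```c
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension (assumption) */

/* (a + b) mod q, for a, b < q < 2^127, so a + b cannot overflow */
static u128 addmod(u128 a, u128 b, u128 q)
{
    u128 s = a + b;
    return s >= q ? s - q : s;
}

/* (a * b) mod q by double-and-add, for a, b < q < 2^127 */
static u128 mulmod(u128 a, u128 b, u128 q)
{
    u128 r = 0;
    while (b) {
        if (b & 1)
            r = addmod(r, a, q);
        a = addmod(a, a, q);
        b >>= 1;
    }
    return r;
}

/* Returns 1 if q = 2*k*M_p + 1 divides MM_p = 2^(M_p) - 1, where
   M_p = 2^p - 1. The caller must make sure p < 64 and q < 2^127. */
static int divides_double_mersenne(unsigned p, uint64_t k)
{
    u128 mp = ((u128)1 << p) - 1;
    u128 q  = 2 * (u128)k * mp + 1;
    u128 x  = 2 % q;
    for (unsigned i = 0; i < p; i++)
        x = mulmod(x, x, q);      /* x = 2^(2^(i+1)) mod q */
    /* 2^(2^p) = 2 * 2^(M_p) and q is odd, so 2^(M_p) == 1 (mod q)
       holds exactly when x == 2 (mod q). */
    return x == 2 % q;
}
```

Here divides_double_mersenne(31, 56474845800ULL) corresponds to the factor 242557615644693265201 reported in the 56G to 57G range.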
However, the binary compiled for Windows with Visual Studio 2019 still fails to run (though this time not because of the class problems, etc.):
Code:
mmff v0.28 (64bit built)

Compiletime options
  THREADS_PER_BLOCK         256
  MORE_CLASSES              enabled

Runtime options
  GPU Sieving               enabled
WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
  GPUSievePrimes            depends on worktodo entry
  GPUSieveSize              16M bits
WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8)
  GPUSieveProcessSize       8K bits
  WorkFile                  worktodo.txt
  Checkpoints               enabled
  CheckpointDelay           30s
  StopAfterFactor           class
  PrintMode                 full
  V5UserID                  (none)
  ComputerID                (none)
WARNING, no ProgressFormat specified in mmff.ini, using default
  TimeStampInResults        no

CUDA version info
  binary compiled for CUDA  10.10
  CUDA runtime version      10.10
  CUDA driver version       10.20

CUDA device info
  name                      GeForce GTX 1660
  compute capability        7.5
  maximum threads per block 1024
  number of mutliprocessors 22 (unknown number of shader cores)
  clock rate                1800MHz

got assignment: MM127, k range 116500000000000000 to 117000000000000000 (185-bit factors)
Starting trial factoring of MM127 in k range: 116500T to 117000T (185-bit factors)
 k_min = 116500000000000000
 k_max = 117000000000000000
Using GPU kernel "mfaktc_barrett185_M127gs"
    class | candidates |    time |    ETA | raw  rate | SievePrimes | CPU wait
   5/4620 |    108.23G | 11.263s |  3h00m | 9608.91M/s |      810549
ERROR: cudaGetLastError() returned 98: invalid device function
The "invalid device function" error usually means that a kernel was not compiled for the correct CC architecture or does not exist in the binary.
I tried to query the attributes of the target kernel, but that also returned an error. So it seems the cause is not that the kernels were compiled for the wrong CC architecture, but that they do not exist.
I wrote a test kernel and it raised the same problem. The program fails to recognize it and simply treats it as nonexistent (so it is never executed).
I don't know what goes wrong with the MSVC 2019 compiler so that the program can't recognize the existence of any kernels, since the file sizes are normal. The older Visual Studio 2012 should work, but I haven't tried it since I have already uninstalled it.
However, some problem must exist, since all newer versions of the MSVC compiler (2017 or later; I don't know about 2013 or 2015) can cause it.
I really have no idea about that...
The Windows binary was compiled with Visual Studio following the process described in a post by TheJudger somewhere in the forum.
I haven't tested the normal CUDA build process in Visual Studio, since it requires adjusting the header include relations in many of the mmff source files, which is a little inconvenient.
Attached Files
File Type: zip mmff-linux-CUDA10.1-colab.zip (3.04 MB, 211 views)

Last fiddled with by Fan Ming on 2020-01-23 at 07:33