mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > Operazione Doppi Mersennes

Old 2020-01-26, 12:30   #364
Fan Ming
 
Oct 2019

67 Posts

The same problem occurs for MM107, but MM89 is normal:
Code:
/content/drive/My Drive/mmff-test
mmff v0.28 (64bit built)

Compiletime options
  THREADS_PER_BLOCK         256
  MORE_CLASSES              enabled

Runtime options
  GPU Sieving               enabled
WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
  GPUSievePrimes            depends on worktodo entry
  GPUSieveSize              128M bits
WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8)
  GPUSieveProcessSize       8K bits
  WorkFile                  worktodo.txt
  Checkpoints               enabled
  CheckpointDelay           30s
  StopAfterFactor           class
  PrintMode                 full
  V5UserID                  (none)
  ComputerID                (none)
  GPUProgressHeader         "    class | candidates |    time |    ETA | raw  rate | SievePrimes | CPU wait"
  GPUProgressFormat            "%C/4620 |    %n | %ts | %e | %rM/s |     %s |  %W%%"
  TimeStampInResults        no

CUDA version info
  binary compiled for CUDA  10.10
  CUDA runtime version      10.10
  CUDA driver version       10.10

CUDA device info
  name                      Tesla P100-PCIE-16GB
  compute capability        6.0
  maximum threads per block 1024
  number of mutliprocessors 56 (unknown number of shader cores)
  clock rate                1328MHz

got assignment: MM107, k range 41400000000000 to 41500000000000 (154-bit factors)
Starting trial factoring of MM107 in k range: 41400G to 41500G (154-bit factors)
 k_min = 41400000000000
 k_max = 41500000000000
Using GPU kernel "mfaktc_barrett160_M107gs"
Verifying (2^(2^107)) % 13435069371854815219033511685499715361952762321 = 974520303404695347505301237807931102140431668099
ERROR: Verifying on CPU failed.	Remainder didn't match. Possible problems exist.
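The "Verifying" lines can be reproduced outside mmff with Python's arbitrary-precision three-argument pow(), a sketch assuming only that the printed line means (2^(2^p)) % q. Notably, the "remainder" printed in the failing MM107 line above has 48 digits while the modulus has 47, so it cannot be a properly reduced residue.

```python
# Reproducing mmff's "Verifying" lines with Python's built-in three-argument
# pow(), which performs fast modular exponentiation on big integers.
# Values are copied from the logs in this post.

# MM89 candidate, where mmff's own CPU verification passes (see log below):
q89 = 51250722476366711691515168579592911982721
r89 = 37671549122511752130292866601915335328068
print(pow(2, 2**89, q89) == r89)   # the log shows this check passing

# MM107 candidate, where CPU verification failed. The printed "remainder"
# is larger than the modulus itself, so the GPU result was not reduced mod q.
q107 = 13435069371854815219033511685499715361952762321
r107 = 974520303404695347505301237807931102140431668099
print(r107 > q107)                 # True: larger than the modulus
```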
MM89 works properly:
Code:
/content/drive/My Drive/mmff-test
mmff v0.28 (64bit built)

Compiletime options
  THREADS_PER_BLOCK         256
  MORE_CLASSES              enabled

Runtime options
  GPU Sieving               enabled
WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
  GPUSievePrimes            depends on worktodo entry
  GPUSieveSize              128M bits
WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8)
  GPUSieveProcessSize       8K bits
  WorkFile                  worktodo.txt
  Checkpoints               enabled
  CheckpointDelay           30s
  StopAfterFactor           class
  PrintMode                 full
  V5UserID                  (none)
  ComputerID                (none)
  GPUProgressHeader         "    class | candidates |    time |    ETA | raw  rate | SievePrimes | CPU wait"
  GPUProgressFormat            "%C/4620 |    %n | %ts | %e | %rM/s |     %s |  %W%%"
  TimeStampInResults        no

CUDA version info
  binary compiled for CUDA  10.10
  CUDA runtime version      10.10
  CUDA driver version       10.10

CUDA device info
  name                      Tesla P100-PCIE-16GB
  compute capability        6.0
  maximum threads per block 1024
  number of mutliprocessors 56 (unknown number of shader cores)
  clock rate                1328MHz

got assignment: MM89, k range 41400000000000 to 41500000000000 (136-bit factors)
Starting trial factoring of MM89 in k range: 41400G to 41500G (136-bit factors)
 k_min = 41400000000000
 k_max = 41500000000000
Using GPU kernel "mfaktc_barrett140_M89gs"
Verifying (2^(2^89)) % 51250722476366711691515168579592911982721 = 37671549122511752130292866601915335328068
    class | candidates |    time |    ETA | raw  rate | SievePrimes | CPU wait
   0/4620 |     21.65M |  0.029s |   n.a. | 746.60M/s |      649781 |   n.a.%
Verifying (2^(2^89)) % 51250720280168236496304157387929107838071 = 35746096159163930640949829473693574340078
   5/4620 |     21.65M |  0.029s |   n.a. | 746.60M/s |      649781 |   n.a.%
Verifying (2^(2^89)) % 51250719954174058311049257317093331713479 = 22759295645343611258946139802672470959760
   9/4620 |     21.65M |  0.029s |   n.a. | 746.60M/s |      649781 |   n.a.%
Verifying (2^(2^89)) % 51250720852115103746739125095447985265401 = 41842644712508723081556126612950349320116
  20/4620 |     21.65M |  0.028s |   n.a. | 773.27M/s |      649781 |   n.a.%
Verifying (2^(2^89)) % 51250721744324486800537682201019693669463 = 13062456361537928045073778273891658192745
  21/4620 |     21.65M |  0.028s |   n.a. | 773.27M/s |      649781 |   n.a.%
Verifying (2^(2^89)) % 51250722110368501136753204925391936624199 = 11766302253559315831356912138896967481965
  29/4620 |     21.65M |  0.028s |   n.a. | 773.27M/s |      649781 |   n.a.%
Verifying (2^(2^89)) % 51250720709149122429788413288172826239287 = 14816860850408810792926573186880149802296
  33/4620 |     21.65M |  0.028s |   n.a. | 773.27M/s |      649781 |   n.a.%
Verifying (2^(2^89)) % 51250721052309815139813681631034757950353 = 41152310359413274585223328516751757168125
  36/4620 |     21.65M |  0.028s |   n.a. | 773.27M/s |      649781 |   n.a.%
Verifying (2^(2^89)) % 51250721263933188975570868864490245452809 = 44183317763900802218115380969512121058940
  44/4620 |     21.65M |  0.027s |   n.a. | 801.91M/s |      649781 |   n.a.%
Verifying (2^(2^89)) % 51250721378323800365697147786268920062497 = 18692344536121868666837048177982998180467
  48/4620 |     21.65M |  0.027s |   n.a. | 801.91M/s |      649781 |   n.a.%
Verifying (2^(2^89)) % 51250721395487839010388945297745277400527 = 3166578919721146857552725561773689514712
  53/4620 |     21.65M |  0.026s |   n.a. | 832.75M/s |      649781 |   n.a.%
Verifying (2^(2^89)) % 51250722287699697944266073163866784053033 = 37430319078903975242289720426417282202568
  56/4620 |     21.65M |  0.027s |   n.a. | 801.91M/s |      649781 |   n.a.%
Verifying (2^(2^89)) % 51250721481285749313140796010178879854681 = 13236591153213340344689881456839734478969
  60/4620 |     21.65M |  0.026s |   n.a. | 832.75M/s |      649781 |   n.a.%
Verifying (2^(2^89)) % 51250721086661413289943698879210555986631 = 29373416315097083858424261053021555658515
  65/4620 |     21.65M |  0.026s |   n.a. | 832.75M/s |      649781 |   n.a.%
Verifying (2^(2^89)) % 51250720960840901517095503879288267435217 = 28658988341202110234172669662839524833844
  68/4620 |     21.65M |  0.026s |   n.a. | 832.75M/s |      649781 |   n.a.%
...
Old 2020-02-25, 20:06   #365
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

32×383 Posts
GPUSieveSize limit

Various builds of mmff v0.28 have been posted. Do any of them support GPUSieveSize from 128 up to 2047, like the recent increase in mfaktc? There seems to be an advantage all the way up to 128, and a bit of underutilization remains even there on a GTX1650; the same likely holds on other fast gpus.

win 7 x64 gtx1650 mmff tune
mm127, 120000T to 120500T

Code:
GPUSievePrimes 810549 GPUSieveSize  16 GpuSieveProcessSize 32 367.75 66W 95% utilization
GPUSievePrimes 810549 GPUSieveSize  32 GpuSieveProcessSize 32 380.41
GPUSievePrimes 810549 GPUSieveSize  64 GpuSieveProcessSize 32 387.10 99%
GPUSievePrimes 810549 GPUSieveSize 128 GpuSieveProcessSize 32 389.59 * 66W 99%
GPUSievePrimes 810549 GPUSieveSize 256 GpuSieveProcessSize 32 GPUSieveSize capped at 128
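The relative gains in the tune above can be read off directly; a small sketch using the raw rates (M/s) from the table:

```python
# Throughput in M/s from the GTX1650 tune above, keyed by GPUSieveSize.
rates = {16: 367.75, 32: 380.41, 64: 387.10, 128: 389.59}

# Gain of each setting relative to the smallest tested sieve size.
for size, rate in rates.items():
    print(f"GPUSieveSize {size:4d}: {rate / rates[16]:.4f}x")
```

So going from 16 to 128 is worth roughly 6% on this card, with most of the gain already captured by 64.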
Old 2020-02-26, 02:02   #366
Fan Ming
 

Quote:
Originally Posted by kriesel
Various builds of mmff v0.28 have been posted. Do any of them support GPUSieveSize from 128 up to 2047, like the recent increase in mfaktc? There seems to be an advantage all the way up to 128, and a bit of underutilization remains even there on a GTX1650; the same likely holds on other fast gpus.
I once tried enlarging the upper limit to 2047, but the speed gain was not significant. I experimented on a Colab T4.
Old 2020-02-26, 13:55   #367
kriesel
 

Quote:
Originally Posted by Fan Ming
I once tried enlarging the upper limit to 2047, but the speed gain was not significant. I experimented on a Colab T4.
Thanks for your response. Please post any T4 throughput data versus GPUSieveSize that you have collected.
After graphing the GTX1650 data I've collected, the higher limit appears to offer about 0.6% additional throughput on that gpu model, or 2 to 2.5 days per year, depending on whether the limit is revised to 2047 or 4095. Based on mfaktc experience, the effect is likely larger on faster gpus, and there are considerably faster gpus than the GTX1650, such as the RTX2080 and similar, or the Tesla T4.
Attached Files
File Type: pdf mmff v0.28 tune for gtx1650.pdf (11.6 KB, 15 views)

Last fiddled with by kriesel on 2020-02-26 at 13:59
Old 2020-02-26, 14:22   #368
Fan Ming
 

Quote:
Originally Posted by kriesel
Thanks for your response. Please post any T4 throughput data versus GPUSieveSize that you have collected.
After graphing the GTX1650 data I've collected, the higher limit appears to offer about 0.6% additional throughput on that gpu model, or 2 to 2.5 days per year, depending on whether the limit is revised to 2047 or 4095. Based on mfaktc experience, the effect is likely larger on faster gpus, and there are considerably faster gpus than the GTX1650, such as the RTX2080 and similar, or the Tesla T4.
Sorry, I didn't keep the detailed data. I tested MM89: the raw rate was about 1340 M/s when GPUSieveSize was 128, and still ~1340 M/s at 2047. Since the change was not significant, I wasn't impressed and didn't keep the data.

Last fiddled with by Fan Ming on 2020-02-26 at 14:23
Old 2020-02-26, 19:23   #369
kriesel
 
2047 GPUSieveSize limit Windows build requested

Please make and post a CUDA 10.x build, compatible with Windows 7 x64 through Windows 10 x64, that allows GPUSieveSize up to 2047. Switching to an unsigned int for 4095 would be more work.
Old 2020-02-27, 09:39   #370
Fan Ming
 

I compiled the fixed mmff 0.28 (from this post: https://www.mersenneforum.org/showpo...&postcount=360), CUDA 10.1 version, for Windows 64-bit using Microsoft Visual Studio 2012. This time all test cases should pass (though the Exp failure problem described in this post: https://www.mersenneforum.org/showpo...&postcount=362 remains unsolved for specific cards). The 2047 version will be posted later.
Attached Files
File Type: zip mmff-win-64.zip (2.18 MB, 13 views)

Last fiddled with by Fan Ming on 2020-02-27 at 09:46
Old 2020-02-27, 09:44   #371
Fan Ming
 

I compiled the fixed mmff 0.28 for Windows 64-bit with the GPUSieveSize maximum enlarged to 2047. It seems some code in gpusieve.cu needs to negate the GPUSieveSize and does arithmetic on signed 32-bit integers, so I didn't make the further change to 4095. Only the 2047 version is here.
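The 2047 cap works out numerically if the sieve size in bits is held in a signed 32-bit integer (an assumption on my part, consistent with the signed arithmetic mentioned above and with the "128M bits" units shown in the logs, though I haven't inspected gpusieve.cu):

```python
# Why 2047 is the natural cap when the sieve size in bits must fit in a
# signed 32-bit int: GPUSieveSize is given in units of 1M bits = 2**20 bits.
INT32_MAX = 2**31 - 1

bits_2047 = 2047 * 2**20   # 2146435072: still fits in a signed 32-bit int
bits_2048 = 2048 * 2**20   # 2147483648 = 2**31: overflows

print(bits_2047 <= INT32_MAX)  # True
print(bits_2048 <= INT32_MAX)  # False
```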
Attached Files
File Type: zip mmff-win-64_2047.zip (2.18 MB, 12 views)
Old 2020-02-28, 22:25   #372
kriesel
 
Going to 2047

Thanks for the builds, Fan Ming!

As before: Win7 x64, GTX1650, etc.
GPUSieveSize 128 to 2047 tune, Feb 28:
Code:
GPUSievePrimes 810549 GPUSieveSize 128 GpuSieveProcessSize 32 384.14 62W/75 99%
GPUSievePrimes 810549 GPUSieveSize 256 GpuSieveProcessSize 32 386.14 66w 100%
GPUSievePrimes 810549 GPUSieveSize 512 GpuSieveProcessSize 32 386.24 65w 100%
GPUSievePrimes 810549 GPUSieveSize 1024 GpuSieveProcessSize 32 386.65 63w 100%
GPUSievePrimes 810549 GPUSieveSize 2047 GpuSieveProcessSize 32 386.66 *
 386.66/384.14= 1.00656 gain from 2047 over 128 GPUSieveSize
I would expect somewhat more gain than that ratio on faster gpus.
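That ratio converts to the "days per year" framing used earlier in this thread; a quick sketch:

```python
# Converting the measured throughput ratio into extra days of work per year,
# matching the ~2 to 2.5 days/year estimate given earlier in the thread.
gain = 386.66 / 384.14          # GPUSieveSize 2047 vs 128, GTX1650
extra_days = (gain - 1) * 365
print(f"{gain:.5f}x -> about {extra_days:.1f} extra days of throughput/year")
```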
Old 2020-03-24, 00:26   #373
kriesel
 
Build request

For mmff v0.28, I see here,
CUDA ? OS? source only? https://mersenneforum.org/showpost.p...&postcount=317
CUDA 6 Win x86 and x64 https://mersenneforum.org/mmff/
CUDA 8.0 linux https://mersenneforum.org/showpost.p...&postcount=329
CUDA 8.0 linux https://mersenneforum.org/showpost.p...&postcount=331
CUDA 8.0 linux x64 https://mersenneforum.org/showpost.p...&postcount=333
CUDA 10. win 64 https://mersenneforum.org/showpost.p...&postcount=335
CUDA 10.1 linux https://mersenneforum.org/showpost.p...&postcount=360
CUDA 10.1 Win https://mersenneforum.org/showpost.p...&postcount=370
CUDA 10.1 GpuSieveSize 2047 max Win https://mersenneforum.org/showpost.p...&postcount=371

Could we also get a CUDA 8.0 Win 64 build with GpuSieveSize 2047 max, posted here? That would suit GTX10xx.