2021-12-03, 22:36  #89
P90 years forever!
Aug 2002
Yeehaw, FL
5·7·227 Posts 
Quote:
We create a polynomial with 120 coefficients that must be evaluated at multiples of D. Montgomery/Silverman/Kruppa show how to evaluate the polynomial at multiple points using polynomial multiplication. The 403 is the number of polynomial coefficients I can allocate for the second polynomial. FFT size and available memory dictate this number. A single polynomial multiply evaluates the first polynomial at 403-2*120+1 = 164 points, thus advancing toward B2 in steps of 1050 * 164 = 172200.
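The arithmetic in the quote can be checked with a few lines of Python. This is only a sketch of the bookkeeping, not of the polymult code itself: it reads the point count as 403 - 2*120 + 1 (which gives the 164 that appears in the step size), and it assumes D = 1050 because that is the factor multiplying 164 in the quote. The function and constant names are made up for illustration.

```python
def points_per_polymult(second_poly_size: int, first_poly_coeffs: int) -> int:
    """Evaluation points gained from one polynomial multiply (as read from the quote)."""
    return second_poly_size - 2 * first_poly_coeffs + 1


def b2_step(d: int, points: int) -> int:
    """Distance advanced toward B2 by one polynomial multiply."""
    return d * points


FIRST_POLY_COEFFS = 120   # from the quote
SECOND_POLY_SIZE = 403    # from the quote
D = 1050                  # assumption: inferred from "1050 * 164" in the quote

points = points_per_polymult(SECOND_POLY_SIZE, FIRST_POLY_COEFFS)
print(points, b2_step(D, points))
```

A larger second polynomial (more memory, bigger FFT) raises the point count directly, which is why available memory matters so much for the step size.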

2021-12-03, 22:39  #90
P90 years forever!
Aug 2002
Yeehaw, FL
5·7·227 Posts 
The number of transforms is only part of the stage 2 cost. The other significant cost is the polynomial multiplies. At present, there is no data output on the number of polymults or how expensive they were.

2021-12-04, 02:24  #91
Jun 2003
2×2,693 Posts 
Quote:
Gotcha. 

2021-12-04, 11:43  #92
"Seth"
Apr 2019
19·23 Posts 
With `MaxHighMemoryWorkers=1`, 30.8v2 will resume two high-memory workers at the same time.
Code:
$ cat worktodo.txt
[Worker #1]
Pminus1=1,2,50111,-1,3000000,1000000000
[Worker #2]
Pminus1=1,2,50227,-1,6000000,10000000000
[Worker #3]
Pminus1=1,2,50263,-1,9000000,100000000000
Code:
five:~/Downloads/GIMPS/p95$ ./mprimev308b2 -m -d
[Main thread Dec 4 03:39] Mersenne number primality test program version 30.8
[Main thread Dec 4 03:39] Optimizing for CPU architecture: AMD Zen, L2 cache size: 12x512 KB, L3 cache size: 4x16 MB
Your choice: 4
Worker to start, 0=all (0): 0
Your choice: [Main thread Dec 4 03:39] Starting workers.
[Worker #2 Dec 4 03:39] Waiting 5 seconds to stagger worker starts.
[Worker #3 Dec 4 03:39] Waiting 10 seconds to stagger worker starts.
[Worker #1 Dec 4 03:39] P-1 on M50111 with B1=3000000, B2=1000000000
[Worker #2 Dec 4 03:39] P-1 on M50227 with B1=6000000, B2=10000000000
[Worker #3 Dec 4 03:39] P-1 on M50263 with B1=9000000, B2=100000000000
[Worker #1 Dec 4 03:39] M50111 stage 1 complete. 8656318 transforms. Total time: 22.501 sec.
[Worker #1 Dec 4 03:39] Conversion of stage 1 result complete. 5 transforms, 1 modular inverse. Time: 0.002 sec.
[Worker #1 Dec 4 03:39] Available memory is 7916MB.
[Worker #1 Dec 4 03:39] Using 7916MB of memory. D: 510510, 46080x279844 polynomial multiplication.
...
[Worker #2 Dec 4 03:40] M50227 stage 1 complete. 17311478 transforms. Total time: 45.504 sec.
[Worker #2 Dec 4 03:40] Exceeded limit on number of workers that can use lots of memory.
[Worker #2 Dec 4 03:40] Looking for work that uses less memory.
[Worker #2 Dec 4 03:40] No work to do at the present time. Waiting.
...
[Worker #3 Dec 4 03:40] M50263 stage 1 complete. 25971112 transforms. Total time: 68.424 sec.
[Worker #3 Dec 4 03:40] Exceeded limit on number of workers that can use lots of memory.
[Worker #3 Dec 4 03:40] Looking for work that uses less memory.
[Worker #3 Dec 4 03:40] No work to do at the present time. Waiting.
...
[Worker #1 Dec 4 03:41] Stage 2 GCD complete. Time: 0.001 sec.
[Worker #1 Dec 4 03:41] M50111 completed P-1, B1=3000000, B2=95867651880, Wi8: 53020C14
[Worker #1 Dec 4 03:41] No work to do at the present time. Waiting.
[Worker #2 Dec 4 03:41] Restarting worker with new memory settings.
[Worker #3 Dec 4 03:41] Restarting worker with new memory settings.
[Worker #2 Dec 4 03:41] Resuming.
[Worker #3 Dec 4 03:41] Resuming.
...
[Worker #2 Dec 4 03:41] P-1 on M50227 with B1=6000000, B2=10000000000
[Worker #3 Dec 4 03:41] P-1 on M50263 with B1=9000000, B2=100000000000
Segmentation fault (core dumped)
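For anyone trying to reproduce this, a small script can pick the symptom out of a captured log. The sketch below is a hypothetical helper, not part of mprime: it assumes log lines of the form `[Worker #N Mon D HH:MM] message` as in the output above, and it flags any timestamp at which more workers logged "Restarting worker with new memory settings." than the configured limit. Timestamps only have minute resolution, so this is a hint rather than proof that the workers really overlapped.

```python
import re
from collections import defaultdict

# Matches lines such as "[Worker #2 Dec 4 03:41] Restarting worker ..."
LINE_RE = re.compile(r"\[Worker #(\d+) (\w{3} +\d+ \d{2}:\d{2})\] (.*)")
RESTART_MSG = "Restarting worker with new memory settings"


def simultaneous_restarts(log_lines, max_high_memory_workers=1):
    """Return {timestamp: [worker numbers]} where restarts exceed the limit."""
    restarts = defaultdict(list)
    for line in log_lines:
        match = LINE_RE.match(line.strip())
        if match and match.group(3).startswith(RESTART_MSG):
            restarts[match.group(2)].append(int(match.group(1)))
    return {ts: workers for ts, workers in restarts.items()
            if len(workers) > max_high_memory_workers}


# Lines copied from the log above.
sample = [
    "[Worker #1 Dec 4 03:41] No work to do at the present time. Waiting.",
    "[Worker #2 Dec 4 03:41] Restarting worker with new memory settings.",
    "[Worker #3 Dec 4 03:41] Restarting worker with new memory settings.",
    "[Worker #2 Dec 4 03:41] Resuming.",
    "[Worker #3 Dec 4 03:41] Resuming.",
]
print(simultaneous_restarts(sample, max_high_memory_workers=1))
```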
2021-12-05, 02:29  #93
Oct 2021
U. S. / Maine
2×73 Posts 
Quote:
By the logic of your suggestion, we might recompute the TF credit formula, since the current one is still from when TF was done by CPU even though today's TF is run on GPUs with vastly greater throughput. While superficially reasonable, this probably doesn't make sense because we can see that having "inflated" credit on offer incentivizes GPU owners to run the more efficient TF and not the less efficient primality testing.
Last fiddled with by techn1ciaN on 2021-12-05 at 02:30 Reason: Clarifying adjective

2021-12-05, 03:02  #94
Aug 2002
Buenos Aires, Argentina
1454_{10} Posts 
It appears that 30.8 runs faster than previous versions on P-1 not only when there are large amounts of RAM, but also on small exponents.
In my case (using 8GB of RAM in an i5-3470) Prime95 required 5 days to get the following:
Code:
processing: P-1 no-factor for M9325159 (B1=50,000,000, B2=50,001,265,860)
CPU credit is 1312.7590 GHz-days.
The difference between 1 hour and 5 days (to get half the credit) cannot be explained only by the amount of RAM in the system.
2021-12-05, 04:36  #95
P90 years forever!
Aug 2002
Yeehaw, FL
1F09_{16} Posts 

2021-12-05, 04:49  #96
Jun 2003
2·2,693 Posts 
Quote:
Anyway, whenever you release build 3(?) (with this and other bug fixes), I'll switch over from build 1, which so far seems to be working fine for my use case.

2021-12-05, 04:55  #97
P90 years forever!
Aug 2002
Yeehaw, FL
1111100001001_{2} Posts 
Build 3
This version adds SSE2, FMA, and AVX-512 support, plus non-power-of-two FFTs in polymult. Stage 2 now takes advantage of an FFT's ability to do circular convolution. The upshot is that stage 2 is now faster.
Fixed some bugs.
The Linux version required an upgrade to GCC 8 for AVX-512 support. This could pose GCC library issues for some users.
To address the over-aggressive B2 calculations, I added the option Pm1CostFudge=n to prime.txt. The default value is 2.5. This option says to multiply the stage 2 cost estimate by n. It may disappear when I get around to writing a more accurate costing function.
Added Stage2ExtraThreads=n to prime.txt. Hyperthreading might help polymult; this gives polymult more threads to chew on. Untested.
Highest priority next is save files, interruptibility, and some status reporting. And major bug fixes.
Should you wish to try 30.8, same warnings as before. Links are below.
Windows 64-bit: https://mersenne.org/ftp_root/gimps/p95v308b3.win64.zip
Linux 64-bit: https://mersenne.org/ftp_root/gimps/...linux64.tar.gz
Last fiddled with by Prime95 on 2021-12-05 at 05:57
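For readers unfamiliar with the term, here is a toy illustration of what "circular convolution" means for polynomial multiplication. It is not how polymult is implemented and says nothing about how stage 2 exploits the property; it only shows that multiplying through a length-n transform gives the product with coefficients wrapped around modulo n. The naive O(n^2) DFT and all function names are assumptions chosen to keep the example dependency-free.

```python
import cmath


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (toy stand-in for an FFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(v * cmath.exp(sign * 2j * cmath.pi * j * k / n)
                    for j, v in enumerate(values))
        out.append(total / n if inverse else total)
    return out


def circular_multiply(a, b):
    """Multiply two length-n coefficient lists via the transform (cyclic result)."""
    assert len(a) == len(b)
    pointwise = [x * y for x, y in zip(dft(a), dft(b))]
    return [round(c.real) for c in dft(pointwise, inverse=True)]


def linear_multiply(a, b):
    """Schoolbook polynomial product, no wrap-around."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def wrap(coeffs, n):
    """Fold coefficient i onto position i mod n."""
    out = [0] * n
    for i, c in enumerate(coeffs):
        out[i % n] += c
    return out


a, b = [1, 2, 3, 4], [5, 6, 7, 8]
print(linear_multiply(a, b))          # full product, 7 coefficients
print(circular_multiply(a, b))        # same product folded to 4 coefficients
print(wrap(linear_multiply(a, b), 4)) # folding done by hand, for comparison
```

The point of the comparison is that the transform-based product needs no zero padding if the wrapped coefficients are either harmless or not needed, which is where a saving can come from.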
2021-12-05, 07:04  #98
Jun 2003
2·2,693 Posts 
Wow! 330s -> 212s

2021-12-05, 07:44  #99
Oct 2021
Germany
167_{8} Posts 
Quote:
With this setting enabled, stage 2 went from 745s to 725s on a 25.6M exponent (B1/B2 = 700k and 450M). Stage 2 init went from ~90s to ~60s, though.
Last fiddled with by Luminescence on 2021-12-05 at 07:58 Reason: Last line
