mersenneforum.org > Great Internet Mersenne Prime Search > Software
2021-12-03, 22:36   #89
Prime95
Quote:
Originally Posted by R. Gerbicz
What is the third number on this line: "D: 1050, 120x403 polynomial multiplication."? I'm just guessing that the 2nd is eulerphi(1050)/2 = 120, but what is the 403?
D is the traditional step size used to increment from B1 to B2; 120 is eulerphi(1050)/2.
We create a polynomial with 120 coefficients that must be evaluated at multiples of D.

Montgomery/Silverman/Kruppa show how to evaluate the polynomial at multiple points using polynomial multiplication.

The 403 is the number of polynomial coefficients I can allocate for the second polynomial. FFT size and available memory dictate this number.

A single polynomial multiply evaluates the first polynomial at 403 - 2*120 + 1 = 164 points, thus advancing toward B2 in steps of 1050 * 164 = 172200.
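To make the arithmetic above concrete, here is a small Python sketch (the function and variable names are mine, not Prime95's) that reproduces the numbers in this post from D = 1050 and the 403-coefficient second polynomial:

```python
def eulerphi(n):
    """Euler's totient function via trial division over prime factors."""
    result, p, m = n, 2, n
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result

# Values from the post: D = 1050, second polynomial holds 403 coefficients.
D = 1050
coeffs_first = eulerphi(D) // 2               # 120 coefficients in the first polynomial
coeffs_second = 403                           # dictated by FFT size and available memory
points = coeffs_second - 2 * coeffs_first + 1 # points evaluated per polynomial multiply
step = D * points                             # advance toward B2 per multiplication
print(coeffs_first, points, step)             # 120 164 172200
```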
2021-12-03, 22:39   #90
Prime95
Quote:
Originally Posted by axn
Something's not quite right here. The 24GB option shows about 20% fewer transforms, yet sees no significant improvement in elapsed time.
The number of transforms is only part of the stage 2 cost. The other significant cost is the polynomial multiplies. At present, there is no data output on the number of polymults or how expensive they were.
2021-12-04, 02:24   #91
axn
Quote:
Originally Posted by petrw1
Would folder 1 prime.txt have UsePrimenet=0?
I'd prefer it send in both stages as 1 result.
Sure. In fact, I use that setting in both folders and report the results manually.

Quote:
Originally Posted by Prime95
The number of transforms is only part of the stage 2 cost. The other significant cost is the polynomial multiplies. At present, there is no data output on the number of polymults or how expensive they were.
Gotcha.
2021-12-04, 11:43   #92
SethTro
With `MaxHighMemoryWorkers=1`, 30.8 build 2 will resume two high-memory workers at the same time.

Code:
$ cat worktodo.txt

[Worker #1]
Pminus1=1,2,50111,-1,3000000,1000000000

[Worker #2]
Pminus1=1,2,50227,-1,6000000,10000000000

[Worker #3]
Pminus1=1,2,50263,-1,9000000,100000000000
Code:
five:~/Downloads/GIMPS/p95$ ./mprimev308b2 -m -d
[Main thread Dec 4 03:39] Mersenne number primality test program version 30.8
[Main thread Dec 4 03:39] Optimizing for CPU architecture: AMD Zen, L2 cache size: 12x512 KB, L3 cache size: 4x16 MB
Your choice: 4
Worker to start, 0=all (0): 0
Your choice: [Main thread Dec 4 03:39] Starting workers.
[Worker #2 Dec 4 03:39] Waiting 5 seconds to stagger worker starts.
[Worker #3 Dec 4 03:39] Waiting 10 seconds to stagger worker starts.
[Worker #1 Dec 4 03:39] P-1 on M50111 with B1=3000000, B2=1000000000
[Worker #2 Dec 4 03:39] P-1 on M50227 with B1=6000000, B2=10000000000
[Worker #3 Dec 4 03:39] P-1 on M50263 with B1=9000000, B2=100000000000
[Worker #1 Dec 4 03:39] M50111 stage 1 complete. 8656318 transforms. Total time: 22.501 sec.
[Worker #1 Dec 4 03:39] Conversion of stage 1 result complete. 5 transforms, 1 modular inverse. Time: 0.002 sec.
[Worker #1 Dec 4 03:39] Available memory is 7916MB.
[Worker #1 Dec 4 03:39] Using 7916MB of memory.  D: 510510, 46080x279844 polynomial multiplication.
...
[Worker #2 Dec 4 03:40] M50227 stage 1 complete. 17311478 transforms. Total time: 45.504 sec.
[Worker #2 Dec 4 03:40] Exceeded limit on number of workers that can use lots of memory.
[Worker #2 Dec 4 03:40] Looking for work that uses less memory.
[Worker #2 Dec 4 03:40] No work to do at the present time.  Waiting.
...
[Worker #3 Dec 4 03:40] M50263 stage 1 complete. 25971112 transforms. Total time: 68.424 sec.
[Worker #3 Dec 4 03:40] Exceeded limit on number of workers that can use lots of memory.
[Worker #3 Dec 4 03:40] Looking for work that uses less memory.
[Worker #3 Dec 4 03:40] No work to do at the present time.  Waiting.
...
[Worker #1 Dec 4 03:41] Stage 2 GCD complete. Time: 0.001 sec.
[Worker #1 Dec 4 03:41] M50111 completed P-1, B1=3000000, B2=95867651880, Wi8: 53020C14
[Worker #1 Dec 4 03:41] No work to do at the present time.  Waiting.
[Worker #2 Dec 4 03:41] Restarting worker with new memory settings.
[Worker #3 Dec 4 03:41] Restarting worker with new memory settings.
[Worker #2 Dec 4 03:41] Resuming.
[Worker #3 Dec 4 03:41] Resuming.
...
[Worker #2 Dec 4 03:41] P-1 on M50227 with B1=6000000, B2=10000000000
[Worker #3 Dec 4 03:41] P-1 on M50263 with B1=9000000, B2=100000000000
Segmentation fault (core dumped)
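For context on the worktodo entries above: each Pminus1 line follows the form Pminus1=k,b,n,c,B1,B2 for the candidate k*b^n+c. A small parser sketch (assuming the bare six-field form shown in this post; real entries may carry additional optional fields):

```python
def parse_pminus1(line):
    """Parse a worktodo 'Pminus1=k,b,n,c,B1,B2' entry.
    Assumes the bare six-field form; real entries may carry
    additional optional fields."""
    assert line.startswith("Pminus1=")
    k, b, n, c, B1, B2 = (int(x) for x in line[len("Pminus1="):].split(","))
    return {"k": k, "b": b, "n": n, "c": c, "B1": B1, "B2": B2}

# First entry from the worktodo.txt above: k*b^n+c = 2^50111 - 1 = M50111.
entry = parse_pminus1("Pminus1=1,2,50111,-1,3000000,1000000000")
print(entry["n"], entry["B1"], entry["B2"])  # 50111 3000000 1000000000
```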
2021-12-05, 02:29   #93
techn1ciaN
Quote:
Originally Posted by lisanderke
14923: I received 2888 GHz-days of P-1 credit for this workload, while it took me probably less than an hour to complete. Perhaps the credit given should be recalculated after a full release of 30.8 or later versions!
A counterpoint: systems with very large RAM allocations are scarce. Since 30.8's wildly impressive "headline" improvements only seem possible with lots of RAM allocated, leaving the credit formula where it is might offer a good incentive for owners of RAM-rich systems to run what their hardware is most valuable for, i.e., P-1.

By the logic of your suggestion, we might also recompute the TF credit formula, since the current one dates from when TF was done on CPUs, even though today's TF runs on GPUs with vastly greater throughput. While superficially reasonable, this probably doesn't make sense: we can see that having "inflated" credit on offer incentivizes GPU owners to run the more efficient TF rather than the less efficient primality testing.

2021-12-05, 03:02   #94
alpertron
It appears that 30.8 runs P-1 faster than previous versions not only when there are large amounts of RAM, but also on small exponents.

In my case (using 8 GB of RAM on an i5-3470), Prime95 required 5 days to get the following:

Code:
processing: P-1 no-factor for M9325159 (B1=50,000,000, B2=50,001,265,860)
CPU credit is 1312.7590 GHz-days.
Notice that the file worktodo.txt already had the known factors, but no new factors were found.

The difference between 1 hour and 5 days (to get half the credit) cannot be explained only by the amount of RAM in the system.
2021-12-05, 04:36   #95
Prime95
Quote:
Originally Posted by axn
This is repeatable. Multiple restarts with build 2 all yielded the same behavior: top consistently shows 200% instead of the expected high 500%.
Found it. Somehow I accidentally overwrote the affinity changes that were in build 1.
2021-12-05, 04:49   #96
axn
Quote:
Originally Posted by Prime95
Found it. Somehow I accidentally overwrote the affinity changes that were in build 1.
Phew! Was only the Linux build affected?

Anyway, whenever you release build 3(?) with this and other bug fixes, I'll switch over from build 1, which so far seems to be working fine for my use case.
2021-12-05, 04:55   #97
Prime95
Build 3
This version adds SSE2, FMA, and AVX-512 support, plus non-power-of-two FFTs in polymult. Stage 2 now takes advantage of an FFT's ability to do circular convolution. The upshot: stage 2 is now faster.
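As an aside on the circular-convolution point: multiplying two length-n polynomials mod x^n - 1 is exactly a circular convolution, which a single length-n FFT pass computes directly (no zero-padding to length 2n needed). A minimal pure-Python illustration of the wrap-around product (not Prime95's code):

```python
def circular_convolve(a, b):
    """Coefficients of a(x)*b(x) mod x^n - 1 (n = len(a) = len(b)):
    a plain polynomial product whose coefficients wrap around, which
    is what a length-n FFT-based multiply computes natively."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[(i + j) % n] += ai * bj
    return out

# Full product of (1 + 2x + 3x^2) and (4 + 5x + 6x^2) is
# 4 + 13x + 28x^2 + 27x^3 + 18x^4; reduced mod x^3 - 1, the
# x^3 and x^4 terms wrap onto x^0 and x^1.
print(circular_convolve([1, 2, 3], [4, 5, 6]))  # [31, 31, 28]
```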

Fixed some bugs.

The Linux version required an upgrade to GCC 8 for AVX-512 support, which could pose GCC library issues for some users.

To address the over-aggressive B2 calculations, I added the option Pm1CostFudge=n to prime.txt. The default value is 2.5. This option multiplies the stage 2 cost estimate by n. It may disappear when I get around to writing a more accurate costing function.

Added Stage2ExtraThreads=n to prime.txt. Hyperthreading might help polymult; this gives polymult more threads to chew on. Untested.
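For clarity, both new options are plain key=value lines in prime.txt; an illustrative fragment (the values here are examples, not recommendations):

```
Pm1CostFudge=2.5
Stage2ExtraThreads=4
```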

Highest priorities next are save files, interruptibility, and some status reporting, plus major bug fixes.


Should you wish to try 30.8, the same warnings as before apply. Links are below.
  • Use this version only for P-1 work on Mersenne numbers. This really is pre-beta!
  • Please rerun your last 3 or 4 successful P-1 runs to QA that the new P-1 stage 2 code finds those factors.
  • Use much more aggressive B2 bounds. While the optimal B2 calculations may not be perfect, I recommend using them anyway.
  • Turn on roundoff error checking.
  • Give stage 2 as much memory as you can. Only run one worker with high memory. The default value for MaxHighMemWorkers is now one.
  • Save files cannot be created during P-1 stage 2.
  • There is no progress reporting during P-1 stage 2.
  • P-1 stage 2 is untested on 100M+ exponents. I am not sure the code can accurately gauge when the new code is faster than the old code.
  • MaxStage0Prime in undoc.txt has changed.
  • Archive your completed P-1 save files in case bugs are found that require re-running stage 2.

Windows 64-bit: https://mersenne.org/ftp_root/gimps/p95v308b3.win64.zip
Linux 64-bit: https://mersenne.org/ftp_root/gimps/...linux64.tar.gz

2021-12-05, 07:04   #98
axn
Wow! 330s -> 212s
2021-12-05, 07:44   #99
Luminescence
Quote:
Originally Posted by Prime95
Added Stage2ExtraThreads=n to prime.txt. Hyperthreading might help polymult; this gives polymult more threads to chew on. Untested.
Not sure if this is just a visual bug, but when running a 12-core CPU (Ryzen 9 5900X) with all cores on one worker and setting this option to 12, Prime95 (at least visually) claims to assign all 12 extra polymult helper threads to core 1.

With this setting enabled, stage 2 went from 745s to 725s on a 25.6M exponent (B1/B2 = 700k and 450M). Stage 2 init went from ~90s to ~60s, though.
