mersenneforum.org  

2022-07-14, 15:59   #23
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2^2×5^2×71 Posts

The optimal number of cores/worker depends on fft size. Very small fft size may be optimal with a single core/worker. Very large fft size may be optimal with all cores available even in high-core-count systems. The general rule is to default at 4 cores/worker, but that is for DC & first test wavefront size ffts (currently ~3-6M fft size).

Last fiddled with by kriesel on 2022-07-14 at 16:01

2022-07-14, 16:35   #24
lycorn
"GIMFS"
Sep 2002
Oeiras, Portugal
1,567 Posts

Yes, that is certainly the case for ECM stage 1 on FFTs this small, so P95 runs single-threaded. For stage 2, the program uses 3 helper threads even though the FFT size is probably about the same; I think that has to do with the polynomial multiplication.

2022-07-14, 17:12   #25
VBCurtis
"Curtis"
Feb 2005
Riverside, CA
2^3·5·139 Posts

Quote:
Originally Posted by lycorn View Post
So my point is: does this really work this way? I mean, is the time to complete the t55, as given by the product of number of curves * B2 runtime, the only criterion to take into account when choosing an optimal B2 value?
No, one should use number of curves * [total of stage 1 and stage 2 run time], which measures the total time it will take to achieve a t-level.

However, the timing curve is quite broad near the peak, so I choose the largest B1/B2 that are within, say, 5% of the best expected time for a T-level, to maximize the amount of work done for T60 while I'm doing the T55. To me, it makes sense to give up a bit of efficiency on smaller factors to gain a larger chance to find a bigger factor.

For instance, B1 = 6e7 is faster to run a T50 than B1 = 43e6 when using GMP-ECM with default B2 values, and it also improves the chance of finding 52+ digit factors compared to B1 = 43e6. I wonder if this is true with the new P95 as well.

2022-07-14, 21:39   #26
lycorn
"GIMFS"
Sep 2002
Oeiras, Portugal
1,567 Posts

Quote:
Originally Posted by VBCurtis View Post
No, one should use number of curves * [total of stage 1 and stage 2 run time], which measures the total time it will take to achieve a t-level.
Yes, that is right; I just didn't make myself clear enough. I meant to say that for a given value of B1, in this case 110M, just reducing the value of B2 didn't seem to be a valid approach. I gave the extreme example of B1 = 110M and B2 = 105 * B1 yielding the lowest time to complete t55, whereas it didn't seem a sensible move to use such a small value for B2. In fact, Prime95 itself chose a larger value for B2 even though the time to complete t55 was larger than with B2 = 105 * B1, as described in my post.

Last fiddled with by lycorn on 2022-07-14 at 21:43

2022-07-15, 02:10   #27
ATH
Einyen
Dec 2003
Denmark
110101010111_2 Posts

I was starting some timing tests as well, just running on 1 core with 24GB RAM, but I got several SUMOUT errors during stage 1:
ECM2=1,2,2267,-1,800000000,80000000000,1
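
For readers unfamiliar with the worktodo syntax, here is a small sketch that splits the line above into named fields. The field order (k, b, n, c, B1, B2, curves, describing the number k*b^n+c) is my reading of the Prime95 documentation and should be treated as an assumption; `parse_ecm2` and `Ecm2Entry` are hypothetical names, not part of Prime95.

```python
from typing import NamedTuple


class Ecm2Entry(NamedTuple):
    """Fields of an ECM2 worktodo line (assumed order, see lead-in)."""
    k: int
    b: int
    n: int
    c: int
    b1: int
    b2: int
    curves: int


def parse_ecm2(line: str) -> Ecm2Entry:
    """Split 'ECM2=k,b,n,c,B1,B2,curves' into integers.

    Any trailing optional fields are ignored in this sketch.
    """
    key, _, value = line.strip().partition("=")
    if key != "ECM2":
        raise ValueError(f"not an ECM2 line: {line!r}")
    fields = value.split(",")
    if len(fields) < 7:
        raise ValueError(f"expected at least 7 fields, got {len(fields)}")
    return Ecm2Entry(*(int(f) for f in fields[:7]))


entry = parse_ecm2("ECM2=1,2,2267,-1,800000000,80000000000,1")
print(entry)
print("B2 / B1 =", entry.b2 // entry.b1)
```

Under that reading, the line asks for 1 curve on 1*2^2267-1 (M2267) with B1 = 8e8 and B2 = 100 * B1.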

It has finished stage 1 at least once so far, with 1 SUMOUT error; curve #2 has had 3 SUMOUT errors so far.

I have these in prime.txt since I just copied my normal file:
SumInputsErrorCheck=1
OutputRoundoff=1

I'm trying now to force FFT 160 instead of 128 and see if that helps.

Edit: It seems 128 FFT is too large for M2267, and 96 FFT is too small. Trying M2719 at 128 FFT instead.
[Attached image: M2267.JPG]

Last fiddled with by ATH on 2022-07-15 at 03:13

2022-07-15, 02:49   #28
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
8084_10 Posts

Quote:
Originally Posted by lycorn View Post
I did some more tests with 30.9.

Exponent: 4567
B1 bound: 110M (t55). Average runtime for Stage 1: 950 sec (just under 16 minutes).
B2 bounds: several large bounds (1.5e13, 3e13, 6e13), then 1e11 (1000 * B1), and some smaller bounds down to 105 * B1. Finally I tried B2 = 100 * B1 to see what P95 would choose.
Can you post a table containing B1, B2, runtime, t55 curves needed?
Example: 110M, 105*B1, 950+5.3, 42000
I'm surprised at your preliminary results. Sounds like prime95's optimal B2 guess needs work.

Quote:
Now the time taken to run stage 2 was just 20-25% more than with just one worker, which would get 3 helper threads. Is that expected, or isn't the program taking enough advantage of more helper threads during stage 2, meaning that some multithreading "tweaking" would be a plus?
There are some multithreading optimizations to be had. Multithreading efficiency might also be impacted by the small FFT size.

2022-07-15, 07:47   #29
lycorn
"GIMFS"
Sep 2002
Oeiras, Portugal
1,567 Posts

Summary of results:
The number of curves to run was given by GMP-ECM.
The B1 runtime is an average value. Runtimes in seconds.
Tests done using 1 worker with 4 physical cores allowed to run Prime95.

Exponent: 4567

B1      B2          runtime (stage 1 + stage 2)   curves to run
110M    1000 * B1   950 + 17.9                    25849
110M    500 * B1    950 + 11.9                    29306
110M    200 * B1    950 + 7.1                     35419
110M    105 * B1    950 + 5.3                     40485
110M    100 * B1    950 + 110.7                   14396   (actual B2 = 28217 * B1, computed by Prime95)

For larger values of B2, stage 2 runtime would grow accordingly:

B1      B2          runtime (stage 1 + stage 2)   curves to run
110M    1.5e13      950 + 293.4                   11285
110M    3.0e13      950 + 500.7                   10211
110M    6.0e13      950 + 793                     9307
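
As a rough cross-check of the criterion discussed earlier in the thread (curves * [stage 1 + stage 2 runtime]), the sketch below recomputes the time to t55 from the figures in the table above. The numbers are copied from the table; the helper names and the tolerance parameter are only illustrative, and `largest_b2_within` is one possible reading of the "largest bound within a few percent of the best time" rule, not anyone's actual tooling.

```python
STAGE1_SECONDS = 950.0   # average stage 1 runtime from the table
B1 = 110e6

# (label, B2, stage 2 seconds, curves needed for t55), copied from the table
RUNS = [
    ("105 * B1", 105 * B1, 5.3, 40485),
    ("200 * B1", 200 * B1, 7.1, 35419),
    ("500 * B1", 500 * B1, 11.9, 29306),
    ("1000 * B1", 1000 * B1, 17.9, 25849),
    ("28217 * B1 (Prime95's choice)", 28217 * B1, 110.7, 14396),
    ("1.5e13", 1.5e13, 293.4, 11285),
    ("3.0e13", 3.0e13, 500.7, 10211),
    ("6.0e13", 6.0e13, 793.0, 9307),
]


def total_seconds(run):
    """Time to t55 counting both stages: curves * (stage 1 + stage 2)."""
    _, _, stage2, curves = run
    return curves * (STAGE1_SECONDS + stage2)


def stage2_only_seconds(run):
    """Time to t55 counting stage 2 alone: curves * stage 2."""
    _, _, stage2, curves = run
    return curves * stage2


def largest_b2_within(runs, tolerance):
    """Largest B2 whose total time is within `tolerance` of the best total."""
    best = min(total_seconds(r) for r in runs)
    candidates = [r for r in runs if total_seconds(r) <= best * (1 + tolerance)]
    return max(candidates, key=lambda r: r[1])


best_total = min(RUNS, key=total_seconds)
best_stage2_only = min(RUNS, key=stage2_only_seconds)

for run in sorted(RUNS, key=total_seconds):
    print(f"{run[0]:32s} {total_seconds(run) / 86400:8.1f} days")
print("best by total time:    ", best_total[0])
print("best by stage 2 alone: ", best_stage2_only[0])
print("largest B2 within 5%:  ", largest_b2_within(RUNS, 0.05)[0])
```

On these figures the total-time criterion favors B2 = 1.5e13, whereas counting stage 2 alone favors 105 * B1, which is the discrepancy discussed above.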

2022-07-15, 09:43   #30
lycorn
"GIMFS"
Sep 2002
Oeiras, Portugal
1,567 Posts

Additionally, the amount of stage 2 memory used (in MB) for the different B2 values was:

105 * B1 ---- 738
200 * B1 ---- 949
500 * B1 ---- 1159
1000 * B1 ---- 1778
100 * B1 ---- 9813 (actual B2 chosen by Prime95 = 28217 * B1)
1.5e13 ---- 18498
3.0e13 ---- 18498 (yes, it was the same value)
6.0e13 ---- 26359

2022-07-15, 11:13   #31
ATH
Einyen
Dec 2003
Denmark
5×683 Posts

M2719 had many SUMOUT errors as well, so I set:
SumInputsErrorCheck=0

and now it seems to run fine. Does that mean the SUMOUT errors are just hidden now but still there or are they false?

2022-07-15, 14:02   #32
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
17624_8 Posts

Quote:
Originally Posted by ATH View Post
Does that mean the SUMOUT errors are just hidden now but still there or are they false?
Just hidden. SUMOUT checks only available in SSE2 FFTs (old computer?). SUMOUT checks were the first error checks prime95 used. They are "fuzzy". Two floating point check values are supposed to be equal, but since floats are inexact prime95 checks the two values are "really close" to equal. You probably have some outliers that were just beyond "really close".
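
To illustrate what a "fuzzy" equality check of this kind looks like, here is a toy sketch. It is not Prime95's code: it assumes the check compares the squared sum of the inputs with the sum of the outputs of a cyclic squaring (an identity that holds exactly in real arithmetic), and the tolerance `rel_tol=1e-9` is an arbitrary value chosen for the example.

```python
import math
import random


def cyclic_square(a):
    """Naive cyclic convolution of a with itself (stand-in for an FFT squaring)."""
    n = len(a)
    return [sum(a[i] * a[(k - i) % n] for i in range(n)) for k in range(n)]


def sums_agree(inputs, outputs, rel_tol=1e-9):
    """Fuzzy check: (sum of inputs)^2 should be 'really close' to sum of outputs."""
    expected = math.fsum(inputs) ** 2
    actual = math.fsum(outputs)
    return math.isclose(expected, actual, rel_tol=rel_tol)


random.seed(1)
data = [random.uniform(0.0, 1e6) for _ in range(64)]
result = cyclic_square(data)

# A correct result differs from the expected sum only by rounding noise.
clean_ok = sums_agree(data, result)

# Simulate a hardware error by corrupting one output value.
corrupted = list(result)
corrupted[5] += 1e9
corrupt_ok = sums_agree(data, corrupted)

print("clean result passes:    ", clean_ok)
print("corrupted result passes:", corrupt_ok)
```

Because the comparison is a tolerance test rather than exact equality, a tolerance set too tight would flag harmless rounding noise, which is the kind of outlier described above.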

2022-07-15, 17:25   #33
ATH
Einyen
Dec 2003
Denmark
D57_16 Posts

Quote:
Originally Posted by Prime95 View Post
Just hidden. SUMOUT checks only available in SSE2 FFTs (old computer?). SUMOUT checks were the first error checks prime95 used. They are "fuzzy". Two floating point check values are supposed to be equal, but since floats are inexact prime95 checks the two values are "really close" to equal. You probably have some outliers that were just beyond "really close".
It IS an old computer (Core i7-5960X from 2014, bought mine in Oct 2015), but it is using FMA3 FFT.

I will just continue with SumInputsErrorCheck=0 and hope it is just "almost really close" values. I'm not trying to find factors anyway, just testing the stage 2 speed for different values of B2.

Last fiddled with by ATH on 2022-07-15 at 17:30

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
