2022-07-14, 15:59  #23
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2·29·127 Posts 
The optimal number of cores/worker depends on FFT size. A very small FFT may run best with a single core/worker. A very large FFT may run best with all available cores, even on high-core-count systems. The general rule is to default to 4 cores/worker, but that applies to DC and first-test wavefront FFT sizes (currently ~3-6M FFT size).
Last fiddled with by kriesel on 2022-07-14 at 16:01
2022-07-14, 16:35  #24
"GIMFS"
Sep 2002
Oeiras, Portugal
1570_{10} Posts 
Yes, that is certainly the case for ECM stage 1 on FFTs this small, so P95 runs single-threaded. For stage 2, the program uses 3 helpers even though the FFT size is probably about the same; I think it has to do with the polynomial multiplication.

2022-07-14, 17:12  #25
"Curtis"
Feb 2005
Riverside, CA
3·1,877 Posts 
Quote:
However, the timing curve is quite broad near the peak, so I choose the largest B1/B2 that are within, say, 5% of the best expected time for a T-level, to maximize the amount of work done for T60 while I'm doing the T55. To me, it makes sense to give up a bit of efficiency on smaller factors to gain a larger chance of finding a bigger factor. For instance, B1 = 6e7 is faster at running a T50 than 43e6 when using GMP-ECM with default B2 values, and it also improves the chance of finding 52+ digit factors compared to B1 = 43e6. I wonder if this is true with the new P95 as well.
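The selection rule described here (take the largest B1 whose expected time is within about 5% of the best) can be sketched as below. This is only an illustration: the function name and the timing numbers are hypothetical placeholders, not measurements from GMP-ECM or Prime95.

```python
def largest_b1_within(timings, slack=0.05):
    """Pick the largest B1 whose expected time is within `slack` of the best.

    timings: dict mapping B1 -> expected time to complete the T-level
             (any consistent unit). Hypothetical input format.
    """
    best = min(timings.values())
    eligible = [b1 for b1, t in timings.items() if t <= best * (1 + slack)]
    return max(eligible)


# Placeholder numbers only, chosen to show the rule; NOT real timings.
example_timings = {1e6: 110.0, 2e6: 100.0, 3e6: 103.0, 4e6: 112.0}
chosen = largest_b1_within(example_timings)
print(chosen)
```

With these placeholder values the fastest entry is B1 = 2e6, but B1 = 3e6 is only 3% slower, so it is the one chosen.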

2022-07-14, 21:39  #26
"GIMFS"
Sep 2002
Oeiras, Portugal
2×5×157 Posts 
Yes, that is right; I just didn't make myself clear enough: I meant to say that for a given value of B1, in this case 110M, just reducing the value of B2 didn't seem to be a valid approach. I gave the extreme example of B1 = 110M and B2 = 105 * B1 yielding the lowest time to complete t55, whereas it didn't seem a sensible move to use such a small value for B2. In fact, Prime95 itself chose a larger value for B2 even though the time to complete t55 was larger than using B2 = 105 * B1, as described in my post.
Last fiddled with by lycorn on 2022-07-14 at 21:43
2022-07-15, 02:10  #27
Einyen
Dec 2003
Denmark
2×17×101 Posts 
I was starting some timing tests as well, just running on 1 core with 24GB RAM, but I got several SUMOUT errors during stage 1:
ECM2=1,2,2267,-1,800000000,80000000000,1

It did finish stage 1 at least 1 time so far with 1 SUMOUT error, now 3 SUMOUT errors so far in curve #2. I have these in prime.txt since I just copied my normal file:
SumInputsErrorCheck=1
OutputRoundoff=1
I'm trying now to force FFT 160 instead of 128 and see if that helps.
Edit: It seems 128 FFT is too large for M2267, and 96 FFT is too small. Trying M2719 at 128 FFT instead.
Last fiddled with by ATH on 2022-07-15 at 03:13
2022-07-15, 02:49  #28
P90 years forever!
Aug 2002
Yeehaw, FL
2×4,079 Posts 
Quote:
Example: 110M, 105*B1, 950+5.3, 42000

I'm surprised at your preliminary results. Sounds like prime95's optimal B2 guess needs work.


2022-07-15, 07:47  #29
"GIMFS"
Sep 2002
Oeiras, Portugal
2·5·157 Posts 
Summary of results:
The number of curves to run was given by GMP-ECM. The B1 runtime is an average value. Runtimes in seconds. Tests done using 1 worker with 4 physical cores allowed to run Prime95.

Exponent: 4567

B1       B2           runtime (stage 1 + stage 2)   curves to run
110 M    1000 * B1    950 + 17.9                    25849
110 M    500 * B1     950 + 11.9                    29306
110 M    200 * B1     950 + 7.1                     35419
110 M    105 * B1     950 + 5.3                     40485
110 M    100 * B1     950 + 110.7                   14396   (actual B2 = 28217 * B1, computed by Prime95)

For larger values of B2, stage 2 runtime would grow accordingly:

110 M    1.5e13       950 + 293.4                   11285
110 M    3.0e13       950 + 500.7                   10211
110 M    6.0e13       950 + 793                     9307
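To compare the rows on total time, one can multiply curves to run by the per-curve runtime. The sketch below does that, assuming the "950 + x" figures are stage 1 plus stage 2 seconds per curve and that all curves run back to back on the one worker; the function names are illustrative and the numbers are copied from the table.

```python
# Rows copied from the summary table: (B2 label, stage 1 s, stage 2 s, curves).
ROWS = [
    ("1000 * B1", 950.0, 17.9, 25849),
    ("500 * B1", 950.0, 11.9, 29306),
    ("200 * B1", 950.0, 7.1, 35419),
    ("105 * B1", 950.0, 5.3, 40485),
    ("28217 * B1 (Prime95's choice)", 950.0, 110.7, 14396),
    ("1.5e13", 950.0, 293.4, 11285),
    ("3.0e13", 950.0, 500.7, 10211),
    ("6.0e13", 950.0, 793.0, 9307),
]


def total_seconds(stage1, stage2, curves):
    """Total time for all curves, assuming they run one after another."""
    return curves * (stage1 + stage2)


def rank_by_total_time(rows):
    """Return (label, total seconds) pairs sorted from fastest to slowest."""
    totals = [(label, total_seconds(s1, s2, n)) for label, s1, s2, n in rows]
    return sorted(totals, key=lambda pair: pair[1])


if __name__ == "__main__":
    for label, secs in rank_by_total_time(ROWS):
        print(f"{label:32s} {secs / 86400:8.1f} days")
```

Under those assumptions and with the figures as posted, B2 = 1.5e13 comes out with the smallest total and B2 = 105 * B1 with the largest.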
2022-07-15, 09:43  #30
"GIMFS"
Sep 2002
Oeiras, Portugal
2×5×157 Posts 
Additionally, the amount of stage 2 memory used (in MB) for the different B2 values was:
B2           stage 2 memory (MB)
105 * B1     738
200 * B1     949
500 * B1     1159
1000 * B1    1778
100 * B1     9813    (actual B2 chosen by Prime95 = 28217 * B1)
1.5e13       18498
3.0e13       18498   (yes, it was the same value)
6.0e13       26359
2022-07-15, 11:13  #31
Einyen
Dec 2003
Denmark
6552_{8} Posts 
M2719 had many SUMOUT errors as well, so I set:
SumInputsErrorCheck=0
and now it seems to run fine. Does that mean the SUMOUT errors are just hidden now but still there, or were they false alarms?
2022-07-15, 14:02  #32
P90 years forever!
Aug 2002
Yeehaw, FL
2·4,079 Posts 
Just hidden. SUMOUT checks only available in SSE2 FFTs (old computer?). SUMOUT checks were the first error checks prime95 used. They are "fuzzy". Two floating point check values are supposed to be equal, but since floats are inexact prime95 checks the two values are "really close" to equal. You probably have some outliers that were just beyond "really close".
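As a generic illustration of a "fuzzy" equality check on floating-point values, here is a minimal sketch using Python's math.isclose. It is not prime95's actual SUMOUT code; the function name and the tolerance value are assumptions made up for the example.

```python
import math


def sums_agree(expected, actual, rel_tol=1e-9):
    """Return True when two float check values are 'really close'.

    rel_tol is a hypothetical threshold, not the one prime95 uses.
    """
    return math.isclose(expected, actual, rel_tol=rel_tol)


# Exact comparison fails on values that should be equal in exact arithmetic.
exact_equal = (0.1 + 0.2) == 0.3

# The fuzzy comparison accepts them.
fuzzy_equal = sums_agree(0.1 + 0.2, 0.3)

# An outlier just beyond the tolerance is reported as a mismatch.
outlier_ok = sums_agree(1.0, 1.0 + 1e-6)

print(exact_equal, fuzzy_equal, outlier_ok)
```

Any fixed tolerance has this trade-off: tighten it and harmless rounding differences get flagged, loosen it and real errors can slip through.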

2022-07-15, 17:25  #33
Einyen
Dec 2003
Denmark
2×17×101 Posts 
Quote:
I will just continue with SumInputsErrorCheck=0 and hope it is just "almost really close" values. I'm not trying to find factors anyway, just testing the stage 2 speed for different values of B2.
Last fiddled with by ATH on 2022-07-15 at 17:30
