#23
Mar 2009
2²×5 Posts
Here's 14 workers, 1 core each:
Code:
Prime95 64-bit version 30.7, RdtscTiming=1
FFTlen=48K, Type=3, Arch=4, Pass1=256, Pass2=192, clm=4 (14 cores, 14 workers): 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22 ms. Throughput: 63688.19 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=256, Pass2=192, clm=2 (14 cores, 14 workers): 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21 ms. Throughput: 66648.24 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=256, Pass2=192, clm=1 (14 cores, 14 workers): 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.22, 0.21, 0.21 ms. Throughput: 65654.27 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=768, Pass2=64, clm=4 (14 cores, 14 workers): 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27 ms. Throughput: 52101.51 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=768, Pass2=64, clm=2 (14 cores, 14 workers): 0.24, 0.23, 0.24, 0.23, 0.24, 0.23, 0.23, 0.24, 0.23, 0.24, 0.23, 0.24, 0.24, 0.24 ms. Throughput: 59580.44 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=768, Pass2=64, clm=1 (14 cores, 14 workers): 0.23, 0.23, 0.23, 0.23, 0.23, 0.23, 0.23, 0.22, 0.23, 0.23, 0.23, 0.23, 0.23, 0.23 ms. Throughput: 61841.43 iter/sec.
It appears mprime keeps selecting Pass1=768, Pass2=64, clm=2 for some reason. It's the 2nd slowest one.
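(As a rough sanity check on the throughput figures above — my own arithmetic, not part of the benchmark output: with 14 independent workers each taking about 0.21 ms per iteration, 14 × (1 iteration / 0.00021 s) ≈ 66,667 iter/sec, which agrees with the reported 66,648.24 iter/sec for the Pass1=256, Pass2=192, clm=2 run to within the rounding of the per-worker times.)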
#24
Mar 2009
2²×5 Posts
And here's 30.8b15:
Code:
Prime95 64-bit version 30.8, RdtscTiming=1
FFTlen=48K, Type=3, Arch=4, Pass1=256, Pass2=192, clm=4 (14 cores, 14 workers): 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22 ms. Throughput: 63942.02 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=256, Pass2=192, clm=2 (14 cores, 14 workers): 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21 ms. Throughput: 66820.75 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=256, Pass2=192, clm=1 (14 cores, 14 workers): 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21 ms. Throughput: 66012.58 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=768, Pass2=64, clm=4 (14 cores, 14 workers): 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27, 0.27 ms. Throughput: 52146.49 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=768, Pass2=64, clm=2 (14 cores, 14 workers): 0.23, 0.23, 0.24, 0.23, 0.24, 0.23, 0.24, 0.23, 0.23, 0.23, 0.24, 0.24, 0.24, 0.24 ms. Throughput: 59571.93 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=768, Pass2=64, clm=1 (14 cores, 14 workers): 0.23, 0.23, 0.23, 0.23, 0.23, 0.23, 0.23, 0.23, 0.22, 0.23, 0.23, 0.23, 0.23, 0.23 ms. Throughput: 62050.87 iter/sec.
Also attached.
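(Comparing the two builds on the fastest configuration here, Pass1=256, Pass2=192, clm=2 — again my own arithmetic: 66,820.75 / 66,648.24 ≈ 1.003, i.e. 30.8b15 is only about 0.3% faster than 30.7 on this FFT, so the choice of Pass1/Pass2/clm matters far more than the version bump.)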
#25
P90 years forever!
Aug 2002
Yeehaw, FL
2×3×5²×53 Posts
#26
Mar 2009
2²×5 Posts
Looks like all 14 workers are using Pass1=256, Pass2=192, clm=2; please see the attached prime_log_307.txt. And according to results.bench.txt, for 14 workers with 1 core each this is indeed the fastest implementation.

But 14 workers with 1 core each may not be ideal for other kinds of work: ECM stage 2 would want roughly 8 GB per worker, and my system has only 32 GB total. Is there anything else I can do?

Last fiddled with by timbit on 2022-06-30 at 16:48 Reason: more comments...
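(To put rough numbers on that constraint, using the ~8 GB per worker estimate from the post: 14 workers × 8 GB ≈ 112 GB if every worker hit ECM stage 2 at once, against 32 GB installed, so at most 32 / 8 = 4 workers — fewer in practice, once the OS and the other workers' own footprints are counted — could be in stage 2 simultaneously.)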
#27 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
23×72×17 Posts |
Try 7 workers x 2 cores each.
If that alone is not satisfactory, try limiting memory used per worker. In undoc.txt: Code:
The Memory=n setting in local.txt refers to the total amount of memory the program can use. You can also put this in the [Worker #n] section to place a maximum amount of memory that one particular worker can use.

You can set MaxHighMemWorkers=n in local.txt. This tells the program how many workers are allowed to use lots of memory. This occurs during stage 2 of P-1, P+1, or ECM on medium-to-large numbers. Default is available memory / 1GB.

You can set a threshold for what is considered lots of memory in MaxHighMemWorkers calculations. In local.txt, set:
HighMemThreshold=n (default is 50)
The value n is in MB.

Last fiddled with by kriesel on 2022-06-30 at 18:27
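To show how the settings quoted above could fit together, here is a hypothetical local.txt fragment for a 32 GB machine; the numbers are illustrative only, not recommendations, and all memory values are in MB per the undoc.txt text. Code:
Memory=24000
MaxHighMemWorkers=3
HighMemThreshold=1000

[Worker #1]
Memory=8000
With something like this, the program would be allowed roughly 24 GB overall, at most three workers could be in a memory-hungry stage 2 at the same time, any worker asking for more than 1000 MB would count as "high memory", and worker #1 individually would be capped at about 8 GB.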
#28
Mar 2009
2²×5 Posts
I just noticed that the mersenne.org "Download software" page links to 30.8 b15.
The latest announcement on the main page still lists version 30.7, so I had never bothered to check for a newer stable release; I guess I need to check the downloads page more often.

I'm still experimenting with the timings, but I have taken the advice and, for small FFTs, am now using at most 1 or 2 threads per worker. Throughput has increased, thanks.