#1
Dec 2016
2³·7 Posts
While there's some good information about gmp-ecm stage 1 performance on a single core, and many postings about running multiple gmp-ecm instances, I could not find any information about whether it benefits from hardware threads (a.k.a. SMT, a.k.a. Hyper-Threading). So I decided to test it myself.
I compared gmp-ecm stage 1 performance when running one instance on every physical CPU core against running one instance on every hardware thread. Specifically, I timed stage 1 with B1=1e8 on M1489. I chose these parameters because
AMD Ryzen 9 3950X: 16 cores 570 seconds, 32 threads 589 seconds, throughput change: +91%
Intel Core i7 6600U: 2 cores 858 seconds, 4 threads 984 seconds, throughput change: +74%

I also cross-checked against mprime, which is said to be faster for stage 1. Interestingly, on the Ryzen, mprime working on M1489 with one thread and B1=1e9 takes ~6400 s, compared with ~5850 s for gmp-ecm. I assume this is because mprime does not use the known factors when doing ECM. So for this particular number, gmp-ecm is actually a bit faster than mprime in stage 1.
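One way to derive the throughput change: n processes that each need t seconds complete n/t stage 1 runs per second, so moving from all cores to all threads changes throughput by (n_threads/t_threads) / (n_cores/t_cores) - 1. A minimal bash sketch of that arithmetic, assuming each listed time is the wall-clock time of a single ecm process; `throughput_change` is a hypothetical helper name:

```shell
#!/bin/bash
# throughput_change N1 T1 N2 T2
# Hypothetical helper: compares N1 processes taking T1 seconds each with
# N2 processes taking T2 seconds each, and prints the relative change in
# completed runs per second, (N2/T2) / (N1/T1) - 1, as a signed percentage.
throughput_change() {
    awk -v n1="$1" -v t1="$2" -v n2="$3" -v t2="$4" 'BEGIN {
        d = ((n2 / t2) / (n1 / t1) - 1) * 100
        printf "%s%.0f%%\n", (d >= 0 ? "+" : ""), d
    }'
}

throughput_change 2 858 4 984    # Intel Core i7 6600U: cores vs. threads
throughput_change 16 570 32 589  # AMD Ryzen 9 3950X: cores vs. threads
```

With the Intel times it prints +74%, matching the figure above; with the Ryzen times as listed it prints +94%, a few points above the quoted +91%.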
#2
Dec 2016
111000₂ Posts
For reference, the benchmark was performed on Linux like this:
Code:
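#!/bin/bash

# Run one ecm process on the cofactor of M1489, pinned to logical CPU $1.
# B1=1e8, B2=0 (stage 1 only); ecm's output is appended to log_<cpu>.
function launch_ecm {
    echo '(2^1489-1) / 71473 / 27201739919 / 51028917464688167 / 13822844053570368983 / 122163266112900081138309323835006063277267764895871 / 95909518295775374166321292697000685895150503357477127' | \
        time taskset --cpu-list "$1" ecm -v -modmuln 1e8 0 >> "log_$1" &
}

# Pass 1: one process on every second logical CPU (0, 2, ..., 30),
# i.e. one per physical core if CPUs 2k and 2k+1 are SMT siblings.
for i in $(seq 0 2 31); do
    launch_ecm "$i"
done
wait

echo -e '\n\n\n\n\n\n'
sleep 30

# Pass 2: one process on each of the 32 logical CPUs.
for i in $(seq 0 31); do
    launch_ecm "$i"
done
wait

I'm using "taskset" to ensure that in the first loop, each physical core gets one ecm process. Without it, some cores could end up with two processes and others with none.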
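The first loop only gives every physical core exactly one process if logical CPUs 2k and 2k+1 are SMT siblings, and that numbering varies between systems (on some, the siblings are N and N+16 instead). On Linux, each `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` file lists the logical CPUs that share a core with cpuN. A sketch of picking one CPU per core from those files; `one_cpu_per_core` is a hypothetical helper, not part of the original script:

```shell
#!/bin/bash
# one_cpu_per_core [SYSFS_CPU_DIR]
# Hypothetical helper: prints one logical CPU id per physical core (the
# lowest-numbered SMT sibling), read from the Linux sysfs topology files.
one_cpu_per_core() {
    local dir=${1:-/sys/devices/system/cpu}
    local f siblings
    for f in "$dir"/cpu[0-9]*/topology/thread_siblings_list; do
        [ -r "$f" ] || continue       # glob matched nothing or file missing
        siblings=$(<"$f")             # e.g. "0,16" or "0-1"
        echo "${siblings%%[,-]*}"     # first (lowest) CPU id in the list
    done | sort -n -u | paste -s -d ' ' -
}

# Possible use in the benchmark script, in place of $(seq 0 2 31):
#   for i in $(one_cpu_per_core); do launch_ecm "$i"; done
```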
#3
"Curtis"
Feb 2005
Riverside, CA
5²×11×17 Posts
GMP-ECM is quite kind to HT, yes. It's also rather easy on memory bandwidth, so it makes a nice hyperthread partner for LLR or P95 work.
mprime's stage 1 ECM uses FFT arithmetic, like the LL test, so it is not as memory-friendly. This is another reason why, for workloads involving hyperthreads, it may be faster to run GMP-ECM rather than P95-ECM on these small Mersenne composites.
#4
"Oliver"
Sep 2017
Porta Westfalica, DE
3·5·29 Posts
You may want to test with the "known factors format":
Quote:
Code:
ECM2=1,2,1489,-1,1000000000,0,1,"71473,27201739919,51028917464688167,13822844053570368983,122163266112900081138309323835006063277267764895871,95909518295775374166321292697000685895150503357477127"
According to undoc.txt, MaxStage0Prime does not change the behavior of ECM's first stage. But maybe the same optimization as in P-1 could be applied here as well?
Last fiddled with by kruoli on 2020-12-15 at 14:54. Reason: Too many brackets.
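The gmp-ecm input in the benchmark script and the ECM2 line above are built from the same list of known factors. A sketch of generating both from one list so they cannot drift apart; `gmp_ecm_expr` and `ecm2_line` are hypothetical helpers, and the ECM2 field order (k,b,n,c, B1, B2, curves, then the quoted factor list) is simply copied from the example line rather than taken from Prime95's documentation:

```shell
#!/bin/bash
# Hypothetical helpers: build the gmp-ecm input expression and an
# mprime/Prime95 ECM2 worktodo line from the same list of known factors.

# gmp_ecm_expr NUMBER FACTOR...  ->  "(NUMBER) / f1 / f2 / ..."
gmp_ecm_expr() {
    local out="($1)" f
    shift
    for f in "$@"; do
        out+=" / $f"
    done
    echo "$out"
}

# ecm2_line K,B,N,C B1 B2 CURVES FACTOR...
# Field order copied from the ECM2 example; factors are comma-joined and quoted.
ecm2_line() {
    local kbnc=$1 b1=$2 b2=$3 curves=$4
    shift 4
    local IFS=,    # makes "$*" join the remaining arguments with commas
    echo "ECM2=$kbnc,$b1,$b2,$curves,\"$*\""
}

factors=(71473 27201739919 51028917464688167 13822844053570368983
    122163266112900081138309323835006063277267764895871
    95909518295775374166321292697000685895150503357477127)

gmp_ecm_expr '2^1489-1' "${factors[@]}"               # pipe this into ecm
ecm2_line 1,2,1489,-1 1000000000 0 1 "${factors[@]}"  # worktodo entry
```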