2020-12-14, 23:18   #1
gmp-ecm step 1 performance with hyperthreading

While there's some good information about gmp-ecm step 1 performance on a single core, and many posts about running multiple gmp-ecm instances, I could not find any information about whether it benefits from CPU threads (a.k.a. SMT, a.k.a. Hyperthreading). So I decided to test it myself.

I tried out how gmp-ecm stage 1 performs when using all physical CPU cores compared to using all threads of the CPU. Specifically, I checked how stage 1 performs with B1=1e8 for M1489. I chose these parameters because
  • stage 1 takes much more time than stage 2, so its performance is more relevant
  • B1=1e8 finishes reasonably quickly but still runs long enough to give reliable timings
  • M1489 has so many known factors that <1000 bits remain to be factored
The surprising result was that using all CPU-threads almost doubled the throughput! Time needed for 1 curve for each gmp-ecm process:

AMD Ryzen 9 3950X
16 cores 570 seconds
32 threads 589 seconds
throughput change: +91%

Intel Core i7 6600U
2 cores 858 seconds
4 threads 984 seconds
throughput change: +74%
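
The throughput figures follow from the per-curve times: N processes that each need t seconds per curve complete N/t curves per second in total. A minimal sketch of that arithmetic, assuming awk is available (`throughput_gain` is a hypothetical helper name, not part of gmp-ecm):

```shell
#!/usr/bin/env bash
# throughput_gain PROCS_A SECS_A PROCS_B SECS_B
# Prints the percentage throughput change of setup B relative to setup A,
# where each of PROCS_x processes needs SECS_x seconds per curve.
# (Hypothetical helper for illustration only.)
throughput_gain() {
    awk -v na="$1" -v ta="$2" -v nb="$3" -v tb="$4" \
        'BEGIN { printf "%.1f\n", ((nb / tb) / (na / ta) - 1) * 100 }'
}

# Intel Core i7 6600U figures from above: 2 cores at 858 s vs. 4 threads at 984 s
throughput_gain 2 858 4 984   # prints 74.4
```

The slightly longer time per curve is far outweighed by running twice as many processes at once.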

I also cross-tested against mprime, which is said to be faster for stage 1. Interestingly, mprime working on M1489 with one thread and B1=1e9 on the Ryzen takes ~6400s, compared to the ~5850s with gmp-ecm. I assume this happens because mprime does not use the known factors when doing ECM. So for this particular number, gmp-ecm is actually a bit faster in step 1 than mprime.
nordi
2020-12-14, 23:22   #2

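For reference, the benchmark was performed on Linux like this:

function launch_ecm {
    # $1 = logical CPU to pin this ecm process to; output is appended to log_$1
    echo '(2^1489-1) / 71473 / 27201739919 / 51028917464688167 / 13822844053570368983 / 122163266112900081138309323835006063277267764895871 / 95909518295775374166321292697000685895150503357477127' | \
    time taskset --cpu-list "$1" ecm -v -modmuln 1e8 0 >> "log_$1" &
}

# First run: one ecm process on every second logical CPU
for i in $(seq 0 2 31); do
    launch_ecm $i
done
wait    # let the first batch finish so the two runs do not overlap

echo -e '\n\n\n\n\n\n'
sleep 30

# Second run: one ecm process on every logical CPU
for i in $(seq 0 31); do
    launch_ecm $i
done
wait

Replace the "31" in the for-loops with your thread count minus 1.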

I'm using "taskset" to ensure that in the first loop, each physical core gets one ecm process. Without that, some cores would get 2, some would get none.
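
The `seq 0 2 31` loop assumes that logical CPUs 2k and 2k+1 are the two threads of one physical core. Linux does not guarantee that numbering; on some machines the siblings are numbered k and k+N/2 instead. A minimal sketch for picking one logical CPU per core from sysfs, assuming the standard `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` files are present (`one_cpu_per_core` and its optional root-directory argument are hypothetical names used only for illustration):

```shell
#!/usr/bin/env bash
# Print the lowest-numbered logical CPU of every physical core, one per line.
# Assumes the Linux sysfs CPU topology layout. The optional first argument
# (hypothetical, mainly for testing) overrides the sysfs CPU directory.
one_cpu_per_core() {
    local root="${1:-/sys/devices/system/cpu}"
    local f
    for f in "$root"/cpu[0-9]*/topology/thread_siblings_list; do
        [ -r "$f" ] || continue
        # A siblings list looks like "0,16" or "0-1"; keep only the first CPU.
        sed 's/[,-].*//' "$f"
    done | sort -n -u
}
```

With that list, the first loop could become `for i in $(one_cpu_per_core); do launch_ecm $i; done`, which pins one process per core regardless of how the kernel numbers the siblings.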
nordi
2020-12-15, 01:07   #3

GMP-ECM is quite kind to HT, yes. It's also rather kind to memory bandwidth, so it makes a nice hyperthread partner to LLR or P95 work.
mprime's stage 1 ECM uses FFT arithmetic, similar to the LL test, so it's not as memory-kind. This is another reason that, for workloads involving hyperthreads, it may be faster to run GMP-ECM rather than P95-ECM on these small Mersenne composites.
VBCurtis
2020-12-15, 14:54   #4

You may want to test with the "known factors format":
Originally Posted by readme.txt
That would result in:
You should also have a look at an exponent with fewer known factors. It could change the outcome of GMP-ECM vs. Prime95.

According to undoc.txt, MaxStage0Prime does not change the behavior of ECM's first stage. But maybe the same optimization as in P-1 could be applied here as well?

Last fiddled with by kruoli on 2020-12-15 at 14:54 Reason: Too much brackets.
kruoli
2021-11-07, 23:14   #5

I also benchmarked Step 2 on my AMD Ryzen 9 3950X, using M1217 and B2=1e13, to answer two questions:
  1. does it make sense to run Step 2 on every CPU thread?
  2. does it make sense to run Step 1 and Step 2 in parallel on a physical core, using its two threads?

For question 1, I got
16 physical cores with Step 2: 357.5 seconds per curve
32 CPU threads with Step 2: 631.5 seconds per curve
throughput: 357.5/631.5*2 = 113.2%
which is 13.2% more throughput.

For question 2, I got
Step 2 while Step 1 is running: 611.8 seconds
Step 2 throughput: 357.5/611.8 = 58.4%
Step 1 while Step 2 is running: 599.6 seconds
Step 1 without Step 2 running: 354.0 seconds
Step 1 throughput: 354.0/599.6 = 59.0%
overall throughput: 58.4% + 59.0% = 117.4%
which is 17.4% more throughput.
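
The arithmetic for question 2 can be reproduced as follows. This is only a sketch, assuming awk is available; `shared_core_throughput` is a hypothetical helper name. Each step's share is its time when it runs alone divided by its time when it shares the core, and the two shares are added:

```shell
#!/usr/bin/env bash
# shared_core_throughput S2_ALONE S2_SHARED S1_ALONE S1_SHARED
# All arguments are seconds per curve. Prints the Step 2 share, the Step 1
# share and their sum, each as a percentage of the rate when running alone.
# (Hypothetical helper for illustration only.)
shared_core_throughput() {
    awk -v a2="$1" -v s2="$2" -v a1="$3" -v s1="$4" 'BEGIN {
        p2 = a2 / s2 * 100
        p1 = a1 / s1 * 100
        printf "%.1f %.1f %.1f\n", p2, p1, p2 + p1
    }'
}

shared_core_throughput 357.5 611.8 354.0 599.6   # prints 58.4 59.0 117.5
```

The total comes out 0.1 higher than the 117.4% above because it sums the unrounded shares rather than the rounded ones.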

The additional throughput is not as significant as for step 1 and comes at the expense of either doubled RAM requirements (case 1) or a longer time during which the RAM is used (case 2). But if you have enough RAM, it makes sense to use all CPU threads.
nordi

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.