mersenneforum.org gmp-ecm step 1 performance with hyperthreading

 2020-12-14, 23:22 #2 nordi   Dec 2016 69₁₆ Posts

For reference, the benchmark was performed on Linux like this:
Code:
#!/bin/bash

# Run one ecm process pinned to the logical CPU given as $1.
function launch_ecm {
    echo '(2^1489-1) / 71473 / 27201739919 / 51028917464688167 / 13822844053570368983 / 122163266112900081138309323835006063277267764895871 / 95909518295775374166321292697000685895150503357477127' | \
    time taskset --cpu-list "$1" ecm -v -modmuln 1e8 0 >> "log_$1" &
}

# First run: one process on every second logical CPU (one per physical core).
for i in $(seq 0 2 31); do
    launch_ecm $i
done
wait

echo -e '\n\n\n\n\n\n'
sleep 30

# Second run: one process on every logical CPU.
for i in $(seq 0 31); do
    launch_ecm $i
done
wait

Replace the "31" in the for-loops with your thread count minus 1. I'm using "taskset" to ensure that in the first loop, each physical core gets exactly one ecm process. Without that, some cores would get 2 and some would get none.
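The first loop above relies on logical CPUs 2k and 2k+1 being siblings on the same physical core, which is not how every machine numbers its threads. A minimal sketch for reading the layout from sysfs instead, assuming a Linux kernel that exposes `topology/thread_siblings_list` per CPU; the function name and its optional root-directory argument are hypothetical and not part of the original benchmark:

```shell
# Print the lowest-numbered logical CPU of each physical core, one per line.
# Assumption: each cpuN directory has topology/thread_siblings_list
# containing something like "0,16" or "0-1".
first_thread_per_core() {
    local root="${1:-/sys/devices/system/cpu}"
    local f
    for f in "$root"/cpu[0-9]*/topology/thread_siblings_list; do
        [ -r "$f" ] || continue
        # Keep only the first id in the sibling list.
        sed 's/[,-].*//' "$f"
    done | sort -n -u
}
```

With that, the first loop could be written as `for i in $(first_thread_per_core); do launch_ecm $i; done`, which should give one process per physical core regardless of how the siblings are numbered.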
 2020-12-15, 01:07 #3 VBCurtis     "Curtis" Feb 2005 Riverside, CA 1010000101110₂ Posts

GMP-ECM is quite kind to HT, yes. It's also rather kind to memory bandwidth, so it makes a nice hyperthread partner to LLR or P95 work. mprime's stage 1 ECM uses FFT arithmetic, similar to the LL test, so it's not as memory-kind. This is another reason that, for workloads involving hyperthreads, it may be faster to run GMP-ECM rather than P95-ECM on these small Mersenne composites.

2020-12-15, 14:54   #4
kruoli

"Oliver"
Sep 2017
Porta Westfalica, DE

2×3²×47 Posts

You may want to test with the "known factors format":
Quote:
That would result in:
Code:
ECM2=1,2,1489,-1,1000000000,0,1,"71473,27201739919,51028917464688167,13822844053570368983,122163266112900081138309323835006063277267764895871,95909518295775374166321292697000685895150503357477127"
You should also have a look at an exponent with fewer known factors. It could change the outcome of GMP-ECM vs. Prime95.
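
To try other exponents, the line can be generated rather than typed by hand. A rough sketch, assuming the field order is k,b,n,c,B1,B2,curves,"factors" as read off the example above (check undoc.txt before relying on it); the helper name `ecm2_line` is made up for illustration:

```shell
# Build an ECM2 worktodo line for 2^n-1 with a list of known factors.
# Usage: ecm2_line <n> <B1> <curves> <factor>...
# Assumption: B2 is left at 0, as in the example above.
ecm2_line() {
    local n="$1" b1="$2" curves="$3"
    shift 3
    local IFS=,   # makes "$*" join the remaining arguments with commas
    printf 'ECM2=1,2,%s,-1,%s,0,%s,"%s"\n' "$n" "$b1" "$curves" "$*"
}
```

Calling `ecm2_line 1489 1000000000 1` followed by the six factors should reproduce the line above.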

According to undoc.txt, MaxStage0Prime does not change the behavior of ECM's first stage. But maybe the same optimization as in P-1 could be applied here as well?

Last fiddled with by kruoli on 2020-12-15 at 14:54 Reason: Too much brackets.

 2021-11-07, 23:14 #5 nordi   Dec 2016 3×5×7 Posts

I also benchmarked Step 2 on my AMD Ryzen 9 3950X, using M1217 and B2=1e13, to answer two questions:
1. Does it make sense to run Step 2 on every CPU thread?
2. Does it make sense to run Step 1 and Step 2 in parallel on a physical core, using its two threads?

For question 1, I got
  16 physical cores with Step 2: 357.5 seconds per curve
  32 CPU threads with Step 2: 631.5 seconds per curve
  throughput: 357.5/631.5*2 = 113.2%
which is 13.2% more throughput.

For question 2, I got
  Step 2 takes 611.8 seconds
  Step 2 throughput: 357.5/611.8 = 58.4%
  Step 1 while Step 2 is running: 599.6 seconds
  Step 1 without Step 2 running: 354.0 seconds
  Step 1 throughput: 354.0/599.6 = 59.0%
  overall throughput: 58.4% + 59.0% = 117.4%
which is 17.4% more throughput.

The additional throughput is not as significant as for Step 1 and comes at the expense of either doubled RAM requirements (case 1) or a longer time during which the RAM is in use (case 2). But if you have enough RAM, it makes sense to use all CPU threads.
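
The throughput figures above all follow the same pattern: baseline time divided by the new time, times the number of jobs that now share a core. A small sketch for redoing the arithmetic with your own timings, assuming `awk` is available; the function name `rel_throughput` is hypothetical:

```shell
# Relative throughput in percent: (t_base / t_new) * factor * 100.
# factor is the number of jobs sharing one physical core (default 1).
rel_throughput() {
    awk -v base="$1" -v new="$2" -v factor="${3:-1}" \
        'BEGIN { printf "%.1f\n", base / new * factor * 100 }'
}

rel_throughput 357.5 631.5 2   # question 1: prints 113.2
rel_throughput 357.5 611.8     # question 2, Step 2 share: prints 58.4
rel_throughput 354.0 599.6     # question 2, Step 1 share: prints 59.0
```

Anything above 100% means the extra threads pay off; for question 2 the two shares are added because both jobs run on the same core at the same time.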
