Old 2021-11-07, 23:14   #5
nordi
 

I also benchmarked Step 2 on my AMD Ryzen 9 3950X, using M1217 with B2=1e13, to answer two questions:
  1. Does it make sense to run Step 2 on every CPU thread?
  2. Does it make sense to run Step 1 and Step 2 in parallel on a physical core, using its two threads?

For question 1, I got
Step 2 on 16 physical cores (one per core): 357.5 seconds per curve
Step 2 on 32 CPU threads (two per core): 631.5 seconds per curve
relative throughput: 357.5/631.5*2 = 113.2%
which is 13.2% more throughput.
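
The arithmetic above can be sketched in a few lines of Python. This is only an illustration, assuming every worker processes one curve at a time; the function name `relative_throughput` and its parameters are made up for this example.

```python
def relative_throughput(base_workers, base_seconds, new_workers, new_seconds):
    """Throughput of the new setup relative to the base setup.

    Throughput is curves per second: workers / seconds_per_curve,
    assuming every worker processes one curve at a time.
    """
    base_rate = base_workers / base_seconds
    new_rate = new_workers / new_seconds
    return new_rate / base_rate


# Timings from the post: 16 workers at 357.5 s vs. 32 workers at 631.5 s.
gain = relative_throughput(16, 357.5, 32, 631.5)
print(f"{gain:.1%}")
```

A result above 100% means the doubled worker count more than makes up for the slower individual curves.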


For question 2, I got
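Step 2 while Step 1 is running: 611.8 seconds
Step 2 without Step 1 running: 357.5 seconds (from question 1)
Step 2 throughput: 357.5/611.8 = 58.4%
Step 1 while Step 2 is running: 599.6 seconds
Step 1 without Step 2 running: 354.0 seconds
Step 1 throughput: 354.0/599.6 = 59.0%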
overall throughput: 58.4% + 59.0% = 117.4%
which is 17.4% more throughput.
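
The same calculation as a small Python sketch, assuming the overall throughput is simply the sum of the fraction of solo speed that each step retains when sharing a core. The name `shared_core_throughput` and its parameters are hypothetical, chosen only for this example.

```python
def shared_core_throughput(step1_alone, step1_shared, step2_alone, step2_shared):
    """Combined throughput of a core running Step 1 and Step 2 on its two
    threads, relative to the core running a single step by itself.

    Each step contributes the fraction of its solo speed that it keeps
    while sharing the core: seconds_alone / seconds_shared.
    """
    step1_fraction = step1_alone / step1_shared
    step2_fraction = step2_alone / step2_shared
    return step1_fraction + step2_fraction


# Timings (in seconds) from the post.
total = shared_core_throughput(354.0, 599.6, 357.5, 611.8)
print(f"{total:.4f}")
```

Without rounding the two fractions first, the sum comes out near 1.1747, which agrees with the 117.4% above up to rounding.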


The additional throughput is not as large as for Step 1, and it comes at the expense of either doubled RAM requirements (case 1) or a longer time during which the RAM is in use (case 2). But if you have enough RAM, it makes sense to use all CPU threads.