#1 |
"Sam"
Nov 2016
5·67 Posts |
I know there's a way to run several LLR instances, each assigned to its own designated CPU core, so that testing runs significantly faster than with only one instance.
I am using a 4-core, 8-thread CPU. In the attachment I sent, one instance of LLR is running with only one thread, and the time per bit is 0.576 ms. Its CPU affinity is set to 0. After terminating the program, I copy the LLR executable to another directory and run a test on a number of similar size to the first run (one thread), with CPU affinity set to 1. When I check on the first run, I notice the time per bit has increased to 1.172 ms, almost twice that of running only one LLR application! No speedup whatsoever. My goal is to run 4 instances of LLR, each about as fast as a lone single-threaded instance (4 instances each running at close to 0.576 ms per bit, so that testing is 4x faster). Does anyone know what I am doing wrong here? I am aware that running a single instance with 8 threads is less productive than running 4 single-threaded instances, but I have never figured out how to achieve the latter. Thanks for the help! |
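To put the two timings side by side, here is a rough sketch of the arithmetic. The 0.576 ms and 1.172 ms figures are the ones quoted above; the helper name is made up for illustration, and the sketch assumes both instances slow to 1.172 ms per bit, which the post only reports for the first one.

```python
# Aggregate throughput from per-instance iteration times.
# Timings are taken from the post; aggregate_bits_per_ms is a hypothetical helper.

def aggregate_bits_per_ms(instances: int, ms_per_bit: float) -> float:
    """Total bits processed per millisecond across all instances."""
    return instances / ms_per_bit

one = aggregate_bits_per_ms(1, 0.576)     # a single instance running alone
two = aggregate_bits_per_ms(2, 1.172)     # ASSUMPTION: both instances run at 1.172 ms/bit
target = aggregate_bits_per_ms(4, 0.576)  # the goal: four instances at full speed

print(f"1 instance : {one:.3f} bits/ms")
print(f"2 instances: {two:.3f} bits/ms ({two / one:.2f}x)")
print(f"goal       : {target:.3f} bits/ms ({target / one:.2f}x)")
```

Under that assumption, two instances together do slightly less work per millisecond than one instance alone, which is why the second copy brings no gain here.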
#2 |
Sep 2002
Database er0rr
11076₈ Posts |
Running only one instance gives it all the cache to itself, so it will run quicker than two instances, which contend for cache. On a 4c/8t box I run one instance with the -t4 option. I think this approach is more cache-friendly.
#3 |
"Curtis"
Feb 2005
Riverside, CA
2×3²×17×19 Posts |
In Windows, are cores 0 and 1 hyperthreads of the same physical core? That would explain your timing exactly doubling.
What happens when you assign the second LLR copy to core 2 rather than 1? Have you tried not assigning affinity? I've had decent luck just letting Windows utilize the cores; manually assigning affinity does help sometimes, but for this use case I'm not sure it matters for you. |
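If logical CPUs 0 and 1 really are hyperthread siblings, picking one logical CPU per physical core comes down to simple index and bitmask arithmetic. The sketch below assumes siblings are numbered adjacently (0/1, 2/3, ...), which is common but not guaranteed, so check the actual topology first. Both function names are hypothetical. The hexadecimal value is the kind of mask that `start /affinity` in cmd.exe takes.

```python
def affinity_mask(logical_cpus):
    """Return a bitmask with one bit set per logical CPU index."""
    mask = 0
    for cpu in logical_cpus:
        mask |= 1 << cpu
    return mask

def one_per_physical_core(physical_cores, threads_per_core=2):
    """First logical CPU of each physical core.

    ASSUMPTION: hyperthread siblings are numbered adjacently.
    """
    return [core * threads_per_core for core in range(physical_cores)]

# One single-CPU mask per instance on a 4-core/8-thread machine.
for cpu in one_per_physical_core(4):
    print(f"logical CPU {cpu}: mask {affinity_mask([cpu]):X} (hex)")
```

On a 4c/8t machine with that numbering, the instances would go on logical CPUs 0, 2, 4 and 6.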
#4 |
"Sam"
Nov 2016
5×67 Posts |
Thanks for the suggestions! I ran 2 instances of LLR at the same time, assigning affinity to CPUs 0 and 2.
The time increased by about 0.120 ms, which I guess makes sense given that more active cores means a lower clock speed. I then loaded up 4 instances running on CPUs 0, 2, 4, 6 and the time per bit almost doubled (a 0.380 ms increase). I think Paul is right: running four threads on one instance seems to be faster than running 4 single-threaded instances. I would think that with a larger number of cores, say 12 or 16, the latter might become slower? |
#5 |
Sep 2002
Database er0rr
4670₁₀ Posts |
I don't know about 12 core chips running LLR, but generally it makes sense to run 1 instance per chip or chiplet.
#6 | |
"Curtis"
Feb 2005
Riverside, CA
2·3²·17·19 Posts |
Quote:
Once the FFT size reaches 256K, 2-threaded runs work pretty well. OP: I've run LLR on numbers of this size on prebuilt machines with slow 2-channel memory, and running 3 instances was just about as fast as 4 but generated quite a bit less heat. That is, 3 instances are enough to saturate the memory on some quad-core machines. It takes some experimenting with threads per process and number of processes to find the sweet spot! |
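For that kind of experimenting, it can help to list the combinations worth timing before starting. This is only a sketch: the function name is hypothetical, and it assumes you want to stay at or below one thread per physical core. Each pair would then be benchmarked by hand and ranked by processes divided by time per bit.

```python
def candidate_configs(physical_cores):
    """Yield (processes, threads_per_process) pairs that use at most
    physical_cores threads in total."""
    for processes in range(1, physical_cores + 1):
        for threads in range(1, physical_cores // processes + 1):
            yield processes, threads

# Configurations to try on a quad-core machine.
for processes, threads in candidate_configs(4):
    print(f"{processes} process(es) x {threads} thread(s)")
```

For a quad-core this includes 3 processes with 1 thread each, the setup described above as nearly as fast as 4 on memory-limited machines.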
#7 | |
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
2²×3×7×73 Posts |
Quote:
#8 |
"Mark"
Apr 2003
Between here and the
13·557 Posts |
Back to the original question. With the new Intel CPUs, Windows seems to run llr on the efficiency cores by default, not the performance cores. I would like to run llr only on the performance cores. Note that I am using PRPNet, so llr is run once for each PRP/primality test. PRPNet sets Affinity= in the llr.ini file, but that setting is not being respected.
#9 |
"Oliver"
Sep 2017
Porta Westfalica, DE
7×223 Posts |
For this problem, I have written a simple program. It is attached with source code (since the executable must be started as administrator).
#10 |
"Mark"
Apr 2003
Between here and the
1C49₁₆ Posts |
That isn't helpful because I cannot run that every time I start llr. llr should respect the affinity, but maybe it requires llr to be run as administrator to set affinity.
#11 |
"Oliver"
Sep 2017
Porta Westfalica, DE
7×223 Posts |
I definitely concur with your request that LLR should respect the setting by itself. If a process wants to limit its own affinity, it can do so without further privileges (as of Windows 10, at least).
My program only needs to be started once. It monitors all started processes and applies affinity in accordance with the parameters it was started with. |
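As an illustration of a process limiting its own affinity, here is a minimal sketch. It is not LLR's code and not the attached program. On Windows it calls the Win32 functions `GetCurrentProcess` and `SetProcessAffinityMask` through `ctypes`; elsewhere it falls back to `os.sched_setaffinity`, which exists on Linux. The function name is hypothetical, and which logical CPU indices are the performance cores on a given hybrid chip is an assumption you would have to check yourself.

```python
import ctypes
import os
import sys

def pin_current_process(logical_cpus):
    """Restrict the calling process to the given logical CPU indices."""
    cpus = set(logical_cpus)
    if sys.platform == "win32":
        from ctypes import wintypes
        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.GetCurrentProcess.restype = wintypes.HANDLE
        kernel32.SetProcessAffinityMask.argtypes = [wintypes.HANDLE, ctypes.c_size_t]
        kernel32.SetProcessAffinityMask.restype = wintypes.BOOL
        # One bit per logical CPU; this call covers a single processor group.
        mask = sum(1 << cpu for cpu in cpus)
        if not kernel32.SetProcessAffinityMask(kernel32.GetCurrentProcess(), mask):
            raise ctypes.WinError(ctypes.get_last_error())
    else:
        # Linux: pid 0 means the calling process.
        os.sched_setaffinity(0, cpus)
```

A call such as `pin_current_process([0, 2, 4, 6])` would keep the process on those four logical CPUs, assuming they exist and the process is allowed to use them.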