mersenneforum.org  

Old 2020-10-03, 21:16   #1
carpetpool
 
"Sam"
Nov 2016

13E₁₆ Posts
LLR Affinity Problem

I know there's a way to run different LLR instances and have them assigned to different designated CPUs, making the testing run significantly faster than if only one instance were used.

I am using a 4 core, 8 thread CPU. In the attachment I sent, one instance of LLR is running with only one thread, and time per bit is 0.576 ms. The CPU affinity is set to 0.

After terminating the program, I copy the LLR executable to another directory and run a test on a number of similar size to the first run (one thread). The CPU affinity is set to 1.

I check on the first run and notice the time per bit has increased to 1.172 ms, almost twice what it was with only one LLR instance running! No speedup whatsoever.

My goal is to run 4 instances of LLR, each with about the same time per bit as a single instance running single threaded (4 instances each running at close to 0.576 ms per bit, so that testing is 4x faster). Does anyone know what I am doing wrong here?

I am aware that running a single instance with 8 threads is less productive than running 4 single threaded instances and for some reason I never figured out how to achieve the latter.

Thanks for the help!
Attachment: LLR_GUI_affinity.PNG (21.2 KB)
Old 2020-10-03, 21:38   #2
paulunderwood
 
Sep 2002
Database er0rr

2×17×103 Posts

Running only one instance has all the cache to itself and will run quicker than running two instances, where there will be contention for cache. On a 4c/8t box I run one instance with the -t4 option. I think this approach is cache friendlier.
Old 2020-10-03, 22:06   #3
VBCurtis
 
"Curtis"
Feb 2005
Riverside, CA

2²·19·59 Posts

In Windows, are cores 0 and 1 hyperthreads of the same physical core? That would explain your timing exactly doubling.

What happens when you assign the second LLR copy to core 2 rather than 1?

Have you tried not assigning affinity? I've had decent luck just letting Windows utilize the cores. Manually assigning affinity does help sometimes, but for this use case I'm not sure it matters for you.
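
To make the core-numbering question concrete, here is a minimal Python sketch of which logical CPUs to pin to. It assumes (nothing in the thread confirms this) that Windows numbers the two hyperthreads of each physical core adjacently, so logical CPUs 0 and 1 share a core, 2 and 3 share the next, and so on. The helper names are made up for illustration, and `cllr64.exe` is only a placeholder for whatever the LLR binary is called on your machine.

```python
def one_logical_cpu_per_core(physical_cores, threads_per_core=2):
    """Return the first logical CPU of each physical core.

    Assumes sibling hyperthreads are numbered adjacently (0/1, 2/3, ...),
    which should be verified on the actual machine.
    """
    return [core * threads_per_core for core in range(physical_cores)]


def affinity_mask(logical_cpu):
    """Hexadecimal bitmask with only the given logical CPU's bit set."""
    return format(1 << logical_cpu, "X")


cpus = one_logical_cpu_per_core(physical_cores=4)
for cpu in cpus:
    # Windows' `start /affinity` takes a hexadecimal mask.
    # The executable name below is a placeholder, not taken from the thread.
    print(f"logical CPU {cpu}: start /affinity {affinity_mask(cpu)} cllr64.exe")
```

If the numbering assumption holds, pinning two instances to CPUs 0 and 1 puts both on one physical core, while 0 and 2 puts them on separate cores.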
Old 2020-10-04, 22:39   #4
carpetpool
 
"Sam"
Nov 2016

2·3·53 Posts

Thanks for the suggestions! I ran 2 instances of LLR, assigning affinity to CPUs 0 and 2.

The time per bit increased by about 0.120 ms, which I guess makes sense given that more active cores means a lower clock speed.

I loaded up 4 instances running on CPUs 0, 2, 4, 6 and the time per bit almost doubled (a 0.380 ms increase).

I think Paul is right: running four threads on one instance seems to be faster than running 4 instances single threaded.

I would think that with a larger number of cores, say 12 or 16, the latter might become slower?
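
For comparing these runs, combined throughput is more telling than the time per bit of any single instance. A rough Python sketch, assuming each quoted increase is measured against the 0.576 ms single-instance baseline and that every instance in a run sees the same time per bit (neither is stated outright in the posts):

```python
BASELINE_MS = 0.576  # one single-threaded instance, from the first post

# (label, number of instances, ms per bit for each instance)
runs = [
    ("1 instance", 1, BASELINE_MS),
    ("2 instances, CPUs 0 and 1", 2, 1.172),
    ("2 instances, CPUs 0 and 2", 2, BASELINE_MS + 0.120),
    ("4 instances, CPUs 0, 2, 4, 6", 4, BASELINE_MS + 0.380),
]


def throughput(instances, ms_per_bit):
    """Combined bits processed per millisecond across all instances."""
    return instances / ms_per_bit


base = throughput(1, BASELINE_MS)
for label, n, ms in runs:
    rate = throughput(n, ms)
    print(f"{label}: {ms:.3f} ms/bit each, "
          f"{rate / base:.2f}x the single-instance throughput")
```

Under those assumptions the four-instance run works out to roughly 2.4 times the throughput of one instance, so a timing for a single -t4 instance on the same number would be needed before deciding which layout is faster.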
Old 2020-10-04, 23:30   #5
paulunderwood
 
Sep 2002
Database er0rr

2×17×103 Posts

I don't know about 12 core chips running LLR, but generally it makes sense to run 1 instance per chip or chiplet.
Old 2020-10-05, 05:32   #6
VBCurtis
 
"Curtis"
Feb 2005
Riverside, CA

2²·19·59 Posts

Quote:
Originally Posted by paulunderwood
I don't know about 12 core chips running LLR, but generally it makes sense to run 1 instance per chip or chiplet.
My experience, mostly on Haswell-era desktops, is that LLR doesn't benefit much from splitting small FFTs onto multiple threads. 128K per thread seems to be a good cutoff, so for OP's example 192K FFT, I doubt running two 2-threaded instances would be faster than four 1-threaded.

Once FFT reaches 256K, 2-threaded runs work pretty well.

OP: I've run LLR on this size of number on prebuilt machines with slow 2-channel memory, and running 3 instances was just about as fast as 4 but generated quite a bit less heat. That is, 3 is enough to saturate the memory on some quad-core machines. It takes some experimenting with threads-per-process and number of processes to find the sweet spot!
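
That experimenting can be narrowed down a little beforehand. A minimal Python sketch, treating the 128K-per-thread figure as a rule of thumb rather than a hard limit, and assuming one thread per physical core on a 4-core machine; the function names are invented for illustration:

```python
def max_useful_threads(fft_k, cutoff_k=128):
    """Threads per instance before each thread's share drops below the cutoff."""
    return max(1, fft_k // cutoff_k)


def candidate_layouts(cores, fft_k, cutoff_k=128):
    """List (instances, threads_per_instance) pairs worth benchmarking.

    Layouts that leave cores idle are included on purpose, since fewer
    instances may already saturate memory bandwidth.
    """
    limit = max_useful_threads(fft_k, cutoff_k)
    layouts = []
    for threads in range(1, min(limit, cores) + 1):
        for instances in range(1, cores // threads + 1):
            layouts.append((instances, threads))
    return layouts


for fft_k in (192, 256):
    print(f"{fft_k}K FFT:", candidate_layouts(cores=4, fft_k=fft_k))
```

The list only says which combinations to time; the actual ms per bit for each one still has to be measured on the machine in question.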
Old 2020-10-05, 08:27   #7
henryzz
Just call me Henry
 
"David"
Sep 2007
Cambridge (GMT/BST)

3⁴·71 Posts

Quote:
Originally Posted by VBCurtis
OP- I've run LLR on this size of number on prebuilt machines with slow 2-channel memory, and running 3 instances was just about as fast as 4 but generated quite a bit less heat. That is, 3 is enough to saturate the memory on some quad-core machines. It takes some experimenting with threads-per-process and number of processes to find the sweet spot!
Better still might be to reduce the cpu speed to match the memory throughput and still use all cores. Generally lower speeds need less power/cycle. Experimentation might be needed there.
All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.