mersenneforum.org  

Old 2008-10-30, 23:42   #1
dan3ny
 
Oct 2008

416 Posts
v25.7 not using much cpu

I just upgraded from v24 to 25.7, build 3.

Running an Intel T2500 @ 2 GHz, 2 cores, WinXP SP2.

Upgraded from 2 instances of p95 to one, with 2 worker threads. Where my old instances used to keep the CPU pegged near 100% between the 2 of them, the new version isn't using very much at all. I found that I had a Throttle command in my prime95.txt from before, but I removed it.

Why isn't p95 using my CPU??
Old 2008-10-31, 01:57   #2
Uncwilly
6809 > 6502
 
Aug 2003
101×103 Posts

278716 Posts

What numbers (range) and type of work are you doing?
Are both workers running on 1 number, or is each doing its own?
I found that 2 workers factoring a single number have ~55% of the throughput of 2 workers factoring 2 different numbers.
Old 2008-10-31, 06:20   #3
Oleg V.Cat
 
Oct 2008
Riga, Latvia

11 Posts

Quote:
Originally Posted by Uncwilly
Are both workers running on 1 number, or is each doing its own?
I found that 2 workers factoring a single number have ~55% of the throughput of 2 workers factoring 2 different numbers.

I can say that sometimes "Smart affinity" is very stupid. I have no problems on a single dual-core CPU, but, for example, on 4*XeonMP, if I set "4 workers, 2 threads each", two of the workers go on CPUs 0,4 and two on CPUs 1,5, so I have only 50% CPU used for 8 threads.

On another box I have 2*L5420 (quad-core), and I get the same throughput for:
4*V24 with affinity 0,3,4,6, and 2*V24+1*V25 (2 threads), doing double-checks. But if I use any other core combination (not 0,3,4,6), throughput drops by at least ~30% :(
Old 2008-11-01, 23:43   #4
dan3ny
 
Oct 2008

48 Posts

Quote:
Originally Posted by Uncwilly
What numbers (range) and type of work are you doing?
Are both workers running on 1 number, or is each doing its own?
I found that 2 workers factoring a single number have ~55% of the throughput of 2 workers factoring 2 different numbers.

Factoring 2 numbers in the 4X,XXX,XXX range, each doing their own.
Old 2008-11-02, 16:15   #5
dan3ny
 
Oct 2008

22 Posts

Hmm. It looks like the Throttle command came back or I didn't kill it completely somehow. Removed it again, and now things seem fine.
Old 2008-11-02, 19:59   #6
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

5×29×53 Posts

Quote:
Originally Posted by Oleg V.Cat
I can say that sometimes "Smart affinity" is very stupid. I have no problems on a single dual-core CPU, but, for example, on 4*XeonMP, if I set "4 workers, 2 threads each", two of the workers go on CPUs 0,4 and two on CPUs 1,5, so I have only 50% CPU used for 8 threads.

On another box I have 2*L5420 (quad-core), and I get the same throughput for:
4*V24 with affinity 0,3,4,6, and 2*V24+1*V25 (2 threads), doing double-checks. But if I use any other core combination (not 0,3,4,6), throughput drops by at least ~30% :(

You lost me completely! I'm guessing you think the 4*XeonMP should be CPU 0,1 and CPU 2,3, etc. I can't figure out what you are doing on the 2*L5420 and what you think it should be doing.
Old 2008-11-03, 08:07   #7
Oleg V.Cat
 
Oct 2008
Riga, Latvia

1110 Posts

Quote:
Originally Posted by Prime95
You lost me completely! I'm guessing you think the 4*XeonMP should be CPU 0,1 and CPU 2,3, etc. I can't figure out what you are doing on the 2*L5420 and what you think it should be doing.

No ;). 4*XeonMP should be CPU 0,2,4,6 (or 1,3,5,7) for best performance and lower CPU usage (ThreadsPerTest=1). In this case I get ~0.048 s per iteration (all timings are for LL with a 1280K FFT) at 50% CPU. If 8 workers are running, I get ~0.085 s per iteration. OK, that is a bit better than 0.048*2, but not by much, and I prefer to use only 4 workers.
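
One rough way to compare the two configurations above is to convert per-iteration time into aggregate throughput (workers divided by seconds per iteration). The Python sketch below uses only the timings quoted in this post; the function name is made up for illustration, and it assumes every worker runs at the quoted per-iteration time.

```python
def aggregate_throughput(workers: int, seconds_per_iteration: float) -> float:
    """Iterations per second summed over all workers, assuming each
    worker runs at the quoted per-iteration time."""
    return workers / seconds_per_iteration


# Timings quoted in the post (LL test, 1280K FFT).
four = aggregate_throughput(4, 0.048)   # 4 workers, ~50% CPU
eight = aggregate_throughput(8, 0.085)  # 8 workers, all logical CPUs

print(f"4 workers: {four:.1f} iter/s")
print(f"8 workers: {eight:.1f} iter/s")
print(f"gain from 8 workers: {100 * (eight / four - 1):.0f}%")
```

Under that assumption, running 8 workers yields roughly 13% more total iterations per second than running 4, which is consistent with "a bit better, but not by much".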

If I set ThreadsPerTest=2 to use all 8 logical CPUs for 4 workers, I need to set affinity to 0,1,4,5 (prime95 then selects 2,3,6,7 for the helper threads). In this case I get 100% CPU and approximately 0.047 s per iteration, so there is no advantage from the helper thread. I want to try setting the main thread to CPU0 and the helper thread to CPU1, but can't see how.

On 2*L5420 the picture is more complex :). Again, I do not know why, but if I run just one worker with ThreadsPerTest=1, I get ~0.028 s per iteration. If one (or more) workers are added, throughput goes down, but very differently depending on affinity. A 4-worker-thread combination can be as slow as ~0.055 s in a "bad" combination, or ~0.030 s in a "good" one. At the moment I have 3 workers with the following parameters:

Worker No.   Threads   CPU   Timing (s/iteration)
1            1         0     0.029
2            2         3,4   0.016
3            3         6     0.031

Any other combination gives ~0.042 s per iteration for at least 2 workers. I do not know why, "it's magic" ;)
Old 2008-11-09, 09:16   #8
Freightyard
 
Nov 2008
San Luis Obispo CA

27 Posts

The Xeon 5400 Series Quad-Core CPUs are all twin dual-core dies. Each die (or pair of cores) contains 6 MB of the total 12 MB L2 cache:

http://download.intel.com/design/xeo...pdt/318585.pdf

Quad-core desktop CPUs are similar.

Hence, threads on cores 0 and 1 compete for the same cache, but do not compete for the cache shared by cores 2 and 3.

Threads running (as you state) on cores 0/3/4/6 would all have unique cache and would not compete. But 0/2/4/6 should equally not compete.

The unknown is your OS and other processes, which may be cache- or CPU-intensive. Without knowing all these unknowns it certainly could seem like "magic". You may be able to set affinity for some of the system threads to increase performance.
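
The cache argument above can be checked mechanically. The Python sketch below is only an illustration: it assumes cores are numbered die by die, i.e. (0,1), (2,3), (4,5), (6,7) each share one L2 cache, which the OS is not guaranteed to follow. The function names are hypothetical.

```python
from collections import Counter


def l2_domain(core: int, cores_per_die: int = 2) -> int:
    """Index of the dual-core die (and its shared L2) a core belongs to,
    ASSUMING cores are numbered die by die: (0,1), (2,3), ..."""
    return core // cores_per_die


def shared_l2(cores):
    """Return the dies that host more than one of the given cores."""
    counts = Counter(l2_domain(c) for c in cores)
    return sorted(die for die, n in counts.items() if n > 1)


for combo in [(0, 3, 4, 6), (0, 2, 4, 6), (0, 1, 2, 3)]:
    print(combo, "dies with contention:", shared_l2(combo))
```

Under this numbering, both 0/3/4/6 and 0/2/4/6 place one thread per die, while 0/1/2/3 puts two threads on each of the first two dies. If measured timings disagree with that prediction, the numbering assumption itself is a likely suspect.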
Old 2008-11-21, 23:53   #9
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

170058 Posts

Quote:
Originally Posted by Oleg V.Cat
I can say that sometimes "Smart affinity" is very stupid. I have no problems on a single dual-core CPU, but, for example, on 4*XeonMP, if I set "4 workers, 2 threads each", two of the workers go on CPUs 0,4 and two on CPUs 1,5, so I have only 50% CPU used for 8 threads.

Please try this again on 25.8 when it is available. BTW, all documentation I've read says that 0,1,2,3 are your "real" CPUs and 4,5,6,7 are the matching hyper-threaded CPUs. Thus 25.8 should put your 4 workers on CPUs 0,4 and 1,5 and 2,6 and 3,7.
Old 2008-11-22, 01:45   #10
Meikel
 
Nov 2008

910 Posts

Hi George,

then I can prove your documentation wrong by trial, at least on Win Vista 64-bit and a Core i7 940, and I agree with Oleg V.Cat. Here is what I did:

First, tell Prime95 to run on any cpu. Then start an LL-Test with 4 threads.
Use Task Manager to bind Prime95 manually to CPUs.

Binding to CPUs 0,1,2,3: Best per iteration time: 11.435ms
Binding to CPUs 0,2,4,6: Best per iteration time: 6.833ms

Seems quite obvious to me, no? CPUs 0,2,4,6 are "real". Definitely. Maybe Intel uses a different numbering scheme than Microsoft? Whatever: the measurements don't lie.
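
For reference, the two bindings compared above correspond to different affinity bitmasks. Windows represents processor affinity as a mask in which bit n stands for logical CPU n, so a minimal Python sketch (the helper name `affinity_mask` is made up for illustration) can compute the masks and the speedup implied by the two timings:

```python
def affinity_mask(cpus):
    """Build an affinity bitmask with bit n set for each logical CPU n."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


first_four = affinity_mask([0, 1, 2, 3])
even_cpus = affinity_mask([0, 2, 4, 6])
print(f"CPUs 0,1,2,3 -> mask {first_four:#04x}")
print(f"CPUs 0,2,4,6 -> mask {even_cpus:#04x}")

# Best per-iteration times reported in the post, in milliseconds.
slow_ms, fast_ms = 11.435, 6.833
speedup = slow_ms / fast_ms
print(f"0,2,4,6 is about {speedup:.2f}x faster per iteration")
```

The masks come out as 0x0f and 0x55, and the reported timings imply that the 0,2,4,6 binding is about 1.67 times faster on this machine.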

IMHO best performance for a Core i7 could be reached by assigning 4 LL tests to it, starting two threads for each LL test and assigning the first test to the first real CPU-HT-pair (0,1), the second to (2,3), and so on. I also agree with Oleg on that.
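
The pairing proposed here can be set next to the one that assumes CPUs 0-3 are physical and 4-7 are their hyper-threaded siblings. The Python sketch below does that; both function names are hypothetical, and which layout matches the hardware depends on how the OS numbers logical CPUs.

```python
def adjacent_pairs(workers: int):
    """(0,1), (2,3), ...: assumes logical CPUs 2k and 2k+1 share one
    physical core, as the measurements in this post suggest."""
    return [(2 * w, 2 * w + 1) for w in range(workers)]


def offset_pairs(workers: int):
    """(0,4), (1,5), ...: assumes CPUs 0..n-1 are the physical cores and
    CPUs n..2n-1 are their hyper-threaded siblings."""
    return [(w, w + workers) for w in range(workers)]


for name, layout in [("adjacent", adjacent_pairs(4)), ("offset", offset_pairs(4))]:
    for worker, (main_cpu, helper_cpu) in enumerate(layout, start=1):
        print(f"{name}: worker {worker} -> main CPU {main_cpu}, helper CPU {helper_cpu}")
```

Only one of the two layouts keeps each worker's main and helper thread on the same physical core for a given machine, so it is worth timing both before settling on one.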

So, I'm pretty convinced that your proposed assignment strategy for 25.8 would totally NOT work well :-)

Last fiddled with by Meikel on 2008-11-22 at 01:57
Old 2008-11-22, 03:28   #11
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

170058 Posts

Quote:
Originally Posted by Meikel
So, I'm pretty convinced that your proposed assignment strategy for 25.8 would totally NOT work well :-)
Alright, we'll go with 0,1 and 2,3 and 4,5 and 6,7.

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.