mersenneforum.org  

Old 2013-01-14, 08:33   #1
Unregistered
 

3·1,931 Posts
How many workers?

Noobie questions but I couldn't work it out from the FAQs...

I have a new I3 3220 with two cores (4 virtual CPUs). I thought it would be cool to do
some Prime95 on it and so I joined early in the new year.

I told it to choose whatever work and it is doing LL.

If I tell it to use 2 worker windows (and no multithreading) it gets to 0.026 sec per iteration (ish) and uses about 50% CPU. This seems to make sense.

If I switch on multithreading (2 CPUs to Use) the speed doesn't seem to go up but it does go up to about 100% CPU. This seems a bit odd - it's using more CPU but not doing more work. Or am I misunderstanding the stats?

If I tell it to use 4 worker windows (CPUs to use) then it uses 100% CPU but the speed drops (presumably limited by memory bandwidth?). However, the time per iteration does not double - it goes to about 0.046.

So I suppose it is best for the project if I do 4. But more fun if I do 2 - I'll get an answer quicker.

Are these numbers expected? Am I setting it up optimally? Whatever that might mean...

Ben
Old 2013-01-14, 09:20   #2
axn
Jun 2003

2·33·7·13 Posts

Things are as expected. P95 is one of those rare well-optimized programs that does not benefit much from Hyperthreading.

Whether to run 2 or 4 is up to you. My personal advice would be to run 2, so that it is "more fun" and the heat output stays down; plus, having two virtual cores free will help with your regular computer usage (if any).

EDIT:- Make sure that you are running the latest (27.x) version which supports AVX.

Last fiddled with by axn on 2013-01-14 at 09:24
Old 2013-01-14, 09:45   #3
Unregistered
 

1227₁₆ Posts

Quote:
Originally Posted by axn View Post
Things are as expected.

Thanks!

Actually I rechecked the multi-threading version - 2 CPUs per worker - and it IS slightly faster than 1 - in fact it is pretty much twice as fast as the four worker version - about 0.023 secs per iteration.

So that is even more clearly showing what's going on - when two testing processes use the same physical core they run almost but not quite half as fast, whether they are multi-threading or different workers.

I think I'll stick with two as you suggest.

Thanks again!

Ben
Old 2013-01-14, 10:12   #4
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

249F₁₆ Posts

@OP: what axn means is to use 2 workers, each SINGLE-threaded (no HT), as opposed to 2 workers with HT. Your CPU usage in Task Manager will show 50%; that is OK. As he said, P95 is a very well-optimized program and does not benefit from HT, apart from producing more heat when HT is enabled.

HT exists because programs are, generally, not very optimized. While a program is running, it needs to exchange data with memory, do some computing, access different peripherals (like the display card), etc., and those tasks need to wait for each other, resulting in "dead time" or "waiting time": minuscule periods when the CPU is waiting, doing nothing. HT therefore lets the CPU run 2 tasks at the same time, switching "instantly" to the other task when the first one is waiting. Statistically, the two tasks will access memory and peripherals at different moments, and their context is "switched" many times per second, so the CPU spends less time waiting and the two tasks seem to run physically in parallel, with no delay for either of them.

If you have a single program/instance/worker/task that runs in 30 minutes with no HT on a single core, and two copies of it run in 40 minutes with HT, both on a single core, that would mean the program effectively needs 20 minutes per instance (as it runs 2 in 40), so when it ran alone, it WASTED 33% of the time.

One can design software to waste no time. P95 is such software: when you do LL (it is not the same case when you do ECM, or so) it wastes only about 1% to 3% of the time. So, if running one worker takes 30 minutes, running two workers in parallel on the same core, using HT, will take 58 or 59 minutes. That is too small a benefit for the additional heat it produces and the additional power it consumes (see the link above; hyper-threading IS power-inefficient, as it needs power to always optimally switch and combine those tasks).
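The arithmetic in the two paragraphs above can be checked with a short sketch. The function names are made up for illustration, and the model is the simplified one used in the post: a core is either doing useful work or sitting idle, and HT is assumed to fill all of the idle time.

```python
def wasted_fraction(solo_minutes, shared_minutes):
    """Fraction of a solo run during which the core is idle.

    shared_minutes is the wall time for two copies sharing one core via HT,
    so each copy effectively costs shared_minutes / 2.
    """
    effective_per_copy = shared_minutes / 2
    return 1 - effective_per_copy / solo_minutes


def ht_pair_minutes(solo_minutes, wasted):
    """Wall time for two copies on one core if HT recovers all idle time."""
    return 2 * solo_minutes * (1 - wasted)


# The 30-minute solo run versus 40 minutes for two copies.
print(f"wasted when run alone: {wasted_fraction(30, 40):.0%}")

# A program that wastes only 1% to 3%, as claimed for LL testing.
for wasted in (0.01, 0.03):
    print(f"waste {wasted:.0%}: two copies take {ht_pair_minutes(30, wasted):.1f} min")
```

With 1% to 3% waste the two copies finish in a little over 58 to a little over 59 minutes, which is where the "58 or 59 minutes" figure comes from.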

Looking at the video below, HT is based on the fact that in most programs the "black balls" that come through each of the eight pipes have "enough" space between them. If the balls coming through a pipe are close together, touching each other, then there is no room to run two pipes through the same core, unless the total time is doubled.



You can read more on the forum if you search it for "hyperthreading" or so. The general opinion is that HT is not good when one does LL/DC with P95.

Last fiddled with by LaurV on 2013-01-14 at 10:43 Reason: link
Old 2013-01-14, 13:28   #5
Jellyfish420
 
Jan 2013

13₁₆ Posts

When I enabled multithreading, my per-iteration time went from 0.055 sec down to 0.038. Why does your time go up and mine go down???

Edit: OK, just looked. I'm running 2 threads on one worker, not 2 threads on their own workers. That's why my time went down.....

Last fiddled with by Jellyfish420 on 2013-01-14 at 13:48
Old 2013-01-14, 19:09   #6
Unregistered
 

24·32·52 Posts

When I enable multithreading it does get faster - just not very much. It goes from .026 to .023 roughly.

So not worth it for twice the CPU, I think.

Ben
Old 2013-01-15, 01:38   #7
Dubslow
Basketry That Evening!
 
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3×29×83 Posts

The CPU usage that you see in the Task Manager isn't really an accurate representation of how much silicon is in use -- assuming you have hyperthreading, which is what it sounds like. What CPU do you have?
Old 2013-01-15, 05:00   #8
bcp19
 
 
Oct 2011

1247₈ Posts

Quote:
Originally Posted by Dubslow View Post
The CPU usage that you see in the Task Manager isn't really an accurate representation of how much silicon is in use -- assuming you have hyperthreading, which is what it sounds like. What CPU do you have?
He mentioned it in the original post:

Quote:
I have a new I3 3220 with two cores (4 virtual CPUs). I thought it would be cool to do
some Prime95 on it and so I joined early in the new year.
Old 2013-01-15, 05:33   #9
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

10010010011111₂ Posts

Quote:
Originally Posted by bcp19 View Post
He mentioned it in the original post:
Dubslow was addressing Ben, more precisely the "twice the CPU" statement. That is what Task Manager shows, which of course does not reflect the real load, especially for a program like Prime95.

Last fiddled with by LaurV on 2013-01-15 at 05:35
Old 2013-01-15, 08:36   #10
Unregistered
 

1DF7₁₆ Posts

I did a few more experiments and it is pretty clear:

One worker:
------------

1 CPU (one thread) 0.022s per iteration - task manager shows 25%.
This is presumably the maximum rate one thread can achieve on my machine.

2 CPUs (one thread on each of two different physical cores) 0.012s - task manager shows 50%.
Each of the threads is nearly as fast as the first case. But I don't think it is quite twice as fast.

4 CPUs (each core has two threads on it) 0.012s - task manager shows 100%
Threads that share a physical core via hyperthreading interfere with each other and add no detectable speed up (actually there may be a small speed up - but hard to be sure). Task manager shows every CPU as busy but actually each thread is spending a lot of time waiting - presumably for the cache.

Two workers:
-------------

1 CPU (each worker on its own core with a single thread) 0.024s - task manager shows 50%
The two workers run efficiently. But maybe they are each a tiny bit slower than one on its own? Hard to tell.

2 CPUs (each worker uses its own core but with two threads on it) 0.023s - task manager shows 100%
The measurements are not accurate but I think I can see that the two threads cause a slight speedup.

Four workers:
-------------

1 CPU (each core is running two workers) 0.046s - task manager shows 100%
I have not done a carefully controlled experiment. But I think this is getting iterations done faster than the two worker one thread case. But any gain is very small.

-------------

Actually I think I am going to do 1 worker with two threads for a while so I can get one done quicker. Then maybe switch to 2 workers one thread longer term.

Thanks for the interesting discussions.

Ben
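Since per-iteration times for different worker counts are hard to compare by eye, the measurements above can be converted into total throughput. This is only a rough sketch: the labels and the helper name are illustrative, the timings are the approximate figures quoted in the post, and it assumes every worker is testing an exponent of similar size so that iterations are comparable.

```python
# (configuration, number of workers, seconds per iteration for each worker),
# copied from the measurements quoted in the post.
runs = [
    ("1 worker, 1 thread", 1, 0.022),
    ("1 worker, 2 threads", 1, 0.012),
    ("1 worker, 4 threads", 1, 0.012),
    ("2 workers, 1 thread each", 2, 0.024),
    ("2 workers, 2 threads each", 2, 0.023),
    ("4 workers, 1 thread each", 4, 0.046),
]


def iterations_per_second(workers, sec_per_iter):
    """Total iterations completed per second across all workers."""
    return workers / sec_per_iter


for label, workers, sec_per_iter in runs:
    rate = iterations_per_second(workers, sec_per_iter)
    print(f"{label:28s} {rate:6.1f} iter/s")
```

On these figures the configurations that keep both physical cores busy all land between roughly 83 and 87 iterations per second, so loading all four virtual CPUs buys only a few percent, which matches the "any gain is very small" observation.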
Old 2013-01-15, 09:30   #11
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

249F₁₆ Posts

If you get paranoid about those milliseconds (most of us went through this phase sooner or later), then add the following two lines

TimeStamp=2
TimingOutput=4

to the prime.ini file (see undoc.txt in P95 distribution for details).
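If you would rather script the change than edit the file by hand, a minimal sketch follows. It works on the file's text only and assumes a plain KEY=VALUE layout, with global options placed before any "[section]" header; that layout is an assumption here, not something confirmed in this thread. The helper name, `SomeOtherKey`, `[Some Section]` and `X` are placeholders, not real Prime95 settings. Check undoc.txt for what the two values actually mean before relying on it.

```python
def ensure_settings(ini_text, settings):
    """Return ini_text with each KEY=VALUE from settings set exactly once.

    Keys are updated or added only in the part of the file that comes
    before the first "[section]" header (assumed to hold global options).
    """
    lines = ini_text.splitlines()
    first_section = next(
        (i for i, line in enumerate(lines) if line.lstrip().startswith("[")),
        len(lines),
    )
    head, tail = lines[:first_section], lines[first_section:]

    seen = set()
    new_head = []
    for line in head:
        key = line.split("=", 1)[0].strip()
        if "=" in line and key in settings:
            if key not in seen:  # rewrite the first occurrence, drop repeats
                new_head.append(f"{key}={settings[key]}")
                seen.add(key)
        else:
            new_head.append(line)

    # Append any requested keys that were not already present.
    new_head.extend(f"{k}={v}" for k, v in settings.items() if k not in seen)
    return "\n".join(new_head + tail) + "\n"


# Placeholder file contents, for illustration only.
sample = "SomeOtherKey=1\nTimeStamp=1\n[Some Section]\nX=1\n"
print(ensure_settings(sample, {"TimeStamp": "2", "TimingOutput": "4"}))
```

Running the function a second time on its own output leaves the text unchanged, so it is safe to apply repeatedly. Stop Prime95 and keep a backup of the file before writing anything back.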

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.