Old 2015-08-28, 01:17   #1
Birddylicious

Worker Threads - use all on same test

Hello All,

Can't seem to figure this out or find any options. I have an i7-4930MX and would like to use all 8 threads of this processor on one exponent test. Is this possible? I see that 4 of my threads are working on exponents, but they say the completion time is over a year. I would rather work on one exponent and have it done in 1.5 months than have 4 exponents done in one year.

Thanks for your help in advance

Old 2015-08-28, 01:58   #2
Uncwilly

Be forewarned that the time to complete an assignment does not scale linearly with the number of threads.
An example taken from this thread:
Cores : Years : Loss
1 : 5.23
2 : 2.84 : 8.6%
3 : 2.37 : 35.8%
4 : 1.97 : 50.6%
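
The "Loss" column is the overhead relative to perfect linear scaling, i.e. loss = (cores * years) / (years for one core) - 1. Here is a quick sketch that reproduces it from the quoted year figures (small differences from the listed percentages are just rounding in the inputs):
Code:
# Reproduce the "Loss" column: overhead versus perfect linear scaling.
# The year figures are the ones quoted in the table above.
years = {1: 5.23, 2: 2.84, 3: 2.37, 4: 1.97}

for cores, t in years.items():
    loss = cores * t / years[1] - 1
    print(f"{cores} core(s): {t:.2f} years, loss {loss:.1%}")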

Prime95 (or Mprime) will generally do better running 4 threads on the actual cores rather than 8 threads using hyperthreading.
If you want to get the most exponents done per unit of time, it is most efficient to run each one on its own core.

You say that the prediction is for a year per assignment. Are you running exponents in the 33x,000,000 range?

Old 2015-08-28, 05:04   #3
LaurV

Quote:
Originally Posted by Uncwilly
You say that the prediction is for a year per assignment. Are you running exponents in the 33x,000,000 range?
If not:
a) Can you run a benchmark (from the "Options" menu) and post the output here, so we can at least have a look at why the time is so long?
b) Is the CPU set to run 24/7, or only during "production hours"/a few hours per day?

For your CPU, with only 4 physical cores (8 hyper-threaded logical cores in total), the best theoretical output is with 4 workers, each worker using a single core and doing work on its own exponent, assuming you have both memory channels populated and the memory is fast enough. Otherwise, you may do better with just 2 workers (each doing a different exponent), every worker using 2 physical cores.

Old 2015-08-28, 07:06   #4
Madpoo

Quote:
Originally Posted by Birddylicious
Hello All,

Can't seem to figure this out or find any options. I have an i7-4930MX and would like to use all 8 threads of this processor on one exponent test. Is this possible? I see that 4 of my threads are working on exponents, but they say the completion time is over a year. I would rather work on one exponent and have it done in 1.5 months than have 4 exponents done in one year.

Thanks for your help in advance
Ditto what LaurV said. Remember that your 8 "cores" are really 4 cores + hyperthreading, so don't count those HT cores... they won't help when trying to do FP (floating-point) work.

If you're running 4 workers, and each one of those is using 1 core, you'd think that would be perfectly fine except for one little problem: Memory.

Those 4 workers will all be stampeding through your memory channel and fighting for memory access. The end result is probably what you're seeing... running 4 at once is VERY slow... each one of your workers slows down incredibly thanks to all that increased latency. It would be faster in most cases to shut down all but one worker and just let it do its thing without fighting for resources.

But just using one worker might leave some unused memory bandwidth, so that's when you could see some pretty good gains (not linear though, as noted in other replies) by using more than one core on your single worker.

Adding a 2nd core might increase your speed by 80% (not 100%, unfortunately). With 4 cores going, ideally you'd want 4x the performance of a single core, but in truth you'll probably get about 3x.
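
To put rough numbers on that throughput-vs-latency trade-off, here's a toy sketch. The 1.8x/3.0x speedups are just the ballpark figures above, the 12-month single-core test length is purely illustrative, and the multi-worker cases are assumed to scale perfectly (which, per the memory-contention point above, they won't):
Code:
# Toy model of throughput vs. latency on a 4-core CPU.
# Speedup factors are the ballpark guesses above; the 12-month
# single-core test length is illustrative only.
speedup = {1: 1.0, 2: 1.8, 4: 3.0}
single_core_months = 12.0

for cores_per_worker in (1, 2, 4):
    workers = 4 // cores_per_worker
    months_per_test = single_core_months / speedup[cores_per_worker]
    tests_per_year = workers * 12.0 / months_per_test
    print(f"{workers} worker(s) x {cores_per_worker} core(s): "
          f"{months_per_test:.1f} months per test, "
          f"~{tests_per_year:.1f} tests/year total")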

Others and I use our many-core Xeon chips this way... I put all of the cores of one CPU into each worker (which means on a dual-CPU system there are 2 workers, each using all of the cores on its respective chip).

I'm maybe more OCD about it and I set my affinity scramble masks manually to make 100% sure it's skipping over my HT cores okay. The auto-detect code when Prime95 starts isn't foolproof and I've seen it do some weird things, so I just avoid the guesswork there.

For your 4-core/8-thread chip running Windows, your AffinityScramble2 setting would look something like 02461357. ...Ah heck, here's a sample of what the "local.txt" file would look like (the relevant parts...):
Code:
...
AffinityScramble2=02461357
WorkerThreads=1
ThreadsPerTest=4

[Worker #1]
Affinity=0
In your "prime.txt" file you can add this to disable all of the stuff it shows at startup where it tries to figure out which cores are physical/HT pairs. If you're manually setting the affinty scramble, you don't need to see all that:
Code:
DebugAffinityScramble=2
I think setting that to 1 will show extra info if you're letting Prime95 figure it out on its own... you'll get more info on which cores it thinks are physical/virtual pairs and can see if it looks correct.

I kind of think it could use some calls into Windows itself that have that info (assuming Windows reports it correctly). For me, on a system that might have been doing something else at the time and messed with the timings, it wouldn't calculate the pairs right: it would assume "0/1" are a pair, "2/3" are a pair, etc., and then when setting each worker's "Affinity=whatever" you have to remember to skip around if you have more than one worker. Ah heck, it's all just a confusing mess. I got it working and stopped messing with it.
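
For what it's worth, here's a little sketch (not part of Prime95, just illustrating the mapping I'm assuming above) that builds that kind of AffinityScramble2 string when the logical CPUs come in adjacent physical/HT sibling pairs (0/1, 2/3, ...), listing the physical halves first. It only handles single-digit CPU numbers:
Code:
def affinity_scramble(logical_cpus):
    """Build an AffinityScramble2-style string assuming adjacent
    physical/HT sibling pairs (0/1, 2/3, ...): list one logical CPU
    from each pair first, then the HT siblings."""
    physical = list(range(0, logical_cpus, 2))   # 0, 2, 4, 6
    siblings = list(range(1, logical_cpus, 2))   # 1, 3, 5, 7
    return "".join(str(c) for c in physical + siblings)

print(affinity_scramble(8))  # -> 02461357, as in the local.txt sample above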

To summarize, though: if you allocate 4 cores to one worker, you'll probably see that your 4 assignments, which are all estimating a year to complete, will finish MUCH faster than that even though you're only doing one at a time. Trust me on that.

The purists out there who believe in squeezing the most out of the system will say you should run one LL test on one core and then something like ECM or P-1 on the others.

Old 2015-08-28, 07:46   #5
retina

I think that talking about HT as being a separate core is confusing (it is also not correct, BTW). Two logical threads, each identical to the other in every way except for the thread code they run, feed into a single physical core. Four cores x two threads per core = 8 threads (not 8 cores, and not 4 physical cores + 4 HT cores).

BTW: Using more than one thread per core can also cause some priority issues. The CPU has no way to assign a different priority to each thread that feeds the core, so sometimes, even with P95 running at the lowest OS priority, you will find the CPU gives it equal priority with a non-P95 thread running on the same core. So, depending on your settings, disabling HT mode can provide a speed benefit for non-P95 tasks.

Old 2015-08-28, 17:38   #6
Madpoo

Quote:
Originally Posted by retina
I think that talking about HT as being a separate core is confusing (it is also not correct, BTW). Two logical threads, each identical to the other in every way except for the thread code they run, feed into a single physical core. Four cores x two threads per core = 8 threads (not 8 cores, and not 4 physical cores + 4 HT cores).
I know, I know... But for simplicity's sake, it's easy to just say it's a virtual CPU that shares some resources with the physical one. I can call it a model of the real thing that helps describe how it works, if that makes you feel better. LOL

Quote:
Originally Posted by retina
BTW: Using more than one thread per core can also cause some priority issues. The CPU has no way to assign a different priority to each thread that feeds the core, so sometimes, even with P95 running at the lowest OS priority, you will find the CPU gives it equal priority with a non-P95 thread running on the same core. So, depending on your settings, disabling HT mode can provide a speed benefit for non-P95 tasks.
Interesting... I haven't heard about or seen that particular issue. I've had one worker using all of the cores on a chip, and if there's a higher priority process that needs more CPU, it always gets it.

For example, I've had occasion to be on some servers running Prime95 when a web server experiences a large boost in traffic. The web server does some CPU spiking with the increased load as some threads spin up (I'm oversimplifying... suffice to say that CPU usage can jump to nearly 100% for a short time). During that time, Prime95 usage drops precipitously... far more than if only a single core were being throttled, and more like you'd expect if all of the Prime95 work were basically stalled.

Sometimes it's harder for me to tell with dual-CPU systems. Prime95 will run 2 workers, one on each chip, whereas the virtual servers running whatever are not NUMA-specific, so their virtual CPUs (and this time I mean "virtual" quite literally... virtual machines) may span cores across both chips. But anyway, yeah, I've never had an issue with Prime95 throttling WAY down when a higher-priority task is ramping up.

Maybe what you're saying is OS dependent, or relates to some older version of Windows?

Old 2015-08-28, 18:05   #7
kladner

There is a long-standing problem with P95 and large Adobe applications, at least including Photoshop. I have experienced this with P95 running full out. Starting Photoshop took a very long time, to the extent that borders and windows were drawn at visible, irregular rates. It was speculated that there is some Adobe subsystem which did not flag P95 to back off. I have no idea what the real situation is or was, but I have P95 set to stop a worker or two when Photoshop starts.

Old 2015-08-29, 01:27   #8
Birddylicious

Code:
CPU speed: 3436.14 MHz, 4 hyperthreaded cores
CPU features: Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 8 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 28.5, RdtscTiming=1
Best time for 1024K FFT length: 4.105 ms., avg: 4.367 ms.
Best time for 1280K FFT length: 5.496 ms., avg: 5.703 ms.
Best time for 1536K FFT length: 6.631 ms., avg: 6.839 ms.
Best time for 1792K FFT length: 7.917 ms., avg: 8.223 ms.
Best time for 2048K FFT length: 9.272 ms., avg: 9.659 ms.
Best time for 2560K FFT length: 12.089 ms., avg: 12.354 ms.
Best time for 3072K FFT length: 14.358 ms., avg: 14.607 ms.
Best time for 3584K FFT length: 17.268 ms., avg: 17.789 ms.
Best time for 4096K FFT length: 19.419 ms., avg: 20.307 ms.
Best time for 5120K FFT length: 24.584 ms., avg: 25.466 ms.
Best time for 6144K FFT length: 29.627 ms., avg: 31.023 ms.
Best time for 7168K FFT length: 35.273 ms., avg: 36.624 ms.
Best time for 8192K FFT length: 41.070 ms., avg: 41.980 ms.
Timing FFTs using 2 threads on 1 physical CPU.
Best time for 1024K FFT length: 4.180 ms., avg: 4.250 ms.
Best time for 1280K FFT length: 5.543 ms., avg: 5.666 ms.
Best time for 1536K FFT length: 6.829 ms., avg: 6.924 ms.
Best time for 1792K FFT length: 8.213 ms., avg: 8.399 ms.
Best time for 2048K FFT length: 9.333 ms., avg: 10.220 ms.
Best time for 2560K FFT length: 11.759 ms., avg: 12.268 ms.
Best time for 3072K FFT length: 14.674 ms., avg: 14.918 ms.
Best time for 3584K FFT length: 17.161 ms., avg: 17.904 ms.
Best time for 4096K FFT length: 20.231 ms., avg: 21.748 ms.
Best time for 5120K FFT length: 25.670 ms., avg: 26.093 ms.
Best time for 6144K FFT length: 31.023 ms., avg: 31.929 ms.
Best time for 7168K FFT length: 36.834 ms., avg: 37.617 ms.
Best time for 8192K FFT length: 41.792 ms., avg: 43.987 ms.
Timing FFTs using 2 threads on 2 physical CPUs.
Best time for 1024K FFT length: 2.211 ms., avg: 2.353 ms.
Best time for 1280K FFT length: 2.884 ms., avg: 3.050 ms.
Best time for 1536K FFT length: 3.468 ms., avg: 3.636 ms.
Best time for 1792K FFT length: 4.245 ms., avg: 4.552 ms.
Best time for 2048K FFT length: 4.898 ms., avg: 5.070 ms.
Best time for 2560K FFT length: 6.324 ms., avg: 6.443 ms.
Best time for 3072K FFT length: 7.519 ms., avg: 8.636 ms.
Best time for 3584K FFT length: 8.891 ms., avg: 9.257 ms.
Best time for 4096K FFT length: 10.325 ms., avg: 11.399 ms.
Best time for 5120K FFT length: 13.240 ms., avg: 13.416 ms.
Best time for 6144K FFT length: 15.509 ms., avg: 16.327 ms.
Best time for 7168K FFT length: 18.579 ms., avg: 18.889 ms.
Best time for 8192K FFT length: 21.657 ms., avg: 22.401 ms.
Timing FFTs using 3 threads on 3 physical CPUs.
Best time for 1024K FFT length: 1.571 ms., avg: 1.626 ms.
Best time for 1280K FFT length: 2.029 ms., avg: 2.089 ms.
Best time for 1536K FFT length: 2.531 ms., avg: 2.609 ms.
Best time for 1792K FFT length: 3.127 ms., avg: 3.292 ms.
Best time for 2048K FFT length: 3.693 ms., avg: 3.811 ms.
Best time for 2560K FFT length: 4.759 ms., avg: 5.068 ms.
Best time for 3072K FFT length: 5.748 ms., avg: 5.889 ms.
Best time for 3584K FFT length: 6.883 ms., avg: 8.076 ms.
Best time for 4096K FFT length: 7.861 ms., avg: 7.996 ms.
Best time for 5120K FFT length: 10.023 ms., avg: 10.217 ms.
Best time for 6144K FFT length: 11.915 ms., avg: 12.067 ms.
Best time for 7168K FFT length: 14.243 ms., avg: 14.537 ms.
Best time for 8192K FFT length: 16.937 ms., avg: 18.752 ms.
Timing FFTs using 4 threads on 4 physical CPUs.
Best time for 1024K FFT length: 1.358 ms., avg: 1.462 ms.
Best time for 1280K FFT length: 1.731 ms., avg: 1.850 ms.
Best time for 1536K FFT length: 2.263 ms., avg: 2.311 ms.
Best time for 1792K FFT length: 2.827 ms., avg: 2.895 ms.
Best time for 2048K FFT length: 3.409 ms., avg: 3.654 ms.
Best time for 2560K FFT length: 4.429 ms., avg: 4.558 ms.
Best time for 3072K FFT length: 5.363 ms., avg: 5.496 ms.
Best time for 3584K FFT length: 6.333 ms., avg: 6.475 ms.
Best time for 4096K FFT length: 7.399 ms., avg: 7.526 ms.
Best time for 5120K FFT length: 9.230 ms., avg: 9.306 ms.
Best time for 6144K FFT length: 11.132 ms., avg: 11.300 ms.
Best time for 7168K FFT length: 13.148 ms., avg: 14.234 ms.
Best time for 8192K FFT length: 15.413 ms., avg: 15.537 ms.
Timing FFTs using 8 threads on 4 physical CPUs.
Best time for 1024K FFT length: 1.302 ms., avg: 1.422 ms.
Best time for 1280K FFT length: 1.999 ms., avg: 2.082 ms.
Best time for 1536K FFT length: 2.405 ms., avg: 2.459 ms.
Best time for 1792K FFT length: 3.023 ms., avg: 3.085 ms.
Best time for 2048K FFT length: 3.590 ms., avg: 3.656 ms.
Best time for 2560K FFT length: 4.603 ms., avg: 4.642 ms.
Best time for 3072K FFT length: 5.626 ms., avg: 5.742 ms.
Best time for 3584K FFT length: 6.606 ms., avg: 6.662 ms.
Best time for 4096K FFT length: 7.724 ms., avg: 8.725 ms.
Best time for 5120K FFT length: 9.673 ms., avg: 9.887 ms.
Best time for 6144K FFT length: 11.605 ms., avg: 11.750 ms.
Best time for 7168K FFT length: 13.653 ms., avg: 13.814 ms.
Best time for 8192K FFT length: 16.007 ms., avg: 17.233 ms.

Timings for 1024K FFT length (1 cpu, 1 worker):  4.29 ms.  Throughput: 232.84 iter/sec.
Timings for 1024K FFT length (2 cpus, 2 workers):  4.67,  4.55 ms.  Throughput: 434.06 iter/sec.
Timings for 1024K FFT length (3 cpus, 3 workers):  5.63,  5.53,  5.56 ms.  Throughput: 538.39 iter/sec.
Timings for 1024K FFT length (4 cpus, 4 workers):  7.46,  7.16,  7.25,  7.32 ms.  Throughput: 548.41 iter/sec.
Timings for 1024K FFT length (1 cpu hyperthreaded, 1 worker):  4.29 ms.  Throughput: 233.10 iter/sec.
Timings for 1024K FFT length (2 cpus hyperthreaded, 2 workers):  4.78,  4.78 ms.  Throughput: 418.05 iter/sec.
Timings for 1024K FFT length (3 cpus hyperthreaded, 3 workers):  5.83,  5.73,  5.74 ms.  Throughput: 520.16 iter/sec.
Timings for 1024K FFT length (4 cpus hyperthreaded, 4 workers):  7.83,  7.21,  7.75,  7.16 ms.  Throughput: 535.03 iter/sec.
Timings for 1280K FFT length (1 cpu, 1 worker):  5.76 ms.  Throughput: 173.56 iter/sec.
Timings for 1280K FFT length (2 cpus, 2 workers):  5.96,  5.79 ms.  Throughput: 340.60 iter/sec.
Timings for 1280K FFT length (3 cpus, 3 workers):  7.11,  6.96,  7.06 ms.  Throughput: 425.98 iter/sec.
Timings for 1280K FFT length (4 cpus, 4 workers):  9.25,  8.99,  8.97,  9.13 ms.  Throughput: 440.39 iter/sec.
Timings for 1280K FFT length (1 cpu hyperthreaded, 1 worker):  5.75 ms.  Throughput: 174.00 iter/sec.
[Fri Aug 28 21:05:04 2015]
Timings for 1280K FFT length (2 cpus hyperthreaded, 2 workers):  6.13,  6.26 ms.  Throughput: 322.89 iter/sec.
Timings for 1280K FFT length (3 cpus hyperthreaded, 3 workers):  7.41,  7.26,  7.47 ms.  Throughput: 406.43 iter/sec.
Timings for 1280K FFT length (4 cpus hyperthreaded, 4 workers):  9.69,  9.12,  9.73,  9.08 ms.  Throughput: 425.79 iter/sec.
Timings for 1536K FFT length (1 cpu, 1 worker):  6.82 ms.  Throughput: 146.69 iter/sec.
Timings for 1536K FFT length (2 cpus, 2 workers):  7.25,  7.14 ms.  Throughput: 278.02 iter/sec.
Timings for 1536K FFT length (3 cpus, 3 workers):  8.66,  8.49,  8.46 ms.  Throughput: 351.39 iter/sec.
Timings for 1536K FFT length (4 cpus, 4 workers): 11.14, 10.81, 10.88, 10.98 ms.  Throughput: 365.21 iter/sec.
Timings for 1536K FFT length (1 cpu hyperthreaded, 1 worker):  7.04 ms.  Throughput: 142.14 iter/sec.
Timings for 1536K FFT length (2 cpus hyperthreaded, 2 workers):  7.80,  7.64 ms.  Throughput: 259.11 iter/sec.
Timings for 1536K FFT length (3 cpus hyperthreaded, 3 workers):  8.94,  9.48,  8.98 ms.  Throughput: 328.73 iter/sec.
Timings for 1536K FFT length (4 cpus hyperthreaded, 4 workers): 11.79, 10.99, 11.52, 10.91 ms.  Throughput: 354.29 iter/sec.
Timings for 1792K FFT length (1 cpu, 1 worker):  8.08 ms.  Throughput: 123.74 iter/sec.
Timings for 1792K FFT length (2 cpus, 2 workers):  8.59,  8.43 ms.  Throughput: 235.02 iter/sec.
Timings for 1792K FFT length (3 cpus, 3 workers): 10.22, 10.08, 10.06 ms.  Throughput: 296.51 iter/sec.
Timings for 1792K FFT length (4 cpus, 4 workers): 13.08, 12.72, 13.05, 12.85 ms.  Throughput: 309.52 iter/sec.
Timings for 1792K FFT length (1 cpu hyperthreaded, 1 worker):  8.47 ms.  Throughput: 118.12 iter/sec.
Timings for 1792K FFT length (2 cpus hyperthreaded, 2 workers):  9.09,  9.11 ms.  Throughput: 219.84 iter/sec.
Timings for 1792K FFT length (3 cpus hyperthreaded, 3 workers): 10.53, 10.62, 10.63 ms.  Throughput: 283.26 iter/sec.
Timings for 1792K FFT length (4 cpus hyperthreaded, 4 workers): 13.45, 13.04, 13.79, 12.65 ms.  Throughput: 302.56 iter/sec.
Timings for 2048K FFT length (1 cpu, 1 worker):  9.46 ms.  Throughput: 105.66 iter/sec.
Timings for 2048K FFT length (2 cpus, 2 workers): 10.72, 10.46 ms.  Throughput: 188.88 iter/sec.
Timings for 2048K FFT length (3 cpus, 3 workers): 12.13, 11.94, 12.01 ms.  Throughput: 249.43 iter/sec.
Timings for 2048K FFT length (4 cpus, 4 workers): 15.91, 14.87, 14.84, 15.14 ms.  Throughput: 263.54 iter/sec.
Timings for 2048K FFT length (1 cpu hyperthreaded, 1 worker):  9.61 ms.  Throughput: 104.03 iter/sec.
Timings for 2048K FFT length (2 cpus hyperthreaded, 2 workers): 10.70, 10.37 ms.  Throughput: 189.87 iter/sec.
Timings for 2048K FFT length (3 cpus hyperthreaded, 3 workers): 12.37, 12.36, 12.27 ms.  Throughput: 243.26 iter/sec.
Timings for 2048K FFT length (4 cpus hyperthreaded, 4 workers): 15.80, 14.81, 16.00, 15.39 ms.  Throughput: 258.32 iter/sec.
Timings for 2560K FFT length (1 cpu, 1 worker): 12.09 ms.  Throughput: 82.73 iter/sec.
Timings for 2560K FFT length (2 cpus, 2 workers): 13.09, 12.62 ms.  Throughput: 155.61 iter/sec.
[Fri Aug 28 21:10:09 2015]
Timings for 2560K FFT length (3 cpus, 3 workers): 14.99, 14.98, 15.02 ms.  Throughput: 200.05 iter/sec.
Timings for 2560K FFT length (4 cpus, 4 workers): 19.90, 18.90, 18.55, 19.05 ms.  Throughput: 209.59 iter/sec.
Timings for 2560K FFT length (1 cpu hyperthreaded, 1 worker): 12.29 ms.  Throughput: 81.34 iter/sec.
Timings for 2560K FFT length (2 cpus hyperthreaded, 2 workers): 13.27, 12.83 ms.  Throughput: 153.28 iter/sec.
Timings for 2560K FFT length (3 cpus hyperthreaded, 3 workers): 15.90, 15.39, 15.66 ms.  Throughput: 191.72 iter/sec.
Timings for 2560K FFT length (4 cpus hyperthreaded, 4 workers): 20.59, 18.32, 20.07, 18.51 ms.  Throughput: 206.98 iter/sec.
Timings for 3072K FFT length (1 cpu, 1 worker): 14.61 ms.  Throughput: 68.45 iter/sec.
Timings for 3072K FFT length (2 cpus, 2 workers): 15.56, 14.89 ms.  Throughput: 131.44 iter/sec.
Timings for 3072K FFT length (3 cpus, 3 workers): 18.13, 17.92, 18.20 ms.  Throughput: 165.87 iter/sec.
Timings for 3072K FFT length (4 cpus, 4 workers): 22.94, 22.64, 22.63, 22.19 ms.  Throughput: 177.01 iter/sec.
Timings for 3072K FFT length (1 cpu hyperthreaded, 1 worker): 15.36 ms.  Throughput: 65.09 iter/sec.
Timings for 3072K FFT length (2 cpus hyperthreaded, 2 workers): 16.26, 15.83 ms.  Throughput: 124.66 iter/sec.
Timings for 3072K FFT length (3 cpus hyperthreaded, 3 workers): 18.84, 18.30, 18.52 ms.  Throughput: 161.74 iter/sec.
Timings for 3072K FFT length (4 cpus hyperthreaded, 4 workers): 24.19, 22.74, 24.01, 22.37 ms.  Throughput: 171.68 iter/sec.
Timings for 3584K FFT length (1 cpu, 1 worker): 17.27 ms.  Throughput: 57.91 iter/sec.
Timings for 3584K FFT length (2 cpus, 2 workers): 18.35, 18.11 ms.  Throughput: 109.69 iter/sec.
Timings for 3584K FFT length (3 cpus, 3 workers): 21.83, 21.00, 20.88 ms.  Throughput: 141.32 iter/sec.
Timings for 3584K FFT length (4 cpus, 4 workers): 25.88, 25.71, 26.39, 26.78 ms.  Throughput: 152.77 iter/sec.
Timings for 3584K FFT length (1 cpu hyperthreaded, 1 worker): 17.25 ms.  Throughput: 57.96 iter/sec.
Timings for 3584K FFT length (2 cpus hyperthreaded, 2 workers): 18.52, 18.26 ms.  Throughput: 108.74 iter/sec.
Timings for 3584K FFT length (3 cpus hyperthreaded, 3 workers): 21.63, 21.71, 21.90 ms.  Throughput: 137.96 iter/sec.
Timings for 3584K FFT length (4 cpus hyperthreaded, 4 workers): 27.93, 25.83, 28.71, 26.23 ms.  Throughput: 147.46 iter/sec.
Timings for 4096K FFT length (1 cpu, 1 worker): 19.84 ms.  Throughput: 50.41 iter/sec.
Timings for 4096K FFT length (2 cpus, 2 workers): 21.47, 21.63 ms.  Throughput: 92.80 iter/sec.
Timings for 4096K FFT length (3 cpus, 3 workers): 24.84, 24.18, 23.50 ms.  Throughput: 124.16 iter/sec.
Timings for 4096K FFT length (4 cpus, 4 workers): 30.70, 30.25, 29.78, 30.44 ms.  Throughput: 132.06 iter/sec.
Timings for 4096K FFT length (1 cpu hyperthreaded, 1 worker): 20.77 ms.  Throughput: 48.15 iter/sec.
Timings for 4096K FFT length (2 cpus hyperthreaded, 2 workers): 22.04, 21.97 ms.  Throughput: 90.88 iter/sec.
[Fri Aug 28 21:15:16 2015]
Timings for 4096K FFT length (3 cpus hyperthreaded, 3 workers): 36.50, 27.63, 38.09 ms.  Throughput: 89.84 iter/sec.
Timings for 4096K FFT length (4 cpus hyperthreaded, 4 workers): 41.66, 42.85, 39.97, 31.30 ms.  Throughput: 104.30 iter/sec.
Timings for 5120K FFT length (1 cpu, 1 worker): 28.74 ms.  Throughput: 34.80 iter/sec.
Timings for 5120K FFT length (2 cpus, 2 workers): 31.59, 27.27 ms.  Throughput: 68.32 iter/sec.
Timings for 5120K FFT length (3 cpus, 3 workers): 37.80, 31.48, 29.95 ms.  Throughput: 91.61 iter/sec.
Timings for 5120K FFT length (4 cpus, 4 workers): 46.04, 38.27, 37.19, 38.30 ms.  Throughput: 100.86 iter/sec.
Timings for 5120K FFT length (1 cpu hyperthreaded, 1 worker): 26.79 ms.  Throughput: 37.33 iter/sec.
Timings for 5120K FFT length (2 cpus hyperthreaded, 2 workers): 28.65, 27.91 ms.  Throughput: 70.74 iter/sec.
Timings for 5120K FFT length (3 cpus hyperthreaded, 3 workers): 35.94, 31.99, 34.82 ms.  Throughput: 87.80 iter/sec.
Timings for 5120K FFT length (4 cpus hyperthreaded, 4 workers): 54.45, 41.24, 45.21, 36.52 ms.  Throughput: 92.12 iter/sec.
Timings for 6144K FFT length (1 cpu, 1 worker): 33.82 ms.  Throughput: 29.57 iter/sec.
Timings for 6144K FFT length (2 cpus, 2 workers): 34.87, 32.14 ms.  Throughput: 59.79 iter/sec.
Timings for 6144K FFT length (3 cpus, 3 workers): 46.43, 39.10, 37.54 ms.  Throughput: 73.75 iter/sec.
Timings for 6144K FFT length (4 cpus, 4 workers): 54.49, 53.70, 49.48, 50.84 ms.  Throughput: 76.86 iter/sec.
Timings for 6144K FFT length (1 cpu hyperthreaded, 1 worker): 32.53 ms.  Throughput: 30.74 iter/sec.
Timings for 6144K FFT length (2 cpus hyperthreaded, 2 workers): 34.61, 34.08 ms.  Throughput: 58.24 iter/sec.
Timings for 6144K FFT length (3 cpus hyperthreaded, 3 workers): 43.10, 38.38, 40.53 ms.  Throughput: 73.93 iter/sec.
Timings for 6144K FFT length (4 cpus hyperthreaded, 4 workers): 58.12, 47.84, 51.01, 45.40 ms.  Throughput: 79.74 iter/sec.
Timings for 7168K FFT length (1 cpu, 1 worker): 38.62 ms.  Throughput: 25.89 iter/sec.
Timings for 7168K FFT length (2 cpus, 2 workers): 40.57, 38.05 ms.  Throughput: 50.93 iter/sec.
Timings for 7168K FFT length (3 cpus, 3 workers): 44.76, 43.37, 42.89 ms.  Throughput: 68.72 iter/sec.
Timings for 7168K FFT length (4 cpus, 4 workers): 53.71, 52.74, 53.06, 53.46 ms.  Throughput: 75.13 iter/sec.
Timings for 7168K FFT length (1 cpu hyperthreaded, 1 worker): 37.91 ms.  Throughput: 26.38 iter/sec.
Timings for 7168K FFT length (2 cpus hyperthreaded, 2 workers): 41.07, 40.48 ms.  Throughput: 49.06 iter/sec.
Timings for 7168K FFT length (3 cpus hyperthreaded, 3 workers): 53.86, 49.04, 60.30 ms.  Throughput: 55.54 iter/sec.
Timings for 7168K FFT length (4 cpus hyperthreaded, 4 workers): 58.72, 52.18, 68.27, 53.92 ms.  Throughput: 69.39 iter/sec.
[Fri Aug 28 21:20:24 2015]
Timings for 8192K FFT length (1 cpu, 1 worker): 42.24 ms.  Throughput: 23.68 iter/sec.
Timings for 8192K FFT length (2 cpus, 2 workers): 45.14, 44.28 ms.  Throughput: 44.74 iter/sec.
Timings for 8192K FFT length (3 cpus, 3 workers): 51.57, 50.43, 49.55 ms.  Throughput: 59.40 iter/sec.
Timings for 8192K FFT length (4 cpus, 4 workers): 67.28, 63.16, 62.90, 62.93 ms.  Throughput: 62.48 iter/sec.
Timings for 8192K FFT length (1 cpu hyperthreaded, 1 worker): 42.74 ms.  Throughput: 23.39 iter/sec.
Timings for 8192K FFT length (2 cpus hyperthreaded, 2 workers): 45.37, 44.76 ms.  Throughput: 44.38 iter/sec.
Timings for 8192K FFT length (3 cpus hyperthreaded, 3 workers): 51.86, 51.84, 52.10 ms.  Throughput: 57.77 iter/sec.
Timings for 8192K FFT length (4 cpus hyperthreaded, 4 workers): 67.28, 61.45, 70.04, 62.58 ms.  Throughput: 61.39 iter/sec.
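
As a side note, here is a small sketch of how the per-iteration timings above translate into thread scaling and rough test lengths. The 4096K "Best time" figures are copied from the benchmark; the 77M exponent is purely illustrative (an LL test does roughly p iterations) and is not one of the 33x,000,000 assignments mentioned earlier, which would need a much larger FFT than this benchmark covers:
Code:
# Thread scaling at the 4096K FFT, using the "Best time" rows from
# the benchmark above (ms per iteration, threads on physical cores).
best_4096k = {1: 19.419, 2: 10.325, 3: 7.861, 4: 7.399}

for threads, ms in best_4096k.items():
    print(f"{threads} thread(s): {ms:.3f} ms/iter, "
          f"speedup {best_4096k[1] / ms:.2f}x")

# Rough LL-test length for an exponent small enough to use a 4096K FFT.
# The exponent is illustrative; an LL test does about p iterations.
p = 77_000_000
for threads, ms in best_4096k.items():
    days = p * ms / 1000 / 86400
    print(f"{threads} thread(s): ~{days:.0f} days for p = {p:,}")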

Old 2015-08-29, 01:29   #9
Birddylicious

i7-4930MX
32 GB RAM
Twin Nvidia 780M cards in SLI
If it makes a difference, both cards were running CUDA at the time this benchmark was done. I forgot to shut them down. If required, I can re-run it.

Set to run 24/7, with 29000 MB of RAM

8 worker windows

and "Whatever makes the most sense" for assignments for all 8 workers.

Recommendations?

Thanks for all the replies and help

Old 2015-08-29, 01:49   #10
Madpoo

Quote:
Originally Posted by kladner
There is a long-standing problem with P95 and large Adobe applications, at least including Photoshop. I have experienced this with P95 running full out. Starting Photoshop took a very long time, to the extent that borders and windows were drawn at visible, irregular rates. It was speculated that there is some Adobe subsystem which did not flag P95 to back off. I have no idea what the real situation is or was, but I have P95 set to stop a worker or two when Photoshop starts.
Weird. I wonder if that could be from Photoshop setting a low/idle priority on some of its threads. As if Adobe didn't consider that there could be something else running at idle priority and using lots of CPU. I guess for most systems that would be a good assumption, but it's still kind of a weird assumption to make.

It'd be easy enough to check... set it up with Prime95 running and then launch Photoshop. Use something like Process Explorer to find the threads Photoshop is using and look at their priority levels.

Prime95.exe itself runs at priority 8 (for the UI) and then spawns low-priority threads for the workers, which all run at a priority of 1. (This is why you should use something like Process Explorer, which lets you look at *thread* priorities, not just the process priority.)

If the Photoshop stuff is also running with priorities of 1, then "there's your problem".
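
If you'd rather check that programmatically than click around in Process Explorer, here's a rough Windows-only Python sketch using the Toolhelp snapshot API to list the base priority of every thread in a given PID. The PID you pass in (prime95.exe's or Photoshop's) is up to you; none of this is something Prime95 itself provides:
Code:
# Windows-only sketch: list the base priority of each thread in a
# process (what Process Explorer shows), via the Toolhelp snapshot API.
import ctypes
import ctypes.wintypes as wt
import sys

TH32CS_SNAPTHREAD = 0x00000004

class THREADENTRY32(ctypes.Structure):
    _fields_ = [
        ("dwSize", wt.DWORD),
        ("cntUsage", wt.DWORD),
        ("th32ThreadID", wt.DWORD),
        ("th32OwnerProcessID", wt.DWORD),
        ("tpBasePri", wt.LONG),
        ("tpDeltaPri", wt.LONG),
        ("dwFlags", wt.DWORD),
    ]

kernel32 = ctypes.windll.kernel32
kernel32.CreateToolhelp32Snapshot.restype = wt.HANDLE
kernel32.Thread32First.argtypes = [wt.HANDLE, ctypes.POINTER(THREADENTRY32)]
kernel32.Thread32Next.argtypes = [wt.HANDLE, ctypes.POINTER(THREADENTRY32)]

def thread_priorities(pid):
    """Yield (thread id, base priority) for every thread owned by pid."""
    snap = kernel32.CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0)
    entry = THREADENTRY32()
    entry.dwSize = ctypes.sizeof(THREADENTRY32)
    ok = kernel32.Thread32First(snap, ctypes.byref(entry))
    while ok:
        if entry.th32OwnerProcessID == pid:
            yield entry.th32ThreadID, entry.tpBasePri
        ok = kernel32.Thread32Next(snap, ctypes.byref(entry))
    kernel32.CloseHandle(snap)

if __name__ == "__main__":
    pid = int(sys.argv[1])  # PID of prime95.exe (or Photoshop.exe)
    for tid, base_pri in thread_priorities(pid):
        print(f"thread {tid}: base priority {base_pri}")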

Old 2015-08-29, 01:52   #11
retina

Quote:
Originally Posted by Madpoo
It'd be easy enough to check... set it up with Prime95 running and then launch Photoshop. Use something like Process Explorer to find the threads Photoshop is using and look at their priority levels.
Another test is to turn off HT and see if the problem persists.